David Roodman's Microfinance Open Book Blog

 

The Rapid Rise of the Randomistas and the Trouble with the RCTs

March 3, 2009


As I mentioned, “randomized control trials” (RCTs) are proliferating in development economics, being used to study such questions as whether microcredit puts more girls in school.

I have spent much time challenging non-experimental studies of the causes of economic growth. Trained as a mathematician, I am less skeptical of the proposition that foreign aid or financial system development typically speeds national progress than I am of the number-crunching economists have done to back such claims. I’m more of an aid regression skeptic than an aid skeptic. (Though my skepticism of the evidence does lead me toward agnosticism on the substance.) The biggest problem is that technical efforts to rule out reverse causality usually fail, so that we can’t be sure about what is causing what.

By contrast, randomized studies compel respect. A researcher flips a coin to determine who gets a service; a year later, those who got it are happier, or healthier, or more stressed, or whatever, than those who didn’t. Short of the supernatural, the only explanation for the correlation is that the intervention caused the outcome. What’s to argue with?

So I like RCTs, and I think it is good that randomized trials of microfinance are underway.

Still, the rapid rise of the “randomistas” feels like a fad. Will a healthy movement overshoot? Already, grand men of economics such as Nobelist James Heckman and Angus Deaton are asking tough questions. Such as: Are RCT researchers doing science if they treat people and households as black boxes—things to be experimented on and observed—without modeling or studying what goes on inside the black boxes? If you learn that pushing this button turns on that light, what have you really learned about electricity? The concern is practical because it gets to whether researchers gain insight into human behavior that can lead to improvements in programs meant to help people.

There are other impediments and disadvantages, which I want to think through. Can you add to my list of Troubles with the RCTs (apologies)?

  • Arbitrarily (randomly) offering a service to some people and not others is immoral and goes against the professional grain of the people providing the service. The randomistas accommodate this concern by only randomizing in less-fraught ways. Karlan and Zinman randomly offered credit to South Africans who would not otherwise qualify. Banerjee, Duflo, and Glennerster worked with Spandana to randomize which neighborhoods get microcredit first as the group expands across Hyderabad, India.
  • The placement of randomized trials is non-random. You can’t perform one on microcredit in Bangladesh, where arguably it’s been most successful, because just about everyone already has access to it. Only certain groups will allow researchers to “interfere” with their decision-making. (See above.) So interventions only get studied in certain contexts, which may not be globally representative.
  • Any study is a test of both an intervention and an intervenor, so how do you attribute success or failure to the intervention per se? Two groups might do, say, after-school tutoring that looks the same on paper but works out quite differently in practice. And there is variation over time too: Maybe Spandana won’t get the kinks out of its Hyderabad operations until after the researchers have finished evaluating them.
  • RCTs only tell us the average effect. Whether you’re talking about loans or pills, the same “treatment” affects different people in different ways, and that diversity around the average matters as much as the average itself. But you can’t get at it with RCTs. You cannot randomize whether subjects in a randomized trial of penicillin are allergic to it.

Update 1: After I clicked “Publish” on this post, I walked into a talk given by Jeremy Weinstein, in which he did what I had just written was impossible. In fact, if you know who is allergic to penicillin, then you can do a control-treatment comparison just among them. To correct myself, RCTs falter on unobserved dimensions of diversity. If we can’t tell who is entrepreneurial, then even an RCT cannot tell us how entrepreneurial ability mediates the impact of microcredit.
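To make the correction in Update 1 concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration: the trait frequency (30%), the effect sizes (+2.0 for people without the trait, -1.0 for people with it), the sample size, and the function names `simulate_trial` and `diff_in_means`. It shows only that when a trait is observed, a treatment-control comparison can be run within each subgroup as well as across everyone.

```python
import random
from statistics import mean

def simulate_trial(n=20000, seed=0):
    """Simulate a randomized trial with one observed binary trait.

    All numbers are hypothetical: the treatment helps people without
    the trait (+2.0) and harms people with it (-1.0).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        has_trait = rng.random() < 0.3       # observed before treatment
        treated = rng.random() < 0.5         # the researcher's coin flip
        effect = -1.0 if has_trait else 2.0  # assumed true effects
        outcome = rng.gauss(0.0, 1.0) + (effect if treated else 0.0)
        rows.append((has_trait, treated, outcome))
    return rows

def diff_in_means(rows):
    """Treatment-minus-control difference in mean outcomes."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

rows = simulate_trial()
overall = diff_in_means(rows)
with_trait = diff_in_means([r for r in rows if r[0]])
without_trait = diff_in_means([r for r in rows if not r[0]])

print(f"overall:       {overall:.2f}")
print(f"with trait:    {with_trait:.2f}")
print(f"without trait: {without_trait:.2f}")
```

Under these assumptions the overall estimate should land near 0.3 × (-1.0) + 0.7 × 2.0 = 1.1, an average that describes nobody in particular, while the two subgroup estimates recover the opposite-signed effects. If `has_trait` were not recorded, only the first number would be available, which is the sense in which RCTs falter on unobserved diversity.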

Update 2: See Should Randomistas Rule? in The Economists’ Voice (free with registration) by Martin Ravallion, director of the World Bank’s research department, and another respected figure questioning the enthusiasm for RCTs.


4 Comments on “The Rapid Rise of the Randomistas and the Trouble with the RCTs”

  1. Merrick Zwarenstein Says:

    I come at this as a triallist from the health arena, specifically from the world of attempting (and largely succeeding in) the evaluation of complex health interventions using randomised controlled trials. We are an offshoot of the much more common practice of evaluating new drugs, devices and procedures using randomised trials. Approaching half a million randomised trials of healthcare interventions have been published so far, mostly pharmaceutical trials, with thousands per year being added to this list.

    I would like to respond to some of your points:
    Useful science:
    “Are RCT researchers doing science?…The concern is practical because it gets to whether researchers gain insight into human behavior that can lead to improvements in programs meant to help people.”

    If science is systematic inquiry, then it is also doing science to demonstrate that there is a causal link between an intervention and an outcome of importance. I don’t need to know why the switch creates light in order to benefit from the light which results reliably from my switching on the switch! Thus a black box finding of a reliable causal connection is very useful indeed, and suggests that the intervention (switching on the light) should be widely used to obtain the desired result (more light). (Please don’t extend this metaphor further; we all know that the switch will not work absent power.)

    The immorality of randomisation:

    When we do trials of drugs we hope that the benefits of the new drug will outweigh the harms. But we don’t know this until the trial is complete, and often not until the drug has been used for some time after widespread implementation. So randomisation is also the best way to find out effects while exposing the minimum number of patients, people or communities to harm, and hence is ethical.

    Widespread applicability:
    (the name increasingly used for this issue in health) Applicability is not ever knowable by deduction, but can be judged on the basis of similarities or differences between the setting in which the published study was done, and the setting in which the reader wishes to apply the intervention.

    If there is certainty that microcredit works in Bangladesh, but we want to apply it in Africa, we may need to do another trial. On the other hand, if it works in Bangladesh, it might be reasonable to argue that it will also work in similar rural areas of India and so a repeat trial may not be needed there.

    Confounding of intervention and intervenor:
    These of course cannot be separated. It therefore becomes very important that both the intervention and the intervenor are fully described, so that readers of the published trial can tell whether the intervenor they would use, should they implement the intervention in their own setting, is similar enough to the intervenor described in the published trial that the same results are likely.

    There is a series of published guides to reporting randomised trials in healthcare, describing the elements needed in the final paper in order for readers to make sense of it:

    A group of us have recently published one of this series, on pragmatic trials: trials designed to support real world decisionmaking. In healthcare we talk about a spectrum of randomised trials, extending from explanatory (a trial which explains the mechanism by which something happens) to pragmatic (a trial which answers the question of whether or not an intervention reliably results in an outcome of interest under real world conditions). Schwartz and Lellouch, two French statisticians, created this distinction in 1967 and it has stood the test of time, being in near continual (albeit infrequent) use since then.

    Our paper describing the reporting of pragmatic trials is in the British Medical Journal, and you may obtain a free PDF from me.

    Average effect:
    The published average effect is the best predictor of the likely effect of using that intervention in some other setting or patient group without actually doing so. No study design can directly predict the effect in your own patient or community, or even for any single patient or community in the original study, so in this sense randomised trials are the best solution to an impossible dilemma.

    Merrick

  2. Merrick, thanks very much for these thoughtful comments. As I hope is clear, I was playing devil’s advocate in order to elicit precisely this sort of helpful response. I agree with almost everything you say. My only quibble is that I think the intervenor often cannot be “fully described.” It’s not always easy to determine what makes a good teacher better than a bad teacher when implementing the same curriculum, or what distinguishes a well-run school from a poorly-run one. Perhaps there is a difference here between the clinical and social program settings? Of course the point stands that one should describe the intervenor as well as possible.

    I believe your BMJ article is available free at bmj.com/cgi/reprint/337/nov11_2/a2390. I am chasing down the Schwartz and Lellouch (1967).

  3. Matteo Valenza Says:

    Dear David,

    First of all thanks for this open resource, great idea.

    Quick comment on the “Update 1” point in your original post. One way of getting around the unobservability of entrepreneurial ability is to use fixed effects, as described in Armendariz and Morduch (2005). But is entrepreneurial ability genuinely time-invariant? I am not only thinking of learning by doing, but also of other unobserved factors (e.g. access to markets, dynamics in local productive networks, etc.). So using fixed effects may still be problematic.

    I would also like to bring an additional consideration to the table. Studies like Ypeij (the one in Peru, 1997) are convincing me that most times micro-borrowers do not really borrow with the idea of maximising microenterprises’ profit. I believe this is especially true of female borrowers: it is as if their traditional obligations of looking after the house and the family first and foremost would assign them a different objective function to be pursued – an objective function radically different from profit maximisation.

    Bottom line: what do we put on the left-hand side of the intent-to-treat regression, if we believe that profits are not the appropriate variable?

    If you have the chance, see page 94 of her book “Producing against poverty.”

    Finally, if I may suggest another very good contribution from an RCT skeptic, see http://www.chrisblattman.org/DFID.talk.Feb2008.pdf

    Thanks a lot,

    Matteo Valenza

  4. Matteo,
    Thanks so much for these sources. I have ordered the Ypeij book and read Chris Blattman’s thoughts on RCTs. As it happens, Chris was a post-doc fellow here when he gave that talk at DFID.

    I completely agree with you that fixed effects estimation can still be problematic since village and individual effects are in general not “fixed” over time. This is directly relevant to understanding Khandker’s 2005 study of the impact of microcredit in Bangladesh (World Bank Economic Review), because he uses a fixed effects method on a panel with data from 1991-92 and 1999. In time, I’ll have more to say about that.

    –David
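
The fixed-effects point in this exchange can be illustrated with a small simulation. The sketch below is a made-up two-period panel, not Khandker's data or method: the true effect (1.0), the rule that abler households borrow more often, the `drift` parameter, and the function names are all assumptions. With two periods and borrowing only in the second, the fixed effects estimate equals the borrower-minus-non-borrower gap in first differences, which is what the code computes.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical effect of borrowing on the outcome

def simulate_panel(drift, n=20000, seed=1):
    """Two-period panel; households may borrow in period 2 only.

    drift = 0.0 keeps unobserved ability fixed over time;
    drift > 0 lets it grow in proportion to its starting level.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                # never observed
        borrows = ability + rng.gauss(0, 1) > 0  # abler households borrow more
        y1 = ability + rng.gauss(0, 1)
        ability2 = ability * (1 + drift)
        y2 = TRUE_EFFECT * borrows + ability2 + rng.gauss(0, 1)
        rows.append((borrows, y1, y2))
    return rows

def gap(rows, value):
    """Borrower-minus-non-borrower difference in the mean of value(row)."""
    yes = [value(r) for r in rows if r[0]]
    no = [value(r) for r in rows if not r[0]]
    return mean(yes) - mean(no)

def estimates(rows):
    cross_section = gap(rows, lambda r: r[2])      # period-2 levels only
    first_diff = gap(rows, lambda r: r[2] - r[1])  # removes fixed ability
    return cross_section, first_diff

naive_fixed, fd_fixed = estimates(simulate_panel(drift=0.0))
naive_drift, fd_drift = estimates(simulate_panel(drift=0.5))

print(f"ability fixed:  cross-section {naive_fixed:.2f}, first-difference {fd_fixed:.2f}")
print(f"ability drifts: cross-section {naive_drift:.2f}, first-difference {fd_drift:.2f}")
```

When ability is truly fixed, differencing should recover something close to the assumed effect of 1.0 while the cross-section comparison overstates it. Once ability drifts in a way correlated with borrowing, the first-difference estimate is pushed above 1.0 as well, which is the worry raised above about effects that are not fixed over time.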
