David Roodman's Microfinance Open Book Blog

 

I Failed to Seriously Consider the Limitations of Microfinance as a Poverty Reduction Approach

August 17, 2011


…that’s according to the newest review of the evidence on the impact of microcredit (p. 73). The review was commissioned by the British aid agency DFID and carried out by British academics, all but one of whom (James Copestake) were based at the University of East Anglia.

I should explain that I wrote the title for this post, which personalizes my reaction to the report, with some humor. The bits of the report that are about me are not nearly as important as what it says about our knowledge of the impacts of microcredit.

I haven’t read all 184 pages of the report with equal care. My reaction at this point is conflicted, as you might sense. On the one hand, I really like the executive summary. With a couple of exceptions, it concisely corroborates my thinking. As Jonathan Morduch and I wrote in 2009:

We assert, however, that decisive statistical evidence in favor of [claims of positive impact] is absent from these studies and extraordinarily scarce in the literature as a whole.

I say something similar in the Tom Heinemann documentary (notice the Norwegian subtitles in the screenshot at 40:24, where I opine that “35 years into the microfinance movement, we don’t have any clear evidence that microcredit…reduces poverty on average.”).

On the other hand, when I examined the parts of the report that overlap most with my own expertise—those on the Pitt and Khandker studies of the impact of microcredit in Bangladesh and the randomized studies—I found them to be problematic in certain ways, even mildly offensive (in making confident assertions about what is going on in my head).

How I feel about the text is unimportant in itself, but I do think my reaction points to a problem with the work that matters for the public and for the public agency that funded it. It seems to ally itself with the current stream of vociferous criticism of microfinance, led by another Brit, Milford Bateman—whose book “has very little time” for academic research. Strange that the authors seem to make common cause with someone who views with nihilism the work to which they are devoting their careers. Meanwhile the report seems to distance itself from researchers, notably Jonathan and me, whom the report portrays as wanting to believe that microcredit reduces poverty despite the lack of evidence.

Similarly, the report perceives a “high risk of bias” in the Karlan and Zinman randomized study of microcredit in the Philippines. Here too the argument seems so illogical that I can’t help wondering what animosity and bias lie behind it. This is mildly unfortunate in a government-sponsored report.

The fundamental conclusions of this report are that a) we have almost no credible evidence on the average impact of microcredit on poverty and b) what little we have puts the impact at 0. In the current battle royale over whether microcredit is good or bad, that seemingly puts the report right in the middle. Yet in naming intellectual allies and opponents, the report appears to pick sides in a way that departs from the evidence it so thoroughly critiques. This invites the public to spin the report in a certain way, to confuse absence of evidence with evidence of absence, as has already happened in the Bangladeshi press (“Microcredit is a mirage, says UK study”).

Still, this is an intelligent critique of the evidence and anyone interested enough to read it will learn from it. I particularly like the point that “advanced econometric techniques will not be able to control for poor quality data,” which wisely summarizes my experience replicating the complex Pitt and Khandker study.

Three specific comments:

  1. Although “microfinance” is in the title, the report and its negative conclusions are really about microcredit. The latter term is actually in the title of the study protocol. The important Dupas & Robinson study of microsavings gets a tiny mention (p. 48), and it is incorrect (“no impacts on well-being”).
  2. In the conclusion of our 2009 working paper, Jonathan and I accentuate the negative in one place (quoted above) and the positive in the previous sentence, quoted below and in the report:

    In our view, nothing in the present paper contradicts those [the view that microcredit is effective in reducing poverty generally, that extremely poor people benefit most especially so when women are borrowers] ideas.

    These two statements are compatible: lack of evidence means lack of evidence of help and lack of evidence of harm. But the new report quotes only that positive half of this symmetric pair, and from that conjectures about what Jonathan and I think:

    some prominent academics involved in microfinance seem to have preferred to not reject the alternate hypothesis.

    (A footnote makes clear the reference is mainly to us.)

    Here’s the logic:

    Failing to contradict the alternate hypothesis encourages one to believe there is a positive effect and therefore to tend to (continue to) reject the null (no effect) hypothesis even though it (no effect) may be true. This of course depends on the decision procedure (see Neyman and Pearson 1933, for a detailed discussion on decision rules) and weighing the costs and benefits of an intervention.

    Ergo:

    Even for critics of these evaluations the absence of robust evidence rejecting the null hypothesis of no impact has not led to a rejection of belief in the beneficent impacts of microfinance (Armendáriz de Aghion and Morduch 2010, p310; Roodman and Morduch 2009, p39-40), since it allows the possibility that more robust evidence (from better designed, executed and analysed studies) could allow rejection of this nul. However, given the possibility that much of the enthusiasm for microfinance could be constructed around other powerful but not necessarily benign, from the point of view of poor people, policy agendas (Bateman 2010, Roy 2010), this failure to seriously consider the limitations of microfinance as a poverty reduction approach, amounts in our view to a failure to take seriously the results of appropriate critical evaluation of evaluations.

    As hinted in the earlier-mentioned footnote, I debated this issue with report author Richard Palmer-Jones by e-mail last November while I was in India. I explained that just because I see no evidence on the impact of microcredit in Pitt and Khandker one way or the other does not mean that I have decided to presume the impact is positive. “Failing [to] contradict the idea that microcredit [helps] is not the same as asserting that it does.” Despite providing this direct evidence, I apparently did not change his view of my view.

  3. The critique of the Karlan and Zinman Philippines study seems mistaken. (I put this last because it is technical. See this post for background.) The DFID report’s errors in this regard hardly affect its conclusions, which are actually quite compatible with Karlan and Zinman’s failure to find impact. Still, they seem worth flagging.

    In particular, the report lists a series of threats to “internal validity,” which is how reliably the study measures what it sets out to study; this concept is distinct from “external validity,” which is how representative the report’s findings are for the rest of the world:

    • The report points out that loan officers randomly prompted by their computers to offer loans to marginally qualified applicants sometimes ignored those prompts, and probably did so for applicants they knew to have poorer prospects: “While analysis was on an intention-to-treat basis using the original allocations, it seems likely that the sample of marginally creditworthy people actually being offered and taking up loans…would have been biased by selection by loan officers and by self-selection.” I don’t get it. We’d expect the randomly rejected and the randomly unrejected to have equal numbers of people with poor prospects, and that is the basis for comparison.
    • The report argues that the unrejected, marginally qualified applicants might be systematically less reliable than applicants with higher credit scores (or more so if loan officers compensate by visiting them more). But the comparison is not between these two different treatment groups. Well-qualified applicants are not part of the study. If anything, this is a critique of external validity, noted by K&Z.
    • The report worries that attrition in the K&Z study is high: surveyors only found 70% of subjects for follow-up. It is possible that some borrower characteristic is correlated with both outcomes and attrition in a way that biases results. But this is harder to believe if intent to treat (originally written as “outcomes”) and attrition are uncorrelated with each other, as is the case.
    • Finally, the report points out that some of the Philippine corner store operators may have realized that they shouldn’t really have qualified for microcredit and must therefore be part of an experiment, which might have caused them to behave differently than they would have ordinarily. I suppose there is something to this…but again it goes to external, not internal validity.

18 Comments on “I Failed to Seriously Consider the Limitations of Microfinance as a Poverty Reduction Approach”

  1. Andrew Sprung Says:

    Alas, the humor in your title is lost on Twitter… 140 characters makes nuance a challenge.

  2. Milford Bateman Says:

    As usual, David, you choose to deliberately misrepresent my views.

    I have a lot of time for academic research, but only of the sort that is genuine, independent, truthful, evidence-based, and not simply the sort of bought-and-paid-for ‘here are the conclusions we would like you to find’ type of ideologically-driven contract research that is becoming all too common in research institutions today. Unfortunately, even a cursory glance at the literature shows that the bulk of the academic research on microfinance impacts is horrendously weak and deliberately biased. This is perfectly understandable, of course; this research was mainly undertaken by individuals and institutions effectively required, pressured, or simply paid to massage the results to make it look as if microfinance has a positive impact when there might actually be no impact. I hate this type of ‘research’; reminds me too much of what I used to hear went on under Communism after arriving in the Balkans in the late 1980s as a young PhD student. And before we get carried away, note that this type of fraudulent and shoddy ‘research’ (in reality advocacy masquerading as ‘research’) seems to be quite prevalent here in the western economies today, notably in the World Bank, as the largely forgotten Deaton Report revealed (see Deaton, Banerjee, Lustig, and Rogoff, 2006).

    The refreshing thing about the Duvendack et al review, to me anyway, is that it was brave enough to genuinely and seriously evaluate the available evidence and conclude accordingly, no matter how unpopular it might make them, and even though they might be decried by you and others as agreeing with horrible me on some points!

    Milford

  3. Milford, if you do indeed take academic research seriously, then I stand corrected. The reason I thought otherwise is that you commented that your book, whose premise is that microfinance doesn’t work, “has little time” for academic research testing this premise.

  4. Maren Duvendack Says:

    Dear Readers of David’s blog – and David,
    Thanks to David for the initial impressions of our systematic review (SR). We seem to have touched a raw nerve for which we apologise; as we make clear in the SR and in other work we have the highest regard for David’s work on microfinance (specifically his paper with Jonathan Morduch 2009 – henceforth RnM), especially the transparency with which it has been conducted. But we do have some differences with David.
    This initial response to David’s comment is limited because both of us are on holiday in remote places with limited PC and internet access. We only address two issues, one point of clarification, and a curious absence. The two issues are our position on the RnM’s overall conclusion with regard to the beneficence (for the poorest) of the MF schemes originating (arguably) in the iconic MF projects in Bangladesh, and our assessment of Karlan and Zinman’s two papers (henceforth KnZ). The clarification is that we exclude the Dupas and Robinson study for reasons given in our section 2.1 – that it has no microcredit component and therefore falls outside our already over-extensive inclusion criteria. The absence is David’s curious implication that we differ from him significantly in our interpretation of the state of knowledge about the Pitt and Khandker study (henceforth PnK). In general we are happy for readers to draw their own conclusions without further clarification from us at this time.
    We have tried to clarify our disagreement about the overall conclusion with David – while agreeing with RnM that the evidence of impacts of MF on the poorest is inconclusive we suggest that they continue to believe in its (MF) beneficence in this regard while we suggest it might be more appropriate to conclude that the evidence is more or less equally consistent with the hypothesis of little (direct) beneficence for the poorest and that therefore what one concludes depends on other evidence and arguments. We come down on the side that suggests that MF has been oversold and it might have been better to either seek more conclusive evidence and/or devote some more attention to search for other means to alleviate the poverty of the poorest.
    As far as the Karlan and Zinman oeuvre is concerned we are not sure we follow all that you blog, and do not want at this point to get involved in deep exegesis of the texts. Briefly we think KnZ have problems of internal and external validity. Indeed the South African study was excluded because it does not remotely concern the poorest (in a global sense), but there are some insights into the KnZ methodology to be gained from that paper. The main concerns were with Hawthorne and John Henry type effects and “attrition” bias. Given our circumstances we would prefer readers to look carefully and critically at our review (p.158 onwards in particular) where we discuss this, and comment as clearly as possible without casting aspersions on our intentions.
    In this regard perhaps we should comment that the SR methodology does not guarantee freedom from bias, including the self acknowledged or indeed unacknowledged priors of the authors. This is one of the reasons why in the medical arena the highest validity is awarded to knowledge about interventions which have been subject to multiple high quality primary studies and several systematic reviews. We tried hard to be objective in our assessments, but there was no doubt that it was hard to divorce our interpretation from our assessment of the PnK study. Following Morduch’s questioning of PnK and Pitt’s (1999) unchallenged demolition, we too embarked on a replication of PnK, using propensity score matching (PSM) rather than the instrumental variables (IV) methods used by PnK and RnM. While in the end RnM under Pitt’s tutelage (2011a &b) ended up achieving the same empirical results, while continuing to doubt causality, we found mixed (positive and negative) but generally weak impacts, highly vulnerable to unobservables (Duvendack and Palmer-Jones, 2011). We further comment that the data are not really suitable for PSM because of the absence of a large control population from which to draw matches. Nevertheless, our finding does not seem to us to warrant any strong conclusions about impact. We are not sure why this differs from David’s conclusions except that we prefer a prior which is sceptical of the hype around MF (as a magic bullet for poverty reduction for the poorest). Perhaps we should add that one of us (RPJ) has another perspective on poverty reduction in Bangladesh (see for example Palmer-Jones, 1993, and Palmer-Jones and Sen, 2003) which is based on acquaintance dating back to the early 1980s, and this may also be a source of bias. But as we have said, we would prefer to be judged on a careful reading of our text rather than condemnation by innuendo.

    Maren Duvendack and Richard Palmer-Jones

    References:
    Duvendack, M. & Palmer-Jones, R. (2011). High Noon for Microfinance Impact Evaluations: Re-investigating the Evidence from Bangladesh. Working Paper 27, DEV Working Paper Series, The School of International Development, University of East Anglia, UK.

    Karlan, D. & Zinman, J. (2010). Expanding Credit Access: Using Randomized Supply Decisions to Estimate the Impacts. Review of Financial Studies, 23(1), 433-464. doi: 10.1093/rfs/hhp092.

    Karlan, D. & Zinman, J. (2009). Expanding Microenterprise Credit Access: Using Randomized Supply Decisions to Estimate the Impacts in Manila. Available at: http://karlan.yale.edu/p/expan....._jul09.pdf.

    Palmer-Jones, R. W. (1992). Sustaining Serendipity? Groundwater Irrigation, Growth of Agricultural Production, and Poverty in Bangladesh. Economic and Political Weekly, 28(39), A-128-140.

    Palmer-Jones, R. W. & Sen, K. K. (2003). What has luck got to do with it? A regional analysis of poverty and agricultural growth in rural India. Journal of Development Studies, 40(1), 1-31.

    Roodman, D. & Morduch, J. (2009). The Impact of Microcredit on the Poor in Bangladesh: Revisiting the Evidence. New York: Center for Global Development, Working Paper No 174. Retrieved from http://www.cgdev.org/content/p.....il/1422302.

  5. Milford Bateman Says:

    David, academic research that is biased, as so much ‘research’ associated with the microfinance industry clearly is, I have no time for: it is merely propaganda and advocacy parading as ‘research’, and so shame on those who willfully produce it. But academic research that genuinely sets out to explore something important for the community or for the poor, like local economic development and financial policy, and bravely lets the chips fall where they fall, I’m totally in favour of.

    Having said that, there are still serious problems even with genuine research on microfinance impacts since the focus is on short term impact, whereas building a sustainable local economy requires long-term engagement and a suitable mixture of enterprises, and assessing whether or not this trajectory is underway is not something the sort of evaluation techniques used today can easily do. Milford

  6. Thanks, Maren and Richard.

    The point about the lack of coverage of microsavings was not meant as a criticism per se, just a clarification for the audience.

    It’s not clear to me what you are referring to by “David’s curious implication that we differ from him significantly in our interpretation of the state of knowledge about the Pitt and Khandker study.” My main reference to Pitt and Khandker is that your review “wisely summarizes my experience replicating the complex Pitt and Khandker study.”

    Likewise, I still don’t understand the evidentiary basis for the perceived “disagreement about the overall conclusion with David.” As I wrote, with a few caveats, I think you nailed it in the summary. In Roodman & Morduch, we state that there is a lack of evidence (in the papers we check) to contradict the hypothesis of harm and the hypothesis of help. I share your skepticism of the grand claims about microfinance, which were rooted in part in the studies we have both questioned. My main beef with Bateman, in case that is part of what is creating an impression of me as a defender of microcredit as an anti-poverty tool, is less with his skepticism (though I do not share its degree) than with the quality of his arguments, including his understanding and use of evidence. What the report refers to as his review of the impact of microfinance contains no reference to RCTs and consists of a misinterpretation of Roodman & Morduch—ironically in a way that accentuates the negatives in our results, in contrast with this report, which accentuates the positives.

    Perhaps at some point you’ll choose to engage in the discussion of K&Z, whether or not in such a public forum. I think my blogging about it was a bit hasty, so I can imagine it was not entirely clear.

    Thanks again for a good report.

    –David

  7. Milford Bateman Says:

    Just saw on Phil Mader’s blog his take on the Duvendack et al review. I rather like it…! (Disclosure: I’m on Phil’s PhD committee). Go to:

    http://governancexborders.com/.....as-landed/

  8. Maren and Richard, a thought on K&Z: can you create simple simulations in Stata supporting the claims I dispute about threats to internal validity?
    David

  9. Maren Duvendack Says:

    David
    Our circumstances are not ideal circumstances to answer your criticisms of our appraisal of Karlan and Zinman. Nevertheless, here are some points. Your first point is that you do not see why intention to treat should not deal with problems of selective rejection by loan officers. Would not the randomly selected who were rejected by loan officers based on characteristics unobservable to the data but observable to the loan officer not be likely to have performed worse than those marginally un-qualified who were seen as acceptable to the loan officers (and similarly for those randomly selected marginally un-qualified applicants who did not take up offers)?
    We would have to read our report carefully to respond to your second point to see exactly what you are referring to, and we are not in a position to do this now.
    Also, we would have to read K&Z again carefully to answer your third point; we would think that outcomes cannot be known to be uncorrelated with attrition since the outcomes are not known for those who drop out. Only first wave characteristics can be shown to be uncorrelated with outcomes. Hence we are not clear what you are referring to without re-reading K&Z.
    Your fourth point – Hawthorne effects of those who might expect not to have been offered loans – seems to us an issue of internal validity – behaviour is affected by knowledge of being experimented upon undermining the internal validity of comparisons between treated and untreated.
    As to simulating our views in Stata, we are not sure which of your points you are thinking of. We imagine that it is the process by which loan officers perceive some randomly chosen marginally-unqualified applicants as indeed marginal and therefore behave differently towards them? (and a similar pattern by which randomly chosen marginally un-qualified offered loans perceive this of themselves and behave accordingly?). We will think about this – or are you offering?

    Maren and Richard

  10. Maren, I hadn’t thought to offer, but I should have. I suppose simulation is most relevant for the first point. The code below shows what I guess you want it to show, which is that if there is an effect of credit, selection on the part of loan officers can reduce the average effect when measured on an intent-to-treat basis, because it reduces the number of people who actually get credit. However, if there is no effect, as appears to be the case in the Philippines, then selection makes no difference. And IV estimation, instrumenting treatment with intent-to-treat, is also consistent throughout (though I guess K&Z don’t do this, for independent reasons).

    At any rate, I’m not sure this constitutes a threat to internal validity. Everything in the chain of causation from the random number generator to the ultimate outcomes is the object of study. That includes the loan officers. So if they are selecting, then that is part of what is measured, not bias. Yes, this means K&Z are not measuring the pure effect of credit offers. But if they did, their study would have less external validity. The practical question is not whether credit offers have impact in themselves, but whether organizations designed to offer credit have impact. A standard (but too-often forgotten) observation about field experiments is that what is being evaluated is both the intervention and the intervenor, and the two can be statistically inseparable. That is the case here, and I think this is the right way to think about the matter, rather than as a threat to internal validity.

    clear
    set seed 12345 // arbitrary seed so the simulation is reproducible
    set obs 1000
    gen IT = _n>_N/2 // intent to treat
    gen e = rnormal() // determinants of outcome unobserved by researchers

    * first without selection on part of loan officers
    gen T = IT * exp(rnormal()) // Actual treatment always positive
    gen O = 2*T + e // outcome. 2 is impact coefficient
    regress O IT, nocons
    ivregress 2sls O (T = IT), nocons

    * now with selection on part of loan officers–changes effect of intent-to-treat, as it should, but not treatment effect
    replace T = IT * exp(rnormal()) * (e>0)
    replace O = 2*T + e
    regress O IT, nocons
    ivregress 2sls O (T = IT), nocons

    * same as above except with 0 treatment effect
    * first without selection on part of loan officers
    replace T = IT * exp(rnormal())
    replace O = 0*T + e // outcome. 0 is impact coefficient
    regress O IT, nocons
    ivregress 2sls O (T = IT), nocons

    * now with selection on part of loan officers
    replace T = IT * exp(1*e + rnormal()) * (e>0)
    replace O = 0*T + e
    regress O IT, nocons
    ivregress 2sls O (T = IT), nocons

  11. Maren Duvendack Says:

    Dear David and (any) Readers
    Thanks for the helpful clarification. You are right that ITT if strictly adhered to would deal with selection by loan officers and self-selection. But as you will be aware ITT only works if you include ALL the to-be-treated and untreated. If there is any attrition (failure to include all those originally selected for each arm of the trial) there is risk of bias. Neither equivalent attrition rates nor lack of differences in characteristics at selection guarantees freedom from bias which is why ITT estimates usually involve imputation of outcomes for missing cases. The problem ITT has with attrition bias that is not random is equally easily simulated (now we are back in civilisation):
    [I have posted the code in a more usable form here.--David]

    While superficially it might appear that we were mistaken in criticising K&Z on the grounds that ITT failed to deal with the selection biases suggested, you will notice that we do not actually say this; the sentence refers to the design as a whole. Note that, while the two sentences immediately preceding the sentence which suggests lack of internal validity (p44) link ITT and selection, this is predicated on the ability of loan officers (who were also those who conducted the creditworthiness survey and therefore may have had some knowledge of the quite detailed creditworthiness relevant characteristics of potential clients) to identify surprising creditworthiness. In following paragraphs we go on to suggest that this could well have affected their behaviour towards those “marginally” un-creditworthy by normal criteria, giving rise to failure to adhere to treatment effects (section 3.2.2), and attrition biases (3.2.3). These are reasons to question the internal validity of these studies. Nevertheless, you are probably right to point to some lack of felicity in our text which we can correct in further work. We doubt that, taken as a whole, the text is seriously misleading as to the internal validity of this design in these contexts. The general problem is, in our view, the lack of ethnographic experience of many who are involved in this type of experimental work which leads them to underestimate the challenges social intervention RCTs pose and to be unrealistic as to their validity.
    The point you make about comparison of the marginally unqualified with those fully qualified may also be partly poor drafting. The offending text seems to be in 3.2.2; however this refers to adherence to treatment not comparison of the marginally un-creditworthy with those normally creditworthy. If the experiment is to assess the effects of marginal relaxation of creditworthiness criteria then we still think that attrition bias and possible failure to adhere to treatment are the main threats to internal validity in this design. Internal validity is barely mentioned in K&Z’s Philippines paper but gets a paragraph in the South African one; this shows that attrition (survey completion) is uncorrelated with treatment – the same attrition rates – and that a few covariates do not differ statistically between experiment arms. Educational status, to mention but one variable one might consider relevant, is not among the covariates compared, and of course there are no unobserved variables for which comparisons can be made between the treatment arms.
    The difference in our overall conclusion from RnM surely relates to differences over the implications to be drawn from the evidence not differences over the evidence per se.

  12. Thank you, Maren (and Richard?). This is a very constructive response.
    A few points:

    • As far as I can see, this response doesn’t directly confront my simulation showing that selection does not bias results; here, there are no new specifics. (As far as I can see, the simulation of attrition bias is conceptually orthogonal to the selection issue, since the attrition bias can occur even when adherence to intention to treat is perfect.) Yet the response does still assert that “failure to adhere to treatment effects [is a reason] to question the internal validity of these studies.” What is this based on?
      The comment above says that in raising this issue the report was referring to a class of studies, not just K&Z. But this class includes K&Z, right? Are you now agreeing that selection bias is not a threat to the internal validity of K&Z?
    • The new simulation rightly shows that attrition can bias results under certain circumstances. In thinking about this simulation, I realized I made a mistake in what I wrote in the post:

      It is possible that some borrower characteristic is correlated with both outcomes and attrition in a way that biases results. But this is harder to believe if outcomes and attrition are uncorrelated with each other, as is the case.

      This is nonsense because we don’t know whether attrition is correlated with outcomes since we don’t know outcomes for attritors. What I should have written, and what I think still buttresses my argument, is that attrition is uncorrelated with intent to treat and with a large set of borrower characteristics. These include sex, age, marital status, number of dependents, characteristics of the business, and, contrary to the assertion just above, education. See Table 2 of the working paper. (A nice feature of the K&Z design is that it gives them a lot of information about individuals before attrition.) It is certainly possible that attrition is nevertheless correlated with unobserved determinants of outcomes. But in light of the above, I think the burden is on you and your coauthors to show why we should believe this is a “high risk.” Do I understand correctly now that the assertion that K&Z is at high risk of bias is now based primarily on the attrition argument?

    • The last sentence above, “The difference in our overall conclusion from RnM surely relates to differences over the implications to be drawn from the evidence not differences over the evidence per se,” in my view merely repeats the error of the report. What implications do RnM draw? I just reread the conclusion. This is what I found:

      In our view, nothing in the present paper contradicts those ideas [about the effectiveness of microcredit]. We assert, however, that decisive statistical evidence in favor of them is absent from these studies and extraordinarily scarce in the literature as a whole. The principle difficulties for studying the effects of microfinance have been a lack of clean quasi-experiments and an absence until recently of randomized trials.

      Our prior is that exclusive reliance on one type of study is not optimal. But the present analysis suggests that for non-randomized studies to contribute to the study of causation in social systems where endogeneity is pervasive, the quality of the natural experiments must be very high. And it must be demonstrated. We also believe that longitudinal surveys like the ones in Bangladesh are worthwhile even when they fail to enlighten us about the impacts of outside interventions. In the Lowess plots in this paper, for instance, one can glimpse a trove of information about how poor households manage money and use financial services. Because of the eagerness to study important questions of impact, this trove remains substantially unexplored.

      I don’t see how you can infer from this that we are interpreting the lack of evidence asymmetrically, in defense of microcredit. And I don’t see how you can maintain, in direct contradiction to a message I sent both of you last November, such confidence in your assertions about what I think.

  13. Maren Duvendack Says:

    Dear David

    We clearly say “ITT if strictly adhered to would deal with selection by loan officers and self-selection” – so we have no problem accepting this – but add “[ITT] only works if you include ALL the to-be-treated and untreated.” This is well known, and is what we go on to address.

    Thanks for re-posting the reformatted Stata code – that is exactly how it appears in our “do” file and the text we posted before pasting into the blog. For future reference, it might be helpful to know how we can format our submissions (see below where we request emphasis)?

    The “failure to adhere to treatment” issue relates to the sequence from loan officer perception of marginal un-creditworthiness to different behaviour towards those so identified to what loan officers would do relative to those not so perceived and compared to what is likely to happen if the creditworthiness conditions were permanently relaxed. By the way, as you know, loan officers also offer loans to some randomly rejected marginally un-creditworthy, but this is easily included in your simulation – thanks to Sunil Kumar for pointing this out and other improvements to our code – but does not substantively alter the argument. This does not change the sign of the differences between treatment and control, but it does affect the estimated size of the difference between treated and controls. It might be considered moot whether this is not an internal validity issue if one is concerned with the size of as well as direction of impact, which one surely is in a cost-benefit context; it is also an issue of external validity as you point out. Ergo, internal in-validity and external in-validity.

    We are not sure which working paper you are referring to since the link you provided does not work; for the Philippines working paper (http://karlan.yale.edu/p/expan.....y10_v4.pdf) Table 2 does indeed test whether attrition is orthogonal to characteristics (apologies for our oversight), but there is a difference in the number of cases included in the relevant regression (column 3), where only 1113 cases are included in the estimation sample while in columns 1 and 2 the estimation samples reported for assigned and surveyed are 1598 and 1601 respectively (1600 in the 2009 version of the Philippines WP). The 2010 WP reports “1601 … compromise our sample frame” (p8). “Surveyors completed 1113 follow up surveys … Table 2, Column 2 shows that survey completion was not significantly correlated with treatment assignment” (p9). This seems inaccurate since column 2 has a single coefficient for a variable named “Randomized loan decision” which seems actually to mean “surveyed” and reports the attrition coefficient; there is no interaction with loan assignment (and there are no covariates interacted with “surveyed” or “surveyed * loan assigned”). Columns 3-5 test characteristics but since the sample size is 1113 we don’t see how it tests characteristics against attrition in a satisfactory way. What columns 3-5 test seems to be the characteristics of those assigned loans compared with those not so assigned among the surveyed. This does not test the characteristics of those not surveyed – it only shows the comparison of characteristics for the surveyed. We would have tested the characteristics of those in each treatment arm between those surveyed and those who drop out. Maybe we have overlooked something which justifies confidence in these results?

    We were referring, as is clear from the context, to the peer reviewed version of this oeuvre, namely the South Africa paper published in RFS, where Table 2 Panel A does not report education. In a working paper version (http://www.consumerservicealli.....202007.pdf) Table 2 Panel A again does not report education.

    David, it is the “nothing” word in the sentence “In our view, [emph]nothing[end emph] in the present paper contradicts those ideas [about the effectiveness of microcredit].” (emphasis added) that does it for us – you can make a case that failure to reject the null (no effect) does not mean much for contradiction of the alternate (beneficent impact) if you like – no doubt the estimated coefficient of impact is not significantly different from a wide range of positive (and negative) values, but it (failure to contradict the null) surely does something (not nothing) to question belief in impacts of MF estimated in PnK?

    Maren and, indeed, Richard

  14. Maren and Richard,

    If you can post files on your own site, you can include links to them in your comments here using standard HTML syntax.

    Most of my comments have to do with attrition.

    First, I believe I created confusion for both of us interpreting the K&Z working paper Table 2. I believe you were originally right that the only correlation reported involving attrition is with ITT (and it is 0).

    Second, I think there is a small but crucial error in your simulation. The “nocons” option is not appropriate in the last regression because its error distribution is asymmetrically truncated. When you fix this, the bias goes away.
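    To illustrate why dropping the constant can matter here, the following is a minimal Python sketch, not the simulation under discussion. It assumes a true effect of zero, a standard normal outcome, full compliance, and attrition that removes everyone below a cutoff of -0.4 in both arms; the function name, parameters, and seed are hypothetical.

```python
import random

def slope_estimates(n=20000, cutoff=-0.4, seed=1):
    """Return (slope without constant, slope with constant) when true impact is zero."""
    rng = random.Random(seed)
    treated, outcome = [], []
    for _ in range(n):
        t = rng.randint(0, 1)       # random assignment; full compliance assumed
        e = rng.gauss(0.0, 1.0)     # outcome shock; the true treatment effect is zero
        if e < cutoff:              # assumed attrition rule, identical in both arms
            continue
        treated.append(t)
        outcome.append(e)

    # Regression through the origin on a 0/1 regressor: sum(t*y) / sum(t*t),
    # which equals the mean surveyed outcome among the treated.
    no_cons = sum(t * y for t, y in zip(treated, outcome)) / sum(t * t for t in treated)

    # With a constant, the slope on a 0/1 regressor is the difference in group means.
    y_t = [y for t, y in zip(treated, outcome) if t == 1]
    y_c = [y for t, y in zip(treated, outcome) if t == 0]
    with_cons = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
    return no_cons, with_cons

no_cons, with_cons = slope_estimates()
print("slope without constant:", round(no_cons, 3))
print("slope with constant:   ", round(with_cons, 3))
```

    Under these assumptions the truncated shock has a positive mean in both arms. Without a constant that mean is attributed to treatment; with a constant it is absorbed by the intercept and the estimated slope stays close to zero.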

    Third, the K&Z data and code are now publicly available. I modified the data file by cutting all variables whose names begin with “bmxz” in order to bring the variable count below 2,048 and make the file accessible in Stata/IC.

    Now, if I understand right, the concern about attrition is that while total attrition is the same in the treatment and control groups, attrition patterns might be systematically different in the two groups. Presumably it would be treatment rather than intent to treat that would interact with subject characteristics to affect attrition. I see that K&Z’s .do file tests for this by regressing the attrition dummy on the interactions between actual treatment and borrower characteristics, controlling for treatment and for those characteristics. I don’t think this was reported in any version of the paper. The key line is this (the variables ending in T are the treatment interactions):

    xi: reg surveyed female i.css_civilstatus dependents age i.edhi i.css_businesslocation1 i.css_businessspace1 i.css_businesstype1 employ0 employ1 yrsbiz_l cashflow_l femaleT _Icss_civil_2T _Icss_civil_3T dependentsT ageT _Iedhi_2T _Iedhi_3T _Iedhi_4T _Icss_busin_2T _Icss_busin_3T _Icss_busina2T _Icss_busina3T _Icss_businb2T _Icss_businb3T _Icss_businb4T _Icss_businb5T _Icss_businb6T employ0T employ1T yrsbiz_lT cashflow_lT lower_window if custid!=406710 & custid!=716898 & custid!=899999, robust plus

    Here is a simple code fragment that can be run after opening the data file to do the same thing. The F test for the joint explanatory power of the interaction terms returns a p value of 0.2897. So there doesn’t seem to be a lot of evidence here of actual treatment affecting who attrits.

    At any rate, I’m still hoping you can produce a credible simulation of how internal validity could be compromised in K&Z. Until you do, I will be unconvinced that there is a “high risk.”

    As for the controversy over what inferences Roodman and Morduch draw, I think we’d agree on some commonsense statements and interpretations: 1) Lack of evidence of causality means we have no evidence about whether it helps and no evidence about whether it hurts. 2) If your prior is that microcredit had been proven to reduce poverty, R&M should reduce your confidence in that prior. 3) The text is amenable to being interpreted as supporting 1 or opposing 2. 4) All of this argument is based on two sentences. 5) The key word in those two sentences is, I think, “contradict.” For a logician, contradict means “disprove.” That’s how I thought of it. For you, I think it means “reduce confidence in.” I think that is not as close to the dictionary definition. 6) Evidently, subtle wording changes or interpretations could have changed the overall characterization of R&M’s conclusions in the systematic review. Ergo: the systematic review’s estimator of R&M’s inferences was fragile and based on a very small sample (two sentences). In the language of the review, it is at high risk of bias and ought not be asked to carry much weight.
    –David

  15. Maren Duvendack Says:

    David
    The simulation: OK, hands up – the first simulation was not appropriate. Nevertheless ITT is generally thought to require following up all those in treatment and control groups (Lachin, 2000), so the problem is in the simulation not the intuition. Here is another go. In this simulation there is a sort of “confirmation effect”; i.e. for those who are treated and get a higher outcome (whether there is an impact or not) there is a greater tendency to believe the (better than expected) outcome is due to treatment and this leads them to be more available for interview (falsely attributing their relatively high outcomes (top 70%) to the treatment), while for the untreated availability for interview is random. The following simulation applies this intuition.

    [I have posted the simulation program here. But see below on some edits I needed to make. --David]
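    The mechanism described in this comment can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Stata program referred to above: it assumes a true effect of zero, a standard normal outcome, full compliance with assignment, and a cutoff of -0.4; the function name, parameters, and seed are hypothetical.

```python
import random
from statistics import NormalDist, mean

def simulate(n=20000, cutoff=-0.4, seed=2):
    """Return (attrition rate by arm, estimated effect among those surveyed)."""
    rng = random.Random(seed)
    p_drop = NormalDist().cdf(cutoff)   # control drop probability matching the cutoff
    kept = {0: [], 1: []}
    assigned = {0: 0, 1: 0}
    for _ in range(n):
        t = rng.randint(0, 1)           # random assignment; full compliance assumed
        e = rng.gauss(0.0, 1.0)         # outcome; the true treatment effect is zero
        assigned[t] += 1
        if t == 1:
            dropped = e < cutoff        # treated: low outcomes go unsurveyed
        else:
            dropped = rng.random() < p_drop   # control: attrition unrelated to outcome
        if not dropped:
            kept[t].append(e)
    attrition = {t: 1 - len(kept[t]) / assigned[t] for t in (0, 1)}
    effect = mean(kept[1]) - mean(kept[0])
    return attrition, effect

attrition, effect = simulate()
print("attrition, control:", round(attrition[0], 3))
print("attrition, treated:", round(attrition[1], 3))
print("estimated effect:  ", round(effect, 3))
```

    In this sketch the two arms lose a similar share of subjects, yet the comparison of surveyed outcomes shows a positive "effect" although none exists, because only the treated arm loses its low-outcome subjects.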

    To contradict: the first definition in the (Shorter) OED (1963) is “to speak against”. We did not realise you were writing as a strict logician, and in this context we would find it surprising that you did. Anyway, enough said – we can all assert what it was reasonable to mean but only we know what we actually did mean, and there is only so far we can go in believing people might have meant in the circumstances. Let it be agreeable to (perhaps) disagree over this.
    On another point you make, we would suggest that not all words in a text are of equal value. We, and we think most people, are inclined to think that words which are in the conclusion to a paper are of greater weight – intentionally – than those in the middle as regards the significance of the meaning they entail.
    We will be checking out the K&Z data in due course, but one immediate point from a replication point of view is that the data are already “final”, and so no cleaning or derivations of variables from the raw data are yet available. As you will know the AER Data Availability Policy suggests this; (“…plus a description of how previous intermediate data sets and programs were employed to create the final data set(s). Authors are invited to submit these intermediate data files and programs as an option; if they are not provided, authors must fully cooperate with investigators seeking to conduct a replication who request them. “ (http://www.aeaweb.org/aer/data.php)), and this was indeed most excellently practiced by yourself(ves) in your PnK replication.
    With best wishes

    Maren and Richard (with thanks to Sunil Kumar for clarification of why “,nocons” was inappropriate)

  16. Excellent, Maren and Richard. I think this simulation-based discussion is really productive.

    It looks to me like a couple of lines of the code got mangled in posting. I made an educated guess to fix them. I changed what showed up as

    gen `dropC’ = rnormal() <-0.4
    gen `dropT' = `e' -0.4, nocons

    to

    gen `dropC' = rnormal() < -0.4
    gen `dropT' = `e' < -0.4
    ivregress 2sls `O4' (`T4' = `IT'), nocons

    Is that right? Assuming it is, then I agree you’ve demonstrated a scenario in which attrition would bias the results. But I consider the scenario improbable. It posits two separate attrition processes, one for those who were supposed to be treated and one for those who weren’t, which nevertheless cause the same rate of attrition in both groups. This shows up in the code in the use of –0.4 for both groups. Wouldn’t it be a coincidence for the two groups to have the same threshold of –0.4? This is precisely what I had in mind when I wrote in my original post:

    It is possible that some borrower characteristic is correlated with both outcomes and attrition in a way that biases results. But this is harder to believe if outcomes and attrition are uncorrelated with each other, as is the case.

    …except that I should have written “intent to treat” instead of “outcomes” in the second sentence.

    If I understand right, you’ve posited a coincidence; the evident need to do so, you might say, contradicts the attrition story. So I would not agree with you if you went from this to a “high risk” of bias. To repeat an earlier question, is this the story on which you primarily base the assertion of high risk of bias? Can you show it with a simulation that does not require a coincidence?
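    The coincidence can be made concrete with a little arithmetic. The sketch below assumes, as the posted lines suggest but do not fully show, that both the control draw and the treated outcome shock are standard normal, so each group's attrition rate is the normal CDF at its threshold; the function name and the alternative thresholds are hypothetical.

```python
from statistics import NormalDist

def attrition_rate(cutoff):
    """Share of a standard normal population falling below the cutoff."""
    return NormalDist().cdf(cutoff)

control_cutoff = -0.4
for treated_cutoff in (-0.6, -0.4, -0.2):
    gap = attrition_rate(treated_cutoff) - attrition_rate(control_cutoff)
    print(f"treated cutoff {treated_cutoff:+.1f}: "
          f"treated {attrition_rate(treated_cutoff):.3f}, "
          f"control {attrition_rate(control_cutoff):.3f}, gap {gap:+.3f}")
```

    Under these assumptions the two attrition rates match only when the two thresholds match; moving the treated threshold in either direction opens a visible gap between the groups.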

    –David

  17. Maren Duvendack Says:

    David
    Yes – our programme should have read:
    * attrition dependent on determinants of outcomes for treated
    gen `dropC' = rnormal() < -0.4
    gen `dropT' = `e' <-0.4
    * now do regressions
    ivregress 2sls `O4' (`T4' = `IT'), nocons
    return scalar one=_b[`T4']
    ….
    Not sure how it got mangled because it is correct in the Word file from which it was pasted.
    We also thought about the coincidence of equal attrition rates (and about the lack of differences in characteristics between treatment and controls both in the original sample, and in the surveyed sample). It could be that this is how it worked out by chance; or it could have been due to quota sampling by enumerators – "go out and get equal numbers of treatments and controls" – or some variant of this that intentionally ended up with equal attrition. This is not a completely unlikely scenario. Or it could be "real" – this is how, expending equivalent effort to locate both treatments and controls, it really turned out. You pays your money and makes your choice – unless there is further information from the authors.
    The lack of difference in characteristics between treatments and controls in both original and surveyed samples in observed characteristics leaves out, of course, unobserved/unobservable variables which might be correlated with outcome and attrition. There will have been unobservables, but whether they were so correlated is unknown, although suspected – entrepreneurial ability for example. When we get time we may replicate K&Z’s test for differences in observed covariates, but as noted previously, without access to the “raw” data, we don’t know what happened in data cleaning and variable construction, so such a replication would leave lingering uncertainties.
    We don’t think “confirmation bias” is at all unrealistic; it is very well documented (see Lachin, 2000, or the irrepressible Ben Goldacre for further reading), and it, or something like it, may well go to the heart of much of the bias in evaluations of well-meaning NGO interventions, including MF – and is of course close to placebo and related effects. Things get better and this is attributed to the new kids on the block – the NGOs (who also of course dish out or provide access to subsidised scarce resources and patronage).
    References
    Lachin, J.M., 2000, Statistical Considerations in the Intent-to-Treat Principle, Controlled Clinical Trials, 21:167-189.
    Goldacre, B., 2008, Bad Science, London, Fourth Estate.

  18. Maren and Richard,
    Perhaps this debate has come to an end. Your last message offers some theories but no evidence as to why the stories represent high risks. One can always assemble stories about unobservables or “lingering uncertainties” and so on. (Not quite sure what you mean by “confirmation bias” as the term does not appear in the systematic review, K&Z basically find no effect, and did not work with an NGO.) Although K&Z took reasonable steps to rule out attrition bias, you continue to focus on it, seemingly because you consider this the greatest weakness. Is this because a few of the coefficients in the regression I posted earlier actually are significantly different from 0, enough so that they could plausibly generate large bias? Or is it based on some evidence-based prior you have about unobservable behavior of subjects or researchers? For example, I believe that nearly all non-randomized studies are at high risk of bias because of my experience examining several of them closely, because of studies performing head-to-head comparisons of randomized and non-randomized methods on the same data, because of Rossi’s Rules, which sum up his experience, etc. What are the counterparts for your concerns? E.g., are there good examples of RCTs with balanced attrition masking significant bias?
    –David
