David Roodman's Microfinance Open Book Blog

 

Response to Pitt’s Response to Roodman and Morduch’s Replication of…, etc.

March 31, 2011


[Note: A full response to Mark Pitt is now available.]

On Saturday night, Jonathan Morduch and I learned second-hand that Brown University economist Mark Pitt had circulated a paper via blast e-mail that challenges our replication of Pitt and Khandker, which was for a decade the leading study of the impact of microcredit on poverty. Here’s the abstract of the new paper:

This response to Roodman and Morduch seeks to correct the substantial damage that their claims have caused to the reputation of microfinance as a means of alleviating poverty by providing a detailed explanation of why their replication of Pitt and Khandker (1998) is incorrect. Using the dataset constructed by Pitt and Khandker, as well as the data set Roodman and Morduch constructed themselves, the Pitt and Khandker results standup extremely well, indeed are strengthened, when estimated with Roodman’s cmp program, after correcting for the Roodman and Morduch errors.

History has repeated itself. Back in 1999, Pitt wrote a similar response to Jonathan’s original attempt to understand Pitt and Khandker.

The timing is terrible for me as I am in the midst of a big push to finalize the book. So here is a quick response, pending the investment of time needed for a more thorough reply. Take these thoughts as preliminary.

Pitt’s response has exposed an important mistake in our work. With his fixes, we now match their key results extremely well. This is good news. But in truth we did not dwell on the sign mismatch (Pitt and Khandker found a positive association between borrowing and household spending where we found a negative one) but on whether cause and effect had been proved. In announcing the study, for instance, I wrote, “Seemingly, lending to women makes families poorer…but I just told you how much credence we put on such claims about cause and effect.” And our statistical analysis of the causality claims has not changed (though we have more to do).

As a result, Pitt’s response:

  1. Validates our open approach to research, in which we, in contrast to Pitt and Khandker, have freely shared our data and code.
  2. Strengthens our main conclusion, which is about lack of proof of causality.

I’ll elaborate on those points in turn.

1. How does it validate our open approach? In the mid-1980s, the editor of the Journal of Money, Credit and Banking set out to perform replications (i.e., rerun the original math on the original data) of articles recently published in his journal. Roughly a third of authors responded to his request for data. From these, he determined that “inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence.” This is not surprising. Most econometric work involves computer programming. All programs have bugs, and we’d expect more bugs in code written by amateur programmers, a group that includes most economists. Invariably, such mistakes look dumb when exposed. Think “I used the GDP of Switzerland instead of Swaziland.” But there was good news: rarely did the mistakes overturn the authors’ conclusions.

We have found the same pattern in our work. We unearthed small errors in Pitt and Khandker and in our own code. For example, Pitt and Khandker appear to treat all students in school as having no years of education. Separately, in one place (Table A2) they report 1,461 girls and 1,589 boys of school age in the sample while in another (Table 4) they report roughly double those figures, which could signal a problem in their analysis. The formulas at the end of the paper contain typographic errors. And so on.

Most of Pitt’s substantive points are ones we would quibble with or are secondary, in that they don’t affect the results much. But one is not: it points to one of those mistakes, on our side, that matters a lot. It is the omission of the dummy for a household’s target status (a zero-one indicator of whether a household formally qualifies for microcredit based on whether its landholdings are below half an acre). Pitt is right that including this missing control flips the sign on key coefficients so that they match the original. He is also right that this missing variable was in fact listed in a table in his paper and mentioned in text. We are grateful to him for uncovering and pointing out this mistake (which, to be clear, is essentially mine). The relationship between borrowing and household spending (a measure of affluence or poverty) now appears to be positive in our replication too.
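To see how leaving out a single dummy can flip a sign, here is a minimal simulation sketch. It does not use the Pitt and Khandker data: the variable names (`nontar`, `borrow`, `spend`), the coefficients, and the sample size are all made up for illustration, and the plain OLS helper stands in for the far more complex estimator in the actual study. In the simulated world, borrowing truly raises spending, but non-target households both borrow less and spend more.

```python
import random

def ols(regressors, y):
    """Return OLS coefficients (intercept first) by solving the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
n = 5000
# Hypothetical data: about 30% of households are non-target (not eligible).
nontar = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
# Non-target households borrow less on average...
borrow = [2.0 * (1.0 - d) + rng.gauss(0, 1) for d in nontar]
# ...and spend more, while borrowing itself has a true effect of +0.5.
spend = [0.5 * b + 3.0 * d + rng.gauss(0, 1) for b, d in zip(borrow, nontar)]

short = ols([borrow], spend)          # dummy omitted
full = ols([borrow, nontar], spend)   # dummy included
print("borrowing coefficient without the dummy:", round(short[1], 3))
print("borrowing coefficient with the dummy:   ", round(full[1], 3))
```

In this setup the coefficient on borrowing comes out negative when the dummy is omitted and close to the true +0.5 when it is included. That is only the textbook omitted-variable mechanism; it says nothing about whether the corrected estimate can be read causally.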

Before discussing what this means for our understanding of the impact of microcredit, I will finish my point about the research process. Replicating a complex statistical study without access to the original code and data is a scientific whodunnit. You look for clues, generate hypotheses about what the original authors did, then pick the hypothesis that best fits the observable data. Since the replicatees certainly made mistakes, you cannot take any particular piece of evidence as gospel.

To save space, the final Pitt and Khandker paper did not list complete regression results, only results for the microcredit variables of greatest interest. So we turned to the more-detailed working paper version, whose Table B2 does provide complete results for what would seem to be the key regression. The variable list there excludes the variable we should have included and it includes a variable Pitt says we should have excluded. Now we see that the list was for a slightly different regression, one restricted to target households, for which an indicator of target status would have been superfluous since it is uniformly 1. This difference from the headline regression (in Table 4.1C of the working paper) appears not to be signaled in the text. It is signaled by the very last row of Table B2, which shows the number of households in the sample for this regression as smaller than in the headline one. But see above (on number of kids in school) for an example of where we didn’t know whether to take such numbers at face value. Meanwhile, the variable list we copied conflicted with the table, also in the working paper, that Pitt now says we actually should have copied (over the “participated but did not take credit” variable); again, it was not obvious which to believe.

Update: I discovered that the data set Pitt sent us embodies the same misleading variable list. Variables xb1–xb25 are the explanatory variables for household spending. None of these is the erroneously omitted variable, but the last is the erroneously included variable. The erroneously omitted one, nontar, is elsewhere in the file but clustered with variables that are not explanatory variables, suggesting that it is not one either.

The point is not to excuse our mistake but to illustrate how closed research, meaning research in which data and code are kept secret, inhibits replication (which is a sine qua non of science). Our open research, in contrast, while exposing us to vehement attack, allowed for the detection of error and helped us improve our work. That serves the greater good.

Moreover, I must emphasize that the authors have been less open with us than it would appear. In our correspondence, and in reviewing our paper for the Journal of Political Economy, Pitt emphasized that the data set he provided us was not the exact one used in Pitt and Khandker, and that therefore our work is not a true replication. The new paper now refers categorically to the same rectangle of numbers as “the dataset constructed by Pitt and Khandker.” And the copy now made public is better than what was made available to us, having informative variable names and descriptions, more columns (including some Pitt wrongly says were in there before), and a missing row restored. Meanwhile, Khandker fought our efforts to obtain the later round of survey data from the World Bank. (A more senior official ultimately shared it, according to the World Bank’s policies on data openness.)

2. How does Pitt’s work strengthen our conclusions? Our main conclusion is not about the failure to match the signs of the key coefficients. It is about the failure to convincingly demonstrate cause and effect. For example, read Pitt’s quote from my congressional testimony:

A couple of years ago I spent time scrutinizing what was then the leading academic study finding that microcredit reduces poverty. To decide whether I believed this crucial study, I replicated it—rerunning all the original math on the original data. The math and computer programming were really complex. In time, with my coauthor Jonathan Morduch, I would conclude that the study does not stack up. We’re not saying microcredit doesn’t help people, just that you cannot judge the matter with this data.

we have little solid evidence that microcredit, the dominant form of microfinance, reduces poverty.

This is not about our inability to match the sign. It is about whether we can interpret the sign as an effect of borrowing on poverty. (It could be an effect of poverty on borrowing, with better-off people capable of borrowing more.) This is the standard problem with inferring causality from non-experimental data, and is one reason that the randomized approach has caught on.

More to the point, when we fix our regressions, they continue to fail tests of the assumptions needed to infer causality. So improving the match to the original greatly strengthens our conclusion that this study does not convincingly demonstrate an impact of microcredit on poverty. (Non-econometricians can read Roodman’s Law of Instrumental Value to understand the underlying issues.)

Now that we are able to replicate the original much better, Jonathan and I will spend some time studying it. (And you can too: just add the nontar dummy to the second stage.) For now, here are a couple of preliminary tables (updated April 4). In the first, under “WESML-LIML-FE” you’ll see that the “PK” and “new” columns now match well. That’s Pitt’s main point. But at the bottom of the last column is the result of a statistical test, a “0.022.” (For experts, this is a Hansen test on a parallel 2SLS regression, as explained in our paper.) That says that if the assumptions Pitt and Khandker make in order to infer causality from correlations are correct, then there is only a 2.2% chance that a certain test statistic would be as large as it actually is.
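For readers who want to see what such an overidentification test does mechanically, here is a rough sketch on simulated data. It implements the simpler Sargan version, which assumes homoskedastic errors; the Hansen test is the heteroskedasticity-robust analogue. Everything here is an assumption made for illustration: one endogenous regressor, two instruments, invented coefficients, and none of the weighting, fixed effects, or censoring of the real regressions. The statistic is the sample size times the R-squared from regressing the 2SLS residuals on the instruments, compared against a chi-square distribution with one degree of freedom (two instruments minus one endogenous regressor).

```python
import math
import random

def ols(columns, y):
    """OLS coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fitted(coefs, columns):
    """Fitted values from coefficients (intercept first) and regressor columns."""
    return [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
            for i in range(len(columns[0]))]

def sargan(y, x, z1, z2):
    """Sargan statistic and p-value for one endogenous x and instruments z1, z2."""
    n = len(y)
    x_hat = fitted(ols([z1, z2], x), [z1, z2])                # first stage
    b = ols([x_hat], y)                                       # second stage
    resid = [yi - b[0] - b[1] * xi for yi, xi in zip(y, x)]   # uses actual x
    aux = fitted(ols([z1, z2], resid), [z1, z2])              # residuals on instruments
    mean_r = sum(resid) / n
    r2 = (sum((a - mean_r) ** 2 for a in aux)
          / sum((r - mean_r) ** 2 for r in resid))
    j = n * r2
    return j, math.erfc(math.sqrt(j / 2.0))  # upper tail of chi-square(1)

def simulate(direct_effect, seed, n=2000):
    """Hypothetical data; direct_effect != 0 means z2 violates the exclusion restriction."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = 0.8 * v + 0.6 * rng.gauss(0, 1)  # correlated with v, so x is endogenous
        xi = a + b + v
        z1.append(a); z2.append(b); x.append(xi)
        y.append(0.5 * xi + direct_effect * b + u)
    return y, x, z1, z2

j_valid, p_valid = sargan(*simulate(0.0, seed=1))
j_invalid, p_invalid = sargan(*simulate(1.0, seed=1))
print("valid instruments:   J = %.2f, p = %.3f" % (j_valid, p_valid))
print("invalid instruments: J = %.2f, p = %.3g" % (j_invalid, p_invalid))
```

When an instrument affects the outcome directly, the residuals stay correlated with the instruments and the p-value collapses toward zero. A small p-value such as 0.022 is read the same way: the data sit uneasily with the assumption that the instruments matter only through borrowing.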

And this updates Table 4 of our working paper. This shows the parallel 2SLS regressions still failing Sargan/Hansen tests, and (in the right half) excluded instruments having clear explanatory power when included:

One of the questions we will investigate in coming weeks is why the 2SLS regressions produce such weak results compared to the LIML ones. In theory 2SLS is less efficient, but has the advantage of robustness to heteroskedasticity. (The LIML regressions model the Tobit censoring of credit and so require homoskedasticity.)

More technical notes: The regressions reported above incorporate two other changes from Pitt. One is his distinction between what I call the censoring threshold of log(1000) and the censoring value of log(1). cmp can actually handle this fine. The other is that in the instrumenting stage, observations for all three survey rounds now use the first round’s data. In addition to homoskedasticity, the Pitt-Khandker LIML regressions assume no cross-household error correlation; thus the only deviation from sphericity that they allow is serial correlation. This makes errors i.i.d. within survey rounds and makes the Sargan test valid for 2SLS regressions restricted to individual rounds. The Hansen test is required for regressions that pool all three rounds. [Update: this is not quite right. The regressions are weighted for sampling ("pweights" in Stata parlance), which effectively introduces heteroskedasticity and invalidates the Sargan test even in cross-sections. But Hansen, now shown for all columns, remains valid and corroborates Sargan.]
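The threshold-versus-value distinction is easy to blur, so here is a tiny sketch of the two codings of log credit for households that did not borrow. The function name, argument names, and loan amounts are hypothetical, and the rule "anything under 1000 counts as censored" is an assumption made for illustration rather than a description of the actual cmp setup.

```python
import math

CENSORING_THRESHOLD = math.log(1000)  # log credit below this is treated as censored
CENSORING_VALUE = math.log(1)         # value recorded for non-borrowers; equals 0.0

def log_credit(amount, nonborrower_value=CENSORING_VALUE):
    """Code log credit, with a separate value for censored (non-borrowing) cases."""
    if amount < 1000:
        return nonborrower_value
    return math.log(amount)

amounts = [0, 0, 2500, 8000]  # made-up loan amounts; 0 means no borrowing

# Coding with Pitt's distinction: non-borrowers enter as log(1) = 0.
value_coding = [log_credit(a) for a in amounts]
# Alternative coding: non-borrowers enter at the threshold, log(1000).
threshold_coding = [log_credit(a, CENSORING_THRESHOLD) for a in amounts]

print(value_coding)
print(threshold_coding)
```

The two codings agree for borrowers and differ only for non-borrowers, which is why the choice can matter in a sample where many households did not borrow.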

Code and data are here.


23 Comments on “Response to Pitt’s Response to Roodman and Morduch’s Replication of…, etc.”

  1. Weak instruments? LIML performs much better than 2SLS when the instruments are weak…

  2. “The LIML regressions model the Tobit censoring of credit and so require homoskedasticity.”

    This will go down as one of my favorite Roodman sentences ever.

  3. Good point, Hisaki. We will look into it…

  4. I am very very far from understanding all this (I am just a journalist!), but I am really impressed by David’s open response to a critique of a piece of his own work, and willingness to accept that some inadvertent errors may have been made. This exchange also shows up, of course, the importance of open data, as David mentions.

    Where, however, does this leave the cause and effect argument about the positive effects or otherwise of microcredit?

  5. Maren Duvendack Says:

    As one of the recipients of Mark Pitt’s email I awaited your reply with much anticipation. As you know I also worked with the Pitt and Khandker (PnK) data using propensity score matching (PSM) and sensitivity analysis. In my paper with Richard Palmer-Jones (http://www.uea.ac.uk/dev/publications/WP27) we report mixed results which appear to be highly vulnerable to unobservables casting doubts on sizes, directions and significances of the main PnK findings (and those of Mathieu Chemin, reported in the Journal of Development Studies, 2008).
    Like you, we encountered, at least initially, not inconsiderable difficulties in accessing a complete data set, and found major problems in reconstructing the variables used in PnK and in Chemin. Your prior engagement with Pitt and Khandker and your publicly available data and Stata code set helped immeasurably. As you know our reconstructed variables almost exactly match yours although produced using different software (our code will be available shortly). We apply PSM to our reconstruction of the PnK data, investigate effects of the gender of the borrower and the role of borrowing from other sources (a point that is neglected by almost all microfinance evaluations) on microfinance impacts; we apply sensitivity analysis (Rosenbaum bounds) to draw conclusions as to the robustness and limitations of the PSM results, and arrive at what it seems reasonable to conclude.
    While our PSM results suggest that microfinance participation has some statistically significant impacts on the outcome variables used in PnK and Chemin (negative as well as positive), they are in general not distinguishable from those of other sources of finance; i.e. MF cannot be shown to be more effective than either other formal or informal sources of finance. Moreover, sensitivity analysis shows that all impacts are very vulnerable to unobservables, which are quite likely to have confounded the results given that unobservable entrepreneurial skills and aptitudes are quite likely not captured in the data and yet are known from well conducted qualitative studies to affect selection into and benefits from MF (e.g. Fernando, 1997). We conclude that, properly applied with sensitivity analysis, PSM resolves the particular problems in the PnK study by showing that it cannot generate robust conclusions of impact with these outcome variables. However, PSM may or may not be an appropriate tool in this context because the PnK data do not contain a suitable, large and relatively homogeneous control group as is recommended (Rosenbaum, 2002, 2010).
    This leads me to raise a general point about the limitations of econometric techniques in the context of weak research designs. In a landmark paper Leamer (1983) urged us to take the “con” out of econometrics and complained about ‘the whimsical character of econometric inference’ (p.38). A recent symposium revisits this debate (“Con out of Economics”, Journal of Economic Perspectives, Vol. 24, No. 2, Spring 2010 – http://www.aeaweb.org/issue.ph.....38;issue=2 ). Outside the economics profession it seems more accepted that sophisticated analytical approaches such as LIML, 2SLS, PSM and so on, cannot compensate for weak research designs and poor data (Meyer and Feinberg, 1992; Rosenbaum, 2002). Time for economists to take data production more seriously (and not just RCTs – but that is another matter)?

    References
    Chemin, M., 2008. The Benefits and Costs of Microfinance: Evidence from Bangladesh. Journal of Development Studies, 44 (4), p.463-484.
    Fernando, J. L., 1997. Nongovernmental Organizations, Micro-Credit, and Empowerment of Women. The ANNALS of the American Academy of Political and Social Science, 554 (1), p.150-177.
    Leamer, E. E., 1983. Let’s Take the Con Out of Econometrics. The American Economic Review, 73 (1), p.31-43.
    Meyer, M. M. & Feinberg, S. E. eds., 1992. Assessing Evaluation Studies: The Case of Bilingual Education Strategies. Washington D.C.: National Academy Press.
    Rosenbaum, P. R., 2002. Observational Studies. New York: Springer.
    Rosenbaum, P. R., 2010. Design of Observational Studies. New York: Springer.

  6. David, in my view, again somewhat provisionally, the claims to have shown causality are undermined by our work. That’s the conclusion that is strengthened.

  7. Martin Ravallion Says:

    This goes a long way toward solving the “Microfinance Mystery” posed by the conflicting results of Pitt-Khandker and Roodman-Morduch (see my posting and following debate on the World Bank blog: http://blogs.worldbank.org/dev.....omment-219 ). It was their opposite claims—ostensibly using the same data and methods—that grabbed our attention (not the well-rehearsed issues of “transparency” or “identification” raised by David). Thanks to David’s forthright admission of error we now know that the main mistake was in the RM paper, and that replication is not an issue. Phew! That is progress. But I can’t help but think that this should all have been sorted out much earlier, and in private amongst the researchers concerned. It was not prudent of RM (especially R) to make so much of their inability to replicate PK when it was clear that someone had goofed up. And it sounds like PK could have done a much better job in facilitating the replication efforts. Lessons learnt.

  8. Martin, can you substantiate the statement that I “make so much of their inability to replicate PK”? I believe that I have not in general emphasized the sign issue. Pitt presumably spent a bit of time searching for the most critical quotes from me, but only one sentence in the four passages he chose even mentions the sign issue or inability to replicate. That one sentence, from our abstract, seems like appropriate disclosure and is immediately deemphasized by what follows it. For me, the central issue has always been about whether impact could be shown.

  9. Martin Ravallion Says:

    You may claim now that your main concern was that GB borrowing in the PK data had not been randomly assigned, but that is hard to reconcile with your past writings. Here is an example from your testimony to US Congress:

    “A couple of years ago I spent a good deal of time scrutinizing what was then the leading academic study of the impacts of microcredit. To decide whether I believed the conclusion that microcredit in Bangladesh had helped families, especially when the loans were made to women, I decided to replicate the study, applying the original statistical methods to the original data. The math and computer programming were really complex. In time, with my coauthor Jonathan Morduch, I would conclude that the study does not stack up.” http://www.house.gov/apps/list......28.10.pdf

    “Math and computer programming” are not needed to establish that there was non-random placement of GB lending in the PK data, which has never been claimed by anyone. Other examples in which you are clearly referring to the replication issue can be found on this web site, including at http://blogs.cgdev.org/open_bo.....search.php and http://blogs.cgdev.org/open_bo.....at-cgd.php.

    I really doubt if your paper with RM would have attracted much attention if all it did was reiterate the virtues of randomized control trials (though, as an aside, I think you overstate those virtues) or transparency, which we probably all agree is desirable. The headline issue was clearly your failure to verify the principal claims of PK on the household welfare gains from GB borrowing by women.

  10. Martin, none of these links, as far as I can see, emphasize the failure to replicate. They do assert that we believe the reasoning of the original study is incorrect, that impact has not been shown. That’s the interpretation we wanted. Indeed we continue to offer (provisionally now) evidence against the “principal claims of PK on the household welfare gains from GB borrowing by women.”

    Part of that is based on math and computers, as exhibited above. Now that we have a good replication, we will use an interplay of economic thinking and math and computers to understand the estimates as well as we can and try to get to the bottom of some old debates in a way that was hard (at least for me) without a replication. It seems reasonable to hope that the use of math and computers will enhance our understanding. The issue of course is not whether placement was random, but whether there was a valid quasi-experiment. Along with sound economic reasoning, math and computers can help us understand that issue.

    You know, I don’t think of our paper as having gotten much attention. It came out in the same few months as the Karlan and Zinman and the Banerjee et al RCTs, as well as Portfolios of the Poor, all of which deserved and got far more attention. Ours did get blogged by Newsweek…in a post that also did not mention failure to replicate. I don’t think we can be held responsible for the tenor of what little coverage we did get, though certainly we could be held responsible for behaving in a way that distorted that coverage, if so we did. But I’ve already made the case that we deemphasized failure to replicate.

  11. Ravallion writes: “It was not prudent of RM (especially R) to make so much of their inability to replicate PK when it was clear that someone had goofed up. And it sounds like PK could have done a much better job in facilitating the replication efforts. Lessons learnt.”

    This kind of dismissiveness is so disappointing.

    Note to self: In future empirical work, make sure your code and assumptions are as impenetrable as possible, obstruct any efforts to understand the work, and use any means available to discredit those who attempt to validate it. You can count on your funders to support you in this and, with any luck, on the academic community to look the other way while you do this.

  12. Martin Ravallion Says:

    Arnold, I was not trying to be “dismissive.” Replication work is important, and we do not do enough of it in economics. I agree that openness is crucial, and here things seem to be improving. But the replication effort must also come with a responsibility to be meticulously careful before going public. To say that is hardly dismissive of replication, but ultimately supportive.

    David claims that PK were not as open about their data and code as they should have been. But I also have empathy for PK (and not just because the World Bank funded their research). For two years now, the RM paper has been circulating, with the authors drawing ample attention to it, saying that PK’s main findings could not be replicated–that PK got it wrong. In the last week we have learnt that the key mistakes were in RM’s replication effort. This is not harmless stuff. It should not have happened this way.

  13. Believe me, I also have empathy for PK. For two years, the credibility of their work has been in doubt. I appreciate your defense of the damage that’s been done to their reputation (and I apologize for my implication about World Bank funding).

    But frankly, I think they bear equal if not greater responsibility for the two-year “microfinance mystery” you’re blasting RM over. As someone who has done empirical economics work, both in an academic setting and in litigation, I have very little doubt the uncertainty could have been resolved two years ago or earlier if PK had simply provided their code to RM.

    Your opinion carries a lot of weight, and intended or not, the fact that you’re not blasting PK for their role in perpetuating this uncertainty seems dismissive of the need for openness and it really bothers me.

    Yes, some hard questions need to be asked here. You’re already asking them of Roodman. Good. But they need to be asked of PK as well. Why did PK choose not to provide their code to RM? Is it possible they preferred to leave the issues in doubt for two years rather than risk having a potential mistake confirmed?

    I agree this is not harmless, and that it should not have happened this way. But I think that preventing this from happening in the future means looking very carefully at who needs to take responsibility for which mistakes.

  14. Martin, you write, “It should not have happened this way.” To the extent you mean that I should not have been so publicly assertive, we’ve already batted that one back and forth so I’ll let it rest.

    To the extent you mean that the replication process (our attempts to obtain data and replicate, our communications with Pitt and Khandker, their efforts to help, our decisions about when to go public, their efforts to check our work) should have gone differently, that prompts me to elaborate on how it went:

    • Khandker was almost entirely uncooperative, as mentioned before. He tried to obstruct the release of second-round data (used in Khandker (2005)), contrary to World Bank policy. In response to queries about methods, he referred me to a former World Bank research assistant who seemed as helpful as he could be given that he no longer had access to the working files.
    • While working on the Pitt-Khandker replication, I had a pretty good e-mail dialogue with Pitt. It was not, I presume, easy for either of us. He answered questions. He said the original data and code were no longer accessible. He eventually offered the household-level data set referred to in his new paper while emphasizing that it was probably not the exact one used in the original. It lacked one observation, as well as outcomes other than household consumption and dummies defining the samples for the first-stage equations. Still, it was quite useful. There came a point though when one of my sets of queries received essentially a one-line reply. This query raised a couple of themes Pitt slams us for in his new paper. One was the log(1)-log(1000) subject. The other was the lack in the data set he sent of dummies for whether households had the option to borrow (credit choice), contrary to his emphatic statement now that “They did not use the variable in the PK estimation dataset that I sent them” (emphasis in original). It seemed I had reached the end of the road in learning from him…at least for a couple of years.
    • Before posting the working paper, we sent it to Pitt and Khandker, offered to send data and code too, and waited a month. We heard nothing back.
    • We posted the paper, along with data and code so that others could check our work.
    • We submitted the paper to JPE. Nine months later we got back a sort of “reject-and-resubmit” decision. The sole reviewer was Pitt, who identified none of the issues in his new paper. The points he made were helpful but ultimately minor.
    • Pitt wrote the new paper and did not discuss any of his findings with us before making it public. What prompted him to do this now, after two years of access to the data and code, and to find these problems on his second shot at it, I don’t know.

    So I think we did a pretty good job of running this process to maximize opportunities to find errors sooner rather than later.

  15. Richard Palmer-Jones Says:

    Martin Ravallion says he has empathy for PK, but complains that RM’s paper has been circulating for 2 years and that things should not have happened this way. Surely the two years since RM became available provided plenty of time for Pitt and/or Khandker to test RM’s claims and put the record straight? Surely PK should shoulder some of the blame for the length of time RM’s claims have been circulating without correction, and the damage it may have caused, because of their failure to respond?

    Ignorance of RM cannot be the cause of this delay as RM were apparently in communication with both Pitt and Khandker, who were aware of the RM project. Difficulty in accessing RM data or code is not an explanation either since, as David writes, and I can attest, both code and data have been and are readily and fully available and easy to use – indeed RM are exemplary in this respect.

    Replication is a crucial and legitimate part of scientific practice that should be more widely practiced among economists because errors do occur (Dewald et al., 1986; McCullough et al., 2006; Hamermesh, 2007). Authors should be willing to engage in open and fair comment, and indeed be happy to promote replication.

    Dewald, W. G., J. G. Thursby, et al. (1986). Replication in Empirical Economics: the Journal of Money, Credit and Banking Project. American Economic Review 76: 587-603.

    McCullough, B., K. A. McGeary, et al. (2006). Lessons from the JMCB Archive. Journal of Money, Credit and Banking, 38(4): 1093-1107.

    Hamermesh, D. S. (2007). Viewpoint: Replication in Economics. Canadian Journal of Economics 40(3): 715-733.

  16. Jenny Aker Says:

    David,

    Thanks for this posting. I have followed this debate now for 2 years, with much interest, and teach your paper (and PK’s) in class.

    First, I commend you and Jonathan for posting your dataset and code, and for openly admitting your mistake. This takes a great deal of courage.

    Second, I find it ironic that you are now being criticized for your mistake when it has been publicly available for some time. I can only imagine how many mistakes are made in economics that are not brought to light because our datasets and code aren’t posted (which makes me realize that I need to post mine). It seems as if this issue could have been resolved earlier if PK had shared their data and code, and/or offered to work with you.

    Third, in reading the paper, book and blogs, it has been clear that you and Jonathan were focusing on causality, rather than the sign. The punchline was never that the sign was negative. The punchline was that: 1) to understand whether MF works, we need rigorous research; 2) such research needs to clearly establish causality; 3) studies of the link between MF and welfare often suffer from identification problems; 4) here is an example of a study whereby we have some doubts that causality has been convincingly determined; and 5) this is unfortunate, because policymakers, donors, NGOs and others have made some decisions about investing in MF based upon such studies. So, let’s do more of them, and do them well (and not necessarily just RCTs).

    Academic debates of this kind are so important because they are more than academic. They have real-world policy implications.

  17. I would like to set the record straight on the claims by RM that their mistakes were due to PK not being more open and forthcoming, resulting in a “scientific whodunit.”
    • It has now been established that RM wrongly assigned log(1000) of credit to all those in the sample who did not borrow. However, the original PK paper in equation (A3) clearly lays out that the value of credit for those without credit is zero. This is not some attribute of the model that is buried in my computer code – it is laid open in the paper in black-and-white.
    • It has also been established that RM mistakenly omitted the nontarget variable and included the variable crcensored. This is the mistake that reversed the signs of the estimated credit effects. The nontarget variable is listed in the PK table of independent variables (Table A1 of PK, p. 993), as are all of the independent variables used in the PK analysis. The crcensored variable is not. Adding to the confusion of any reader of RM is that the mistakenly included variable is not in their table of independent variables, but the mistakenly excluded variable nontarget is in that table. This is an internal inconsistency between their work and how they describe it. Khandker and I cannot be responsible for finding an error when it is misrepresented in Roodman and Morduch’s written work.
    • Roodman also claims above that “We submitted the paper to JPE. Nine months later we got back a sort of “reject-and-resubmit” decision. The sole reviewer was Pitt, who identified none of the issues in his new paper. The points he made were helpful but ultimately minor.” The facts on the review are this. The RM manuscript was sent to me for review by the Editors of The Journal of Political Economy (JPE) in September of 2009. I was traveling out of the country when I received the manuscript and could not write a review until after I returned around New Year’s 2010. By then, I had received an email from the Editor specifically telling me that, based upon the review already received from “a highly qualified referee”, I should only focus on the issue of replication. In the time I had to prepare my review, I did not find the econometric and data errors detailed in this manuscript. Indeed, I was puzzled by the RM replication results since I had sent Roodman what I considered to be data actually used in PK, or something very close to it, and Roodman had a reputation as being skillful in technical aspects of econometrics and in writing very useful and popular add-on code for Stata. Since I did not find the serious mistakes in RM at the time that I wrote the review, although I had every incentive to do so, I offered differences in data as one hypothesis for the difference in the PK and RM estimates, and provided some evidence pointing in this direction. It is difficult to tell econometric error from data error.

    Various claims have been made that it was in part a lack of cooperation or openness on my part that caused the failure to replicate and the mystery that followed. This is not so. I address each claim in turn.
    • First, on the issue of my cooperation in the RM replication effort. After my return from extensive travel abroad, we had a cordial exchange of emails concerning his replication beginning 1/4/2008. He sent me his dataset on 1/14/2008 at my request so that I could see if I could find any obvious errors. On 1/20/2008, he sent me revised estimates based on corrections that he had made. On 2/26/2008 he emailed me to say that he had found an error in his latest dataset but that correcting it did not change the results very much. Two days later, on 2/28/2008, I sent him my estimation data which I told him was either the exact estimation sample for the household consumption regression, or something very close to it. In that email, I wrote that the data file that I attached may “very well be exactly the same as that used to estimate the expenditure equation.” (There was a problem with the archive media from 1996 that led to my uncertainty). In a short message he sent five days later, David Roodman noted that he was able to reconcile “some of the data differences,” but that “a few first-order questions/issues” remained. I expected to hear more from Roodman about how the comparison of the data sets was going, but did not. I did not receive another message from Roodman concerning any aspect of the data, his replication attempt, or any other topic until Roodman sent me via email a draft of the complete RM paper (in its third revision) about twelve months later (2/25/2009). Apparently, Roodman and Morduch believed that they had enough information from me to proceed with their full paper, in spite of the “first-order questions/issues” that he said remained in his last email, and my continued willingness to respond to his requests for information. My responses to David Roodman during the short period of our exchange were in every case prompt, friendly, and responsive.
    • Second, with regard to sharing the data. He already possessed the 1991/92 dataset used by PK when he first contacted me. Morduch, the co-author of the RM paper, has had the data since 1998. These data are not secret – everyone who ever sent me an email requesting them received a CD-ROM from me with the data.
    • Third, as regards sharing code. The log-likelihood function that is programmed in my code is presented in detail in the appendix to the published version of PK, and it seems that Roodman’s cmp program computes the same log-likelihood. That is not the source of the replication errors in RM. My estimation code was written in the FORTRAN77 programming language in 1994/95, and was compiled on an IBM RS/6000 workstation running the AIX operating system, a computing platform that has been extinct for quite some years. The code made use of many subroutines and functions from the IMSL, LINPACK, GQOPT and IBM AIX FORTRAN link libraries. In 1994/95, it was not possible to estimate maximum likelihood models with 250+ parameters on a PC, and, unlike Roodman’s cmp, one could not make use of the powerful environment that Stata provides. My estimation program contained many hundreds of lines of FORTRAN code and was specific only for the task at hand at the time.
    • Maren Duvendack has weighed in with her own allegation of attempts to repress replication, writing above “Like you, we encountered, at least initially, not inconsiderable difficulties in accessing a complete data set, and found major problems in reconstructing the variables used in PnK and in Chemin.” Duvendack first emailed me just last year, April 6, 2010, asking for a special sub-survey of anthropometric data from the 1991/92 BIDS survey, referring specifically to a paper in the International Economic Review (Pitt et al., 2003, “Credit Programs for the Poor and the Health Status of Children in Rural Bangladesh”) that uses those data. I am listed as the corresponding author on this publication and only my email address is provided in the article, so I am the appropriate person for her to contact. She identified herself as “a PhD researcher at the University of East Anglia in the UK.” Eight days later, I emailed her the complete dataset that she requested in the form of a Stata file. Within a few days (but less than eight days) of sending her request to me, she sent an email to the Vice-President of the World Bank complaining about the difficulty in getting this data. This letter of complaint filtered its way through the Bank until it got to my colleague Shahidur Khandker days later, who then passed it on to me quite some days after the data were sent. Duvendack never sent me an email about constructing variables or any other matter except for her request for the anthropometric data, which was promptly provided. Duvendack is certainly familiar with the PK data as RM thank her and Richard Palmer Jones for their “scrutiny of our data set construction” in their acknowledgements footnote in their 2009 paper.
    • I will respond to the more substantive comments that RM make concerning the overidentification tests and causality shortly.

  18. I understand Mark’s need to defend his personal conduct. For the most part, my description of the saga was meant to defend my own rather than attack his; that is, to show that Jonathan and I had worked hard in the conduct of our replication process to create opportunities to detect error before it became public. My only criticism of Mark’s personal conduct was that a message from me that raised several substantive issues, including one in Mark’s new paper, received a one-line reply, seeming to signal to me that the dialog was over.

    My broader point, as far as judging process goes, is about the value to the public of the kind of code and data sharing we did, and which Pitt and Khandker did not do (as is the norm now and was even more so then). The code for transforming raw data into data for analysis, and the code for the analysis, were never made public. Something at least close to the data set prepared for analysis was not shared until three years ago, after some diplomacy on my part. The copy that Pitt has just released is much easier to understand than the one I received, and is more complete.

    Mark, I look forward to your comments on the substance. I always learn from you.

  19. Maren Duvendack Says:

    Dear Mark Pitt
    It is correct that I never directly discussed the PnK data set with you (I never claimed otherwise); that is because I communicated with Khandker and Samad in March 2009 and with Chemin in the summer of 2009 on this. From June 2009 I also started communicating with David Roodman.
    With regard to the anthropometric data, I am afraid to say that your description of our communication is not entirely accurate. First of all, the data should have been available on the WB website along with the PnK data. But it was not, and hence I contacted Khandker on 29 March 2010 asking for this particular dataset. He never responded, so on 5 April 2010 I contacted not the VP but a research manager at the World Bank to help me retrieve the data. He responded immediately, copying Khandker and referring me back to Khandker, who then responded on 6 April 2010 saying he does not have the data. Khandker asked me to get in touch with you, which I promptly did on 6 April 2010. I did not hear from you for a week, and hence I contacted the same research manager at the WB again on 14 April 2010 to get things moving. He responded immediately and then for the first time copied two senior management staff at the WB in his email as well as Khandker (you were never copied in this email). And magically you responded the next day (15 April 2010) producing the data. What a coincidence.
    Maren
    PS. I can produce the original emails as evidence should you have further questions or doubts.

  20. I have just posted a short paper responding to the Roodman and Morduch claims about tests of overidentification and causality. The abstract reads:

    Abstract: After Pitt (2011) pointed out the flaws in the RM replication effort, Roodman subsequently notes “that when we fix our regressions, they continue to fail tests of the assumptions needed to infer causality. So improving the match to the original greatly strengthens our conclusion that this study does not convincingly demonstrate an impact of microcredit on poverty.” This claim is based on RM’s tests of overidentifying restrictions, which the current response demonstrates are fundamentally flawed. New results presented below provide strong support to the hypothesis that microfinance causally improves the lives of participants.

    The paper is available here:

    http://www.pstc.brown.edu/~mp/.....cation.pdf

    Comments are most welcome.

  21. Hi David,

    Pitt’s note above, ‘New results presented below provide strong support to the hypothesis that microfinance causally improves the lives of participants,’ is at variance with what you have been suggesting all along. I quote your earlier conclusion: “I cannot dismiss traditional microcredit. But it is hard for me to defend it as a strategy for helping poor people.”

    I am sure you will counter Pitt’s work over the next few weeks and the debate amongst academics will continue for several months or maybe even years to come.

    But overall all this has left me slightly disillusioned with the way academic research has been done and the way results have been presented. I almost get a feeling that several academics jumped the gun and presented findings to the larger world in a hurry.

    As I had commented earlier on your blog “.. in my opinion academics and experts need to truly weigh in and mull the conclusions they draw. Their words carry weight. They need to transcend human nature in so many ways. More than anybody else they have an even bigger fiduciary responsibility towards the poor in this world.”

    http://blogs.cgdev.org/open_bo.....estone.php

    I hope we will see frank and open discussions amongst all academics and then in one voice you will present your findings on microfinance. While today the debate is on microfinance, tomorrow it could be on another topic.

    Most importantly, I think academics should recognize their fiduciary responsibility in evaluating social programs and come up with suitable academic rigor and processes so that no single researcher jumps the gun and publishes information that confuses funders, governments, etc., and thereby denies beneficiaries what would have been good programs.

    Bhalchander Vishwanath

  22. Thank you, Bhalchander.

    But I must argue with you on a few points. First, that quote from me is based on far more than my lack of confidence in this particular study. It is also based on the randomized studies, and on my survey of qualitative studies of the impact of microcredit on empowerment, which surprised me in their overall negativity.

    Second, for what it’s worth, I am not an academic.

    Third, I am not sure what you mean by “jumping the gun.” As recounted in an earlier comment (above), we went through a pretty long, multi-step process in doing our analysis and putting it in the public domain.

    Fourth, I think it would be a mistake for funders to flip their views about microfinance based on one paper.

    Finally, I believe Pitt’s latest paper is weak (unlike the last one, it does not leave much of a dent) and will elaborate in due course.

  23. Bill Savedoff Says:

    Thanks to David for focusing on the content of the debate – the purpose of replication and re-analysis is not about settling who is “right” but rather subjecting our methods and interpretation to scrutiny so that we can get closer to understanding what’s really going on.

    Replication like this is frighteningly rare in social sciences. The number of policy papers that promote a particular approach based on a single study – even if well-conducted – is rather large, and I find this to be disconcerting.

    One sign that replication and re-analysis may begin to get more common is the decision by the Journal of Development Effectiveness to encourage submissions of replicating studies (as well as those with negative or null findings). Michael Clemens blogged about it here: http://bit.ly/hljZq2

    Keep up the good work.
