Kenyan Economist Offers First Independent Evaluation of Millennium Villages Project
November 28, 2011
A remarkable study reached the public last week. It is the first independent, rigorous, firsthand evaluation of the Millennium Villages Project (MVP), an effort by the United Nations and Columbia University whose admirable goal was to show that “the poorest regions of rural Africa can lift themselves out of extreme poverty in five year’s time.” The new study shows that the MVP is far from reaching that goal at its flagship site.
Working on her own, without the collaboration or endorsement of the MVP, Kenyan economist Bernadette Wanjala of Tilburg University collected data on households in or near the site at Sauri, Kenya, where the project was launched in 2005. She interviewed 236 randomly-selected households that had been exposed to the MVP’s large package of agriculture projects, education programs, infrastructure improvements, and health/sanitation works. She also interviewed 175 randomly-selected households from an area of the same district (called Gem) that was not exposed to the intervention. She wanted to compare the two groups to see for herself whether or not the project had done what it promised: to lift the treated households out of poverty in a few years’ time and spark “self-sustaining economic growth”.
In their just-released paper, Wanjala and her colleague Roldan Muradian of Radboud University use the new survey data to measure the project’s impact on poverty. They carefully compare treated and untreated households that were otherwise similar in many ways—such as household composition, adults’ education, fertility, economic sector, and land holdings. Because this project is large and intensive, spending on the order of 100% of local income per capita, it is reasonable to hope that it might substantially raise recipients’ incomes, at least in the short term.
Wanjala and Muradian find that the project had no significant impact on recipients’ incomes.
How is this possible? While Wanjala and Muradian find that the project caused a 70% increase in agricultural productivity among the treated households, tending to increase household income, it also caused less diversification of household economic activity into profitable non-farm employment, tending to decrease household income. These countervailing effects are precisely what one might expect from a large and intensive subsidy to agricultural activity. On balance, households that received this large and intensive intervention have no more income today than households that did not receive the intervention.
Wanjala and Muradian’s independent results contrast sharply with the MVP’s own impact evaluation, which is carried out strictly internally and is based on confidential data.
The director of the MVP, economics professor Jeffrey Sachs of Columbia University, states that a top priority of the project is to “raise community incomes” and to meet the Millennium Development Goals, the first of which is to cut income poverty in half. Sachs asserts that “incomes are rising” and that this “enormously successful” effort is “achieving its goals”. But these statements are not supported by scientific impact evaluation. To date the project has not released any numerical data on the impact of the project on recipients’ incomes. It has released extensive data on the sites apart from income, and is collecting data on incomes, so it is noteworthy that the project releases no analysis of impacts on incomes.
The project’s internal impact evaluation also describes very large positive effects on agricultural productivity, stating that crop yields “doubled or even tripled across the sites”, while Wanjala and Muradian independently find more modest effects on agricultural incomes, which are directly linked to agricultural productivity. Part of this difference can be attributed to the fact that the MVP’s evaluation measures one thing and describes it as something else. Although the MVP describes the impact it calculates as occurring “across the sites”, it is in fact calculated by comparing farm plots that received inputs to “plots within the MVP area but where inputs were not applied.” In other words, the doubling or tripling of yields happens when farms receive the intervention and fully cooperate with and correctly implement the intervention. (In economics this is called the “treatment-on-treated” effect.) Wanjala and Muradian’s lower estimate compares farm productivity across entire villages where the intervention was attempted to entire villages where it was not attempted, accounting for all limitations to the de-facto degree of cooperation and implementation by individual farmers. (In economics this is called the “intent-to-treat” effect.) Wanjala and Muradian’s method is a more meaningful measure of the impact of the project. The two different effects would only be equal if farmers were interested, willing, and able to do exactly what is wanted by outside technical experts.
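To make the distinction concrete, here is a minimal sketch in Python with made-up numbers. The compliance rates and the yield gain are illustrative assumptions, not figures from either evaluation, and the function name is hypothetical. It assumes that farmers who do not apply the inputs see no gain at all, in which case the village-wide (intent-to-treat) effect is simply the treated-plot effect scaled by the share of farmers who actually apply the inputs.

```python
def intent_to_treat_effect(effect_on_treated: float, compliance_rate: float) -> float:
    """Village-wide average effect, assuming non-compliers experience zero effect.

    effect_on_treated: proportional yield gain on plots that applied the inputs
                       (hypothetical value; 1.0 means yields doubled).
    compliance_rate:   share of farmers who fully applied the inputs (hypothetical).
    """
    if not 0.0 <= compliance_rate <= 1.0:
        raise ValueError("compliance_rate must be between 0 and 1")
    return effect_on_treated * compliance_rate


if __name__ == "__main__":
    # Illustrative numbers only: yields double on plots that received inputs.
    for share in (1.0, 0.7, 0.4):
        itt = intent_to_treat_effect(1.0, share)
        print(f"compliance {share:.0%}: village-wide gain {itt:.0%}")
```

Under this simplifying assumption the two measures coincide only at full compliance; at any lower rate the village-wide effect is smaller than the treated-plot effect.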
The results of Wanjala and Muradian are even more striking for another reason. The Millennium Villages’ intervention sites were chosen specifically because the project’s designers thought the project would work better in those villages than in other villages. The project’s evaluation protocol states, “Issues of feasibility, political buy-in, community ownership and ethics also featured prominently in village selection.” This selection bias alone might have caused incomes at the treated sites to be higher, many years into the project, than incomes at the untreated sites—even if the project itself hadn’t caused that difference. Instead, incomes today are typically the same in the two groups.
My colleague Gabriel Demombynes and I have described several failings of the MVP’s internal impact evaluation over the past year (see here, here, and here), and our concerns are widely shared in the development community (such as here, here, and here). The project has categorically rejected the need for any change at all. Wanjala and Muradian’s new study highlights the critical importance of independent and transparent impact evaluation in development work. It also calls into question whether the Millennium Villages Project, the latest in a decades-old tradition of village-level package antipoverty interventions that have failed to reduce poverty in the long run, is capable of reducing poverty even in the short run.
27 Responses to “Kenyan Economist Offers First Independent Evaluation of Millennium Villages Project”
November 28th, 2011 at 5:41 pm
Wow, this must be why Sachs & co. don’t want to release internal data. The importance of the distinction between the intent-to-treat effect and treatment-on-treated effect is hard to oversell.
November 28th, 2011 at 6:07 pm
O, snap.
… sorry for the econ jargon.
November 29th, 2011 at 9:08 am
Michael — thanks for sharing this. Jane Shevstov made this comment at Marginal Revolution: “…according to the calculator at http://www.dssresearch.com/Kno.....ators.aspx gives a power of 10.7% for alpha=0.1 and 5.5% for alpha=0.05. And you cannot draw a negative conclusion from a study with low power!” I haven’t repeated the power calculation, but the conclusion seems intuitive to me based on the means and SD. What do you think?
November 29th, 2011 at 10:27 am
What I find most interesting is that recipients of this “aid” had to give up more lucrative jobs in order to implement their “gift.” Just how was that supposed to help these Kenyans, exactly? Good thing Kenyan economists weren’t shy about publishing their research. Thank you for sharing!
November 29th, 2011 at 11:49 am
Thanks Brett. Several things.
First, a question of primary interest is whether the project achieves what it says it achieves, since this is what it tells its funders they are getting for their money, backing those claims with the sterling research reputation of Columbia University.
The project has asserted its ability to achieve the Millennium Development Goals, which implies a 50% reduction in poverty rates. It has also stated that its package intervention provides “a solution to extreme poverty”, which implies a 100% reduction in poverty rates. Those are the null hypotheses to be rejected if we are assessing whether or not, as the project’s director states, it is “achieving its goals”; those are its goals. And there is much more than sufficient power in this test to reject both of those null hypotheses for any reasonable degree of confidence. This may be why the project has, notably, never released any data on income despite its numerous claims to cause huge decreases in poverty.
Also interesting is a different question: whether or not the project has caused any decline in poverty whatsoever. Again, this is not an important question if we are assessing the project’s ability to meet its own goals and achieve what it says it achieves. But aside from that it might be interesting to know whether or not the project achieves, say, a 2% decrease in poverty rates. There is not enough statistical power in Wanjala and Muradian’s analysis to distinguish a 2% decrease in poverty rates from no decrease in poverty rates, nor is there power to distinguish it from a 2% increase in poverty rates. You’re certainly right that any of those outcomes are compatible with Wanjala’s data, to a reasonable degree of statistical confidence.
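A rough sketch of that asymmetry in Python, using a normal approximation to a two-sided two-sample test. The group sizes (236 and 175) are the ones reported in the post; the effect sizes, expressed in standard-deviation units, are hypothetical, and the function name is illustrative. It also assumes equal variances and roughly normal sampling behavior, which skewed income data may not satisfy, so treat the output as an order-of-magnitude guide only.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_in_sd: float, n_treated: int, n_control: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test.

    effect_in_sd is the true difference in means divided by the common
    standard deviation (a hypothetical input, not taken from the paper).
    """
    std_normal = NormalDist()
    se = sqrt(1.0 / n_treated + 1.0 / n_control)    # standard error in SD units
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = abs(effect_in_sd) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for effect in (0.02, 0.1, 0.5):  # hypothetical effect sizes in SD units
        print(effect, round(two_sample_power(effect, 236, 175), 3))
```

With these sample sizes a tiny true effect is almost never detected, while a large one almost always is, which is the sense in which the same data can be uninformative about small effects yet informative about dramatic ones.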
Note that Wanjala’s sample size is not that small; it’s of the same order as the Project’s sample sizes in its internal evaluation. For example the MVP study described here uses data on 376 children at baseline; Wanjala studies 411 households.
But the reason this study is noteworthy is not that it’s flawless. It’s noteworthy because it is the first time a qualified economist not employed by the project has been able to use microdata collected at the sites to statistically evaluate the project’s impacts. It’s noteworthy because it uses credible methods to carefully consider what would have happened without the intervention, as none of the project’s internal efforts at evaluation have so far done. It’s noteworthy because it is transparent: Wanjala and Muradian show you exactly how they matched treated and untreated households, and provide the matching equations plus matching results; the MVP internal evaluation has never published information in this detail on how it matched its own comparison sites, though it has been collecting data at comparison sites since before Wanjala collected her own data. And it’s noteworthy because it addresses a critically important question—whether or not an antipoverty project reduced poverty—on which the project itself has released many statements but no scientific analysis.
It’s furthermore noteworthy that it was conducted by a Kenyan. That’s not noteworthy for scientific reasons; the scientific validity of Wanjala and Muradian’s analysis stands apart from who wrote it. Rather, that’s noteworthy for other reasons. The project’s leaders have repeatedly criticized my own work with Gabriel Demombynes as “secondhand” analysis by “armchair critics”. Ad hominem criticism like that is an unfortunate attempt to distract readers from the substance of our analysis, whose validity has absolutely nothing to do with who we are. It’s especially ironic given that Gabriel lives and works in Africa. But because Wanjala is Kenyan, they won’t be able to use that unscientific rhetorical tactic to distract people from Wanjala and Muradian’s analysis. So I’ll be curious to see what tactic they do choose.
November 29th, 2011 at 12:43 pm
Agreed on all your points. However, I do think that it would be better (and maybe this will be done since it’s only a working paper) to be explicit about what reductions the study was powered to detect. It is true that the study shows that the income effect predicted by the MVP itself did not happen, which is what matters if the main concern is comparing their promotional materials with independently measured outcomes. But saying ‘this study shows no effect on income’ is less informative than saying ‘this study shows there was not a 10% increase in income’ or something like that; i.e., the study would be strengthened if it were more explicit in this regard, as it would head off one line of potential criticism. But this is a very minor point compared to the broader conclusions of the study.
November 29th, 2011 at 1:04 pm
Brett, thanks for pointing me here. The authors are in fact careful to say “Nevertheless, the overall household income effect was insignificant.”
I am a physician/scientist, so maybe my expectations of RCT reporting are a bit higher. They could have said that they were able to detect differences larger than 20%, but for what appears to be a unique study in your field this seems a strong paper.
November 29th, 2011 at 3:20 pm
First of all, the study looked at mean income, not poverty rates. These are related, of course, but not the same. Second, the authors used a lack of significance from a low-powered study to draw a conclusion about lack of effectiveness. This is never valid.
Third, as I commented at Marginal Revolution, the total income in control villages has a standard deviation that’s much larger than the mean, which indicates highly skewed data. (Both data sets are probably skewed, but the problem is particularly acute in the control group.) But t-tests are only appropriate for normal distributions. This data isn’t even remotely normal, so t-tests on untransformed data are out. The thing to do is to describe the datasets using medians, not means, and do a bootstrap test. Surely an economist should know about skewed distributions!
Since this is only a working paper, I really hope the authors reanalyze their data before publishing.
November 29th, 2011 at 4:23 pm
I am interested in the data that is missing in the paper, and what that might signify as to the conclusions.
I am pasting part of a comment I made on MRev in the hope of data or a response:
“What this study really shows is a large increase in farm productivity constrained by small farm size AND the failure to develop a market for the surplus, which then gets consumed on the farm. One of my neighbors is a retired Ag Economist, who goes to the god awful places in the world, often contracting diseases. Under AID contracts, he basically teaches teachers on ag improvement. BUT IT DOESN’T stop there: He often talks about how you have to develop the markets for what is produced to be successful and how what is produced may have to change. For example, he recently did some work in South America visiting an area that was very suitable for dairy, and there could have been higher dairy production had certain ag practices changed or been improved. But, he didn’t recommend dairy: instead he recommended goat’s milk, which he found could be transported to the US or condensed in that country, whereas dairy was a losing proposition. So, what they did was work on a milk processing facility that could do both dairy and goat’s milk! The farmers are now figuring out that goat’s milk is a winner.
In this case the authors pointed out that the cereal storage facilities had not been developed, and that those that were, IN THEIR OPINION and without support, were unsuccessful. Go back and parse that short sentence again on p 29 of the report: few were created, and of those that were operational, “few” were successful. Now, go back and ask: did the study tell you how many were planned? How many were operational? How long they had been in operation? What were the criteria they had for success? And how did they determine there was a lack of trust? Did the paper tell you about “MARKETS”, in other words, linkage between the storage facility and a market? No on all counts.”
November 30th, 2011 at 2:21 am
Thank you for this report. It seems highly unethical to carry out research and then refuse to publish the data collected, yet to use purported data for a continuation and promotion of the MVP strategy. Aside from the wisdom of allowing development to be dominated by economists, money is in short supply. Why continue to pour it down the toilet?
What do you think are the biggest errors of the project and how should similar amounts of money be spent, to better effect?
November 30th, 2011 at 12:36 pm
Michael:
The evaluation is using Propensity Score Matching, which is not appropriate for Causal Effects, let alone attempting the attribution question. Moreover the use of the Matching, based as it is on the canned Psmatch2, is past its sell-by date. PSM is useful for descriptive analysis and that’s about it.
The core issue is causal mechanisms in any case, which are processes, not a mean effect per se.
Chirp
Ron
So quit cheering.
November 30th, 2011 at 1:18 pm
Thanks for your comment, Jane. The burden of proof lies on the Millennium Villages project to show that its project can accomplish its goals, variously described by the project’s leaders as a 50% decline in poverty rates or a 100% decline in poverty rates. Obviously, if treated and untreated households have indistinguishable incomes, the burden of proof shifts even further onto the project to prove that it can cause African families to “lift themselves out of poverty in five years time”, which is what it promised to its funders.
On your second point, yes the data used by Wanjala and Muradian are noisy, and one study does not prove zero impact. The failure to detect any meaningful income effect in a project that claims massive and lasting income effects is notable, but of course it requires replication with larger samples and at other sites. I agree, and that’s why I wrote that Wanjala and Muradian’s findings call into question the project’s short-term impacts, not that they prove one way or the other what those impacts are. More research is needed. Unfortunately the project, which is collecting income data at the sites, and has released copious data over the last two years about changes in non-income indicators, has not released any analysis of changes in income. (Why not?)
Note also that the MVP itself does not hold itself to the excellent standard of evidence you suggest. For example, when they tested for changes in three different indicators of child malnutrition across nine of their sites, in the paper we describe here, they only found a statistically significant decline in one of the three indicators. In the other two indicators, standard errors are huge and the indicator increased. But you won’t find any mention of that in the paper’s abstract, nor in the description of the results contained in the project’s fundraising reports. Those materials describe large, quantitatively precise effects of the project on child malnutrition. And that article is already published in a scientific journal.
November 30th, 2011 at 1:22 pm
Bill, thanks for this comment. I agree that it’s of critical importance how many interventions were attempted versus how many were successful. It’s notable that the MVP, in its evaluation reports, releases data on some of the sites and not other sites that are identified in its evaluation protocol. Is the reason that things are not going as well at those other sites? Or some other reason? Hard to tell, because the evaluation is strictly internal and all data confidential.
November 30th, 2011 at 1:39 pm
Simon, thanks for your comment. I don’t think that the project’s refusal to share any of its data, even those that form the basis of published journal articles, reflects an ethical failing. It certainly prevents independent and objective analysis of the project’s impacts, which can lead to costly errors in the allocation of resources, but I don’t now believe that any of the project’s leaders are unethical researchers. I do think that by evaluating this costly project’s impacts in ways that are not rigorous, and impeding independent analysis, they are wasting a massive opportunity for the development community to learn what does and does not work in development policy.
I want to clarify that I don’t know whether the money spent on the Millennium Villages Project was wasted or not. No one can know that without careful and independent impact evaluation. I do have skeptical prior beliefs about the effectiveness of such spending, given that many billions of dollars have been spent on similar village-level package interventions over several decades, with no lasting effects, as we describe here and in less detail here. But that per se is not a reason not to try an intervention; sometimes it’s a reason to try an intervention and then carefully evaluate its impacts and cost-effectiveness. It’s the latter part on which the MVP is falling short. A careful and scientific impact evaluation may yet find that the MVP is a wonderfully effective and inexpensive way to eliminate poverty, if the project chooses to go in a more scientific direction at a later date. The problem is that the project has already asserted that it can and does end poverty, and asserted that such claims are backed by “peer-reviewed science”, when this is not the case.
So what would a better use be for the tens of millions that have gone to the MVP? Here I subscribe to Tim Harford’s view in his sharp book Adapt (which also discusses the Millennium Villages). A wide variety of things should be tried, and rigorously evaluated with regard to their impact and cost-effectiveness. Many completely different interventions are happening in the same countries, such as John Githongo’s Ni Sisi project in Kenya and Rakesh Rajani’s Twaweza in Tanzania. The leaders of both of those organizations are interested in rigorous impact evaluation. The only way forward is to try many different things, from a humble standpoint of profound ignorance about the causes of poverty, and evaluate our efforts at each step. One of the best descriptions of this approach is to be found in Poor Economics.
November 30th, 2011 at 2:41 pm
Michael,
Burden of proof has nothing to do with my comment. It’s simply about a statistical analysis that contains mistakes any Stats 101 undergrad should be able to spot. If the authors describe their data using the median instead of the mean and do a bootstrap test, they’ll be fine. (They may also compare, say, the 10th percentile of each data set to see how the intervention affects the poorest community members.) This can easily be done by the authors with the data they already have. Heck, I’m willing to help them!
While I like the idea of the MVP, I’m an ecologist, not an economist or policy wonk. I have no dog in this fight. But the stakes are too high for too many people to allow conclusions to be drawn from inappropriate statistical tests.
November 30th, 2011 at 2:41 pm
Thank you Michael for your reply and thank you also for the reading list. I can follow up the online ones. Perhaps ‘unethical’ is the wrong word but I thought failing to provide data that would allow an independent evaluation and making claims without supplying supporting data when these claims could be used to raise funding for this and similar projects would be unethical. I wouldn’t wish to claim that increasing productivity leads to increased income when this doesn’t appear to be the case. Yet many funding proposals do make this claim, I’ve made it myself. But I don’t wish to do so in the future. I read the draft and the authors do supply some good suggestions so overall, this has been a very interesting and informative exchange. Like yourself, I am also skeptical of such projects as MVP. I find claims made about it so far to be highly questionable and wonder how much money is being spent on publicity, damage limitation and sheer vanity. Perhaps that’s not unethical either, but something to be avoided.
November 30th, 2011 at 5:26 pm
Thanks Jane. Every statistical test tests a null hypothesis; the placement of that null hypothesis is therefore an intimate part of the test. If the null hypothesis were a 2% difference in poverty rates between treated and untreated, then the large standard errors in this study should not shift our priors much about the parameters in question. But since the relevant null hypothesis is a large and dramatic difference in income poverty between the two groups–the explicit claim of the project, from day one–even parameters with large standard errors should shift our priors somewhat. Speaking of hypothesis testing without reference to a burden of proof, that is, what is required to shift a reader’s prior beliefs, does not make any sense to me.
As I wrote in my earlier response, if you apply the same rigorous standards you advocate here to the research produced by the MVP’s internal evaluation, you will find much more egregious statistical weaknesses in their analysis. Since that analysis, unlike Wanjala and Muradian’s study, is already published in scientific journals, it deserves your scrutiny even more. I urge you to check it out, and I’m puzzled why so many people are ready to jump on the smallest weakness of Wanjala and Muradian’s work while I see no similar scrutiny of the analysis that the MVP uses to make much stronger claims of ‘impact’ for itself.
November 30th, 2011 at 5:48 pm
Ron, your anonymous smear that I’m “cheering” is inappropriate and incorrect. I’m profoundly sad that so many millions of dollars have been spent on an antipoverty project that may or may not have increased anyone’s average income, and I’ve never expressed any happiness about it. I suggest that you spend more time reflecting on the profound sadness of this situation, and less time sending anonymous insults to people whose thoughts and feelings you know nothing about.
Propensity score matching is not as rigorous a measure of impact as a randomized controlled trial, a fact of which I’m well aware, which is why we recommend randomized evaluation in our paper. Propensity score matching is, however, vastly more rigorous than the impact evaluation methods that the project itself has used in several “scientific” publications, which are so far limited to before-and-after analysis without a meaningful comparison group (such as the just-published paper we describe here), and comparison of treated and untreated farm plots without any matching method at all, such as in the fertilizer study I discuss in the post. The project has systematically negated and denied that any better evaluation methods are needed or even desirable. Wanjala and Muradian took things into their own hands, and did better. I do cheer that.
November 30th, 2011 at 5:53 pm
“I’m puzzled why so many people are ready to jump on the smallest weakness of Wanjala and Muradian’s work while I see no similar scrutiny of the analysis that the MVP uses to make much stronger claims of ‘impact’ for itself.”
This is exactly what I have been wondering as I see the replies stack up in my inbox. It feels like people have just completely missed the point. At no point in time did anyone say, “and now we have definitive proof that the villages are a total waste of time and resources, Sachs is a hack.”
November 30th, 2011 at 6:10 pm
Matt, thanks very much for your reply. Wanjala and Muradian’s study obviously does not prove beyond any doubt that the project had zero poverty impact at any of its sites. It is also not a huge, expensive, gold-plated randomized trial, and parts of its method are debatable, as in any impact evaluation study. But it is carefully done, and gives a more careful consideration to what would’ve happened without the project than any evaluation report released so far by the project’s internal evaluation.
It’s notable for that reason, and several others as well. It’s the first quantitative information we have about incomes at the sites (since the project, weirdly, has released none). It’s the first independent impact evaluation using primary survey data. And it’s notable because it finds no difference in incomes between treated and untreated households, a counterintuitive result. None of those things makes it perfect or definitive.
But because the project’s internal evaluation has produced nothing like Wanjala and Muradian’s paper so far, W&M’s work highlights the important role that independent evaluation can play in all development work. That is the principal lesson of the paper at this point. We may learn later that the paper’s findings are held up by subsequent analysis, in which case there’ll be more lessons. But that remains to be seen.
December 1st, 2011 at 3:09 am
Michael, thanks for your reply. While I am strongly in favor of Bayesian statistics, the Wanjala and Muradian paper is entirely frequentist and I am evaluating it on frequentist terms. Mixing Bayesian and frequentist approaches is, I believe, likely to lead to confusion, so let’s stick to the latter for now.
I’m going to put aside the question about low power and try once again to explain the narrow but crucial technical point I was making about t-tests. The household incomes in both the MVP and control villages had very large standard deviations; in the case of the control villages, the standard deviation was several times larger than the mean. This points to a highly skewed income distribution. (http://www.cochrane-net.org/op.....odA1-5.htm) This is not surprising, as income distributions are often skewed and have outliers.
Wanjala and Muradian use t-tests to test for significance. However, t-tests ONLY work for normally or almost normally distributed data. (http://www.basic.northwestern......_viol.html) The data in question is not even remotely normal, so the p-value generated by the t-test is essentially meaningless.
There are several approaches to dealing with such data. Many researchers use transformations, but this makes interpretation difficult. There are nonparametric tests, but the ones described at the link above assume that the groups being compared have equal variances. Finally, you can do a bootstrap comparison, which has the advantage of letting you use the median of your data instead of the mean. (http://people.revoledu.com/kar.....amples.htm) The mean is much more sensitive to outliers than the median, which is why the median is widely recommended for describing skewed data.
The authors of this paper need to use one of these options, preferably the bootstrap, to have meaningful results. The power of the test would also increase.
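For readers who have not seen one, a percentile bootstrap comparison of medians can be sketched in a few lines of Python. This is a generic illustration, not the authors’ analysis: the function name, the number of resamples, and the lognormal demo data are all assumptions, and the survey’s income data are not reproduced here.

```python
import random
import statistics


def bootstrap_median_diff_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for median(treated) - median(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        diffs.append(statistics.median(t) - statistics.median(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical, heavily right-skewed "incomes" drawn from a lognormal;
    # only the group sizes (236 and 175) come from the post.
    demo = random.Random(1)
    treated = [demo.lognormvariate(0.0, 1.5) for _ in range(236)]
    control = [demo.lognormvariate(0.0, 1.5) for _ in range(175)]
    print(bootstrap_median_diff_ci(treated, control))
```

If the resulting interval excludes zero, the medians differ at roughly the chosen level; if it includes zero, the data are compatible with no difference in medians, subject to the usual caveats about sample size.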
Of course, I agree that the papers coming out of the MVP project need to be examined just as critically. I looked at the child stunting paper and didn’t see any critical stats errors, although it would have been useful to separate rural and urban populations in the nationwide data they use for comparison. (If there are any statisticians here, please feel free to correct me.) They even mention power! Are there any others I should look at?
December 1st, 2011 at 5:53 am
Michael:
The bias from PSM is really worse than its putative cure for selection effects. A particularly attractive feature of both RCTs and their closest counterpart, RDDs, is that the rules governing who gets into the program and who doesn’t are actually set out in the design. Effectiveness of the interventions can then be explored by examining i) whether the eligibility criteria were actually honored in the actual program and ii) how the intervention did in fact affect outcomes, using both quantitative and qualitative instruments and techniques. Not just if things work, but why things work at all (or as a developing country based national, who actually shows up to work and what have they got to show for it).
To give the authors credit, they actually posit some channels by which the intervention might affect outcomes (page 5). However the PSM, specifically the construction of the PSM, speaks to none of that since we do not know the selection rules into the program. Hence the selection has to be simulated. This in turn has led to a great deal of rough and ready rules about how to carry out the first step, the selection into the program. As the authors observe (pg. 16) there is considerable uncertainty, let alone best practices, to guide this first step. The authors use a logit but do not discuss why their selection equation in terms of its functional form was appropriate. It would have been useful at least to discuss a range of possible criteria such as maximizing log likelihoods or minimizing AIC. Alternatively the authors could have discussed discrimination and calibration techniques to examine if their selection equation is a good predictor of those in and those out of the program (model validation techniques for discrete modeling).
Assuming CIA and SUTVA hold (not testable by definition), from a programming perspective, if we do obtain an ATET based on the estimated PSM, what does that have to do with actual program implementation? Restricting comparisons to the common support to obtain net impact has no literal meaning for program staff in making course corrections. As such, if the authors did do a qualitative investigation among treatment and comparison groups, the information gathered on program design would neither validate, invalidate, nor explain the quantitative results. Takeaway: if the statistical model for quantitative analysis is not based on what is actually known about the social program’s implementation, i.e. assignment into treatment for allocating groups into treated and non-treated groups, there is not much for the program staff to do except gawp at the applied hacks!
Ron
PS:
Restricting themselves to STATA and canned estimations, the authors should at the very least present some robustness checks, using downloadable tools such as SENSATT or MBOUNDS, to see if their results would hold in the possible presence of unobserved heterogeneity.
The state of the art is to use R, and genetic matching.
Better yet to talk with program staff to understand their targeting criteria for the actual treatment villages (methinks a fuzzy RDD would be the appropriate design) and carry out a less glamorous and more useful process evaluation. By the way, one point of good practice by the MVP folks: publishing their protocol for their prospective evaluation. A registry of prospective and ongoing evaluations with full protocols would be an important public good.
December 1st, 2011 at 10:23 am
Ron, I have no idea why you’re telling me these things. I’m an evaluation expert and I am well aware that RCTs and RDDs provide a more robust basis for causal inference than propensity score matching. The Wanjala and Muradian study is not notable because its method is absolutely flawless. It is notable, as I’ve now said several times, because it provides a superior method of causal inference to anything used so far in a study released by the MVP.
Wanjala and Muradian take seriously the need for a transparent counterfactual; no MVP study published so far does that. In fact, the MVP has bent over backwards to say that no counterfactual is needed or even possible. I cannot imagine why you’re reserving all of your technically competent criticism for this Wanjala and Muradian study, which is a big improvement on anything the MVP has done yet.
December 1st, 2011 at 10:48 am
Thanks for your reply, Jane. Null hypothesis choice is not just a part of statistical method in Bayesian statistics, it is a part of statistical method in all statistics. Every frequentist test tests a hypothesis that was chosen somehow, and that choice reflects people’s opinion of where the burden of proof lies. There is no exception to this. In the case at hand, the project has claimed vast economic effects, and the burden of proof lies on them to support those assertions, and therefore vast effects should be the null hypothesis. If the goal is to test claims of vast effects, it’s irrelevant to run tests where the null hypothesis is zero effect.
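To make the point about the choice of null hypothesis concrete, here is a minimal sketch in Python. It assumes a normal approximation for the estimator, and every number in it is hypothetical and chosen only for illustration; none comes from the Wanjala and Muradian paper or from any MVP publication. The function name `two_sided_p` is likewise just an illustrative helper.

```python
import math


def two_sided_p(estimate, std_error, null_value):
    """Two-sided p-value for H0: true effect == null_value.

    Uses a normal approximation: z = (estimate - null_value) / std_error,
    and p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = (estimate - null_value) / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values (assumptions, not taken from any study):
estimate = 0.10        # estimated proportional gain in income
std_error = 0.15       # standard error of that estimate
claimed_effect = 1.00  # a "vast" claimed gain, e.g. a doubling of income

# Same data, two different nulls.
p_zero = two_sided_p(estimate, std_error, 0.0)
p_claimed = two_sided_p(estimate, std_error, claimed_effect)

print(f"H0: effect = 0       -> p = {p_zero:.3f}")
print(f"H0: effect = claimed -> p = {p_claimed:.2e}")
```

With these made-up inputs, the same estimate fails to reject a null of zero effect but strongly rejects a null equal to the claimed effect. Which of the two tests is the relevant one depends on where one thinks the burden of proof lies, and the arithmetic cannot settle that.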
In this blog post at the World Bank’s Africa blog, Gabriel and I explain a few of the profound errors in the child stunting paper. The most glaring is that they choose their “comparison group” as nationwide trends between 1986 and 2006 (Figure 3), a long period when, on average over the whole period, child stunting was not going down nationwide, and compare that to trends at the village sites between 2005/6 and 2008/9, a period when child stunting was going down nationwide in all but one of these countries. That’s a completely inexplicable decision, since nationwide data during the project were available, and the authors of the paper knew that they were available.
Note that their decision to use a comparison group during an irrelevant time period substantially exaggerates the effect of the project. This is great for fundraising purposes, but reflects poor scientific practice, and that article is published in a peer-reviewed scientific journal. It is much more egregious a decision than anything done by Wanjala and Muradian, who are independent analysts with no incentive whatsoever to make the project look good or bad. Wanjala and Muradian are engaged in an independent and technically competent exercise that does the best they could with the available data. Their findings are not final and definitive, but they are noteworthy and worrisome.
There are numerous other problems with that child stunting study, some of which we mention in that blog post.
December 1st, 2011 at 3:54 pm
Of course, the choice of a null hypothesis is crucial in hypothesis testing, but the paper used a zero effect null hypothesis. They just tested whether the effect was different from zero. In this situation, the MVP’s goals and claims are clearly irrelevant. And any null hypothesis needs to be tested with appropriate methods! I’m sure Wanjala and Muradian mean well, but their study is NOT “technically competent”. If you don’t believe me, look up t-tests in any introductory statistics textbook — or Wikipedia.
You make a good point about the child stunting paper; thanks.
December 6th, 2011 at 6:25 pm
Aha … I see a flaw in the study: Wanjala interviewed randomly selected households instead of ones that MVP picked.
January 13th, 2012 at 2:50 pm
Well, the point raised cannot be overemphasized. The same observations are more pronounced in Malawi; this is the reason why the Government failed to partner with the project. There are more resources going into the project than results. A lot of waste, with more funds going to salaries than to actual implementation. I wonder why the results are not being made public to comment on. Jeff Sachs isn’t bothered, and why his vocal belay is never on the picture. Is this a way we help the poor or a way to waste on the poor?