What Do Policies Value?
What this paper finds — and why it matters
This paper asks a fundamental question about policy design: when a program prioritizes one group over another, is that because the group benefits more from the intervention, or because the policy assigns them higher intrinsic welfare weight? Björkegren, Blumenstock, and Knight develop a two-stage method to decompose observed allocation decisions into their underlying components: (i) welfare weights assigned to different types of people, (ii) heterogeneous treatment effects of the intervention, and (iii) relative weights on different outcomes. The key insight is that the same allocation rule can be consistent with very different value systems depending on how much each group actually benefits.
The method works as follows. In a first stage, the analyst estimates heterogeneous treatment effects — how much each individual benefits on each outcome dimension — using OLS or machine learning methods (e.g., causal forests). In a second stage, the analyst reconciles the observed ranking of beneficiaries with an implicit welfare function using an exploded logit likelihood, recovering welfare weights (who is valued), impact weights (how different outcomes are valued), and a base value for treatment independent of measured outcomes. Identification requires an exclusion restriction: the covariates used to estimate treatment effect heterogeneity must include variables excluded from the welfare weight specification, allowing the analyst to compare households with similar welfare weights but differential treatment effects. Variants of the method that impose known welfare weights or known impact weights can be used without the exclusion restriction.
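As a rough illustration of the first stage, the sketch below estimates heterogeneous treatment effects by OLS with treatment-covariate interactions on simulated data. The function name `fit_heterogeneous_effects`, the simulated covariates, and the linear form of the heterogeneity are all assumptions made for illustration; the paper's own specification (or a causal forest) may differ.

```python
import numpy as np

def fit_heterogeneous_effects(y, treat, x_tilde):
    """OLS of y on [1, x, T, T*x]; returns a function predicting tau(x).

    Hypothetical helper: assumes treatment effects are linear in x_tilde.
    """
    n, k = x_tilde.shape
    t = treat.reshape(-1, 1)
    design = np.hstack([np.ones((n, 1)), x_tilde, t, t * x_tilde])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    tau0 = coef[1 + k]      # treatment effect when all covariates are zero
    tau_x = coef[2 + k:]    # shift in the effect per unit of each covariate
    return lambda x_new: tau0 + x_new @ tau_x

# Simulated (not PROGRESA) data with randomized treatment.
rng = np.random.default_rng(0)
n = 2000
x_tilde = rng.normal(size=(n, 2))
treat = rng.integers(0, 2, size=n).astype(float)
true_tau = 0.15 + 0.10 * x_tilde[:, 0]
y = 1.0 + 0.5 * x_tilde[:, 1] + treat * true_tau + rng.normal(scale=0.1, size=n)

predict_tau = fit_heterogeneous_effects(y, treat, x_tilde)
tau_hat = predict_tau(x_tilde)  # predicted effect for each household
```

The predicted effects `tau_hat` are what the second stage would take as inputs, one column per outcome when several outcomes are modeled.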
The paper demonstrates the method using PROGRESA, Mexico’s large conditional cash transfer program launched in 1997. PROGRESA ranked households by a proxy means test poverty score and transferred approximately 197 pesos per month (roughly $20 USD) to eligible poor households, conditional on school attendance and doctor visits. The analysis uses endline survey data on 7,767 households and focuses on three outcomes emphasized in program documents: log per-capita consumption, child sick days (ages 0-5), and school days missed (ages 6-16).
The program’s average treatment effects were: a 0.149 log point increase in monthly consumption (SE=0.015), a 0.165 reduction in sick days per child (SE=0.051), and a near-zero effect on school days missed (-0.0053, SE=0.028). These effects were heterogeneous: indigenous households, for instance, benefited substantially more from the program.
The paper’s central empirical finding inverts the naive interpretation of PROGRESA’s targeting. Indigenous households were ranked 60.6 log points higher in the program’s priority order. A simple regression suggests the program favored them. But after accounting for the fact that indigenous households benefit substantially more from treatment, the method finds that the program’s implied welfare weight on indigenous households is, if anything, lower by 17.4% relative to non-indigenous households — not higher. The program’s prioritization of indigenous households is thus explained by efficiency, not by preferential welfare weighting.
Because PROGRESA cash transfers relax household budget constraints and outcomes like consumption reflect household choices, the impact weights capture the difference between how the policy values outcomes and how households value them. The estimates strongly reject non-paternalism: the policy implicitly values consumption and potentially health differently from household decision-makers. Of the total welfare impact, approximately 55% is attributed to the base value of the transfer itself (independent of measured outcomes), approximately 45% to consumption impacts, and less than 1% to health and schooling impacts combined. The implied value of providing the transfer independent of outcomes corresponds to 0.16 log points of consumption, or about 23.1 pesos per person per month — slightly below the average transfer of 33.9 pesos per person per month.
The paper also runs counterfactual exercises showing how alternative preference structures would have changed the allocation. A policy maximizing only educational impacts would have prioritized richer, smaller households; one maximizing only consumption impacts would have further prioritized indigenous households. These counterfactuals are mapped onto a Pareto frontier across the three outcomes. The estimated welfare weights from the implemented policy align closely with preferences elicited in a 2023 survey of 429 Mexican residents, though residents placed higher value on child health relative to what the policy implied.
Q: What is the core identification challenge the paper addresses? A: When a policy prioritizes a group, it could be because the group benefits more (efficiency) or because the policy assigns them intrinsically higher value (preference). These two explanations are observationally equivalent from the allocation alone. The paper separates them by first estimating heterogeneous treatment effects and then inverting the allocation to recover residual welfare weights.
Q: What is the exclusion restriction required for full identification? A: The covariates used to estimate treatment effect heterogeneity (x-tilde) must include at least some variables excluded from the welfare weight specification (x). This allows the analyst to compare households with similar welfare weights but different predicted treatment effects, pinning down how much of the ranking reflects efficiency versus preference. Without this restriction, one can still recover conditional preferences by imposing known values for either welfare weights or impact weights.
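A small numerical sketch can show why the exclusion restriction matters. It assumes, purely for illustration, a welfare score of the form w_i * (alpha + beta * tau_i) with a single outcome; the variable names and parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def scores(w, alpha, beta, tau):
    # ASSUMED functional form: welfare weight times (base value + weighted impact).
    return w * (alpha + beta * tau)

x = np.array([0, 0, 1, 1])  # characteristic entering the welfare weights
z = np.array([0, 1, 0, 1])  # characteristic excluded from the welfare weights

# Case 1: treatment effects vary only with x (no excluded variable).
tau_no_excl = 0.1 + 0.1 * x
efficiency_story = scores(np.ones(4), 1.0, 5.0, tau_no_excl)
preference_story = scores(np.where(x == 1, 4 / 3, 1.0), 1.5, 0.0, tau_no_excl)
same_without_exclusion = np.allclose(efficiency_story, preference_story)

# Case 2: treatment effects also vary with the excluded variable z.
tau_excl = 0.1 + 0.1 * x + 0.05 * z
efficiency_story_z = scores(np.ones(4), 1.0, 5.0, tau_excl)
preference_story_z = scores(np.where(x == 1, 4 / 3, 1.0), 1.5, 0.0, tau_excl)
same_with_exclusion = np.allclose(efficiency_story_z, preference_story_z)
```

In the first case an "equal weights, outcomes matter" parameterization and an "unequal weights, outcomes do not matter" parameterization produce identical scores for every household, so no ranking could tell them apart. In the second case the excluded variable moves treatment effects among households with the same welfare weight, and the two parameterizations imply different scores.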
Q: How does the exploded logit likelihood work in this setting? A: The analyst observes a single full ranking of all households, rather than partial orderings from multiple decision-makers. The welfare impact of treating household i is modeled as a linear function of predicted treatment effects scaled by welfare and impact weights, plus an extreme-value-distributed shock. The observed position of household i is treated as a choice of i from the set containing i and every household ranked below it, so its likelihood contribution is the exponentiated welfare score of i divided by the sum of exponentiated welfare scores over that set. The full likelihood is the product of these contributions across ranks. Maximum likelihood recovers the welfare weights, impact weights, and base value simultaneously.
Q: What were PROGRESA’s average treatment effects on the three focal outcomes? A: On average, treatment increased log monthly consumption by 0.149 (SE=0.015), reduced child sick days by 0.165 (SE=0.051), and had a near-zero effect on school days missed (-0.0053, SE=0.028). The consumption and health effects are statistically significant; the schooling effect is not distinguishable from zero.
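The likelihood for a single complete ranking can be sketched in a few lines. The log-likelihood function below is the standard exploded (rank-ordered) logit; the `welfare_scores` helper, its functional form, and the four-household data are hypothetical and only illustrate how candidate parameters would be compared.

```python
import numpy as np

def exploded_logit_loglik(scores_in_rank_order):
    """Log-likelihood of one full ranking.

    scores_in_rank_order[k] is the modeled welfare score of the household
    ranked k-th, with index 0 the highest priority.
    """
    v = np.asarray(scores_in_rank_order, dtype=float)
    # log(sum_{m >= k} exp(v[m])) for every k, via a reversed running logsumexp
    log_denoms = np.logaddexp.accumulate(v[::-1])[::-1]
    return float(np.sum(v - log_denoms))

def welfare_scores(w, tau_hat, alpha, beta):
    # ASSUMED form: welfare weight times (base value + impact-weighted effects).
    return w * (alpha + tau_hat @ beta)

# Hypothetical households, already sorted from highest to lowest priority.
w = np.array([1.0, 1.0, 0.8, 0.8])
tau_hat = np.array([[0.30, 0.2], [0.20, 0.1], [0.25, 0.1], [0.05, 0.0]])

ll_a = exploded_logit_loglik(welfare_scores(w, tau_hat, 1.0, np.array([5.0, 1.0])))
ll_b = exploded_logit_loglik(welfare_scores(w, tau_hat, 1.0, np.array([-5.0, 1.0])))
```

Here `ll_a` exceeds `ll_b` because the first parameter guess produces scores that decline with rank, matching the observed order. A maximum likelihood routine would search over the parameters to make this log-likelihood as large as possible.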
Q: What does the analysis find about the welfare weight assigned to indigenous households? A: In the raw ranking regression, indigenous households are ranked 60.6 log points higher, suggesting the program favored them. After accounting for the fact that indigenous households benefit substantially more from treatment, the method finds the implied welfare weight on indigenous households is lower, not higher — specifically, about 17.4% lower than non-indigenous households. The program’s higher ranking of indigenous households is explained entirely by their larger treatment effects, not by preferential weighting.
Q: How are the impact weights on consumption, health, and schooling interpreted given that outcomes reflect household choices? A: Because PROGRESA relaxes household budget constraints and outcomes like consumption result from household optimization, the estimated impact weights capture the gap between how the policy values outcomes and how households value them (internalities), rather than the absolute policy valuation. A nonzero weight implies the policy disagrees with household preferences, which is what the paper means by paternalism. The positive coefficient on log consumption implies the policy values this outcome more than households do.
Q: How much of PROGRESA’s welfare impact comes from the base transfer value versus measured outcomes? A: The base value of the transfer (independent of measured impacts on consumption, health, and schooling) accounts for approximately 55% of total implied welfare impact. The impact on consumption accounts for approximately 45%. Impacts on health and schooling together account for less than 1%. The implied value of the base transfer corresponds to 0.16 log points of consumption per capita, or about 23.1 pesos per person per month — somewhat below the average transfer amount of 33.9 pesos per person per month.
Q: Does the analysis reject egalitarian welfare weights and non-paternalism? A: Yes, using Wald tests with bootstrapped covariance matrices. The hypothesis of egalitarian weights (all gamma equal to one) is rejected. Non-paternalism (all beta equal to zero) is strongly rejected. The joint hypothesis of egalitarianism and non-paternalism is also rejected across all specifications tested.
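A Wald test with a bootstrapped covariance matrix can be sketched as follows. The estimates and bootstrap draws are simulated placeholders, not the paper's numbers, and the function name `wald_statistic` is hypothetical. The 5% critical values are the standard chi-square ones for 1 to 3 restrictions.

```python
import numpy as np

# Standard chi-square 5% critical values, keyed by number of restrictions.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def wald_statistic(theta_hat, theta_null, boot_draws):
    """Wald statistic using the covariance of bootstrap replicates.

    boot_draws has one row per bootstrap replicate, one column per parameter.
    """
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta_null, dtype=float)
    cov = np.atleast_2d(np.cov(np.asarray(boot_draws, dtype=float), rowvar=False))
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical impact-weight estimates and simulated bootstrap replicates.
rng = np.random.default_rng(1)
beta_hat = np.array([0.40, 0.05, 0.01])
boot = beta_hat + rng.normal(scale=[0.05, 0.04, 0.03], size=(500, 3))

# Non-paternalism null: all impact weights equal to zero.
w_stat = wald_statistic(beta_hat, np.zeros(3), boot)
reject_non_paternalism = w_stat > CHI2_CRIT_5PCT[3]
```

The same function would apply to the egalitarian null by passing the welfare-weight estimates and a null vector of ones.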
Q: How do the estimated welfare weights compare to stated preferences of Mexican residents? A: The 2023 survey of 429 Mexican residents elicited preferences using multiple price lists over how to prioritize different household types. The welfare weights implied by the implemented policy are broadly similar to resident preferences, but the policy places relatively higher welfare weight on indigenous households than the median survey respondent does. Survey respondents value child health impacts more than household decision-makers and more than the implemented policy does, consistent with support for paternalism.
Q: What do counterfactual allocations reveal about the relationship between policy goals and targeting priorities? A: A policy maximizing only consumption impacts would further prioritize indigenous households with lower income. A policy maximizing only educational impacts would instead prioritize richer, smaller households. A policy maximizing only health impacts would largely preserve indigenous household prioritization while placing less emphasis on lower-education households. These three extreme policies map to the corners of a Pareto frontier, and the implemented PROGRESA policy lies close to the allocation consistent with surveyed resident preferences.
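The single-objective counterfactuals can be illustrated with a toy allocation rule that treats the households with the largest predicted impact on a chosen objective. The predicted effects below are invented for four hypothetical households and are expressed as improvements (so fewer sick days or fewer missed school days count as positive); `allocate_top_k` is a hypothetical helper, not the paper's procedure.

```python
import numpy as np

def allocate_top_k(tau_hat, objective_weights, k):
    """Treat the k households with the largest weighted predicted impact."""
    value = tau_hat @ objective_weights
    order = np.argsort(-value, kind="stable")
    treated = np.zeros(len(value), dtype=bool)
    treated[order[:k]] = True
    return treated

# Columns: consumption, health, schooling (hypothetical predicted improvements).
tau_hat = np.array([
    [0.30, 0.05, 0.00],
    [0.10, 0.20, 0.01],
    [0.05, 0.02, 0.10],
    [0.20, 0.10, 0.02],
])

consumption_only = allocate_top_k(tau_hat, np.array([1.0, 0.0, 0.0]), k=2)
health_only = allocate_top_k(tau_hat, np.array([0.0, 1.0, 0.0]), k=2)
schooling_only = allocate_top_k(tau_hat, np.array([0.0, 0.0, 1.0]), k=2)
```

Each objective selects a different pair of households in this toy example, which is the mechanism behind the corners of the frontier: changing what the policy values changes who it prioritizes.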
Q: What changed when Mexico reformed PROGRESA’s poverty score in 2003? A: The 2003 reform increased the priority of older and smaller households. Applying the method to the new poverty score reveals that it implicitly switched to placing a higher welfare weight on indigenous households than on non-indigenous households (the original score implied a lower one), and placed less welfare weight on lower-income and younger households relative to the original design.
Q: What are the main limitations and scope conditions of the method? A: Full identification requires an exclusion restriction (some treatment effect heterogeneity predictors excluded from welfare weights) and sufficient variation in treatment effects across household types. If treatment effects are homogeneous, welfare weights and impact weights cannot be separately identified. If correlated unobservables drive the ranking but are not modeled, the method recovers preferences consistent with included variables only, analogous to omitted variable bias in OLS. The method also requires a way to estimate treatment effect heterogeneity, which is most credible with a randomized pilot, though non-experimental methods are in principle applicable.
Q: How does this paper relate to the inverse optimum public finance literature? A: The inverse optimum literature (Bourguignon and Spadaro 2012; Saez and Stantcheva 2016; Hendren 2020) recovers the redistribution preferences consistent with income tax schedules, conditioning on a single covariate (pre-tax income) affecting a single outcome (net-of-tax consumption). This paper generalizes that framework to arbitrary allocation policies conditioning on a vector of covariates and affecting a vector of outcomes, and extends it to settings beyond income taxation where heterogeneous treatment effects can be estimated.
Q: Can the method be applied when only a binary allocation is observed rather than a full ranking? A: Yes. A binary allocation corresponds to a ranking with only two levels, and the same exploded logit procedure applies, though with reduced statistical power. The paper provides an empirical illustration of this setting in Section 5.2.1.
Welfare weights (w(x_i)): The policy’s differential valuation of one household’s utility relative to another, expressed as a multiplicative function of household characteristics. Distinct from how much a household benefits: two households with different benefits may still be ranked identically if their welfare weights differ in exactly offsetting proportion.
Impact weights (beta_j): The policy’s relative valuation of different outcome components (consumption, health, schooling). For outcomes that are household choices, impact weights capture the difference between how the policy values the outcome and how the household values it — an internality or paternalistic preference.
Base value (alpha): The value a policy assigns to providing a treatment independent of its measured impact on any specific outcome. Captures either a direct utility benefit of treatment or the value of relaxing household budget constraints when outcomes are choices.
Exclusion restriction: The requirement that the set of covariates used to estimate treatment effect heterogeneity includes at least some variables excluded from the welfare weight specification. Enables separate identification of efficiency-based and preference-based components of a ranking by comparing households similar in welfare weight but different in predicted treatment effects.
Exploded logit likelihood: The econometric procedure used in the second stage, adapted for a single complete ranking of all alternatives rather than partial orderings. Treats the observed position of household i as a choice from the set containing i and all households ranked below it, with each choice probability given by the softmax of welfare scores over that set.
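Written out, the standard exploded (rank-ordered) logit likelihood for one complete ranking of N households takes the form below. The notation is an assumption introduced here: V_{(k)}(\theta) denotes the modeled welfare score of the household in rank position k, and \theta collects the welfare weights, impact weights, and base value.

```latex
\[
\mathcal{L}(\theta) \;=\; \prod_{k=1}^{N-1}
\frac{\exp\!\big(V_{(k)}(\theta)\big)}
     {\sum_{m=k}^{N} \exp\!\big(V_{(m)}(\theta)\big)}
\]
```

The product stops at N-1 because the last-ranked household is chosen from a set containing only itself, which contributes a factor of one.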
Value audit: A retrospective application of the method that reads the implicit values encoded in an implemented policy’s allocation decisions, enabling comparison against stated policy objectives, constituent preferences, or normative benchmarks.
Paternalism (in this paper’s sense): A policy is paternalistic if it assigns nonzero impact weight (beta_j ≠ 0) to outcomes that are household choices — meaning the policy values those outcomes differently from the households making the choices. The envelope theorem implies a non-paternalistic policy would place zero weight on choice outcomes beyond the general constraint relaxation.
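The envelope-theorem argument can be sketched with a stylized household problem. The setup is an illustrative assumption, not the paper's model: a household with income y chooses consumption c and a second good h at price p, \lambda is the multiplier on the budget constraint, and T is a small unconditional transfer.

```latex
\begin{align*}
V(y) &= \max_{c,\,h}\; u(c, h) \quad \text{s.t.} \quad c + p\,h \le y, \\
\frac{dV}{dy} &= u_c \frac{dc}{dy} + u_h \frac{dh}{dy}
  = \lambda \left( \frac{dc}{dy} + p\,\frac{dh}{dy} \right) = \lambda, \\
\Delta V &\approx \lambda\, T \quad \text{for a small transfer } T .
\end{align*}
```

The second line uses the first-order conditions u_c = \lambda and u_h = \lambda p together with the differentiated budget constraint dc/dy + p\,dh/dy = 1. How the household splits the transfer between c and h drops out, so a planner who values outcomes exactly as the household does would value the transfer through \lambda T alone. In the paper's terms this corresponds to a base value with all beta_j equal to zero.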