De Gustibus and Disputes about Reference Dependence
What this paper finds — and why it matters
This paper examines whether heterogeneity in individual gain-loss attitudes — the degree to which people weigh losses more or less severely than equivalent gains — contaminates prior tests of expectations-based reference dependence (EBRD). The central question is: do prior experiments that appear to yield mixed or null evidence against EBRD actually reflect a failure of the expectations-based reference point, or instead reflect a methodological flaw — the implicit assumption that all individuals are uniformly loss averse?
All prior tests of EBRD models (e.g., Kőszegi and Rabin 2006, 2007) have proceeded under what the authors call “universal loss aversion,” the assumption that every individual weighs losses more heavily than commensurate gains (λ > 1). The authors argue that this assumption — a form of the classic De Gustibus conjecture — is empirically incorrect and theoretically distorting: within EBRD designs, loss-averse and gain-seeking subjects are predicted to respond in opposite directions to expectations manipulations, so aggregating across them suppresses or reverses treatment effects.
The authors run two pre-registered laboratory experiments totaling 1,524 subjects. The labor supply experiment (N = 500, UC San Diego) uses a two-stage design. Stage 1 elicits each subject’s gain-loss attitude parameter λ_i from their effort responses to fixed versus uncertain piece rates in a real-effort transcription task, exploiting the prediction that loss-averse workers reduce effort under wage uncertainty while gain-seeking workers increase it. Stage 2 manipulates expectations by varying the probability of a high outside payment (p = 0.05 in Condition Low vs. p = 0.45 in Condition High), holding the piece-rate probability constant at 50%; under EBRD, this shifts the reference point and should change effort in a direction governed by λ_i.
The exchange experiment (N = 1,024, University of Bonn, with a pre-registered 2018 replication of N = 417) uses Stage 1 preference statements over randomly endowed objects to estimate λ_i, and Stage 2 manipulates expectations via a 0% vs. 50% probability of forced exchange. Under EBRD, loss-averse subjects should become more willing to exchange in the High condition; gain-seeking subjects should become less willing.
Both experiments document substantial heterogeneity in gain-loss attitudes. In the labor supply study, approximately 70.6% of subjects exhibit loss aversion (λ̂ > 1) and 29.4% exhibit gain-seeking (λ̂ < 1), with an average structural estimate of λ̂ = 1.65 and median 1.66. In the exchange study, 76% are loss averse and 24% are gain-seeking, with mean λ̂ = 1.49 and median 1.34. Lottery-based elicitation in the labor supply experiment yields 28% gain-seeking, consistent with prior literature estimates of roughly 22% gain-seeking from Chapman et al. (2018).
Crucially, Stage 1 gain-loss attitudes are strongly predictive of Stage 2 treatment effects in both experiments. In the labor supply study, the aggregate treatment effect of approximately 26% greater effort in Condition High — reproducing Abeler et al. (2011) — masks strongly heterogeneous responses: higher λ̂ predicts larger positive treatment effects (raw correlation ρ = 0.18, p < 0.01), and controlling for heterogeneous gain-loss attitudes raises R² by more than a factor of 10. In the exchange study, the aggregate treatment effect is precisely zero (coefficient = 0.00, clustered s.e. = 0.03), a result that prior literature would interpret as contradicting EBRD; but once gain-loss heterogeneity is accounted for, treatment effects are strongly positive for loss-averse subjects and negative for gain-seeking subjects, again raising R² by more than a factor of 10.
Gain-seeking subjects exhibit negative treatment effects in the exchange study, consistent with EBRD predictions, but in the labor supply study the average treatment effect for gain-seeking subjects remains slightly positive, representing a partial deviation from the model’s quantitative predictions. The authors interpret this as evidence that expectations-based reference points are an important but likely incomplete determinant of behavior, with attention-based, status-quo-based, or anchoring-based reference points potentially playing supplementary roles.
Q: What is the central methodological problem with prior tests of expectations-based reference dependence?
A: All prior tests assumed universal loss aversion — that every individual has λ > 1, i.e., weighs losses more severely than equivalent gains. The authors show this is both empirically wrong (roughly 24–29% of subjects are gain-seeking across both studies) and theoretically distorting: within EBRD designs, gain-seeking individuals are predicted to respond in the opposite direction from loss-averse individuals, so averaging across heterogeneous types can suppress, zero out, or even reverse the true treatment effect. This makes standard aggregate tests of EBRD unreliable.
Q: How do the authors measure gain-loss attitudes in the labor supply experiment?
A: In Stage 1, subjects make 30 effort decisions across fixed piece rates and uncertain piece rates with the same mean. Under the Kőszegi-Rabin CPE model, a loss-averse individual reduces effort when the wage is uncertain (because outcomes can fall below the reference point), while a gain-seeking individual increases effort under uncertainty. The authors estimate individual-level parameters by regressing log(e_i + 10) on log(w) and Δw/w in a random-coefficients framework; the coefficient l̂_i on Δw/w is the reduced-form measure of gain-loss attitudes, with λ̂_i = 1 + 4·(l̂_i/ĝ_i) as the structural estimate. The correlation between the two measures is ρ = 0.85 (p < 0.01).
Q: How do the authors measure gain-loss attitudes in the exchange experiment?
A: In Stage 1, subjects are randomly endowed with one of two objects and provide three unincentivized preference statements (relative liking, relative wanting, and hypothetical choice) before any possibility of exchange is introduced. Under CPE, an individual endowed with object X will prefer X to the extent that (1 + λ_i) − 2(Y/X) > 0, so subjects with higher λ_i should more strongly favor their endowment. A principal components analysis reduces the three statements to one factor (capturing ~70% of variation), and residuals from regressing that factor on object assignment constitute the reduced-form measure l̂_i. The structural estimate λ̂_i is obtained via a mixed logit using a log-normal distribution for λ_i; the reduced form and structural measures are correlated at r = 0.95 (p < 0.01).
Q: What does the distribution of gain-loss attitudes look like across the two experiments?
A: In the labor supply experiment (N = 453 estimable subjects), 70.6% are loss averse and 29.4% are gain-seeking, with mean λ̂ = 1.65 and median λ̂ = 1.66. In the exchange experiment (N = 1,024), 76% are loss averse and 24% are gain-seeking, with mean λ̂ = 1.49 and median λ̂ = 1.34. A separate lottery-based elicitation in the labor supply study finds 28% gain-seeking subjects. These proportions are consistent with the weighted average of 22% gain-seeking found by Chapman et al. (2018) across seven prior lottery-choice studies.
Q: What is the aggregate treatment effect in the labor supply experiment, and what does it look like once heterogeneity is accounted for?
A: Without accounting for gain-loss heterogeneity, Condition High is associated with roughly a 26% increase in effort relative to Condition Low (individual-clustered s.e. = 0.03, p < 0.01), reproducing the Abeler et al. (2011) result and consistent with EBRD under universal loss aversion. However, R² = 0.03. Once interactions of Condition High with l̂_i and λ̂_i are included, R² rises to 0.40 and 0.39 respectively — more than a tenfold increase. Higher λ̂_i predicts larger positive treatment effects (raw correlation ρ = 0.18, p < 0.01), and the interaction of Condition High with λ̂_i is highly significant (F(1,452) = 49.14, p < 0.01).
Q: What is the aggregate treatment effect in the exchange experiment, and what does it look like once heterogeneity is accounted for?
A: Without heterogeneity, the treatment effect of Condition High on the probability of exchanging is precisely 0.00 (clustered s.e. = 0.03), which prior literature would read as a failure of EBRD. Once heterogeneity is introduced via interactions with l̂_i and λ̂_i, the pattern changes markedly: loss-averse subjects show positive treatment effects (greater willingness to exchange in High), while gain-seeking subjects show negative treatment effects (less willingness to exchange in High), consistent with Predictions 4–6. R² again rises by more than a factor of 10. In Condition Low, 38% of subjects exchange, reflecting a significant endowment effect (F(1,1022) = 25.66, p < 0.01).
Q: Why does the aggregate treatment effect in the exchange experiment equal zero?
A: The authors show in Appendix B.4 that the relationship between λ_i and exchange probability treatment effects can be concave — negative effects for gain-seeking subjects can be of greater absolute magnitude than positive effects for loss-averse subjects. With roughly 24% gain-seeking and 76% loss-averse subjects, aggregation can yield a near-zero average even when heterogeneous effects are substantial and directionally consistent with EBRD. This aggregation problem, not a failure of the expectations-based reference point mechanism, explains the null aggregate result.
Q: Do gain-loss attitudes measured in one domain predict behavior in another domain?
A: The lottery-based measure of gain-loss attitudes (from Multiple Price Lists administered after the real-effort task in the labor supply experiment) has mean λ̂ = 1.48 and median 1.42, with 28% gain-seeking subjects — proportions similar to the labor supply estimates. However, the correlation between the lottery-based and labor-supply-based structural estimates of λ̂ is only Pearson’s r = 0.091 (p = 0.03) and Spearman’s ρ = 0.084 (p = 0.075). Furthermore, the lottery measure has no predictive power for Stage 2 treatment effects. This suggests that while the prevalence of gain-seeking is similar across domains, gain-loss attitudes at the individual level are more domain-specific than prior work has appreciated.
Q: How do the authors address the “generated regressor problem” when using estimated λ̂_i as a regressor?
A: Since λ̂_i is itself estimated from Stage 1 data, using it directly as a regressor in Stage 2 regressions treats imprecise preference estimates as ideal data, which can distort inference (the Murphy-Topel problem). The authors address this by bootstrapping the entire pipeline — re-estimating gain-loss attitudes from Stage 1 in each of 500 bootstrap iterations and re-running the Stage 2 regressions — then reporting the average bootstrap coefficient and its standard deviation. The bootstrapped conclusions are qualitatively identical to the original regression results in both experiments.
Q: What limitations do the authors acknowledge in the EBRD model’s fit?
A: Even after accounting for heterogeneity, the EBRD model does not provide a complete quantitative account of behavior. In the labor supply experiment, gain-seeking subjects exhibit slightly positive average treatment effects (not negative as predicted), and loss-averse subjects’ empirical treatment effects fall short of theoretical predictions, despite a significant correlation between predicted and empirical treatment effects (ρ = 0.25, p < 0.01). The authors attribute these deviations to potential measurement error (which would attenuate estimated relationships), and to the possibility that reference points have multiple determinants — including status quo-based, attention-based, and anchoring-based factors — beyond expectations alone.
Q: What are the broader implications for other applications of gain-loss attitudes?
A: The paper’s findings have implications for any application that relies on universal loss aversion as a maintained assumption, including Rabin’s (2000) calibration argument for risk aversion at small and large stakes, insurance demand for small losses (Slovic et al., 1977), and preferences for bunched resolution of uncertainty (Kőszegi and Rabin, 2009). Admitting heterogeneity in gain-loss attitudes will require more nuanced predictions in each of these settings. The paper provides a methodology — measuring individual-level gain-loss attitudes within the experimental context of interest — for investigating and controlling for such heterogeneity.
Q: What design features prevent confounds between Stage 1 measurement and Stage 2 treatment in the exchange experiment?
A: Stage 1 uses a different pair of objects (USB stick and pens) than Stage 2 (picnic mat and thermos), or vice versa — each subject encounters each pair exactly once, with counterbalancing at the session level. Stage 1 preference statements are unincentivized and made before any possibility of exchange is introduced, so they do not contaminate the Stage 2 expectations manipulation. The random reassignment of objects at the end of Stage 1 generates exogenous variation in endowments, preventing mechanical confounds. The authors also verify that interpreting Stage 1 variation as reflecting heterogeneity in object valuations (rather than gain-loss attitudes) would predict zero heterogeneous treatment effects in Stage 2 — a prediction rejected by the data.
Expectations-Based Reference Dependence (EBRD): The formulation, due to Kőszegi and Rabin (2006, 2007), in which an individual’s reference point is the entire distribution of outcomes they rationally expected, rather than a fixed status quo. Behavior is governed by a Choice-Acclimating Personal Equilibrium (CPE) in which the chosen action is optimal given that the expectation of that action serves as the reference.
Gain-Loss Attitudes (λ_i): The individual-specific parameter governing how outcomes above versus below the reference point affect utility. Under piecewise-linear gain-loss utility, an outcome that falls short of the reference by z reduces utility by η·λ_i·z, while an outcome above it raises utility by η·z. Loss aversion is λ_i > 1; gain-seeking is λ_i < 1; loss neutrality is λ_i = 1. In this paper, λ_i is treated as heterogeneous across individuals rather than assumed uniform.
Universal Loss Aversion: The implicit homogeneity assumption maintained in all prior tests of EBRD — that every individual has λ > 1. The authors characterize this as a form of the De Gustibus Non Est Disputandum conjecture applied to gain-loss attitudes, and document that it fails empirically in both experimental settings.
Choice-Acclimating Personal Equilibrium (CPE): The rational expectations equilibrium concept from Kőszegi and Rabin (2006, 2007) used throughout the paper to derive comparative statics. A choice is a CPE if its expected utility given its own expectation as the reference exceeds the expected utility of any alternative given that alternative’s expectation as the reference.
Reduced-Form Gain-Loss Measure (l̂_i): In the labor supply context, the individual-level OLS coefficient on Δw/w in a log-effort regression — capturing how strongly a subject reduces (or increases) effort under wage uncertainty relative to a fixed wage of equal mean. A positive l̂_i identifies loss aversion; negative identifies gain-seeking. In the exchange context, the analogous measure is the residual from regressing the first principal component of Stage 1 preference statements on object assignment.
Aggregation Problem: The paper’s central methodological contribution — when gain-loss attitudes are heterogeneous and the EBRD treatment effect is non-linear in λ_i, the average treatment effect across a heterogeneous population need not equal the treatment effect at the average λ. In the exchange experiment, the aggregate treatment effect is precisely zero even though loss-averse and gain-seeking subjects each respond in the theoretically predicted (opposite) direction, because the concave relationship between λ_i and the exchange probability treatment effect causes negative gain-seeking effects to dominate in the aggregate.