Forthcoming [American Economic Review] doi:10.1257/aer.20241087

Manipulation-Robust Prediction

Daniel Björkegren

Joshua E. Blumenstock

Samsun Knight


What this paper finds — and why it matters

This paper addresses the problem of algorithmic manipulation: when consequential decisions are encoded in machine learning algorithms, individuals strategically alter their behavior to achieve desired outcomes, undermining the predictive validity of the algorithm. The authors develop a “strategy-robust” approach to training decision rules that explicitly models the incentives and costs of manipulation, producing rules that remain stable even when fully transparent. They then deploy and evaluate this approach in a large field experiment in Kenya — the first real-world implementation and evaluation of such a strategy-robust empirical decision rule.

The theoretical framework considers a policymaker who observes training data with features x_i and optimal decisions y_i, and wishes to estimate a decision rule to apply to new instances where behavior may be manipulated. While the standard approach (OLS or LASSO) selects a rule optimal for the training distribution, the strategy-robust approach models how individuals will adjust behavior in response to the incentive structure implied by any given rule. Under linear decision rules and quadratic manipulation costs, each individual shifts behavior by C_i^{-1} * beta away from their “bliss level,” where C_i captures individual- and behavior-specific manipulation costs. The strategy-robust estimator finds the rule that minimizes prediction error in the counterfactual world where people manipulate — a “Stackelberg” solution that commits the policymaker to a rule while anticipating equilibrium behavioral responses. Unlike LASSO, which penalizes all features equally without regard to their manipulability, the strategy-robust approach attenuates the weight on features that are both easily manipulated and subject to manipulation noise.
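The behavioral shift described above can be recovered from a one-line optimization. The sketch below assumes a specific quadratic objective for the individual (the 1/2 scaling and the symmetric positive definite C_i are normalizations chosen here, not taken from the paper) and omits the intercept of the decision rule for brevity.

```latex
\begin{align*}
% Assumed objective of individual i: reward from the rule minus quadratic manipulation cost
&\max_{x}\; \beta^{\top} x \;-\; \tfrac{1}{2}\,(x - x_i)^{\top} C_i\,(x - x_i) \\
% First-order condition
&\beta - C_i\,(x - x_i) = 0
\quad\Longrightarrow\quad
\tilde{x}_i(\beta) = x_i + C_i^{-1}\beta \\
% Strategy-robust rule: fit is evaluated on the manipulated features
&\hat{\beta}^{SR} = \arg\min_{\beta}\; \sum_i \bigl( y_i - \beta^{\top}\tilde{x}_i(\beta) \bigr)^{2}
\end{align*}
```

Setting C_i^{-1} = 0 (infinitely costly manipulation) collapses the last line to ordinary least squares on the bliss-level data.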

The empirical setting is a smartphone app (“Smart Sensing”) deployed to 1,557 participants in Nairobi, Kenya, in collaboration with the Busara Center. The app passively collected over 1,000 behavioral indicators (calls, texts, app usage, mobility, etc.) and delivered weekly financial “challenges” that rewarded participants based on decision rules randomly assigned to them. Average weekly payouts were calibrated to approximate typical digital credit loan amounts in Kenya at the time (approximately $4.80). The experiment has two phases: a training phase using control (beta = 0) and simple single-behavior incentive rules to estimate manipulation cost parameters via GMM, and an implementation phase using complex multi-feature decision rules to compare strategy-robust versus LASSO classifiers.

The main findings are as follows. First, participants demonstrably manipulate behavior: a joint F-test rejects the null hypothesis that the diagonal incentive effects (each behavior's response to its own incentive) are all zero (p < 0.001). The number of texts sent was 49 times more responsive to incentives than the number of people called during the workday. Outgoing communications are cheaper to manipulate than incoming ones, and simple behaviors (e.g., average talk time) are more manipulable than complex ones (e.g., standard deviation of talk time). Individuals who self-report higher tech skills find manipulation 9% easier on average, and the 90th percentile of gaming ability finds manipulation twice as easy as the 10th percentile.

Second, in the implementation phase, strategy-robust decision rules outperform LASSO when the decision rule is made transparent to participants. Across all pooled outcomes, strategy-robust rules reduce RMSE by 11% (p = 0.024) relative to LASSO under transparency. For the single income-prediction outcome alone, the improvement is 5% ($0.19 RMSE reduction) but not statistically significant (p = 0.507).

Third, the framework enables estimation of the “cost of transparency.” Making naive LASSO rules transparent lowers performance by 23%. Switching to strategy-robust rules under full transparency reduces that performance decline to 9.2% — a 60% reduction in the cost of transparency. The model predicts this cost to be 9.8%, close to the implemented value of 11.3%.

The scope of the findings is bounded by the linear model with quadratic manipulation costs, a particular population of Kenyan smartphone users, and financial incentive magnitudes comparable to small digital credit loans. The mechanism relies on experimentally estimating manipulation cost parameters, though the authors also show that expert elicitation provides a correlated but noisier substitute (correlation 0.30 with experimental estimates).

Q: What is the core market failure the paper addresses, and why do standard fixes fail?

A: Standard machine learning training assumes the relationship between observed features and outcomes is stable, but implementing a consequential decision rule creates incentives for individuals to manipulate the features on which the rule is based (Goodhart’s Law; Lucas critique). The two common industry responses — restricting to “stable” predictors and keeping rules secret — are inadequate: restricting predictors amounts to a dogmatic prior that manipulation costs are either infinite or zero, while secrecy is increasingly at odds with demands for algorithmic transparency and fails anyway when sophisticated actors reverse-engineer the rule. Periodic retraining treats manipulation as generic covariate shift, can produce non-converging oscillations, and requires observing mistakes before learning from them.

Q: How does the strategy-robust estimator differ from OLS and LASSO?

A: OLS maximizes fit within the unincentivized training sample but ignores that implementing beta will shift behavior; LASSO adds a regularization penalty but still assumes behavior remains fixed at bliss levels and so penalizes all features equally regardless of manipulability. The strategy-robust estimator replaces each individual’s observed behavior x_i with their anticipated counterfactual behavior x_tilde_i(beta) = x_i + C_i^{-1} * beta, and finds the beta that minimizes prediction error in this manipulated distribution — a Stackelberg equilibrium. It attenuates features that are easily manipulated or subject to high manipulation noise, shifting weight toward harder-to-manipulate features even when the latter are less predictive in the training data.
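A small simulation can make the contrast concrete. The sketch below is illustrative only and is not the authors' estimator: the data are simulated, C is assumed diagonal and common across people, gaming ability gamma_i is assumed Uniform(0, 2) and independent of the outcome, and the expected counterfactual loss is minimized by plain gradient descent. All names (c_inv, shift_size, deployed_mse) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated bliss-level behaviors (hypothetical): two correlated features.
x0 = rng.normal(size=n)
x1 = 0.7 * x0 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)
X = np.column_stack([x0, x1])
y = 1.0 * x0 + 0.2 * x1 + rng.normal(scale=0.5, size=n)

# Assumed manipulation technology: diagonal C, so C^{-1} = diag(c_inv).
# Feature 0 is cheap to manipulate, feature 1 is expensive.
c_inv = np.array([2.0, 0.1])
# Assumed gaming ability: gamma_i ~ Uniform(0, 2), independent of (x, y).
gamma_mean, gamma_var = 1.0, 1.0 / 3.0

# Naive rule: OLS on the unmanipulated training data.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
Sigma = Xc.T @ Xc / n


def shift_size(beta):
    """q(beta) = beta' C^{-1} beta: score gain from manipulation when gamma = 1."""
    return beta @ (c_inv * beta)


def grad_expected_loss(beta):
    """Gradient of Var(y - x'beta) + gamma_var * q(beta)^2."""
    fit_term = 2.0 * Sigma @ (beta - beta_ols)
    noise_term = 4.0 * gamma_var * shift_size(beta) * c_inv * beta
    return fit_term + noise_term


# Plain gradient descent from the OLS solution (this objective is convex).
beta_sr = beta_ols.copy()
for _ in range(5000):
    beta_sr -= 0.02 * grad_expected_loss(beta_sr)

# Deployment: each person shifts behavior by gamma_i * C^{-1} beta.
gamma = rng.uniform(0.0, 2.0, size=n)


def deployed_mse(beta, intercept):
    X_manip = X + gamma[:, None] * (c_inv * beta)
    return np.mean((y - intercept - X_manip @ beta) ** 2)


# The naive intercept ignores manipulation; the robust one anticipates the mean shift.
a_ols = y.mean() - X.mean(axis=0) @ beta_ols
a_sr = y.mean() - X.mean(axis=0) @ beta_sr - gamma_mean * shift_size(beta_sr)

mse_ols = deployed_mse(beta_ols, a_ols)
mse_sr = deployed_mse(beta_sr, a_sr)
print("OLS weights:", beta_ols, "deployed MSE:", mse_ols)
print("SR  weights:", beta_sr, "deployed MSE:", mse_sr)
```

In this toy setup the robust rule puts less weight on the cheap-to-manipulate feature and more on the correlated, costly one. The magnitudes depend entirely on the assumed costs and carry no empirical meaning.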

Q: What are the three ways the strategy-robust estimator differs from standard estimators?

A: First, it anticipates level shifts in behavior: behaviors respond to beta, so observed training behaviors are replaced by counterfactual manipulated behaviors. Second, it accounts for signaling and noise: when manipulation ability correlates with the outcome of interest, manipulation can be informative about type (as in Spence 1973), but unobserved heterogeneity in gaming ability that is unrelated to outcomes introduces noise that attenuates coefficients on manipulable behaviors. Third, it achieves subgame perfection by anticipating how behaviors would respond to off-path deviations in beta, rather than assuming behaviors are fixed when beta deviates — yielding a Stackelberg rather than a one-step best-response solution.
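The first two channels can be separated in a short derivation. It rests on simplifying assumptions that are not taken from the paper: costs scale as C_i^{-1} = gamma_i C^{-1}, and gamma_i is independent of (x_i, y_i) with mean mu_gamma and variance sigma_gamma^2, which shuts down the signaling channel. The symbols a (intercept) and q (score gain from manipulation) are introduced here for illustration.

```latex
\begin{align*}
% Manipulated features and the resulting prediction error
\tilde{x}_i(\beta) &= x_i + \gamma_i\, C^{-1}\beta,
\qquad q(\beta) \equiv \beta^{\top} C^{-1} \beta \\
e_i &= y_i - a - \beta^{\top}\tilde{x}_i(\beta)
     = \bigl(y_i - \beta^{\top}x_i\bigr) - a - \gamma_i\, q(\beta) \\
% Expected squared error splits into fit, noise, and level-shift terms
\mathbb{E}\bigl[e_i^{2}\bigr]
  &= \underbrace{\operatorname{Var}\bigl(y_i - \beta^{\top}x_i\bigr)}_{\text{fit on bliss-level data}}
   + \underbrace{\sigma_\gamma^{2}\, q(\beta)^{2}}_{\text{manipulation noise}}
   + \underbrace{\bigl(\mathbb{E}[y_i - \beta^{\top}x_i] - a - \mu_\gamma\, q(\beta)\bigr)^{2}}_{\text{level shift}}
\end{align*}
```

The level-shift term can be removed by choosing the intercept a, but the noise term cannot. Because q(beta) weights each coefficient by the inverse of its cost, the noise penalty falls most heavily on cheaply manipulated behaviors.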

Q: How were manipulation cost parameters estimated in the Kenya experiment?

A: In the training phase, each participant was randomly assigned to simple single-behavior incentive rules (e.g., “earn 12 Ksh. per incoming call this week, up to 250 Ksh.”) or control rules (beta = 0). This random variation in per-behavior incentives identifies how sensitive each behavior vector is to incentives, enabling GMM estimation of individual and behavior-specific cost parameters C and the heterogeneity scaling parameter omega. Off-diagonal elements of C were regularized to zero due to noisy estimation; diagonal elements used LASSO penalization with lambda = 1.0 set by cross-validation. Observable heterogeneity was allowed to vary with self-reported tech skills, which explained the most variation in preliminary analysis.
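The identifying idea can be illustrated with a much simpler estimator than the paper's GMM. The sketch below assumes a diagonal, homogeneous cost matrix and simulated data, so that each behavior's response to its own randomized incentive equals the incentive divided by the cost. The names and values (true_cost, incentive_size, arm) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
true_cost = np.array([0.5, 2.0])   # hypothetical diagonal of C
incentive_size = 1.0               # hypothetical per-unit reward

# Random assignment: 0 = control, 1 = reward behavior 0, 2 = reward behavior 1.
arm = rng.integers(0, 3, size=n)
beta = np.zeros((n, 2))
beta[arm == 1, 0] = incentive_size
beta[arm == 2, 1] = incentive_size

# Observed behavior = bliss level + beta / cost (diagonal-C best response).
bliss = rng.normal(loc=[10.0, 4.0], scale=1.0, size=(n, 2))
observed = bliss + beta / true_cost

# Difference in means between incentivized and non-incentivized participants
# estimates the response; cost is the incentive divided by that response.
est_cost = np.empty(2)
for k in range(2):
    treated = beta[:, k] > 0
    response = observed[treated, k].mean() - observed[~treated, k].mean()
    est_cost[k] = incentive_size / response

print("estimated costs:", est_cost)
```

This omits the heterogeneity parameter omega, the off-diagonal terms, and the regularization described above. It shows only why random variation in incentives pins down the cost parameters.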

Q: What patterns were found in manipulation costs across behaviors?

A: Outgoing communications are cheaper to manipulate than incoming communications. Text messages, being relatively cheap to send, are more manipulable than calls. Simple behaviors such as average call duration are more manipulable than complex behaviors such as the standard deviation of talk time. Cross-behavior elasticities exist but are mostly noisy: 94.5% of off-diagonal incentive effects are not statistically significant at the 5% level, 3.6% are significantly positive, and 1.8% are significantly negative.

Q: How large is heterogeneity in gaming ability, and what predicts it?

A: Individuals who self-report advanced or higher tech skills find it on average 9% easier to manipulate behaviors. Including unobserved heterogeneity, the 90th percentile of gaming ability finds manipulation twice as easy as the 10th percentile. Much of the heterogeneity arises from unobservables not captured by observables in the model.

Q: What happened when the naive LASSO rule was made transparent versus when the strategy-robust rule was made transparent?

A: Under the transparent treatment, participants received the full coefficients of the decision rule plus access to an interactive earnings calculator. Making naive LASSO rules transparent lowered performance by 23% relative to the opaque naive rule (pooled-outcome RMSE rose from $3.780 to $4.641). Switching to strategy-robust rules under full transparency reduced the performance decline to 9.2%, a 60% reduction in the cost of transparency. The model predicted this cost to be 9.8%, which is close to the implemented value of 11.3%.
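The percentages in this answer can be checked with a few lines of arithmetic. The sketch assumes that $3.780 is the RMSE of the opaque naive rule and $4.641 the RMSE of the same rule under transparency, which is the reading consistent with a performance loss. The variable names are illustrative.

```python
# Reported pooled-outcome RMSE values (assumed mapping: opaque, then transparent).
rmse_opaque_naive = 3.780
rmse_transparent_naive = 4.641

# Cost of transparency for the naive rule: relative increase in RMSE.
naive_cost = rmse_transparent_naive / rmse_opaque_naive - 1.0

# Reported performance decline for the strategy-robust rule under transparency.
robust_cost = 0.092

# Share of the naive cost of transparency that the robust rule removes,
# using the rounded 23% figure quoted in the text.
reduction = 1.0 - robust_cost / 0.23

print(f"naive cost of transparency: {naive_cost:.1%}")
print(f"reduction from strategy-robust rules: {reduction:.0%}")
```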

Q: What does the reduced-form evidence on behavior change under complex decision rules show?

A: Under the opaque treatment, participants' behavioral responses to complex decision rules were largely statistically insignificant and often in the wrong direction: only 38.5% of estimated behavioral effects are in the same direction as the incentive. Under the transparent treatment, 75.4% of point-estimated effects are in the same direction as the incentive, confirming that transparency is a prerequisite for meaningful manipulation in this setting.

Q: How does the paper compare strategy-robust estimation to iterative retraining?

A: Simulation results show that iterative retraining of a naive LASSO model approaches the performance of the strategy-robust method after approximately 4 iterations. However, simulated performance of iterative retraining then begins to deteriorate; for the intelligence outcome, performance eventually falls below baseline performance before any retraining began. This illustrates that myopic best responses can produce non-convergent or suboptimal dynamics, while the strategy-robust approach finds the equilibrium rule directly.

Q: How does the paper compare strategy-robust estimation to the “intuitive” approach of simply excluding highly manipulable features?

A: The intuitive approach of excluding features above a manipulability threshold reduces predicted manipulability but also discards useful predictors. In some cases, the exclusions leave LASSO with no behaviors predictive enough to include, reducing performance. The strategy-robust approach can extract signal even from manipulable behaviors by adjusting their weights to account for manipulation noise, and outperforms the intuitive exclusion approach in the simulations reported in the Supplemental Appendix.

Q: Can manipulation costs be estimated without an experiment?

A: The authors briefly explore expert elicitation as a nonexperimental alternative: 171 individuals were surveyed to predict how Kenyans would manipulate phone behaviors when incentivized. Experts generally predicted lower costs (more manipulability) than observed experimentally, but the correlation between expert predictions and experimental estimates is 0.30. Using expert-elicited costs to train the strategy-robust model improved simulated performance substantially for one focal outcome and had an inconsequential negative effect for the other. Costs can also potentially be estimated from market prices and first principles when a structural model of underlying manipulations is available.

Q: What is the paper’s interpretation of its results through the lens of the Lucas critique?

A: The paper frames its contribution as a machine learning interpretation of Lucas (1976): just as implementing an economic policy changes the behavioral relationships on which the policy was calibrated, implementing a predictive decision rule beta changes the distribution of the very features the rule is based on. The key insight is that this counterfactual world has predictable structure — including a feature in the model tends to induce manipulation in that feature of a magnitude directly related to beta — so counterfactual fit can be estimated and rules can be optimized to perform well in the equilibrium they induce.

Q: What are the policy implications for algorithmic transparency?

A: The framework allows a policymaker to quantify and reduce the performance cost of transparency. The estimated equilibrium cost of transparency is roughly 10% when using strategy-robust rules, substantially less than the approximately 23% cost of making naive rules transparent. This means that strategy-robust rules can be disclosed, satisfying demands for a “right to explanation” under regulations such as GDPR, while losing far less performance than naive rules lose when disclosed.

Strategy-robust decision rule: A decision rule trained to anticipate that individuals will manipulate the features on which it is based, by replacing observed training behaviors with anticipated counterfactual manipulated behaviors in the loss function. It yields a Stackelberg equilibrium in which the policymaker commits to a rule while correctly forecasting the equilibrium behavioral response.

Manipulation costs (C_i): Individual- and behavior-specific quadratic costs that determine how far an individual shifts behavior from their bliss level in response to the incentive implied by a decision rule’s coefficient vector beta. Higher costs imply less behavioral response; costs are parameterized to allow separable heterogeneity by person and by behavior.

Bliss level (x_i): An individual’s unincentivized behavior — the behavior they would exhibit absent any decision rule (i.e., when beta = 0). Estimated from control periods in the experiment.

Gaming ability (gamma_i): Individual-level scaling factor for manipulation costs; a higher value means lower costs and easier manipulation. Modeled as a function of observable characteristics (e.g., self-reported tech skills) and unobservable heterogeneity.

Counterfactual fit: Predictive fit evaluated in the counterfactual state of the world where the decision rule is implemented and agents manipulate their features in response. The strategy-robust approach maximizes counterfactual fit, sacrificing within-sample fit (as measured on unmanipulated training data) to improve performance in deployment.

Cost of transparency: The reduction in predictive performance of a decision rule when its coefficients are disclosed to the individuals being evaluated. In the experiment, disclosure reduces performance of naive LASSO rules by 23% and strategy-robust rules by 9.2%, implying strategy-robust rules reduce the cost of transparency by 60%.

Stackelberg equilibrium: The solution concept in which the policymaker (leader) commits to a decision rule, correctly anticipating the best-response behavior of individuals (followers), rather than taking behavior as fixed or updating myopically. The strategy-robust estimator implements this equilibrium concept.

Performative prediction: The broader phenomenon, drawing on Perdomo et al. (2020), whereby a decision rule changes the distribution of the data it is applied to. The paper’s strategy-robust approach is an empirically estimable solution within this framework.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing.