Selection in Surveys: Using Randomized Incentives to Detect and Account for Nonresponse Bias
What this paper finds — and why it matters
This paper addresses nonresponse bias in surveys — the distortion that arises when survey participants differ systematically from nonparticipants in ways that correlate with the survey’s outcomes of interest. The authors develop and apply methods to detect and correct for nonresponse bias using randomized financial incentives embedded in the survey design itself.
The empirical application is the “Norge i Koronatid” (NiK) survey, conducted by Statistics Norway in April–May 2020 to study the immediate labor market consequences of Norway’s COVID-19 lockdown. The NiK survey has two features that make it unusually well-suited for studying nonresponse bias: (1) it is linked to full-population administrative data, providing a verifiable ground truth for the entire Norwegian adult population; and (2) survey invitees were randomly assigned to one of five financial incentive levels (0%, 1%, 5%, 7%, or 10% probability of receiving a 1,000 NOK prepaid card), generating exogenous variation in participation rates. The final sample of 10,000 randomly drawn adults achieved a 47.4% participation rate.
The administrative data reveal large, statistically significant nonresponse bias across all six labor market outcomes examined. Participants in the high-incentive arm had on average roughly 930 USD (30%) higher monthly pre-lockdown earnings than the full population, and were 10.8 percentage points (19%) more likely to be employed. Standard corrections for selection on observable characteristics — including propensity-score reweighting on age, gender, immigration status, schooling, and municipality-level variables — fail to eliminate this bias. For the high-incentive arm, reweighting on individual characteristics more than doubles the nonresponse bias for earnings loss and employment loss measures relative to unweighted estimates, meaning that observable-based corrections can make things worse, not better.
A key finding is that higher participation rates do not imply lower nonresponse bias. The high-incentive arm, with the highest response rate, exhibited larger nonresponse bias than the no-incentive arm. Marginal participants — those induced to respond by higher incentives — had much stronger pre-lockdown labor market attachment (average earnings of 6,806 USD/month vs. 3,666 USD/month for inframarginal participants) but suffered substantially greater lockdown impacts: 32.3% became furloughed or unemployed versus only 3.4% of inframarginal participants.
Existing methods designed to handle selection on unobservables also perform poorly. Worst-case (Manski) bounds contain the truth but are very wide: employment before lockdown is bounded between 30% and 83% against a true value of 57%. Monotone response selection assumptions produce bounds that do not contain the population quantities for any of the six outcomes, because the marginal survey response function is empirically non-monotone. A Heckman parametric selection model produces point estimates inconsistent with the ground truth (e.g., estimating 51% pre-lockdown employment against the true 57%).
Investigation of participation timing reveals that reminder emails attract a qualitatively different type of respondent than incentives do. This motivates the paper’s central methodological contribution: a two-dimensional participation model that distinguishes “active” nonparticipants (those who received the invitation and chose not to respond because the incentive was insufficient) from “passive” nonparticipants (those who never received or attended to the invitation but who may respond to reminders). The labor market outcomes of these two groups differ from those of participants in opposite directions, which is why single-dimensional monotone selection models fail. The two-dimensional model, exploiting both incentive randomization and the timing of responses, produces bounds that either contain the ground truth or come closer to it than those of the other methods examined. For example, it bounds pre-lockdown employment at [48%, 63%], around the true value of 57%.
The paper is scoped to a high-quality, randomly sampled, administrative-data-linked survey conducted during a period of acute economic disruption. The authors note the patterns observed may differ outside crisis periods, though the methods developed apply generally.
Q: How prevalent is nonresponse bias discussion in economics research, and what methods do researchers currently use? A: A systematic review of survey-based papers in top-five economics journals from January 2015 to August 2020 found that nearly half of studies omit any discussion of nonresponse bias despite often high nonresponse rates. Among studies using researcher-collected survey data, the average nonresponse rate is 50%; rates reach as high as 87%. When researchers do address nonresponse, 47% of own-survey papers compare sample means to a reference population and 16% apply reweighting on observables; virtually none use methods that address selection on unobservables.
Q: How was the NiK survey designed to enable testing for nonresponse bias? A: The 10,000-person random sample was assigned to five incentive groups with probabilities of receiving a 1,000 NOK prepaid card set at 0%, 1%, 5%, 7%, and 10%, yielding an expected payoff of zero in the 0% group and expected payoffs ranging from 1.1 USD to 11 USD in the incentivized groups. Because group assignment was random, the groups are probabilistically identical ex ante, so differences in average responses across groups provide a direct test for nonresponse bias, given an exclusion restriction that incentives do not directly affect answers. Participation rates across the aggregated no/low/high incentive groups were 45.7%, approximately 47.6%, and approximately 51.7%, respectively; the joint test of equal participation across groups rejects with p-value < 0.01.
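The logic of this test can be sketched in Python. This is a minimal illustration on simulated data, not the authors' procedure: the function names `permutation_test_equal_means` and `simulate_arm`, the binary outcome, and the outcome probabilities are all assumptions made for the example. Only the two participation rates (45.7% and 51.7%) are taken from the text above. The sketch compares mean responses among participants in two incentive arms with a permutation test; under random assignment and the exclusion restriction, a small p-value signals nonresponse bias.

```python
import numpy as np


def permutation_test_equal_means(y_a, y_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for equal mean responses in two arms."""
    rng = np.random.default_rng(seed)
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    observed = abs(y_a.mean() - y_b.mean())
    pooled = np.concatenate([y_a, y_b])
    n_a = len(y_a)
    exceed = 0
    for _ in range(n_perm):
        # Reassign arm labels at random among participants.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
        exceed += int(diff >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def simulate_arm(n, participation_rate, rng):
    """Responses of participants in one hypothetical incentive arm."""
    # u is a latent resistance to participating; low-u people respond first.
    u = rng.uniform(size=n)
    takes_part = u <= participation_rate
    # Hypothetical outcome process (an assumption, not from the source):
    # people who respond only under higher incentives answer "yes" more often.
    prob_yes = np.where(u <= 0.457, 0.05, 0.60)
    y = rng.binomial(1, prob_yes)
    return y[takes_part]


rng = np.random.default_rng(42)
y_no = simulate_arm(5000, 0.457, rng)    # no-incentive arm
y_high = simulate_arm(5000, 0.517, rng)  # high-incentive arm
p_value = permutation_test_equal_means(y_no, y_high)
print(f"mean (no incentive):   {y_no.mean():.3f}")
print(f"mean (high incentive): {y_high.mean():.3f}")
print(f"permutation p-value:   {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions; a joint F-test or regression on arm indicators would serve the same purpose.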
Q: How large is nonresponse bias in the NiK survey as measured against the administrative ground truth? A: Across all six administrative outcomes and all three incentive arms, joint tests of no nonresponse bias are rejected with p-values < 0.01. High-incentive arm participants had pre-lockdown monthly earnings roughly 930 USD (30%) above the population mean, and were 10.8 percentage points (19%) more likely to be employed. The high-incentive arm’s estimated post-lockdown employment rate of 58% overstates the true rate by 8 percentage points; a researcher comparing this to the true pre-lockdown rate of 57% would erroneously conclude employment was essentially unchanged, when in fact it dropped 7 percentage points.
Q: Does correcting for observable characteristics remove nonresponse bias? A: No. After reweighting by propensity scores constructed from age, gender, immigration status, schooling, and municipality or individual-level characteristics, joint tests of zero remaining nonresponse bias are rejected with p-values < 0.01 for each specification and incentive arm. In some cases, reweighting on individual characteristics more than doubles the nonresponse bias — for example, for earnings loss and employment loss measures in the high-incentive arm — meaning that standard observable-based corrections can amplify rather than reduce bias. Robustness checks using machine learning algorithms, class weights, imputation, and richer covariate sets including lagged outcomes yield the same conclusion.
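A toy example can show how reweighting on observables may amplify bias when selection operates within observable groups. The sketch below is not the authors' specification: it replaces a fitted propensity-score model with simple cell-based inverse-probability weights, and the function name `ipw_mean`, the two cells, and the eight data points are hypothetical.

```python
import numpy as np


def ipw_mean(y, participated, cell):
    """Participant mean reweighted by the inverse participation rate of each cell."""
    y = np.asarray(y, dtype=float)
    participated = np.asarray(participated, dtype=bool)
    cell = np.asarray(cell)
    weights = np.zeros(len(y))
    for c in np.unique(cell):
        in_cell = cell == c
        propensity = participated[in_cell].mean()  # estimated P(participate | cell)
        if propensity == 0:
            raise ValueError(f"no participants in cell {c!r}")
        weights[in_cell & participated] = 1.0 / propensity
    w = weights[participated]
    return float(np.sum(w * y[participated]) / np.sum(w))


# Hypothetical population of eight people in two observable cells.
cell = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
y = np.array([1, 1, 0, 0, 1, 0, 0, 0])
# In cell A only the people with y = 1 respond (selection on unobservables);
# in cell B everyone responds.
participated = np.array([True, True, False, False, True, True, True, True])

truth = float(y.mean())                # 0.375
naive = float(y[participated].mean())  # 0.5
reweighted = ipw_mean(y, participated, cell)  # 0.625
print(truth, naive, reweighted)
```

In this constructed case the unweighted bias is 0.125 and the reweighted bias is 0.25: upweighting the low-response cell also upweights the selection inside it.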
Q: Does nonresponse bias in survey responses (not just administrative outcomes) differ across incentive arms? A: Yes. For survey-elicited outcomes, average responses differ significantly across incentive arms, with all joint equality tests rejected at p < 0.1. For example, 10.4% of high-incentive participants reported applying for UI benefits versus 7.5% in the no-incentive group. Estimated UI expenditure as a share of Norway’s 2020 social insurance budget varies from 13.2% (no-incentive arm) to 18.4% (high-incentive arm), illustrating the policy stakes.
Q: Do higher response rates reduce nonresponse bias? A: Not in this survey. The no-incentive arm, with the lowest participation rate (45.7%), exhibits smaller nonresponse bias than the high-incentive arm (51.7% participation). This finding contradicts standard guidance from the U.S. Office of Management and Budget and J-PAL research guidelines, which equate higher response rates with lower bias risk. The authors note that J-PAL has subsequently updated its guidance in response to this paper’s findings.
Q: How do marginal participants (induced by higher incentives) differ from inframarginal participants? A: Marginal participants — those who participate only under high incentives but not without them — had average pre-lockdown monthly earnings of 6,806 USD versus 3,666 USD for inframarginal participants (p-value 0.08), indicating much stronger pre-lockdown labor market attachment. Post-lockdown, both groups had similar earnings (approximately 3,600–3,800 USD/month). Consistent with this, 32.3% of marginal participants became furloughed or unemployed after the lockdown versus 3.4% of inframarginal participants. Notably, marginal and inframarginal participants do not differ significantly on observable background characteristics (age, gender, immigrant status, schooling; joint test p-value 0.70), confirming that selection is on unobservables.
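The mean outcome of marginal participants can be backed out from arm-level summaries with a Wald-type ratio. The sketch below assumes monotonicity (anyone who responds without an incentive also responds with one) and treats the no-incentive participants as the inframarginal group. The function name `marginal_participant_mean` is hypothetical. The high-arm participant mean is not reported above, so it is constructed here from the reported figures purely to show that the formula inverts the mixture.

```python
def marginal_participant_mean(p_low, ybar_low, p_high, ybar_high):
    """Mean outcome of people who respond under the high incentive but not the low one.

    p_*    : participation rate in each arm
    ybar_* : mean outcome among participants in each arm
    """
    if p_high <= p_low:
        raise ValueError("the high-incentive arm must have the higher participation rate")
    # High-arm participants are a mixture of inframarginal and marginal people:
    #   p_high * ybar_high = p_low * ybar_low + (p_high - p_low) * ybar_marginal
    return (p_high * ybar_high - p_low * ybar_low) / (p_high - p_low)


# Figures quoted in the text (participation rates are approximate).
p_no, p_high = 0.457, 0.517
inframarginal_mean = 3666.0  # USD per month
marginal_mean = 6806.0       # USD per month

# Assumption: the high-arm participant mean implied by the mixture identity,
# taking the approximate rates at face value. It is not a reported figure.
ybar_high = (p_no * inframarginal_mean + (p_high - p_no) * marginal_mean) / p_high

recovered = marginal_participant_mean(p_no, inframarginal_mean, p_high, ybar_high)
print(f"implied high-arm participant mean: {ybar_high:.0f} USD")
print(f"recovered marginal mean:           {recovered:.0f} USD")
```

Because the marginal group is only about six percentage points of the sample, the denominator is small, which is one reason such estimates tend to be noisy.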
Q: Why do existing methods designed to handle selection on unobservables fail? A: Worst-case (Manski) bounds contain the truth but are too wide to be informative — pre-lockdown employment is bounded at [30%, 83%] against a true value of 57%. Adding randomized incentives as instruments tightens bounds only modestly (8.5% width reduction for employment before lockdown). Monotone response selection assumptions fail because the empirically estimated marginal survey response function is non-monotone: for employment, the probability first decreases and then increases as a function of willingness-to-participate. The Heckman parametric selection model gives point estimates inconsistent with the ground truth for most outcomes (e.g., 51% estimated pre-lockdown employment vs. 57% true).
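The worst-case bounds follow from simple arithmetic: nonparticipants are assigned the lowest and then the highest possible outcome. A minimal sketch is given below; the function name `worst_case_bounds` is hypothetical, and the participant mean of 0.64 is an illustrative input rather than a figure reported above.

```python
def worst_case_bounds(participant_mean, participation_rate, y_min=0.0, y_max=1.0):
    """Worst-case bounds on a population mean when nonparticipants are unobserved."""
    if not 0.0 <= participation_rate <= 1.0:
        raise ValueError("participation_rate must lie in [0, 1]")
    if not y_min <= participant_mean <= y_max:
        raise ValueError("participant_mean must lie in [y_min, y_max]")
    observed_part = participation_rate * participant_mean
    missing_share = 1.0 - participation_rate
    lower = observed_part + missing_share * y_min  # every nonparticipant at y_min
    upper = observed_part + missing_share * y_max  # every nonparticipant at y_max
    return lower, upper


# 0.474 is the overall participation rate quoted in the text;
# 0.64 is a hypothetical participant employment rate.
lower, upper = worst_case_bounds(participant_mean=0.64, participation_rate=0.474)
width = upper - lower
print(f"bounds: [{lower:.3f}, {upper:.3f}], width: {width:.3f}")
```

For a binary outcome the width equals one minus the participation rate, whatever the participant mean is. At 47.4% participation that is about 53 percentage points, in line with the width of the [30%, 83%] interval quoted above.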
Q: What motivates the two-dimensional participation model? A: Analysis of participation timing shows that reminder emails attract a qualitatively different type of respondent than incentives alone. Reminders have a larger effect on participation in the no-incentive group than in the high-incentive group, in both absolute and proportional terms. Early respondents (responding to initial contact) had lower pre-lockdown earnings and employment than late respondents (responding to reminders). This implies that the two types of unobservables (resistance to incentive and probability of receiving the invitation) are associated with outcomes that move in opposite directions, producing a non-monotone marginal survey response function that single-dimensional models cannot capture.
Q: How does the two-dimensional model work and what are its results? A: The model distinguishes active nonparticipants (who saw the invitation but declined because the incentive was too low, and who are more likely to be employed and to be higher earners) from passive nonparticipants (who did not receive or attend to the invitation, and who are more likely to have been adversely affected by the lockdown). By exploiting both the randomized incentive variation and the timing of responses (initial contact vs. reminder), the model partially identifies population mean outcomes under shape restrictions on the joint distribution of the two unobservables. For pre-lockdown employment, the model produces bounds of [48%, 63%] bracketing the true value of 57%, compared to worst-case bounds of [30%, 83%] and monotone selection bounds that do not contain the truth. Improvements are largest for outcomes measured in pre-lockdown levels, where the two types of nonparticipants differ most.
Q: What are the practical recommendations for survey researchers? A: Embedding randomized incentives in a survey adds little or no cost and enables a test for nonresponse bias that does not require linked administrative data. When such a test detects bias, researchers should apply the two-dimensional model rather than relying on observable-based reweighting or conventional selection models. The question of who participates matters at least as much as how many participate; surveys should be designed to characterize and correct for selection, not merely to maximize response rates.
Nonresponse bias: The difference between the mean response among survey participants and the true population mean, arising when the decision to participate is correlated with the outcome of interest. Distinct from sampling bias; it persists even with a randomly drawn sample.
Selection on unobservables: Nonresponse bias that remains after conditioning on all observed characteristics. In the NiK survey, marginal and inframarginal participants are indistinguishable on observable demographics but differ dramatically in labor market outcomes, providing direct evidence that unobservables drive selection.
Marginal vs. inframarginal participants: Under the Imbens-Angrist monotonicity condition, inframarginal participants would respond at any incentive level; marginal participants respond only at higher incentive levels. Their average responses are separately identified using an IV regression with the incentive as instrument.
Marginal survey response (MSR): The function m(u) = E[Y*_i | U_i = u], giving the average outcome for individuals at the u-th quantile of willingness to participate. The MSR is nonparametrically identified for u in [0, p(z_high)]; its empirically non-monotone shape in the NiK data explains why monotone selection assumptions produce bounds that miss the ground truth.
Active vs. passive nonparticipants: Active nonparticipants received the survey invitation and declined because the incentive was insufficient; they tend to have higher labor market attachment. Passive nonparticipants never received or attended to the invitation but may respond to reminders; they tend to have been more adversely affected by the lockdown. This distinction motivates the two-dimensional model.
Two-dimensional participation model: A model of survey participation with two unobservables — resistance to incentive (determining active nonresponse) and probability of receiving the invitation (determining passive nonresponse). By exploiting both incentive randomization and the timing of responses (initial contact vs. reminder), the model produces bounds or point estimates on population means that are narrower and closer to ground truth than single-dimensional alternatives.
Exclusion restriction for incentives: The assumption that randomly assigned incentives affect participation rates but do not directly affect participants’ answers to survey questions. This is required for incentives to serve as valid instruments for testing and correcting nonresponse bias; the authors test and find no evidence that it is violated.
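The marginal survey response defined above links participant means to the population mean. The identities below are a sketch under the standard single-index selection model, which is assumed here rather than quoted from the source: U_i is uniform on [0, 1], person i participates (R_i = 1) when U_i <= p(Z_i), and the incentive Z_i is independent of (Y*_i, U_i).

```latex
% Assumptions (not from the source): U_i ~ Uniform[0,1], R_i = 1[U_i <= p(Z_i)],
% and Z_i independent of (Y^*_i, U_i).
\begin{align*}
E[Y^*_i] &= \int_0^1 m(u)\,du, \\
E[Y^*_i \mid R_i = 1, Z_i = z] &= \frac{1}{p(z)} \int_0^{p(z)} m(u)\,du, \\
E[Y^*_i \mid R_i = 1, Z_i = z] - E[Y^*_i]
  &= \bigl(1 - p(z)\bigr)\left[\frac{1}{p(z)} \int_0^{p(z)} m(u)\,du
     - \frac{1}{1 - p(z)} \int_{p(z)}^{1} m(u)\,du\right].
\end{align*}
```

The last line is the nonresponse bias in arm z: the share of nonparticipants times the gap between the average outcome of participants and that of nonparticipants. The data are informative about m(u) only up to p(z_high), so the integral over nonparticipants has to be bounded or restricted by assumption.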