J24 | Macro Paper Warehouse

Abundance from Abroad: Migrant Income and Long-Run Economic Development

Layer 1 — Overview

Research Question

This paper asks how persistent increases in international migrant income prospects affect long-run economic development in migrant-origin areas. The central question is whether Philippine provinces with persistent access to higher-income migration opportunities develop faster than provinces with less attractive migration opportunities, and through which channels.

Natural Experiment and Identification Strategy

The authors exploit the 1997 Asian Financial Crisis as a large-scale natural experiment. The crisis triggered sharp, heterogeneous, and persistent exchange rate changes across Philippine migrants’ destination countries, ranging from a 4% depreciation against the Philippine peso (Korea) to a 57% appreciation (Libya), with Japan and Saudi Arabia in between (appreciations of 32% and 52%, respectively). Provinces differed in how their pre-crisis migrant income was distributed across destinations, which the authors measure with rare POEA/OWWA administrative data covering every overseas worker contract (including migrant incomes, origins, and destinations). The exchange rate shocks therefore generated exogenous, province-level variation in a shift-share variable: the predicted change in province migrant income per capita due to the 1997 shocks. Identification follows the “exogenous shares” framework of Goldsmith-Pinkham et al. (2020). Pre-trend tests using up to 12 years of pre-shock panel data find no evidence of differential trends across provinces. The five destinations with the highest Rotemberg weights (Saudi Arabia, Japan, the United States, Taiwan, and Hong Kong) together account for 75% of the identifying variation. The exchange rate shocks remain highly persistent for two decades after 1997, and the exposure weights are substantially, though not fully, persistent.

Data

Philippine government administrative data (POEA/OWWA) on all overseas worker contracts, 1992–2015, with a 95% match rate, providing province-of-origin and destination-specific migrant income.
Philippine Family Income and Expenditure Survey (FIES), up to twelve triennial rounds from 1985–2018 (74 provinces, ~40,000 households per round), for domestic income and expenditure.
Six rounds of the Philippine Census of Population (1990–2015) for education, migration rates, and sectoral employment shares.
Province-level consumer price index data (1994–2017) and firm-level export survey data for robustness checks.
Unit of analysis: 74 Philippine provinces (consistent 1990 borders).

Main Findings with Quantitative Magnitudes

Six-fold magnification of migrant income: Each unit of initial short-run shock (1997–1998) to migrant income per capita is magnified more than six-fold by 2009–2015. A one-standard-deviation shock (0.093) raises long-run migrant income per capita by 14.7% of the baseline mean (PhP 601 per capita, 0.2 standard deviations).
Domestic income gains predominate: A one-standard-deviation shock raises domestic income per capita (excluding migrant income and remittances) by 6.4% of the baseline mean (PhP 1,676, 0.18 standard deviations). Remarkably, 73.6% of the long-run global income increase comes from domestic income and only 26.4% from migrant income.
Global income and expenditure: A one-standard-deviation shock raises global income per capita by PhP 2,277 (0.2 standard deviations, or 7.5% of the baseline mean) in 2009–2015. Expenditure per capita rises by PhP 1,159 (0.13 standard deviations). Effects emerge gradually over two decades.
Education: A one-standard-deviation shock increases the college-educated share of the population by 0.46–0.51 percentage points (0.11–0.12 standard deviations) and secondary completion by 0.63 percentage points. There is no significant effect on primary completion.
Migration rates and skill composition: A one-standard-deviation shock increases the migration rate by 0.19 percentage points (0.22 standard deviations), raises the share of skilled migrants by 1.84 percentage points (0.19 standard deviations), and increases average migrant annual salary by PhP 23,703 (0.16 standard deviations). New migration concentrates in higher-education-quartile occupations.
Structural change: The shock reduces primary sector employment shares by 1.2 percentage points per standard deviation (0.06 standard deviations), with over 70% of that shift absorbed by non-tradable goods and services sectors. Domestic income gains are driven almost entirely by non-agricultural income, and roughly 55% of the increase in entrepreneurial income is from service sectors.
Education’s contribution to income: Model-based calculations assign 19.6% of the global income gain, 17.8% of the migrant income gain, and 20.2% of the domestic income gain to educational investments. Exchange rate persistence plus altered migration flows explain an additional 64.6% of the migrant income increase, so together these mechanisms account for 82.3% of the six-fold magnification. A demand multiplier (assuming 64% of migrant income returns to origin economies and a multiplier of 2.9, consistent with estimates from the literature) accounts for approximately 83.3% of the non-education-related portion of the domestic income increase.
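The three education shares above are mutually consistent: education’s contribution to the global gain should equal its contributions to migrant and domestic income, weighted by each component’s share of the global gain (26.4% and 73.6%). The short Python check below uses only the reported figures; it is an arithmetic consistency check, not the paper’s model-based calculation.

```python
# Reported shares of the long-run global income gain
MIGRANT_SHARE = 0.264
DOMESTIC_SHARE = 0.736

# Reported education contributions (percent) to each component
EDU_MIGRANT = 17.8
EDU_DOMESTIC = 20.2

# Education's share of the global gain is the share-weighted average
edu_global = MIGRANT_SHARE * EDU_MIGRANT + DOMESTIC_SHARE * EDU_DOMESTIC
print(f"{edu_global:.1f}%")  # 19.6%
```

The weighted average rounds to the reported 19.6%.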

Threats to Identification Ruled Out

Import and export shift-share controls (constructed analogously using bilateral trade data and province-level industry employment shares) are uncorrelated with the migrant income shock conditional on baseline controls, and including them leaves coefficient estimates unchanged. Province-level manufactured exports, agricultural income, the CPI, and national-level FDI inflows show no statistically significant response to the shock. Net internal migration rates show no large or statistically significant response. Geographic spillover controls and tourism controls do not alter results. Placebo regressions in the pre-period yield small, statistically insignificant coefficients.

Scope Conditions

The paper studies formal, government-regulated temporary labor migration from the Philippines, where migrants sign contracts through POEA-licensed agencies and typically expect to return after one or more contracts. The findings apply specifically to settings where persistent (not transitory) migrant income shocks occur. Approximately 60% of contract migrants are female. The study period spans 1985–2018, with main long-run outcome analyses comparing 1994 (pre-shock) with 2009–2015 (post-shock).

Layer 2 — Q&A

Q1: What makes the 1997 Asian Financial Crisis useful as a natural experiment for this paper’s purposes?

A1: The crisis was largely unanticipated by policymakers, international organizations, and financial markets, making it implausible that pre-1997 migration destination choices reflected anticipation of the shocks. Exchange rate changes were heterogeneous across destinations (ranging from a 4% depreciation to a 57% appreciation), and crucially, these changes proved highly persistent over two decades — regression coefficients of long-run exchange rate changes on the initial 1997–1998 shock are close to and statistically indistinguishable from 1 in nearly all post-shock periods. Combined with the province-specific variation in migrant destination exposure, this generates persistent, exogenous, and heterogeneous shocks to migrant income prospects across provinces.

Q2: What is the shift-share variable, and how does it combine “shifts” and “shares”?

A2: The shift-share variable is $\text{Shiftshare}_o = \sum_d \omega_{do0} \, \Delta R_d$, where $\omega_{do0}$ is province $o$’s pre-shock migrant income per capita from destination $d$ (the “exposure weight” or “share”) and $\Delta R_d$ is the fractional change in destination $d$’s exchange rate from before to after the crisis (the “shift”). It captures the predicted change in province-level migrant income per capita due to the 1997 exchange rate shocks and is derived directly from a theoretical model of migration. Identification relies on the “exogenous shares” approach of Goldsmith-Pinkham et al. (2020): the pre-1997 exposure weights are treated as good as randomly assigned conditional on controls, because they reflect historical migration networks formed well before the crisis.
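The shift-share formula can be sketched in a few lines of Python. The destination shocks are the 1997–1998 exchange rate changes quoted in the overview (appreciation against the peso is positive); the two provinces and their exposure weights are hypothetical, chosen only to show how identical destination shocks yield different province-level values.

```python
# Minimal sketch of Shiftshare_o = sum over d of omega_do0 * dR_d.
# Destination shocks: fractional exchange rate changes vs. the peso.
SHOCKS = {"Korea": -0.04, "Japan": 0.32, "Saudi Arabia": 0.52, "Libya": 0.57}


def shift_share(weights: dict[str, float], shocks: dict[str, float]) -> float:
    """Predicted change in a province's migrant income per capita.

    weights maps destination -> pre-shock migrant income per capita
    (the exposure weight); shocks maps destination -> Delta R_d.
    """
    return sum(w * shocks[dest] for dest, w in weights.items())


# HYPOTHETICAL exposure weights (illustrative units) for two provinces
province_a = {"Saudi Arabia": 1.0, "Korea": 0.5}
province_b = {"Japan": 0.8, "Korea": 0.7}

print(round(shift_share(province_a, SHOCKS), 3))  # 0.5
print(round(shift_share(province_b, SHOCKS), 3))  # 0.228
```

Province A ends up with the larger value because more of its exposure sits in Saudi Arabia, whose currency appreciated most among its destinations.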

Q3: Why is the six-fold magnification of the initial migrant income shock so striking, and what does the structural model say about its sources?

A3: The coefficient on migrant income per capita (6.463 in Panel D of Table 1) implies that for each unit of initial short-run migrant income shock, migrant income per capita is more than six units higher in 2009–2015 — a far larger response than a one-for-one pass-through would predict. The structural model, which augments a Fréchet-based gravity model of migration with endogenous education investments, accounts for 82.3% of this magnification. Education investments explain 17.8% of the migrant income increase; persistent favorable exchange rates and resulting shifts in migration flows across destinations explain an additional 64.6%. The Fréchet elasticity of migration flows with respect to destination wages is estimated at θ = 3.42 via PPML, implying that even partial reorientation of migrants toward now-higher-wage destinations substantially raises aggregate migrant income.
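Two pieces of arithmetic link these numbers to the headline magnitudes. Multiplying the long-run coefficient by the one-standard-deviation shock (0.093) reproduces the PhP 601 effect on migrant income per capita, which suggests (an inference from the reported figures, not stated in the summary) that the shift-share is measured in thousands of pesos. The two model channels sum to 82.4%, which matches the reported 82.3% up to rounding.

```python
COEF_MIGRANT_INCOME = 6.463  # long-run coefficient, Panel D of Table 1
SD_SHOCK = 0.093             # one standard deviation of the shift-share

# One-SD effect in the shift-share's units (about PhP 601 if those are
# thousands of pesos, as the reported magnitudes imply)
sd_effect = COEF_MIGRANT_INCOME * SD_SHOCK
print(f"{sd_effect:.3f}")  # 0.601

# Model decomposition of the migrant income increase (percent)
EDUCATION = 17.8
EXCHANGE_RATES_AND_FLOWS = 64.6
print(f"{EDUCATION + EXCHANGE_RATES_AND_FLOWS:.1f}")  # 82.4
```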

Q4: What evidence supports the parallel trends assumption in the pre-shock period?

A4: The authors present event study diagrams (Figure 2) showing no differential positive pre-trends in either expenditure per capita or domestic income per capita prior to 1997 — for domestic income, there is a statistically insignificant negative trend from 1985–1991 and no trend in 1991–1994. Placebo regressions estimated on the pre-period only (1985, 1988, 1991 as “pre,” 1994 and 1997 as “post”) yield small, statistically insignificant coefficients on both domestic income and expenditure. Balance tests focusing on the five high-Rotemberg-weight destination shares (Saudi Arabia, Japan, US, Taiwan, Hong Kong) — which collectively account for 75% of the identifying variation — also show no significant pre-trends in key outcomes across provinces with varying levels of exposure.

Q5: How do the authors rule out trade flows as an alternative mechanism for the estimated income effects?

A5: They construct separate import and export shift-share variables, analogous to the “China shock” of Autor et al. (2013), using baseline bilateral trade values (from COMTRADE, disaggregated to 36 ISIC industries), province-level employment shares in import and export industries (from the 1990 Census), and the same destination exchange rate shocks. These trade shift-share variables are uncorrelated with the migrant income shock after conditioning on baseline controls (Appendix Table A5). Including them as additional controls in Panel D of all main regression tables leaves the migrant income coefficient stable. Further, province-level manufactured exports per capita show no large or statistically significant response to the migrant income shock, agricultural income similarly shows no significant response, and consumer price indices are unresponsive — ruling out import price changes as a confound. FDI inflows at the national level also show no significant relationship with destination-country exchange rate shocks.

Q6: What is the composition of the domestic income gains — where do they come from?

A6: Both wage income and entrepreneurial/rental income rise significantly and by similar magnitudes, while “other income” (pensions, interest, dividends) shows no robust increase (Table 4). Non-agricultural income drives virtually the entire domestic income gain; agricultural income per capita is statistically insignificant (Table 5, columns 1–2). Within entrepreneurial income, approximately 55% of the increase comes from service sectors, while entrepreneurial income from manufacturing and the primary sector shows no effect that is significant at the 10% level (Table 5, columns 3–5). These patterns are consistent with the structural change finding: the shock shifts labor from primary sectors toward non-tradable goods and services rather than toward tradable manufacturing.

Q7: What is the “global income” concept and what share does each component contribute?

A7: Global income per capita is defined as the sum of domestic income per capita (earned within the Philippine economy, excluding all international transfers) and migrant income per capita (the full income earned abroad by a province’s international migrants, calculated from contract data). Of the long-run global income increase, 73.6% comes from domestic income and 26.4% from migrant income. A one-standard-deviation shock raises global income by PhP 2,277 per capita in 2009–2015 (0.2 standard deviations, or 7.5% of the baseline mean).
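Because global income is the sum of its two components, the one-standard-deviation effects add directly, and the component shares follow from the PhP effects reported for domestic income (1,676) and migrant income (601). A minimal Python check:

```python
DOMESTIC_EFFECT = 1676  # PhP per capita, one-SD shock, 2009-2015
MIGRANT_EFFECT = 601    # PhP per capita, one-SD shock, 2009-2015

global_effect = DOMESTIC_EFFECT + MIGRANT_EFFECT
domestic_share = DOMESTIC_EFFECT / global_effect
migrant_share = MIGRANT_EFFECT / global_effect

print(global_effect)                                  # 2277
print(f"{domestic_share:.1%} / {migrant_share:.1%}")  # 73.6% / 26.4%
```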

Q8: How do education effects translate into more and higher-skilled migration?

A8: A one-standard-deviation migrant income shock increases college completion by 0.46 percentage points and secondary completion by 0.63 percentage points (with no significant effect on primary completion), consistent with the shock raising the return to higher education in the broader population. These better-educated workers then migrate at higher rates: the share of migrants who are skilled (college-educated) rises by 1.84 percentage points per standard deviation. Migration increases are concentrated in the two highest-education quartiles of occupations (engineers, medical professionals, teachers in the 4th quartile; caregivers, restaurant workers, performing artists in the 3rd quartile), with no significant effect in the two lowest quartiles. Average annual migrant salary rises by PhP 23,703 per standard deviation (0.16 standard deviations).

Q9: What mechanisms does the structural model invoke to explain the domestic income gains?

A9: The model treats domestic income changes as arising through at least two channels: (1) the education channel, to which the model assigns 20.2% of the domestic income increase (using the estimated college completion response of 0.046 per unit shock, baseline skill-migration probabilities, and baseline skill premia for domestic income); and (2) a demand multiplier operating on the portion of migrant income remitted to origin provinces, combined with capital accumulation from sustained migrant income flows. Assuming 64% of migrant income returns to origin economies (estimated indirectly from KNOMAD/ILO and Survey on Overseas Filipinos data) and a multiplier of 2.9 (consistent with estimates from Kenya and India), this demand-plus-investment channel can explain approximately 83.3% of the remaining (non-education-related) domestic income increase of PhP 14.4 per unit shock. Under baseline assumptions (α = 0.64, the share of migrant income returning to the origin economy), the stylized dynamic model generates PhP 18.88 of domestic income by 2015 from a PhP 1 initial shock, close to the empirical estimate of PhP 18.02.
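A back-of-the-envelope reconstruction of the 83.3% figure follows. It assumes the figure equals the returned share of the long-run migrant income response (6.463 per unit shock) times the multiplier, divided by the non-education part of the domestic response (18.02 per unit shock, net of the 20.2% education share). The paper’s exact calculation may differ; the sketch only shows that the reported numbers fit together.

```python
MIGRANT_RESPONSE = 6.463   # long-run migrant income per unit of initial shock
DOMESTIC_RESPONSE = 18.02  # long-run domestic income per unit of initial shock
EDUCATION_SHARE = 0.202    # share of the domestic gain attributed to education
ALPHA = 0.64               # share of migrant income returning to the origin
MULTIPLIER = 2.9           # assumed demand multiplier

non_education_gain = DOMESTIC_RESPONSE * (1 - EDUCATION_SHARE)  # about 14.4
demand_effect = ALPHA * MIGRANT_RESPONSE * MULTIPLIER          # about 12.0
explained = demand_effect / non_education_gain

print(f"{non_education_gain:.1f} {demand_effect:.1f} {explained:.1%}")
```

With these inputs the ratio comes out at about 83.4%. Dividing by the rounded PhP 14.4 instead gives the reported 83.3%.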

Q10: How do the authors assess SUTVA and internal migration?

A10: They test whether the migrant income shock affects net internal migration rates at the provincial level (Appendix Table A6) and find no large or statistically significant impact. There is a small negative effect on outmigration of young adults (aged 16–24) that the authors judge cannot account for the documented income impacts. The Philippines’ archipelago geography (over 7,000 islands) is noted as likely limiting inter-provincial economic spillovers; to the extent spillovers occur, they would be positive (demand spillovers from provinces experiencing income gains to neighboring provinces), making estimates conservative lower bounds. Direct tests controlling for the inverse-distance-weighted migrant income shock in neighboring provinces leave main estimates unchanged.

Q11: Are the exposure weights (migration shares) persistent, and does this support interpreting the shock as persistent?

A11: Yes. Regressions of dyadic migrant income per capita in post-shock years (2009, 2012, 2015) on dyadic migrant income per capita in 1995 yield coefficients ranging from 0.4 to 0.6, each statistically significantly different from zero (and from 1, indicating partial but substantial persistence). The exchange rate shocks ΔRd are even more persistent: regression coefficients on the initial 1997–1998 shock are close to 1 and statistically indistinguishable from 1 in nearly all post-shock periods (with the only exceptions in 2009–2012 during the Great Recession). Both components of the shift-share variable thus show persistence over two decades, supporting interpretation of the long-run effects as responses to a persistent (not transitory) income shock.

Q12: What are the policy implications and how do the authors connect findings to migration policy?

A12: The findings suggest migration policy should be an important part of the development policy toolkit. The results are directly relevant to origin-country policies facilitating formal, contract-based labor migration (e.g., regulation of recruitment agencies, educational investments to raise worker skills and competitiveness for overseas employment) and destination-country policies governing legal immigration opportunities. The authors also note implications for overseas development assistance: development agencies could consider supplementing traditional foreign aid with programs that facilitate international labor migration. The paper’s context — formal, government-regulated migration through POEA and OWWA — is described as highly policy-relevant, with 94% of developing countries with populations exceeding 1 million having a dedicated government migration agency and 78% having policies promoting migrant remittances.

Key Concepts

Shift-share variable ($\text{Shiftshare}_o$): The paper’s primary independent variable, $\text{Shiftshare}_o = \sum_d \omega_{do0} \, \Delta R_d$, i.e., the sum over all overseas destinations $d$ of the province’s pre-shock migrant income per capita from each destination (the exposure weight or “share”) multiplied by that destination’s exchange rate shock (the “shift”). It is the predicted change in province migrant income per capita due to the 1997 Asian Financial Crisis exchange rate shocks, and is derived directly from the theoretical model of migration (Equation A9). Identification treats the exposure weights as exogenous following the “exogenous shares” approach of Goldsmith-Pinkham et al. (2020).

Exposure weights ($\omega_{do0}$): Province $o$’s pre-shock aggregate migrant income per capita earned in destination $d$, calculated from administrative POEA/OWWA contract data for 1995. These serve as the “shares” in the shift-share and capture the extent to which a province’s residents are exposed to a given destination’s exchange rate shock. They reflect historically formed migration networks rather than anticipation of future shocks.

Global income per capita: The sum of domestic income per capita and migrant income per capita. Domestic income is household income earned within the Philippine economy (wages, entrepreneurial, and other sources), explicitly excluding all income from international sources including remittances. Migrant income is the full income earned abroad by all international migrants from the province, calculated from contract data (not remittances sent home). Global income thus captures the full resource gain available to a province from the combination of domestic production and international migration.

Magnification (of migrant income shock): The empirical finding that the long-run coefficient on migrant income per capita (6.463 in Panel D, Table 1) far exceeds 1 — meaning each unit of initial short-run shock becomes more than six units of migrant income per capita in 2009–2015. The paper decomposes this magnification into contributions from persistent exchange rates, educational investments raising skill levels and migration, and shifts in migration flows toward now-higher-wage destinations.

Brain gain: The paper’s term for the process by which improved migrant income prospects raise educational investments among the broader population (not just among migrants), leading to higher skill levels among non-migrants as well. The paper distinguishes this from “brain drain” (where migration of skilled workers reduces origin-area human capital) and provides evidence of a “virtuous cycle”: education raises migration rates and migrant skill levels, which in turn raises migrant and domestic incomes, potentially funding further education.

Rotemberg weights: Destination-level weights (following Goldsmith-Pinkham et al. 2020) characterizing which destination-specific exchange rate shocks drive the estimates most. Saudi Arabia (0.20), Japan (0.19), the United States (0.18), Taiwan (0.10), and Hong Kong (0.08) together account for 75% of the total Rotemberg weight. These weights determine which destination-specific exposure shares receive the most scrutiny in pre-trend and balance tests.
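For readers unfamiliar with the decomposition, here is a minimal NumPy sketch of Rotemberg weights in the Goldsmith-Pinkham et al. (2020) sense. It rests on two simplifying assumptions of ours: the shift-share B = Z g enters as the regressor itself, so each weight is alpha_k = g_k Z_k'B / B'B, and the only control is a constant. The paper’s weights also partial out its baseline controls, and the exposure matrix and shocks below are hypothetical.

```python
import numpy as np


def rotemberg_weights(shares: np.ndarray, shifts: np.ndarray) -> np.ndarray:
    """Rotemberg weight for each destination when B = Z @ g is the regressor.

    shares: (n_provinces, n_destinations) exposure matrix Z
    shifts: (n_destinations,) destination shocks g
    """
    Z = shares - shares.mean(axis=0)  # partial out the constant
    B = Z @ shifts                    # demeaned shift-share
    # alpha_k = g_k * Z_k'B / B'B; the weights sum to one by construction
    return shifts * (Z.T @ B) / (B @ B)


rng = np.random.default_rng(0)
Z = rng.exponential(size=(74, 5))               # hypothetical exposure weights
g = np.array([0.52, 0.32, 0.30, 0.10, -0.04])   # hypothetical shocks
alpha = rotemberg_weights(Z, g)
print(alpha.round(3), alpha.sum())
```

Individual weights can be negative; only their sum is pinned down, which is why the paper reports the destinations carrying the largest weights.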

Fréchet elasticity (θ): The elasticity of migration flows from an origin province to a destination with respect to destination wages (in Philippine pesos), estimated at 3.42 via PPML using the exchange rate shocks. This parameter governs how much migration flows — and thereby migrant income — respond to the persistent exchange rate changes, and is central to the model’s decomposition of the six-fold magnification of migrant income effects.
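To see why θ matters, assume the standard Fréchet gravity form in which the flow from a province to destination d is proportional to (peso wage at d)^θ, divided by a sum over all destinations. This is an assumption about functional form; any migration-cost terms in the paper’s model cancel here if they are unchanged. A shock that multiplies peso wages at d by (1 + ΔR_d) then changes relative flows between destinations j and k by ((1 + ΔR_j)/(1 + ΔR_k))^θ. The sketch below uses θ = 3.42 and the Japan and Korea shocks quoted in the overview.

```python
THETA = 3.42  # Fréchet elasticity reported in the paper


def relative_flow_change(shock_j: float, shock_k: float, theta: float = THETA) -> float:
    """Factor by which flows to j rise relative to flows to k.

    Assumes flows are proportional to (peso wage)^theta and that everything
    except the exchange rate shock is held fixed.
    """
    return ((1 + shock_j) / (1 + shock_k)) ** theta


# Japan (+32%) versus Korea (-4%)
print(round(relative_flow_change(0.32, -0.04), 2))  # 2.97
```

Under these assumptions, flows to Japan roughly triple relative to flows to Korea, which illustrates how a moderate elasticity turns persistent exchange rate gaps into large reorientations of migration.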

Domestic income multiplier: The ratio of long-run domestic income increase to the portion of the migrant income shock that returns to origin provinces. Assuming 64% of migrant income returns to origin economies (estimated from multiple administrative data sources), the implicit demand multiplier in the paper’s context ranges from about 2.9 to 3.4, consistent with multipliers found in related literature on cash transfers and credit supply shocks in low-income settings.

Defying Distance? The Provision of Medical Services in the Digital Age

This paper asks whether digital platforms can improve healthcare outcomes by enabling needs-based matching between patients and physicians unconstrained by geography. Amanda Dahlstrand studies digital primary care in Sweden during 2016-2018, exploiting nationwide conditional random assignment between approximately 200,000 patients and 143 doctors employed by Europe’s largest digital primary care provider. Patients who selected the “first available doctor” option (82% of first visits) were effectively randomized to a doctor within each 3-hour shift-by-date stratum, generating quasi-experimental variation free of the patient-doctor sorting that confounds identification in physical primary care.

The paper defines three observable dimensions of primary care physician skill: (1) identifying risky patients and triaging them to higher levels of care, measured by whether patients subsequently have an avoidable hospitalization within 90 days; (2) providing guideline-consistent treatment, measured by counter-guideline antibiotic prescriptions; and (3) leaving patients sufficiently informed so they do not unnecessarily seek additional in-person care within the following week. Doctor skill in each dimension is estimated via a value-added framework in a hold-out sample (Sample 1, the first 600 randomized consultations per doctor), using empirical Bayes shrinkage to reduce noise. Complementarities between doctor skill and patient risk are then estimated in a disjoint main sample (Sample 2).
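Empirical Bayes shrinkage pulls noisy doctor-level estimates toward the mean in proportion to their unreliability. The sketch below uses a common textbook variant, in which signal variance is the variance of the raw estimates minus the mean sampling variance. It is not the paper’s exact random-effects estimator, and the inputs are hypothetical.

```python
import numpy as np


def eb_shrink(estimates, std_errors):
    """Shrink raw value-added estimates toward their grand mean.

    Each estimate keeps a fraction signal_var / (signal_var + se^2) of its
    deviation from the mean, so noisier estimates are shrunk more.
    Assumes every standard error is strictly positive.
    """
    est = np.asarray(estimates, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    mu = est.mean()
    # Variance of true skill = total variance minus average noise variance
    signal_var = max(est.var(ddof=1) - se2.mean(), 0.0)
    reliability = signal_var / (signal_var + se2)
    return mu + reliability * (est - mu)


raw = [0.10, -0.20, 0.05, 0.30]   # hypothetical doctor effects
ses = [0.05, 0.10, 0.20, 0.05]    # hypothetical standard errors
print(eb_shrink(raw, ses).round(3))
```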

A central finding is that doctor skill is task-specific rather than governed by a single latent ability: skills across the three tasks are not positively correlated, meaning doctors within general practice have individual “specializations.” A patient ranked in the top 1% of avoidable hospitalization risk who is matched to a doctor ranked in the top 10% at reducing avoidable hospitalizations experiences a 90% reduction in that adverse outcome, relative to a patient with the same risk profile matched to the worst-performing doctor. Patients not estimated as risky show effects indistinguishable from zero when matched to the same high-skilled doctors, establishing a strong complementarity between doctor type and patient risk.

Using the Average Match Function framework of Graham, Imbens, and Ridder (2014, 2020), the paper evaluates counterfactual reallocation policies. Reallocating only 2% of patients — those in the top 1% of predicted avoidable hospitalization risk — to doctors in the top 10% of triage skill reduces aggregate avoidable hospitalizations by 20% relative to random assignment, without adversely affecting counter-guideline prescriptions or other measured outcomes. Doctor skills across outcomes are not positively correlated, so this reallocation does not generate meaningful trade-offs. The paper benchmarks this matching policy against a selective hiring/expansion policy in which doctors with above-median skill in three tasks expand their hours by up to 70% at the expense of below-median peers; that policy yields no significant reduction in avoidable hospitalizations and only a 4% reduction in counter-guideline prescriptions — smaller gains than matching and harder to implement.
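The logic of this comparison can be illustrated with a toy expected-value calculation: compute the population rate of avoidable hospitalizations under random assignment and under a rule that sends the riskiest 1% of patients to top-decile triage doctors. The outcome model and every parameter below are hypothetical, chosen only to build in the complementarity the paper reports (skill matters for risky patients, not for others). Capacity constraints are ignored.

```python
import numpy as np

SHARE_RISKY = 0.01                      # hypothetical share of risky patients
skills = np.linspace(0.0, 1.0, 143)     # hypothetical triage skill, 1 = best
top_decile = skills[skills >= np.quantile(skills, 0.9)]


def hosp_prob(risky: bool, skill: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL avoidable-hospitalization probability by patient type."""
    if risky:
        return 0.10 * (1 - 0.9 * skill)  # skill strongly helps risky patients
    return np.full_like(skill, 0.001)    # no skill effect for other patients


def average_rate(pool_for_risky: np.ndarray) -> float:
    """Population rate when risky patients draw doctors from pool_for_risky
    and everyone else draws uniformly from all doctors."""
    risky = hosp_prob(True, pool_for_risky).mean()
    other = hosp_prob(False, skills).mean()
    return SHARE_RISKY * risky + (1 - SHARE_RISKY) * other


random_rate = average_rate(skills)
matched_rate = average_rate(top_decile)
print(f"reduction vs random: {1 - matched_rate / random_rate:.1%}")
```

The size of the toy reduction is an artifact of the invented parameters. The point is only that when skill matters mainly for a small risky group, reallocating that group captures most of the available gain.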

The paper also documents that physical primary care quality is worse in lower-income and more deprived areas of Sweden (a negative relationship between deprivation index and patient-reported experience is statistically significant at the 1% level in a cross-section of roughly 120-150 primary care centers in Region Skane). Because the estimated risk of avoidable hospitalization and prior avoidable hospitalizations are concentrated in the lower end of the income distribution, needs-based digital matching reallocates triage skill toward lower-income patients, severing the correlation between local area income and service quality. Simulating positive assortative matching on patient income and doctor skill (approximating existing healthcare inequalities) leads to more avoidable hospitalizations than random assignment, because the most vulnerable patients tend to be the poorest.

Scope conditions: findings derive from a single digital primary care provider in Sweden, 2016-2018, pre-pandemic, covering conditions amenable to video consultation and a patient pool younger and somewhat more urban than the average Swedish citizen.

Q: What is the key identification strategy, and why is it valid in this setting but not in physical primary care? A: Patients who selected the “drop in” (first available doctor) option, 82% of first visits, were assigned to whichever certified doctor was next in the roster within a 3-hour shift-by-date stratum, a by-product of the first-come-first-served queue. Neither patients nor doctors could intervene in this digital process. The author validates the assumption by regressing doctor characteristics on patient characteristics, controlling for shift-by-date fixed effects, and finds that characteristics are balanced. In physical primary care, endemic patient-doctor sorting means that doctors do not see patients drawn from a common support of patient types, which prevents causal identification of doctor effects.
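A minimal sketch of such a balance check relies on the Frisch-Waugh-Lovell result: a regression with shift-by-date fixed effects gives the same slope as a regression on within-stratum deviations. The simulated data are hypothetical. Doctor experience and patient age both differ across strata but are independent within them, as conditional random assignment implies.

```python
import numpy as np


def within_stratum_slope(y, x, strata):
    """OLS slope of y on x with stratum fixed effects (via demeaning)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    strata = np.asarray(strata)
    y_dm, x_dm = y.copy(), x.copy()
    for s in np.unique(strata):
        in_s = strata == s
        y_dm[in_s] -= y[in_s].mean()
        x_dm[in_s] -= x[in_s].mean()
    return (x_dm @ y_dm) / (x_dm @ x_dm)


rng = np.random.default_rng(1)
n = 20_000
strata = rng.integers(0, 50, n)          # hypothetical shift-by-date cells
level = strata / 10                      # cell-level differences
patient_age = 30 + 5 * level + rng.normal(0, 5, n)
doctor_exp = 10 + 2 * level + rng.normal(0, 3, n)

raw = within_stratum_slope(doctor_exp, patient_age, np.zeros(n))  # no FE
fe = within_stratum_slope(doctor_exp, patient_age, strata)
print(f"raw slope {raw:.3f}, with fixed effects {fe:.3f}")
```

Without fixed effects the slope is clearly positive because both variables vary across cells. Within cells it is close to zero, which is the pattern a balance test looks for.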

Q: How are doctor skill estimates constructed and why does the split-sample matter? A: Doctor skill in each task is estimated as an empirical Bayes-shrunk random effect from a value-added regression on Sample 1, each doctor’s first 600 randomized consultations (40% of the sample). Sample 2 (60%) is entirely disjoint and used to estimate complementarities between doctor skill and patient risk. The split-sample design prevents overfitting: doctor skill was estimated on different patients than those in Sample 2. The Durbin-Wu-Hausman test does not reject random effects (p = 0.16).
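The split can be sketched as follows. The input format, (doctor_id, timestamp, patient_id) records for randomized first consultations, is hypothetical, and “first” is assumed to mean chronologically first. Each doctor’s earliest consultations up to the hold-out size go to Sample 1 and the rest go to Sample 2, so no consultation appears in both.

```python
from collections import defaultdict


def split_samples(consultations, holdout_size=600):
    """Split randomized first consultations into Sample 1 and Sample 2.

    consultations: iterable of (doctor_id, timestamp, patient_id) tuples.
    Sample 1 holds each doctor's first `holdout_size` consultations (used to
    estimate skill); Sample 2 holds the remainder (used for complementarities).
    """
    by_doctor = defaultdict(list)
    for doctor, ts, patient in consultations:
        by_doctor[doctor].append((ts, patient))
    sample1, sample2 = [], []
    for doctor, visits in by_doctor.items():
        visits.sort()  # chronological order within each doctor
        for i, (ts, patient) in enumerate(visits):
            target = sample1 if i < holdout_size else sample2
            target.append((doctor, ts, patient))
    return sample1, sample2


toy = [("d1", 3, "p1"), ("d1", 1, "p2"), ("d1", 2, "p3"),
       ("d2", 5, "p4"), ("d2", 4, "p5")]
s1, s2 = split_samples(toy, holdout_size=2)
print(s1)
print(s2)
```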

Q: What is the main quantitative result on avoidable hospitalization matching? A: A patient ranked in the top 1% of predicted avoidable hospitalization risk matched to a doctor ranked in the top 10% at reducing avoidable hospitalizations could reduce that patient’s avoidable hospitalizations by 90%, relative to the worst-performing doctor in that skill. At the aggregate level, reallocating only 2% of patients (those in the top 1% risk group) to high-triage-skill doctors reduces avoidable hospitalizations across the full patient population by 20% compared to random assignment.

Q: Does the avoidable hospitalization reallocation harm other outcomes? A: No. The paper explicitly evaluates the Average Reallocation Effect on counter-guideline prescriptions and additional in-person care seeking when optimizing for avoidable hospitalizations, and finds no significant adverse effects on these other outcomes. The author attributes this to the fact that doctor skills across tasks are not positively correlated, so reallocating triage-skilled doctors does not systematically remove skill from other dimensions.

Q: How does matching compare to selective hiring and hour expansion as a policy? A: Even expanding the working hours of doctors with above-median skill across three tasks by as much as 70% yields no significant reduction in avoidable hospitalizations and only a 4% reduction in counter-guideline prescriptions — both smaller gains than the matching policy. Matching outperforms hiring expansion because patients have heterogeneous needs that can be identified from prior healthcare records, and doctors have differentiated skill sets relevant to some patients but not others.

Q: What is the evidence that doctor skills are task-specific rather than reflecting a single latent ability? A: The estimated doctor effects across the three tasks — triaging to avoid hospitalizations, guideline-consistent antibiotic prescribing, and minimizing unnecessary follow-up care — are not positively correlated with one another. This means a doctor who is effective at one task is not systematically effective at others, indicating individual specializations within general practice that are not accounted for in standard primary care organization.

Q: How is patient risk for avoidable hospitalizations measured? A: A propensity score is estimated from pre-digital physical healthcare data (2013-2015), regressing past number of avoidable hospitalizations on demographic and healthcare utilization variables — including age, a disease index of chronic diagnoses, and previous hospitalizations — all variables already available in patient medical records. The top 1% of predicted risk scores are classified as “risky.” Patients in the risky group had on average 0.35 avoidable hospitalizations in the prior 3 years, versus 0.01 for non-risky patients.
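This two-step procedure can be sketched under simple assumptions of our own: ordinary least squares for the prediction step and a cutoff at the 99th percentile of predicted values. The covariates and simulated data are hypothetical.

```python
import numpy as np


def flag_top_risk(covariates, past_hosp, top_share=0.01):
    """Predict past avoidable hospitalizations by OLS and flag the top tail.

    covariates: (n, k) array, e.g. age, disease index, prior hospitalizations
    past_hosp: (n,) count of avoidable hospitalizations in the prior window
    Returns a boolean array marking the top `top_share` of predicted risk.
    """
    X = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(X, past_hosp, rcond=None)
    score = X @ beta
    return score >= np.quantile(score, 1 - top_share)


rng = np.random.default_rng(2)
n = 10_000
age = rng.uniform(18, 90, n)
disease_index = rng.poisson(1.0, n)
prior_hosp = rng.poisson(0.2, n)
past = rng.poisson(0.001 * age + 0.01 * disease_index + 0.05 * prior_hosp)
risky = flag_top_risk(np.column_stack([age, disease_index, prior_hosp]), past)
print(risky.mean())
```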

Q: What is the distributional (equity) implication of needs-based matching versus income-assortative matching? A: Estimated risk of avoidable hospitalization and the count of prior avoidable hospitalizations are concentrated in the lower end of the income distribution. Needs-based matching therefore reallocates triage skill toward lower-income patients. Simulating positive assortative matching on patient income and doctor skill — approximating observed inequalities in physical care — produces more avoidable hospitalizations than random assignment, because the most vulnerable patients are often the poorest. Needs-based digital matching can sever the link between local area income and service quality.

Q: How does digital care usage sort by income and demographics in the data? A: At the extensive margin, the deprivation index (Care Need Index) is similar among digital users and non-users in Region Skane. However, at the intensive margin, individuals with a higher deprivation index who use the digital service have more appointments in it; similarly, lower-income users use the service more intensively. Digital care users are younger than non-users and are more likely to live in cities than the average Swedish citizen.

Q: What are avoidable hospitalizations and why are they the primary outcome? A: Avoidable hospitalizations (also called hospitalizations for ambulatory care sensitive conditions) are hospital admissions defined in the medical literature as preventable by adequate and timely primary care. They are coded using ICD-10 diagnosis codes listed in Page et al. (2007). The most common diagnoses in the 90-day post-consultation window are respiratory and genitourinary, conditions commonly treated in digital care. The outcome is rare (0.2% of patients in the sample), but high-stakes: an estimated 1.1 potential life years are lost per avoidable hospitalization, and in Sweden they cost an estimated SEK 7.1 billion (~$820 million) annually (7% of inpatient curative and rehabilitative care costs).

Q: What is the scope of the counter-guideline antibiotic prescription outcome? A: Non-adherence is coded against 16 guidelines from Sweden’s strategic programme against antibiotic resistance (Strama 2017, 2019), all designed to limit or narrow antibiotic use. The measured rate of non-adherence is described as quite low by international standards; the CDC estimates 28% of US antibiotic prescriptions are unnecessary, while the author’s sample rate is 2%. The guidelines require doctors to sometimes refuse patients who request antibiotics, introducing a behavioral compliance dimension to this skill.

Q: What are the costs and feasibility considerations for implementing needs-based digital matching? A: The paper characterizes matching as a “resource-neutral” policy because it reallocates existing doctors without hiring or training. The primary costs are a small increase in waiting time for some patients and the costs of importing data and developing the matching algorithm. Because the algorithm handles patient-doctor allocation while doctors retain all clinical decision-making, the policy functions as a complement to human skill rather than a substitute, which the author argues makes it less subject to “algorithm aversion.”

Q: Why does the paper restrict to each patient’s first digital consultation only? A: The first visit is the one subject to conditional random assignment; subsequent visits could reflect endogenous selection by patients who preferred a particular doctor or outcome. Using only first visits eliminates this concern. The restriction reduces the sample from approximately 378,000 to 210,171 patients (56% of the original), paired with 143 doctors who each had at least 600 randomized consultations.

Conditional random assignment: The allocation mechanism by which patients selecting the “first available doctor” option in digital primary care were assigned to whichever certified doctor was next in the shift roster, conditional on 3-hour shift-by-date strata — a by-product of the first-come-first-served queue rather than an intended experimental design.

Average Match Function (AMF): The conditional mean of a patient outcome given observable doctor type and patient type under random assignment, β(x,w) = E[Y|X=x, W=w], which serves as the building block for evaluating counterfactual reallocation policies.

Average Reallocation Effect (ARE): The difference in expected patient outcomes between a counterfactual doctor-patient assignment and the status quo random assignment, taking into account the externality on the patient from whom a high-skilled doctor is moved.

Task-specific doctor skill: The paper’s finding that primary care physician effectiveness is not governed by a single latent ability but varies across distinct tasks — triage/risk prediction, guideline-consistent prescribing, and minimizing unnecessary follow-up care — with skills across tasks not positively correlated.

Avoidable hospitalization: A hospital admission coded to a diagnosis (per Page et al. 2007 ICD-10 classification) defined in the medical literature as preventable by adequate and timely primary care, used as the primary high-stakes outcome measure (0.2% incidence in the sample within 90 days of a digital consultation).

Counter-guideline prescription: A prescription of an antibiotic in violation of one of 16 guidelines from Sweden’s Strama antibiotic resistance programme, all of which are designed to limit use or require narrower-spectrum first-line antibiotics; used as the primary guideline-adherence outcome (2% incidence in the sample).

Empirical Bayes shrinkage: A procedure applied to raw doctor value-added estimates in which the noisy estimate of doctor quality is multiplied by the ratio of signal variance to total (signal plus noise) variance, yielding a best linear predictor of the underlying doctor random effect and reducing noise from small-sample estimation.
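Below is a minimal sketch of the shrinkage step, assuming raw doctor effects and their standard errors are already in hand. Two choices are assumptions for illustration: the method-of-moments estimate of the signal variance, and shrinking toward the grand mean of the raw estimates. The sketch does not reproduce the paper's exact procedure.

```python
import numpy as np


def eb_shrink(raw, se):
    """Empirical Bayes shrinkage of noisy doctor value-added estimates.

    raw: raw doctor effects; se: their standard errors (must be positive).
    Signal variance is estimated by the method of moments: variance of the raw
    estimates minus the average sampling variance, floored at zero.
    """
    raw = np.asarray(raw, dtype=float)
    se = np.asarray(se, dtype=float)
    if np.any(se <= 0):
        raise ValueError("standard errors must be positive")
    center = raw.mean()  # shrink toward the grand mean of the raw estimates
    signal_var = max(raw.var() - np.mean(se ** 2), 0.0)
    reliability = signal_var / (signal_var + se ** 2)  # signal / (signal + noise), per doctor
    return center + reliability * (raw - center)


print(eb_shrink([-2.0, 0.0, 2.0], [1.0, 1.0, 1.0]))
```

Noisier estimates (larger standard errors, typically from fewer consultations) get a smaller reliability weight. Those doctors are pulled further toward the mean.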

Do Financial Concerns Make Workers Less Productive?

Mon, 01 Jan 0001 00:00:00 +0000

Research Question

The paper tests whether financial concerns distract workers sufficiently to meaningfully reduce their productivity, and whether receiving cash — by alleviating those concerns — can raise output even when total compensation is held fixed.

Setting and Sample

The experiment involves 408 low-income male agricultural casual laborers in rural Odisha, India, recruited from 47 villages across five worksites in four districts. The study takes place during the lean agricultural season (March–June 2017 and 2018), when formal employment is scarce (workers found paid wage work on only 1.9 days per week on average). During this period, 86% of workers reported being “worried” or “very worried” about their finances, 68–71% carried outstanding loans, and 64–66% said they would have difficulty coming up with Rs. 1,000 (roughly four days of wages) in an emergency. Workers bring these burdens to the job: on a given day, approximately one in two workers reported thinking about financial worries while working.

Experimental Design

Workers were employed for twelve days in a piece-rate manufacturing task — stitching sal tree leaves into disposable plates for restaurants. The payment-timing manipulation is the core of the identification strategy. Control workers received all accrued earnings as a lump sum on the final day (day 12). Treatment workers received their earnings in two installments: an interim payment of earnings to date on day 8 or 9 (randomly staggered across waves), with the balance paid on day 12. Total compensation was held constant across groups; only the timing of receipt differed. On day 5 (the “announcement day”), each worker learned his payment schedule individually. The design thus separates the announcement period (days 5 through the interim payment day, when workers know their schedule but have not yet received cash) from the post-pay period (days after the interim payment until the contract end). This enables the authors to test whether productivity effects arise from information about impending cash, or only once cash is physically in hand.
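A small helper that labels each contract day can summarize the day-by-day structure of the design. This is a sketch under stated assumptions. The function name and the "pre-announcement" label for days 1 to 4 are illustrative. It also assumes the interim payment is disbursed in the evening of day 8 or 9, consistent with repayments occurring the same evening, so post-pay days begin the following morning.

```python
def study_period(day, interim_pay_day):
    """Classify a contract day (1-12) relative to an interim payment day (8 or 9).

    Assumes the interim payment is disbursed in the evening of interim_pay_day,
    so the first post-pay workday is the following day.
    """
    if not 1 <= day <= 12:
        raise ValueError("contract days run from 1 to 12")
    if interim_pay_day not in (8, 9):
        raise ValueError("interim payment falls on day 8 or 9")
    if day < 5:
        return "pre-announcement"   # schedule not yet revealed
    if day <= interim_pay_day:
        return "announcement"       # schedule known, cash not yet received
    return "post-pay"               # treatment workers hold interim cash


for wave, pay_day in (("A", 8), ("B", 9)):
    periods = [study_period(d, pay_day) for d in range(1, 13)]
    print(wave, {p: periods.count(p) for p in dict.fromkeys(periods)})
```

Under these assumptions, a day-8 payment leaves four post-pay days and a day-9 payment leaves three.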

First Stage: Effects on Financial Strain

Within three days of receiving the interim payment, treated workers increased loan repayments by Rs. 271, a 287% increase relative to the control group mean (p < 0.001), and were 40 percentage points (222%) more likely to repay any loan (p < 0.001). The majority of repayments occurred on the same evening as the cash disbursement — a 746% single-day increase in loan payments. Household expenditures on food, clothing, and essentials rose by 40% (Rs. 150) over three days (p < 0.001). Treatment workers also reported feeling more focused on the work task (11.5 percentage points more likely, p = 0.032) and were less likely to report thinking about financial worries while making plates (13.7 percentage points, p = 0.044).

Main Productivity Results

In the post-pay period, treated workers increased output by 0.109 SD (6.9%) relative to the control group (p = 0.020). No treatment effect emerged during the announcement period (0.014 SD, p = 0.685); the post-pay and announcement-period effects are statistically distinguishable (p = 0.008). Because work hours are fixed and daily attendance is 98.3% with no treatment effect on attendance, these gains reflect improvements in how quickly workers produce plates per hour of work.

Effects are concentrated among workers with below-median baseline wealth (fewer assets, less liquidity): for this subgroup, the interim payment increases output by 0.204 SD (13.0%, p = 0.003). For workers with above-median wealth, the effect is close to zero and statistically insignificant (p = 0.819).

Attentiveness Results

Beyond total output, the authors measure attentiveness through three markers embedded in the finished plates: the number of “double holes” (paired stitching holes indicating a removed mistaken stitch), the number of leaves used, and the number of stitches used. These measures are collected unbeknownst to workers and combined into an “attentiveness index.” After receiving the interim payment, treated workers’ attentiveness index increased by 0.077 SD across all workers (p = 0.092); among poorer workers, attentiveness increased by 0.17 SD (p = 0.041). This improvement occurred simultaneously with higher output speed — workers were producing plates faster while also making fewer mistakes, suggesting improved cognitive engagement rather than mere effort intensification.

Piece-Rate Comparison

In separate supplementary rounds with 150 experienced workers, the authors varied piece rates (Rs. 2, 3, or 4) while holding overall earnings constant. Each one-rupee increase in the piece rate raised output by 0.020 SD (p = 0.042). Critically, piece-rate increases produced no detectable change in the attentiveness index (point estimate negative, statistically insignificant), and the piece-rate effect on output differs significantly from the attentiveness effect (p = 0.001). This indicates that conscious effort and automatic attentiveness can move independently: higher incentives increase pace but do not reduce attentional lapses, whereas financial relief increases both pace and attentiveness.

Alternative Explanations Ruled Out

The authors systematically address gift exchange/fairness, trust, nutrition, and sleep. Fairness and gift-exchange stories are inconsistent with: (i) no detectable announcement-period effect; (ii) no decline in control-worker effort when treatment workers are paid before them; (iii) the concentration of effects among poorer workers; and (iv) attentiveness improving even though it is not a quality dimension on which pay depends. Nutritional channels are inconsistent with the overnight onset of effects (nutritional stock changes too slowly for that), the absence of a treatment effect on breakfast consumption, and productivity effects persisting through the end of each workday. Sleep channels are inconsistent with the absence of any treatment effect on hours or quality of sleep.

Scope Conditions and Implications

The effect operates through the actual arrival of cash, not its anticipation, consistent with a model in which automatic cognitive inputs — unlike consciously chosen effort — respond to current financial strain rather than expected future income. Effects are concentrated among more financially constrained workers within an already-poor sample. The authors do not identify the specific psychological mechanism (worry, anxiety, affect, or rumination) but interpret results as evidence that financial strain, at least partly through psychological channels, reduces earnings exactly when money is most needed.
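A toy numerical version of the contrast between effortful and automatic inputs follows. The functional forms are hypothetical choices made for illustration: linear output, a quadratic effort cost, and strain measured as unmet current needs. The parameters A_MAX and K are also hypothetical. None of this is the authors' specification; the sketch only reproduces the qualitative features described in the text.

```python
from dataclasses import dataclass, replace

A_MAX = 10.0   # hypothetical ceiling on the automatic input
K = 0.005      # hypothetical loss of automatic input per rupee of unmet needs


@dataclass(frozen=True)
class Worker:
    piece_rate: float             # Rs. per plate
    cash_on_hand: float           # Rs. currently held
    pressing_needs: float         # Rs. of debts and essentials due now
    expected_income: float = 0.0  # announced future pay; deliberately unused below


def strain(w):
    """Present-oriented strain: unmet current needs. Expected income is ignored."""
    return max(w.pressing_needs - w.cash_on_hand, 0.0)


def effort(w):
    """Effortful input e maximizes r*e - e**2 / 2, so the optimum is e = r."""
    return w.piece_rate


def automatic_input(w):
    """Automatic input a falls with strain and does not depend on chosen effort."""
    return max(A_MAX - K * strain(w), 0.0)


def output(w):
    return effort(w) + automatic_input(w)


base = Worker(piece_rate=3.0, cash_on_hand=200.0, pressing_needs=1600.0)
scenarios = {
    "baseline": base,
    "piece rate +1": replace(base, piece_rate=4.0),
    "payment announced": replace(base, expected_income=1400.0),
    "interim cash paid": replace(base, cash_on_hand=1600.0),
}
for label, w in scenarios.items():
    print(f"{label}: effort={effort(w):.1f}, automatic={automatic_input(w):.1f}, output={output(w):.1f}")
```

In this toy model, raising the piece rate moves effort but leaves the automatic input unchanged. Announcing future income changes nothing, because strain ignores expected income. Paying cash raises the automatic input and leaves effort unchanged.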

Q&A

Q1: Why does the experiment focus on payment timing rather than an outright transfer of additional money? Varying only payment timing — not total pay — holds constant both the piece-rate incentive and total wealth across treatment and control. An outright cash transfer would raise total lifetime income, potentially reducing effort through a neoclassical income effect (more lifetime wealth lowers the marginal utility of current consumption). By holding total compensation fixed and only shifting when it arrives, the design isolates the effect of financial strain per se, separable from any wealth or incentive effect.

Q2: Why is there no treatment effect during the announcement period, and why does this matter? Between day 5 (when workers learn their payment schedule) and the interim payment date, treated workers know cash is coming but have not yet received it. Output in this window shows no treatment effect (0.014 SD, p = 0.685), and the announcement effect is significantly smaller than the post-pay effect (p = 0.008). This matters because it rules out mechanisms that should operate on information alone — including gift exchange, trust updating, or effort responses to higher discounted expected income — and is consistent with a model in which financial strain falls only when cash is physically received (e.g., moneylenders do not relent until the loan is actually repaid).

Q3: What is the attentiveness index and how was it constructed? The attentiveness index averages three plate-level markers: (i) number of “double holes” — pairs of stitching holes indicating a mistaken stitch was removed; (ii) number of leaves used; and (iii) number of stitches used. Each component was normalized using the control group’s post-pay mean and standard deviation, then averaged and reverse-coded so that higher values denote better attentiveness (fewer mistakes, fewer leaves, fewer stitches). Workers were unaware these dimensions were being measured. The index thus captures the number of unforced steps a worker took to complete a plate — a behavioral trace of cognitive lapses.
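The construction described in this answer can be sketched as follows. Several details are assumptions, because the text does not specify the level of aggregation or the degrees-of-freedom convention: the component names, the dictionary-of-arrays input layout, and the use of the population standard deviation.

```python
import numpy as np

COMPONENTS = ("double_holes", "leaves", "stitches")


def attentiveness_index(plates, control_post_pay):
    """Attentiveness index for a set of observations.

    plates / control_post_pay: dicts mapping each component name to an array of
    values. Each component is z-scored against the control group's post-pay mean
    and SD, the z-scores are averaged, and the sign is flipped so that fewer
    double holes, leaves and stitches mean a higher index.
    """
    z_scores = []
    for name in COMPONENTS:
        ref = np.asarray(control_post_pay[name], dtype=float)
        values = np.asarray(plates[name], dtype=float)
        z_scores.append((values - ref.mean()) / ref.std())  # population SD (an assumption)
    return -np.mean(z_scores, axis=0)


control = {"double_holes": [1, 3], "leaves": [4, 6], "stitches": [10, 14]}
sample = {"double_holes": [1, 2], "leaves": [4, 5], "stitches": [10, 12]}
print(attentiveness_index(sample, control))
```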

Q4: How do the piece-rate rounds demonstrate that effort and attentiveness are separable? In supplementary rounds (150 workers, 2019), piece rates were experimentally varied among Rs. 2, 3, and 4 per plate with the base wage adjusted to hold total earnings constant, so financial strain was unchanged. A one-rupee increase in the piece rate raises output by 0.020 SD (p = 0.042), consistent with a standard effort response. The same increase produces no discernible change in the attentiveness index (point estimate: negative but not significant), and the output and attentiveness effects are significantly different from each other (p = 0.001). This shows that workers can speed up via conscious effort without reducing attentional lapses, whereas the cash infusion raises both pace and attentiveness simultaneously — a pattern inconsistent with pure motivation as the mechanism.

Q5: What does the staggered timing within the treatment group (Wave A vs. Wave B) contribute to identification? Treatment workers were randomized to receive their interim payment on day 8 (Wave A) or day 9 (Wave B). On day 9, Wave B workers have not yet been paid while Wave A workers have. If fairness concerns drove control workers to reduce effort upon seeing colleagues paid first, control workers on day 9 — having observed Wave A payments the evening before — should work less hard relative to Wave B treatment workers (who have also not yet been paid). The authors find no such pattern: the triple interaction (Cash × Payment Day × Wave B) is close to zero and insignificant, ruling out effort reductions from seeing peers paid earlier.

Q6: What are the magnitudes and timing of the spending response to the cash infusion? Within three days of the interim payment, treatment workers spent Rs. 900 in total — roughly two-thirds of the average interim payment of over Rs. 1,400. On the day of the payment itself, loan repayments rose by Rs. 169 (746% increase), and household expenditures rose by Rs. 70 (68% increase). Over three days, loan repayments increased by Rs. 271 (287%), the probability of repaying any loan rose by 40 percentage points (222%), and total household spending rose by 65% (Rs. 371). These patterns indicate that the two main sources of financial stress cited by workers — outstanding debt and inability to meet household essentials — were directly addressed, suggesting a meaningful reduction in financial strain.
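Because each spending effect is reported both in rupees and as a percent increase, the implied control-group baselines can be backed out with simple arithmetic. This back-of-envelope sketch assumes each percentage is computed relative to the control-group mean over the same window. Rounding in the reported figures means the implied baselines are only approximate.

```python
def implied_control_mean(effect, pct_increase):
    """Control-group mean implied by an effect reported both in levels and as a
    percent of the control mean (effect = pct_increase / 100 * control_mean)."""
    return effect / (pct_increase / 100)


# (absolute effect, percent increase) as reported in the text
REPORTED = {
    "loan repayment, payment day (Rs.)": (169, 746),
    "household spending, payment day (Rs.)": (70, 68),
    "loan repayment, three days (Rs.)": (271, 287),
    "any loan repaid, three days (pp)": (40, 222),
    "household spending, three days (Rs.)": (371, 65),
}

for label, (effect, pct) in REPORTED.items():
    print(f"{label}: implied control mean ~ {implied_control_mean(effect, pct):.0f}")

SHARE_SPENT = 900 / 1400   # upper bound, since the interim payment was over Rs. 1,400
print(f"share of interim payment spent within three days <= {SHARE_SPENT:.2f}")
```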

Q7: Why are the productivity effects concentrated among poorer workers, and what are the two interpretations? Workers with below-median baseline wealth (fewer assets, lower liquidity) show a 0.204 SD (13.0%) productivity gain, while workers above the median wealth threshold show essentially no effect. The authors offer two interpretations. First, poorer workers may start from a higher level of financial strain, giving the intervention more scope to reduce it. Second, since all workers in the sample are objectively poor and report similar baseline financial worries and loan levels, the more likely explanation is that the interim payment is larger relative to the wealth and income buffer of poorer workers, making the same nominal cash infusion more meaningful for them. Both richer and poorer workers in the sample use the interim payment to repay loans and cover household needs.

Q8: How do the authors rule out nutritional channels? Three pieces of evidence address nutrition. First, workers were not at subsistence (94% reported missing no meals the prior week), and increased food spending cannot change the nutritional stock overnight (the medical literature indicates that nutritional-stock effects on cognition operate over longer time horizons). Second, and more precisely, all food consumed at the worksite during the workday was provided by the researchers, so differential pre-worksite breakfast consumption is the only plausible same-day biological channel; the authors find no treatment effect on breakfast consumption (whether workers had breakfast, how much, or what they ate). Third, if blood sugar or satiety drove the effects, they should attenuate over the workday because all workers are given the same afternoon meal; instead, treatment effects persist and, if anything, increase through the final hours of the workday.

Q9: What does the self-report evidence on focus and worry show, and why is it treated as suggestive rather than primary? Two days after the interim payment, workers were asked an open-ended question about what they were thinking about while working. Treatment workers were 11.5 percentage points (15.5%) more likely to report feeling focused on the task (p = 0.032) and 13.7 percentage points (32.7%) less likely to report thinking about financial worries (p = 0.044). A supplementary test showed treated workers were 10 percentage points (31%) more likely to generate explanations for a low-income person’s negative affect that were unrelated to financial concerns (p < 0.05), suggesting a broadening of cognitive scope. These measures are treated as suggestive because they are self-reported and were collected at a single point in time; the primary evidence rests on objective production data collected at fine hourly resolution throughout the post-pay period.

Q10: What does the paper say about optimal payment frequency as a policy implication? The authors are cautious in drawing a direct policy inference about paying workers more frequently. While the positive productivity effect of early payment points toward more frequent paydays reducing financial strain, this must be weighed against workers’ self-control problems in consumption. In settings where workers face lumpy expenditure needs (e.g., monthly rent), more frequent payments could cause under-saving and worsen strain at the time of lumpy bills. The authors suggest payment frequency or size that matches expenditure needs, or more generally financial products that allow workers to time income receipts to coincide with expenses, as potentially more robust solutions — noting that such products appear largely absent in these markets.

Key Concepts

Financial strain (as used in the paper): A psychological burden arising from pressing present needs for resources — defined in the authors’ model as increasing in both the current marginal utility of consumption (i.e., how valuable an additional rupee would be today) and the level of outstanding debt (including lender harassment pressure). Strain is present-oriented: it responds to current cash-on-hand and debt levels, not to expected future income, which is why anticipating a payment does not fully relieve it.

Automatic input (a): In the authors’ behavioral model, one of two inputs into production. Unlike “effortful” input (e), which the worker consciously controls (speed of hands, consciously directed attention), the automatic input captures cognitive functions that are beyond the worker’s full control — background attentional processes that can be degraded by financial strain even when a worker is motivated and exerting high effort. The key behavioral assumption is that a falls when financial strain is high, independently of chosen effort.

Attentiveness index: A composite measure constructed from three unincentivized physical markers embedded in completed leaf plates: (i) number of double holes (pairs indicating a stitch was removed to correct a mistake); (ii) number of leaves used; (iii) number of stitches used. The index is normalized to the control group’s post-pay distribution and reverse-coded so higher values denote better attentiveness. Workers were unaware these dimensions were measured. The index captures attentional lapses — unforced errors that increase the number of steps and time needed to complete each plate.

Announcement period: The days between when workers are individually informed of their payment schedule (day 5) and when the interim payment is actually disbursed (day 8 or 9). This window serves as a within-experiment control: if effects arose from information about impending cash (e.g., through discounting, gift exchange, or trust), they should appear here. The consistent absence of treatment effects during this period is a key identification result.

Post-pay period: The days from the interim payment until the contract end (day 12). The main productivity and attentiveness treatment effects are estimated in this window, comparing treatment workers (who have received cash) to control workers (who have not yet been paid).

Lean season: The months outside the peak agricultural planting and harvesting periods (roughly six to eight months per year in the study area) during which agricultural workers seek intermittent casual employment in manufacturing, construction, and other sectors. Employment rates are low (1.9 paid days per week on average), income is low and variable, and financial strain is correspondingly high. The experiment is intentionally conducted during this period to maximize baseline levels of financial concern.

Piece-rate elasticity of effort: The responsiveness of output to changes in the marginal return per unit produced (the piece rate), holding financial strain constant. In the supplementary rounds, a one-rupee increase in the piece rate raises output by 0.020 SD. The authors interpret this as the upper bound on how much pure motivational effort can move output in this task, and use it to benchmark the cash infusion effects, which are roughly five times larger per unit of treatment variation and additionally move attentiveness (which piece-rate changes do not).
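The benchmark comparison can be made explicit with simple arithmetic. Under linear extrapolation of the piece-rate effect, the sketch asks how large a piece-rate increase would match each cash effect. The extrapolation is an assumption. The supplementary rounds tested only Rs. 2 to 4 per plate, with different workers in a later year. The comparison also ignores attentiveness, which piece rates did not move.

```python
PIECE_RATE_EFFECT_SD = 0.020  # output gain (SD) per Rs. 1 piece-rate increase
CASH_EFFECTS_SD = {"all workers": 0.109, "below-median wealth": 0.204}

for group, effect in CASH_EFFECTS_SD.items():
    rupee_equivalent = effect / PIECE_RATE_EFFECT_SD  # linear extrapolation (an assumption)
    print(f"{group}: cash effect matches a Rs. {rupee_equivalent:.2f} piece-rate increase")
```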

Firm Accommodation After Workplace Disability: Labor Market Impacts and Implications for Subsidy Design

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

This paper studies (1) how firm accommodation decisions respond to financial incentives in the context of workplace disability under workers’ compensation, (2) what the causal effect of accommodation is on workers’ subsequent labor market outcomes, and (3) whether the equilibrium level of accommodation is socially efficient, and what the welfare implications of wage subsidies for accommodation are.

Empirical Context and Data

The analysis uses the universe of Oregon workers’ compensation claims from 2005 through 2017 — over 131,000 disabling claims — linked to longitudinal quarterly earnings records from the Oregon Employment Department. The setting exploits Oregon’s Employer at Injury Program (EAIP), which subsidizes employers who provide “transitional work” accommodations (primarily through wage subsidies) to workers with temporary workplace disabilities. EAIP accounts for roughly 25 percent of claims on average, with the wage subsidy component representing over 96 percent of EAIP expenses.

Identification Strategy

The authors exploit a policy change in July 2013 that reduced the EAIP wage subsidy rate from 50 percent to 45 percent. They construct a firm-level “exposure” measure (the fraction of a firm’s claims that used EAIP in a baseline period, 2005–2009) and estimate a continuous difference-in-differences specification in which the interaction of exposure and a post-2013 indicator instruments for accommodation. The identifying assumption is a strong form of parallel trends: absent the subsidy cut, outcomes would have evolved in parallel across firms at every level of baseline exposure. The variation comes from heterogeneous responses to the cut: firms with low baseline exposure are unlikely to respond, while high-exposure firms respond more, generating cross-firm variation in accommodation rates after 2013. An MTE framework (Heckman and Vytlacil 2005) is then used to explore heterogeneous treatment effects along an unobserved resistance-to-treatment dimension.

Main Empirical Findings

The subsidy reduction from 50% to 45% decreased accommodation rates by 2.9 percentage points (9.3 percent) for claims in firms with average exposure, implying a subsidy elasticity of accommodation of 0.9.
The policy change led to a 0.95 percentage point decrease in employment and a $120 decrease in quarterly earnings four quarters after disability for claims in average-exposure firms (roughly 1.3–1.5 percent declines relative to means), with no significant effect on worker turnover to other firms.
IV estimates of the effect of accommodation itself (using predicted EAIP as instrument) show accommodation increases the probability of employment four quarters after disability by 33 percentage points and increases quarterly earnings by approximately $4,100.
The MTE analysis reveals negative selection on gains: workers with workplace disabilities who are least likely to receive accommodation have the highest potential gains from it, driven largely by severe disabilities with high accommodation costs.
Descriptive and IV evidence is consistent with accommodation operating primarily as general human capital investment: accommodation has no statistically significant effect on the probability of moving to a new firm, and earnings gains are not systematically lower for workers who change employers after accommodation.
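Simple arithmetic cross-checks the headline magnitudes above. The sketch computes the elasticity from simple percent changes, so a 5-point cut from 50 percent counts as a 10 percent cut. It takes the employment and earnings means (72 percent and $7,807) from the IV discussion in A4.

```python
def pct_change(old, new):
    return (new - old) / old * 100


subsidy_change = pct_change(50, 45)      # a 10% cut in the wage subsidy rate
accommodation_change = -9.3              # reported % change at average exposure
elasticity = accommodation_change / subsidy_change

baseline_accommodation = 2.9 / 0.093     # 2.9 pp is 9.3% of the baseline rate (in %)
employment_decline = 0.95 / 72 * 100     # % of the 72% employment mean
earnings_decline = 120 / 7807 * 100      # % of the $7,807 quarterly earnings mean

print(f"elasticity ~ {elasticity:.2f}")
print(f"implied baseline accommodation ~ {baseline_accommodation:.0f}%")
print(f"employment ~ -{employment_decline:.1f}%, earnings ~ -{earnings_decline:.1f}%")
```

The implied baseline accommodation rate of roughly 31 percent is close to the 33 percent starting point used in the structural counterfactuals.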

Structural Model and Counterfactual Findings

A two-period frictional labor market model with risk-averse workers, risk-neutral firms, Nash bargaining, imperfect experience rating in workers’ compensation, and firm accommodation as human capital investment is developed and estimated. Two inefficiency sources are identified: (1) a human capital externality — because accommodation builds general human capital, firms cannot capture the full surplus when workers separate, reducing accommodation incentives; and (2) a fiscal externality — imperfectly experience-rated firms do not fully internalize the workers’ compensation cost savings from accommodation, further depressing it below the efficient level. Counterfactual simulations show:

Eliminating wage subsidies (from 50% to 0%) reduces accommodation rates from 33% to 11%, leading to a 7% decline in post-disability employment and a 15% decline in post-disability quarterly wages (roughly $1,358).
A revenue-neutral reform eliminating wage subsidies reduces average welfare and the welfare of more than 90% of workers.
Welfare gains from the subsidy are larger for low-skilled workers than high-skilled workers.
Conditional on experiencing disability, eliminating wage subsidies decreases welfare by about 10%, while increasing the subsidy to 100% raises welfare for disabled workers by around 30%.
Firm profit is maximized at a subsidy rate around 80%, after which higher taxes offset accommodation gains.

Layer 2 — Q&A

Q1: What is the Employer at Injury Program (EAIP), and how does it differ from standard workers’ compensation?

A1: EAIP is an optional component of Oregon’s workers’ compensation system that subsidizes employers for the costs of accommodating workers with temporary disabilities during a transitional return-to-work period. Unlike standard workers’ compensation premiums (which are experience-rated at the firm level), EAIP is funded through a flat payroll tax levied on all firms and is not experience-rated, so firms that use EAIP do not pay higher premiums. The wage subsidy component accounts for over 96 percent of EAIP expenses; other reimbursable costs (worksite modifications up to $5,000, retraining up to $1,000, clothing up to $400) are rarely used. Eligible employers must be the employer at which the disability occurred, and accommodation is limited to a transitional period during which workers cannot simultaneously receive time-loss benefits.

Q2: How is firm-level “exposure” constructed, and what is the rationale for using it as an instrument?

A2: Exposure is the fraction of a firm’s workers’ compensation claims that used EAIP during a five-year baseline period from 2005 to 2009 — a separate historical period chosen to reduce volatility and avoid mean-reversion. The rationale draws on prior work (Aizawa et al., 2022) showing that firm fixed effects account for nearly 25 percent of variation in accommodation, far more than worker or disability characteristics (1 and 3 percent, respectively), suggesting permanent firm-level heterogeneity in the relative benefits and costs of accommodation. Firms with zero historical exposure are unlikely to change accommodation behavior in response to a subsidy reduction, while high-exposure firms respond more, creating differential quasi-experimental variation in accommodation rates after July 2013.
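Below is a minimal pandas sketch of the exposure measure and the DID regressor, assuming a claim-level table. The column names claim_date, firm_id, and used_eaip are hypothetical. The code builds the regressor only. Restricting the estimation sample to post-2009 claims would fit the baseline being a separate historical period, but that choice is an assumption left to the user.

```python
import pandas as pd

POLICY_DATE = pd.Timestamp("2013-07-01")  # subsidy cut from 50% to 45%


def firm_exposure(claims):
    """Share of each firm's 2005-2009 claims that used EAIP."""
    in_baseline = claims["claim_date"].dt.year.between(2005, 2009)
    return (claims.loc[in_baseline]
            .groupby("firm_id")["used_eaip"].mean()
            .rename("exposure"))


def add_did_regressor(claims, exposure):
    """Attach exposure, a post-July-2013 indicator, and their interaction."""
    out = claims.copy()
    out["exposure"] = out["firm_id"].map(exposure)  # NaN for firms with no baseline claims
    out["post"] = (out["claim_date"] >= POLICY_DATE).astype(int)
    out["exposure_x_post"] = out["exposure"] * out["post"]
    return out


claims = pd.DataFrame({
    "firm_id": [1, 1, 1, 1, 2, 2, 2],
    "claim_date": pd.to_datetime([
        "2006-03-01", "2008-05-01", "2009-12-31", "2014-01-01",
        "2007-01-01", "2012-06-30", "2013-07-01",
    ]),
    "used_eaip": [1, 0, 1, 0, 0, 1, 0],
})
exposure = firm_exposure(claims)
panel = add_did_regressor(claims, exposure)
print(panel[["firm_id", "claim_date", "exposure", "post", "exposure_x_post"]])
```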

Q3: What are the first-stage and reduced-form results from the DID specification?

A3: The first-stage DID coefficient shows that a ten-percentage-point increase in exposure is associated with a decrease of roughly one percentage point in EAIP take-up after 2013, implying a 2.9 percentage point decrease for claims in firms with average exposure (mean 0.27). The corresponding reduced-form results show a 0.35 percentage point decrease in employment four quarters post-disability and a $45 decrease in quarterly earnings for every ten-percentage-point increase in exposure, scaling to 0.95 percentage points and $120 at average exposure. There is no statistically significant effect on the probability of moving to a new firm. Pre-trend tests show parallel accommodation trends across exposure terciles prior to 2013, supporting the identifying assumption.
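Scaling from "per ten points of exposure" to "at average exposure" multiplies by 0.27 / 0.10 = 2.7. The sketch below reproduces the reduced-form figures. Applying the same scaling to a first-stage coefficient of exactly 1.0 gives 2.7 points rather than 2.9. This suggests the per-ten-point first-stage coefficient is rounded; 2.9 / 2.7 implies about 1.07.

```python
MEAN_EXPOSURE = 0.27


def effect_at_mean_exposure(coef_per_10pp, mean_exposure=MEAN_EXPOSURE):
    """Rescale a DID coefficient reported per 10 pp of exposure to a firm at mean exposure."""
    return coef_per_10pp * mean_exposure / 0.10


print(effect_at_mean_exposure(0.35))   # employment, pp
print(effect_at_mean_exposure(45))     # quarterly earnings, $
print(effect_at_mean_exposure(1.0))    # EAIP take-up, pp, if the coefficient were exactly 1
print(2.9 / (MEAN_EXPOSURE / 0.10))    # per-10-pp coefficient implied by the reported 2.9 pp
```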

Q4: What do the IV estimates imply about the causal effect of accommodation on labor market outcomes?

A4: Under the exclusion restriction that the subsidy change affects labor market outcomes only through accommodation, the IV estimates imply that receipt of accommodation increases the probability of employment four quarters after disability by 33 percentage points (against a mean of 72 percent) and increases quarterly earnings by approximately $4,100 (against a mean of $7,807). There is no significant effect on the probability of working at a new firm four quarters later. The authors note these large estimates reflect local average treatment effects for compliers — workers whose accommodation status was changed by the instrument — who disproportionately have high unobserved resistance to treatment and high accommodation returns, explaining the magnitude.
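With one instrument and one endogenous regressor, the IV estimate equals the ratio of the reduced-form coefficient to the first-stage coefficient (the Wald ratio), given the same sample and controls. The mean-exposure scaling cancels in the ratio. The sketch plugs in the average-exposure figures from A3. Rounding of those inputs explains the small gap from the reported $4,100.

```python
def wald_estimate(reduced_form, first_stage):
    """Just-identified IV: reduced-form effect divided by first-stage effect."""
    if first_stage == 0:
        raise ValueError("instrument has no first stage")
    return reduced_form / first_stage


FIRST_STAGE = -0.029   # change in accommodation probability at average exposure
employment_effect = wald_estimate(-0.0095, FIRST_STAGE)   # about 0.33 (33 pp)
earnings_effect = wald_estimate(-120.0, FIRST_STAGE)       # about $4,138 per quarter
print(f"employment: {employment_effect:.3f}, earnings: ${earnings_effect:,.0f}")
```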

Q5: What does the MTE framework reveal about the distribution of accommodation effects and selection?

A5: The MTE curves show that workers with the highest unobserved resistance to treatment (least likely to receive accommodation) have the highest potential employment and earnings gains from accommodation. This negative selection on gains arises because these workers tend to have worse employment outcomes in the untreated state, consistent with more severe disabilities commanding higher accommodation costs. IV weights are concentrated at high-resistance values, explaining the large IV estimates. Negative selection on gains is also found along observable dimensions: workers in self-insured firms, healthcare support occupations, women, and those with wounds/cuts/burns show larger gains but lower likelihood of receiving accommodation.
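Heckman and Vytlacil (2005) show that IV estimands can be written as weighted averages of the MTE over unobserved resistance u. When gains rise with resistance, IV weights piled up at high u therefore produce estimates well above the average treatment effect. The sketch illustrates this with a hypothetical linear MTE curve and hypothetical weights proportional to u^4. Neither is estimated from the paper's data.

```python
import numpy as np


def weighted_mte(mte, weight, n=100_000):
    """Integrate mte(u) * weight(u) over u in (0, 1), with weights normalized to integrate to 1."""
    u = (np.arange(n) + 0.5) / n   # midpoint grid over unobserved resistance
    w = weight(u)
    w = w / w.mean()               # the mean over a uniform grid approximates the integral
    return float(np.mean(mte(u) * w))


def mte(u):
    """Hypothetical MTE curve: gains rise with resistance to treatment."""
    return 0.10 + 0.50 * u


ate = weighted_mte(mte, np.ones_like)            # uniform weights give the ATE
iv_like = weighted_mte(mte, lambda u: u ** 4)    # hypothetical weights piled up at high u
print(f"ATE ~ {ate:.3f}, IV-style estimand ~ {iv_like:.3f}")
```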

Q6: What evidence supports characterizing firm accommodation as general rather than firm-specific human capital investment?

A6: Three pieces of evidence point toward general human capital. First, the IV estimate shows accommodation has no statistically significant effect on the probability of working at a new firm four quarters after disability. Second, a triple-interaction specification (DID interacted with a new-firm indicator) yields suggestive evidence of even larger earnings gains for workers who move to a new firm post-accommodation; although not statistically significant, this pattern is inconsistent with firm-specific human capital. Third, the subset of claims that receive non-wage EAIP benefits (worksite modifications, retraining) does show lower mobility, but this subset comprises fewer than 5 percent of the sample, so the predominant form of investment in this context is general.

Q7: What are the two sources of market inefficiency in accommodation identified in the model?

A7: The first is a human capital externality operating through worker turnover. Because accommodation builds general human capital that workers carry to new employers, a firm accommodating a worker does not capture the portion of future surplus that accrues to future employers upon separation. In a Nash bargaining framework with lack of commitment, this dynamic inefficiency is larger when industry-wide turnover rates are higher — consistent with the descriptive finding that accommodation rates are strongly negatively associated with industry separation rates. The second is a fiscal externality from imperfect experience rating: firms whose workers’ compensation premiums are not fully linked to their own claim costs do not fully internalize the cost-savings from accommodation (i.e., reduced time-loss benefit payments), leading them to accommodate at inefficiently low rates.
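Both wedges can be illustrated with a deliberately stylized accommodation decision. This is a hypothetical sketch, not the paper's structural model. The cost `c`, productivity gain `g`, time-loss benefit saving `b`, worker bargaining share `beta`, separation probability `sep`, experience-rating share `rho`, and the way the subsidy enters are all illustrative assumptions.

```python
def social_return(g: float, b: float) -> float:
    """Planner's return (hypothetical): full productivity gain plus full benefit saving."""
    return g + b


def private_return(g: float, b: float, beta: float, sep: float, rho: float) -> float:
    """Accommodating firm's return under two stylized wedges."""
    # Turnover wedge: with probability sep the worker moves and the gain goes to a new
    # employer; if the worker stays, re-bargaining hands the worker a share beta of it.
    human_capital_part = (1.0 - sep) * (1.0 - beta) * g
    # Fiscal wedge: with imperfect experience rating (rho < 1) the firm's premium
    # reflects only part of the time-loss benefit saving.
    fiscal_part = rho * b
    return human_capital_part + fiscal_part


def efficient(c: float, g: float, b: float) -> bool:
    """Accommodation is efficient when the social return covers the cost."""
    return social_return(g, b) >= c


def firm_accommodates(c, g, b, beta, sep, rho, subsidy=0.0) -> bool:
    """The firm accommodates when its private return covers its share of the cost."""
    # A wage subsidy at rate `subsidy` covers that share of the accommodation cost.
    return private_return(g, b, beta, sep, rho) >= (1.0 - subsidy) * c


print(efficient(1.0, 1.0, 0.5))                                                  # True
print(firm_accommodates(1.0, 1.0, 0.5, beta=0.5, sep=0.3, rho=0.5))              # False
print(firm_accommodates(1.0, 1.0, 0.5, beta=0.5, sep=0.3, rho=0.5, subsidy=0.5))  # True
```

With these illustrative numbers accommodation is efficient (social return 1.5 against a cost of 1.0), yet the firm's private return is only 0.6, so it declines until a 50% subsidy halves its cost. Raising `sep` (more turnover) or lowering `rho` (weaker experience rating) widens the gap.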

Q8: How is heterogeneity incorporated in the structural estimation, and what do the estimated parameters show?

A8: The model incorporates observed heterogeneity (firm insurance status, worker skill type — measured by pre-disability wages — firm baseline exposure, and pre/post policy change) and unobserved heterogeneity mapped to the MTE framework’s unobserved resistance to treatment. Indirect inference matches cross-sectional accommodation rates, earnings by subgroup, and the DID coefficients. Key findings: net output during the disability period is negative (accommodation is a costly short-run investment), while post-disability output is higher for accommodated workers. Low-skilled workers experience larger productivity gains from accommodation than high-skilled workers. Accommodation cost shock variance is lower for higher unobserved types, meaning high-gain workers are also more sensitive to subsidy changes, consistent with the large IV estimates. The model fits the DID coefficients for accommodation, employment, and wages well.

Q9: What do the counterfactual simulations show about the welfare effects of varying the subsidy rate?

A9: Eliminating wage subsidies from the current 50% rate reduces the accommodation rate from 33% to 11% and lowers post-disability employment by 7 percentage points and post-disability quarterly wages by 15% ($1,358). From a welfare perspective, eliminating subsidies in a revenue-neutral reform reduces average ex-ante worker welfare and lowers welfare for more than 90% of workers. Conditional on experiencing disability, eliminating subsidies reduces welfare by about 10% while raising the subsidy to 100% increases welfare of disabled workers by around 30%. Firm profit is increasing in the subsidy rate up to about 80%, then decreases. Ex-ante worker welfare gains from the current 50% subsidy relative to no subsidy are modest in consumption-equivalent terms (at most 0.6% increase in consumption), partly because the disability probability is low (2.2%) and because unaccommodated workers still receive two-thirds wage replacement through time-loss benefits.

Q10: What distributional implications do wage subsidies have across worker and firm types?

A10: Welfare gains from higher wage subsidies are larger for low-skilled workers than high-skilled workers, so the subsidy has a redistributive dimension beyond efficiency correction. Welfare gains are also larger for workers in imperfectly experience-rated firms, where the fiscal externality creates the greater wedge from the efficient level. Self-insured firms, which already internalize workers’ compensation cost savings and thus accommodate closer to the optimal rate, benefit less from the subsidy and can even be made worse off if subsidies are set very high (since they bear higher flat payroll taxes with smaller marginal accommodation gains). The fraction of worker-firm matches experiencing welfare gains exceeds 90% under the benchmark subsidy level, indicating broad rather than narrowly concentrated gains.

Q11: How do the experience-rating channel and the worker-turnover channel interact in comparative statics?

A11: Model comparative statics show that reducing the job-to-job transition rate of workers with disabilities to one-quarter of its estimated value substantially raises accommodation rates, and this effect is more pronounced for imperfectly experience-rated firms than for self-insured firms. This occurs because self-insured firms already have a strong incentive to accommodate (to reduce workers’ compensation premiums), so turnover is less marginal for them. Forcing all firms to be self-insured (perfect experience rating) would substantially increase accommodation rates in currently imperfectly rated firms. Lowering the accommodation cost during the disability period (increasing net output during the disability period) also raises accommodation rates for both firm types.

Key Concepts

Firm Accommodation (EAIP): In this paper’s specific sense, accommodation refers to a firm’s decision to offer a worker with a temporary workplace disability “transitional work” — alternative tasks, modified duties, or flexible arrangements — during their recovery period, funded in part through Oregon’s Employer at Injury Program wage subsidy. Accommodation is distinct from simple early return to work; it functions as a form of human capital investment by potentially providing skill development opportunities and preventing human capital depreciation.

Exposure (Instrument): A firm-level continuous measure defined as the fraction of a firm’s workers’ compensation claims that used EAIP during a five-year baseline period (2005–2009). Exposure captures a permanent, time-invariant firm-level propensity to accommodate, and is used to construct a difference-in-differences instrument for the causal effect of accommodation by interacting exposure with a post-2013 indicator (when the subsidy rate was cut from 50% to 45%).
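A minimal sketch of the exposure-by-post instrument on a simulated, balanced two-period firm panel. The data-generating process (uniform exposure, effect size `tau`, a shared unobserved shock `v`) is entirely hypothetical. With firm fixed effects and a post dummy in two periods, first-differencing removes the firm effects, so the IV estimate reduces to the reduced-form slope on exposure divided by the first-stage slope.

```python
import random
import statistics


def simulate_firm_panel(n_firms=20_000, tau=0.33, seed=0):
    """Balanced pre/post firm panel from a hypothetical data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_firms):
        exposure = rng.random()            # stand-in for the baseline EAIP share of claims
        firm_effect = rng.gauss(0.0, 1.0)  # time-invariant firm outcome level
        periods = []
        for post in (0, 1):
            v = rng.gauss(0.0, 1.0)        # unobserved shock that moves both D and y
            # High-exposure firms accommodate more, and cut back more once the subsidy falls.
            d = 0.2 + 0.5 * exposure - 0.4 * exposure * post + 0.05 * v
            y = firm_effect + 0.1 * post + tau * d + 0.1 * v + rng.gauss(0.0, 0.1)
            periods.append((d, y))
        rows.append((exposure, periods[0], periods[1]))
    return rows


def _cov(xs, ys):
    """Sample covariance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def did_iv(rows):
    """IV estimate of tau with exposure x post as the instrument."""
    # First-differencing removes firm effects; the post dummy becomes the intercept,
    # so the instrument exposure x post reduces to exposure itself.
    z = [e for e, _, _ in rows]
    d_change = [post[0] - pre[0] for _, pre, post in rows]
    y_change = [post[1] - pre[1] for _, pre, post in rows]
    var_z = _cov(z, z)
    first_stage = _cov(d_change, z) / var_z
    reduced_form = _cov(y_change, z) / var_z
    return reduced_form / first_stage, first_stage


iv, first_stage = did_iv(simulate_firm_panel())
print(round(first_stage, 2), round(iv, 2))
```

The first stage comes out negative (high-exposure firms cut accommodation more after the subsidy cut), and the ratio lands close to the simulated `tau` even though `v` moves accommodation and outcomes together. The paper's claim-level specification and controls are not reproduced here.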

Imperfect Experience Rating: The degree to which a firm’s workers’ compensation insurance premium adjusts to reflect that firm’s own claims costs, rather than being set at an industry average. Fully experience-rated (self-insured) firms internalize 100% of claim costs and thus have strong incentives to accommodate. Partially experience-rated firms face a fiscal externality: because their premiums do not fully reflect their own time-loss benefit expenditures, they do not capture all the cost savings from accommodating workers, leading to under-accommodation relative to the social optimum.

Human Capital Externality (Dynamic Inefficiency in Accommodation): The mechanism — analogous to Acemoglu and Pischke (1999) and Fang and Gavazza (2011) — by which worker turnover reduces firms’ incentives to invest in general human capital (here, accommodation). When accommodation raises workers’ general productivity, part of the future surplus from this investment accrues to future employers upon job-to-job separation. With Nash bargaining and lack of commitment (re-bargaining in the second period), the accommodating firm cannot capture this surplus, creating a dynamic inefficiency that is more severe in high-turnover industries.

Negative Selection on Gains: The empirical finding, established via the MTE framework, that workers with workplace disabilities who are least likely to receive accommodation (highest unobserved resistance to treatment) have the largest potential employment and earnings gains from accommodation. This pattern arises because workers with more severe disabilities have high accommodation costs (making firms unwilling to accommodate them) but also face far worse counterfactual labor market outcomes without accommodation, creating large potential gains.

Marginal Treatment Effect (MTE): Following Heckman and Vytlacil (2005), the treatment effect of accommodation evaluated at a specific quantile of unobserved resistance to treatment — defined here as the propensity score value at which a worker is indifferent between treatment and non-treatment. The MTE curve maps out the full distribution of treatment effects and reveals who benefits (and by how much), how IV estimates are weighted averages over this distribution, and which compliers drive the large IV estimates.
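In the Heckman and Vytlacil (2005) framework, when an instrument shifts the propensity score from `p_low` to `p_high`, the implied LATE is the uniform average of the MTE over that interval of unobserved resistance. The sketch below assumes a hypothetical linear MTE that rises in `u`, the negative-selection-on-gains shape described above, and integrates it with the midpoint rule.

```python
def mte(u: float, a: float = 0.1, b: float = 0.5) -> float:
    """Hypothetical MTE curve: gains rise with unobserved resistance u."""
    return a + b * u


def late_from_mte(p_low: float, p_high: float, mte_fn=mte, n: int = 10_000) -> float:
    """LATE for compliers with U_D in (p_low, p_high]: the average of MTE(u) on that interval.

    Midpoint rule: (1 / (p_high - p_low)) * integral of MTE(u) du equals the
    mean of MTE evaluated at n equally spaced midpoints.
    """
    width = (p_high - p_low) / n
    return sum(mte_fn(p_low + (i + 0.5) * width) for i in range(n)) / n


print(round(late_from_mte(0.6, 0.9), 3))  # 0.475
print(round(late_from_mte(0.1, 0.4), 3))  # 0.225
```

Compliers at high resistance (0.6 to 0.9) yield a LATE of 0.475, versus 0.225 for compliers at low resistance (0.1 to 0.4); this is the sense in which IV weights concentrated at high resistance produce large IV estimates.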

General vs. Firm-Specific Human Capital (in Accommodation Context): Accommodation is characterized as general human capital investment if the productivity and earnings gains it produces are transferable across employers, that is, if accommodated workers who move to new firms retain their wage gains. It is firm-specific if gains are tied to the current match. In this paper, general human capital is supported by the null effect of accommodation on new-firm employment probability, suggestive evidence that earnings gains for new-firm movers are no smaller (and possibly larger), and the observation that fewer than 5% of claims use non-wage EAIP benefits associated with firm-specific investment.

Revenue-Neutral Counterfactual: A counterfactual policy experiment in which the wage subsidy rate for accommodation is varied while imposing that both the time-loss benefit program and the EAIP wage subsidy program remain budget-balanced. Higher subsidy rates raise firm accommodation, reduce time-loss benefit payouts (lowering base premiums for imperfectly experience-rated firms), but require a higher flat EAIP payroll tax on all firms, some of which is passed through to workers via lower first-period wages.

Homeownership, Polarization, and Inequality

Mon, 01 Jan 0001 00:00:00 +0000

This paper asks why job polarization and income inequality are higher in large U.S. cities, and proposes a novel housing-market mechanism that operates independently of — but interacts with — the skill-biased technical change (SBTC) explanations dominant in the existing literature.

The core argument is that large cities have experienced faster growth in house prices relative to both wages (price-wage ratio) and rents (price-rent ratio) since 1980. This excess price growth has priced middle-income households out of homeownership in expensive cities. Because low-income households cannot afford to own anywhere and high-income households can afford to own everywhere, it is specifically middle-income (middle-skilled) households whose location choice becomes entangled with their tenure choice. These households increasingly sort toward smaller, more affordable cities where they can purchase a home. This selective out-migration hollows out the middle of the income distribution in large cities, producing greater employment polarization and income inequality there.

Empirically, the paper uses Census and ACS data from 1980 to 2019 covering 465 commuting zones (CZs). Polarization is measured following Autor and Dorn (2013) by assigning 3-digit occupations to income percentiles fixed at 1980 levels; inequality is measured by the Gini coefficient and the variance of log annual wages. Housing costs are captured by hedonic price and rent indices and three derived ratios. OLS and IV results (instrumented using the interaction of land unavailability and long-run changes in real interest rates) show that a doubling of prices is associated with a 1 percentage point decline in the middle-skilled employment share, a doubling of the price-rent ratio with an 11.3 percentage point decline, and a doubling of the price-wage ratio with a 5.3 percentage point decline. Inequality follows the same pattern: measured as 100 times the variance of log wages, it rises by 2.3 points when prices double, by 11.7 points when the price-rent ratio doubles, and by 7.7 points when the price-wage ratio doubles.
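The two inequality measures can be sketched as follows. The Gini uses the sorted-rank formula, and the variance is the unweighted population variance. The paper's sample restrictions and person weights are not specified here, so both choices are assumptions.

```python
import math
import statistics


def gini(incomes: list[float]) -> float:
    """Gini coefficient via the sorted-rank formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))  # ranks i = 1..n on sorted data
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def var_log_wages_x100(wages: list[float]) -> float:
    """100 times the unweighted population variance of log annual wages (wages must be > 0)."""
    return 100.0 * statistics.pvariance([math.log(w) for w in wages])


print(gini([0, 0, 0, 1]))                           # 0.75
print(round(var_log_wages_x100([1.0, math.e]), 6))  # 25.0
```

`gini([0, 0, 0, 1])` gives 0.75 because one of four people holds all income, and equal incomes give 0. Two wages whose logs differ by 1 give a log-wage variance of 0.25, reported as 25 on the 100-times scale used in the text.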

The migration mechanism is documented using 2001–2019 CPS ASEC data, which — uniquely among available sources — reports reasons for moving. A doubling of the price index, price-wage ratio, or price-rent ratio in the origin state relative to the destination raises the probability that a middle-income (2nd–4th quintile) household moves for housing-related reasons by approximately 5–10 percentage points in absolute terms, implying a 50–80% relative increase compared with low- or high-income households making a housing-related move.

The theoretical framework extends the standard spatial equilibrium (Rosen-Roback) model with two additions: skill heterogeneity and housing tenure choice. Households face a minimum house size constraint and a payment-to-income (PTI) constraint (calibrated at lambda = 0.308). These constraints create distinct skill thresholds for homeownership that vary by city; the interaction between location and tenure choices applies only to middle-skilled households who can afford ownership in cheap but not expensive cities.

In the quantitative model, calibrated separately for 1980 and 2019 with two locations (top 30 CZs vs. the rest), counterfactual experiments show that holding price-wage ratios at their 1980 levels reduces the excess polarization gap between large and small CZs by 93% and the excess inequality gap by 40%. Holding price-rent ratios constant reduces the polarization gap by 96% and the inequality gap by 27%. By contrast, shutting down SBTC entirely reduces the polarization gap by only 54% and the inequality gap by 73%. These results establish that while SBTC is an important driver, its effect on polarization and inequality is substantially amplified by faster house price growth in large cities; without the housing affordability channel, the effect of SBTC on disproportionate polarization would be 63–81% smaller and on the inequality gap 18–36% smaller.
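Statements of the form "the effect would be x% smaller without the channel" convert to a multiplicative amplification factor of 1 / (1 − x/100). A minimal arithmetic sketch:

```python
def amplification_factor(pct_smaller_without_channel: float) -> float:
    """Multiplier implied by 'the effect would be x% smaller without the channel'."""
    return 1.0 / (1.0 - pct_smaller_without_channel / 100.0)


# Polarization (63 to 81%) and inequality (18 to 36%) bounds quoted above.
for x in (63, 81, 18, 36):
    print(x, round(amplification_factor(x), 2))
```

By this arithmetic, the 63% and 81% bounds correspond to factors of roughly 2.7 and 5.3 for polarization, and 18% and 36% correspond to roughly 1.2 and 1.6 for the inequality gap.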

Q: What is the paper’s central research question? A: The paper asks why job polarization and income inequality are systematically higher in large U.S. cities than in small ones. Prior literature attributed this to skill-biased technical change, external labor demand shocks, or IT-driven displacement of routine jobs; this paper proposes a complementary, housing-market-based explanation that does not rely on features of the production technology.

Q: What is the core mechanism linking house prices to polarization? A: When price-wage and price-rent ratios are higher in large cities, middle-income households face binding minimum-size and payment-to-income constraints that prevent them from owning a home there but not in cheaper cities. Because homeownership carries financial advantages, these households sort toward smaller, more affordable cities. Low-income households cannot afford ownership anywhere and high-income households can afford it anywhere, so only the middle group’s location choice is distorted by tenure considerations. This selective out-migration hollows out the middle of the income distribution in expensive large cities.

Q: What empirical patterns in CZ-level data motivate the paper? A: Doubling CZ size is associated with a 1.9 percentage point greater fall in the middle-skilled employment share and a 2.7-point larger increase in 100 times the variance of log wages from 1980 to 2019. Larger CZs also experienced 3.4% higher price growth, 3.1% higher price-wage ratio growth, and a 10% greater increase in price-rent ratios. These associations persist after controlling for initial CZ size and other characteristics.

Q: What do the OLS and IV results show about house prices and polarization? A: A doubling of house prices is associated with a 1 percentage point decline in the middle-skilled share; a doubling of the price-rent ratio with an 11.3 percentage point decline; and a doubling of the price-wage ratio with a 5.3 percentage point decline. IV results using the interaction of land unavailability and the change in real interest rates as an instrument confirm the negative relationship remains statistically significant, suggesting a causal interpretation is plausible.

Q: What do the OLS and IV results show about house prices and income inequality? A: A doubling of prices is associated with a 2.3-point increase in 100 times the variance of log wages; a doubling of the price-rent ratio with an 11.7-point increase; and a doubling of the price-wage ratio with a 7.7-point increase. IV results suggest a causal relationship between price growth and income inequality at the CZ level.

Q: What evidence does the paper provide for the migration mechanism? A: Using 2001–2019 CPS ASEC data (which reports stated reasons for moving, unlike the ACS), the paper estimates logit regressions of interstate migration for housing-related reasons. A doubling of the price index in the origin state relative to the destination raises the probability of a housing-related move for middle-income (2nd–4th quintile) households by 5–6 percentage points; a doubling of the price-wage ratio raises it by 6–7 percentage points; and a doubling of the price-rent ratio raises it by 7–10 percentage points. These effects imply a 50–80% relative increase in housing-related migration probability for the middle quintiles compared with the bottom or top quintile. Housing-related movers constitute over 12% of all interstate migrants in the sample.

Q: What is the key finding about homeownership rates? A: There is no statistically significant relationship between the change in homeownership rates and the growth in prices, price-rent, or price-wage ratios from 1980 to 2019. This is consistent with the model’s mechanism, in which middle-income households who cannot afford ownership in large cities move away rather than simply switching to renting there — so aggregate local ownership rates need not fall.

Q: How does the theoretical model generate the polarization result? A: The model extends the Rosen-Roback spatial equilibrium framework with skill heterogeneity and housing tenure choice. Two skill thresholds — one for minimum-size-constrained ownership and one for unconstrained ownership — interact with the price-wage and price-rent ratios of each city. Proposition 1 proves that a city with higher price-wage and price-rent ratios will have a lower middle-skilled share, because middle-skilled workers (those who can afford to own in cheap but not expensive cities) are drawn to cheaper locations. Proposition 2 shows that in a world with only renters or only owners, skill shares would be identical across cities regardless of price differences — the polarization result requires heterogeneity in tenure choice.

Q: What does the no-SBTC counterfactual show? A: Holding the parameters governing local returns to skills at their 1980 levels (shutting down skill-biased technical change) reduces the difference in the decline in the middle-skilled share between large and small CZs by 54% and the gap in the increase in the variance of log wages by 73%. This is broadly consistent with prior literature attributing the bulk of disproportionate polarization and inequality in big cities to SBTC.

Q: What do the constant price-ratio counterfactuals show? A: When price-wage ratios are held at 1980 levels (but SBTC is allowed to operate), the excess polarization gap between large and small CZs falls by 93% and the excess inequality gap by 40%. When price-rent ratios are held at 1980 levels, the polarization gap falls by 96% and the inequality gap by 27%. When both are held constant simultaneously, the polarization gap falls by 89% and the inequality gap by 27%. These results show that the effect of SBTC on polarization would be 63–81% smaller in the absence of the housing affordability amplification channel.

Q: Who are the largest losers from rising price-wage ratios in large cities? A: The counterfactual welfare analysis identifies middle-skilled workers with skill levels between approximately 0.29 and 0.80 as the primary losers. In the counterfactual with fixed price-wage ratios, workers with skills from 0.29 to 0.57 who previously could not afford ownership in large cities are now able to own there, and those with skills from 0.57 to 0.80 spend a smaller share of income on housing. This group either lost homeownership opportunities or was induced to move to less productive CZs by the actual price growth that occurred.

Q: How is the quantitative model calibrated and structured? A: The model is calibrated separately for 1980 and 2019 as two stationary spatial equilibria. It features two locations (the top 30 CZs, which account for 49.3% of employment, and the remaining CZs). Key parameters include a Fréchet elasticity of 6.1, an agglomeration externality of 0.04, a PTI constraint of 0.308, and an annual discount factor of 0.96. Land shares differ between large and small CZs (0.3965 vs. 0.2239). The model finds that the price-rent ratio was relatively stable in large cities but fell in small ones, while the price-wage ratio increased much more in large CZs; both indicators point to purchasing a home becoming relatively more expensive in large CZs.
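A hedged sketch of how a Fréchet parameter of 6.1 would govern location choice, assuming it is the shape parameter of independent, multiplicative Fréchet taste shocks (the standard setup in spatial equilibrium models; the paper's exact utility specification is not given here). Under that assumption the share choosing location i is V_i^θ / Σ_j V_j^θ, where the utility levels V_i below are hypothetical.

```python
def frechet_location_shares(values: list[float], theta: float = 6.1) -> list[float]:
    """Location choice probabilities V_i**theta / sum_j V_j**theta.

    Assumes independent Fréchet taste shocks with shape theta that scale utility
    multiplicatively; `values` are hypothetical common utility levels V_i.
    """
    powered = [v ** theta for v in values]
    total = sum(powered)
    return [p / total for p in powered]


# Two locations (e.g. top-30 CZs vs. the rest) with hypothetical utility levels.
shares = frechet_location_shares([1.05, 1.0])
print([round(s, 3) for s in shares])
```

A larger θ makes shares more sensitive to utility differences: with V = (1.05, 1.0), raising θ from 6.1 to 10 increases the first location's share, and equal utilities always split workers evenly.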

Q: What are the paper’s policy implications? A: Zoning reforms and other policies that increase housing supply in large, unaffordable cities could produce a more efficient spatial allocation of labor, greater aggregate productivity, and more economically diverse — less polarized and less unequal — cities, while also reducing the wealth gap between owners and renters. Policies that promote homeownership by reducing the cost of owning without raising housing supply may reduce local polarization and inequality but could lower aggregate output and do not necessarily increase homeownership rates.

Q: How does this paper relate to existing explanations for city-level polarization? A: The paper’s housing-market mechanism is explicitly complementary to SBTC-based explanations (Baum-Snow, Freedman, and Pavan, 2018; Cerina et al., 2023), external demand shock explanations (Davis, Mengus, and Michalski, 2020), and IT-displacement explanations (Eeckhout, Hedtrich, and Pinheiro, 2024). The paper’s key added contribution is that even if SBTC were the primary driver of disproportionate polarization, its measured effect would be substantially smaller in the absence of faster house price growth in large cities — the housing market amplifies rather than replaces the technology channel.

Job polarization (city-level): The hollowing out of middle-income employment shares in a commuting zone, measured as the change in the share of workers in occupations assigned to the 21st–80th income percentile (using the 1980 occupation-to-percentile mapping fixed over time). In this paper, polarization is greater in cities where price-wage and price-rent ratios grew faster, attributed to selective out-migration of middle-skilled households.
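The polarization measure can be sketched as an employment share computed with the 1980 occupation-to-percentile mapping held fixed. The occupations, percentile ranks, and employment counts below are hypothetical, and treating the 21st–80th band as inclusive integer ranks is an assumption.

```python
def middle_share(employment: dict[str, float], pct_1980: dict[str, int],
                 lo: int = 21, hi: int = 80) -> float:
    """Employment share in occupations whose fixed 1980 income percentile is in [lo, hi]."""
    total = sum(employment.values())
    middle = sum(n for occ, n in employment.items() if lo <= pct_1980[occ] <= hi)
    return middle / total


# Hypothetical occupations, 1980 percentile ranks, and employment counts.
pct_1980 = {"janitor": 10, "clerk": 45, "machinist": 60, "engineer": 90}
emp_1980 = {"janitor": 20, "clerk": 30, "machinist": 30, "engineer": 20}
emp_2019 = {"janitor": 30, "clerk": 25, "machinist": 15, "engineer": 30}

change_pp = 100 * (middle_share(emp_2019, pct_1980) - middle_share(emp_1980, pct_1980))
print(round(change_pp, 1))  # -20.0
```

Employment shifting from the two middle occupations toward the tails lowers the middle-skilled share from 0.6 to 0.4, a 20 percentage point decline, even though the percentile mapping itself never changes.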

Price-wage ratio: The ratio of hedonic house prices to median annual wages in a commuting zone, constructed from Census and ACS data. A higher price-wage ratio tightens the payment-to-income constraint on potential homebuyers and is the primary driver of the skill threshold for homeownership in the model.

Price-rent ratio: The ratio of hedonic house prices to rents in a commuting zone. In the model, a higher price-rent ratio reduces the financial advantage of owning over renting, raising the skill threshold at which ownership becomes optimal. The paper treats price-rent and price-wage ratios as distinct channels that both independently amplify polarization.

Housing tenure choice: The household decision to own or rent, modeled as a discrete choice made at the start of life that interacts with location choice. Ownership requires satisfying both a minimum house size constraint and a payment-to-income (PTI) constraint (lambda = 0.308). The interaction between tenure and location choices is the paper’s key model innovation; it exists only for middle-skilled workers whose income is sufficient for ownership in cheap but not expensive cities.

Skill threshold for homeownership (s*_i): The minimum skill level at which a worker in city i chooses to own rather than rent, defined by Lemma 2. This threshold is decreasing in local labor productivity and increasing in price-wage and price-rent ratios. Workers with skill below s*_i in all cities always rent; those with skill above s*_i in all cities always own; those in between face city-dependent tenure choice that distorts their location decision.

Skill-biased technical change (SBTC): In the paper’s quantitative model, SBTC is represented by faster growth in the skill dispersion parameter (alpha_it) in large CZs, reflecting differential productivity growth concentrated at the top of the skill distribution. The paper finds SBTC accounts for 54% of the polarization gap and 73% of the inequality gap in its counterfactual, but argues that its effect is substantially amplified by the housing affordability channel: without that channel, SBTC’s effect on disproportionate polarization would be 63–81% smaller.

Payment-to-income (PTI) constraint: The constraint that a homebuyer cannot spend more than a fraction lambda (calibrated at 0.308) of annual labor earnings on the annual housing payment (user cost times price times quantity). This constraint, together with the minimum house size, determines the income threshold for ownership and makes location and tenure choices interdependent for middle-skilled workers.
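The feasibility side of the ownership threshold can be sketched directly from the PTI constraint and the minimum house size. This is a simplified illustration under hypothetical assumptions: earnings equal `productivity * skill`, and the user cost, minimum size, prices, and productivities are invented. The paper's s*_i also depends on the price-rent ratio through the own-versus-rent comparison, which is omitted here.

```python
def ownership_skill_threshold(price: float, productivity: float, user_cost: float = 0.05,
                              h_min: float = 1.5, lam: float = 0.308) -> float:
    """Lowest skill at which the minimum-size house passes the PTI constraint.

    Hypothetical earnings rule: annual earnings = productivity * skill.
    PTI constraint: user_cost * price * h_min <= lam * productivity * skill.
    """
    return user_cost * price * h_min / (lam * productivity)


def tenure_group(skill: float, thresholds: list[float]) -> str:
    """Classify a worker by the set of cities where ownership is feasible."""
    feasible = [skill >= t for t in thresholds]
    if all(feasible):
        return "can own anywhere"
    if not any(feasible):
        return "rents everywhere"
    return "city-dependent (middle)"


cheap = ownership_skill_threshold(price=2.0, productivity=1.0)
expensive = ownership_skill_threshold(price=4.0, productivity=1.2)
for s in (0.3, 0.6, 0.9):
    print(s, tenure_group(s, [cheap, expensive]))
```

With these parameters the cheap city's threshold is about 0.49 and the expensive city's about 0.81. Workers in between are exactly the middle group whose tenure options, and hence location choice, depend on the city. The threshold falls with productivity and rises with the price-to-earnings ratio, matching the comparative statics stated for s*_i.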

How Do You Identify a Good Manager?

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops a novel experimental method to identify the causal contribution of managers to team performance, and uses it to evaluate which characteristics predict managerial effectiveness and how manager selection mechanisms affect organizational outcomes.

The core identification challenge is that managers are not randomly assigned to teams in the field, and field managers are a highly non-random sample, making it difficult to infer which traits genuinely predict managerial performance. The authors address this by repeatedly randomly assigning managers to multiple teams in a controlled laboratory experiment, then estimating each manager’s average causal contribution to group output after conditioning on group members’ individual productive skills. The intuition is that a good manager is someone who consistently causes their team to produce more than the sum of their parts.

The experiment was conducted at the University of Essex lab with 555 participants (46% female, mean age 25, ethnically diverse) forming 728 groups of three across four rounds. Each group consisted of one manager and two workers who performed a Collaborative Production Task requiring coordination across three problem-solving modules (numerical, spatial, and analytical reasoning). The team score was the minimum module score — a weakest-link structure making coordination essential. Prior to group testing, all participants completed individual assessments of task-specific skill, fluid intelligence (CFIT), emotional perceptiveness (Reading the Mind in the Eyes Test, RMET), economic decision-making skill (the Assignment Game, which measures resource allocation under comparative advantage), Big 5 personality, and demographic characteristics. Manager selection was randomly varied at the session level: in 20 sessions, the participant with the strongest preference for leadership became manager (self-promotion); in 19 sessions, managers were assigned by lottery.
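The weakest-link scoring rule can be sketched in a few lines. The module scores are hypothetical.

```python
def team_score(module_scores: dict[str, float]) -> float:
    """Weakest-link aggregation: the team score equals the lowest module score."""
    return min(module_scores.values())


def gain_from_improving(module_scores: dict[str, float], module: str, extra: float) -> float:
    """Change in team score if one module's score rises by `extra`."""
    improved = dict(module_scores)
    improved[module] += extra
    return team_score(improved) - team_score(module_scores)


scores = {"numerical": 8, "spatial": 5, "analytical": 9}
print(team_score(scores))                            # 5
print(gain_from_improving(scores, "spatial", 2))     # 2
print(gain_from_improving(scores, "analytical", 2))  # 0
```

Extra effort on an already strong module adds nothing to the team score, which is why directing effort toward the weakest module, and therefore coordination, matters in this task.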

The main quantitative findings are as follows. First, there are large, stable, and statistically significant manager effects: a manager one standard deviation above average improves team performance by approximately 0.23 standard deviations (p = 0.04). This estimate is roughly 90% of the size of the combined productive skill coefficient for the two workers (approximately 0.26 sd), indicating that a good manager is nearly twice as valuable as a good individual worker. Manager contributions predict out-of-sample group performance in a leave-one-out procedure (p < 0.01).

Second, among randomly assigned managers, only two predictors significantly explain managerial performance: fluid intelligence (CFIT) and economic decision-making skill (Assignment Game scores), both significant at below the 1% level. Gender, age, and ethnicity do not predict managerial performance.

Third, self-promoted managers perform substantially worse than lottery-assigned managers, by approximately 0.10 standard deviations — roughly equivalent to being assigned a manager with fluid intelligence one full standard deviation below average. The mechanism is overconfidence: people who strongly prefer management roles are significantly more overconfident (d = 0.41 sd, p < 0.01) and exhibit a strong negative correlation between self-reported social skills and actual emotional perceptiveness on the RMET (r = -0.37, p < 0.001). Among self-promoted managers, self-reported extraversion and political skill are negatively correlated with managerial performance (rho = -0.24 and -0.26, p < 0.05); no such negative relationship appears among lottery managers.

Fourth, selecting managers on economic decision-making skill rather than self-promotion improves average manager quality by 0.6 standard deviations — equivalent to replacing an average worker in every group with a worker at the 99th percentile of individual productivity.

The three mechanisms through which good managers improve performance are: (1) monitoring — good managers (1 sd above average) cut monitoring errors from 16% to 8%; (2) optimal task allocation according to comparative advantage — groups with optimally assigned workers score 0.52 sd higher (p < 0.01); (3) worker motivation in late-stage effort — teams led by a 1-sd-above-average manager solve 0.6 more problems in the final two minutes versus only 0.3 more in the first two minutes.

The experiment was conducted in a university lab in the UK, and the sample skews toward graduate students with limited work experience. Generalizability to field settings is supported by prior evidence that peer productivity spillover experiments yield similar magnitudes in lab versus field settings, and that the estimated manager effects are similar to Lazear et al. (2015) estimates from a large employer dataset.

Q: What is the core methodological innovation of this paper? A: The method relies on repeated random assignment of managers to multiple teams, combined with controls for individual productive skill measured prior to group work. This allows identification of each manager’s average causal contribution to group output, rather than confounding management quality with team composition or individual worker ability. The key estimand is the standard deviation of individual manager effects (sigma_alpha), interpreted as the impact of having a manager one standard deviation above average.
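A simulation sketch of the estimand under hypothetical parameters: 2,000 managers, each randomly matched to four teams, a skill coefficient of 0.13 on the summed skill of the two workers, and noise sd 0.6. It recovers sigma_alpha by (1) estimating the skill coefficient within managers, (2) averaging skill-adjusted output by manager, and (3) subtracting the sampling-noise share from the variance of those averages. This is a generic variance-decomposition approach, not the paper's exact estimator or its randomization inference.

```python
import random
import statistics
from collections import defaultdict


def simulate_teams(n_managers=2000, rounds=4, sigma_alpha=0.23, beta=0.13,
                   noise_sd=0.6, seed=1):
    """Hypothetical experiment: each manager is randomly matched to `rounds` new teams."""
    rng = random.Random(seed)
    data = []  # (manager id, summed skill of the two workers, standardized team output)
    for m in range(n_managers):
        alpha = rng.gauss(0.0, sigma_alpha)  # manager's true causal contribution
        for _ in range(rounds):
            # Random assignment: worker skill is independent of the manager.
            skill = rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0)
            data.append((m, skill, alpha + beta * skill + rng.gauss(0.0, noise_sd)))
    return data


def estimate_sigma_alpha(data):
    """Noise-corrected standard deviation of manager effects, plus the skill coefficient."""
    groups = defaultdict(list)
    for m, skill, y in data:
        groups[m].append((skill, y))
    # Within-manager deviations (demeaning absorbs each manager's effect).
    demeaned = []
    for obs in groups.values():
        ms = statistics.fmean(s for s, _ in obs)
        my = statistics.fmean(y for _, y in obs)
        demeaned.extend((s - ms, y - my) for s, y in obs)
    # 1) Skill coefficient from the within-manager regression.
    beta_hat = sum(ds * dy for ds, dy in demeaned) / sum(ds * ds for ds, _ in demeaned)
    # 2) Residual noise variance, net of degrees of freedom for manager effects and beta.
    ssr = sum((dy - beta_hat * ds) ** 2 for ds, dy in demeaned)
    noise_var = ssr / (len(data) - len(groups) - 1)
    # 3) Raw manager effects: mean skill-adjusted output for each manager.
    effects = [statistics.fmean(y - beta_hat * s for s, y in obs) for obs in groups.values()]
    rounds = len(data) / len(groups)  # teams per manager (balanced design)
    # Var(raw effect) = sigma_alpha^2 + noise_var / rounds, so remove the noise part.
    signal_var = statistics.pvariance(effects) - noise_var / rounds
    return max(signal_var, 0.0) ** 0.5, beta_hat


sigma_hat, beta_hat = estimate_sigma_alpha(simulate_teams())
print(round(sigma_hat, 2), round(beta_hat, 2))
```

Without step (3), the raw variance of manager averages would overstate sigma_alpha, because each manager is observed with only a few noisy team outcomes; the correction is what makes repeated assignment across several teams necessary.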

Q: How large is the estimated manager effect, and how does it compare to worker effects? A: A manager one standard deviation above average improves team performance by approximately 0.23 standard deviations (p = 0.04 by randomization inference). This is roughly 90% of the size of the combined productive skill effect of both workers together (approximately 0.26 sd), implying a good manager is nearly twice as valuable as a good individual worker. Without conditioning on production skills, the manager effect rises to 0.29 sd.

Q: What characteristics predict managerial performance among randomly assigned managers? A: Only two measures predict managerial performance in the lottery arm: fluid intelligence (CFIT) and economic decision-making skill (scores on the Assignment Game), both significant at below the 1% level. These predictors are robust to controls for demographics, education, work experience, emotional perceptiveness, and personality traits. Gender, age, and ethnicity do not predict managerial performance.

Q: What is the “Assignment Game” and why is it a strong predictor? A: The Assignment Game (Caplin et al., 2024) places participants in a simulated managerial role where they must assign fictional workers to tasks. Performing well requires understanding comparative advantage intuitively, managing an attentionally demanding numerical environment, and avoiding biases such as anchoring. The paper argues its strong predictive power reflects that good managers excel at allocating workers according to comparative advantage — which the experiment directly identifies as a key mechanism.
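Allocation by comparative advantage can be illustrated with an exhaustive search over one-to-one assignments. The productivity matrix is hypothetical, and the search is not the Assignment Game's actual scoring rule; it only shows why the best allocation can differ from putting a strong worker on their absolute best task.

```python
from itertools import permutations


def best_assignment(productivity: list[list[float]]) -> tuple[tuple[int, ...], float]:
    """Exhaustive search over one-to-one worker-to-task assignments.

    productivity[w][t] is the (hypothetical) output of worker w on task t;
    in the returned permutation, perm[w] is the task given to worker w.
    """
    n = len(productivity)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(productivity[w][perm[w]] for w in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Worker 0 is absolutely better at both tasks but has comparative advantage in task 1.
print(best_assignment([[10, 9], [8, 2]]))  # ((1, 0), 17)
```

Worker 0's single best task is task 0, yet the optimum puts worker 0 on task 1 and worker 1 on task 0 (total 17 versus 12), because worker 1's relative strength lies in task 0. Exhaustive search is trivial for teams this small; larger problems are usually solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.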

Q: How do self-promoted managers perform relative to lottery-assigned managers? A: Self-promoted managers perform approximately 0.10 standard deviations below lottery managers, and this gap is robust across model specifications. The performance deficit is roughly equivalent to being assigned a manager whose fluid intelligence is one full standard deviation below average. This finding implies that the common organizational practice of selecting managers partly via self-nomination actively reduces team productivity.

Q: Why do self-promoted managers underperform? A: The paper attributes underperformance primarily to overconfidence. People strongly preferring management roles are significantly more overconfident than those without strong preferences (d = 0.41 sd, p < 0.01). Self-promoted managers specifically overestimate their social skills: among them, self-reported people skills are strongly negatively correlated with actual emotional perceptiveness on the RMET (r = -0.37, p < 0.001), and self-reported extraversion and political skill are negatively correlated with managerial performance (rho = -0.24 and -0.26, p < 0.05). None of these negative relationships appear among lottery managers.

Q: Who wants to be a manager, and does it differ by gender? A: The three variables most strongly correlated with wanting to be in charge are extraversion, risk appetite, and being male. The relationship between high extraversion and preference for management is driven largely by men. Women are much less likely to nominate themselves for leadership roles despite being equally or more effective on average — a finding consistent with broader experimental evidence on gender and leadership self-selection.

Q: How large are the potential gains from skill-based manager selection? A: Compared to self-promotion, selecting managers based on economic decision-making skill yields managers who are 0.6 standard deviations better in terms of estimated manager effects. In terms of group performance, this is equivalent to replacing an average worker in every group with a worker at the 99th percentile of individual productivity. Selecting on both economic decision-making and fluid intelligence outperforms random assignment, selection on social skills, or selection on worker task performance (the Peter Principle).

Q: What are the three mechanisms through which good managers improve team performance? A: First, monitoring: good managers (1 sd above average) reduce monitoring errors — defined as having a worker on a module substantially above the minimum score at task end — from 16% to 8% (bivariate correlation with manager performance = -0.40, p < 0.001). Second, optimal task allocation: the probability of finding the optimal comparative-advantage-based assignment is positively associated with manager performance (rho = 0.19, p < 0.01), and groups with always-optimal starting assignments score 0.52 sd higher than those with never-optimal assignments (p < 0.01). Third, worker motivation: team performance in the final two-minute period is about 50% more influential for overall outcomes than the first two minutes (p = 0.038), and 1-sd-above-average managers generate 0.6 more problems solved in the final period versus 0.3 in the first, consistent with differential motivational effects emerging over time.

Q: What is the Peter Principle, and how does this paper relate to it? A: The Peter Principle refers to the practice of promoting employees based on their performance as line workers rather than their suitability for management — promoting individuals to their level of incompetence. Benson et al. (2019) document this selection pattern empirically. This paper shows that selecting managers on worker task skill is inferior to selecting on economic decision-making skill or fluid intelligence, confirming that task skill is not the right criterion for manager selection even if it predicts individual worker output.

Q: How does the paper validate that manager effects are real and not noise? A: The paper uses randomization inference with 5,000 simulated allocations to compute p-values, obtaining p = 0.04 for the main manager effect. Robustness checks include controlling for pre-existing social relationships, manager risk appetite, variance of individual scores, and granular skill measures — all yielding estimates near 0.22 sd. A leave-one-out out-of-sample prediction test confirms manager contributions significantly predict held-out group performance (p < 0.01), while the analogous worker out-of-sample estimate is less than half the magnitude and not statistically significant.
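A minimal sketch of randomization inference for a manager effect. Assumptions that are not from the paper: the test statistic is the population variance of manager-level mean scores, scores are raw group scores with no skill controls, and manager labels are reshuffled across groups uniformly at random. The paper's own statistic conditions on individual productive skill; only the 5,000 draws mirror its number of simulated allocations.

```python
import random
from statistics import mean, pvariance


def manager_dispersion(scores, labels):
    """Variance across managers of each manager's mean group score (illustrative statistic)."""
    by_manager = {}
    for score, manager in zip(scores, labels):
        by_manager.setdefault(manager, []).append(score)
    return pvariance([mean(v) for v in by_manager.values()])


def randomization_p_value(scores, labels, n_draws=5000, seed=0):
    """Share of reshuffled allocations at least as dispersed as the observed one.

    Adding 1 to numerator and denominator counts the observed allocation
    itself, so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = manager_dispersion(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # hypothetical re-allocation of managers to groups
        if manager_dispersion(scores, shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_draws)


# Toy data: three managers, four groups each, with very different mean scores.
labels = [0] * 4 + [1] * 4 + [2] * 4
scores = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9]
print(randomization_p_value(scores, labels))
```

Because the null distribution is built from the experiment's own assignment mechanism, this kind of p-value does not rely on asymptotic approximations, which matters with a modest number of managers.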

Q: What are the scope conditions on the experimental results? A: The experiment is conducted in a university lab in the UK with graduate students averaging 25 years of age and two years of work experience, limiting direct generalizability to experienced workers or senior management. The task lasts approximately 15 minutes, which may not capture longer-run managerial dynamics. Compensation equalized average earnings between managers and workers, which differs from most real-world settings. The authors note their effect-size estimates closely match Lazear et al. (2015) from a large employer, and that Herbst and Mas (2015) find lab peer-productivity experiments generalize to the field.

Manager Effect (sigma_alpha): The standard deviation of individual managers’ average causal contributions to group performance, estimated via repeated random assignment and conditioning on individual productive skill. Represents the impact of having a manager one standard deviation above average, estimated at approximately 0.23 standard deviations of group output.

Collaborative Production Task: A novel lab group task in which a manager and two workers solve problems across three modules (numerical, spatial, analytical reasoning), with team score defined as the minimum module score (weakest-link structure). Managers are responsible for worker assignment, monitoring, and motivation; workers face no financial performance incentives.

Economic Decision-Making Skill: Defined by Caplin et al. (2024) as the ability to make good resource allocation decisions, assessed via the Assignment Game in which participants must optimally assign workers to tasks under comparative advantage. The single strongest predictor of managerial performance in the lottery arm.

Monitoring Failure: Defined in the paper as having any group member working on a module at task end whose score is substantially greater (e.g., 10 points higher) than the minimum module score — meaning the worker’s effort is not contributing to the group score. Occurs in 16% of groups overall; managers one sd above average reduce this to 8%.
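The monitoring-failure definition can be written as a small check. This sketch assumes final scores are recorded per module and that each member's module at task end is known; the 10-point margin is taken from the example in the definition and used here as a hypothetical default threshold.

```python
def monitoring_failure(module_scores, current_module, margin=10):
    """True if anyone works, at task end, on a module scoring >= min + margin.

    module_scores: dict module -> final module score
    current_module: dict member -> module the member works on at task end
    margin: hypothetical threshold for "substantially greater"
    """
    floor = min(module_scores.values())  # team score under the weakest-link rule
    return any(module_scores[m] >= floor + margin for m in current_module.values())


final = {"numerical": 42, "spatial": 30, "analytical": 33}
print(monitoring_failure(final, {"manager": "spatial", "w1": "numerical", "w2": "spatial"}))
```

Here the member on "numerical" adds nothing to the team score, because that module already exceeds the weakest module by 12 points.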

Self-Promotion (as selection mechanism): A treatment condition in which the participant with the strongest stated preference for being manager (on a 1-10 scale) is assigned the managerial role. Contrasted with lottery assignment; self-promoted managers perform approximately 0.10 sd worse than lottery managers.

Overconfidence (in managerial context): The gap between self-assessed skill (particularly social/interpersonal skill) and objectively measured skill (e.g., RMET score). Self-promoters are significantly more overconfident (d = 0.41 sd), and overconfidence is strongly negatively correlated with actual emotional perceptiveness (r = -0.33, p < 0.001).

Comparative Advantage Allocation: The practice of assigning each worker to the module in which they have the highest relative (not absolute) performance advantage. Captured via whether a manager selects the optimal one-to-one assignment given pre-measured individual module scores; groups with always-optimal allocation score 0.52 sd higher.
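A brute-force sketch of the optimal one-to-one assignment under the weakest-link score. It assumes, hypothetically, that three team members each cover exactly one module and that a module's score equals the assigned member's pre-measured score on it; the real task also involves effort and reassignment during play.

```python
from itertools import permutations

MODULES = ("numerical", "spatial", "analytical")


def best_assignment(scores):
    """Maximize the minimum module score over all one-to-one assignments.

    scores: dict member -> tuple of pre-measured scores, ordered as MODULES.
    Returns (assignment dict member -> module, team score).
    """
    members = list(scores)
    best, best_score = None, float("-inf")
    for perm in permutations(range(len(MODULES))):
        team = min(scores[m][k] for m, k in zip(members, perm))
        if team > best_score:
            best_score = team
            best = {m: MODULES[k] for m, k in zip(members, perm)}
    return best, best_score


# A scores highest on every module (absolute advantage), but the optimum puts A
# on "spatial" rather than A's best module, because B is relatively weakest there;
# C goes to "numerical", C's relatively strongest module.
scores = {"A": (10, 8, 6), "B": (9, 4, 5), "C": (7, 3, 2)}
print(best_assignment(scores))
```

With only 3! = 6 possible assignments, exhaustive search is trivial; any assignment that puts A on numerical caps the team score at 3, versus 5 at the optimum.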

Life-Cycle Wages and Human Capital Investments: Selection and Missing Data

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 – Overview

Research Question

This paper asks how wage inequalities build up over the life cycle when individual wage trajectories are plagued by interruptions in private-sector participation, and when the standard Missing At Random (MAR) assumption used to handle those gaps may be violated. Specifically, it asks: what is the causal effect of career interruptions on both the level and the dispersion of wages after twenty years of potential experience, and does endogeneity of those interruptions matter for the dispersion result?

Data and Sample

The empirical analysis uses the 2011 DADS Grand Format-EDP panel, a French administrative dataset merging social security records (DADS) and census extracts (EDP). The working sample covers males who entered the private sector between 1985 and 1992, aged 16-30 at entry, and observed through 2011. The authors require at least 15 years of observed private-sector wages, yielding a working sample of 7,004 males and 137,315 person-year observations. Education is grouped into four levels (high-school dropouts, high-school graduates, some college, college graduates). Participation outside the private sector – including public-sector employment, self-employment, unemployment, and non-employment – constitutes the “alternative sector” and generates missing wage observations. On average, cumulative duration outside the private sector is 3.7 years, and the average number of interruptions is 1.44.

Model and Methodology

The paper builds on a structural Ben Porath (1967) human capital model extended to two sectors (private sector and an alternative sector), yielding a reduced-form log-wage equation with five individual-specific coefficients: an intercept (initial human capital), a linear trend in potential experience (growth rate), a curvature term in potential experience (Mincer concavity), the cumulative years of interruptions, and a curvature term in interruptions. Because parameters are individual-specific, the wage equation is a random-coefficient model estimated with a fixed-effects approach.

Selection into the private sector is addressed not by a standard MAR assumption but by a weaker “Missing At Random Conditionally On Factors” (MARCOF) assumption. Sector-preference shocks, human capital prices, and depreciation rates are each decomposed into a common factor (time-varying) and an individual factor loading, plus a residual that is mean-independent of factors and loadings. Conditional on factors and factor loadings, wage residuals and sector choices are independent, making covariates – including the interruption variables – exogenous. The preferred specification includes two unobserved factors, selected by four of six Bai-Ng (2002) information criteria.
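A sketch of choosing the number of factors with one of the Bai-Ng (2002) criteria, IC_p1(k) = ln V(k) + k((N+T)/(NT)) ln(NT/(N+T)), where V(k) is the mean squared residual after extracting k principal components. This is a simplification under stated assumptions: it uses a balanced, fully observed panel, whereas the paper's panel is unbalanced and its factor estimation sits inside an EM algorithm; the data are simulated with two factors purely for illustration.

```python
import numpy as np


def bai_ng_icp1(X, kmax):
    """Return the k in 0..kmax minimizing Bai-Ng IC_p1 for an N x T panel X."""
    N, T = X.shape
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = []
    for k in range(kmax + 1):
        # Residual sum of squares after k principal components = sum of remaining s^2.
        V = np.sum(s[k:] ** 2) / (N * T)
        ic.append(np.log(V) + k * penalty)
    return int(np.argmin(ic))


rng = np.random.default_rng(0)
N, T = 200, 30
loadings = rng.normal(size=(N, 2))  # theta_i (simulated)
factors = rng.normal(size=(T, 2))   # phi_t (simulated)
X = loadings @ factors.T + 0.1 * rng.normal(size=(N, T))
print(bai_ng_icp1(X, kmax=6))
```

The penalty term grows with k, so additional components are kept only when they reduce the log residual variance by more than the penalty, which noise-only components rarely do.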

Estimation proceeds via an Expectation-Maximization (EM) algorithm adapted from Bai (2009) and Song (2013), with initial values from Moon and Weidner (2018)’s nuclear-norm convex estimator. Because individual parameters converge at rate sqrt(T) and summary statistics of their distributions suffer from incidental-parameter bias, the authors use bias-correction methods from Jochmans and Weidner (2019) for quantiles and inter-decile ranges, and from Arellano and Bonhomme (2012) for variances. Monte Carlo experiments confirm that variances remain poorly corrected even when T > 20, so the paper focuses on inter-decile ranges as the dispersion measure.

Counterfactual “average structural functions” (Blundell and Powell, 2003) are constructed by holding individual parameters fixed and manipulating the history of interruptions. These compare four scenarios: the observed benchmark, the counterfactual with no interruptions (potential wage), the counterfactual with no current-period selection, and both combined.
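The reduced-form wage equation and the benchmark versus no-interruption comparison can be sketched in plain Python. Assumptions, all hypothetical: x^(4) is implemented as a sum of beta^s over past interruption periods s (the paper's exact weighting is not reproduced), beta = 0.95 is arbitrary, the interactive factor term is omitted, and the parameters are toy numbers rather than estimates.

```python
def interruption_vars(in_private, t, beta=0.95):
    """x3: periods outside the private sector before t; x4: hypothetical timing-weighted count."""
    out = [s for s in range(t) if not in_private[s]]
    x3 = len(out)
    x4 = sum(beta ** s for s in out)  # assumption: earlier interruptions weigh more
    return x3, x4


def log_wage(eta, t, x3, x4, beta=0.95):
    """ln w_it = eta0 + eta1*t + eta2*beta^(-t) + eta3*x3 + eta4*x4 (factor term omitted)."""
    e0, e1, e2, e3, e4 = eta
    return e0 + e1 * t + e2 * beta ** (-t) + e3 * x3 + e4 * x4


def asf(etas, histories, t, counterfactual=None):
    """Average predicted log wage at t, holding each individual's eta_i fixed.

    counterfactual: optional function mapping an observed history to a manipulated one.
    """
    total = 0.0
    for eta, hist in zip(etas, histories):
        h = counterfactual(hist) if counterfactual else hist
        total += log_wage(eta, t, *interruption_vars(h, t))
    return total / len(etas)


def no_interruptions(hist):
    return [True] * len(hist)


etas = [(1.0, 0.05, 0.0, -0.10, 0.0), (0.8, 0.03, 0.0, -0.05, 0.0)]
histories = [
    [True] * 3 + [False] * 2 + [True] * 15,  # two years out, starting in year 3
    [True] * 20,                             # never interrupted
]
t = 10
benchmark = asf(etas, histories, t)
potential = asf(etas, histories, t, no_interruptions)
print(potential - benchmark)  # average cost of interruptions at t
```

Only the history passed to `interruption_vars` changes between scenarios; the individual parameters stay fixed, which is what makes the difference interpretable as the effect of interruptions.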

Main Findings

Downward bias from omitting interruptions and factors. Omitting interruption variables and unobserved factors strongly downward biases estimated returns to experience after 20 years. Most of this bias is attributable to interruptions rather than to the interactive factor effects: selectivity is mainly captured through the interruption channel, not through residual factor structure.
Effect on mean wages. Potential experience increases log wages by approximately 65% over 20 years, consistent with cross-country evidence from homogeneous Mincer equations. The average cost of interruptions after 20 years is approximately 10% of log wages. Reassigning interruptions to the beginning of the working life has a persistent negative effect on mean log wages that never fully recovers over 20 years, while reassigning them to the end increases mean wages above the no-interruption benchmark at every experience level.
Effect on wage dispersion – a new stylized fact. Interruptions decrease, not increase, the inter-decile range of log wages after 20 years. After 20 years, with an average interruption duration of 2.47 years, interruptions decrease the inter-decile range by 0.52 log points (approximately 38%). This compression operates differentially: the 90th percentile falls by 0.34 and the 10th percentile rises by 0.18.
Endogeneity explains the dispersion compression. When years of interruption are randomly reassigned across time (holding total interruption years fixed), the inter-decile range diverges upward from the observed benchmark after about 5 years. This shows that the dispersion-reducing effect of actual interruptions is due to the endogenous timing of those interruptions – specifically to the negative correlation between the timing of interruptions and potential log wages – rather than to the correlation between the structural coefficients on interruptions and potential wages (which is also negative, with a Spearman rank correlation of -0.32 between eta_i1 and eta_i3). Endogenously chosen interruptions smooth inequality over time.
Current-period selection is negligible. Current-period selection into private-sector employment has no statistically significant effect on median, mean, variance, or inter-decile range of wages at any experience level, as confirmed by the small inter-decile range of the interactive factor component.
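A back-of-the-envelope check of the dispersion figures above. It uses only the rounded numbers in the text and assumes the 38% reduction is measured relative to the no-interruption counterfactual, so the implied levels are approximate.

```python
drop_p90 = 0.34  # fall in the 90th percentile of log wages due to interruptions
rise_p10 = 0.18  # rise in the 10th percentile of log wages due to interruptions
share = 0.38     # reported proportional reduction in the inter-decile range

compression = drop_p90 + rise_p10                   # 0.52 log points, as reported
idr_no_interruptions = compression / share          # implied counterfactual range (assumption)
idr_benchmark = idr_no_interruptions - compression  # implied range with observed interruptions
print(round(compression, 2), round(idr_no_interruptions, 2), round(idr_benchmark, 2))
```

The two percentile shifts sum exactly to the reported 0.52, and the implied no-interruption inter-decile range is roughly 1.37 log points.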

Scope Conditions

Results pertain to cohorts of French males entering the private sector between 1985 and 1992, restricted to those with at least 15 observed private-sector years. The French context is distinctive: wage inequality in the working population was stable over 1985-2011, driven in part by minimum wage policy and payroll tax exemptions for lower-skilled workers, in contrast to rising inequality in the United States and Germany. Results on timing of interruptions (eta_i3 and eta_i4) are identified only for individuals with at least two interruptions followed by re-entry (roughly those with K_T >= 2). The paper does not analyze female wages.

Layer 2 – Q&A

Q1: What is the structural model and how does it generate a reduced-form wage equation?

The model is a Ben Porath (1967) two-sector human capital model in which individuals divide time between investing in human capital and earning wages in either the private sector (e) or an alternative sector (n). Human capital accumulation in each sector has a sector-specific return rate (rho^s) and depreciation (lambda^s_t). Period utility is log income minus a quadratic investment cost, plus a sector preference shock. Because of log-linearity, the dynamic program can be solved by backward induction in closed form: optimal investments are linear in the individual-specific terminal value of human capital (kappa). The resulting log-wage equation (Proposition 5) is a function of five terms: an intercept (eta_i0), a linear trend in potential experience t (eta_i1), a geometric curvature term beta^{-t} (eta_i2), cumulative years of interruptions x^(3)_it (eta_i3), and a curvature in interruptions x^(4)_it (eta_i4), all with individual-specific coefficients. This provides a tractable random-coefficient structure.

Q2: What is the MARCOF assumption and why is it weaker than MAR?

MARCOF – Missing At Random Conditionally On Factors – posits that sector-preference shocks, human capital prices, and depreciation rates each follow factor structures: a common time-varying factor (phi_t) multiplied by an individual loading (theta_i) plus an i.i.d. residual. The residuals are assumed mean-independent of factors and loadings, and independent over time. Under standard MAR, missingness is assumed independent of outcomes conditional on observables alone. Under MARCOF, residuals in the wage equation and the sector choice equation are independent conditional on (unobserved) factors and factor loadings. This is weaker than MAR because it allows the unobservable determinants of wages and participation to share common factors, accommodating the high persistence observed in human capital stocks (20-year lag correlation of 0.28, far above the geometric decay benchmark of 0.024).
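If the geometric-decay benchmark is read as a one-year autocorrelation rho compounded over 20 years (an interpretation assumed here, not stated in the text), then rho^20 = 0.024 pins down rho, and the observed 0.28 implies a much higher per-year persistence:

```python
def implied_annual_corr(corr_at_lag, lag=20):
    """Solve rho**lag = corr_at_lag for rho under geometric decay."""
    return corr_at_lag ** (1 / lag)


print(round(implied_annual_corr(0.024), 3))  # benchmark 20-year correlation
print(round(implied_annual_corr(0.28), 3))   # observed 20-year correlation
```

Under this reading, the benchmark corresponds to a one-year correlation of about 0.83, while the observed 20-year correlation would require about 0.94 per year.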

Q3: How are the individual-specific parameters identified?

Under exogenous selection (or, under MARCOF, conditional on factors), identification of eta_i0, eta_i1, and eta_i2 requires variation in potential experience within the individual’s time series. Identification of eta_i3 and eta_i4 separately requires individuals to experience at least two spells out of the private sector each followed by re-entry (at least four transitions, so K_T >= 2). An individual with only one interruption spell generates proportional variation in x^(3) and x^(4), so only a linear combination of eta_i3 and eta_i4 is identified. The “flat spot” approach – using the observed fact that individuals aged 50-55 have stopped investing in human capital – separately identifies time, cohort, and age effects and provides the restriction that factors are orthogonal to the level, trend, and curvature in potential experience.
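The identification argument can be checked numerically: stack a constant, x^(3), and x^(4) over an individual's observed (private-sector) periods and compute the matrix rank. This sketch uses a hypothetical definition of x^(4) as a sum of beta^s over past interruption periods s, with beta = 0.95 chosen arbitrarily; under it, one spell leaves x^(3) and x^(4) proportional, while two spells at different dates do not.

```python
import numpy as np


def design(in_private, beta=0.95):
    """Rows [1, x3, x4] for every observed (private-sector) period t."""
    rows = []
    for t, observed in enumerate(in_private):
        if not observed:
            continue  # wage missing while in the alternative sector
        out = [s for s in range(t) if not in_private[s]]
        x3 = len(out)
        x4 = sum(beta ** s for s in out)  # hypothetical timing weights
        rows.append([1.0, x3, x4])
    return np.array(rows)


one_spell = [True] * 4 + [False] * 2 + [True] * 14
two_spells = [True] * 4 + [False] * 2 + [True] * 6 + [False] * 1 + [True] * 7
print(np.linalg.matrix_rank(design(one_spell)))   # 2: x3 and x4 proportional
print(np.linalg.matrix_rank(design(two_spells)))  # 3: separately identified
```

A rank-deficient design means only a linear combination of eta_i3 and eta_i4 can be recovered for that individual, exactly the single-spell case described above.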

Q4: What do the distributions of estimated individual-specific coefficients look like?

Focusing on the main (two-factor) specification with bias correction: the median of the growth parameter eta_i1 is positive (consistent with rising wages with experience) and the median of the curvature parameter eta_i2 is negative (consistent with concavity). However, heterogeneity is substantial: the 90th percentile of eta_i1 is 6.2 times the median, and the first quartile of eta_i1 is negative (implying declining potential wages for a non-negligible share). For the interruption coefficients eta_i3 (year of interruptions) and eta_i4 (curvature), bias-corrected medians are close to zero in the sub-sample with >=2 interruptions, but dispersion is large and symmetric around zero. Bias correction reduces the 90th percentile of eta_i1 by approximately 20% and reduces the absolute 10th percentile of eta_i3 by approximately 27%.

Q5: How important are interruptions relative to potential experience and factors in explaining wage variation?

A wage decomposition using inter-decile ranges (preferred over variance due to bias) shows that the potential experience component is the largest contributor to wage dispersion, followed by the interruption component (described as “sizable”), while factors play a minor role. The potential experience and interruption components are negatively rank-correlated: the Spearman rank correlation between the growth coefficient eta_i1 and the interruption coefficient eta_i3 is -0.32. This correlation is one candidate explanation for why interruptions compress dispersion rather than expanding it; the timing experiment discussed in Q7 attributes the compression instead to the endogenous timing of interruptions.

Q6: What is the finding on the effect of interruptions on mean wages, and what does the timing experiment show?

After 20 years, the average cost of interruptions (relative to a counterfactual of no interruptions) is approximately 10% of log wages. The timing of interruptions matters: reassigning interruptions to the beginning of the working life causes a persistent loss in mean log wages that does not fully recover over the 20-year horizon, while reassigning them to the end raises mean log wages above the no-interruption level at every experience level. For median wages, by contrast, the early-interruption loss is eventually recovered; only the mean fails to catch up. These asymmetries are consistent with early interruptions having a larger negative effect on human capital accumulation due to the geometric structure of investment returns.

Q7: What is the key finding on wage dispersion and what explains it?

Interruptions compress the inter-decile range of log wages by 0.52 log points (approximately 38%) after 20 years, with an average interruption duration of 2.47 years. This compression is asymmetric: the 90th percentile of wages falls by 0.34 and the 10th percentile rises by 0.18. The dispersion-reducing effect is established by comparing the benchmark (observed interruptions) to the counterfactual of no interruptions. When interruptions are instead randomly reassigned across time (holding total interruption duration fixed), the inter-decile range diverges upward from the benchmark starting around 5 years of experience. This demonstrates that the compression is due to the endogenous timing of interruptions (individuals with high potential wages tend to time their interruptions in ways that reduce the spread of actual wages) rather than to the negative correlation between the structural interruption coefficient eta_i3 and potential wages.

Q8: How does the paper handle the incidental parameter problem for distributional statistics?

Because individual parameters are estimated at rate sqrt(T) and the panel is unbalanced (some individuals observed for as few as 15 years while the model has up to 7 individual parameters), standard distributional statistics like the variance suffer from substantial incidental parameter bias. Monte Carlo experiments show that bias-corrected variance estimates remain strongly biased even at T > 20. Inter-decile ranges are better behaved and the Jochmans and Weidner (2019) bias-correction procedure reduces their bias satisfactorily. This is why the paper reports inter-decile ranges as its primary dispersion measure rather than variances. The bias in corrected inter-decile ranges is at most approximately 10% of the uncorrected estimate.
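A small Monte Carlo, in the spirit of the paper's experiments but not a replication of them, shows why short panels inflate distributional statistics: each individual slope is estimated by OLS from T observations, so the inter-decile range of the estimates mixes true heterogeneity with estimation noise that shrinks only as T grows. The data-generating values (slope mean 0.03, slope s.d. 0.01, noise s.d. 0.1) are arbitrary illustrations, and no bias correction such as Jochmans and Weidner (2019) is applied.

```python
import numpy as np


def idr(x):
    """Inter-decile range: 90th minus 10th percentile."""
    return np.quantile(x, 0.9) - np.quantile(x, 0.1)


def estimated_slope_idr(T, n=5000, seed=0):
    """Return (IDR of true slopes, IDR of per-individual OLS slope estimates)."""
    rng = np.random.default_rng(seed)
    true_slope = rng.normal(0.03, 0.01, size=n)  # individual-specific growth rates
    t = np.arange(T, dtype=float)
    y = true_slope[:, None] * t + rng.normal(0.0, 0.1, size=(n, T))
    tc = t - t.mean()
    est = y @ tc / (tc @ tc)  # OLS slope with intercept, via centered regressor
    return idr(true_slope), idr(est)


for T in (5, 15, 40):
    true_idr, est_idr = estimated_slope_idr(T)
    print(T, round(true_idr, 4), round(est_idr, 4))
```

At T = 5 the estimated inter-decile range is several times the true one; by T = 40 the gap is small, matching the intuition that incidental-parameter bias vanishes only as T grows.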

Q9: What does the paper show about the MAR assumption in the context of this data?

The results directly challenge the MAR assumption that is standard in the life-cycle earnings literature. Under MAR, interruptions would be treated as random conditional on observables, and their endogeneity would be ignored. The paper shows that treating interruptions as endogenous (through the MARCOF + structural model approach) substantially changes estimated returns to experience (there is a strong downward bias when interruptions and factors are omitted) and reverses the sign of the effect of interruptions on dispersion (under exogenous interruptions, randomly reassigned, dispersion would be higher than observed; the observed compression is driven by endogenous timing). The conclusion is that MAR assumptions produce systematically misleading pictures of life-cycle wage inequality dynamics.

Q10: What are the robustness and external validity considerations?

The working sample excludes individuals observed fewer than 15 years. A robustness exercise compares the subsample observed 10-14 years to a censored version of the 20+ subsample with matched marginal distributions of observation counts. Median profiles for the uncensored and censored 20+ samples are similar, and inter-decile ranges are slightly larger in the censored sample, but only for potential experience greater than 7 years. However, the 10-14 year sample shows substantially different patterns (larger median gaps between benchmark and no-interruption cases, and a larger inter-decile range), consistent with lower private-sector returns to human capital for that group. The authors conclude that selection into the 15+ working sample matters, and results are explicitly restricted to that working sample. The French context (stable aggregate wage inequality, minimum wage policy) limits direct comparability to countries with rising inequality.

Key Concepts

MARCOF (Missing At Random Conditionally On Factors): The paper’s central identifying assumption, weaker than standard MAR. It posits that sector-preference shocks, human capital prices, and depreciation rates follow factor structures (common time-varying factor x individual loading + i.i.d. residual), and that residuals are mean-independent of factors, loadings, and their own histories. Conditional on factors and loadings, wage residuals and sector-choice residuals are independent, making selection exogenous.

Interactive effects / factor structure for selection: An approach in which unobserved confounders are modeled as a bilinear product of time-varying common factors (phi_t) and individual factor loadings (theta_i). This allows flexible correlation between wage processes and participation choices without requiring exclusion restrictions or instrumental variables. The paper’s preferred specification uses two unobserved factors identified by Bai-Ng information criteria.

Average structural functions: Objects defined by Blundell and Powell (2003) that integrate counterfactual outcomes (wages evaluated at a manipulated interruption history) over the distribution of individual-specific parameters. They allow estimation of the causal impact of a change in interruption timing or presence while holding individual structural parameters fixed, under identification conditions analogous to those of Chernozhukov et al. (2013).

Individual-specific coefficients (random coefficients): The five parameters (eta_i0, eta_i1, eta_i2, eta_i3, eta_i4) governing each individual’s wage equation, with structural interpretations: initial log human capital, return to potential experience, curvature (Mincer concavity), effect of cumulative interruption years, and curvature in interruptions. Their individual-specificity is the source of the incidental parameter problem for distributional statistics.

Flat spot approach: An identification device (from Heckman, Lochner, and Taber, 1998; Bowlus and Robinson, 2012) that uses median wages of workers aged 50-55 – who are assumed to have stopped investing in human capital – as consistent estimates of human capital prices by education group and year. This separates the volume of human capital from its price, and provides the restriction identifying the level, trend, and curvature factors from the time-varying unobserved factors phi_t.

Interruption variables x^(3) and x^(4): Reduced-form variables derived from the structural model summarizing the history of private-sector participation gaps. x^(3)_it is the cumulative number of periods spent in the alternative sector prior to date t; x^(4)_it is a geometric-weighted version of those interruptions that reflects the timing (early vs. late) through the discount factor beta. They enter the wage equation with individual-specific coefficients that are identified only for workers with at least two complete interruption spells.

Mincer dip: A U-shaped profile in wage variance (or inter-decile range) over potential experience, predicted by the Ben Porath model: high-return workers invest more at the start of their careers, which lowers their current wages, so their wage profile starts below that of low-return workers and later overtakes it. Estimated in this paper at approximately 5 years of potential experience under the main specification.
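The location of the dip follows from a simple variance calculation. As a deliberately linear simplification of the Ben Porath profiles, suppose log wages are a_i + g_i t with intercepts and growth rates negatively correlated. Then Var(a_i + g_i t) = Var(a) + 2t Cov(a, g) + t^2 Var(g), which is U-shaped with its minimum at t* = -Cov(a, g)/Var(g). The moments below are hypothetical (implying a correlation of -0.5), chosen so that t* = 5 to echo the estimate above.

```python
def wage_variance(t, var_a, cov_ag, var_g):
    """Cross-sectional variance of a_i + g_i * t."""
    return var_a + 2 * t * cov_ag + t ** 2 * var_g


var_a, cov_ag, var_g = 0.04, -0.002, 0.0004  # hypothetical moments
profile = [wage_variance(t, var_a, cov_ag, var_g) for t in range(21)]
dip = min(range(21), key=lambda t: profile[t])  # experience year with the lowest variance
print(dip, -cov_ag / var_g)
```

The grid minimum and the closed-form t* coincide; a stronger negative covariance between initial level and growth pushes the dip later.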

Incidental parameter bias in distributional statistics: The bias that arises when estimating moments or quantiles of the distribution of individual-specific parameters that converge at rate sqrt(T) rather than sqrt(N). The paper shows through Monte Carlo experiments that variance estimates remain substantially biased even after Arellano-Bonhomme (2012) correction when T >= 20, while inter-decile ranges corrected by Jochmans-Weidner (2019) are more reliable.

Making the Invisible Hand Visible: Managers and Worker Allocation

Mon, 01 Jan 0001 00:00:00 +0000

This paper asks why managers matter for firm performance, and specifically whether managers improve productivity by matching workers to better-suited jobs inside firms rather than through supervision, motivation, or selection out of the firm. The setting is the internal labor market of a large private consumer goods multinational enterprise (MNE) operating in more than 100 countries, with annual turnover exceeding EUR 50 billion. The data cover the universe of white-collar workers and managers at the firm — 200,000 workers and 30,000 managers observed monthly over 11 years (January 2011 to December 2021) — linked to payroll, performance ratings, organizational chart, digital platform activity, employee surveys, and an independent sales productivity series for field sales workers in 15 countries.

The paper confronts two identification challenges. First, the author constructs a measure of manager quality — “high flyers” — defined as managers who were promoted to the first managerial work level (WL2) by age 30. This threshold yields 26.2% of managers classified as high flyers. The measure is defined entirely ex ante, before the manager ever supervises the worker under study, which addresses reverse causality. It is validated against ex post performance metrics including future salary growth, probability of promotion to WL3, performance ratings, and anonymous subordinate feedback. Second, to identify causal effects of manager quality on workers, the author exploits the firm’s long-standing policy of rotating WL2 managers laterally across teams as part of their career development, a practice implemented for several decades. Using an event-study design centered on the worker’s first manager transition, the author compares workers who transition from a low-flyer to a high-flyer manager (LtoH) against workers who transition from one low-flyer to a different low-flyer (LtoL), netting out the effect of the transition itself. Pre-event parallel trends are confirmed empirically.
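A stripped-down version of the comparison: average the outcome by period relative to the manager transition for each group, normalize each group to its value in the period before the event, and difference LtoH minus LtoL. This is a hypothetical sketch on toy data; the paper's estimator includes fixed effects and controls and, as a robustness check, the Sun and Abraham (2021) interaction-weighted estimator, none of which is reproduced here.

```python
from collections import defaultdict


def event_study_gap(records, base=-1):
    """records: iterable of (group, event_time, outcome), group in {"LtoH", "LtoL"}.

    Returns {event_time: LtoH gap minus LtoL gap}, each group normalized to event_time == base.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, k, y in records:
        sums[group, k] += y
        counts[group, k] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    gap = {}
    for k in sorted({k for _, k in means}):
        if ("LtoH", k) in means and ("LtoL", k) in means:
            high = means["LtoH", k] - means["LtoH", base]
            low = means["LtoL", k] - means["LtoL", base]
            gap[k] = high - low  # nets out the generic effect of any manager change
    return gap


records = [("LtoH", k, y) for k, y in zip(range(-2, 3), [1, 1, 1, 2, 3])]
records += [("LtoL", k, y) for k, y in zip(range(-2, 3), [0, 0, 0, 0.5, 1])]
print(event_study_gap(records))
```

Zero gaps before the event correspond to parallel pre-trends; the post-event gaps isolate what is specific to gaining a high-flyer manager.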

The main findings are as follows. Gaining a high-flyer manager causes substantial reallocation of workers within the firm through lateral job transfers: seven years after the manager transition event, cumulative lateral moves are 40% higher for workers who gained a high-flyer manager relative to those who gained another low-flyer. These lateral moves are not confined to a single organizational margin — transfers rise within-team, across teams in the same function, and across functions — and they involve meaningfully larger shifts in task content, as measured by angular separation across O*NET cognitive, routine, and social task intensity dimensions, with cumulative task distance becoming statistically distinguishable from zero approximately seven quarters post-transition. These gains in lateral mobility translate into persistent wage growth: seven years after the manager transition, workers supervised by a high-flyer earn salaries 13% higher than the comparison group, with divergence beginning only after the transition date. Using independent sales bonus data, three years after gaining a high-flyer manager workers’ sales productivity increases by 0.347 standard deviations, ruling out the interpretation that wage gains merely reflect manager favoritism rather than genuine productivity improvement. Establishment-level data further show that sites with a higher share of workers under high-flyer managers display higher output per worker and lower operational costs per unit.
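Angular separation between two task-intensity vectors is the angle implied by their cosine similarity. A sketch assuming each job is summarized by a hypothetical three-element vector of O*NET cognitive, routine, and social intensities; the paper's exact construction and normalization may differ, and summing angles over successive moves is one possible (assumed) way to cumulate task distance.

```python
import math


def angular_distance(u, v):
    """Angle in radians between task-intensity vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos)


analyst = (0.9, 0.2, 0.4)  # hypothetical (cognitive, routine, social)
sales = (0.5, 0.1, 0.9)
print(angular_distance(analyst, sales))
```

Because the angle ignores vector length, it measures a change in the mix of tasks rather than in overall task intensity.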

Effects are asymmetric: gaining a good manager has large positive effects, but losing one (comparing HtoL with HtoH transitions) produces no corresponding negative effects, implying that a single exposure to a high-flyer manager generates durable benefits that survive a subsequent downgrade in manager quality. A mediation analysis finds that 64% of the salary gain is explained by lateral job changes, though the author notes this understates the full allocation channel because it excludes vertical transfers and the gains from remaining well-matched in the current role. These findings hold under multiple robustness checks including restricting to new hires, using the Sun and Abraham (2021) interaction-weighted estimator, varying the age threshold for high-flyer classification, using a tenure-based alternative, and placebo tests with randomly assigned manager types.

The findings are specific to white-collar workers at a large, organizationally homogeneous consumer goods multinational. All workers hold college degrees, mean firm tenure is 8.5 years, team sizes average five workers, and the firm has the same organizational structure across all countries, functions, and years.

Q: How does the paper define “high flyer” managers and what share of managers receive this classification? A: High flyers are managers who achieved the first managerial work level (WL2) by age 30, a threshold derived from continuous age estimates constructed from 10-year age bands in the personnel records. This definition yields 26.2% of managers classified as high flyers. The measure is time-invariant and defined ex ante relative to any interaction with the workers whose outcomes are studied.

Q: What validates the high-flyer measure as capturing genuine managerial ability rather than noise? A: The high-flyer classification is significantly positively correlated with multiple ex post performance metrics recorded after the manager’s own promotion: future salary growth, probability of subsequent promotion to WL3 (director level), annual performance ratings, and anonymous upward feedback scores from subordinates on leadership. High flyers are also 14.5 percentage points less likely to be mid-career recruits, suggesting they are internally developed talent rather than external hires.

Q: What is the source of identifying variation and how does the event-study design address endogeneity? A: The firm has operated a decades-long policy of rotating WL2 managers laterally across teams to broaden their experience and to screen candidates for promotion to WL3. These rotations are asserted by firm executives and HR representatives to be orthogonal to worker and team characteristics. The author verifies this empirically by showing that a wide range of team characteristics measured over the two years before a transition — including team performance, inequality, transfer rates, and team diversity — cannot predict the type of incoming manager. The event-study design compares workers who receive a high-flyer replacement (LtoH) against workers who receive another low-flyer replacement (LtoL), netting out any generic effect of a managerial change, and confirms parallel pre-trends.

Q: What is the effect of gaining a high-flyer manager on lateral job mobility? A: Seven years after the manager transition, workers assigned to a high-flyer manager exhibit lateral moves that are 40% higher relative to workers assigned to another low-flyer. These lateral moves occur across all organizational margins: within the same team, across teams within the same function (the largest contributor), and across functions. Beyond frequency, lateral moves under high-flyer managers also involve larger task-content shifts, with cumulative task distance (measured using O*NET cognitive, routine, and social task dimensions via angular separation) becoming statistically distinguishable from zero approximately seven quarters after the transition.

Q: What is the wage effect of gaining a high-flyer manager and when does it materialize? A: Workers who transition from a low-flyer to a high-flyer manager earn a salary 13% higher than workers who transition to another low-flyer, measured seven years after the transition event. The divergence begins only after the transition date, consistent with the pre-event parallel trends assumption, and accumulates gradually rather than appearing as an immediate jump.

Q: Does the wage gain reflect genuine productivity improvement or simply managerial favoritism in pay decisions? A: The author uses an independent sales bonus series — based on monthly targets set by supply chain demand planning teams, not by managers — for 5,604 field sales workers in 15 countries from 2018 to 2021. Three years after gaining a high-flyer manager, workers’ sales productivity increases by 0.347 standard deviations. This confirms that pay gains correspond to actual productivity improvement rather than inflated ratings for unchanged performance.

Q: How much of the wage gain is attributable to the lateral reallocation channel specifically? A: A mediation analysis attributes 64% of the 13% salary gain (roughly 8.3 of the 13 percent) to lateral job changes. The author cautions that this is a lower bound because the mediation excludes vertical transfers (which mechanically raise salary) and does not capture gains for workers who stay in their current job because it is already a good match and requires no reallocation.
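
The idea of a "mediated share" can be illustrated with the textbook product-of-coefficients decomposition for linear models: the total effect of treatment on log salary splits into a direct part and a part that runs through the mediator (here, lateral moves). This is a minimal sketch on simulated data; the paper’s actual mediation estimator, controls, and variable definitions are not reproduced, and every variable name and parameter value below is hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
# Hypothetical data: 1 = gained a high flyer, 0 = gained another low flyer.
treated = rng.integers(0, 2, n).astype(float)
# Mediator: lateral moves respond to treatment.
lateral_moves = 1.0 * treated + rng.normal(0, 1, n)
# Outcome: a direct effect (0.05) plus an effect that runs through the mediator (0.08 per move).
log_salary = 0.05 * treated + 0.08 * lateral_moves + rng.normal(0, 0.5, n)

total = ols(log_salary, treated)[1]                   # effect without holding the mediator fixed
direct = ols(log_salary, treated, lateral_moves)[1]   # effect holding lateral moves fixed
share_mediated = (total - direct) / total
print(f"total {total:.3f}, direct {direct:.3f}, share mediated {share_mediated:.2f}")
```

With these invented parameters the true mediated share is 0.08 / 0.13, about 0.62. Like the paper’s estimate, a decomposition of this kind credits the mediator only with what passes through the channel that is actually measured.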

Q: Are the effects symmetric — does losing a high-flyer manager reverse the gains? A: No. Comparing workers who transition from a high-flyer to a low-flyer manager (HtoL) against workers who transition from a high-flyer to another high-flyer (HtoH) reveals no corresponding negative effects. The gains from a single prior exposure to a high-flyer manager are persistent and are not undone by a subsequent low-quality manager. The author interprets this as evidence that a good match, once created, endures independently of the manager who created it.

Q: Does gaining a high-flyer manager raise the rate of worker exit from the firm? A: No. There is no statistically detectable effect on either voluntary exits (quits) or involuntary exits (layoffs), with null results that are not masked by heterogeneity across high- and low-performing workers. This rules out the interpretation that high-flyer managers improve measured outcomes of retained workers by selecting out underperformers.

Q: Do workers move into roles connected to their high-flyer manager’s prior network or follow their manager when the manager moves? A: No. There is no evidence that workers move into roles connected to the high-flyer manager’s prior colleagues; if anything, subordinates of high-flyer managers are less likely to make such moves. Workers also do not follow their high-flyer managers when those managers subsequently rotate to a different team. These findings rule out favoritism, social network access, and information-advantage explanations as primary drivers.

Q: How does the paper rule out on-the-job teaching (human capital transmission) as the primary mechanism? A: If high-flyer managers improved worker outcomes primarily by teaching workers to be more productive in their current job, the prediction would be reduced lateral mobility (workers become too productive to leave their current role). The observed pattern — substantially higher rates of lateral reallocation under high-flyer managers — is the opposite of this prediction, making teaching as the dominant channel unlikely.

Q: What does the manager behavior evidence show about how high flyers spend their time? A: Time-use data from a random sample of approximately 600 WL2 managers in 2019 show that high-flyer managers spend 19% more time in one-on-one meetings with subordinates and engage more in communication and multitasking activities relative to low-flyer managers. Their skill profiles also differ: high flyers are more likely to have strengths in strategy and talent management rather than project management, consistent with a more coordination-intensive and people-development-oriented style.

Q: What heterogeneity is there in who benefits from high-flyer managers? A: Effects are larger when managers and workers are in the same physical office (proximity facilitates talent assessment), when the organizational unit has a more diverse set of job roles (more matching opportunities), and for younger workers who are still discovering their comparative advantages. Critically, benefits are not concentrated among high-baseline performers: workers with low initial pay growth experience gains comparable to those of high performers, suggesting high-flyer managers uncover and deploy hidden talent broadly rather than accelerating only already-visible stars.

Q: Does high-flyer management aggregate to establishment-level productivity? A: Yes. Establishments where a higher share of workers are supervised by high-flyer managers show higher output per worker (tons per FTE) and lower operational costs per unit of output (operational costs per ton), measured using establishment-year data across approximately 150 sites globally over 2019–2021. This is consistent with the individual-level allocation mechanism producing aggregate productivity gains.

Q: What are the organizational design implications of the asymmetric effects? A: Because the gains from a single exposure to a high-flyer manager persist even after a subsequent manager downgrade, firms do not need each worker to be continuously supervised by a high-flyer. It is sufficient to rotate high-flyer managers across teams so that each worker receives at least one exposure. This makes the allocation mechanism resource-neutral relative to hiring, firing, or formal training programs.

High flyer (paper’s definition): A manager who achieved the first managerial work level (WL2) at the firm by age 30. This time-invariant, ex ante classification represents the firm’s revealed-preference assessment of leadership potential and is validated against subsequent salary growth, promotion probability, performance ratings, and subordinate feedback. High flyers constitute 26.2% of managers in the sample.

Internal labor market (paper’s usage): The system within the firm through which workers are allocated to jobs via lateral transfers and vertical promotions, mediated by managers rather than by external price mechanisms; the institutional context within which manager-worker matching produces wage growth and productivity gains.

Lateral transfer (paper’s usage): A horizontal reallocation of a worker to a different job title, team, subfunction, or function at the same work level, as distinct from a vertical promotion. Captured monthly in personnel records; operationalized as moves involving changes in task content measured by O*NET task distances.

Task distance (paper’s usage): The angular separation between origin and destination occupations across three O*NET task dimensions (cognitive, routine, and social intensity), ranging from zero (identical task profiles) to one (completely distinct profiles), used to characterize the substantive scope of lateral moves induced by high-flyer managers.
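
A task distance of this kind can be sketched as follows. One common convention in the task-based literature defines distance as one minus the cosine similarity of two occupations’ task-intensity vectors, which lies between zero and one when intensities are non-negative. Whether the paper uses exactly this normalization is an assumption, and the example intensity vectors are made up.

```python
import math

def task_distance(origin, destination):
    """One minus the cosine similarity of two non-negative task-intensity vectors (assumed convention).

    Returns 0 for proportional task profiles and 1 for profiles with no task in common.
    """
    dot = sum(a * b for a, b in zip(origin, destination))
    norms = math.sqrt(sum(a * a for a in origin)) * math.sqrt(sum(b * b for b in destination))
    return 1.0 - dot / norms

# Hypothetical (cognitive, routine, social) intensities for two job titles.
analyst = (0.8, 0.3, 0.4)
sales_rep = (0.4, 0.2, 0.9)
print(round(task_distance(analyst, sales_rep), 3))
```

Because the measure depends only on the direction of the task vector, a job that scales every task intensity up by the same factor counts as a zero-distance move.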

Manager rotation (paper’s usage): The firm’s longstanding policy of reassigning WL2 managers laterally across teams within a subfunction, designed to broaden managerial experience and screen for promotion to WL3; treated in the empirical strategy as generating plausibly exogenous variation in the manager type each worker encounters.

Allocation mechanism (paper’s usage): The process by which managers discover workers’ specific skills and match them to specialized jobs inside the firm, operating through lateral reallocation rather than through hiring, firing, or on-the-job training; identified in the paper as the primary channel through which high-flyer managers generate persistent wage and productivity gains.

Asymmetric persistence (paper’s usage): The empirical pattern in which acquiring a high-flyer manager produces large and durable gains, while losing a high-flyer manager (transitioning to a low-flyer) produces no corresponding negative effects on the outcomes of previously well-matched workers, implying that good matches, once formed, survive a change in manager quality.

Marginal Returns to Public Universities


This paper asks whether enrolling in an American public university generates positive net returns for marginal students (those who barely qualify for admission) and whether those returns justify public expenditures. The question is policy-relevant because marginal students have weak academic preparation and face high dropout risk, so the net returns to expanding admission at the margin are theoretically ambiguous.

The author assembles administrative records spanning all 35 public universities in Texas, covering the universe of Texas public high school graduates from 2004–2014 (approximately 2.7 million students). Texas public universities collectively enroll over 10 percent of all American public university students. The data link high school records (test scores, demographics, coursework, attendance, disciplinary infractions) to college application and admission records, postsecondary enrollment and degree completion records, financial aid packages, institutional expenditure data from IPEDS, and quarterly earnings records from the Texas Workforce Commission unemployment insurance system.

The identification strategy exploits hundreds of decentralized SAT/ACT score cutoffs in university admissions — varying across schools and application years — that generate sharp discontinuities in admission probability. A fuzzy regression discontinuity design compares applicants just above versus just below each cutoff. On average, crossing a cutoff raises the probability of admission by 27 percentage points and the probability of enrolling at the target university by 15 percentage points. Density tests and pre-college covariate balance validate the smoothness assumptions. The typical cutoff complier is more disadvantaged than the average college applicant but comparable to the average Texas high school graduate.
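
The fuzzy RD logic at a single cutoff can be sketched as two pairs of local linear regressions: the complier effect is the jump in the outcome divided by the jump in enrollment. This is a minimal sketch on simulated data, assuming one cutoff normalized to zero, a uniform kernel, and a hand-picked bandwidth; the paper pools hundreds of cutoffs, and its kernel, bandwidth, and pooling choices are not reproduced. The function names and all simulation parameters are hypothetical.

```python
import numpy as np

def local_intercept(x, y, side_mask):
    """Fit y on [1, x] by OLS within one side of the cutoff; return the fitted value at x = 0."""
    _slope, intercept = np.polyfit(x[side_mask], y[side_mask], 1)
    return intercept

def fuzzy_rd(x, d, y, bandwidth):
    """Wald-type fuzzy RD: outcome jump divided by treatment jump at x = 0 (uniform kernel)."""
    right = (x >= 0) & (x < bandwidth)
    left = (x < 0) & (x > -bandwidth)
    first_stage = local_intercept(x, d, right) - local_intercept(x, d, left)
    reduced_form = local_intercept(x, y, right) - local_intercept(x, y, left)
    return reduced_form / first_stage, first_stage, reduced_form

# Simulated applicants (hypothetical): x is the test score centred on the cutoff.
rng = np.random.default_rng(42)
n = 400_000
x = rng.uniform(-1, 1, n)
above = x >= 0
u = rng.uniform(0, 1, n)
always_taker = u < 0.30                    # enroll at the target school regardless of the cutoff
complier = (u >= 0.30) & (u < 0.45)        # enroll only if above the cutoff (15% of applicants)
d = (always_taker | (complier & above)).astype(float)
true_effect = 3.0
y = 10 + 2 * x + true_effect * d + rng.normal(0, 1, n)

late, fs, rf = fuzzy_rd(x, d, y, bandwidth=0.5)
print(f"first stage {fs:.3f}, reduced form {rf:.3f}, LATE {late:.2f}")
```

The first stage here is built to be 0.15, mirroring the 15-point enrollment jump reported above only for intuition; the ratio then recovers the simulated effect of 3.0 for compliers.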

Roughly half of cutoff compliers would fall back to another, typically less selective, four-year institution if rejected; 43 percent would fall back to a two-year community college; and only about 6 percent would forgo higher education entirely. The pooled estimates therefore blend intensive-margin effects (more selective versus less selective four-year college) with extensive-margin effects (four-year college versus community college or no college).

Main causal findings for enrollment compliers: the typical marginally admitted student completes approximately one additional year of credits in the four-year sector and becomes 12 percentage points more likely to ever earn a bachelor’s degree from any institution. About half of the additional four-year credits are offset by 15 fewer credits in the two-year sector, and associate degree or certificate completion falls by 7 percentage points. All bachelor’s degree gains are in non-STEM fields; STEM degree completion shows no detectable increase. Compliers become about 3 percentage points more likely to hold a graduate degree by 10 years out.

On earnings, admitted compliers earn less than rejected counterparts in the first five years due to continued enrollment. Year six is the crossover point; by years 8–12, compliers earn a stable 8.6 percent earnings premium in log terms (8.2 percent in dollar ratio terms, representing a LATE of $3,339 against an untreated complier mean of $40,829), with earnings ranks rising approximately 4 percentiles from a base near the 50th percentile.

Marginally admitted students pay no additional net tuition on average: $4,600 in additional gross tuition is nearly fully offset by grant aid, though they take on $5,300 more in student loans. Society incurs approximately $10,000 in additional educational expenditures per complier. Internal rates of return are 26 percent for students, 16 percent for society, and 7 percent for the government budget. At a 3 percent discount rate, the lifetime net present value of enrolling the typical marginal applicant is approximately $80,000 — $70,000 accruing to the student and $10,000 to taxpayers.

Earnings gains are similar across institutions of varying selectivity, but significantly smaller for low-income compliers, who spend more time enrolled, complete fewer degrees, and major in less lucrative fields. A bounding method shows that extensive-margin compliers (those who would otherwise not attend any four-year college) experience larger effects than intensive-margin compliers.

Q: What is the core research question and why is credible evidence scarce? A: The paper asks whether enrolling marginal students in American public universities generates positive net returns — private, social, and fiscal — and what drives heterogeneity in those returns. Credible evidence is scarce because most existing work is correlational and fails to account for selection bias: individuals with more college education may have had pre-existing advantages, confounding college’s causal effect with systematic sorting into it. Even if average returns are positive, the policy-relevant question is whether the marginal student — who has weak preparation and high dropout risk — represents a good investment.

Q: What is the regression discontinuity design, and what does the first stage look like? A: The author infers hundreds of decentralized SAT/ACT score cutoffs across approximately 700 application cells (combinations of university, year, GPA quartile, and test type) by searching, within each cell, for the score value with the largest discontinuity in admission and enrollment. This procedure delivers a superconsistent estimator of each cell’s true cutoff. Pooled across all cells, crossing a cutoff raises the probability of admission by 27 percentage points and the probability of enrollment at the target university by a precisely estimated 15 percentage points. The density of applicants and a rich set of pre-college characteristics run smoothly through the cutoffs, supporting the RD identifying assumptions that applicants cannot precisely sort around the cutoff and that other determinants of outcomes are continuous there.
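
The cutoff search can be sketched as a grid search that picks the score with the largest jump in admission rates between narrow windows just above and just below it. This is a simplified, hypothetical version: the paper searches within university, year, GPA-quartile, and test-type cells and uses both admission and enrollment discontinuities, whereas this sketch uses one simulated cell, admission only, and a hand-picked window width.

```python
import numpy as np

def find_cutoff(scores, admitted, candidates, window):
    """Return the candidate score with the largest jump in mean admission (hypothetical helper)."""
    best_c, best_jump = None, -np.inf
    for c in candidates:
        above = admitted[(scores >= c) & (scores < c + window)]
        below = admitted[(scores < c) & (scores >= c - window)]
        if above.size == 0 or below.size == 0:
            continue  # skip candidates without data on both sides
        jump = above.mean() - below.mean()
        if jump > best_jump:
            best_c, best_jump = c, jump
    return best_c, best_jump

# One simulated application cell: SAT-style scores on a 10-point grid, true cutoff at 1000.
rng = np.random.default_rng(0)
grid = np.arange(700, 1310, 10)
scores = rng.choice(grid, size=50_000)
p_admit = 0.2 + 0.3 * (scores >= 1000)
admitted = (rng.random(scores.size) < p_admit).astype(float)

# Exclude candidates near the edges so both windows are fully populated.
cutoff, jump = find_cutoff(scores, admitted, candidates=grid[3:-3], window=30)
print(cutoff, round(jump, 2))
```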

Q: Who are the cutoff compliers, and are they representative of any broader population? A: Compliers — applicants who enroll in the target university if and only if they barely cross its cutoff — comprise approximately 15 percent of marginal applicants. In observable characteristics, compliers are roughly representative of the broader population of marginal applicants at the cutoff. They are significantly more disadvantaged than the average public university applicant, but broadly comparable to the average Texas public high school graduate in terms of academic preparation and family income.

Q: What are the next-best alternatives for marginal applicants who are rejected? A: Approximately 47 percent of compliers would fall back to another Texas four-year college (mostly public), 43 percent to a two-year community college, and approximately 9 percent would not enroll in any Texas institution. National Student Clearinghouse data for the 2008–2014 cohorts confirm that only 4 percent of untreated compliers attend a college outside the THECB universe, meaning approximately 6 percent of all compliers truly forgo higher education altogether if rejected. The empirically relevant extensive margin is therefore between the four-year sector and the two-year sector, not between college and no college.

Q: How does cutoff crossing change the institutional characteristics a complier experiences? A: Compliers are propelled into substantially better-resourced environments: the average math test score of college peers rises by half a standard deviation; peers are 12 percentage points less likely to have been low-income; gross tuition rises by $2,400 (a 42 percent increase over the untreated complier mean of $5,700); educational spending per student rises by $3,200 (43 percent over the untreated mean); peers’ 10-year BA completion rate rises by 28 percentage points; and peer mean earnings 8–12 years after college entry are $6,700 higher.

Q: What are the educational attainment effects? A: Cutoff crossing causes compliers to complete approximately 28 additional credits at any four-year institution (roughly one full year of a four-year program) and increases the probability of ever earning a bachelor’s degree by 12 percentage points, raising the completion rate from approximately 40 percent to just above 50 percent. These four-year gains are partly offset by about 15 fewer credits in the two-year sector, and associate degree or certificate completion falls by 7 percentage points. All bachelor’s degree gains are in non-STEM fields; there is no detectable increase in STEM degrees. Graduate degree completion rises by approximately 3 percentage points by 10 years out.

Q: What is the earnings trajectory, and when does the premium materialize? A: Admitted compliers earn less than rejected counterparts in the first five years after application because they remain enrolled longer. Year six is the crossover point. By years 8–12, the earnings premium stabilizes at approximately 8.6 percent in log terms and 8.2 percent in dollar ratio terms (a LATE of $3,339 against an untreated complier mean of $40,829). Earnings rank rises by approximately 4 percentiles from a base near the 50th percentile. These results are robust across sandwich earnings, all-quarters-with-earnings, and zero-imputed specifications.

Q: What does the cost-benefit analysis show? A: Marginally admitted students pay no additional net tuition on average: $4,600 in additional gross tuition is nearly fully offset by additional grant aid. They do borrow $5,300 more in student loans, likely financing higher room, board, and consumption costs at four-year colleges. From society’s perspective, compliers generate approximately $10,000 in additional educational expenditures. Cumulative undiscounted earnings benefits surpass costs after 8 years for students, 11 years for society, and 19 years for taxpayers. At a 3 percent discount rate, the lifetime net present value is approximately $80,000 total — $70,000 accruing to the student and $10,000 to taxpayers — with internal rates of return of 26 percent for students, 16 percent for society, and 7 percent for the government budget.
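
The three summary statistics in this answer (breakeven year, net present value at a given discount rate, and internal rate of return) can all be computed from a yearly stream of net benefits, as sketched below. The cash flows are invented for illustration and are not the paper’s estimates; the IRR is found by bisection, which assumes the stream changes sign once (net costs first, then net benefits).

```python
from typing import Optional, Sequence

def npv(rate: float, flows: Sequence[float]) -> float:
    """Net present value of yearly flows; flows[0] is year 0 and is not discounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

def irr(flows: Sequence[float], lo: float = -0.99, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Discount rate at which NPV is zero, by bisection; assumes one sign change on [lo, hi]."""
    f_lo = npv(lo, flows)
    if f_lo * npv(hi, flows) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, flows)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2

def breakeven_years(flows: Sequence[float]) -> Optional[int]:
    """Years until cumulative undiscounted net benefits first reach zero (None if never)."""
    total = 0.0
    for year, cf in enumerate(flows, start=1):
        total += cf
        if total >= 0:
            return year
    return None

# Invented stream: five years of net costs, then 35 years of earnings gains.
flows = [-4000.0] * 5 + [3000.0] * 35
print(breakeven_years(flows), round(npv(0.03, flows)), round(irr(flows), 3))
```

Running the same functions on streams defined from the student, societal, and government-budget perspectives is what produces separate breakeven years and IRRs for each, as in the answer above.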

Q: Does selectivity of the admitting institution predict larger earnings returns? A: No. Compliers at more selective institutions experience substantially larger increases in peer quality than those at less selective institutions, but they are also less likely to be on the extensive margin of four-year enrollment and experience smaller BA attainment gains. These factors roughly offset, producing no systematic difference in earnings gains across institutions of varying selectivity. More selective institutions also impose no additional cumulative cost on society, while compliers actually pay slightly less in additional net tuition at more selective schools.

Q: How does the commonly used measure of college value-added (mean peer earnings) compare to actual complier returns? A: Mean peer earnings overpredicts actual value-added for marginal students by a factor of two: compliers attend an institution with $6,700 higher average peer earnings as a result of admission but gain only $3,300 themselves. The measure also overpredicts the earnings return to selectivity by a factor of three: a 100-SAT-point increase in target school selectivity predicts $3,000 higher peer earnings but only a statistically insignificant $900 higher gain in the complier’s own earnings.

Q: How do earnings returns differ by family income? A: Compliers from low-income families experience significantly smaller earnings gains compared to higher-income compliers. The gap is not explained by differential changes in college quality induced by admission. Instead, low-income compliers gain fewer degrees despite spending more time in college and major in less lucrative fields, consistent with related findings in the literature on family income gaps in degree completion and major choice.

Q: How do earnings returns differ by gender and by race? A: Female and male compliers eventually earn similar log earnings and earnings rank gains, but women reach their gains more quickly — likely because men take longer to finish college. White and Asian compliers experience similar earnings gains and BA completion improvements as Black and Hispanic compliers, despite white and Asian students experiencing larger increases in college selectivity and spending per student as a result of admission.

Q: What is the method for separating intensive- and extensive-margin effects? A: The two complier types are not directly distinguishable in the data. The author first uses an endogenous but strong stratification variable — having at least one other Texas public university admission offer — to identify some mean potential outcomes for each type. He then imposes an empirically-informed rank assumption to bound the remaining unknown mean potential outcomes, delivering tightly informative upper and lower bounds on each margin’s effects without requiring full nonparametric identification. The results show that pooled effects are driven by larger returns for extensive-margin compliers who would not have attended any four-year college, with smaller contributions from intensive-margin compliers shifting between four-year institutions.
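
The reason bounds on one margin translate into bounds on the other is an accounting identity: if a share π_E of compliers are on the extensive margin, the pooled complier effect τ is a weighted average of the extensive-margin effect τ_E and the intensive-margin effect τ_I. The block below states this identity in hypothetical notation; it is not the paper’s rank-based bounding procedure, only the arithmetic that links pooled and margin-specific effects.

```latex
\tau = \pi_E\,\tau_E + (1-\pi_E)\,\tau_I
\quad\Longrightarrow\quad
\tau_E = \frac{\tau - (1-\pi_E)\,\tau_I}{\pi_E},
\qquad
\tau_I \in \left[\underline{\tau}_I,\ \overline{\tau}_I\right]
\;\Rightarrow\;
\tau_E \in \left[\frac{\tau - (1-\pi_E)\,\overline{\tau}_I}{\pi_E},\
                 \frac{\tau - (1-\pi_E)\,\underline{\tau}_I}{\pi_E}\right]
```

Because τ_E falls as τ_I rises, the upper bound on the intensive-margin effect delivers the lower bound on the extensive-margin effect, and vice versa.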

Q: How do this paper’s earnings estimates compare to prior studies, and what explains the differences? A: This paper’s 8 percent earnings gain is smaller than the 17–26 percent reported in prior studies (Zimmerman 2014: 22%; Kozakowski 2023: 26%; Smith, Goodman, and Hurwitz 2025: 17%; Bleemer 2024: 21%; Hoekstra 2009: 20%). The differences are likely explained by the much larger educational attainment and institutional quality gains induced by those studies’ natural experiments: in Zimmerman (2014), enrollment compliers gain roughly three additional years of four-year education versus one year in this paper; in Bleemer (2024), compliers experience roughly $30,000 more in institutional spending per student versus approximately $3,000 in this paper.

Q: What are the scope conditions for these results? A: The results pertain to marginal applicants to Texas public universities (excluding UT-Austin, which uses holistic admission with no detectable SAT/ACT cutoffs) from the 2004–2014 high school graduation cohorts. The identified effects are local average treatment effects for compliers — applicants who would enroll in the target university if and only if they barely crossed its admission cutoff — and do not represent effects for always-takers or infra-marginal students. Earnings are measured only for Texas-based workers covered by the state unemployment insurance system, which captures an estimated 90 percent of the civilian labor force.

Cutoff complier: An applicant who enrolls in their target university if and only if their SAT/ACT score barely exceeds that university’s admission cutoff. Compliers are the population whose behavior — and thus whose treatment effects — are identified by the fuzzy RD design. They comprise approximately 15 percent of marginal applicants and are more disadvantaged than the average public university applicant but broadly comparable to the average high school graduate.

Extensive versus intensive margin: The extensive margin refers to the contrast between attending any four-year college versus falling back to a two-year community college or no college. The intensive margin refers to the contrast between attending a more selective versus a less selective four-year institution. Approximately half of cutoff compliers are on each margin; the paper treats them as economically distinct parameters requiring separate identification.

Fuzzy regression discontinuity (RD) design: An identification strategy that uses the discontinuous jump in admission probability at a test score cutoff as an instrument for enrollment, recovering the LATE for compliers via the ratio of the reduced-form discontinuity in outcomes to the first-stage discontinuity in enrollment. “Fuzzy” refers to the fact that crossing the cutoff changes admission and enrollment probabilities with a discrete jump rather than with certainty.

Internal rate of return (IRR): The discount rate at which the net present value of an investment equals zero — here, the discount rate equating the discounted stream of earnings benefits to the discounted stream of costs. The paper estimates IRRs separately for students (26 percent), society (16 percent), and the government budget (7 percent), reflecting different cost and benefit definitions from each perspective.

Rank assumption (bounding method): An empirically-informed assumption about the ordering of mean potential outcomes across latent complier types (extensive vs. intensive margin) that, combined with partial identification from a strong endogenous stratification variable, yields tight upper and lower bounds on each margin’s causal effects without requiring full nonparametric identification.

Net tuition: Gross tuition charges minus grant aid. For the typical marginal complier, gross tuition rises by $4,600 but is nearly fully offset by additional grant aid, yielding approximately zero additional net tuition. Attending a public university therefore adds essentially no net tuition cost for marginal students, though they take on $5,300 more in student loans, likely to finance room, board, and consumption.

Sandwich earnings measure: A procedure applied to quarterly state earnings data that retains only quarters with positive earnings sandwiched between other quarters with positive earnings, discarding high-variance transition quarters between employment spells. Annualized by multiplying the quarterly average by four; used to reduce noise from entry and exit transitions in administrative earnings records.
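
The sandwich rule can be sketched for a single worker’s quarterly record as below. Treating the first and last observed quarters as unusable (their neighbours are unobserved) is an assumption of this sketch, as is returning None when no quarter qualifies; the function name is hypothetical.

```python
from typing import Optional, Sequence

def sandwich_annual_earnings(quarters: Sequence[float]) -> Optional[float]:
    """Annualized mean of positive-earnings quarters whose neighbours also have positive earnings."""
    kept = [
        quarters[t]
        for t in range(1, len(quarters) - 1)  # edge quarters lack an observed neighbour
        if quarters[t - 1] > 0 and quarters[t] > 0 and quarters[t + 1] > 0
    ]
    if not kept:
        return None  # no stable quarter in this window
    return 4 * sum(kept) / len(kept)

# Hypothetical record: the entry quarter (2,000) and exit quarter (3,000) are dropped.
record = [0, 2000, 10000, 10000, 10000, 3000, 0]
print(sandwich_annual_earnings(record))
```

Dropping the partial entry and exit quarters keeps a job that starts or ends mid-quarter from pulling the annualized figure down.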

Peer Effects and the Gender Gap in Corporate Leadership


This paper investigates whether exposure to a larger share of female peers during an MBA program causally affects the gender gap in senior corporate leadership positions. The research question is motivated by the persistent underrepresentation of women in top management: in S&P 1500 companies, women hold only 6% of CEO positions despite comprising 40% of the workforce.

The authors merge administrative data from a top-10 U.S. business school (graduating classes 2000–2018, excluding 2009) with public LinkedIn profile data covering full employment histories, firm-level data from multiple sources including InHerSight crowdsourced female-employee ratings, and a 2023–2024 alumni survey of female graduates. Senior management is defined as Vice President, Director, Senior Vice President, or C-level executive, identified from exact job titles in LinkedIn CVs.

Identification exploits the quasi-random assignment of incoming MBA students to one of eight sections of approximately 60 students each, based on alphabetical order with balance checks on gender, undergraduate institution, and ethnicity. This assignment generates exogenous variation in the share of female section peers (mean 34%, standard deviation 4 percentage points). Randomization tests following Guryan et al. (2009) and Caeyers and Fafchamps (2021) confirm the assignment is as good as random. The estimating equation is a linear-in-means model with class, year, and class-by-year fixed effects interacted with gender, plus individual and section-level controls.

The paper first documents a baseline gender gap: despite 96% of both male and female MBA graduates entering management within 15 years, women are 24% less likely than men to hold senior management positions. This gap emerges immediately after graduation, persists for at least 15 years, and is partly attributable to lower promotion rates from first-level management (43% of women in first-level management transition to senior management within five years, versus 57% of men).

The main causal finding is that a 4 percentage point (1 SD) increase in the share of female MBA section peers increases the probability of a woman holding a senior management position by 8.4% (a 3.3 percentage point increase off a 39.1% baseline), equivalent to a 26% reduction in the management gender gap. There is no corresponding effect for men. The effect emerges as early as two years post-graduation, peaks around year seven, and persists through the 15-year horizon.

The increase is concentrated in female-friendly firms, defined as those with above-median ratings on InHerSight metrics including maternity leave generosity, flexible work schedules, and professional support. Women with more female peers are significantly more likely to transition into female-friendly firms 6 to 10 years after graduation — a period coinciding with prime childbearing years — where they subsequently attain senior management roles. The effect on senior management in female-friendly firms is statistically distinguishable from the null effect in non-female-friendly firms (p-value = 0.03). The results are largest in male-dominated industries (consulting, tech, finance) where women face greater barriers to informal networks.

A survey of 283 female MBA alumnae (10% response rate) reveals three mechanisms: (i) information sharing, especially gender-specific advice about employer policies and culture; (ii) higher ambitions and self-confidence through role modeling and emotional support; and (iii) increased perceived support from male MBA peers as female section representation rises. Corroborating the information-sharing channel, women with more female peers are more likely to work at the same firms as their female section peers, particularly when those firms are female-friendly.

A counterfactual exercise shows that reallocating the existing stock of female students so that all sections have at least 34% women would yield 2 to 5 additional female senior managers per graduating class (a 2.4% to 8.4% increase), holding the total number of female students fixed.

Q: What is the baseline gender gap in senior management among MBA graduates, and how does it evolve over time? A: Female MBA graduates are 24% less likely than male graduates to hold senior management positions in the 15 years after graduation. The gap emerges immediately after the MBA and persists for at least 15 years without closing. At year 15, 74% of men hold a senior management position compared to 59% of women.

Q: How is female peer share defined and what is its distribution across sections? A: Female peer share is the proportion of female students in an individual’s assigned MBA section of approximately 60 students, excluding the individual themselves. The average section female share is 34% with a standard deviation of 4 percentage points. The distribution ranges from 19% at the 1st percentile to 45% at the 99th percentile, with the interquartile range spanning approximately 32% to 36%.

Q: What is the main causal estimate of female peers on women’s senior management probability? A: A 4 percentage point (1 SD) increase in female section peer share increases the probability of a woman holding a senior management position by 8.4% (3.3 percentage points off a 39.1% mean), averaged across the 15 post-MBA years. This translates to a 26% reduction in the management gender gap. There is no statistically significant effect on men.

Q: When does the effect of female peers emerge and how does it evolve dynamically? A: The effect on women emerges as early as two years after MBA graduation and grows over time, peaking around seven years post-graduation. The effect is persistent across the 15-year horizon studied. Estimates become less precise toward the end of the sample period as recent cohorts contribute fewer observations.

Q: How do female-friendly firms mediate the main result? A: The main effect is entirely concentrated in female-friendly firms (those with above-median InHerSight ratings). The coefficient on female peer share is positive and significant for senior management in female-friendly firms, and statistically indistinguishable from zero in non-female-friendly firms. The difference between the two coefficients is significant at p = 0.03.

Q: What is the mechanism linking female peers to female-friendly firm transitions? A: Women with more female peers are significantly more likely to be employed at female-friendly firms 6 to 10 years after graduation, a window corresponding to prime childbearing years. This suggests female peers facilitate sorting into supportive firm environments when family-work tradeoffs become most acute. Once at female-friendly firms, women attain senior management positions at higher rates.

Q: Does the increase in female senior managers reflect easier paths (smaller firms, lower pay, non-P&L roles)? A: No. The effect is significant for both small (under 500 employees) and large (over 5,000 employees) firms, with no significant effect on the firm size of employment itself. There is no consistent pattern of women being promoted in firms with higher or lower average compensation. The increase in female senior managers includes those with Profit and Loss responsibilities, indicating these are substantive management positions.

Q: In which industries is the effect largest, and what does this imply? A: The effect is concentrated in male-dominated industries (consulting, tech, finance), with no significant effect in female-dominated industries (consumer goods, healthcare). The difference between coefficients is significant at the 3% level. Entry rates into male-dominated industries are not significantly affected, suggesting the mechanism is higher promotion rates within these industries rather than differential sorting into them. The authors interpret this as evidence that female MBA networks are most valuable where women face greater barriers to informal workplace networks.

Q: What does the survey evidence reveal about mechanisms? A: Among 283 survey respondents (10% response rate), three mechanisms emerge: information sharing about gender-specific employer attributes and policies; raising ambitions and self-confidence through role modeling; and increased perceived support from male MBA peers as section female share rises. Women with more female peers are also more likely to work at the same firms as their female section peers, especially female-friendly ones, consistent with referral and information-sharing channels.

Q: Does the effect operate through greater attachment to the corporate pipeline (fewer career breaks, higher entry into management)? A: No. Female peers do not significantly affect employment rates, career break incidence, entry into first-level management positions, or self-employment rates. The results thus reflect higher promotion rates from first-level management into senior management, not changes in pipeline attachment.

Q: What do the randomization tests show about identification validity? A: Two randomization tests confirm as-good-as-random assignment. Following Guryan et al. (2009), the section-level leave-out mean female share is not significantly different from zero after controlling for the class-level leave-out mean. Following Caeyers and Fafchamps (2021), after netting out the asymptotic exclusion bias, the female share coefficient is insignificant across all specifications. A simulation test (Bietenbeck 2020) finds no statistically significant difference between the actual and simulated within-class female share distributions.

Q: What placebo tests are conducted and what do they show? A: Two placebo tests are run. First, 1,000 random reassignments of students to sections within the same class show the true estimated effect for women lies outside the distribution of placebo effects, while the null effect for men lies within it. Second, estimating the main equation for up to three years before MBA enrollment finds no consistent pre-treatment effect of female share on future female graduates, supporting the identification strategy.

Q: What is the counterfactual policy exercise and what does it imply? A: Holding the total number of female students fixed, reallocating them so that all sections contain at least 34% women would yield 2 to 5 additional female senior managers per graduating class (a 2.4% to 8.4% increase). The exercise relies on a nonlinear relationship between female peer share and outcomes. Under a purely linear-in-means specification, reallocating a fixed number of women across sections would not raise the total. Under that nonlinearity assumption, the results suggest meaningful gains from rebalancing section composition without increasing overall female enrollment.
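A toy calculation shows why the counterfactual needs a nonlinear relationship. Suppose each woman's probability of senior management is a purely linear function of her leave-out female share. Then rebalancing a fixed number of women across sections does not raise the expected total. It slightly lowers it, because women in the more female section each gain from having more female peers. The function name, the intercept, and the two-section example are hypothetical. Only the slope is loosely calibrated to the main estimate (3.3 pp per 4 pp of female share).

```python
def expected_female_senior(women_per_section, a, b, section_size=60):
    """Expected number of women in senior management under a hypothetical
    linear rule: each woman's probability is a + b * (her leave-out female share)."""
    total = 0.0
    for w in women_per_section:
        peer_share = (w - 1) / (section_size - 1)  # female share among her peers
        total += w * (a + b * peer_share)
    return total

a, b = 0.12, 0.825  # hypothetical intercept; slope ~ 3.3 pp / 4 pp
unbalanced = expected_female_senior([16, 24], a, b)  # 40 women, uneven split
balanced = expected_female_senior([20, 20], a, b)    # same 40 women, even split
```

Under this linear rule the balanced allocation yields fewer expected senior managers. Any gain from rebalancing therefore has to come from nonlinearity, such as larger effects at low female shares.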

Q: How do the results compare to the Thomas (2021) finding that more male peers raise female MBA earnings? A: The authors note several differences: Thomas (2021) focuses on starting earnings while this paper studies senior management positions over 15 years; the two studies use different universities and time periods; and this paper employs gender-by-cohort fixed effects to account for time trends in female labor market outcomes. The authors suggest these design and outcome differences explain the divergent findings.

Section peers: Students assigned to the same MBA section of approximately 60 students who take core classes together and form the primary peer network; sections are assigned quasi-randomly based on alphabetical order with balance adjustments, generating exogenous variation in gender composition.

Female-friendly firms: Firms with above-median ratings on InHerSight, a crowdsourced platform where female employees rate employers on metrics including maternity leave generosity, flexible work schedules, mentorship programs, and female representation in management. Conceptually, the paper defines them as firms whose cultures and policies help women balance work-family responsibilities and support their career advancement.

Senior management: Positions defined as Vice President (VP), Director, Senior Vice President (SVP), or C-level executive, identified using keyword matching on exact job titles from LinkedIn CVs; distinguished from first-level management (managers and supervisors) and representing the upper rungs of the corporate management ladder.

Female share (treatment variable): The proportion of female students among an individual’s section peers, excluding the individual themselves (leave-out mean). It averages 34% across sections, with a standard deviation of 4 percentage points after residualizing by graduating class.

Management gender gap: The 24 percent lower likelihood, relative to male MBA graduates, that female MBA graduates hold senior management positions within 15 years of graduation. It emerges immediately post-MBA and does not close over the observed horizon.

Information sharing mechanism: The channel through which female MBA peers provide gender-specific advice and information about employer policies, culture, and female-friendliness that is otherwise difficult to observe; evidenced by the co-location of women with more female peers at the same female-friendly firms as their section peers.

Exclusion bias: The systematic negative correlation between an individual’s own characteristic and her leave-out peer mean that arises mechanically when individuals cannot be their own peer under assignment without replacement; addressed via the Caeyers and Fafchamps (2021) correction in randomization tests.
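A hypothetical 60-person section with 20 women illustrates both the leave-out share and the exclusion bias it creates. The sketch assumes gender is coded 1 for female and 0 for male, and the function name is illustrative. A woman's peers include only 19 women, while a man's peers include all 20. This produces a mechanical gap of 1/59 that is negatively correlated with own gender.

```python
def leave_out_female_share(genders, i):
    """Share of women among person i's section peers, excluding i.
    genders: list with 1 for female, 0 for male (hypothetical coding)."""
    n = len(genders)
    if n < 2:
        raise ValueError("need at least one peer")
    return (sum(genders) - genders[i]) / (n - 1)

section = [1] * 20 + [0] * 40                     # 60 students, 20 women (indices 0-19)
woman_share = leave_out_female_share(section, 0)  # 19/59
man_share = leave_out_female_share(section, 59)   # 20/59
exclusion_gap = man_share - woman_share           # 1/59, purely mechanical
```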

Supply, Demand, Institutions, and Firms: A Theory of Labor Market Sorting and the Wage Distribution

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research question. How do workforce composition (labor supply), labor demand, and minimum wage policy jointly determine the wage distribution in imperfectly competitive labor markets, and what were the quantitative contributions of each force to the dramatic decline in Brazilian wage inequality between 1998 and 2012?

Motivation. Brazil’s formal-sector wage inequality fell sharply over this period. Three candidate shocks are well-documented: (1) a large increase in educational attainment — the share of adults completing at least secondary school rose by 20 percentage points (a 68 percent increase) between 1998 and 2012; (2) labor demand shocks, primarily the commodities boom of the 2000s; and (3) a 93.7 percent (66.1 log point) real increase in the federal minimum wage. Existing frameworks analyze these shocks separately — competitive supply/demand models on one side and imperfectly competitive minimum wage models on the other — and therefore cannot detect interactions or jointly explain all observed patterns, including the novel finding that assortative matching between high-wage workers and high-wage establishments rose in 104 out of 151 microregions, a fact inconsistent with the predictions of leading minimum wage models.
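The headline magnitudes are linked by simple conversions. A log change Δ corresponds to a percentage change of exp(Δ) − 1. A 20 percentage point rise that equals a 68 percent increase implies a 1998 baseline of about 20/0.68 ≈ 29 percent. The baseline levels below are back-of-envelope implications of the rounded figures, not numbers reported in the paper.

```python
import math

mw_log_points = 0.661
mw_pct_increase = math.exp(mw_log_points) - 1  # ~0.937, the 93.7 percent real increase

edu_rise_pp = 20.0     # rise in share completing at least secondary school, pp
edu_rise_rel = 0.68    # the same rise as a relative (68 percent) increase
edu_1998 = edu_rise_pp / edu_rise_rel  # ~29.4 percent (implied, approximate)
edu_2012 = edu_1998 + edu_rise_pp      # ~49.4 percent (implied, approximate)
```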

Data. The paper uses the RAIS (Relação Anual de Informações Sociais), a confidential linked employer-employee dataset covering the Brazilian formal sector, together with Brazilian Census data for 1991, 2000, and 2010. Statistics are computed for 151 microregions (analogous to US commuting zones) with at least 15,000 workers in RAIS in both base years and at least 1,000 formal workers per educational group. The final sample covers 73 percent of the adult population. Firm wage premiums and assortative matching are measured via AKM two-way fixed effects regressions using the bias-corrected KSS (Kline, Saggio, Sølvsten 2018) estimator, run separately for each microregion and period on three-year panels.

Theoretical framework. The paper develops a unified general-equilibrium model featuring: (i) a task-based production function with distance-dependent complementarity between worker types; (ii) monopsony power arising from idiosyncratic worker preferences for firms, generating constant firm-level labor supply elasticity β (calibrated at 4, implying markdowns of 20 percent); (iii) heterogeneous firms differentiated by their production “blueprints” (the complexity of tasks they require), with blueprint shape parameterized as a Gamma distribution; and (iv) free firm entry, endogenous participation, and goods market general equilibrium with CES consumer preferences (elasticity σ). A key result is that firms with different blueprints exhibit different within-firm substitution patterns: worker types that are substitutes at low-skill, low-wage firms may be complements at high-skill, high-wage firms.

Estimation. A parsimonious parameterization is estimated by simultaneous-equation nonlinear least squares, targeting 26 endogenous outcomes per region (13 per period) including between- and within-group wage inequality, variance of establishment effects, covariance of worker and establishment effects, formal employment rates by education, and minimum wage bindingness. The model requires solving for equilibrium more than 15,000 times per optimization step (151 regions × 2 periods × 53 Jacobian columns). The elasticity of substitution between goods is estimated at σ = 8.36 (significantly above 1), and the aggregate labor supply parameter λ implies formal-sector elasticities of approximately 0.6–0.7 for college workers and around 1.1 for less-than-secondary workers. The model fits the data well, with R² above 0.5 for most targeted moments and perfect fit for the six moments used in the inversion procedure.
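The count of equilibrium solves per optimization step follows directly from the dimensions given above:

```python
regions, periods, jacobian_columns = 151, 2, 53
moments_per_period = 13

equilibrium_solves = regions * periods * jacobian_columns  # 16,006 (> 15,000)
moments_per_region = moments_per_period * periods          # 26
total_moments = moments_per_region * regions               # 3,926 across all regions
```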

Main findings.

Demand shocks and the minimum wage are the primary drivers of falling inequality. In counterfactual simulations, the minimum wage alone (a 66.1 log point increase) reduces the variance of log wages by 0.13, and demand shocks reduce it by a further 0.18. Supply shocks (rising education) instead increase the variance by 0.04, so on net they do not contribute to the decline.
Supply shocks increase assortative matching despite compressing within-firm skill premiums. Within-firm task reassignment would reduce the variance of log wages by 0.221 and the correlation between worker and establishment effects by 0.165, holding production levels and firm entry fixed. However, scale, entry, and price adjustments — driven by the large estimated σ = 8.36 > β + 1 = 5 — reallocate skilled labor toward high-wage, skill-intensive firms, counteracting within-firm compression and raising assortative matching by 0.189. These two channels largely offset each other.
Concurrent supply and demand changes attenuate minimum wage impacts by roughly half. When the minimum wage is the only shock, it would have reduced the variance of log wages by 0.13; in the presence of supply and demand changes, its incremental contribution is approximately 0.07. Minimum wage effects on sorting (which would reduce assortative matching when acting alone) disappear when accompanied by supply and demand transformations.
Minimum wage effects are concentrated in the bottom two productivity deciles. Once all channels are considered, the minimum wage’s wage effects for workers in productivity deciles three through ten are approximately 1 percent or less. Strong wage gains are concentrated at the bottom, primarily through the monopsony channel. The wage-posting channel (within-firm returns to skill) reduces wages for low- and middle-skill workers and raises them in the top two deciles. This happens because reallocating low-skilled workers toward high-wage firms reduces those workers’ marginal products there.
Cross-firm differences in substitution patterns generate non-standard minimum wage spillovers. Conditional on the task demands of the firm employing them, a pair of worker types may be substitutes in low-skill firms and complements in high-skill firms. This firm-heterogeneity channel causes minimum wage impacts to be non-monotone across the productivity distribution, contrasting with the smooth inequality-reducing effects predicted by both competitive task-based models and frictional minimum wage models.
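The offsetting supply-shock channels and the attenuation of the minimum wage effect can be checked with simple arithmetic on the figures quoted above. Summing the two supply-shock channels treats them as additive, which is an approximation for a nonlinear model.

```python
within_firm_corr = -0.165  # within-firm task reassignment: change in worker-firm correlation
scale_entry_corr = 0.189   # scale, entry, and price adjustments: change in correlation
net_supply_corr = within_firm_corr + scale_entry_corr  # ~+0.024: matching rises slightly

mw_alone = 0.13        # fall in variance of log wages, minimum wage acting alone
mw_with_others = 0.07  # incremental fall alongside supply and demand changes
attenuation = 1 - mw_with_others / mw_alone  # ~0.46, "roughly half"

sigma, beta = 8.36, 4
reallocation_dominates = sigma > beta + 1  # condition for scale/entry effects to dominate
```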

Layer 2 — Q&A

Q1: What is the novel empirical fact that motivates the unified framework? A: Using KSS bias-corrected AKM decompositions performed separately for each of 151 microregions, the paper documents that assortative matching, measured as the correlation between worker and establishment fixed effects, rises in 104 out of 151 regions between 1998 and 2012. The covariance term accounts for less than 7 percent of the average decline in the variance of log wages. This finding is inconsistent with the leading imperfectly competitive minimum wage model (Engbom and Moser 2022), in which minimum wages reduce assortative matching. It is also inconsistent with purely competitive supply/demand models, which have no role for firm wage premiums or sorting. The paper attributes the divergence from prior national-level studies (which do not find rising sorting) to the fact that national-level sorting measures conflate geographic sorting with supply-demand dynamics.

Q2: What is the key mechanism through which the task-based production function generates cross-firm differences in substitution patterns? A: In the task-based production function, each firm assigns workers to tasks assortatively — lower types handle lower-complexity tasks, higher types handle higher-complexity tasks, with cutoff thresholds determined by the firm’s blueprint. When a firm has a blueprint concentrated in complex tasks (a high-skill, high-wage firm), adjacent worker types are more differentiated in the tasks they perform, making them complements. When a firm has a blueprint concentrated in simple tasks (a low-skill, low-wage firm), adjacent worker types are assigned to a narrow, similar range of tasks and are therefore closer substitutes. The elasticity of complementarity between any pair of worker types is thus endogenous, depending on which tasks the firm uses and, in the monopsony case, on the firm’s skill intensity — a prediction validated empirically using nonroutine cognitive task content data for Brazilian occupations.

Q3: Under what conditions can a positive supply shock (rising educational attainment) widen the aggregate skill wage premium rather than compress it? A: The paper’s Proposition 4 and Corollary 2 show that a supply shock that increases the relative supply of skilled workers can widen the aggregate skill wage premium when the elasticity of substitution between goods (σ) exceeds the firm-level elasticity of labor supply plus one (β + 1). Intuitively, when σ is large, the reduction in prices for skill-intensive goods generated by the supply shock shifts consumption toward those goods, causing net entry of skill-intensive firms. If the gains in firm wage premiums earned by skilled workers reallocated to those firms outweigh the compression in within-firm productivity differentials, the aggregate skill premium can rise. This mechanism does not require non-convexities from endogenous innovation; it operates through imperfect competition and firm entry alone. In the estimated Brazilian model, σ = 8.36 substantially exceeds β + 1 = 5, so this condition holds, explaining why rising education increases rather than compresses assortative matching in the data.

Q4: How does the model generate positive employment effects from minimum wages, and how do these interact with reallocation? A: In the monopsonistic baseline without a minimum wage, firms post wages below workers’ marginal revenue products, causing some workers to choose non-employment. A minimum wage increase raises posted wages at constrained firms, shifting some workers from non-employment (or home production) to formal employment, generating positive employment effects at the margin where the minimum wage binds. Simultaneously, minimum wages price out the least productive workers at low-wage firms (disemployment), while workers in the intermediate productivity range reallocate from low- to high-wage firms, because high-wage firms have higher revenue productivity and can profitably hire workers that low-wage firms can no longer afford. The net employment elasticity for the lowest productivity decile with respect to the log minimum wage is −0.61 (Table 7), while the mean wage for that decile rises substantially through the monopsony channel.

Q5: What are the three channels through which the minimum wage affects wages and employment in the model, and what does each channel contribute? A: The paper decomposes minimum wage effects into three channels. Channel 1 (monopsony): mechanical wage increases, positive employment effects at firms where the minimum wage binds, disemployment of very low-productivity workers, and reallocation from low- to high-wage firms, holding posted wage schedules, prices, and entry fixed. This channel accounts for nearly all of the strong wage effects at the bottom two productivity deciles. Channel 2 (wage posting): firms reoptimize earnings schedules following changes in worker composition and marginal products induced by Channel 1, holding prices and entry fixed. This channel reduces wages for low- and middle-skill workers (productivity deciles 1–7) by approximately 0.01–0.02 log points and increases wages for top deciles (decile 9: +0.04, decile 10: +0.11), because reallocation of low-skill labor to high-wage firms lowers those workers’ marginal products there. Channel 3 (general equilibrium): firm entry and price responses. The fall in low-wage-firm profits causes entry of high-wage, skill-intensive firms, while the price of low-skill goods falls. General equilibrium effects generate modest positive wage effects for most workers but negative effects for very low-productivity workers due to reduced aggregate demand for low-skill labor.

Q6: Why do the minimum wage’s inequality-reducing effects diminish when accompanied by concurrent supply and demand changes? A: The paper documents that, under concurrent supply and demand transformations, the minimum wage’s reduction of the variance of log wages is approximately 0.07, roughly half the 0.13 reduction it would achieve acting alone. The attenuation occurs through interactions: supply and demand shocks raise the average productivity level of the labor market and shift workers toward high-wage, skill-intensive firms. In this altered equilibrium, the minimum wage binds less tightly (or hits a different part of the distribution), and the reallocation effects of the minimum wage that would normally reduce assortative matching are offset by the sorting-increasing effects of supply and demand changes. The estimated model shows that interactions between the minimum wage and supply/demand changes (columns 6, 7, 8 of Table 5) are economically meaningful, something undetectable without a unified framework.

Q7: How does the model’s prediction regarding minimum wage spillovers differ from Engbom and Moser (2022), and what explains the difference? A: Engbom and Moser (2022) find that the Brazilian minimum wage hike had significant wage effects extending far up the worker productivity distribution, while this paper’s model finds negligible effects (approximately 1 percent) beyond the bottom two productivity deciles. Two structural differences explain this divergence. First, Engbom and Moser (2022) assume perfect substitutability between worker types within firms, so a minimum wage increase at low-wage firms mechanically raises posted wages at all other firms to maintain relative competitiveness. In this paper’s framework, wage-posting responses at high-wage firms can be negative for low-skill workers because the inflow of reallocated low-skill workers reduces their marginal products — a channel absent under perfect substitution. Second, Engbom and Moser (2022) use a national model, allowing displaced low-skill workers to reallocate to top-productivity firms anywhere in the country, dampening disemployment; this paper’s local labor markets approach restricts reallocation to within-region boundaries, consistent with low rates of interregional migration documented for Brazil by Dix-Carneiro and Kovak (2017).

Q8: How are firm wage premiums generated in the model, and why do differences in physical productivity between firms not generate wage differentials? A: Proposition 3 establishes that wage dispersion for similar workers across firms requires either (i) differences in blueprint shapes (firm heterogeneity in skill intensity) or (ii) differences in entry costs. Differences in physical productivity (z_g) or consumer taste parameters alone are insufficient, because with equal entry costs, differences in productivity lead to additional firm entry until the marginal revenue product of labor is equalized across firm types. Wage premiums proportional to entry costs arise because optimal firm creation requires larger-scale operation for higher-entry-cost firms, and hiring more workers forces those firms to post higher wages. Additionally, skill-intensive firms (firms with blueprints tilted toward complex tasks) pay relative wage premiums for the worker types they use most intensively, and if skill intensity and entry costs co-vary, all workers at high-skill firms may receive a wage premium.

Q9: How does the estimation procedure handle unobserved regional heterogeneity in labor demand? A: Demand shocks are not directly observed; they are inferred as a residual from changes in targeted outcomes after accounting for observed supply (education shares from Census) and minimum wage changes. Five region-time-specific demand parameters — TFP (z), blueprint complexities (θ₁, θ₂), relative entry costs (F₂/F₁), and relative consumer preferences (γ₂/γ₁) — are modeled as linear functions of 1998 regional covariates (educational shares, agricultural share, manufacturing share, and initial minimum wage bindingness) with time-specific coefficients. This formulation allows unobserved demand shifters to correlate with initial educational levels, preventing incorrect attribution of demand-supply correlations to causal supply effects. Region-specific parameters (TFP in each period, education-group-specific formal employment shifters) are inverted exactly from six targeted moments within each region, eliminating incidental parameter bias.

Q10: What micro-level empirical validations does the paper conduct for the task-based model’s mechanisms? A: The paper tests four micro-level predictions using nonroutine cognitive task content data for Brazilian occupations. First, skill-intensive firms have greater demand for complex tasks (consistent with the model, as depicted in the paper’s Figure 1). Second, within firms, more skilled workers are assigned to more complex tasks (Lemma 1). Third, workers who move to more skill-intensive firms are assigned more complex tasks (Lemma 2, consistent with the monopsony model’s mismatch prediction). Fourth, wage gaps between high- and low-skill firms are larger for skilled workers (Proposition 3). The paper reports strong support for all four predictions in the data, lending credibility to the theoretical structure and quantitative results.

Key Concepts

Task-based production function (paper’s definition): A production function in which a firm produces output by assigning workers of different types to tasks indexed by complexity. The optimal assignment is assortative: lower-type workers handle lower-complexity tasks, with unique threshold complexities separating adjacent worker types. The critical property is distance-dependent complementarity: worker types that are close in skill rank are substitutes, while types distant in skill rank are complements. This differs from CES production functions, where the elasticity of complementarity is the same for all pairs. In the task-based version, substitutability depends on the endogenous assignment and thus on the firm’s blueprint.

Blueprint (paper’s definition): A function b_g(x) that specifies the density of tasks of each complexity level x required to produce one unit of good g. It is the fundamental source of firm heterogeneity in the model: firms producing goods with blueprints tilted toward complex tasks are more skill-intensive, hire workers of higher average type, and pay higher wages. The paper parameterizes blueprints as Gamma distributions with shape parameter θ_g indexing average task complexity; firms with higher θ_g are more skill-intensive.
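The sketch below illustrates how a Gamma-shaped blueprint tilts a firm's task mix. It computes the share of a firm's task mass that falls into each worker type's complexity interval under assortative assignment. It assumes a Gamma blueprint with scale 1 and holds two hypothetical assignment thresholds fixed across firms. In the paper, the thresholds are instead determined endogenously by each firm's blueprint and wages. The function names, thresholds, and θ values are illustrative only.

```python
import math

def gamma_cdf(x, shape, scale=1.0):
    """Gamma CDF, i.e. the regularized lower incomplete gamma P(shape, x/scale),
    computed from its power series (adequate for moderate x/scale)."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape  # n = 0 term of sum_n z^n / (shape (shape+1) ... (shape+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total

def task_shares(theta, thresholds, scale=1.0):
    """Share of task mass handled by each worker type: tasks below thresholds[0]
    go to type 1, tasks between thresholds[0] and thresholds[1] to type 2, etc."""
    cdfs = [0.0] + [gamma_cdf(t, theta, scale) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdfs, cdfs[1:])]

thresholds = [1.0, 3.0]                          # hypothetical, fixed across firms
simple_firm = task_shares(1.0, thresholds)       # low theta: ~[0.63, 0.32, 0.05]
complex_firm = task_shares(3.0, thresholds)      # high theta: ~[0.08, 0.50, 0.42]
```

Raising θ shifts task mass toward the top complexity interval, which is the sense in which higher-θ firms are more skill-intensive.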

Firm wage premium (paper’s definition): The component of wages at a given establishment that accrues equally to all workers at that firm regardless of their type, measured as the establishment fixed effect ψ_j in AKM two-way fixed effects regressions. In this model, firm wage premiums arise from heterogeneity in blueprints (skill intensity) and entry costs, not from differences in TFP or consumer tastes. Under monopsony, firms with higher entry costs must operate at larger scale and post higher wages; blueprint heterogeneity generates differential wage premiums by skill type.

Sorting / assortative matching (paper’s definition): The association between the worker fixed effect (ν_{i,r}, capturing worker skill) and the establishment fixed effect (ψ_j, capturing the firm wage premium) in the AKM decomposition. It is measured by the covariance Cov(ν_{i,r}, ψ_{J(i,r,τ)} | r) and its normalized counterpart, the correlation. In this paper’s framework, sorting arises because firms with blueprints demanding complex tasks (high-wage firms) have a comparative advantage in employing high-skill workers. Labor market sorting can therefore change over time due to supply, demand, or minimum wage shocks, even without changes in search frictions.
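The plug-in version of the AKM variance decomposition can be sketched as follows. This is a minimal sketch that assumes a connected set of workers and firms and fits the two-way fixed effects by alternating least-squares updates. It omits the KSS leave-out bias correction the paper uses, so on real, noisy data its variance components would be biased. The function name and the toy panel are hypothetical.

```python
import numpy as np

def akm_decomposition(worker, firm, y, n_iter=1000):
    """Plug-in AKM: y = alpha[worker] + psi[firm] + e, fit by alternating
    least-squares (Gauss-Seidel) updates. No KSS bias correction."""
    worker, firm = np.asarray(worker), np.asarray(firm)
    y = np.asarray(y, dtype=float)
    n_w, n_f = worker.max() + 1, firm.max() + 1
    w_counts = np.bincount(worker, minlength=n_w)
    f_counts = np.bincount(firm, minlength=n_f)
    psi = np.zeros(n_f)
    for _ in range(n_iter):
        # Worker effect: mean of (y - firm effect) over the worker's observations.
        alpha = np.bincount(worker, weights=y - psi[firm], minlength=n_w) / w_counts
        # Firm effect: mean of (y - worker effect) over the firm's observations.
        psi = np.bincount(firm, weights=y - alpha[worker], minlength=n_f) / f_counts
    # Effects are identified only up to a constant within a connected set:
    # normalize firm effects to mean zero across observations.
    c = psi[firm].mean()
    a, p = alpha[worker] + c, psi[firm] - c
    return {
        "var_y": y.var(),
        "var_worker": a.var(),
        "var_firm": p.var(),
        "two_cov": 2 * np.cov(a, p, bias=True)[0, 1],
        "corr": np.corrcoef(a, p)[0, 1],
    }

# Hypothetical noiseless panel: 6 workers, 4 firms, 2 periods; movers connect firms.
true_alpha = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
true_psi = np.array([-0.2, 0.0, 0.1, 0.3])
worker = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
firm = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3])
y = true_alpha[worker] + true_psi[firm]
result = akm_decomposition(worker, firm, y)
```

With noiseless data the sketch recovers the true firm-effect variance and sorting correlation. These are identified even though worker and firm effects are pinned down only up to a common constant.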

Monopsony power / markdown (paper’s definition): Because workers have idiosyncratic preferences for firms (modeled as a nested logit), firms face upward-sloping labor supply curves with constant firm-level elasticity β. Optimal posted wages equal a constant fraction β/(β+1) of the marginal revenue product of labor. With β set to 4, wages are 80 percent of the marginal revenue product, a 20 percent markdown. The macro elasticity of formal sector labor supply is governed by a separate parameter λ, estimated from the data, yielding aggregate formal-sector supply elasticities of approximately 0.6–0.7 for college workers and around 1.1 for less-educated workers.
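The markdown formula can be checked by brute force. The sketch assumes a firm with a constant marginal revenue product of labor (mrpl) that faces a hypothetical labor supply curve L(w) = w**β, which has constant elasticity β. It maximizes profit (mrpl − w)·L(w) over a grid of wages. The function names are illustrative, and the paper's firms solve a richer problem.

```python
def wage_to_mrpl_ratio(beta):
    """Posted wage / MRPL under constant firm-level labor supply elasticity beta."""
    return beta / (beta + 1)

def optimal_wage_grid(mrpl, beta, n=100_000):
    """Grid search for the profit-maximizing wage with hypothetical labor supply
    L(w) = w**beta and constant marginal revenue product mrpl."""
    best_w, best_profit = 0.0, float("-inf")
    for k in range(1, n):
        w = mrpl * k / n
        profit = (mrpl - w) * w ** beta
        if profit > best_profit:
            best_w, best_profit = w, profit
    return best_w

beta = 4
ratio = wage_to_mrpl_ratio(beta)       # 0.8
markdown = 1 - ratio                   # 0.2, the 20 percent markdown
w_star = optimal_wage_grid(1.0, beta)  # grid optimum, close to 0.8
```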

Wage posting responses (paper’s definition): The second channel of minimum wage effects, in which firms reoptimize their entire earnings schedule following the wage-composition changes induced by the minimum wage’s mechanical and reallocation effects (Channel 1), while keeping goods prices and firm entry fixed. Because task-based production functions are concave, changes in factor proportions (due to reallocation of low-skill workers to high-wage firms) alter marginal products of all worker types within those firms, causing firms to adjust all posted wages — not just those directly constrained by the minimum wage.

Distance-dependent complementarity (paper’s definition): The property, proven as a Corollary to Proposition 1, that for a fixed worker type h, the partial elasticity of complementarity between h and any other type h’ is strictly increasing in h’ for h’ ≥ h (more distant high types are stronger complements) and strictly decreasing in h’ for h’ ≤ h (more distant low types are weaker substitutes / stronger complements). This pattern results from the division of labor: adding a very different worker type allows specialization gains that do not arise when adding similar-type workers competing for the same tasks.

The Earnings and Labor Supply of U.S. Physicians

Mon, 01 Jan 0001 00:00:00 +0000

Overview

Research Question. What do U.S. physicians earn, how is that earnings variation structured across geography and specialty, and how much does government healthcare payment policy shape those earnings and — through them — physicians’ labor supply and long-run talent allocation?

Data. The paper builds a novel administrative panel by merging the universe of U.S. federal individual income tax returns (2005–2017) with: the National Plan and Provider Enumeration System (NPPES) physician registry; Medicare billing records with procedure-level Relative Value Unit (RVU) rates (2012–2017); restricted-use American Community Survey responses; Social Security Administration demographic records; and medical school ranking and graduation data. The main sample covers 11.6 million physician-year observations for 965,000 unique physicians aged 20–70.

Earnings Facts. In 2017, average physician total individual income was $350,000 (median $265,000); the distribution is right-skewed — the top 1% of age-40–55 physicians averages $4.0 million. Physicians in aggregate earned $297 billion in pre-tax dollars, equaling 8.6% of total U.S. healthcare spending. The age-earnings profile is steep: earnings are approximately $60,000 during residency, rise to roughly $185,000 by the early thirties, and peak near $425,000 at age 50. Business income — systematically underreported in survey data (ACS estimates are approximately $140,000 lower than tax data during peak career years, almost entirely due to non-reporting of business income) — accounts for nearly one-quarter of earnings at age 50. Earnings differ sharply across specialties: primary care physicians average $201,200 (ages 40–55), about half the sample mean, while surgeons earn roughly twice as much.

Geographic Pattern. Contrary to the pattern for lawyers and workers broadly, physician earnings are not highest on the coasts. A movers-based event study (physicians who changed commuting zones once during 2005–2017) finds that roughly 70% of the cross-location income difference is driven by place rather than worker composition. A two-way fixed-effects variance decomposition reveals pronounced negative physician-location sorting: high-earning physicians tend to locate in lower-income commuting zones, while lower-earning physicians locate in higher-income areas — the opposite of the pattern for lawyers. Medicare’s relatively weak adjustment of reimbursement rates for local costs (the empirical elasticity of the Geographic Adjustment Factor to median household income is 0.09, versus 0.33 for a broader local price index) can, by the authors’ estimates, account for approximately one-third of this unusual geographic earnings pattern.

Government Influence — Medicare Price Changes. Using procedure-specific RVU changes as a simulated instrument for each physician’s Medicare price exposure, the authors find that a 10% increase in the Medicare price instrument leads to a 2.4% increase in professional earnings of physicians aged 40–55. The behavioral supply response is substantial: net of the mechanical price effect, physicians bill 4.4% more RVUs (a supply elasticity of about 0.44), of which 3.9 percentage points reflect more unique procedures and the rest a shift toward higher-paid procedures. Nearly all of the procedure-level supply increase (3.4 out of 3.8 percentage points) comes from treating additional patients rather than more frequent treatment of existing patients. Converting to pass-through: physicians retain $62 of each $100 in additional Medicare spending directly, or approximately $25 of each $100 of any insurance spending once Medicare’s documented spillover into private insurance rates is accounted for. For physicians aged 56–70, a 10% increase in earnings driven by reimbursement changes reduces retirement probability by 0.5 percentage points in that year.

Government Influence — ACA Insurance Expansion. Using county-level variation in pre-ACA uninsurance rates (as of 2013) as a source of differential exposure to the ACA’s Medicaid expansions and Marketplace subsidies (in 24 states expanding Medicaid in 2014 or early 2015), the authors estimate that a 10 percentage point higher baseline uninsurance rate led to 3.9% higher physician earnings four years post-expansion. Scaling by the first stage (a 10 p.p. higher uninsurance rate translating to 4.96 p.p. higher insurance coverage post-expansion), the implied elasticity of physician earnings to the insurance rate is 0.41. The ACA expansion also reduced retirement probability — a 10 p.p. higher insurance coverage rate leads to a 1 p.p. decline in retirement probability — consistent with a medium-run retirement-to-income elasticity of approximately −1.1. In aggregate, 6% of the $110 billion in annual ACA insurance expansion spending accrued to physicians personally, slightly below their 8.6% baseline share of healthcare spending.

Talent Allocation. Specialty choice is sticky and entry-restricted. The authors estimate a discrete-choice model of specialty choice using graduates of top-5 medical schools (physicians with effectively unconstrained specialty access) and an aggregate model using USMLE Step 1 score buckets as ability proxies. At the top of the ability distribution, higher specialty earnings strongly attract physicians: increasing primary care physicians’ hourly income from $98 to $168 (the level of medicine subspecialists) would raise the share of top-5 medical school graduates choosing primary care by approximately 20 percentage points, nearly doubling their representation in primary care. Moving down the USMLE score distribution, the earnings coefficient falls monotonically and turns negative for the lowest score groups. This is consistent with the model’s prediction that, under entry restrictions, rising earnings in a specialty draw in higher-ability applicants who displace lower-ability ones, rather than simply attracting more entrants. A more modest counterfactual, raising internal medicine earnings to dermatology levels, raises the average USMLE score in internal medicine by roughly 10 points (from 230.2 to 239.6).

Scope Conditions. The earnings estimates are for the period 2005–2017. Pass-through estimates use a short-run price instrument; long-run pass-through may differ depending on private market spillovers and entry. The ACA analysis is restricted to 24 early-expanding states. The specialty-choice model is estimated on medical graduates entering the residency match; the extensive margin of entering medicine itself is not modeled. Health outcome effects of changing physician ability distributions are not estimated.

Q&A

Q1: What is the level and composition of physician earnings in the tax data, and how do they compare to survey-based estimates?

In 2017, average physician total individual income was $350,000 and median was $265,000; the top 1% of age-40–55 physicians earned $4.0 million on average, more than twice the average of the top 5%. Business income constitutes nearly one-quarter of earnings at age 50 and is concentrated among top earners: 80% of physicians in the top 1% have business income exceeding $25,000, versus 35% overall. ACS survey data for the same physicians underestimate earnings by approximately $140,000 (roughly one-third of the administrative mean) during peak career years, driven entirely by non-reporting of business income on the extensive margin.

Q2: What share of total U.S. healthcare spending do physician earnings represent, and what does this imply for policy?

Physicians in aggregate earned $297 billion pre-tax in 2017, equaling 8.6% of total U.S. healthcare spending (approximately $913 of the average American’s $10,611 annual healthcare expenditure). After applying a 30% income tax rate, after-tax physician earnings equal approximately 6% of total healthcare spending, or roughly 1% of GDP. The authors note this provides an upper bound on the magnitude of savings available from policies aimed at reducing physician incomes as a strategy for lowering overall healthcare spending.
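The arithmetic linking these figures can be checked directly. The sketch below uses only numbers quoted in the answer; the implied total healthcare spending is derived here and is not itself reported in this summary.

```python
# Figures quoted in the text (2017, pre-tax).
physician_earnings_bn = 297.0        # aggregate physician earnings, $ billions
share_of_health_spending = 0.086     # 8.6 percent of total U.S. healthcare spending
per_capita_health_spending = 10_611  # average annual healthcare spending per American, $
tax_rate = 0.30                      # flat income tax rate applied in the text

# Implied total healthcare spending, $ billions (derived, not reported).
total_health_spending_bn = physician_earnings_bn / share_of_health_spending

# Physician share of the average American's annual healthcare spending, $.
per_capita_physician = share_of_health_spending * per_capita_health_spending

# After-tax physician earnings as a share of healthcare spending.
after_tax_share = share_of_health_spending * (1 - tax_rate)

print(round(total_health_spending_bn), round(per_capita_physician), round(after_tax_share, 3))
```

The output (about 3453, 913, and 0.06) corresponds to roughly $3.45 trillion in total spending, the $913 per-person figure, and the 6 percent after-tax share.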

Q3: How does the age-earnings profile of physicians evolve, and what drives growth during peak years?

Physician earnings average approximately $60,000 during residency, rise to roughly $185,000 by the early thirties, and peak near $425,000 at age 50, before declining gradually to approximately $270,000 in the late 60s. Growth during peak earning years (ages 40–55) is driven almost entirely by business income: average wages are approximately flat at $285,000 across this age range, while business income and the probability of filing Schedule C rise steadily.

Q4: How large and unusual is the geographic pattern of physician earnings, and what is the causal role of location?

Physician earnings are highest in lower-income states (not on the coasts), unlike lawyers and the broader workforce. A movers event study finds that approximately 70% of the cross-commuting-zone income difference is attributable to location rather than worker characteristics; within specialty the estimate rises to approximately 85%. A two-way fixed-effects variance decomposition (with limited-mobility-bias corrections following Andrews et al. 2008 and Kline et al. 2020) reveals pronounced negative physician-location sorting: the corrected covariance between individual and location effects is negative, with a magnitude of 0.6–0.8 times the variance of location effects. Applying the same methods to lawyers yields positive sorting instead.
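Once physician and location effects have been estimated, the sorting statistic above compares their covariance with the variance of location effects. The sketch below is a plain plug-in decomposition on simulated effects. It does not implement the limited-mobility-bias corrections of Andrews et al. (2008) or Kline et al. (2020), and the coefficient of −0.7 used to generate negative sorting is a hypothetical value chosen to fall inside the reported 0.6–0.8 range.

```python
import numpy as np


def sorting_decomposition(alpha: np.ndarray, psi: np.ndarray) -> dict:
    """Plug-in decomposition of var(alpha + psi) over person-year observations.

    alpha: physician effect for each observation.
    psi:   effect of the commuting zone where that observation occurs.
    No limited-mobility-bias correction is applied.
    """
    cov = np.cov(alpha, psi, ddof=0)
    var_alpha, var_psi, cov_ap = cov[0, 0], cov[1, 1], cov[0, 1]
    return {
        "var_alpha": var_alpha,
        "var_psi": var_psi,
        "two_cov": 2 * cov_ap,
        "cov_over_var_psi": cov_ap / var_psi,
    }


# Hypothetical effects: physician effects load negatively on location
# effects (negative sorting) plus independent noise.
rng = np.random.default_rng(0)
psi = rng.normal(0.0, 0.1, size=100_000)
alpha = -0.7 * psi + rng.normal(0.0, 0.3, size=100_000)

result = sorting_decomposition(alpha, psi)
print(result["cov_over_var_psi"])  # near -0.7 by construction
```

A negative `cov_over_var_psi` is the plug-in analogue of the negative sorting reported for physicians; for lawyers the same statistic would be positive.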

Q5: What instrument is used to identify the causal effect of Medicare price changes on physician earnings, and why is it valid?

The authors construct a physician-year “Medicare price instrument” by fixing each physician’s service mix at its 2012–2017 average and then multiplying those fixed quantities by annually-updated RVU rates, summing over services. Because the fixed quantity weights exclude behavioral responses, and because national RVU changes from CMS periodic reviews affect physicians differentially according to their pre-determined service mix, variation across physicians and over time is plausibly exogenous to individual physicians’ income shocks. Year-by-specialty fixed effects absorb common specialty-level price trends.
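The construction can be sketched as follows: average each physician's billed quantities over the sample window to get a fixed service mix, then price that fixed mix at each year's national RVU rates. The data layout, function name, and toy numbers are hypothetical. Averaging over all window years (counting unbilled years as zero) is an assumption about how the 2012–2017 average is taken, and the year-by-specialty fixed effects belong to the later regression stage, which is not shown.

```python
from collections import defaultdict


def medicare_price_instrument(quantities, rvu_rates):
    """Simulated Medicare price instrument for each (physician, year).

    quantities: {(physician, service, year): units billed}
    rvu_rates:  {(service, year): national RVU rate}; must cover every
                service in `quantities` for every year in the window.
    """
    years = sorted({year for (_service, year) in rvu_rates})

    # Fixed service mix: average annual quantity over the whole window,
    # so year-to-year behavioral changes in billing do not enter.
    totals = defaultdict(float)
    for (phys, svc, _year), units in quantities.items():
        totals[(phys, svc)] += units
    fixed_mix = {key: total / len(years) for key, total in totals.items()}

    # Price the fixed mix at each year's national RVU rates, summing over services.
    instrument = defaultdict(float)
    for (phys, svc), avg_units in fixed_mix.items():
        for year in years:
            instrument[(phys, year)] += avg_units * rvu_rates[(svc, year)]
    return dict(instrument)


# Hypothetical data: the office-visit RVU rises 10% in 2017; surgery is unchanged.
rvu_rates = {("office", 2016): 1.0, ("office", 2017): 1.1,
             ("surgery", 2016): 10.0, ("surgery", 2017): 10.0}
quantities = {("A", "office", 2016): 100, ("A", "office", 2017): 120,
              ("B", "surgery", 2016): 10, ("B", "surgery", 2017): 10}
instrument = medicare_price_instrument(quantities, rvu_rates)
```

Physician A's instrument rises by 10 percent (110 to about 121) even though A's actual billing rose by 20 percent, because the quantity weights are held fixed. Physician B's instrument is flat.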

Q6: What are the magnitudes of the earnings and labor supply responses to Medicare price changes?

A 10% increase in the Medicare price instrument raises earnings of 40–55 year-old physicians by 2.4% (reduced-form), with a 2SLS elasticity of income to billed RVUs of 0.17. The total-RVU billing coefficient of 1.437 implies a supply elasticity of 0.437 (subtracting 1 for the mechanical component). At the procedure level, a 10% price increase for a specific code leads to 3.8% more billings for that code, of which 3.4 percentage points reflects treating additional patients. For physicians aged 56–70, a 10% earnings increase reduces that year’s retirement probability by 0.5 percentage points.

Q7: How does the ACA insurance expansion affect physician earnings and retirement, and what is the implied pass-through?

Counties with a 10 percentage point higher pre-ACA uninsurance rate saw 3.9% higher physician earnings by 2017 (four years post-expansion). Scaled by the first stage (4.96 p.p. higher coverage), the elasticity of physician earnings to insurance coverage is 0.41. A 10 p.p. higher insurance coverage rate leads to a 1 p.p. lower retirement probability post-expansion (medium-run elasticity of retirement to income of approximately −1.1). In aggregate, 6% of $110 billion in annual ACA expansion spending — roughly $7.1 billion, or about $8,400 per physician — accrued to physicians.

Q8: How does the earnings-specialty choice relationship vary across the physician ability distribution?

In the individual-level discrete-choice model estimated on top-5 medical school graduates (likely unconstrained in specialty choice), the coefficient on hourly earnings is 0.014. In the aggregate score-group model, the implied earnings coefficient is 0.016 for USMLE scores above 260 and declines monotonically to −0.008 for scores at or below 190. This negative coefficient for low scorers is consistent with the theoretical prediction that higher earnings attract high-ability physicians, leaving fewer slots for lower-ability applicants due to binding entry restrictions — not a reversal of preferences.

Q9: What are the quantitative implications for specialty choice if primary care incomes were raised to subspecialty levels?

Raising primary care hourly income from $98 to $168 (the level of medicine subspecialists) would increase the share of top-5 medical school graduates choosing primary care by approximately 20 percentage points, to about 48 percent, nearly doubling their representation. Nearly half of these reallocations would come from procedural specialties. An analogous exercise raising internal medicine earnings to dermatology levels shifts the average USMLE score in internal medicine from 230.2 to 239.6, an increase of roughly 9.4 points, as higher-scoring applicants displace lower-scoring ones within a fixed slot constraint.
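In a multinomial logit, raising one option's hourly earnings by Δ multiplies its odds against all other options by exp(βΔ). The sketch below applies that formula with the individual-level coefficient β = 0.014 from Q8 and the $70 raise from $98 to $168. The baseline primary care share of 0.28 is a hypothetical value, not a figure from the paper. Holding all other utilities fixed is a simplifying assumption, and the paper's full model may contain terms this sketch omits.

```python
import math


def logit_share_after_raise(base_share: float, beta: float, raise_per_hour: float) -> float:
    """New choice share of one option after its hourly earnings rise.

    Multinomial logit: the option's odds against all other options are
    multiplied by exp(beta * raise); other options' utilities are unchanged.
    """
    boosted = base_share * math.exp(beta * raise_per_hour)
    return boosted / (boosted + (1.0 - base_share))


# Hypothetical baseline share; beta and the $98 -> $168 raise come from the text.
new_share = logit_share_after_raise(base_share=0.28, beta=0.014, raise_per_hour=168 - 98)
print(round(new_share, 3))
```

Under these assumptions the implied share is about 0.51, which shows how a coefficient of this size can produce a shift on the order of 20 percentage points.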

Q10: What is the pass-through from Medicare reimbursements to physician earnings, and how does it compare to rent-sharing elsewhere?

Direct estimates imply physicians retain $62 of each $100 in additional Medicare spending. Accounting for Medicare’s documented spillover into private insurance rates (following Clemens and Gottlieb 2017), the pass-through drops to $25 per $100 of total insurance spending. The authors note this is substantially higher than the modest rent-sharing found for average workers in response to firm-level shocks (Card et al. 2018), but comparable to rent-sharing with high-skilled workers benefiting from patent rents (Kline et al. 2019).

Q11: Can Medicare’s geographic pricing policy explain the unusual geographic earnings pattern for physicians?

The elasticity of Medicare’s Geographic Adjustment Factor (GAF) to commuting zone median household income is 0.09, compared to 0.33 for a broader local price index. Using the authors’ short-run estimate that a 10% increase in Medicare prices raises earnings by 2.4%, a counterfactual simulation shows that if the GAF-to-income elasticity rose to 0.33 (aligning Medicare rates with the general cost-of-living gradient), the geographic physician earnings pattern would more closely resemble that of lawyers. The authors estimate that the gap in Medicare’s local cost adjustment explains approximately one-third of the unusual physician earnings geography, conditional on the short-run pass-through estimate.
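The logic of the counterfactual can be illustrated with a log-linear back-of-envelope calculation. If the GAF's income elasticity rose from 0.09 to 0.33, the Medicare price gap between two commuting zones would widen by 0.24 times their log income gap, and earnings would respond with the short-run elasticity of 0.24 (2.4% per 10% price increase). Applying one average elasticity to every commuting zone is a simplification, and the paper's own simulation may be structured differently.

```python
def extra_log_earnings_gap(log_income_gap: float,
                           gaf_elasticity_actual: float = 0.09,
                           gaf_elasticity_counterfactual: float = 0.33,
                           earnings_price_elasticity: float = 0.24) -> float:
    """Additional log-earnings gap between two commuting zones if the GAF
    tracked income as closely as the broader local price index.

    The Medicare price gap widens by (counterfactual - actual GAF elasticity)
    times the log income gap; earnings respond with the short-run elasticity.
    """
    extra_price_gap = (gaf_elasticity_counterfactual - gaf_elasticity_actual) * log_income_gap
    return earnings_price_elasticity * extra_price_gap


# A commuting zone with roughly 10% higher median household income (log gap 0.10).
print(extra_log_earnings_gap(0.10))
```

The result is about 0.0058, i.e. physician earnings roughly 0.6 percent higher in the richer area relative to the status quo, which pushes the geographic pattern toward the positive income gradient seen for lawyers.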

Q12: How does the theoretical model of specialty choice and entry restrictions guide the empirical predictions?

The model features a unit mass of physicians with heterogeneous ability (Pareto-distributed) and idiosyncratic specialty preferences (exponentially distributed). Physicians choose whether to specialize in period 1; government sets reimbursement rates in period 2; physicians choose labor supply in period 3. With a fixed number of residency slots, higher specialty earnings raise the ability cutoff for entry (rationing by ability). This generates a key nonmonotonic empirical prediction: higher-ability physicians respond positively to earnings increases (choosing a specialty more frequently), while lower-ability physicians respond negatively (displaced by the shift upward in the ability cutoff). The model also implies that demand shocks are not moderated by contemporaneous entry, so incumbents capture the full rent — motivating the estimated pass-through.
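A stripped-down version of the rationing mechanism can be written in closed form. Ability is Pareto with minimum 1 and shape k, and each physician prefers the specialty if θ times its earnings exceeds an Exponential(1) distaste. Slots go to the highest-ability applicants, so the cutoff solves share × c^(−k) = slots. The parameter values (k = 3, θ = 0.01, slots = 0.05, earnings of 100 and 150) are hypothetical. Treating preferences as independent of ability and dropping the reimbursement and labor-supply periods are simplifying assumptions, so this is not the paper's full model.

```python
import math


def apply_share(earnings: float, theta: float = 0.01) -> float:
    """Share of physicians (at any ability) who prefer the specialty.

    Hypothetical form: prefer it if theta * earnings exceeds a distaste
    eps ~ Exponential(1), so the share is 1 - exp(-theta * earnings).
    """
    return 1.0 - math.exp(-theta * earnings)


def ability_cutoff(earnings: float, slots: float, k: float = 3.0, theta: float = 0.01) -> float:
    """Cutoff c such that applicants with ability >= c exactly fill the slots.

    With Pareto(min 1, shape k) ability, the applicant mass above c is
    apply_share * c**(-k); setting it equal to slots gives the formula below.
    """
    share = apply_share(earnings, theta)
    if share <= slots:
        return 1.0  # slots do not bind: every applicant is admitted
    return (share / slots) ** (1.0 / k)


def entry_probability(ability: float, earnings: float, slots: float,
                      k: float = 3.0, theta: float = 0.01) -> float:
    """Probability that a physician of the given ability enters the specialty."""
    if ability < ability_cutoff(earnings, slots, k, theta):
        return 0.0
    return apply_share(earnings, theta)


# Raise specialty earnings from 100 to 150 with slots fixed at 0.05.
for ability in (3.0, 2.4):
    before = entry_probability(ability, 100.0, 0.05)
    after = entry_probability(ability, 150.0, 0.05)
    print(f"ability {ability}: entry probability {before:.3f} -> {after:.3f}")
```

The high-ability physician becomes more likely to enter (about 0.63 to 0.78). The physician at ability 2.4 sits above the old cutoff (about 2.33) but below the new one (about 2.49) and is displaced. Together these produce the nonmonotonic response described above.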

Key Concepts

Medicare Price Instrument (Simulated RVU Instrument). A physician-year measure of Medicare payment exposure constructed by holding each physician’s service mix fixed at its 2012–2017 average and multiplying those fixed quantities by time-varying national RVU rates, then summing across services. This purges the instrument of behavioral responses, creating exogenous cross-physician variation in price exposure arising from the interaction of fixed service mix with national RVU policy changes.

Relative Value Unit (RVU). The unit by which Medicare defines and reimburses each physician service in the Physician Fee Schedule. RVUs are intended to reflect the time, effort, and resources required to provide each service, but are subject to periodic review by CMS’s RVU Update Committee (RUC) and influenced by political factors. Changes in RVUs translate directly into changes in Medicare reimbursement rates for affected services.

Pass-Through (Reimbursement to Earnings). The share of an additional dollar of Medicare (or insurance) spending that accrues to physicians personally as earnings, after accounting for practice costs, intermediaries, and behavioral responses. The paper estimates $62 per $100 of direct Medicare spending or $25 per $100 of total insurance spending (the latter accounting for Medicare’s spillover into private rates).

Negative Physician-Location Sorting. The empirical finding — robust to limited-mobility-bias corrections — that higher-ability (higher-earning) physicians disproportionately locate in lower-income commuting zones, while lower-earning physicians concentrate in higher-income areas. This is the opposite of the pattern for lawyers and for worker-firm matching in the broader labor literature. The paper attributes part of this pattern to Medicare’s incomplete geographic adjustment of reimbursement rates.

Ability Cutoff (a_m) in Residency Matching. In the paper’s theoretical model, the minimum ability level required to gain entry into a restricted-entry specialty. Because the number of residency slots is fixed, the cutoff rises when a specialty’s relative earnings increase (attracting more high-ability applicants), displacing lower-ability physicians who would otherwise have entered. This makes the earnings-specialty relationship nonmonotonic across the ability distribution.

Business Income (Pass-Through Entity Income). Income from physician-owned practices organized as sole proprietorships, S-corporations, or partnerships, reported on Schedule C or through pass-through entities rather than on Form W-2. In the tax data, business income accounts for nearly one-quarter of physician earnings at career peak and is the main source of earnings for top physicians, but is systematically underreported in survey data (ACS), leading to a roughly one-third underestimate of total earnings during peak years.

Geographic Adjustment Factor (GAF). A Medicare policy parameter that multiplies the national RVU rate to adjust physician reimbursements for local input costs (specifically physicians’ work, practice expenses, and malpractice). The paper documents that the GAF’s elasticity to local median household income is 0.09 — far below the 0.33 elasticity of the general local price index — constituting an effective subsidy to rural and lower-income markets relative to higher-income areas.

The Effect of Education Policy on Crime: An Intergenerational Perspective


This paper studies the intergenerational effects of education policy on crime, asking whether a compulsory schooling reform that reduced crime among those directly exposed also reduced crime among their children. The authors exploit the staggered municipal rollout of Sweden’s comprehensive school reform, implemented gradually between 1949 and 1962 across more than 1,000 municipalities, which increased compulsory schooling by one to two years, abolished tracking into academic and vocational streams after 6th grade, and introduced a uniform national curriculum. The parent generation consists of all individuals born in Sweden between 1945 and 1955 (approximately 447,000 men and 450,000 women), and their children form the child generation (426,721 sons observed from age 15 to 29). Crime is measured by administrative conviction records from the Swedish National Council for Crime Prevention covering 1973–2010.

The empirical strategy is difference-in-differences, comparing changes in conviction rates across cohorts in municipalities that implemented the reform at different times, with treatment assigned based on the parent’s birth municipality to avoid endogenous sorting bias. Standard errors are clustered at the municipality level. Parallel trends validity is supported by three tests: results are unchanged when municipality-specific linear trends are included, placebo tests using incorrect reform dates yield effects indistinguishable from zero, and residuals from crime regressions show no correlation with municipality-specific trends.
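The staggered-rollout comparison can be sketched as a two-way fixed-effects regression on simulated municipality-by-cohort cells, with sons' conviction rates indexed by the father's birth municipality and cohort. All data are hypothetical: the rollout years, the fixed effects, the baseline of 24, and the constant effect of −0.8 pp (chosen to echo the headline estimate). A constant effect is the case in which this regression recovers the true effect under staggered adoption. The sketch omits the municipality-clustered standard errors used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel of municipality-by-birth-cohort cells.
n_muni = 60
cohorts = np.arange(1945, 1956)                        # father birth cohorts 1945-1955
reform_cohort = rng.integers(1946, 1960, size=n_muni)  # first exposed cohort; > 1955 means never exposed here
true_effect = -0.8                                     # hypothetical constant effect, in pp

m_idx, c_idx = np.meshgrid(np.arange(n_muni), np.arange(cohorts.size), indexing="ij")
m_idx, c_idx = m_idx.ravel(), c_idx.ravel()
exposed = (cohorts[c_idx] >= reform_cohort[m_idx]).astype(float)

muni_fe = rng.normal(0.0, 2.0, n_muni)
cohort_fe = np.linspace(0.0, 1.0, cohorts.size)
conviction_rate = (24.0 + muni_fe[m_idx] + cohort_fe[c_idx]
                   + true_effect * exposed + rng.normal(0.0, 0.5, m_idx.size))

# Two-way fixed effects: municipality dummies, cohort dummies (first cohort
# omitted to avoid collinearity), and the reform-exposure indicator.
X = np.column_stack([
    np.eye(n_muni)[m_idx],
    np.eye(cohorts.size)[c_idx][:, 1:],
    exposed,
])
coef, *_ = np.linalg.lstsq(X, conviction_rate, rcond=None)
effect_hat = coef[-1]
print(f"estimated effect: {effect_hat:.2f} pp (true {true_effect} pp)")
```

Because municipalities adopt in different years, exposure varies within municipalities and within cohorts, which is what separates the reform effect from both sets of fixed effects.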

The main finding is a significant 0.79 percentage point (pp) decline in conviction rates among sons of fathers exposed to the reform (p-value < 0.002), representing a 3.4 percent reduction relative to baseline. The decline spans multiple crime types: violent crime fell by 0.27 pp, traffic-related crime by 0.45 pp, fraud by 0.22 pp, and other offenses by 0.41 pp — percentage reductions of three to six percent across categories. Multiple convictions fell by 0.43 pp (5.8 percent). These second-generation effects are driven entirely by paternal exposure: the impact of maternal reform exposure is an order of magnitude smaller and statistically insignificant, and the difference between paternal and maternal effects is itself significant (p-value 0.048 for any conviction, 0.009 for multiple convictions). Effects on daughters in the child generation are much smaller, with only the residual “other crime” category showing a significant 0.129 pp (15.5 percent) decline.

The asymmetry between paternal and maternal transmission is explained by the first-generation effects of the reform. For men, the reform increased schooling by 0.32 years, earnings by approximately 1 percent, the probability of white-collar employment by 1.2 percent, cognitive skills by 0.14 standard deviations, noncognitive skills by 0.17 standard deviations, spousal earnings by 1,022 SEK per year, and overall household income by approximately 1 percent. For women, the reform increased education by 0.21 years but did not raise earnings, household income, or white-collar employment, and did not reduce their already low crime rates. Only 13 percent of women in the 1945–55 cohorts were at or below the compulsory schooling threshold, versus 20 percent of men, substantially limiting the reform’s bite for women.

A mediation analysis decomposes the intergenerational transmission through three channels: fathers’ education accounts for 64.8 percent of the indirect effect, the decline in paternal crime accounts for 18.5 percent, and the increase in household disposable income accounts for 16.7 percent. The direct effect (unexplained by these mediators) accounts for 48 percent of the total effect. The paper also documents that children of treated fathers attended schools with lower peer crime rates and lived in neighborhoods with lower youth crime rates, supporting a neighborhood and peer effects channel alongside human capital and role-model channels.
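The reported shares can be converted into percentage-point contributions. Applying the published shares to the 0.79 pp headline estimate is a back-of-envelope reading. The paper's mediation model (following Heckman, Pinto, and Savelyev 2013) may be estimated on a specification whose total effect differs slightly.

```python
total_effect_pp = -0.79   # effect of paternal reform exposure on sons' conviction rate (pp)
direct_share = 0.48       # share of the total effect not explained by the mediators

# Shares of the indirect effect attributed to each mediator (they sum to 1).
mediator_shares = {
    "fathers' education": 0.648,
    "fathers' crime": 0.185,
    "household disposable income": 0.167,
}

indirect_pp = total_effect_pp * (1.0 - direct_share)
contributions_pp = {name: indirect_pp * share for name, share in mediator_shares.items()}

for name, value in contributions_pp.items():
    print(f"{name}: {value:.3f} pp")
print(f"direct effect: {total_effect_pp * direct_share:.3f} pp")
```

On this reading, roughly −0.27 pp runs through fathers' education, −0.08 pp through fathers' crime, −0.07 pp through household income, and about −0.38 pp is direct.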

Scope conditions: the main results cover sons observed to age 29 in Sweden (effects on daughters are much smaller). The results apply to a context of near-universal administrative records, a specific postwar schooling reform, and parent cohorts born 1945–1955 in a Nordic welfare state.

Q: What is the magnitude of the intergenerational crime reduction caused by the reform?

A: Sons of fathers exposed to the reform experienced a 0.79 pp decline in conviction rates (p-value < 0.002), corresponding to a 3.4 percent reduction relative to the baseline conviction rate of approximately 24 percent for the child generation by age 29. Multiple convictions fell by 0.43 pp, a 5.8 percent reduction. These magnitudes are similar in percentage terms to the direct crime reduction the reform caused among fathers themselves.

Q: Does the reform’s intergenerational effect on crime differ by the sex of the treated parent?

A: Yes. The intergenerational effect is driven entirely by paternal exposure to the reform: the effect of maternal exposure is an order of magnitude smaller and insignificant at any conventional significance level. The difference between paternal and maternal effects is statistically significant, with p-values of 0.048 for any conviction and 0.009 for multiple convictions. The paper attributes this asymmetry to the much weaker first-generation effects of the reform on women’s earnings, household income, crime rates, and neighborhood sorting.

Q: Which crime types declined significantly among sons of treated fathers?

A: Significant declines were found in violent crime (−0.27 pp, Romano-Wolf p-value 0.09), traffic-related crime (−0.45 pp, RW p-value 0.057), fraud (−0.22 pp, RW p-value 0.09), and other offenses (−0.41 pp, RW p-value 0.047), each representing a three-to-six percent reduction relative to the mean incidence of that crime type. Property crime and drug-related crime did not show significant declines.

Q: What were the direct effects of the reform on the parent generation’s human capital?

A: For men, the reform increased schooling by 0.32 years, earnings by approximately 1 percent, the probability of white-collar employment by 1.2 percent, cognitive skills by 0.14 standard deviations, and noncognitive skills by 0.17 standard deviations (both skill measures taken at military enlistment). Spousal earnings increased by 1,022 SEK per year and overall household income rose by approximately 1 percent. For women, education increased by 0.21 years and marriage market matches improved, but earnings, household income, and white-collar employment probability did not increase significantly.

Q: Why did the reform have stronger first-generation effects on men than on women?

A: The average share of individuals at or below the compulsory schooling threshold — the margin at which the reform was binding — was 20 percent for men but only 13 percent for women in the 1945–55 cohorts. Because fewer women were constrained by the old compulsory schooling limit, the reform increased their education by less and produced smaller downstream effects on earnings and labor market outcomes.

Q: What are the three channels through which the reform reduces child crime, and what is the relative contribution of each?

A: The paper identifies three channels: (1) the human capital channel, whereby increased parental education raises household income and child human capital; (2) the role model channel, whereby reduced paternal crime participation directly reduces son’s crime; and (3) the neighborhood and peer effects channel, whereby higher income enables sorting into lower-crime neighborhoods and better schools. The mediation analysis attributes 64.8 percent of the indirect effect to fathers’ increased education, 18.5 percent to the decline in paternal crime, and 16.7 percent to the increase in household disposable income. The direct effect unexplained by these three mediators accounts for 48 percent of the total effect.

Q: What is the role model effect, and how strong is it in the parent generation?

A: The role model channel operates through the strong intergenerational persistence in crime participation: sons are 2.06 times more likely to participate in crime if their fathers have been convicted (Hjalmarsson and Lindquist, 2012). The reform reduced the incidence of any conviction among treated men by 1.5 pp and repeat convictions by 1.5 pp — the latter representing an approximately 8 percent decline from a lower base. For women, the reform produced no reduction in crime, providing no analogous role model improvement through the maternal channel.

Q: How does neighborhood and school peer quality change for children of treated fathers versus treated mothers?

A: Sons of fathers exposed to the reform moved to neighborhoods with lower youth crime rates (−0.087 pp) and attended schools with lower peer crime rates (−0.077 pp). In contrast, sons of mothers exposed to the reform faced higher neighborhood crime rates (p-value 0.06) and higher school peer crime rates (p-value 0.01), the opposite of the paternal pattern. This asymmetry helps explain why only paternal treatment generates significant second-generation crime reductions.

Q: What happens to other outcomes for children of treated fathers beyond crime?

A: Sons experienced a 1.2 percentile-point increase in school GPA (RW p-value 0.05), a 2.3 pp increase in employment (RW p-value 0.04), a matching 2.3 pp decline in unemployment benefit receipt, a reduction in hospitalization of 2.4 days (17 percent, RW p-value 0.02), and a decline in prescribed drugs of 31 doses (2.8 percent, RW p-value 0.09). The decline in prescribed drugs for sons is driven by nervous system drugs and painkillers, pointing to improved mental health. Daughters of treated fathers show a significant reduction in welfare dependency but no other significant improvements.

Q: How does the paper validate the parallel trends assumption?

A: Three tests are reported. First, including municipality-specific linear trends leaves the main coefficient unchanged (p-value 0.85 for the trend terms themselves). Second, placebo contrasts using incorrect reform implementation dates produce effects indistinguishable from zero for all tested dates. Third, graphical inspection of regression residuals shows no correlation with municipality-specific trends. Together these provide strong support for the identifying assumption.

Q: Are the results sensitive to using a linear probability model instead of a nonlinear model?

A: A Monte Carlo experiment was conducted replicating observed crime rates across municipalities and imposing the estimated average treatment effect. Assuming the true data-generating process is a probit model, the linear probability model biases the estimated average effect upward by only 5 percent — a difference that is statistically indistinguishable from zero in the actual data — validating the OLS approach.
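The Monte Carlo procedure can be sketched as follows. First, generate binary outcomes from a probit model whose municipality intercepts reproduce given crime rates, and impose a treatment effect. Then estimate a linear probability model with municipality fixed effects and compare the mean estimate with the true average effect on the probability scale. The municipal crime rates, sample sizes, and latent-index coefficient of −0.03 are hypothetical. In this simplified design treatment is balanced within each municipality, so the LPM bias is close to zero up to simulation noise, not the 5 percent reported for the paper's design.

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
rng = np.random.default_rng(2)

n_muni, n_per, reps = 200, 500, 50
probit_coef = -0.03   # hypothetical treatment effect on the latent probit index

# Probit intercepts that reproduce hypothetical municipal conviction rates.
base_rate = rng.uniform(0.15, 0.35, n_muni)
intercept = np.array([std_normal.inv_cdf(r) for r in base_rate])

# True average effect on the probability scale under the probit DGP.
true_ate = np.mean([std_normal.cdf(a + probit_coef) - std_normal.cdf(a) for a in intercept])

muni = np.repeat(np.arange(n_muni), n_per)
estimates = []
for _ in range(reps):
    # Treat exactly half of each municipality's individuals at random.
    treated = np.concatenate(
        [rng.permutation(n_per) < n_per // 2 for _ in range(n_muni)]
    ).astype(float)
    latent = intercept[muni] + probit_coef * treated + rng.standard_normal(muni.size)
    y = (latent > 0).astype(float)
    # Within (municipality fixed effects) transformation, then the OLS slope on treatment.
    y_dm = y - (np.bincount(muni, weights=y) / n_per)[muni]
    d_dm = treated - (np.bincount(muni, weights=treated) / n_per)[muni]
    estimates.append(np.sum(y_dm * d_dm) / np.sum(d_dm ** 2))

relative_bias = (np.mean(estimates) - true_ate) / true_ate
print(f"true ATE {true_ate:.4f}, mean LPM {np.mean(estimates):.4f}, relative bias {relative_bias:.1%}")
```

Replacing the balanced assignment with the reform's actual cohort and municipality structure is what would let such an experiment reproduce the paper's design-specific bias.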

Q: What is the broader policy implication of the findings?

A: The results show that well-designed education policies can reduce crime not only among the directly treated generation but also among their children, amplifying the social benefits of reform across generations. The authors interpret this as consistent with the theoretical framework of Becker and Tomes (1979) on intergenerational transmission of human capital, and suggest that education policy evaluations that focus only on the treated generation substantially understate total social returns.

Intergenerational transmission of education reform effects: the phenomenon whereby an education policy that raises parental human capital produces improvements in children’s outcomes — including crime — through multiple channels including resource increases, parental role modeling, and neighborhood sorting, beyond any direct policy exposure of the child generation.

Comprehensive school reform (Sweden, 1949–1962): a nationally mandated restructuring of compulsory schooling that extended required attendance by one to two years, abolished selection into academic and vocational tracks after 6th grade, and introduced a uniform national curriculum, rolled out staggered across 1,055 Swedish municipalities.

Human capital channel: the mechanism by which increased parental education raises earnings and household income, enabling greater investments in children’s development and exploiting complementarity between parental and child human capital in the skill production function, thereby raising children’s opportunity cost of crime.

Role model channel: the mechanism by which reduced parental crime participation directly reduces children’s crime, operating through the transmission of norms and information across generations; identified empirically by the strong intergenerational correlation in convictions (sons with convicted fathers are 2.06 times more likely to be convicted themselves).

Neighborhood and peer effects channel: the mechanism by which increased parental income from the reform enables sorting into residential neighborhoods and schools with lower youth crime rates, exposing children to peers less involved in illegal activities and thereby reducing their own crime participation.

Mediation analysis: a decomposition method following Heckman, Pinto, and Savelyev (2013) that quantifies the share of a total treatment effect accounted for by specific intermediate variables (here: fathers’ education, fathers’ crime participation, and household disposable income) versus the direct unexplained effect.

Conviction rate: the proportion of individuals in a given generation and observation window who received at least one criminal conviction in Swedish administrative records; used as the primary outcome measure because it captures offenses that led to a court appearance, excluding minor infractions resolved by direct fine.
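
A stylized linear version can make the mediation decomposition concrete. Each mediator’s contribution is the treatment-induced shift in that mediator multiplied by the mediator’s effect on the outcome. Whatever remains is the direct (unexplained) effect. This is a sketch in the spirit of Heckman, Pinto, and Savelyev (2013), not the paper’s estimation code. The function name and every number in the example are hypothetical, not estimates from the paper.

```python
def mediation_shares(total_effect, mediator_shifts, outcome_coefs):
    """Stylized linear mediation decomposition (illustrative only).

    total_effect:    total treatment effect on the outcome
    mediator_shifts: {mediator: change in the mediator caused by the treatment}
    outcome_coefs:   {mediator: outcome effect of a one-unit change in the mediator}
    Returns each mediator's share of the total effect plus the residual
    direct (unexplained) share.
    """
    if mediator_shifts.keys() != outcome_coefs.keys():
        raise ValueError("mediator_shifts and outcome_coefs must name the same mediators")
    shares = {
        name: mediator_shifts[name] * outcome_coefs[name] / total_effect
        for name in mediator_shifts
    }
    # Computed before the key is added, so this sums only the mediator shares.
    shares["direct (unexplained)"] = 1.0 - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Hypothetical inputs: the reform lowers children's conviction rate by 2 pp.
    shares = mediation_shares(
        total_effect=-0.02,
        mediator_shifts={"father_education": 0.3, "father_crime": -0.01, "log_income": 0.02},
        outcome_coefs={"father_education": -0.01, "father_crime": 0.5, "log_income": -0.1},
    )
    for name, share in shares.items():
        print(f"{name}: {share:.2f}")
```

The mediator shares and the direct share sum to one by construction. The substantive work in the actual method lies in estimating the mediator shifts and outcome coefficients credibly.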

The Impact of EITC on Education, Labour Market Trajectories, and Inequalities

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies the effect of the Earned Income Tax Credit (EITC) on educational attainment and labor market trajectories through two complementary approaches. Using policy discontinuities at U.S. state borders—exploiting variation in state EITC generosity set as a percentage of the federal EITC—the paper finds that an increase in the state EITC leads to a statistically significant increase in the high school dropout rate. The mechanism is that a tax credit targeted at low-wage (low-skilled) workers increases the value of low-skilled employment and reduces the relative return to schooling, generating a powerful disincentive to pursue long-term studies. A structural life-cycle matching model with directed search and endogenous educational choices, search intensities, hirings, hours worked, and separations is developed to quantify the long-run general equilibrium effects: in the long run, EITC reduces the proportion of high-skilled workers, with ambiguous effects on income inequality that depend on the competing channels through which EITC affects both the supply and demand sides of the labor market.

Summary based on a working paper version, AI-assisted and human-reviewed. See the linked published article for the authoritative version.

In depth

Q1. What is the empirical strategy for identifying the effect of EITC on education?

The paper identifies the causal effect of state EITC on education by exploiting policy discontinuities at U.S. state borders, comparing contiguous PUMA pairs on opposite sides of state borders that differ in state EITC generosity. State EITC rates are set as a percentage of the federal EITC and have varied considerably since the mid-1980s. Borrowing from the minimum wage literature (Dube et al., 2010; Hagedorn et al., 2015), the border-discontinuity design controls for local labor market conditions that vary continuously across state borders while isolating the effect of the discrete EITC policy difference.

Q2. What is the labor market mechanism linking EITC to education?

EITC raises the value of low-skilled employment by directly increasing the earnings of low-wage workers. This reduces the relative return to investing in education and generates a powerful disincentive to pursue long-term studies. With directed search, which recent empirical studies support, educational decisions affect both job-finding probabilities and labor incomes over the life cycle. In this framework, EITC’s subsidy to low-skilled work compresses the education premium. It also raises the earnings a student forgoes by staying in school instead of taking an EITC-supported low-skilled job.
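
A stylized present-value comparison can illustrate the mechanism. The sketch rests on one simplifying assumption: the EITC acts as a proportional top-up to low-skilled earnings that high-skilled earners do not receive. The actual credit phases in and out with income, and the paper’s directed-search model is far richer. The function names, wages, horizon, and interest rate are all hypothetical.

```python
def present_value(flows, r):
    """Discount annual flows (first element = year 0) at interest rate r."""
    return sum(f / (1 + r) ** t for t, f in enumerate(flows))


def return_to_schooling(w_low, w_high, subsidy, years=40, r=0.03):
    """PV gain from one more year of school instead of working now (stylized).

    Dropout path: work every year at the low-skilled wage, topped up by a
    proportional subsidy (a hypothetical stand-in for the EITC).
    Stay path: no earnings in year 0, then the high-skilled wage, assumed
    too high to qualify for the subsidy.
    """
    dropout = [w_low * (1 + subsidy)] * years
    stay = [0.0] + [w_high] * (years - 1)
    return present_value(stay, r) - present_value(dropout, r)


if __name__ == "__main__":
    for subsidy in (0.0, 0.1, 0.2):
        gain = return_to_schooling(w_low=20_000, w_high=26_000, subsidy=subsidy)
        print(f"subsidy {subsidy:.0%}: PV gain from schooling ${gain:,.0f}")
```

A larger subsidy raises the value of the dropout path without changing the schooling path. The gain from staying in school therefore shrinks and, for a large enough subsidy, turns negative.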

Q3. What does the life-cycle matching model contribute?

The structural life-cycle matching model with directed search and endogenous educational choices, search intensities, hirings, hours worked, and separations quantifies the general equilibrium and long-run effects of EITC that purely reduced-form studies cannot capture—including the feedback of an expanded low-skilled labor force on equilibrium wages and job creation. The model endogenizes labor demand, capturing both household responses (education, hours, search intensity) and firms’ responses (job creation and destruction). It is solved and estimated to replicate the life-cycle profile of labor market variables.

Q4. What are the long-run implications for inequality?

In the long run, EITC reduces the proportion of high-skilled workers in the economy, with ambiguous effects on income inequality because of offsetting channels: EITC directly increases earnings of low-skilled workers, but by expanding the supply of low-skilled labor it may also depress low-skilled wages; additional channels through unemployed workers’ search effort and employed workers’ hours further complicate the net effect. The model is used to determine the optimal design of the EITC that balances the income-support objective against these unintended long-run effects.

Key concepts

State EITC: a supplement to the federal Earned Income Tax Credit set as a fixed percentage of the federal credit; varies across states; used in this paper as the identification source for the effect of EITC generosity on education via border discontinuities.

Directed search: a labor market framework in which workers and firms direct their search to specific submarkets with posted wages; in this setting, educational choice affects both job-finding probabilities and wages over the life cycle, amplifying the disincentive effects of EITC on education relative to random-search models.

Education-EITC disincentive: the mechanism by which EITC targeted at low-wage workers raises the relative value of low-skilled employment and reduces the return to schooling, generating an increase in high school dropout rates as a side effect of the anti-poverty policy.

The Power of Proximity to Coworkers

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies how physical proximity to coworkers affects on-the-job training and productivity, using software engineers at a Fortune 500 online retailer observed from 2019 to 2024. The authors exploit two quasi-experimental shocks to proximity: the office closures of 2020, which eliminated proximity differentials that previously existed across team types, and the firm’s subsequent return-to-office (RTO) mandates in 2022 and 2023, which restored proximity for co-located teams while leaving geographically-distributed teams apart. The core identification strategy is a difference-in-differences design comparing engineers whose teams were co-located in a single headquarters building to those whose teams were split across two buildings a ten-minute walk apart — a distinction that became immaterial once offices closed.

The central finding is that sitting near teammates substantially increases the digital feedback engineers receive on their code. Before the office closures, engineers on co-located teams received 23.9% (1.92 comments per program) more code review feedback than engineers on multi-building teams. Once offices closed, this advantage narrowed by 1.47 comments per program, an 18.3% reduction in feedback (p-value = 0.0026). The lost comments were disproportionately those that a machine-learning classifier predicted to be helpful, actionable, well-reasoned, and impactful. High-quality comments declined by 21–23%, more than the overall volume decline. Face-to-face and digital communication are complements, not substitutes: proximate engineers drew on a wider pool of reviewers and asked 48.4% more follow-up questions, a differential that vanished once offices closed.
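
The two percentages in this paragraph appear to be expressed relative to the same base. Dividing each level change by its percentage gives roughly 8.0 comments per program in both cases. This is a back-of-the-envelope check, not a figure the paper reports, and `implied_base` is a hypothetical helper.

```python
def implied_base(level_change, pct_change):
    """Base value implied when a change is reported both in levels and in percent."""
    return level_change / (pct_change / 100.0)


if __name__ == "__main__":
    print(round(implied_base(1.92, 23.9), 2))  # pre-closure co-location advantage -> 8.03
    print(round(implied_base(1.47, 18.3), 2))  # DiD effect of losing proximity -> 8.03
```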

Proximity’s effects are highly heterogeneous. Gains in feedback are concentrated among less-tenured, younger, and female engineers — those with the most to learn. Junior engineers on co-located teams lost 2.03 more comments per program upon office closure than junior engineers already on distributed teams (p-value = 0.001); young engineers lost 2.47 more comments (p-value = 0.0001). Female engineers lost 38.9% more comments than their distributed female counterparts (p-value < 0.0001), partly because women stop asking as many people for feedback when they cannot do so in person.

Proximity improves code quality for inexperienced engineers. Around the second RTO (three days per week), engineers on co-located teams became 2.2 percentage points less likely to add files subsequently deleted — a measure of churn — and 1.4 pp less likely to introduce bugs, relative to distributed teams (p-values of 0.041 and 0.022 respectively). These gains were roughly twice as large for less-tenured and younger engineers. The benefits persist: engineers who spent more pre-closure time on co-located teams continued to write higher-quality code during the fully remote period.

However, mentorship is costly for those who provide it. Senior engineers on co-located teams wrote 0.76 fewer programs per month in the main codebase before closures (p-value = 0.0005), a gap that closed when offices did and widened again during the second RTO. The firm faces a fundamental tradeoff: proximity accelerates junior engineers’ human capital development while reducing experienced engineers’ immediate coding output.

These dynamics shape hiring. The firm shifted toward hiring older, more experienced engineers during closures — buying talent it could no longer build in-house — and back toward younger hires once offices reopened. Nationally, young college graduates in remotable occupations (classified per Dingel and Neiman, 2020) experienced a 0.88 pp increase in unemployment between 2017–2019 and 2022–2024, while older graduates saw a marginal decline of 0.11 pp. A triple-difference estimate finds a 0.65 pp greater increase in young workers’ unemployment in remotable versus non-remotable occupations (p-value = 0.029), a pattern that predates generative AI diffusion and is robust to controlling for AI exposure. Back-of-the-envelope, remote work accounts for an estimated 64% of the total unemployment increase among young college graduates over this period.

The paper also documents that proximity is fragile: a ten-minute walk between two buildings reduces feedback as much as being multiple states away, and even a single distant teammate imposes negative externalities on those who remain co-located, reducing their feedback by 1.71 comments per program (p-value = 0.095) via a “one Zoom, all Zoom” norm.

Q: What is the main identification strategy for the office-closure analysis, and what is the key parallel-trends evidence?

A: The authors compare engineers on co-located teams (all members in one headquarters building) to those on multi-building teams (split across two buildings a ten-minute walk apart), before and after the March 2020 office closures. Co-located teams lost more proximity when offices closed, while multi-building teams experienced a smaller shock, enabling a difference-in-differences design. Pre-closure trends in feedback are parallel across the two team types (Figure I), supporting the identifying assumption. Standard errors are clustered by team, the unit of treatment assignment.
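
At its core, this comparison is a 2x2 difference-in-differences. The minimal sketch computes the point estimate from the four cell means. That estimate equals the interaction coefficient in an OLS regression of the outcome on a treated dummy, a post dummy, and their product. It does not compute the team-clustered standard errors the paper reports. The function name and the data are hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """2x2 difference-in-differences from (outcome, treated, post) rows.

    treated: 1 for co-located teams, 0 for multi-building teams
    post:    1 after the office closure, 0 before
    Point estimate only; no clustered standard errors.
    """
    def cell(t, p):
        return mean(y for y, tr, po in rows if tr == t and po == p)

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


if __name__ == "__main__":
    # Hypothetical comments per program for eight engineer-periods.
    rows = [
        (10.0, 1, 0), (10.0, 1, 0), (8.0, 1, 1), (8.0, 1, 1),
        (8.0, 0, 0), (8.0, 0, 0), (7.5, 0, 1), (7.5, 0, 1),
    ]
    print(did_estimate(rows))  # (8 - 10) - (7.5 - 8) = -1.5
```

Subtracting the multi-building change removes shocks common to both groups, such as a firm-wide drop in reviewing. This is why the design relies on the parallel pre-trends shown in Figure I.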

Q: How large is the effect of proximity on total code review feedback, and how is it broken down by feedback source?

A: Before closure, co-located engineers received 23.9% (1.92 comments per program) more feedback than multi-building engineers. The DiD estimate indicates that losing proximity reduced feedback by 18.3% (1.47 comments per program, p-value = 0.0026, Column 3 of Table II). This decline stems entirely from reduced feedback from teammates; there is no detectable effect on feedback from engineers on other teams — a placebo check that supports the identification strategy and rules out explanations based on differential project complexity.

Q: How does proximity affect the quality — not just the quantity — of code review comments?

A: Using a gradient-boosted decision tree trained on 5,377 human-labeled comments, the authors predict comment quality across all 174,014 comments. Losing proximity reduced comments predicted to be helpful, well-reasoned, actionable, and likely to change the code by 21–23% — exceeding the 18.3% overall volume decline. The residual comments were lower quality: 2.9 pp fewer were helpful (p-value = 0.039), 1.7 pp fewer explained their reasoning (p-value = 0.094), and 1.9 pp fewer were likely to change the code (p-value = 0.072).

Q: What mechanisms drive the complementarity between face-to-face interaction and digital feedback?

A: Proximity increases feedback on both the extensive and intensive margins. On the extensive margin, co-located engineers draw on a wider pool of reviewers, returning less frequently to the same commenter. On the intensive margin, losing proximity reduces follow-up questions by 48.4% (0.12 questions per program, p-value = 0.0083), accounting for roughly half of the total feedback decline. The other half comes from reduced initial reviewer feedback. References to other communication channels (e.g., Slack) within code reviews also decline when proximity is lost, confirming that face-to-face and digital communication are complements.

Q: How small a physical barrier is sufficient to reduce feedback substantially?

A: A ten-minute walk between two buildings on the same headquarters campus reduces feedback by as much as being multiple states away — both groups receive significantly less feedback than engineers whose entire team sits in the same building (Figure Ib). This finding aligns with research on academics showing that different floors or buildings reduce coauthorship, and extends it to daily teammates sharing projects.

Q: What are the externality effects of a single distant teammate?

A: Through the firm’s implicit “one Zoom, all Zoom” norm, even one teammate in a different location shifts all team meetings to video calls. Engineers in the same building exchange 14.5% less feedback when even one teammate is in another building versus when all teammates are co-located (p-value = 0.037). When a new hire transforms a co-located team into a multi-building one, feedback between the original co-located teammates drops by 1.71 comments per program (p-value = 0.095); adding a new co-located hire produces no such decline.

Q: How does the effect of proximity on feedback differ by engineer tenure, age, and gender?

A: Less-tenured engineers on co-located teams lost 2.03 more comments per program upon closure than less-tenured engineers on distributed teams (p-value = 0.001). Young engineers (under 29) on co-located teams lost 2.47 more comments per program than young distributed engineers (p-value = 0.0001). Female engineers on co-located teams lost 38.9% (3.71) more comments than female engineers on distributed teams (p-value < 0.0001), partly because women draw feedback from 14.7% fewer people when proximity is lost (p-value = 0.0078), compared to a negligible 2.6% decline for men. The extra feedback women receive in person is of higher quality, not rude or condescending.

Q: How is the effect of proximity on code quality identified using the RTO design, and what are the magnitudes?

A: The RTO design compares engineers on co-located (same-city) teams to geographically-distributed teams across three periods: full closure, first RTO (two days per week), and second RTO (three days per week). The authors predict γ_closed ≈ 0 (office assignment irrelevant when closed) and γ_2nd_RTO > γ_1st_RTO (more in-office days means more proximity). Both predictions are confirmed. During the second RTO, co-located engineers were 2.2 pp less likely to add files later deleted (p-value = 0.041) and 1.4 pp less likely to introduce bugs (p-value = 0.022), with effects roughly twice as large for less-tenured and younger engineers.

Q: Does the benefit of co-location on code quality persist after remote work resumes?

A: Yes. After all engineers returned to remote work, those who had been on co-located teams pre-closure were 2.37 pp less likely to write disposable code (p-value = 0.013) and 3.09 pp less likely to introduce bugs (p-value = 0.0012). Code quality improves monotonically with the number of pre-closure months spent on co-located teams (Figure A.5). These gaps persist when including current team fixed effects, meaning within the same post-closure team, the previously co-located engineer writes higher-quality code.

Q: What is the cost of mentorship for senior engineers, and how does it manifest in coding output?

A: Senior engineers on co-located teams wrote 0.76 fewer programs per month in the main codebase when offices were open (p-value = 0.0005). Once offices closed, this gap disappeared, and senior engineers who lost proximity to their teammates saw a relative increase in output of 0.58 programs per month (p-value = 0.0014). During the second RTO, engineers with more than sixteen months of tenure on co-located teams wrote fewer programs, while no significant difference emerged for less-tenured engineers. Overall, the DiD estimate indicates losing proximity to teammates increases immediate output by 0.48 programs per month (p-value = 0.0002).

Q: How does the firm’s hiring age distribution respond to changes in proximity?

A: When offices were closed, the firm shifted toward hiring older engineers: the share of hires under age 29 fell from over half pre-closure to less than a third during the closure. After the RTOs, the firm shifted back toward younger hires. Geographic variation reinforces this: headquarters-campus hires were 7–10 years younger than those hired into distributed roles when offices were open; this gap narrowed substantially during closures when everyone was far from teammates.

Q: Does proximity affect which engineers are poached by other firms?

A: Yes. During the office closures, 1.2% of co-located engineers were poached per month, compared to 0.9% of multi-building engineers of similar tenure, age, and engineering group (p-value = 0.044). By the end of the closure period, nearly a quarter of co-located engineers had been poached versus a sixth of multi-building engineers. There is a dose response: more pre-closure time on co-located teams predicts higher poaching rates. The effect is concentrated among younger and female engineers, consistent with their feedback building more transferable general human capital. Tenure does not moderate the poaching effect, consistent with less-tenured engineers’ feedback being more firm-specific.
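
A small gap in monthly rates compounds over a long closure. The sketch shows only the mechanics of a constant monthly rate applied to engineers not yet poached. The 24-month horizon is a hypothetical stand-in for the closure length. The paper’s cumulative shares come from its data rather than from this formula, so the printed values need not match "a quarter" and "a sixth" exactly.

```python
def cumulative_share(monthly_rate, months):
    """Share of a cohort poached within `months`, assuming a constant monthly
    rate that applies each month to engineers not yet poached."""
    return 1.0 - (1.0 - monthly_rate) ** months


if __name__ == "__main__":
    HORIZON = 24  # hypothetical closure length in months, not taken from the paper
    for label, rate in (("co-located", 0.012), ("multi-building", 0.009)):
        print(label, round(cumulative_share(rate, HORIZON), 3))
```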

Q: What does national unemployment data show about the scarring effects of remote work on young workers?

A: Between 2017–2019 and 2022–2024, young college graduates (under 29) in remotable occupations experienced a 0.88 pp increase in unemployment (p-value < 0.00001), while older graduates in the same occupations saw a marginal decline of 0.11 pp (p-value = 0.053). A triple-difference regression finds a 0.65 pp greater increase in young workers’ unemployment in remotable versus non-remotable occupations (p-value = 0.029). Back-of-the-envelope, scaling this estimate by the 61% share of young graduates in remotable jobs predicts a 0.4 pp increase in young college graduates’ overall unemployment — equal to 64% of the realized 0.63 pp increase.
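
The back-of-the-envelope step works as follows:

1. Multiply the triple-difference estimate by the share of young graduates in remotable jobs.
2. Divide the result by the realized increase.

With the rounded inputs quoted here, this gives about 0.40 pp and roughly 63%. The one-point difference from the reported 64% presumably reflects rounding of the published inputs. The function name is hypothetical.

```python
def remote_work_share(triple_diff_pp, remotable_share, realized_increase_pp):
    """Back-of-the-envelope attribution of an unemployment increase.

    triple_diff_pp:       extra rise in young workers' unemployment in remotable jobs (pp)
    remotable_share:      share of young graduates working in remotable jobs
    realized_increase_pp: realized rise in young graduates' overall unemployment (pp)
    Returns the predicted overall increase (pp) and its share of the realized one.
    """
    predicted_pp = triple_diff_pp * remotable_share
    return predicted_pp, predicted_pp / realized_increase_pp


if __name__ == "__main__":
    predicted, share = remote_work_share(0.65, 0.61, 0.63)
    print(f"predicted increase: {predicted:.2f} pp, share of realized: {share:.0%}")
    # prints: predicted increase: 0.40 pp, share of realized: 63%
```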

Q: Is the unemployment increase among young workers in remotable jobs driven by generative AI rather than remote work?

A: The authors argue against AI as the primary driver on two grounds. First, the uptick in young workers’ unemployment in remotable occupations predates the rapid diffusion of generative AI. Second, the differential increase is not concentrated among occupations with the highest AI task exposure. The triple-difference estimate is robust to controlling for occupational AI exposure using the Eisfeldt, Schubert and Zhang (2023) index. The authors acknowledge that AI may become more important as it diffuses further.

Q: How do young workers’ own office attendance decisions reflect the value of proximity?

A: At the partner firm, engineers under 29 were 8.8 pp (37.6%) more likely to come into the office during the RTOs than older engineers when on co-located teams (solid line in Figure VIIa). This difference was roughly halved on geographically-distributed teams (p-value of difference = 0.0085), indicating that the draw is specifically proximity to teammates. Co-located managers raised attendance by 2.6 pp, while co-located teammates raised it by 5.1 pp. Nationally, Stack Overflow survey data show nearly half of engineers under 25 are in the office each day, versus a quarter of older engineers (p-value < 0.00001).

Q: What does the paper imply about why remote work was rare before the pandemic despite workers’ stated preferences for it?

A: The paper offers a resolution: firms may have recognized that the value of the office lies in training for tomorrow and in improving the quality, not the quantity, of work today. Remote work boosts immediate output, especially for experienced workers, but it reduces mentorship and long-run skill development. Two tradeoffs can explain why firms historically resisted remote work even though workers preferred it and short-run output would not have suffered: current versus future productivity, and individual versus collective returns to human capital.

Q: What are the implications for gender equity in remote work?

A: The findings suggest remote work has ambiguous gender effects. While remote work may help working mothers remain in the workforce, it appears costly for young women’s professional development, which is especially sensitive to physical proximity. Women receive substantially more high-quality feedback when co-located, draw feedback from a wider network in person, and lose disproportionately more feedback when proximity is lost. Young female engineers on co-located teams were also disproportionately poached — suggesting their human capital gains from co-location are more general and transferable.

Code review feedback: The digital comments engineers exchange when reviewing each other’s code before it is merged into the live codebase; the paper’s primary measure of on-the-job training and mentorship investment. Beyond counting comments, the authors use supervised machine learning to classify them by helpfulness, reasoning, actionability, and expected impact.

Co-located team: A team in which all members are assigned to the same office building; the treatment group in the difference-in-differences designs, distinguished from multi-building teams (split across two headquarters buildings, a ten-minute walk apart) and geographically-distributed teams (members in different cities or permanently remote).

One Zoom, all Zoom norm: The implicit team practice of holding all meetings virtually if any single teammate cannot be physically present; the mechanism by which one distant colleague generates negative externalities for the remaining co-located teammates, reducing their in-person interaction and feedback.

Proximity fragility: The finding that even small physical barriers — a ten-minute walk between buildings — reduce feedback as much as being multiple states away, implying that the relationship between physical distance and mentorship is highly nonlinear near zero.

Churn (disposable code): Files that are added by an engineer but deleted within the subsequent six months, either because the code was poorly structured or because it introduced a feature later abandoned; used as one of two code quality proxies in the RTO analysis (occurring in 15% of programs).

Bugs (immediate reversions): Programs that are immediately and fully reverted after being merged, typically indicating the engineer’s changes precipitated an emergency requiring rollback to an earlier version; used as the more serious of the two code quality proxies (occurring in 3.5% of programs).

Scarring effects: The persistent adverse impact on young workers’ human capital and labor market outcomes from reduced mentorship during the remote work period; manifested both as lower code quality at the individual level and higher unemployment rates nationally among young college graduates in remotable occupations.

Remotable occupation: An occupation classified by Dingel and Neiman (2020) as feasibly performed from home; used to construct the national triple-difference analysis comparing age gaps in unemployment across remotable and non-remotable jobs before and after the pandemic.
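
The two code-quality proxies defined above can be expressed as program-level flags. The sketch rests on three assumptions, none of them from the paper:

- The record layout (`AddedFile`, `Program`) is hypothetical.
- "Six months" is approximated as 182 days.
- A reversion counts as "immediate" if it happens within one day of the merge. The definition does not specify a window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumptions (not from the paper): "six months" approximated as 182 days, and a
# reversion counts as "immediate" if it happens within one day of the merge.
CHURN_WINDOW = timedelta(days=182)
IMMEDIATE_WINDOW = timedelta(days=1)


@dataclass
class AddedFile:
    """A file an engineer added in a program (hypothetical record layout)."""
    path: str
    added_on: date
    deleted_on: Optional[date] = None


@dataclass
class Program:
    """A merged program with its added files and reversion status (hypothetical)."""
    merged_on: date
    added_files: List[AddedFile]
    fully_reverted: bool = False
    reverted_on: Optional[date] = None


def is_churn(program: Program) -> bool:
    """True if any file the program added was deleted within the churn window."""
    return any(
        f.deleted_on is not None and f.deleted_on - f.added_on <= CHURN_WINDOW
        for f in program.added_files
    )


def is_bug(program: Program) -> bool:
    """True if the whole program was reverted within the 'immediate' window."""
    return (
        program.fully_reverted
        and program.reverted_on is not None
        and program.reverted_on - program.merged_on <= IMMEDIATE_WINDOW
    )


def share_flagged(programs: List[Program], flag) -> float:
    """Share of programs for which `flag` (is_churn or is_bug) is True."""
    return sum(flag(p) for p in programs) / len(programs)
```

Applied to a full set of programs, `share_flagged(programs, is_churn)` and `share_flagged(programs, is_bug)` would give the base rates quoted above (15% and 3.5%) if the hypothetical windows matched the paper’s definitions.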

The Productivity of Professions: Evidence from the Emergency Department

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies the productivity of nurse practitioners (NPs) versus physicians performing overlapping tasks in Veterans Health Administration (VHA) emergency departments (EDs), exploiting a quasi-experiment created by the VHA’s December 2016 grant of full practice authority to NPs. The identification strategy instruments patient assignment to NPs versus physicians using quasi-random variation in the number of NPs on duty on a given ED-day, conditional on ED-by-time-category fixed effects. The sample covers 1.1 million ED visits across 44 VHA EDs from January 2017 to January 2020, seen by 1,348 physicians and 156 NPs. The instrument is validated by demonstrating balance in patient observable characteristics across values of the instrument, stability of IV estimates across 256 combinations of patient covariate controls, and absence of spillover effects from NP presence onto physician performance.

On average in the ED setting, NPs increase patient length of stay by 11 percent (approximately 18 additional minutes) and raise the cost of the ED visit by 7 percent (approximately $66 per visit). NPs raise the 30-day preventable hospitalization rate by 0.25 percentage points, a 20 percent increase relative to the mean. No statistically significant effect on 30-day mortality is detected (95 percent confidence interval: -0.34 to 0.11 percentage points). OLS estimates carry the opposite sign because NPs are assigned healthier patients in observational data; the IV design corrects for this selection.

The average NP-physician performance gap varies systematically by case complexity and severity. For the highest-complexity quartile of cases (by Elixhauser comorbidities), NPs increase ED costs by 12 percent and length of stay by 28 percent. For cases at or above the 95th percentile of severity (based on 30-day mortality by diagnosis), NPs increase ED costs by 25 percent, length of stay by 99 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points — suggesting that NPs’ higher care intensity partially offsets worse intrinsic skill for the most severe cases. For lower-complexity cases, the cost and length-of-stay gaps are smaller, but NPs still significantly raise preventable hospitalizations.

NPs exhibit clinical decision-making patterns consistent with lower diagnostic skill: they are more likely to order consults (2.6 percentage points, or 11 percent of the mean), CT scans (1.2 percentage points, or 8.3 percent), and X-rays (2.0 percentage points, or 6.9 percent). NPs lower opioid prescriptions by 1.8 percentage points (20 percent of the mean) and raise antibiotic prescriptions by 4.0 percentage points (6.3 percent of the mean), consistent with threshold adjustment under lower diagnostic skill with asymmetric error costs. Downstream, patients treated by NPs incur similar opioid use disorder rates despite lower opioid prescribing, and higher infection-related return visit rates despite higher antibiotic prescribing.

Counterfactual analysis finds that allocating one quarter of ED patients to NPs increases net spending by $129 million per year to the VHA after accounting for NPs’ lower wages (approximately half of physicians’). However, deploying NPs exclusively to the least-complex quarter of cases reduces net spending to approximately one-fifth of this amount.

A distributional analysis deconvolving provider-specific IV estimates reveals that within-profession productivity variation substantially exceeds the average between-profession gap. The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of pairs. Within professions, individual provider productivity shows essentially no relationship with wages or case complexity assigned, whereas between professions, case assignment and wages are strongly sorted by professional class.

Q: What is the core research question? A: The paper asks whether NPs and physicians, who perform overlapping tasks in the ED but differ sharply in training, selectivity, and pay, differ in productivity, and how that average between-profession difference compares to productivity variation within each profession. It also asks what mechanisms drive any observed gap and how case assignment responds to provider skill differences.

Q: What is the identification strategy and why is it credible? A: The authors instrument patient assignment to NPs with the number of NPs on duty on the ED-day, conditional on ED-by-year, ED-by-month, ED-by-day-of-week, and ED-by-hour fixed effects. Credibility rests on: provider schedules being set months in advance, decoupling NP availability from arriving patient characteristics; patient characteristics being well balanced across values of the instrument conditional on fixed effects; IV estimates being stable across all 256 covariate-control combinations; and on-duty physician and NP characteristics also being balanced across the instrument.

Q: What are the main average effects of NPs on resource use? A: IV estimates show NPs increase patient length of stay by 11 percent (approximately 18 minutes) and ED cost by 7 percent (approximately $66 per visit). There is no significant average effect on inpatient admissions in the overall sample, though NPs significantly raise admissions for high-severity cases.

Q: What is the effect of NPs on patient health outcomes? A: NPs raise 30-day preventable hospitalizations by 0.25 percentage points, a 20 percent increase relative to the mean. The 95 percent confidence interval for 30-day mortality is -0.34 to 0.11 percentage points, implying no statistically significant mortality effect in the overall sample.

Q: Why do OLS and IV estimates have opposite signs? A: In observational data, NPs treat healthier patients than physicians: NP patients are younger (60.7 versus 62.5 years), have fewer Elixhauser comorbidities (3.2 versus 3.7), and have fewer prior inpatient stays (0.4 versus 0.7). This selection causes OLS estimates of NP effects to be negative. The IV corrects for this by exploiting quasi-random variation in NP availability; IV estimates are stable across all combinations of patient controls, consistent with the instrument being orthogonal to unobservable patient health.
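
A small simulation can illustrate how this selection flips the sign of OLS. Everything in it is hypothetical:

- Sicker patients are less likely to be assigned to an NP and also stay longer.
- The instrument (NPs on duty) shifts assignment but is drawn independently of sickness.
- The true NP effect on the outcome is set to a positive 0.1.

Unlike the paper, the sketch uses no fixed effects or patient controls. It computes the single-instrument IV (Wald) ratio directly.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_ols_vs_iv(n=200_000, true_effect=0.1, seed=0):
    """Return (OLS slope, IV estimate) of the NP effect on a simulated outcome.

    Hypothetical data-generating process: sicker patients are less likely to see
    an NP and have longer stays, so OLS is biased downward. The instrument (NPs on
    duty) shifts NP assignment but is drawn independently of sickness.
    """
    rng = random.Random(seed)
    on_duty, seen_by_np, stay = [], [], []
    for _ in range(n):
        z = rng.randint(0, 2)                      # 0, 1 or 2 NPs on duty
        sickness = rng.gauss(0.0, 1.0)             # unobserved by the analyst
        d = 1.0 if z - sickness + rng.gauss(0.0, 1.0) > 1.0 else 0.0
        y = 1.0 + true_effect * d + 0.5 * sickness + rng.gauss(0.0, 1.0)
        on_duty.append(z)
        seen_by_np.append(d)
        stay.append(y)
    ols = sample_cov(seen_by_np, stay) / sample_cov(seen_by_np, seen_by_np)
    iv = sample_cov(on_duty, stay) / sample_cov(on_duty, seen_by_np)  # Wald ratio
    return ols, iv


if __name__ == "__main__":
    ols, iv = simulate_ols_vs_iv()
    print(f"OLS: {ols:.3f}  IV: {iv:.3f}  (true effect: 0.1)")
```

In this setup, OLS comes out negative because NP patients are healthier. The IV estimate recovers a value near the true positive effect because the instrument is unrelated to sickness.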

Q: How does the NP-physician performance gap vary with case complexity and severity? A: For the highest-complexity quartile, NPs increase length of stay by 28 percent and ED costs by 12 percent without a significant preventable hospitalization effect. For cases at or above the 95th severity percentile, NPs increase length of stay by 99 percent, ED costs by 25 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points. For lower-complexity quartiles, NPs show smaller cost and length-of-stay effects but significantly raise preventable hospitalizations, suggesting the higher care intensity at high severity compensates for lower skill.

Q: What does the heterogeneity by severity imply for optimal case assignment? A: The pattern is consistent with skill-task matching: NPs have a comparative and absolute disadvantage in complex cases, so optimal assignment directs less complex cases to NPs and fewer patients to NPs when physicians are more available. Empirically, NPs are indeed assigned healthier patients from the available pool, and are assigned a modestly smaller share when the ED is less busy.

Q: What mechanisms explain the average NP-physician gap? A: Three mechanisms are examined. First, experience: a one-standard-deviation increase in specific experience is associated with a 5.8 percent decline in the NP-physician length-of-stay gap, and general experience with a 10 percent decline; however, experience does not significantly narrow the preventable hospitalization gap. Second, information acquisition: NPs order more consults, CT scans, and X-rays, consistent with compensating for lower diagnostic skill. Third, prescription thresholds: NPs reduce opioid prescribing by 20 percent and raise antibiotic prescribing by 6.3 percent, consistent with threshold adjustment under asymmetric error costs, but downstream outcomes are not improved correspondingly.

Q: What do prescription patterns and downstream outcomes reveal about NP diagnostic skill? A: NPs prescribe fewer opioids, yet their patients have similar downstream rates of opioid use disorder; NPs prescribe more antibiotics, yet their patients have higher rates of return visits with infections. A pure threshold shift would trade one type of error for the other, so this pattern is consistent with NPs making more of both false positives and false negatives, which points to genuinely lower diagnostic skill rather than threshold differences alone.
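The distinction between a threshold shift and lower skill can be sketched in an equal-variance signal-detection model. This is an assumed textbook model with hypothetical parameters (a 30 percent prior need rate and detectability d' of 2 versus 1), not the paper's estimation framework. Moving the threshold at fixed skill lowers one error rate and raises the other; lower skill at the same prescription rate raises both.

```python
from math import erf, sqrt

PRIOR = 0.3  # hypothetical share of patients who truly need the drug


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def error_rates(d_prime, t):
    """Signal ~ N(d', 1) if the condition is present, N(0, 1) if absent; prescribe when signal > t.
    Returns (prescription rate, false-positive rate, false-negative rate)."""
    fpr = 1.0 - norm_cdf(t)
    fnr = norm_cdf(t - d_prime)
    rate = PRIOR * (1.0 - fnr) + (1.0 - PRIOR) * fpr
    return rate, fpr, fnr


def threshold_for_rate(d_prime, target, lo=-10.0, hi=10.0):
    """Bisection: the prescription rate falls as t rises, so search for the t that hits `target`."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if error_rates(d_prime, mid)[0] > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


high_skill = error_rates(2.0, 1.0)
shifted = error_rates(2.0, 1.3)  # same skill, stricter threshold
low_skill = error_rates(1.0, threshold_for_rate(1.0, high_skill[0]))  # lower skill, same rate

for name, (rate, fpr, fnr) in [("high skill", high_skill),
                               ("same skill, shifted", shifted),
                               ("low skill, same rate", low_skill)]:
    print(f"{name:22s} rate={rate:.3f} FPR={fpr:.3f} FNR={fnr:.3f}")
```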

Q: What do counterfactual cost calculations show? A: Allocating one quarter of ED patients to NPs raises the VHA's non-wage spending by $197 million per year; even after accounting for NP wages being half of physician wages (approximately $120,000 versus $240,000 per year), the net cost is still $129 million per year. Restricting NP deployment to the least-complex quarter of cases reduces net spending to approximately one-fifth of this amount, illustrating that targeted case assignment substantially improves NP cost-effectiveness.
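The figures above imply two quantities the text leaves implicit: the annual wage savings (the gap between the non-wage increase and the net cost) and the approximate net cost of targeted deployment. The arithmetic below uses only the numbers in the answer; "one-fifth" is taken literally, so the targeted figure is approximate.

```python
non_wage_increase = 197.0  # $ millions per year (from the text)
net_cost = 129.0           # $ millions per year, after NP wage savings (from the text)

wage_savings = non_wage_increase - net_cost   # implied annual wage savings
targeted_net = net_cost / 5                   # "approximately one-fifth" of the net cost
print(f"Implied wage savings: ${wage_savings:.0f}M per year")
print(f"Targeted deployment net cost: about ${targeted_net:.1f}M per year")
```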

Q: How large is within-profession productivity variation relative to between-profession differences? A: The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of random pairs. The authors conclude that, despite stark differences in training and selection between professions, within-profession variation dominates.
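As a rough consistency check, assume (hypothetically) that spending attributable to productivity is normally distributed within each profession with the same spread, and that the mean NP-minus-physician gap is one-third of the $900,000 IQR. The probability that a random NP has lower spending than a random physician then follows from the normal distribution of their difference. The paper's figure comes from its deconvolved distributions, not from this shortcut.

```python
from math import erf, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


IQR = 900_000             # within-profession IQR (from the text)
MEAN_GAP = IQR / 3        # assumed NP-minus-physician mean gap ("roughly three times" smaller)
Z75 = 0.6744897501960817  # 75th percentile of the standard normal

sigma = IQR / (2 * Z75)   # implied SD under normality, assumed equal across professions
# An NP "outperforms" when S_NP - S_MD < 0, where S_NP - S_MD ~ N(MEAN_GAP, 2 * sigma**2)
p_np_better = norm_cdf(-MEAN_GAP / (sqrt(2) * sigma))
print(f"P(random NP beats random physician) = {p_np_better:.3f}")
```

Under these assumptions the probability lands near 0.37 to 0.38, in line with the "up to 38 percent" reported above.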

Q: Is individual provider productivity reflected in wages or case assignment within professions? A: Within each profession, provider productivity shows essentially no relationship with wages or with the complexity of assigned cases. This contrasts sharply with between-profession patterns, where professional class strongly predicts both wages (NPs earn approximately $120,000 per year versus $240,000 for physicians) and assigned case complexity. The authors interpret this as evidence of informational and organizational frictions that keep individual productivity from being recognized within professional classes: pay and case assignment track professional class far more closely than individual productivity.

Q: How do complier characteristics relate to the broader patient population? A: Compliers — cases whose provider type is determined by the instrument — are healthier than the average case: younger, with fewer comorbidities, fewer prior inpatient stays, and lower predicted mortality. Never-takers are riskier than the average case. There are no always-takers since patients cannot be assigned to NPs on days when no NPs are on duty.
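Complier characteristics can be estimated even though compliers are not individually identifiable. With a binary instrument and binary treatment, under independence and monotonicity, the complier mean of a covariate is (E[XD|Z=1] - E[XD|Z=0]) / (E[D|Z=1] - E[D|Z=0]). The sketch below collapses the NP count to a hypothetical binary "any NP on duty" instrument and simulates data with no always-takers; the ages and shares are invented for illustration.

```python
import numpy as np


def complier_mean(x, d, z):
    """Mean of covariate x among compliers for binary instrument z and binary treatment d."""
    z1, z0 = z == 1, z == 0
    num = (x * d)[z1].mean() - (x * d)[z0].mean()
    den = d[z1].mean() - d[z0].mean()
    return num / den


rng = np.random.default_rng(2)
n = 200_000
z = rng.integers(0, 2, size=n)                # hypothetical: 1 if any NP is on duty
complier = rng.random(n) < 0.4                # no always-takers; everyone else is a never-taker
age = np.where(complier, 60.0, 65.0) + rng.normal(0.0, 10.0, size=n)
d = (complier & (z == 1)).astype(float)       # NP only if a complier and an NP is on duty

est = complier_mean(age, d, z)
print(f"complier mean age: {est:.1f}, population mean age: {age.mean():.1f}")
```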

Q: How does this paper relate to the literature on NP scope-of-practice laws? A: The scope-of-practice literature estimates general-equilibrium effects of allowing NPs greater autonomy, including labor reallocation between professions. This paper instead estimates the partial-equilibrium causal effect of assigning a patient to an NP versus a physician, holding the broader labor market fixed. The two literatures are complementary: the heterogeneity findings here suggest that scope-of-practice expansions may be more beneficial in lower-complexity primary care settings where the NP-physician performance gap is smaller.

Q: What are the policy implications of the findings? A: Three implications are highlighted. First, the efficiency of using NPs depends critically on case assignment: deploying NPs on the least-complex cases reduces net costs to approximately one-fifth of those under indiscriminate deployment. Second, the substantial overlap between NP and physician productivity distributions provides support for NP use in less complex settings even within the ED context. Third, because within-profession productivity variation far exceeds between-profession differences, individual-level productivity assessment, rather than professional class, may be a more accurate guide to case assignment and compensation.

Quasi-experimental variation in NP availability: The identification strategy exploits day-to-day variation in the number of NPs scheduled to work in a given VHA ED, conditional on ED-by-time-category fixed effects, as an instrument for whether a patient is assigned to an NP versus a physician. Schedules are set months in advance, rendering the NP count orthogonal to arriving patient characteristics conditional on those fixed effects.
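The orthogonality claim is typically checked with a balance test: regress a patient characteristic on the NP count, absorbing the fixed effects, and confirm the slope is near zero. The sketch below uses hypothetical data in which busier-NP cells also have older patients, so the raw comparison is unbalanced while the within-cell comparison is not. It illustrates the idea only and does not replicate the paper's tests.

```python
import numpy as np


def demean(x, g):
    """Subtract cell means (absorbs cell fixed effects)."""
    means = np.bincount(g, weights=x) / np.bincount(g)
    return x - means[g]


def balance_t_stat(x, z, g):
    """Homoskedastic t-statistic from regressing x on z with cell fixed effects."""
    xt, zt = demean(x, g), demean(z, g)
    slope = (zt @ xt) / (zt @ zt)
    resid = xt - slope * zt
    dof = len(x) - len(np.unique(g)) - 1
    se = np.sqrt((resid @ resid) / dof / (zt @ zt))
    return slope / se


rng = np.random.default_rng(3)
n, n_cells = 50_000, 500
g = rng.integers(0, n_cells, size=n)                      # hypothetical ED-by-time cells
cell_level = rng.normal(size=n_cells)                     # hypothetical cell trait
z = rng.poisson(np.exp(0.5 * cell_level[g])).astype(float)  # NP count varies across and within cells
age = 60 + 3 * cell_level[g] + rng.normal(0, 10, size=n)  # age tracks cells, not daily NP counts

t_raw = balance_t_stat(age, z, np.zeros(n, dtype=int))    # no fixed effects
t_fe = balance_t_stat(age, z, g)                          # with cell fixed effects
print(f"t without FE: {t_raw:.1f}, t with FE: {t_fe:.2f}")
```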

30-day preventable hospitalization: A standardized quality-of-care outcome defined by the Agency for Healthcare Research and Quality, measuring hospitalizations occurring within 30 days of ED discharge that are classified as preventable given adequate prior outpatient management. Used by the paper as the primary downstream health outcome beyond the ED visit itself.
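A minimal sketch of how the 30-day window might be coded, assuming each patient has a list of later admissions already flagged as preventable or not (the AHRQ classification itself is taken as given). The window convention (strictly after discharge, up to and including day 30), the function name, and the dates are all hypothetical.

```python
from datetime import date, timedelta


def preventable_within_30_days(discharge, admissions, window_days=30):
    """1 if any admission flagged preventable falls in (discharge, discharge + window_days].
    `admissions` is a list of (admit_date, is_preventable) pairs; the window convention is assumed."""
    end = discharge + timedelta(days=window_days)
    return int(any(discharge < admit <= end and preventable for admit, preventable in admissions))


visit = date(2017, 3, 1)
history = [(date(2017, 3, 5), False), (date(2017, 3, 20), True), (date(2017, 5, 2), True)]
print(preventable_within_30_days(visit, history))
```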

Elixhauser comorbidities: A set of 31 binary indicators for chronic conditions (e.g., cancer, diabetes) based on medical histories in the prior 365 days, used in this paper to measure and stratify case complexity into quartiles for heterogeneity analysis.

Productivity distributions within professions: Provider-specific productivity estimates derived from a just-identified IV model that instruments assignment to individual providers with indicators for on-duty providers, then deconvolved into underlying distributions using the methods of Efron (2016) and Kline, Rose, and Walters (2022). These distributions characterize the spread of true productivity within each professional class, net of estimation noise.
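The core idea of deconvolution is that noisy provider estimates are more dispersed than true productivity. The Efron and Kline-Rose-Walters approaches recover the full distribution; the sketch below swaps in a much simpler method-of-moments correction that recovers only the standard deviation, Var(true) ≈ Var(estimates) - mean(SE²). The simulated providers and standard errors are hypothetical.

```python
import numpy as np


def deconvolved_sd(estimates, std_errors):
    """Moment-based SD of true effects: Var(estimates) minus mean squared SE, floored at zero."""
    signal_var = np.var(estimates, ddof=1) - np.mean(np.square(std_errors))
    return np.sqrt(max(signal_var, 0.0))


rng = np.random.default_rng(4)
n_providers = 20_000
true_effects = rng.normal(0.0, 1.0, size=n_providers)  # hypothetical true productivity (SD 1)
se = rng.uniform(0.3, 0.8, size=n_providers)           # hypothetical per-provider standard errors
estimates = true_effects + rng.normal(0.0, se)         # noisy provider-level estimates

sd_hat = deconvolved_sd(estimates, se)
print(f"raw SD: {estimates.std(ddof=1):.2f}, deconvolved SD: {sd_hat:.2f}")
```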

Prescription threshold adjustment: The mechanism, formalized in Chan, Gentzkow, and Yu (2022), by which providers with lower diagnostic skill optimally adjust treatment thresholds in response to asymmetric costs of false-positive versus false-negative errors. In this paper’s application, NPs lower the opioid prescription rate (where false positives carry higher costs: addiction and overdose) and raise the antibiotic prescription rate (where false negatives carry higher costs: untreated infection), but downstream outcomes do not improve correspondingly.
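The threshold logic can be sketched in an assumed equal-variance signal-detection model (not necessarily the exact specification in Chan, Gentzkow, and Yu). A provider prescribes when the posterior probability of need exceeds c_FP / (c_FP + c_FN). With a hypothetical 30 percent prior, a less skilled provider (lower d') prescribes less when false positives are costly and more when false negatives are costly, matching the opioid and antibiotic directions described above. The cost values are illustrative.

```python
from math import erf, log, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def optimal_rate(d_prime, prior, cost_fp, cost_fn):
    """Prescription rate when prescribing iff P(need | signal) > cost_fp / (cost_fp + cost_fn).
    Signal ~ N(d', 1) if the condition is present, N(0, 1) if absent."""
    k = cost_fp / (cost_fp + cost_fn)
    # Posterior log-odds = log(prior / (1 - prior)) + d' * x - d'^2 / 2; solve for the cutoff x*.
    cutoff = (log(k / (1 - k)) - log(prior / (1 - prior))) / d_prime + d_prime / 2
    return prior * (1 - norm_cdf(cutoff - d_prime)) + (1 - prior) * (1 - norm_cdf(cutoff))


PRIOR = 0.3
for label, c_fp, c_fn in [("opioid-like (FP costly)", 7.0, 3.0),
                          ("antibiotic-like (FN costly)", 1.0, 3.0)]:
    high = optimal_rate(2.0, PRIOR, c_fp, c_fn)
    low = optimal_rate(1.0, PRIOR, c_fp, c_fn)
    print(f"{label:28s} high skill {high:.3f}  low skill {low:.3f}")
```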

Skill-task matching: The organizational economics principle (Acemoglu and Autor 2011) that efficiency requires assigning more complex tasks to higher-skilled workers. The paper documents that between professions, case assignment broadly follows this principle (NPs receive less complex patients on average), but within professions, essentially no matching between individual provider productivity and case complexity is observed.

Full practice authority (VHA, December 2016): The VHA policy that allowed NPs to treat patients independently, without physician supervision, at VHA facilities, superseding state-level restrictions. This policy change defines the start of the paper’s sample period and establishes the institutional context for the quasi-experiment: throughout the sample, NPs practice without any requirement of physician oversight.