I18 | Macro Paper Warehouse

Designing Dynamic Reassignment Mechanisms: Evidence from GP Allocation

This paper studies the design of dynamic reassignment mechanisms—centralized systems that must not only provide good initial matches but also accommodate changes in agents’ preferences over time. The empirical setting is Norway’s system for allocating patients to general practitioners (GPs), where every individual is assigned a specific GP whose panel has a binding capacity cap. Since 2016, Norway has allowed patients to join waitlists for oversubscribed GPs while retaining their spot on their current GP’s panel, with reassignment proceeding strictly first-come, first-served (FCFS) as vacancies arise.

The paper makes three contributions. First, it provides direct evidence of unrealized gains from trade: in December 2019, 15 percent of the 133,332 patients then standing on waitlists could have been immediately reassigned via a single run of the Top-Trading Cycles (TTC) algorithm, which identifies not only bilateral swaps but arbitrary cycles. A mechanical simulation holding patient choices fixed shows that running TTC monthly from November 2016 through December 2019 would have left 23 percent fewer patients on waitlists by end-2019, with average waiting times among reassigned patients 29 percent shorter.

Second, the paper introduces a dynamic TTC mechanism and clarifies why static properties do not carry over. In the static case, TTC is both strategy-proof and Pareto-improving (Shapley and Scarf, 1974; Roth, 1982). In a dynamic setting, neither property holds. Repeated TTC is not strategy-proof because patients’ GP choices affect how long they wait. More importantly, TTC may leave some patients worse off: a panel slot that would have gone to the first person on a waitlist under FCFS may instead go to a later-arriving patient who can form a trading cycle, effectively de-prioritizing patients whose GPs are undersubscribed. In the mechanical simulation, 4.5 percent of patients face longer waiting times under TTC.

Third, the paper estimates a structural model of patient attention and GP choice using monthly Norwegian administrative data covering 4.78 million patients and 6,470 GP panels (2014–2019), restricting estimation to the Trondelag region (approximately 8 percent of the country). The model specifies: a Poisson attention process (patients consider switching only when an attention shock arrives); preferences over GPs as a function of travel time, GP fixed effects, and match characteristics; and a belief model mapping observed waitlist lengths into expected waiting times. Parameters are recovered via a Gibbs sampler with Metropolis-Hastings for the discount rate. Key estimates: the annual discount factor is approximately 0.91; a female patient under 45 would travel 7.3 minutes farther to see a female GP (6.3 minutes for a female patient over 45); GP fixed effects have a standard deviation of 31 minutes’ travel-time equivalent; idiosyncratic taste shocks have a standard deviation of 12.6 minutes.

The paper then simulates a stationary equilibrium for each counterfactual mechanism. Under the status quo in stationary equilibrium, 9.4 percent of patients are on a waitlist, 82.2 percent of GPs have a waitlist, and average expected waiting time is 16.7 months. Introducing TTC reduces average waiting time to 14.1 months and raises mean patient welfare by the equivalent of 0.75 minutes’ travel time (more than 13 percent of the gain achievable under a no-capacity-constraints benchmark). Over half of this gain (0.4 minutes) comes directly from patients obtaining geographically closer GPs. Benefits are concentrated among younger patients, female patients, and recent movers; rural patients gain 2.1 minutes. However, patients with undersubscribed GPs face waiting times that rise from 16.7 to 22.8 months and are worse off by the perpetuity equivalent of 0.8 minutes.

Two modified mechanisms are evaluated. Deferred Acceptance (DA), which strictly respects FCFS priority, achieves essentially no improvement over the status quo, illustrating a fundamental trade-off between eliminating envy and exploiting gains from trade. A “TTC with Priority” (TTCP) mechanism, which gives priority for panel vacancies to patients with undersubscribed GPs before running TTC, achieves 61 percent of TTC’s welfare gains (0.46 minutes flow payoff; 1.08 minutes NPV) while leaving patients with undersubscribed GPs no worse off than under the status quo. A benchmark simulation eliminating waitlists altogether raises mean welfare slightly (0.19 minutes) but lowers median welfare (−0.60 minutes), with gains concentrated among highly mismatched patients.

Q: What is the core market failure the paper documents? A: Norway’s waitlist mechanism assigns panel vacancies strictly first-come, first-served without allowing patients to trade. This creates a “double coincidence of wants” problem: patients can simultaneously be on each other’s waitlists but cannot swap. In December 2019, 15 percent of 133,332 waiting patients could have been immediately reassigned via a single TTC run. A mechanical simulation shows that monthly TTC would have left 23 percent fewer patients on waitlists by end-2019 and reduced average realized waiting times among reassigned patients by 29 percent.

Q: Why does TTC fail to be strategy-proof in a dynamic setting? A: In the static case, a patient’s report affects only which GP they receive, and truthful reporting is a dominant strategy under TTC (Roth, 1982). In a dynamic setting, a patient’s choice of GP determines not only which GP they receive but also how long they wait: patients who choose less-demanded GPs reach the front of the waitlist faster. This creates an incentive to request a GP other than the most preferred one in order to wait less, breaking strategy-proofness. The paper shows this formally and builds it into the equilibrium model by requiring patients to optimize over both GP choice and expected waiting time.

Q: Why does dynamic TTC harm some patients relative to the status quo? A: Under FCFS, the first person on a waitlist is guaranteed the next available slot on the target GP’s panel. Under TTC, a patient who arrived later but whose current GP is oversubscribed can form a trading cycle that redirects that slot, effectively jumping the queue. Patients with undersubscribed GPs — whose panel endowment is not a scarce resource that others want — cannot form cycles and are systematically de-prioritized. In the stationary equilibrium, their expected waiting time rises from 16.7 to 22.8 months, and they are worse off by the perpetuity equivalent of 0.8 minutes’ travel time.

Q: What are the main parameter estimates and what do they imply? A: The annual discount factor is estimated at approximately 0.91 once GP fixed effects are included (rising to near 0.95 without them, because more desirable GPs have longer waitlists). Gender homophily is worth 7.3 minutes of travel time for female patients under 45 and 6.3 minutes for female patients over 45. Age homophily is worth approximately 1 minute. The standard deviation of GP fixed effects is 31 minutes and that of idiosyncratic taste shocks is 12.6 minutes, both in travel-time equivalents, indicating substantial variation both in GPs’ overall desirability and in patients’ idiosyncratic tastes.

Q: How important are moves as a driver of GP switching? A: Moves are the dominant driver. Among non-movers, older men consider switching just once every 25 years; temporary residents consider switching approximately once every 7.5 years (1.084 percent per month). Among patients who moved more than 30 minutes, a temporary resident has an 18.59 percent monthly probability of considering switching in the month of or month after the move. For a permanent resident making a long-distance move, the cumulative attention probability over the 8 months surrounding the move rises to 34 percent (versus 22 percent for a short-distance move). In the data, 26 percent of waitlist users moved municipality during 2017–2019, versus 6 percent of non-switchers.

Q: What does the stationary equilibrium under the status quo look like? A: In the long-run stationary equilibrium, 9.4 percent of patients are on a waitlist, 82.2 percent of GPs have a waitlist, and the average expected waiting time to switch GPs is 16.7 months. Each month, 2,299 patients on average draw attention shocks; 85.2 percent of these choose to join a waitlist, while the remainder either switch to an open GP or stay with their current GP. The average attentive patient expects to successfully obtain their chosen GP after 16.8 months.

Q: What are the distributional consequences of TTC across patient subgroups? A: Female patients benefit especially because they are more likely to be attentive (and thus use waitlists) than males. Recent movers gain 2.3 minutes’ travel-time equivalent. Patients who have never moved still gain 1.0 minutes. Rural patients gain 2.1 minutes (larger than average), reflecting their longer baseline travel times and greater geographic mismatch potential. Urban patients also benefit but less so. The one group that is harmed is patients with undersubscribed GPs, who face longer waits and a welfare loss of 0.8 minutes perpetuity equivalent.

Q: Why does the Deferred Acceptance mechanism fail to improve on the status quo? A: DA strictly respects FCFS waiting-time priority: no patient may be reassigned to a GP for whom another patient has been waiting longer. This means DA can only execute swaps in which all patients ahead of each participant on their respective waitlists are also reassigned in the same month. In practice, this virtually never occurs, so DA reassigns almost no patients earlier than the status quo Waitlists mechanism. The result illustrates a fundamental trade-off: fully respecting FCFS priority eliminates nearly all gains from trade.

Q: How does TTCP restore fairness while preserving most of the efficiency gains? A: TTCP modifies TTC by prioritizing patients with undersubscribed GPs over those with oversubscribed GPs when assigning panel vacancies, while still respecting the constraint that patients cannot be assigned a GP they prefer less than their current one. This gives patients with undersubscribed GPs a compensating advantage in the queue that offsets their inability to trade via cycles. TTCP achieves 0.46 minutes’ mean flow payoff improvement versus 0.75 for TTC (61 percent of TTC’s gains), and an NPV measure of 1.08 minutes versus 1.25 for TTC. Patients with undersubscribed GPs are left no worse off than under the status quo.

Q: What happens when waitlists are eliminated entirely? A: Under No Waitlists, attentive patients may only choose among GPs with open panels at the moment of attention. Mean welfare rises slightly (0.19 minutes) because patients spend less time mismatched while waiting, but median welfare falls by 0.60 minutes. The gains are concentrated among a minority of highly mismatched patients who prefer limited choice with no waiting over broader choice with long waits, while most patients prefer the option to wait for a more preferred GP. The authors note this may partly explain why formal waitlists are rare in other primary care systems.

Q: What is the welfare benchmark and how large are the gains? A: The benchmark is a “No Caps” scenario in which all panel caps are removed, representing the maximum achievable improvement. The mean welfare gain from TTC (0.75 minutes) represents more than 13 percent of this upper bound. The “Truthful TTC” benchmark, where patients submit full preference lists, yields 1.04 minutes, but its gains are also concentrated: the median patient is no better off than under the status quo Waitlists mechanism.

Q: What are the scope conditions for these findings? A: The demand model is estimated on the Trondelag region of Norway (approximately 8 percent of the national population) over 2017–2019, a period when waitlists were growing rapidly rather than in steady state. Counterfactual comparisons are made in a stationary equilibrium calibrated to Trondelag. The model excludes patients under 16 (whose enrollment is managed by parents). The partially capitated payment structure and fixed panel caps are institutional features specific to Norway, though similar systems exist in Canada, the UK, Italy, and Sweden. GP characteristics are held fixed in the model. The analysis abstracts from health outcomes, focusing on preference-based welfare from GP assignment.

Top-Trading Cycles (TTC) algorithm: A centralized reassignment algorithm that takes agents’ preference lists and objects’ priority lists as inputs, has each agent “point to” their preferred object and each object “point to” its highest-priority current or waiting agent, identifies cycles of mutual pointing, and executes the trades in those cycles simultaneously. In the static case, TTC is both Pareto-improving (every participant receives an assignment at least as good as their endowment) and strategy-proof. In the dynamic setting studied here, neither property holds.
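The pointing-and-cycle logic described above can be illustrated with a minimal Python sketch of the static, one-object-per-agent case (the Shapley and Scarf setting). It is a simplification rather than the paper's mechanism: each object simply points to its current owner, so waitlist priorities, panel capacities, and vacancies are not modeled. The function name `top_trading_cycles` and the patient and GP labels are hypothetical.

```python
def top_trading_cycles(endowment, preferences):
    """Static TTC where every agent owns exactly one object.

    endowment:   dict mapping agent -> object currently held
    preferences: dict mapping agent -> list of objects, most preferred first
    Returns a dict mapping agent -> object after all trades.
    """
    owner = {obj: agent for agent, obj in endowment.items()}
    remaining = set(endowment)          # agents still in the market
    assignment = {}

    while remaining:
        # Each remaining agent points to their favorite object still available;
        # their own endowment is appended as a fallback so a choice always exists.
        points_to = {}
        for agent in remaining:
            ranked = list(preferences.get(agent, [])) + [endowment[agent]]
            points_to[agent] = next(
                obj for obj in ranked if owner.get(obj) in remaining
            )

        # Follow agent -> object -> owner pointers until an agent repeats.
        path = []
        agent = min(remaining)          # deterministic starting point
        while agent not in path:
            path.append(agent)
            agent = owner[points_to[agent]]
        cycle = path[path.index(agent):]

        # Execute the cycle: everyone in it receives the object they point to.
        for member in cycle:
            assignment[member] = points_to[member]
        remaining -= set(cycle)

    return assignment


# Hypothetical example: three patients, each wanting the next one's GP.
endowment = {"ann": "gp1", "bo": "gp2", "cy": "gp3"}
preferences = {"ann": ["gp2"], "bo": ["gp3"], "cy": ["gp1"]}
result = top_trading_cycles(endowment, preferences)
```

In this example no bilateral swap exists, yet the three-way cycle reassigns all three patients at once, which is the sense in which TTC finds arbitrary cycles and not only pairwise trades.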

Dynamic TTC mechanism: A mechanism that runs the TTC algorithm repeatedly at the end of each period after naturally arising vacancies have been filled from waitlists. Because patients’ GP choices affect how long they wait — not only which GP they receive — this mechanism is not strategy-proof and may leave patients with undersubscribed GPs worse off than under strictly FCFS waitlists.

TTC with Priority (TTCP): A modified version of dynamic TTC that changes the priority ordering so that patients with undersubscribed current GPs are prioritized above patients with oversubscribed GPs when panel vacancies are allocated. This modification preserves patients’ endowment rights but compensates the group harmed by standard TTC. In the paper’s simulations, TTCP achieves 61 percent of TTC’s mean welfare gains while leaving patients with undersubscribed GPs no worse off than under the status quo.

Patient attention model: A model in which patients consider switching GPs only when they receive a Poisson-distributed attention shock. Attention rates vary by observable characteristics (age, gender, temporary vs. permanent residency, whether and how far the patient recently moved). The model interprets any switch request as evidence of both an attention shock and a preference for the requested GP over the current one. Patients who do not request switches may be either inattentive or attentive but satisfied.

Horizontal differentiation (GP preference heterogeneity): The extent to which different patients prefer different GPs for reasons unrelated to overall GP quality, driven primarily by geographic proximity, gender homophily (worth 6.3–7.3 travel-time-equivalent minutes for female patients, depending on age), and age similarity (approximately 1 minute). Horizontal differentiation is the fundamental source of gains from trade: if all patients ranked GPs identically, there would be no mutually beneficial swaps to find.

Deferred Acceptance (DA) algorithm: The patient-proposing DA algorithm, which strictly respects FCFS waiting-time priority: no patient may be reassigned ahead of another patient who has been waiting longer for the same GP. In the dynamic context, DA achieves essentially no welfare improvement over the status quo because its strict respect for priority eliminates nearly all trading opportunities, illustrating the trade-off between envy-freeness and efficiency.

Double coincidence of wants: The situation in which two (or more) patients are simultaneously on each other’s waitlists and would mutually benefit from trading GP assignments, but cannot do so under the current mechanism because there is no vacancy on either panel. The paper’s direct evidence of this phenomenon — 15 percent of waiters could be immediately reassigned via one TTC run — motivates the counterfactual analysis.

Germs in the Family: The Short- and Long-Term Consequences of Intra-Household Disease Spread

This paper studies the short- and long-term consequences of intra-household respiratory disease transmission from older to younger siblings in Danish families. The central research questions are: (1) how do respiratory illnesses spread from preschool-aged older siblings to younger infant siblings during the first year of life, and (2) how does respiratory disease exposure during infancy causally affect younger siblings’ long-term economic, human capital, and health outcomes?

The study uses population-level Danish administrative data covering 1,230,180 children from 37 birth cohorts (1981–2017), linking records from the National Patient Register, income and labor market registers, education registers, and psychiatric care registers. The identification strategy combines birth order variation in respiratory disease vulnerability with within-municipality variation in local respiratory disease prevalence among children aged 13–71 months. The authors construct a municipality-level disease exposure index—cumulative respiratory hospitalizations per 100 children aged 13–71 months in a child’s municipality over their first 12 months of life—and estimate the differential effect of this index on younger versus older siblings, controlling for municipality fixed effects, birth year-month fixed effects, and an extensive set of individual and family background characteristics.

The descriptive findings are stark: younger siblings have 2–3 times higher rates of hospitalization for acute respiratory conditions during their first year of life compared to older siblings at the same age, with the gap largest at ages two and three months. The gap is larger for winter births, shorter birth spacing, and when older siblings attend childcare centers—all patterns consistent with the older sibling serving as a disease vector.

On the causal estimates, moving from the 25th to the 75th percentile of the disease exposure index distribution increases the younger sibling’s acute respiratory hospitalizations in the first year of life by 0.023 (32.9 percent above the sample mean), with effects more than twice as large for exposure in the first six months compared to the second six months.

In the long run, an interquartile increase in first-year respiratory disease exposure reduces younger siblings’ wage earnings (conditional on employment) at ages 25–32 by 0.8 percent and total income by 0.8 percent, and reduces their income percentile rank by 0.3 percentage points. There is no significant effect on labor force participation at the extensive margin. Effects on earnings are approximately twice as large when exposure is measured in the first six months of life. These earnings effects are comparable in magnitude to those from a 10 percent reduction in birth weight or a 9 percent increase in ambient air pollution at birth, and correspond to roughly two-thirds of the adult earnings impact of in utero exposure to the 1918 Spanish Influenza. When the disease index interaction is included, the main birth order coefficient declines by approximately 70 percent, suggesting intra-household disease transmission is an important channel underlying the documented birth order earnings disadvantage.

Additional findings include: a 0.5 percentage point reduction in high school graduation and a 0.6 percentage point reduction in college graduation (interquartile effects); a 0.01 standard deviation penalty in ninth grade Danish test scores; a 20 percent increase (0.016 per hundred per year) in chronic respiratory hospitalizations at ages 16–26; and a 6.1 percent increase (0.5 additional visits per hundred per year) in psychiatric clinic visits at ages 16–26. Breastfeeding mitigates short-term effects, with 15 months of breastfeeding sufficient to entirely offset the elevated hospitalization risk.

Scope conditions: findings apply to second-born relative to first-born children in Danish sibling pairs with at least 11 months birth spacing; long-term estimates are net of parental compensatory responses and any immunity benefits, and thus represent lower bounds of the uncompensated biological impact of respiratory illness in infancy.

Q: What is the magnitude of the birth order gap in acute respiratory hospitalizations during infancy, and what patterns support an intra-household transmission mechanism? A: Younger siblings have 2–3 times higher hospitalization rates for acute respiratory conditions in the first year of life compared to older siblings at the same age, with the gap especially large at ages two and three months. The gap is larger for winter births (when respiratory viruses circulate more), for siblings with shorter birth spacing, and when the older sibling attends a childcare center. Hospitalizations for non-infectious digestive diseases and injuries show no analogous birth order differences, ruling out differential parental healthcare-seeking as an explanation.

Q: How is the disease exposure index constructed and what variation does it exploit? A: The index is the cumulative count of acute respiratory hospitalizations per 100 children aged 13–71 months in a child’s municipality over their first 12 months of life, with the older sibling excluded from the count when applicable. It exploits irregular spatial and temporal waves of respiratory viruses (such as RSV and influenza) across Danish municipalities. The interquartile range of this index captures meaningful variation in community disease burden faced by infants across different places and years.
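As a rough illustration of the construction, the sketch below sums monthly hospitalization rates per 100 children over an infant's first months of life. The function name `exposure_index` and the input layout are hypothetical, and summing monthly rates (instead of, say, dividing a 12-month total by an average population) is an assumption, since the summary does not spell out the exact aggregation. Inputs are assumed to already exclude the child's own older sibling.

```python
def exposure_index(hospitalizations, children, months=12):
    """Cumulative respiratory hospitalizations per 100 children aged 13-71 months.

    hospitalizations: monthly counts in the municipality, starting at the
                      infant's birth month (older sibling already excluded)
    children:         monthly number of children aged 13-71 months there
    months:           exposure window (12 for the first year, 6 for a half-year)
    """
    if len(hospitalizations) < months or len(children) < months:
        raise ValueError("need at least `months` observations of each series")
    # Assumed aggregation: add up the monthly rates per 100 children.
    return sum(
        100.0 * h / n
        for h, n in zip(hospitalizations[:months], children[:months])
    )


# Hypothetical municipality: 2 admissions a month among 400 preschool children.
first_year = exposure_index([2] * 12, [400] * 12)
first_half = exposure_index([2] * 12, [400] * 12, months=6)
```

The `months` argument mirrors the split into first and second six months of life used in the timing comparisons.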

Q: What is the first-stage relationship between the disease index and infant hospitalizations? A: Moving from the 25th to the 75th percentile of the disease index increases younger siblings’ acute respiratory hospitalizations in the first year of life by 0.023 (a 32.9 percent increase relative to the sample mean), while the effect on older siblings is substantially smaller. The interaction coefficient in the preferred specification implies that one additional hospitalization per 100 community children aged 13–71 months raises the younger sibling’s hospitalization count by 0.012 more than the older sibling’s. Effects are more than twice as large for exposure in the first compared to the second six months of life.

Q: What are the estimated long-term effects on adult earnings, and how do they compare to benchmarks in the literature? A: An interquartile increase in first-year respiratory disease exposure reduces younger siblings’ wage earnings at ages 25–32 by 0.8 percent and total income by 0.8 percent, with a 0.3 percentage point reduction in income percentile rank. These magnitudes are comparable to a 1 percent earnings reduction from a 10 percent birth weight reduction (Black et al., 2007), a 1 percent earnings reduction from a 9 percent increase in ambient air pollution (Isen et al., 2017b), and roughly two-thirds of the in utero Spanish Influenza effect (Almond, 2006).

Q: Does the birth order earnings disadvantage reflect intra-household disease transmission? A: When the interaction between birth order and the disease index is excluded, the regression finds a 1.9 percent birth order earnings disadvantage for second-born children (within the 1.2–4.2 percent range reported by Black et al., 2005). When the interaction is included, the main birth order coefficient declines by approximately 70 percent, suggesting that disease transmission from older to younger siblings is an important channel driving the birth order earnings penalty.

Q: Are effects larger for exposure in the first versus second six months of life? A: Yes, consistently across all outcomes. The interaction coefficient for acute respiratory hospitalizations is more than twice as large when exposure is measured in the first versus second six months. Effects on wage earnings are approximately 60 percent larger for first-half exposure, and effects on income rank are two to three times larger. This is consistent with biomedical evidence that infants’ immune systems mature around six months when solid food introduction begins.

Q: What are the effects on educational outcomes? A: An interquartile increase in first-year respiratory disease exposure reduces the likelihood of high school graduation by 0.5 percentage points (0.6 percent at the sample mean) and college graduation by 0.6 percentage points (1.7 percent at the sample mean), with effects approximately 60 percent larger when measuring first-half exposure. A 0.01 standard deviation reduction in ninth grade Danish test scores is also found. A back-of-the-envelope calculation using Danish returns to schooling suggests the reduction in educational attainment can explain approximately half of the estimated earnings effect.

Q: What are the effects on chronic respiratory and mental health outcomes? A: An interquartile increase in first-year exposure increases chronic respiratory hospitalizations (asthma, COPD) at ages 16–26 by 0.016 per hundred per year (20 percent above the sample mean), with significant increases also apparent at ages one to two. For mental health, the same exposure is associated with 0.5 additional psychiatric clinic visits per hundred per year at ages 16–26 (6.1 percent above the sample mean), with effects becoming more significant in the early twenties. Effects on mental health from this paper are smaller than those estimated for more extreme fetal and early childhood shocks such as Ramadan exposure or maternal bereavement.

Q: What does the acute respiratory trajectory look like beyond infancy? A: Elevated acute respiratory hospitalizations persist at age one, then there is a reduction at ages two to three consistent with an immunity formation hypothesis, but this protective effect disappears by age four. There is no significant increase or decrease in acute respiratory hospitalizations at older ages, in contrast to the persistent increase found for chronic respiratory conditions.

Q: What heterogeneity is found in short-term effects? A: Effects on infant respiratory hospitalizations are larger for low birth weight children, for male infants (consistent with the fragile male hypothesis), for siblings with shorter birth spacing, and for sibling pairs where the older child attends childcare. The monotonic decline in effect size with increasing birth spacing is the opposite of what would be predicted if differential parental time investment were the main mechanism, supporting intra-household disease spread as the operative channel.

Q: What is the role of breastfeeding as a moderator? A: Using supplementary data on breastfeeding duration (covering 2009–2016, matched to 7.6 percent of the sample), the authors find that the impact of disease exposure on younger siblings’ infancy hospitalizations declines significantly with longer breastfeeding duration. A linear specification implies that 15 months of breastfeeding entirely offsets the elevated hospitalization risk from higher disease exposure. Second-born children breastfed for less than half a month are particularly vulnerable to acute respiratory infections.

Q: How do the authors validate the identifying assumption? A: Three validation exercises are used. First, results are robust to adding municipality-specific linear and quadratic trends and maternal fixed effects. Second, using family background characteristics as outcomes in the interaction regression, at most two of fourteen coefficients are significant in any specification, and all effect sizes are less than one percent of sample means. Third, using alternative disease indices based on non-infectious digestive diseases and injuries shows no differential effects for younger siblings, ruling out a parental healthcare-seeking confound.

Q: What are the policy implications? A: The authors highlight breastfeeding support policies (paid family leave, workplace lactation accommodations), RSV vaccination campaigns for pregnant women and monoclonal antibody prophylaxis for infants, sick pay regulations, and childcare attendance policies as levers to reduce infant respiratory disease burden. They argue that current cost-benefit evaluations of such policies likely undercount the long-term human capital and earnings benefits. The COVID-19 pandemic illustrates the mechanism: restrictions reduced RSV spread during 2020, potentially benefiting infants with older siblings, while the subsequent RSV surge in 2021–2022 may have exposed later cohorts to above-average disease burden.

Respiratory Disease Exposure Index: A municipality-level cumulative measure of acute respiratory hospitalizations per 100 children aged 13–71 months assigned to each child over their first 12 months of life (or first and second six months separately), designed to proxy for community respiratory disease burden faced by infants from slightly older children, with the child’s own older sibling excluded from the count.

Intra-Household Disease Transmission: The mechanism by which preschool-aged older siblings, exposed to respiratory viruses in group childcare settings, bring home those viruses and infect younger infant siblings who are in a vulnerable stage of immune and brain development, creating a within-family externality in health outcomes.

Differential Birth Order Effect (Identification): The quasi-experimental design exploits the interaction between birth order (younger siblings are more exposed to older siblings’ illnesses) and local disease prevalence variation to identify causal impacts, netting out the main effects of both birth order and local disease environment through municipality and birth year-month fixed effects.
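One way to write down the interaction design described above is the following stylized equation. The notation is illustrative and not taken from the paper: $i$ indexes children, $m(i)$ the municipality, $t(i)$ the birth year-month, and $X_i$ the individual and family background controls.

```latex
y_i = \beta_1\,\mathrm{Younger}_i
    + \beta_2\,\mathrm{Index}_{m(i),t(i)}
    + \beta_3\,\bigl(\mathrm{Younger}_i \times \mathrm{Index}_{m(i),t(i)}\bigr)
    + \mu_{m(i)} + \tau_{t(i)} + X_i'\gamma + \varepsilon_i
```

Under this notation, $\beta_3$ is the differential effect of local disease exposure on younger relative to older siblings, $\beta_1$ is the main birth order coefficient, and $\mu_{m(i)}$ and $\tau_{t(i)}$ are the municipality and birth year-month fixed effects.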

Immunity Formation Hypothesis: The conjecture that early respiratory disease exposure may have a protective effect on later acute respiratory illness through immune system training; supported in the data by reduced acute hospitalizations at ages two to three, though this protection disappears by age four and does not prevent chronic respiratory disease development.

Dynamic Complementarities with Sibling Health Spillovers: An extension of the Cunha-Heckman framework. Standard models incorporate investment complementarities across time periods for a given child but do not incorporate disease asymmetries between older and younger siblings; this paper’s findings imply that sibling health spillovers create differential returns to early-life health investments by birth order.

Net Long-Term Effects: The estimated long-run impacts incorporate not only the direct biological effects of respiratory illness on the younger sibling but also any parental compensatory responses and immunity benefits; thus they represent lower bounds of the uncompensated biological impact, as parental compensation would attenuate the measured sibling difference.

Pigovian Transport Pricing in Practice

This paper reports on the MOBIS experiment, a large-scale randomized controlled trial (RCT) implementing a multi-modal Pigovian transport pricing scheme in urban areas of German- and French-speaking Switzerland. The central research question is whether a first-best transport pricing scheme — one that charges users the full marginal external costs of their travel choices, varying across time, space, and mode — generates meaningful behavioral responses, and how those responses compare to a pure information intervention.

The study recruited participants from urban areas, requiring them to be between 18 and 65 years old and to use a car at least two days per week. After contacting over 90,000 individuals and an initial online screening of 21,800 respondents, 3,656 participants completed the RCT. Each participant agreed to have their daily travel tracked via a smartphone app (“Catch-My-Day”) for eight weeks: four weeks of observation followed by four weeks of treatment. Assignment to treatment and control groups was fully randomized without stratification.

The pricing treatment gave participants a budget equal to their observed external costs during the observation period plus a 20% buffer, from which the external costs of their actual travel were deducted in real time; any remaining balance was theirs to keep. External costs were computed across all modes using official Swiss Federal Roads Office monetization factors, including congestion (via a MATSim-based average marginal cost approach), CO2 climate costs (CHF 136.08/ton), health costs from air pollution (PM10 and NOx), and accident and physical activity effects for active and public modes. Public transport also carried a peak-hour surcharge of CHF 0.10/km for congested zone-pairs. A second “information-only” treatment provided identical information about external costs but imposed no financial charge. A control group received only weekly summaries of kilometers traveled by mode.
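The budget rule in the pricing treatment can be sketched as follows. The function name `pricing_budget` and the example amounts are hypothetical, and flooring the payout at zero when a participant's treatment-period external costs exceed the budget is an assumption; the summary does not say how overspending was handled.

```python
def pricing_budget(observed_external_cost, treatment_external_cost, buffer=0.20):
    """Return (budget, payout) in CHF for one participant.

    observed_external_cost:  external costs during the observation period
    treatment_external_cost: external costs during the treatment period
    buffer:                  top-up on the observed costs (20% in the study)
    """
    budget = observed_external_cost * (1.0 + buffer)
    # Assumption: a participant never owes money, so the payout cannot go negative.
    payout = max(budget - treatment_external_cost, 0.0)
    return budget, payout


# Hypothetical participant: CHF 120 observed, CHF 100 incurred under pricing.
budget, payout = pricing_budget(120.0, 100.0)
```

Because the budget is anchored on each participant's own baseline, someone who does not change behavior still keeps roughly the buffer, while every franc of avoided external cost raises the payout one for one.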

The regression framework is a difference-in-differences specification with person, calendar-day, and day-of-study fixed effects, estimated in levels for external-cost outcomes (due to negative values from walking’s net external benefit) and via Poisson Pseudo-Maximum Likelihood for non-negative outcomes.
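
Written out, a specification of this kind could take the following form. This is a sketch based on the verbal description, not the paper's exact equation: all symbols are assumptions, with $\alpha_P$ and $\alpha_I$ denoting the pricing and information effects.

```latex
y_{it} = \alpha_P \,(\mathrm{Pricing}_i \times \mathrm{Post}_{it})
       + \alpha_I \,(\mathrm{Info}_i \times \mathrm{Post}_{it})
       + \mu_i + \lambda_{c(t)} + \delta_{s(i,t)} + \varepsilon_{it}
```

Here $y_{it}$ is the outcome for person $i$ on day $t$, $\mu_i$ is a person fixed effect, $\lambda_{c(t)}$ a calendar-day fixed effect, and $\delta_{s(i,t)}$ a day-of-study fixed effect. For non-negative outcomes, the Poisson Pseudo-Maximum Likelihood counterpart would model $E[y_{it} \mid \cdot]$ as the exponential of the same linear index.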

The pricing treatment reduced total external costs by CHF 0.215 per day (p < 0.01), a 5.1% reduction relative to the control group. The average private cost of transport for the control group during the treatment period was CHF 25.72 per day; the external cost was CHF 4.22 per day, implying that Pigovian pricing raised total transport costs by 16.4% on average. The implied price elasticity of external costs with respect to this price increase is -0.31. The reduction is attributable to mode substitution toward public transport and active modes and to departure time shifting away from peak hours, but not to a reduction in total distance traveled.

The information-only treatment produced a coefficient of -0.087, which is not statistically significant at conventional levels for the full sample. The differential effect of adding pricing to information is -0.127 (marginally significant, p < 0.1), with the pricing increment particularly important for reducing congestion costs. Sensitivity analysis shows that removing the control group and time fixed effects inflates the before-vs.-after elasticity to between -0.57 and -0.71, substantially larger than the preferred estimate of -0.31, underscoring the importance of the experimental design.

Heterogeneity analysis reveals that men respond more strongly than women, German speakers more than French speakers, and participants under 30 more than older participants; those with above-median altruistic values respond significantly even to information alone. Correct knowledge of the definition of external costs (present in 45% of the sample) is a key driver of the pricing treatment effect. Several scope conditions bound the generalizability of the elasticity estimate: mode availability, the urban Swiss context, the short 4-week treatment window, the car-use eligibility requirement, and the specific external cost monetization framework.

Q: What is the main treatment effect of the Pigovian pricing scheme on external transport costs? A: The pricing treatment reduced total external costs by CHF 0.215 per day, which is a 5.1% reduction relative to the control group (p < 0.01). About half of the reduction came from health costs, with congestion and climate costs following in magnitude. The implied elasticity of external costs with respect to the Pigovian price increase is -0.31, meaning a 10% increase in total transport costs from Pigovian pricing would reduce external costs by approximately 3.1% in the short run.

Q: How was the Pigovian price increase calculated, and what was its magnitude relative to private costs? A: The average private cost of transport for the control group during the treatment period was CHF 25.72 per day, and the average external cost was CHF 4.22 per day. Charging the external cost on top of the private cost therefore raises the cost of transport by 16.4% (4.22 / 25.72), and dividing the 5.1% reduction in external costs by this 16.4% price increase yields the elasticity of -0.31.
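
The arithmetic behind the elasticity can be reproduced in a few lines. The variable names are illustrative; the inputs are the figures quoted in this summary, with the price increase taken as the external cost relative to the private cost.

```python
private_cost = 25.72   # CHF/day, control group, treatment period
external_cost = 4.22   # CHF/day, control group, treatment period
effect = -0.215        # CHF/day, pricing effect on external costs

price_increase = external_cost / private_cost   # relative increase in transport cost
relative_effect = effect / external_cost        # relative change in external costs
elasticity = relative_effect / price_increase

print(f"price increase:  {price_increase:.1%}")
print(f"relative effect: {relative_effect:.1%}")
print(f"elasticity:      {elasticity:.2f}")
```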

Q: What mechanisms drove the reduction in external costs? A: The reduction resulted from a combination of mode substitution — a shift away from car use toward public transport and active modes — and departure time shifting away from peak hours. Critically, total distance traveled did not decline; the behavioral adjustment operated entirely through changes in how and when people traveled, not in how much.

Q: What was the effect of the information-only treatment? A: The information-only treatment produced a coefficient of -0.087 CHF per day, which was not statistically significant at conventional levels for the full sample. It was statistically significant only for subgroups, notably participants with above-median altruistic values. The differential effect of adding pricing to information (alpha_P minus alpha_I = -0.127) was marginally significant (p < 0.1) and was particularly concentrated in congestion cost reductions, suggesting that the monetary incentive is especially important for internalizing the congestion externality.

Q: Why is the control group critical, and how does removing it affect the estimated elasticity? A: The tracking data show a seasonal negative trend in external costs over the study period; without a control group, this trend would be incorrectly attributed to the treatment, inflating the estimated effect. When both day-of-study and calendar-day fixed effects are removed (approximating a before-vs.-after design without a control group), the estimated elasticity grows in magnitude to between -0.57 and -0.71, roughly double the preferred estimate of -0.31. This highlights that most prior studies in the literature, which lack control groups, are likely to overestimate treatment effects.

Q: What heterogeneity is observed in the treatment response? A: Men respond more strongly than women to both treatments, with the gender gap particularly pronounced for congestion costs. German speakers respond more strongly than French speakers. Participants under age 30 show stronger responses than older participants. Those scoring above the median on an altruistic values index respond significantly not only to pricing but also to information alone. Participants who correctly defined external costs (45% of the sample) drive the pricing treatment effect; a causal forest analysis confirms knowledge of external costs, age below 30, and language region as key heterogeneity drivers.

Q: How were external costs computed across modes, and what are the key monetization parameters? A: For private road transport, GPS tracks were map-matched using Graphhopper and processed via MATSim modules; emission factors came from the HBEFA 3.3 database, and congestion was assessed via an average marginal cost approach incorporating spillback effects. Externalities were monetized at CHF 136.08/ton for CO2, CHF 515,497–1,358,461/ton for PM10 (rural vs. urban), CHF 7,109/ton for NOx (regional), and a value of travel time savings of CHF 25.77/hour. For other modes, per-km values from the Swiss Federal Roads Office were applied. Walking carries net external benefits (negative external costs), while cycling carries small net external costs because accident costs exceed physical activity benefits.
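
A minimal sketch of the monetization step for a single car trip follows. The monetization factors are the ones quoted above; the function, its arguments, and the example quantities are hypothetical and do not reproduce HBEFA or MATSim outputs. Accident and other cost components are left out for brevity.

```python
# Monetization factors quoted above (CHF per metric ton, CHF per hour).
CHF_PER_TON_CO2 = 136.08
CHF_PER_TON_PM10 = {"rural": 515_497.0, "urban": 1_358_461.0}
CHF_PER_TON_NOX = 7_109.0
CHF_PER_HOUR_DELAY = 25.77  # value of travel time savings

GRAMS_PER_TON = 1_000_000.0


def car_trip_external_cost(co2_g, pm10_g, nox_g, delay_hours, area="urban"):
    """Monetize one trip's emissions (grams) and the delay imposed on others (hours)."""
    climate = co2_g / GRAMS_PER_TON * CHF_PER_TON_CO2
    health = (pm10_g / GRAMS_PER_TON * CHF_PER_TON_PM10[area]
              + nox_g / GRAMS_PER_TON * CHF_PER_TON_NOX)
    congestion = delay_hours * CHF_PER_HOUR_DELAY
    return {"climate": climate, "health": health, "congestion": congestion,
            "total": climate + health + congestion}


# Hypothetical urban trip: all quantities are invented for illustration.
cost = car_trip_external_cost(co2_g=1500.0, pm10_g=0.3, nox_g=4.0, delay_hours=0.01)
print({name: round(value, 3) for name, value in cost.items()})
```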

Q: How was public transport priced in the experiment, and why was it simplified? A: A second-best zonal peak-hour surcharge of CHF 0.10/km was applied to public transport stages between zone-pairs experiencing peak demand, with peak windows set at 7–9 am and 5–7 pm. Full first-best pricing of public transport crowding was deemed infeasible because crowding effects are highly heterogeneous spatially and temporally, often concentrated in very short windows on specific lines, making aggregate distribution unreasonable.
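
The surcharge rule described here can be sketched as follows. The rate and the peak windows come from the text; the function names, the use of departure time to classify a stage, and the treatment of the window boundaries are assumptions.

```python
from datetime import time

SURCHARGE_CHF_PER_KM = 0.10
PEAK_WINDOWS = [(time(7, 0), time(9, 0)), (time(17, 0), time(19, 0))]


def in_peak(departure):
    """True if the departure time falls in the morning or evening peak window."""
    # Assumption: windows are half-open, so 9:00 sharp already counts as off-peak.
    return any(start <= departure < end for start, end in PEAK_WINDOWS)


def pt_surcharge(distance_km, departure, zone_pair_is_congested):
    """Peak surcharge (CHF) for one public transport stage."""
    if zone_pair_is_congested and in_peak(departure):
        return distance_km * SURCHARGE_CHF_PER_KM
    return 0.0


# Hypothetical 12 km stage departing at 8:15 on a congested zone-pair.
print(pt_surcharge(12.0, time(8, 15), True))
```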

Q: Was there evidence of gaming the mode detection system? A: Because participants could manually correct the app’s algorithmic mode assignments — and the pricing group had an incentive to overclaim low-cost modes — the potential for strategic misreporting was examined. While the analysis could not rule out some gaming, the main results were shown to be robust to excluding potential gamers, suggesting that gaming did not materially distort the treatment effect estimates.

Q: What does the study imply for transport pricing policy? A: The elasticity of -0.31 provides a benchmark for policymakers: a full Pigovian pricing scheme that raises total transport costs by about 16% can be expected to reduce external costs by about 5% in the short run in an urban context. The finding that congestion costs respond more to pricing than to information alone suggests the monetary component is essential for this externality. Heterogeneous responses — particularly the weaker responses by women and French speakers — have distributional implications. The experiment is a proof of concept that first-best transport pricing can generate meaningful behavioral responses, but scaling it would require addressing privacy concerns from GPS tracking, technical infrastructure, and political economy challenges.

Pigovian transport pricing: A pricing scheme that charges each user the marginal external costs of their transport choices — including health, climate, congestion, and noise costs — as they vary across time, space, and mode, intended to internalize the gap between private and social costs of travel.

External costs of transport: Costs borne by society rather than the individual traveler, including congestion (delay imposed on others), climate damages (CO2 emissions), health costs (local air pollution, accidents), and noise; in this paper, computed in real time from tracked trips using official Swiss monetization values.

Average treatment effect (ATE): The difference-in-differences estimate of the causal effect of the pricing or information treatment on outcomes, identified from the randomized assignment and controlling for person, calendar-day, and day-of-study fixed effects.

Mode substitution: The behavioral response in which travelers shift from higher-external-cost modes (primarily car) to lower-external-cost modes (public transport, walking, cycling) in response to pricing, as distinct from reducing total travel distance.

Departure time shifting: The behavioral response in which travelers adjust when they depart to avoid peak-hour congestion surcharges, contributing to reduced congestion externalities without reducing total distance traveled.

Information-only treatment: An experimental arm receiving identical information about external costs as the pricing group but facing no financial charge, used to isolate the informational component of the pricing treatment from the monetary incentive component.

The Earnings and Labor Supply of U.S. Physicians

Overview

Research Question. What do U.S. physicians earn, how is that earnings variation structured across geography and specialty, and how much does government healthcare payment policy shape those earnings and — through them — physicians’ labor supply and long-run talent allocation?

Data. The paper builds a novel administrative panel by merging the universe of U.S. federal individual income tax returns (2005–2017) with: the National Plan and Provider Enumeration System (NPPES) physician registry; Medicare billing records with procedure-level Relative Value Unit (RVU) rates (2012–2017); restricted-use American Community Survey responses; Social Security Administration demographic records; and medical school ranking and graduation data. The main sample covers 11.6 million physician-year observations for 965,000 unique physicians aged 20–70.

Earnings Facts. In 2017, average physician total individual income was $350,000 (median $265,000); the distribution is right-skewed — the top 1% of age-40–55 physicians averages $4.0 million. Physicians in aggregate earned $297 billion in pre-tax dollars, equaling 8.6% of total U.S. healthcare spending. The age-earnings profile is steep: earnings are approximately $60,000 during residency, rise to roughly $185,000 by the early thirties, and peak near $425,000 at age 50. Business income — systematically underreported in survey data (ACS estimates are approximately $140,000 lower than tax data during peak career years, almost entirely due to non-reporting of business income) — accounts for nearly one-quarter of earnings at age 50. Earnings differ sharply across specialties: primary care physicians average $201,200 (ages 40–55), about half the sample mean, while surgeons earn roughly twice as much.

Geographic Pattern. Contrary to the pattern for lawyers and workers broadly, physician earnings are not highest on the coasts. A movers-based event study (physicians who changed commuting zones once during 2005–2017) finds that roughly 70% of the cross-location income difference is driven by place rather than worker composition. A two-way fixed-effects variance decomposition reveals pronounced negative physician-location sorting: high-earning physicians tend to locate in lower-income commuting zones, while lower-earning physicians locate in higher-income areas — the opposite of the pattern for lawyers. Medicare’s relatively weak adjustment of reimbursement rates for local costs (the empirical elasticity of the Geographic Adjustment Factor to median household income is 0.09, versus 0.33 for a broader local price index) can, by the authors’ estimates, account for approximately one-third of this unusual geographic earnings pattern.

Government Influence — Medicare Price Changes. Using procedure-specific RVU changes as a simulated instrument for each physician’s Medicare price exposure, the authors find that a 10% increase in the Medicare price instrument leads to a 2.4% increase in professional earnings of physicians aged 40–55. The behavioral supply response is substantial: physicians bill 4.4% more RVUs (supply elasticity of 0.4 after netting out the mechanical component), of which 3.9% reflects more unique procedures and the rest a shift toward higher-paid procedures. Nearly all of the procedure-level supply increase (3.4 out of 3.8 percentage points) comes from treating additional patients rather than more frequent treatment of existing patients. Converting to pass-through: physicians retain $62 of each $100 in additional Medicare spending directly, or approximately $25 of each $100 of any insurance spending once Medicare’s documented spillover into private insurance rates is accounted for. For physicians aged 56–70, a 10% increase in earnings driven by reimbursement changes reduces retirement probability by 0.5 percentage points in that year.

Government Influence — ACA Insurance Expansion. Using county-level variation in pre-ACA uninsurance rates (as of 2013) as a source of differential exposure to the ACA’s Medicaid expansions and Marketplace subsidies (in 24 states expanding Medicaid in 2014 or early 2015), the authors estimate that a 10 percentage point higher baseline uninsurance rate led to 3.9% higher physician earnings four years post-expansion. Scaling by the first stage (a 10 p.p. higher uninsurance rate translating to 4.96 p.p. higher insurance coverage post-expansion), the implied elasticity of physician earnings to the insurance rate is 0.41. The ACA expansion also reduced retirement probability — a 10 p.p. higher insurance coverage rate leads to a 1 p.p. decline in retirement probability — consistent with a medium-run retirement-to-income elasticity of approximately −1.1. In aggregate, 6% of the $110 billion in annual ACA insurance expansion spending accrued to physicians personally, slightly below their 8.6% baseline share of healthcare spending.

Talent Allocation. Specialty choice is sticky and entry-restricted. The authors estimate a discrete-choice model of specialty choice using graduates of top-5 medical schools (physicians with effectively unconstrained specialty access) and an aggregate model using USMLE Step 1 score buckets as ability proxies. At the top of the ability distribution, higher specialty earnings strongly attract physicians: increasing primary care physicians’ hourly income from $98 to $168 per hour (the level of medicine subspecialists) would raise the share of top-5 medical school graduates choosing primary care by approximately 20 percentage points (nearly doubling their representation in primary care). Moving down the USMLE score distribution, the earnings coefficient falls monotonically and turns negative for the lowest score groups, consistent with the model’s prediction that entry restrictions cause higher-paying specialties to displace lower-ability applicants as earnings rise, rather than simply attracting more entrants. A more modest counterfactual, raising internal medicine earnings to dermatology levels, raises the average USMLE score in internal medicine by about 9.4 points (from 230.2 to 239.6).

Scope Conditions. The earnings estimates are for the period 2005–2017. Pass-through estimates use a short-run price instrument; long-run pass-through may differ depending on private market spillovers and entry. The ACA analysis is restricted to 24 early-expanding states. The specialty-choice model is estimated on medical graduates entering the residency match; the extensive margin of entering medicine itself is not modeled. Health outcome effects of changing physician ability distributions are not estimated.

In depth

Q1. What is the level and composition of physician earnings in the tax data, and how do they compare to survey-based estimates?

In 2017, average physician total individual income was $350,000 and median was $265,000; the top 1% of age-40–55 physicians earned $4.0 million on average, more than twice the average of the top 5%. Business income constitutes nearly one-quarter of earnings at age 50 and is concentrated among top earners: 80% of physicians in the top 1% have business income exceeding $25,000, versus 35% overall. ACS survey data for the same physicians underestimate earnings by approximately $140,000 (roughly one-third of the administrative mean) during peak career years, driven entirely by non-reporting of business income on the extensive margin.

Q2. How large are physician earnings relative to total U.S. healthcare spending?

Physicians in aggregate earned $297 billion pre-tax in 2017, equaling 8.6% of total U.S. healthcare spending (approximately $913 of the average American’s $10,611 annual healthcare expenditure). After applying a 30% income tax rate, after-tax physician earnings equal approximately 6% of total healthcare spending, or roughly 1% of GDP. The authors note this provides an upper bound on the magnitude of savings available from policies aimed at reducing physician incomes as a strategy for lowering overall healthcare spending.
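
The spending shares follow directly from the per-capita figures. A short check, with illustrative variable names:

```python
per_capita_physician_earnings = 913.0    # USD, pre-tax
per_capita_health_spending = 10_611.0    # USD
tax_rate = 0.30                          # income tax rate applied in the text

pre_tax_share = per_capita_physician_earnings / per_capita_health_spending
after_tax_share = pre_tax_share * (1.0 - tax_rate)

print(f"pre-tax share:   {pre_tax_share:.1%}")
print(f"after-tax share: {after_tax_share:.1%}")
```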

Q3. How does the age-earnings profile of physicians evolve, and what drives growth during peak years?

Physician earnings average approximately $60,000 during residency, rise to roughly $185,000 by the early thirties, and peak near $425,000 at age 50, before declining gradually to approximately $270,000 in the late 60s. Growth during peak earning years (ages 40–55) is driven almost entirely by business income: average wages are approximately flat at $285,000 across this age range, while business income and the probability of filing Schedule C rise steadily.

Q4. How large and unusual is the geographic pattern of physician earnings, and what is the causal role of location?

Physician earnings are highest in lower-income states (not on the coasts), unlike lawyers and the broader workforce. A movers event study finds that approximately 70% of the cross-commuting-zone income difference is attributable to location rather than worker characteristics; within specialty the estimate rises to approximately 85%. A two-way fixed-effects variance decomposition (with limited-mobility-bias corrections following Andrews et al. 2008 and Kline et al. 2020) reveals pronounced negative physician-location sorting, with the corrected covariance between individual and location effects being 0.6–0.8 times the variance of location effects in magnitude but opposite in sign — a pattern that reverses to positive sorting when the same methods are applied to lawyers.

Q5. What instrument is used to identify the causal effect of Medicare price changes on physician earnings, and why is it valid?

The authors construct a physician-year “Medicare price instrument” by fixing each physician’s service mix at its 2012–2017 average and then multiplying those fixed quantities by annually-updated RVU rates, summing over services. Because the fixed quantity weights exclude behavioral responses, and because national RVU changes from CMS periodic reviews affect physicians differentially according to their pre-determined service mix, variation across physicians and over time is plausibly exogenous to individual physicians’ income shocks. Year-by-specialty fixed effects absorb common specialty-level price trends.
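
The construction can be sketched as a fixed-weight price index. The code below is an illustration under stated assumptions, not the authors' implementation: the function name, the service codes, and all numbers are hypothetical.

```python
def medicare_price_instrument(fixed_quantities, rvu_rates_by_year):
    """Value a physician's fixed service mix at each year's RVU rates.

    fixed_quantities: {service_code: average quantity over the sample period}
    rvu_rates_by_year: {year: {service_code: RVU rate in that year}}
    Returns {year: instrument value}. Quantities never vary by year, so
    changes over time reflect RVU rate changes only.
    """
    return {
        year: sum(qty * rates[code] for code, qty in fixed_quantities.items())
        for year, rates in rvu_rates_by_year.items()
    }


# Hypothetical physician and RVU schedule (codes and numbers are invented).
fixed_mix = {"svc_A": 100.0, "svc_B": 40.0}
rvu_rates = {
    2012: {"svc_A": 1.0, "svc_B": 2.5},
    2013: {"svc_A": 1.1, "svc_B": 2.5},
}
instrument = medicare_price_instrument(fixed_mix, rvu_rates)
print(instrument)
```

A physician whose mix is weighted toward the re-priced service sees a larger change in the instrument than one who rarely bills it, which is the cross-physician variation the design relies on.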

Q6. What are the magnitudes of the earnings and labor supply responses to Medicare price changes?

A 10% increase in the Medicare price instrument raises earnings of 40–55 year-old physicians by 2.4% (reduced-form), with a 2SLS elasticity of income to billed RVUs of 0.17. The total-RVU billing coefficient of 1.437 implies a supply elasticity of 0.437 (subtracting 1 for the mechanical component). At the procedure level, a 10% price increase for a specific code leads to 3.8% more billings for that code, of which 3.4 percentage points reflects treating additional patients. For physicians aged 56–70, a 10% earnings increase reduces that year’s retirement probability by 0.5 percentage points.
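
The split between the mechanical and behavioral components can be made explicit. The inputs are the figures reported above; the variable names are illustrative.

```python
price_change = 0.10              # 10% increase in the Medicare price instrument
total_rvu_coefficient = 1.437    # response of total billed RVUs to the instrument

# With quantities unchanged, billed RVUs would rise one-for-one with the price.
mechanical = 1.0 * price_change
total_response = total_rvu_coefficient * price_change
behavioral = total_response - mechanical
supply_elasticity = behavioral / price_change

# Procedure-level response: share attributable to additional patients.
procedure_response_pp = 3.8
new_patient_pp = 3.4
new_patient_share = new_patient_pp / procedure_response_pp

print(f"supply elasticity: {supply_elasticity:.3f}")
print(f"share from additional patients: {new_patient_share:.0%}")
```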

Q7. How does the ACA insurance expansion affect physician earnings and retirement, and what is the implied pass-through?

Counties with a 10 percentage point higher pre-ACA uninsurance rate saw 3.9% higher physician earnings by 2017 (four years post-expansion). Scaled by the first stage (4.96 p.p. higher coverage), the elasticity of physician earnings to insurance coverage is 0.41. A 10 p.p. higher insurance coverage rate leads to a 1 p.p. lower retirement probability post-expansion (medium-run elasticity of retirement to income of approximately −1.1). In aggregate, about 6% of the $110 billion in annual ACA expansion spending (roughly $7.1 billion, or about $8,400 per physician) accrued to physicians.

Q8. How does the earnings-specialty choice relationship vary across the physician ability distribution?

In the individual-level discrete-choice model estimated on top-5 medical school graduates (likely unconstrained in specialty choice), the coefficient on hourly earnings is 0.014. In the aggregate score-group model, the implied earnings coefficient is 0.016 for USMLE scores above 260 and declines monotonically to −0.008 for scores at or below 190. This negative coefficient for low scorers is consistent with the theoretical prediction that higher earnings attract high-ability physicians, leaving fewer slots for lower-ability applicants due to binding entry restrictions — not a reversal of preferences.

Q9. What are the quantitative implications for specialty choice if primary care incomes were raised to subspecialty levels?

Raising primary care hourly income from $98 to $168 (the level of medicine subspecialists) would increase the share of top-5 medical school graduates choosing primary care by approximately 20 percentage points, to about 48%, nearly doubling their representation. Nearly half of these reallocations would come from procedural specialties. An analogous exercise raising internal medicine earnings to dermatology levels shifts the average USMLE score in internal medicine from 230.2 to 239.6 (an increase of about 9.4 points) as higher-scoring applicants displace lower-scoring ones within a fixed slot constraint.
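
To see how an earnings coefficient translates into choice shares, the sketch below applies the standard multinomial logit share formula. It is a simplification, not the paper's estimated model, which may include further covariates. The coefficient 0.014 and the $98 to $168 change are taken from this summary; the baseline share of 0.26 is a hypothetical value chosen for illustration.

```python
import math


def logit_share_after_shift(baseline_share, coefficient, earnings_change):
    """New choice share of one alternative in a plain multinomial logit
    after its utility index rises by coefficient * earnings_change,
    holding all other alternatives fixed."""
    boost = math.exp(coefficient * earnings_change)
    return baseline_share * boost / (1.0 - baseline_share + baseline_share * boost)


# Hypothetical baseline share; coefficient and earnings change as reported.
new_share = logit_share_after_shift(0.26, 0.014, 168 - 98)
print(f"new share: {new_share:.1%}")
```

Because the logit response depends on the starting share, the same earnings change moves a specialty with a small baseline share by fewer percentage points than one near 50%.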

Q10. How much of additional Medicare and insurance spending passes through to physician earnings?

Direct estimates imply physicians retain $62 of each $100 in additional Medicare spending. Accounting for Medicare’s documented spillover into private insurance rates (following Clemens and Gottlieb 2017), the pass-through drops to $25 per $100 of total insurance spending. The authors note this is substantially higher than the modest rent-sharing found for average workers in response to firm-level shocks (Card et al. 2018), but comparable to rent-sharing with high-skilled workers benefiting from patent rents (Kline et al. 2019).

Q11. Can Medicare’s geographic pricing policy explain the unusual geographic earnings pattern for physicians?

The elasticity of Medicare’s Geographic Adjustment Factor (GAF) to commuting zone median household income is 0.09, compared to 0.33 for a broader local price index. Using the authors’ short-run estimate that a 10% increase in Medicare prices raises earnings by 2.4%, a counterfactual simulation shows that if the GAF-to-income elasticity rose to 0.33 (aligning Medicare rates with the general cost-of-living gradient), the geographic physician earnings pattern would more closely resemble that of lawyers. The authors estimate that the gap in Medicare’s local cost adjustment explains approximately one-third of the unusual physician earnings geography, conditional on the short-run pass-through estimate.
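
The logic of the counterfactual can be sketched with a back-of-the-envelope calculation. The two GAF elasticities and the 2.4%-per-10% earnings response are from the text; the log-linear form, the function name, and the example income gap are assumptions, and the authors' simulation may differ in detail.

```python
gaf_elasticity_actual = 0.09
gaf_elasticity_counterfactual = 0.33
earnings_response = 0.24   # a 10% higher Medicare price raises earnings by 2.4%


def counterfactual_earnings_change(log_income_gap):
    """Approximate log change in physician earnings in a commuting zone whose
    median household income differs from the reference by log_income_gap,
    if the GAF tracked income with the counterfactual elasticity."""
    log_price_change = (gaf_elasticity_counterfactual - gaf_elasticity_actual) * log_income_gap
    return earnings_response * log_price_change


# Hypothetical commuting zone with median income 0.20 log points above the reference.
change = counterfactual_earnings_change(0.20)
print(f"earnings change: {change:.2%}")
```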

Q12. How does the theoretical model of specialty choice and entry restrictions guide the empirical predictions?

The model features a unit mass of physicians with heterogeneous ability (Pareto-distributed) and idiosyncratic specialty preferences (exponentially distributed). Physicians choose whether to specialize in period 1; government sets reimbursement rates in period 2; physicians choose labor supply in period 3. With a fixed number of residency slots, higher specialty earnings raise the ability cutoff for entry (rationing by ability). This generates a key nonmonotonic empirical prediction: higher-ability physicians respond positively to earnings increases (choosing a specialty more frequently), while lower-ability physicians respond negatively (displaced by the shift upward in the ability cutoff). The model also implies that demand shocks are not moderated by contemporaneous entry, so incumbents capture the full rent — motivating the estimated pass-through.
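
The rationing mechanism can be illustrated with a small simulation. This is a stylized sketch, not the paper's model: the application rule, the fixed cost of 2.0, the distribution parameters, and the slot count are all assumptions made for illustration.

```python
import random


def ability_cutoff(premium, n=10_000, slots=1_000, seed=0):
    """Ability of the marginal admitted physician when a fixed number of
    slots is rationed by ability among applicants."""
    rng = random.Random(seed)
    # Ability is Pareto-distributed, taste for the specialty exponential.
    physicians = [(rng.paretovariate(3.0), rng.expovariate(1.0)) for _ in range(n)]
    # Assumption: a physician applies when the earnings premium plus taste
    # exceeds a fixed cost of 2.0.
    applicants = sorted((a for a, taste in physicians if premium + taste > 2.0),
                        reverse=True)
    if len(applicants) <= slots:
        return None  # slots are not binding, so nobody is rationed out
    return applicants[slots - 1]


low = ability_cutoff(premium=0.5)
high = ability_cutoff(premium=1.0)
print(f"cutoff at low premium:  {low:.3f}")
print(f"cutoff at high premium: {high:.3f}")
```

With the same simulated population, a higher premium enlarges the applicant pool, so the ability needed to secure one of the fixed slots rises and some lower-ability applicants who were admitted before are displaced.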

Key Concepts

Medicare Price Instrument (Simulated RVU Instrument). A physician-year measure of Medicare payment exposure constructed by holding each physician’s service mix fixed at its 2012–2017 average and multiplying those fixed quantities by time-varying national RVU rates, then summing across services. This purges the instrument of behavioral responses, creating exogenous cross-physician variation in price exposure arising from the interaction of fixed service mix with national RVU policy changes.

Relative Value Unit (RVU). The unit by which Medicare defines and reimburses each physician service in the Physician Fee Schedule. RVUs are intended to reflect the time, effort, and resources required to provide each service, but are subject to periodic review by CMS’s RVU Update Committee (RUC) and influenced by political factors. Changes in RVUs translate directly into changes in Medicare reimbursement rates for affected services.

Pass-Through (Reimbursement to Earnings). The share of an additional dollar of Medicare (or insurance) spending that accrues to physicians personally as earnings, after accounting for practice costs, intermediaries, and behavioral responses. The paper estimates $62 per $100 of direct Medicare spending or $25 per $100 of total insurance spending (the latter accounting for Medicare’s spillover into private rates).

Negative Physician-Location Sorting. The empirical finding — robust to limited-mobility-bias corrections — that higher-ability (higher-earning) physicians disproportionately locate in lower-income commuting zones, while lower-earning physicians concentrate in higher-income areas. This is the opposite of the pattern for lawyers and for worker-firm matching in the broader labor literature. The paper attributes part of this pattern to Medicare’s incomplete geographic adjustment of reimbursement rates.

Ability Cutoff (a_m) in Residency Matching. In the paper’s theoretical model, the minimum ability level required to gain entry into a restricted-entry specialty. Because the number of residency slots is fixed, the cutoff rises when a specialty’s relative earnings increase (attracting more high-ability applicants), displacing lower-ability physicians who would otherwise have entered. This makes the earnings-specialty relationship nonmonotonic across the ability distribution.

Business Income (Pass-Through Entity Income). Income from physician-owned practices organized as sole proprietorships, S-corporations, or partnerships, reported on Schedule C or through pass-through entities rather than on Form W-2. In the tax data, business income accounts for nearly one-quarter of physician earnings at career peak and is the main source of earnings for top physicians, but is systematically underreported in survey data (ACS), leading to a roughly one-third underestimate of total earnings during peak years.

Geographic Adjustment Factor (GAF). A Medicare policy parameter that multiplies the national RVU rate to adjust physician reimbursements for local input costs (specifically physicians’ work, practice expenses, and malpractice). The paper documents that the GAF’s elasticity to local median household income is 0.09 — far below the 0.33 elasticity of the general local price index — constituting an effective subsidy to rural and lower-income markets relative to higher-income areas.

The Effects of Medical Debt Relief: Evidence from Two Randomized Experiments

Layer 1: Overview

Research Question

This paper asks whether relieving downstream medical debt — debt that has been sold to third-party debt collectors — causes improvements in financial outcomes, mental and physical health, and healthcare utilization for recipients. The question is motivated by a large correlational literature documenting strong associations between medical debt and adverse outcomes, and by the rapid expansion of government and private debt relief programs that, as of mid-2024, had committed or planned over $14.6 billion in relief.

Data and Design

The authors partnered with RIP Medical Debt (a non-profit that purchases and forgives medical debt for government and private donors) to conduct two randomized controlled trials between March 2018 and October 2020. In total the experiments relieved medical debt with a face value of $169 million for 83,401 people.

Hospital debt experiment: RIP purchased a random subset of debt from a large for-profit hospital system at the juncture when the hospital would normally sell accounts to a debt collector (approximately one year after the medical service). The purchase price was 5.5 cents per dollar of face value. The treatment group consisted of 14,377 people who received $19 million in face-value relief (average of $1,321 per person). The 61,496-person control group had their debt pursued by the collector under normal protocol.
Collector debt experiment: RIP purchased a random subset of older debt already under collection on the secondary market for several years, at a price of less than one cent per dollar. The treatment group consisted of 69,024 people who received $150 million in face-value relief (average of $2,167 per person). The 68,014-person control group retained their debt.
Credit reporting sub-experiment: Partway into the collector debt experiment, the debt collector ceased reporting medical debt to the credit bureaus, reflecting an industry-wide trend. The authors isolate 2,761 accounts (6.8% of wave 1) that were reported prior to treatment assignment to estimate the effects of debt relief when accounts would have been counterfactually reported, compared to the subsequent no-reporting environment.
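
As a consistency check, the experiment totals can be recomputed from the two treatment arms. The dictionary layout is illustrative; the inputs are the rounded figures quoted above, so the implied averages land within a few dollars of the reported $1,321 and $2,167.

```python
experiments = {
    "hospital": {"treated": 14_377, "relief_usd": 19_000_000},
    "collector": {"treated": 69_024, "relief_usd": 150_000_000},
}

total_treated = sum(e["treated"] for e in experiments.values())
total_relief = sum(e["relief_usd"] for e in experiments.values())
average_relief = {name: e["relief_usd"] / e["treated"]
                  for name, e in experiments.items()}

print(total_treated, total_relief)
print({name: round(avg) for name, avg in average_relief.items()})
```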

Outcomes are tracked using quarterly depersonalized credit bureau data from TransUnion (spanning at least four quarters before to four quarters after treatment), collections account data on future bill accrual, and a multimodal survey of 2,888 hospital debt experiment respondents measuring mental and physical health, healthcare utilization, and financial wellness. The primary credit-bureau outcome is the number of accounts past due; the primary survey outcome is the share with at least moderate depression (PHQ-8).

Main Findings

Credit market outcomes (main experiments): In both the hospital and collector debt experiments, where there is no counterfactual credit bureau reporting, debt relief has no average effect on financial distress, credit access, or credit utilization. The effect on the number of accounts past due is -0.01 (statistically insignificant; the 95% CI rules out reductions larger than 0.04, relative to a control mean of 1.20). Effects on credit card balances (95% CI: -$42 to $47 relative to a mean of $1,481) and auto loan balances (95% CI: -$235 to $148 relative to a mean of $8,020) are similarly precise nulls. These null effects hold for the hospital debt sample (younger debt, 1.3 years old on average) and the collector debt sample (older debt, 7.0 years old on average), and across all preregistered subgroups.
Credit reporting sub-experiment: When control group accounts are counterfactually reported, debt relief immediately raises credit scores by an economically small average of 3.4 points (p-value 0.021), with a larger 13.8-point increase (p-value 0.008) for persons with no other debt in collections. Credit limits grow gradually, reaching $340 (15.3% of the post-reporting control mean of $2,231; p-value 0.010) after the no-reporting period begins, with larger effects for those with no other debt in collections. Once control group reporting ceases, both the credit score and credit limit effects converge to zero for those with other debts in collections. No effects on borrowing or financial distress measures are detected in this sub-experiment.
Collections account outcomes (bill repayment): Debt relief causes a statistically significant 1.1 percentage-point increase in the probability of having another unpaid bill sent to collections (6.6% of the control mean of 16.2%; p-value < 0.05) and a $15 increase in the dollar amount of future medical debt sent to collections (7.2% of the control mean of $208). The increase is almost entirely attributable to pre-relief medical services, indicating reduced repayment of existing bills rather than greater healthcare utilization.
Survey outcomes: There are no detectable average effects on depression (primary outcome), anxiety, stress, subjective well-being, or general health. Debt relief raises the share with at least moderate depression by a statistically insignificant 3.2 percentage points (p-value 0.097; control mean 45.0%); a 95% CI rules out a reduction of more than 0.6 percentage points, well below the 7.0 percentage-point improvement predicted by the median expert respondent. There are similarly null effects on healthcare utilization and financial wellness as measured in the survey.

Scope Conditions

The study focuses specifically on downstream medical debt in collections — debt that has already been through the hospital billing cycle and sold to third-party collectors. Results do not necessarily apply to upstream debt relief (e.g., financial assistance programs applied closer to the time of the medical event), nor to populations with different baseline financial profiles. The credit reporting results are most relevant to the prior regime of widespread reporting; under the current environment in which most medical debt has been removed from credit reports, the credit-access channel is largely foreclosed.

In depth

Q1. Why did the authors focus specifically on downstream medical debt in collections, and how does this define the scope of their study?

The authors focus on downstream medical debt because this is the target of essentially all large-scale government and private relief programs working with RIP Medical Debt, and because it is the category of debt that is most comprehensively observable. Downstream medical debt is defined as bills that have been or are about to be sold by the healthcare provider to a third-party debt collector. This focus excludes upstream unpaid bills still held by the hospital, bills being paid over time, and medical expenses charged to credit cards. The distinction matters because prior literature on hospital financial assistance programs finds substantial benefits from upstream interventions that relieve debt closer to the precipitating medical event; the authors’ null results are explicitly scoped to the downstream, post-collection stage.

Q2. Why did the purchase price of medical debt (5.5 cents per dollar for hospital debt, less than 1 cent per dollar for collector debt) suggest caution about expected financial impacts ex ante?

The authors argue that in a competitive market, the purchase price of medical debt reflects expected recoveries net of collection costs. A price of 5.5 cents per dollar therefore implies that actual recovery (what collectors expect to collect from patients) is very low. Even if all of the expected recovery is passed through to the patient as a financial benefit, the direct liquidity gain from debt forgiveness is a small fraction of the debt’s face value. For the collector debt experiment, where the purchase price is less than 1 cent per dollar, the expected direct financial benefit to recipients is even smaller. The authors note that survey respondents expected to pay 54% of their outstanding medical debt and thought it fair to pay 37%, suggesting that perceived (rather than actual) payment obligations may be what connects medical debt to financial behavior.

Q3. How was random assignment implemented in the hospital debt experiment, and what design features ensure the validity of the experiment?

Within each of 18 waves between August 2018 and October 2020, RIP received a portfolio of unpaid bills from the hospital system. Persons were grouped at the individual level and stratified by the amount of debt, state of residence, insurance status, and a collections score predicting repayment likelihood. Within strata, persons were randomly assigned to treatment or control, with approximately 20% treated per wave (varying with donor funding). The hospital was unaware of the intervention, eliminating scope for selection of particularly uncollectible accounts. Treatment notification occurred via two letters sent approximately three and six weeks post-purchase. Balance tests confirm successful randomization: all p-values on baseline characteristics are above 0.05, and F-tests fail to reject joint balance.
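The within-strata assignment described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the field names (`debt_bin`, `state`, `insured`, `score_bin`), the fixed 20% share, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

def assign_within_strata(persons, treat_share=0.20, seed=0):
    """Randomly assign persons to treatment or control within strata.

    `persons` is a list of dicts with an "id" key plus the stratifying
    fields used below. All field names are hypothetical.
    """
    rng = random.Random(seed)

    # Group person ids by stratum (debt amount bin, state, insurance, score bin).
    strata = defaultdict(list)
    for p in persons:
        key = (p["debt_bin"], p["state"], p["insured"], p["score_bin"])
        strata[key].append(p["id"])

    # Shuffle within each stratum and treat the first ~treat_share of ids.
    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_treat = round(treat_share * len(ids))
        for i, pid in enumerate(ids):
            assignment[pid] = "treatment" if i < n_treat else "control"
    return assignment

# Hypothetical wave of 6,000 persons.
persons = [
    {"id": i, "debt_bin": i % 4, "state": ["TX", "FL"][i % 2],
     "insured": i % 3 == 0, "score_bin": i % 5}
    for i in range(6000)
]
assignment = assign_within_strata(persons)
share = sum(v == "treatment" for v in assignment.values()) / len(assignment)
print(f"treated share: {share:.3f}")
```

Because the draw happens inside each stratum, the treated share is close to the target in every cell, which is what makes baseline balance on the stratifying variables hold by construction. In the study the treated share varied by wave with donor funding, which would correspond to passing a different `treat_share` per wave.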

Q4. What was the credit reporting sub-experiment and how was it identified?

The debt collector in the collector debt experiment historically reported medical debt to the credit bureaus but largely ceased doing so before the first intervention wave (March 2018), reflecting broader industry concerns about CFPB enforcement and data integrity risk. However, a subset of accounts — 2,761 accounts (6.8% of wave 1, with virtually identical match rates across treatment and control) — were still being reported until 2019 Q1 (three quarters after wave 1 and one quarter after wave 2). This created a natural sub-experiment: for this subset, treatment group accounts were removed from credit reports immediately upon debt relief, while control group accounts continued to be reported for three more quarters before also being removed. The authors identify reported accounts by matching dollar amounts in collections account data to credit bureau tradeline data in the four quarters prior to intervention, and use this variation to estimate effects separately for the “reporting” and “no-reporting” periods.

Q5. What are the exact estimated effects on credit scores and credit limits in the credit reporting sub-experiment?

During the three quarters when control group accounts are still reported to credit bureaus, debt relief raises credit scores by an average of 3.4 points (p-value 0.021) for the full reporting subsample. The effect is concentrated among those with no other debt in collections: 13.8 points (p-value 0.008) versus 1.2 points (p-value 0.440) for those with other debt in collections. Credit limits increase gradually, reaching $340 (15.3% of the post-reporting control mean of $2,231; p-value 0.010) by the four quarters after control group reporting ceases. Among persons with no other debt in collections, this credit limit effect grows to $922 (23% of the control mean; p-value 0.070). Once control group reporting stops, both the credit score effect and the credit limit growth converge to zero for persons with other debts in collections. The event study coefficients show the credit limit effect growing approximately linearly over five quarters post-intervention before leveling out.

Q6. How does the paper rule out the possibility that medical debt relief increases healthcare utilization, thereby causing more future medical bills?

The collections account analysis separates future debt accrual into debt associated with pre-relief medical services (which can only result from reduced repayment of existing bills) and post-relief medical services (which could reflect either increased utilization or changed repayment of new bills). Panel B of Table VI shows that virtually all of the increased debt sent to collections — a $15 increase and 1.1 percentage-point increase in the probability of any future collection — is attributable to pre-relief services. Panel C shows statistically insignificant increases in future debt from post-relief services. The authors therefore attribute the effect to reduced payment of existing bills and conclude they “cannot rule in or rule out effects on healthcare utilization” for the post-relief services channel, but the dominant mechanism is behavioral change in repayment of already-incurred debt.

Q7. What are the three mechanisms proposed to explain the reduction in repayment of existing medical bills, and which mechanism is rejected?

The authors offer three candidate mechanisms for the 6.6% relative increase in the probability of future bill collections: (i) an expectations mechanism, in which beneficiaries reduce payments because they anticipate future debt relief from similar charitable programs; (ii) a targeting mechanism, drawing on Dobkin et al. (2018), in which patients tolerate a certain level of indebtedness — relieving some debt creates “room” in their debt budget, so they reduce payment of remaining bills to return to that target level; and (iii) a confusion mechanism, in which recipients mistakenly believe the relief applied to non-forgiven bills (the notification letter explicitly stated “the forgiveness is for this outstanding bill only” but patients may not have internalized this). The income effect or “flypaper” mechanism — the idea that financial relief of existing debt frees up mental-account resources for paying medical bills, thereby increasing repayment — is explicitly rejected by the data, as the effect goes in the direction of less repayment, not more.

Q8. What did the expert survey predict, and how did those predictions compare to the experimental estimates?

An expert survey conducted between April and May 2022 — after the interventions were completed but before results were released — asked academics, non-profit staff, hospital revenue-cycle practitioners, and policymakers to predict the impact of the hospital debt experiment. The median expert predicted a 7.0 percentage-point reduction in depression (8.0 points when weighted by confidence), a 10.2 percentage-point reduction in borrowing (13.7 points when confidence-weighted), and meaningful improvements in healthcare access. In total, 75.6% of respondents predicted medical debt relief is at least a moderately valuable use of charity resources, and 51.1% thought it very or extremely valuable. The authors estimate a statistically insignificant 3.2 percentage-point increase in depression (not a decrease), and a 95% confidence interval that rules out a reduction in depression of more than 0.6 percentage points — far below the 7.0 percentage-point expert prediction.
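As a rough consistency check on the depression figures quoted above, the point estimate and p-value can be turned back into an approximate confidence interval. The sketch assumes a normally distributed test statistic and a two-sided p-value; the paper's exact inference procedure may differ, so this is only an approximation.

```python
from statistics import NormalDist

def ci_from_estimate_and_p(estimate, p_value, level=0.95):
    """Back out a standard error and confidence interval from a point
    estimate and a two-sided p-value, assuming a normal test statistic."""
    std_normal = NormalDist()
    z_stat = std_normal.inv_cdf(1 - p_value / 2)      # |estimate| / se
    se = abs(estimate) / z_stat
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return se, (estimate - z_crit * se, estimate + z_crit * se)

# Figures from the text: +3.2 percentage points, p-value 0.097.
se, (lower, upper) = ci_from_estimate_and_p(3.2, 0.097)
print(f"implied se: {se:.2f} pp, 95% CI: ({lower:.2f}, {upper:.2f}) pp")
```

Under these assumptions the lower bound comes out near -0.6 percentage points, in line with the statement that the interval rules out reductions in depression larger than about 0.6 points.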

Q9. What survey methodology was used, and what response rate was achieved?

The survey, administered by NORC at the University of Chicago, targeted a random subset of 14,922 hospital debt experiment participants who entered the study after September 2019 (waves 6-18) and owed at least $500. The protocol spanned 13 weeks and included five postal mailings (including a $2 upfront incentive and a $5 incentive with the paper survey), twice-weekly email reminders, certified mail delivery of the full survey instrument, and telephone interviews by a US-based call center. Respondents received a $50 completion incentive. The protocol achieved a 19.4% response rate, with 68% responding via web, 10% via telephone, and 23% via mail. The survey was titled “Health and Financial Wellness Study” and made no reference to RIP Medical Debt to avoid priming respondents. Respondents were surveyed on average 13 months after treatment assignment (interquartile range 10 to 17 months).

Q10. What heterogeneity in survey outcomes was detected, and how do the authors interpret the anomalous depression finding for high-debt recipients?

Across all four preregistered heterogeneity dimensions (medical debt amount, age of debt, age of person, amount of other debt in collections), null effects on survey outcomes were found in 15 of 16 subgroups. The exception is persons in the fourth quartile of medical debt eligible for relief, for whom debt relief caused a statistically significant 12.4 percentage-point increase in depression (p-value 0.002) relative to a control mean of 45.9%, with similar patterns for anxiety, stress, subjective well-being, and general health. The authors consider that this may be a statistical fluke, given the null results in the other 15 subgroups. They also note potential parallels with findings from unconditional cash transfer experiments, where the receipt of transfers raised the salience of financial deprivation without addressing its underlying causes. A charity-stigma mechanism (recipients did not request the assistance) is also considered. The authors caution against giving this result undue weight in the overall assessment.

Q11. How does the paper position downstream debt relief relative to upstream interventions, and what does prior evidence suggest about upstream alternatives?

The authors highlight that their null results do not extend to upstream medical debt relief. Adams et al. (2022), studying a hospital financial assistance program at Kaiser Permanente that bundled debt relief with reductions in cost-sharing close to the time of the medical event, found substantial increases in high-value healthcare utilization. The Oregon Health Insurance Experiment (Baicker et al. 2013) found that Medicaid reduced depression by 9 percentage points among low-income uninsured adults. The authors suggest several reasons why downstream relief may fail: the intervention occurs too late after the precipitating event (approximately 15 months after the medical service in the hospital debt experiment, and about 7 years in the collector debt experiment), patients may have habituated to the stress of debt collections, the relief amount may be too small relative to overall financial distress, and the direct financial benefit is inherently limited by the low market price of collections-stage debt.

Q12. How do the authors address concerns about differential survey response and external validity?

Treated persons were a statistically insignificant 1.3 percentage points more likely to respond to the survey (p-value 0.056). The authors address this in two ways. First, they estimate specifications that (i) add rich observable controls and (ii) use speed of survey response as a proxy for unobserved response propensity; neither exercise changes the estimates meaningfully. Second, to probe external validity, they test for heterogeneous effects by predicted response propensity (from a logistic regression of a response indicator on baseline characteristics) and by speed of response; neither yields evidence of differential effects for non-respondents. They also compare credit bureau treatment effects for the full hospital debt sample, the survey outreach sample, and the survey respondent sample and find similar estimates across all three groups.

Key Concepts

Downstream medical debt: Medical bills that have already been sent to third-party debt collectors by the healthcare provider after the initial billing cycle, as distinguished from upstream unpaid bills still held by the hospital at or near the time of the medical event. The paper studies debt at this late stage specifically because it is the target of most large-scale relief programs.

Credit reporting sub-experiment: An embedded quasi-experiment within the collector debt RCT, exploiting the fact that a subset of accounts (6.8% of wave 1) were still being reported to credit bureaus at the time of intervention while the debt collector had already ceased reporting for the remaining accounts. This allows separate estimation of debt relief effects with and without counterfactual credit bureau reporting, using the period until 2019 Q1 (when the collector stopped reporting entirely) as the “reporting” window.

Downstream bill repayment effect: The paper’s finding that debt relief increases the probability of a subsequent unpaid medical bill being sent to collections. The paper attributes this primarily to reduced repayment of existing pre-relief medical bills rather than to increased healthcare utilization, consistent with an expectations, targeting, or confusion mechanism — and inconsistent with an income or flypaper effect that would increase repayment.

Targeting a level of indebtedness: A behavioral model (drawn from Dobkin et al. [2018]) in which patients implicitly target a certain level of indebtedness. Under this model, relieving some debt creates headroom in the patient’s implicit debt budget, leading to reduced repayment of remaining bills to restore the targeted level of total indebtedness.

Expert survey (pre-results): A structured elicitation of predicted treatment effects conducted between April and May 2022 — after the interventions were completed but before results were released — from academics, non-profit practitioners, hospital revenue-cycle managers, and policymakers. Used as a benchmark to quantify how far the causal estimates fall below prevailing beliefs, and to document that the null results were ex ante surprising to informed observers.

PHQ-8 (Patient Health Questionnaire-8): An eight-item validated clinical screen for depression, used as the paper’s primary preregistered survey outcome. An indicator for “at least moderate depression” on the PHQ-8 is the main mental health measure against which the debt relief treatment effect is estimated.

Multimodal survey: A survey protocol combining five postal mailings, twice-weekly email reminders, certified mail delivery of a paper survey instrument, and US-based call center telephone interviews, designed to maximize response rates in a hard-to-reach low-income population with medical debt in collections.

The Productivity of Professions: Evidence from the Emergency Department

This paper studies the productivity of nurse practitioners (NPs) versus physicians performing overlapping tasks in Veterans Health Administration (VHA) emergency departments (EDs), exploiting a quasi-experiment created by the VHA’s December 2016 grant of full practice authority to NPs. The identification strategy instruments patient assignment to NPs versus physicians using quasi-random variation in the number of NPs on duty on a given ED-day, conditional on ED-by-time-category fixed effects. The sample covers 1.1 million ED visits across 44 VHA EDs from January 2017 to January 2020, seen by 1,348 physicians and 156 NPs. The instrument is validated by demonstrating balance in patient observable characteristics across values of the instrument, stability of IV estimates across 256 combinations of patient covariate controls, and absence of spillover effects from NP presence onto physician performance.

On average in the ED setting, NPs increase patient length of stay by 11 percent (approximately 18 additional minutes) and raise the cost of the ED visit by 7 percent (approximately $66 per visit). NPs raise the 30-day preventable hospitalization rate by 0.25 percentage points, a 20 percent increase relative to the mean. No statistically significant effect on 30-day mortality is detected (95 percent confidence interval: -0.34 to 0.11 percentage points). OLS estimates carry the opposite sign because NPs are assigned healthier patients in observational data; the IV design corrects for this selection.

The average NP-physician performance gap varies systematically by case complexity and severity. For the highest-complexity quartile of cases (by Elixhauser comorbidities), NPs increase ED costs by 12 percent and length of stay by 28 percent. For cases at or above the 95th percentile of severity (based on 30-day mortality by diagnosis), NPs increase ED costs by 25 percent, length of stay by 99 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points — suggesting that NPs’ higher care intensity partially offsets worse intrinsic skill for the most severe cases. For lower-complexity cases, the cost and length-of-stay gaps are smaller, but NPs still significantly raise preventable hospitalizations.

NPs exhibit clinical decision-making patterns consistent with lower diagnostic skill: they are more likely to order consults (2.6 percentage points, or 11 percent of the mean), CT scans (1.2 percentage points, or 8.3 percent), and X-rays (2.0 percentage points, or 6.9 percent). NPs lower opioid prescriptions by 1.8 percentage points (20 percent of the mean) and raise antibiotic prescriptions by 4.0 percentage points (6.3 percent of the mean), consistent with threshold adjustment under lower diagnostic skill with asymmetric error costs. Downstream, patients treated by NPs incur similar opioid use disorder rates despite lower opioid prescribing, and higher infection-related return visit rates despite higher antibiotic prescribing.

Counterfactual analysis finds that allocating one quarter of ED patients to NPs increases the VHA’s net spending by $129 million per year, even after accounting for NPs’ lower wages (approximately half of physicians’). However, deploying NPs exclusively on the least-complex quarter of cases shrinks this net increase to approximately one-fifth of that amount.

A distributional analysis deconvolving provider-specific IV estimates reveals that within-profession productivity variation substantially exceeds the average between-profession gap. The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of pairs. Within professions, individual provider productivity shows essentially no relationship with wages or case complexity assigned, whereas between professions, case assignment and wages are strongly sorted by professional class.

Q: What is the core research question? A: The paper asks whether NPs and physicians, who perform overlapping tasks in the ED but differ sharply in training, selectivity, and pay, differ in productivity, and how that average between-profession difference compares to productivity variation within each profession. It also asks what mechanisms drive any observed gap and how case assignment responds to provider skill differences.

Q: What is the identification strategy and why is it credible? A: The authors instrument patient assignment to NPs with the number of NPs on duty on the ED-day, conditional on ED-by-year, ED-by-month, ED-by-day-of-week, and ED-by-hour fixed effects. Credibility rests on: provider schedules being set months in advance, decoupling NP availability from arriving patient characteristics; patient characteristics being well balanced across values of the instrument conditional on fixed effects; IV estimates being stable across all 256 covariate-control combinations; and on-duty physician and NP characteristics also being balanced across the instrument.

Q: What are the main average effects of NPs on resource use? A: IV estimates show NPs increase patient length of stay by 11 percent (approximately 18 minutes) and ED cost by 7 percent (approximately $66 per visit). There is no significant average effect on inpatient admissions in the overall sample, though NPs significantly raise admissions for high-severity cases.

Q: What is the effect of NPs on patient health outcomes? A: NPs raise 30-day preventable hospitalizations by 0.25 percentage points, a 20 percent increase relative to the mean. The 95 percent confidence interval for 30-day mortality is -0.34 to 0.11 percentage points, implying no statistically significant mortality effect in the overall sample.

Q: Why do OLS and IV estimates have opposite signs? A: In observational data, NPs treat healthier patients than physicians: NP patients are younger (60.7 versus 62.5 years), have fewer Elixhauser comorbidities (3.2 versus 3.7), and have fewer prior inpatient stays (0.4 versus 0.7). This selection causes OLS estimates of NP effects to be negative. The IV corrects for this by exploiting quasi-random variation in NP availability; IV estimates are stable across all combinations of patient controls, consistent with the instrument being orthogonal to unobservable patient health.
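The sign reversal between OLS and IV can be illustrated with a small simulation. This is a stylized sketch, not the paper's specification: it uses a single instrument (the number of NPs on duty) with no fixed effects or controls, and every parameter other than the 7 percent cost effect borrowed from the text is an assumption chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate(n=50_000, true_effect=0.07, seed=1):
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        nps_on_duty = rng.choice([0, 1, 2])   # instrument, unrelated to patient health
        sickness = rng.gauss(0, 1)            # not observed by the econometrician
        # Selection: healthier patients are more likely to be routed to an NP.
        p_np = 0.25 * nps_on_duty * (1.6 if sickness < 0 else 0.4)
        to_np = 1 if rng.random() < p_np else 0
        log_cost = true_effect * to_np + 0.5 * sickness + rng.gauss(0, 0.1)
        z.append(nps_on_duty)
        d.append(to_np)
        y.append(log_cost)
    ols = cov(d, y) / cov(d, d)   # naive comparison of NP and physician patients
    iv = cov(z, y) / cov(z, d)    # Wald / just-identified IV estimator
    return ols, iv

ols, iv = simulate()
print(f"OLS: {ols:+.3f}   IV: {iv:+.3f}   true effect: +0.070")
```

In this setup the OLS slope is negative because NP patients are healthier on average, while the IV estimate is positive and close to the true effect because the instrument shifts who is seen by an NP without being related to patient health.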

Q: How does the NP-physician performance gap vary with case complexity and severity? A: For the highest-complexity quartile, NPs increase length of stay by 28 percent and ED costs by 12 percent without a significant preventable hospitalization effect. For cases at or above the 95th severity percentile, NPs increase length of stay by 99 percent, ED costs by 25 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points. For lower-complexity quartiles, NPs show smaller cost and length-of-stay effects but significantly raise preventable hospitalizations, suggesting the higher care intensity at high severity compensates for lower skill.

Q: What does the heterogeneity by severity imply for optimal case assignment? A: The pattern is consistent with skill-task matching: NPs have a comparative and absolute disadvantage in complex cases, so optimal assignment directs less complex cases to NPs and fewer patients to NPs when physicians are more available. Empirically, NPs are indeed assigned healthier patients from the available pool, and are assigned a modestly smaller share when the ED is less busy.

Q: What mechanisms explain the average NP-physician gap? A: Three mechanisms are examined. First, experience: a one-standard-deviation increase in specific experience is associated with a 5.8 percent decline in the NP-physician length-of-stay gap, and general experience with a 10 percent decline; however, experience does not significantly narrow the preventable hospitalization gap. Second, information acquisition: NPs order more consults, CT scans, and X-rays, consistent with compensating for lower diagnostic skill. Third, prescription thresholds: NPs reduce opioid prescribing by 20 percent and raise antibiotic prescribing by 6.3 percent, consistent with threshold adjustment under asymmetric error costs, but downstream outcomes are not improved correspondingly.

Q: What do prescription patterns and downstream outcomes reveal about NP diagnostic skill? A: NPs prescribe fewer opioids, yet their patients have similar downstream rates of opioid use disorder; NPs prescribe more antibiotics, yet their patients have higher rates of return visits with infections. This pattern is consistent with NPs exhibiting higher rates of both false positives and false negatives, suggesting genuinely lower diagnostic skill rather than threshold differences alone.

Q: What do counterfactual cost calculations show? A: Allocating one quarter of ED patients to NPs raises the VHA’s non-wage spending by $197 million per year; after accounting for NP wages being half of physician wages (approximately $120,000 versus $240,000 per year), the net cost is still $129 million per year. Restricting NP deployment to the least-complex quarter of cases shrinks this net increase to approximately one-fifth of that amount, illustrating that targeted case assignment substantially improves NP cost-effectiveness.
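The arithmetic behind these counterfactual figures can be laid out in a few lines. The wage-savings number below is implied by the two quoted totals rather than stated in the text, and "one-fifth" is treated as exactly 1/5 purely for illustration.

```python
# Figures quoted in the text, in millions of dollars per year.
NONWAGE_INCREASE = 197.0   # extra non-wage spending with a quarter of patients seen by NPs
NET_INCREASE = 129.0       # net increase after accounting for NPs' lower wages

# Wage savings implied by the two figures above (an inference, not a quoted number).
implied_wage_savings = NONWAGE_INCREASE - NET_INCREASE

# "Approximately one-fifth" of the net increase under targeted assignment.
targeted_net_increase = NET_INCREASE / 5

print(f"implied wage savings: ${implied_wage_savings:.0f}M per year")
print(f"net increase with targeted assignment: about ${targeted_net_increase:.0f}M per year")
```

Read this way, lower NP wages offset roughly a third of the added non-wage spending, and restricting NPs to the least-complex quarter of cases leaves a net increase on the order of $26 million per year.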

Q: How large is within-profession productivity variation relative to between-profession differences? A: The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of random pairs. The authors conclude that, despite stark differences in training and selection between professions, within-profession variation dominates.
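A back-of-the-envelope calculation shows how the two quoted magnitudes relate to the share of pairs in which an NP outperforms a physician. The sketch assumes that productivity is normally distributed with the same spread in both professions and that the mean gap is exactly one third of the interquartile range; the paper instead works with deconvolved empirical distributions, so this is only a plausibility check.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

iqr = 900_000    # within-profession IQR in annual spending, from the text
gap = iqr / 3    # mean NP-physician gap, taking "roughly three times" literally

# Assumption: normal productivity with equal spread in both professions.
sigma = iqr / (std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25))

# The difference between two independent draws has standard deviation sigma * sqrt(2).
# P(a random NP generates less spending than a random physician):
p_np_better = std_normal.cdf(-gap / (sigma * sqrt(2)))
print(f"share of pairs where the NP does better: {p_np_better:.3f}")
```

Under these assumptions the share lands close to the 38 percent figure reported above, which suggests the quoted numbers are mutually consistent.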

Q: Is individual provider productivity reflected in wages or case assignment within professions? A: Within each profession, provider productivity shows essentially no relationship with wages or with the complexity of assigned cases. This contrasts sharply with between-profession patterns, where professional class strongly predicts both wages (NPs earn approximately $120,000 per year versus $240,000 for physicians) and assigned case complexity. The authors interpret this as evidence of informational and organizational frictions in recognizing individual productivity within professional classes, and note that professional class is a far stronger predictor of pay and case assignment than is individual productivity.

Q: How do complier characteristics relate to the broader patient population? A: Compliers — cases whose provider type is determined by the instrument — are healthier than the average case: younger, with fewer comorbidities, fewer prior inpatient stays, and lower predicted mortality. Never-takers are riskier than the average case. There are no always-takers since patients cannot be assigned to NPs on days when no NPs are on duty.

Q: How does this paper relate to the literature on NP scope-of-practice laws? A: The scope-of-practice literature estimates general-equilibrium effects of allowing NPs greater autonomy, including labor reallocation between professions. This paper instead estimates the partial-equilibrium causal effect of assigning a patient to an NP versus a physician, holding the broader labor market fixed. The two literatures are complementary: the heterogeneity findings here suggest that scope-of-practice expansions may be more beneficial in lower-complexity primary care settings where the NP-physician performance gap is smaller.

Q: What are the policy implications of the findings? A: Three implications are highlighted. First, the efficiency of using NPs depends critically on case assignment: deploying NPs on the least-complex cases reduces net costs to approximately one-fifth of indiscriminate deployment. Second, the substantial overlap between NP and physician productivity distributions provides support for NP use in less complex settings even within the ED context. Third, within-profession productivity variation far exceeding between-profession differences suggests that individual-level productivity assessment, rather than professional class, may be a more accurate guide to case assignment and compensation.

Quasi-experimental variation in NP availability: The identification strategy exploits day-to-day variation in the number of NPs scheduled to work in a given VHA ED, conditional on ED-by-time-category fixed effects, as an instrument for whether a patient is assigned to an NP versus a physician. Schedules are set months in advance, rendering the NP count orthogonal to arriving patient characteristics conditional on those fixed effects.

30-day preventable hospitalization: A standardized quality-of-care outcome defined by the Agency for Healthcare Research and Quality, measuring hospitalizations occurring within 30 days of ED discharge that are classified as preventable given adequate prior outpatient management. Used by the paper as the primary downstream health outcome beyond the ED visit itself.

Elixhauser comorbidities: A set of 31 binary indicators for chronic conditions (e.g., cancer, diabetes) based on medical histories in the prior 365 days, used in this paper to measure and stratify case complexity into quartiles for heterogeneity analysis.

Productivity distributions within professions: Provider-specific productivity estimates derived from a just-identified IV model that instruments assignment to individual providers by indicators for on-duty providers, then deconvolved into underlying distributions using the Efron (2016) and Kline-Rose-Walters (2022) method. These distributions characterize the spread of productivity within each professional class, separate from measurement error.

Prescription threshold adjustment: The mechanism, formalized in Chan, Gentzkow, and Yu (2022), by which providers with lower diagnostic skill optimally adjust treatment thresholds in response to asymmetric costs of false-positive versus false-negative errors. In this paper’s application, NPs lower the opioid prescription rate (where false positives carry higher costs: addiction and overdose) and raise the antibiotic prescription rate (where false negatives carry higher costs: untreated infection), but downstream outcomes do not improve correspondingly.

Skill-task matching: The organizational economics principle (Acemoglu and Autor 2011) that efficiency requires assigning more complex tasks to higher-skilled workers. The paper documents that between professions, case assignment broadly follows this principle (NPs receive less complex patients on average), but within professions, essentially no matching between individual provider productivity and case complexity is observed.

Full practice authority (VHA, December 2016): The VHA policy that allowed NPs to treat patients independently without physician supervision at VHA facilities, superseding state-level restrictions. This policy change defines the start of the paper’s sample period and establishes the institutional context in which the quasi-experiment occurs, as it removed the requirement for physician oversight that previously constrained NP independence.

Why Doesn't the United States Have National Health Insurance?

This paper investigates a critical juncture in the development of national health insurance (NHI) in the United States: the post-World War II period when most peer nations moved to establish comprehensive public coverage while the U.S. did not. The authors examine the causal role of the American Medical Association (AMA), which in 1949 hired Whitaker & Baxter’s Campaigns, Inc. — the country’s first political public relations firm — to direct a nationwide campaign opposing NHI and promoting private (voluntary) health insurance (PHI).

The Campaign had two main components. The first was physician outreach: AMA members distributed pamphlets to patients warning against “socialized medicine” and encouraging enrollment in private plans, and acted as liaisons to local civic organizations, soliciting resolutions against NHI that were sent to elected officials. Nearly 50 million pieces of material were sent to physicians. The second was mass newspaper advertising: a standard ad was placed in newspapers nationwide, supplemented by an additional $19 million in 1950 dollars (approximately $240 million in current dollars) of coordinated tie-in advertising from roughly 23,000 corporations and industry associations. The messaging framed NHI as “un-American” and associated private insurance with “freedom” and “the American way,” providing little substantive information about insurance products.

The authors construct novel measures of Campaign exposure by combining (a) per capita pamphlets distributed by AMA physicians and (b) per capita advertising circulation scaled by local newspaper readership, using archival data from the Whitaker & Baxter Archives (Sacramento), the National Archives (Washington D.C.), digitized AMA Medical Directories, the N.W. Ayer & Son’s Newspaper Directory, and newly discovered Blue Shield enrollment data from AMA Council on Medical Service annual reports covering 1946–1954.

The primary estimation strategy exploits spatial variation in Campaign intensity combined with its timing, using event studies with state and year fixed effects and design controls for income per capita and unionization. The identifying assumption — that Campaign intensity was conditionally as-good-as-randomly assigned — is supported by balance tests showing no pre-Campaign correlation between exposure and enrollment or sociodemographic characteristics (with the exception of Black population share), and by the historical record that the Campaign was organized hastily following Truman’s unexpected 1948 electoral victory.
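A generic event-study specification consistent with this description is sketched below. The notation is assumed for illustration and is not taken from the paper: $y_{st}$ is the outcome (for example, the PHI enrollment share) in state $s$ and year $t$, $E_s$ is standardized Campaign exposure, $X_{st}$ contains income per capita and unionization, and $k_0$ is an omitted pre-Campaign reference year.

```latex
\begin{equation}
y_{st} = \alpha_s + \gamma_t
  + \sum_{k \neq k_0} \beta_k \, \bigl( E_s \times \mathbf{1}[t = k] \bigr)
  + X_{st}'\delta + \varepsilon_{st}
\end{equation}
```

In a specification of this form, the pre-trend tests correspond to a joint test that $\beta_k = 0$ for all years $k$ before Campaign onset, while the post-Campaign $\beta_k$ trace out the effect of a one standard deviation increase in exposure.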

Main findings: A one standard deviation increase in Campaign exposure explains approximately 20% of the post-Campaign increase in PHI enrollment, corresponding to roughly 14 million additional enrollees — an effect comparable in magnitude to increasing average per capita income by approximately $100 (about 7 percent). On public opinion, a one standard deviation increase in Campaign exposure led to a six percentage point decline in popular support for NHI per Gallup survey wave, a reversal occurring against a backdrop of 69% pre-Campaign approval that was trending upward. For context, this six-point magnitude approximates the entire gap in NHI support between union and non-union households, or one-third the racial gap in support. Campaign intensity also predicts civic organizations passing resolutions favoring PHI, Republican legislators adopting speech semantically similar to Campaign propaganda, and — by 1952 — AMA members being five times more likely to donate to the Eisenhower-Nixon ticket than non-AMA physicians, with donation rates increasing in Campaign intensity.

Scope conditions: The analysis covers 48 U.S. states from 1946 to 1954, ending at the 1954 IRS tax code change that expanded commercial insurers’ market share. The enrollment data capture Blue Shield (physician-run) plans specifically; the paper explicitly notes that granular data on commercial insurers are unavailable for the main Campaign period. The authors argue that several subsequent factors help explain why NHI was not adopted in later decades: middle-class acquisition of private coverage reducing demand for a public option, incumbent interests defending the status quo, and the persistent ideological linkage of private insurance with freedom. These persistence mechanisms are outside the paper’s direct empirical scope.

Q: What was the AMA’s Campaign, and what prompted it? A: In response to Harry Truman’s unexpected 1948 presidential victory alongside a Democratic Congress — and with a majority of informed voters favoring NHI — the AMA hired Whitaker & Baxter’s Campaigns, Inc. to run the National Education Campaign (NEC). The Campaign had two components: physician outreach (pamphlet distribution to patients, liaison to civic organizations) and mass newspaper advertising. The AMA paid Whitaker & Baxter approximately $1.2 million per year in current terms, and coordinated an additional $19 million in 1950 dollars (roughly $240 million today) in tie-in advertising from allied corporations and trade groups.

Q: How is Campaign exposure measured, and how is it validated as conditionally exogenous? A: Campaign exposure combines two standardized components: per capita pamphlets distributed by AMA physicians (pamphlet quantity from W&B archives scaled by state AMA membership share) and per capita advertising circulation scaled by local newspaper readership (share of adults with more than five years of schooling). The two components are summed and standardized. Exogeneity is supported by balance tables showing no pre-Campaign correlation between exposure and enrollment or Gallup opinion, by the absence of discontinuous changes in income or unionization at Campaign onset, and by the historical fact that Campaign logistics relied on pre-existing networks assembled hastily in response to Truman’s unanticipated victory.
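The construction of the composite measure (standardize each component, sum, then standardize again) can be illustrated with a minimal sketch. The function names, the use of the population standard deviation, and the five data points are all assumptions made for illustration; none of the numbers come from the paper.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; the paper may use sample SD
    return [(v - m) / s for v in values]


def campaign_exposure(pamphlets_pc, ad_circulation_pc):
    """Sum the two standardized components, then re-standardize the sum."""
    z_pamphlets = standardize(pamphlets_pc)
    z_ads = standardize(ad_circulation_pc)
    combined = [p + a for p, a in zip(z_pamphlets, z_ads)]
    return standardize(combined)


# Hypothetical per capita values for five made-up states (not real data).
pamphlets = [0.10, 0.25, 0.40, 0.15, 0.30]
ads = [1.2, 0.8, 2.0, 1.5, 1.0]
exposure = campaign_exposure(pamphlets, ads)
```

Re-standardizing after summing is what lets a regression coefficient be read as the effect of a one standard deviation increase in overall exposure, regardless of how correlated the two components are.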

Q: What is the main effect of the Campaign on private health insurance enrollment? A: A one standard deviation increase in Campaign exposure is associated with a two percentage point increase in the share enrolled in PHI in the preferred specification (Column 4 of Table 1, which includes income, unionization, state fixed effects, and year fixed effects; coefficient 0.020, se 0.007, significant at 1%). This accounts for approximately 20% of the overall post-Campaign increase in PHI enrollment, corresponding to roughly 14 million new enrollees. The pre-Campaign coefficient is not statistically significant (coefficient 0.004, se 0.005), and the F-test p-value for pre-trends is 0.958.
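As a rough consistency check on the reported significance levels, the coefficients and standard errors above can be converted to two-sided p-values. The sketch assumes a standard normal reference distribution, which is an approximation and not necessarily the inference procedure used in the paper; the helper name is hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef/se under a standard normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


# Values reported in the text for the preferred specification.
p_post = two_sided_p(0.020, 0.007)  # post-Campaign coefficient
p_pre = two_sided_p(0.004, 0.005)   # pre-Campaign coefficient

print(f"post-Campaign: z = {0.020 / 0.007:.2f}, p = {p_post:.4f}")
print(f"pre-Campaign:  z = {0.004 / 0.005:.2f}, p = {p_pre:.4f}")
```

Under this approximation the post-Campaign estimate falls below the 1% threshold and the pre-Campaign estimate is far from conventional significance, in line with the summary above.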

Q: What is the effect of the Campaign on public opinion toward NHI? A: Using Gallup survey data, a one standard deviation increase in Campaign exposure led to an approximately six percentage point decline in individual support for NHI legislation per survey wave, against a pre-Campaign approval level of 69% that was trending upward. The F-test p-value for pre-trends in the Gallup event study is 0.179. This six-point effect is approximately equal to the gap in NHI support between union and non-union households, and approximately one-third the racial gap in support.

Q: What evidence links the Campaign to civic organizations and the legislative process? A: The Campaign’s archives document all civic organizations “on record against compulsory health insurance,” meaning they had passed resolutions in favor of PHI. The authors find a positive relationship between Campaign intensity and civic organizations passing such resolutions at the county level. Resolutions sent to elected officials were traced to the Congressional Record and to physical folders in the National Archives; their semantic similarity to AMA-WB propaganda is confirmed. Republican legislators’ speech in the 81st Congress shows increased similarity to Campaign language in proportion to Campaign intensity in their district or state, while Democratic legislators’ speech does not show this pattern. Mentions of NHI and the AMA in the Congressional Record spiked during this period.

Q: Did the Campaign affect physician political behavior beyond the clinic? A: By 1952, when the Republican platform had fully adopted the AMA’s position, AMA members were approximately five times more likely to donate to the Eisenhower-Nixon ticket than non-AMA physicians, with donation probability increasing in Campaign intensity. The authors digitized the donor list from the National Professional Committee for Eisenhower (NPCE) — a separate lobbying entity created because the AMA legally could not endorse candidates — and linked approximately 80% of physician donors to the AMA Medical Directory.

Q: What alternative explanations for PHI growth does the paper address, and how? A: The standard literature attributes PHI growth to the 1942 Stabilization Act wage freeze (which left benefits unconstrained), collective bargaining rights clarified in the late 1940s, and the 1954 IRS tax exemption for employer-paid premiums. The authors include income per capita and unionization as core design controls and show that their Campaign exposure coefficient is stable across specifications with and without these controls (coefficients of 0.025 and 0.020 in Table 1 Columns 1–2 vs. 3–4, respectively). The analysis stops in 1954 before the tax change, and the authors note that by 1952 roughly 63% of households already had some form of medical expense insurance.

Q: What is the conceptual mechanism through which the Campaign operated? A: The authors adapt Sobbrio’s (2011) indirect lobbying model. Voters hold uniform priors over whether NHI enactment yields net positive or negative social surplus. The private-sector advocate (AMA-WB) sends messages that shift voters’ posterior beliefs toward the negative-surplus state and, simultaneously, encourage PHI enrollment, which reduces voters’ private valuation of a public option. Because citizens were likely unaware of the coordinated tie-in advertising across industries and the financial motivation behind physician messaging, the framing operated through naive belief updating. The public-sector advocate (the Truman administration and the Committee for the Nation’s Health, CNH) was vastly outresourced, with the CNH raising only $104,000 in 1949, and faced legal constraints on executive lobbying.

Q: What advertising tactics specifically characterized the Campaign, and what do they imply about mechanisms? A: Campaign pamphlets and ads provided little or no substantive information about insurance products (coverage, eligibility, cost) and instead tied health insurance to ideological symbols: “freedom,” “the American way,” “the voluntary way,” and warnings about “socialized medicine.” Word clouds from Campaign materials confirm “America” and “freedom” as dominant terms. The authors connect this to behavioral models of advertising (Mullainathan, Schwartzstein and Shleifer 2008) whereby advertisers create or exploit associations to influence product beliefs. The absence of informational content is consistent with effects operating through ideology and identity rather than rational product evaluation.

Q: What explains why the U.S. did not adopt NHI in subsequent decades after the immediate Campaign period? A: The authors offer three mechanisms (discussed outside their main empirical scope): First, as middle-class Americans obtained PHI through employers, demand for a public option diminished — the model formalizes this as reduced private valuation of NHI. Second, incumbents who benefit from the private status quo — Blue Cross Blue Shield, AMA, American Hospital Association, and pharmaceutical companies, which today comprise four of the top ten direct federal lobbyists — actively work to maintain it (Acemoglu, Egorov and Sonin 2021). Third, the Campaign’s ideological framing proved durable: ideologically similar rhetoric opposing “socialized medicine” appeared in campaigns against both Clinton-era and Obama-era reform efforts, and has been linked to increased adverse selection and preventable deaths (Bursztyn et al. 2022; Galvani et al. 2022).

Q: What are the paper’s main contributions to the literature? A: The paper provides the first causal evidence on the AMA’s political role in blocking NHI at the post-WWII juncture, contributing to the economic history of U.S. social insurance development. It contributes to the advertising literature by providing credible estimates of a sustained national campaign combining trusted field agents (physicians) with mass media, and to the lobbying literature by documenting indirect lobbying — persuasion of ordinary citizens — as a distinct and effective tool alongside direct lobbying. It also documents physician behavior outside the clinical setting, showing how rents from supply-side constraints were deployed to shape the market for medical services.

Indirect lobbying: In the paper’s usage, persuasion of ordinary citizens via campaigns — as distinct from direct lobbying of policymakers — used to shift median voter beliefs and behavior to achieve legislative goals. Whitaker & Baxter are credited with creating this field through their work at Campaigns, Inc.

Campaign exposure: The paper’s composite treatment variable, constructed as the sum of two standardized components: per capita pamphlets distributed by AMA physicians (physician outreach) and per capita advertising circulation scaled by local newspaper readership (mass communications), then re-standardized to mean 0, standard deviation 1.

Tie-in advertising: Coordinated newspaper advertisements by third-party corporations and trade associations placed simultaneously with the main AMA-WB Campaign ad, sharing the “Voluntary Way is the American Way” slogan. Approximately 60% of newspapers with a main Campaign ad also had tie-in ads, averaging three per issue; third-party spending totaled approximately $19 million in 1950 dollars (~$240 million current).

Voluntary (private) health insurance: In the paper’s framing, the AMA-promoted alternative to NHI — prepaid medical service plans run by state medical societies (Blue Shield) or nonprofit hospitals (Blue Cross) — deliberately labeled “voluntary” to contrast with “compulsory” NHI, embedding the product within an ideological frame of free choice.

National Education Campaign (NEC): The AMA’s official name for the anti-NHI campaign directed by Whitaker & Baxter starting in 1949, characterized as “educational” to provide legal cover; the name itself illustrates the indirect lobbying strategy of framing political advocacy as public information.

Naive voter updating: The paper’s modeling assumption (drawn from Sobbrio 2011) that voters held uniform priors on health insurance policy outcomes and updated those beliefs in Bayesian fashion on the messages they received, without awareness of the coordination across industries or of the financial motivation of physician messengers. This lack of awareness is what made the ideological framing effective.

Physician field agents: In the Campaign’s design, AMA member physicians served as credible, trusted intermediaries who distributed pamphlets to patients and solicited civic organization resolutions, leveraging their social status to amplify the Campaign’s reach into communities where mass advertising alone would be insufficient.
