Forthcoming | Macro Paper Warehouse

Diversion Risk, Markups, and the Financing Cost Advantage of Trade Credit

Thu, 01 Jan 2026 00:00:00 +0000

This paper provides a theory and evidence for why firms with higher markups extend more trade credit, focusing on a financing cost channel that is distinct from existing competition-based explanations. In the model, diversion risk creates a wedge between the bank borrowing rate and the deposit rate. Under cash in advance, the buyer must borrow the full invoice amount (production cost times markup); under trade credit, the seller instead borrows only her production costs. Since higher markups amplify the difference in borrowing needs between these two payment forms, they make trade credit more attractive—and this advantage strengthens with the buyer’s borrowing rate, generating a unique interaction prediction. Empirical tests using detailed Chilean export transactions matched with firm-product markup estimates (De Loecker et al. 2016 methodology) find that a one standard deviation rise in upstream markups increases trade credit by 13 days, with the extensive and intensive margins contributing roughly equally; this effect strengthens with the destination country’s borrowing costs. Results are robust to instrumenting markups with plant-product level physical productivity and replicate in U.S. Compustat data with the real Effective Fed Funds Rate as the borrowing cost proxy.

Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.

In depth

Q1. Why does a higher markup make trade credit more attractive?

Under cash in advance, the buyer must pre-pay the full invoice price and therefore borrow production cost times the markup. Under trade credit, the seller instead borrows only her production costs to finance production, while the buyer pays later from sales revenues and needs no pre-payment borrowing at all. Because diversion risk causes banks to charge more than the deposit rate for loans, a higher markup amplifies the savings in financing costs from using trade credit rather than cash in advance, making trade credit strictly preferred whenever the markup and interest rate spread are both positive. This mechanism is operative even if the seller and buyer face identical borrowing rates and even if goods are no harder to divert than cash (distinguishing it from Burkart and Ellingsen 2004, where trade credit dominates because goods are harder to divert).
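
A stylized one-period calculation may help make the comparison concrete. The sketch below is not the paper's model: it assumes that under cash in advance the seller deposits the part of the pre-payment not needed for production at the deposit rate, and that both parties borrow at the same bank rate. All function names and numbers are hypothetical.

```python
def financing_cost_cash_in_advance(cost, markup, borrow_rate, deposit_rate):
    """Net interest cost when the buyer pre-pays the full invoice.

    Assumption: the buyer borrows markup * cost; the seller spends `cost`
    on production and deposits the remainder at the deposit rate.
    """
    invoice = markup * cost
    interest_paid = borrow_rate * invoice
    interest_earned = deposit_rate * (invoice - cost)
    return interest_paid - interest_earned


def financing_cost_trade_credit(cost, borrow_rate):
    """Interest cost when the seller borrows only her production costs."""
    return borrow_rate * cost


def trade_credit_advantage(cost, markup, borrow_rate, deposit_rate):
    """Financing cost saved by using trade credit instead of cash in advance."""
    return (financing_cost_cash_in_advance(cost, markup, borrow_rate, deposit_rate)
            - financing_cost_trade_credit(cost, borrow_rate))


# Hypothetical numbers: unit cost 100, gross markup 1.25 or 1.50.
base = trade_credit_advantage(100, 1.25, borrow_rate=0.06, deposit_rate=0.02)
high_markup = trade_credit_advantage(100, 1.50, borrow_rate=0.06, deposit_rate=0.02)
high_rate = trade_credit_advantage(100, 1.50, borrow_rate=0.10, deposit_rate=0.02)
no_spread = trade_credit_advantage(100, 1.50, borrow_rate=0.02, deposit_rate=0.02)
```

Under these assumptions the advantage reduces to (borrow_rate - deposit_rate) * (markup - 1) * cost, so it vanishes when the spread is zero, rises with the markup, and rises with the markup faster when borrowing is more expensive.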

Q2. What is the unique empirical prediction that distinguishes the financing cost channel?

The model uniquely predicts that the positive effect of upstream markups on trade credit should increase with the buyer’s borrowing rate: when borrowing is expensive, the relative financing cost advantage of trade credit (which reduces total borrowing) is larger, so higher markups generate even more trade credit use. This interaction prediction distinguishes the financing cost channel from competition-based theories (Demir and Javorcik 2018; Giannetti et al. 2021), which predict that lower upstream bargaining power (lower markups) leads to more trade credit. It also allows identification even with a rich set of fixed effects, because the interaction term is not absorbed by seller, buyer, and destination fixed effects.

Q3. What do the Chilean export data show?

A one standard deviation rise in upstream markups increases trade credit by 13 days on average, with the extensive margin (probability of using trade credit) and intensive margin (trade credit maturity conditional on use) contributing roughly equally. Crucially, the effect of markups on trade credit strengthens with the destination country’s borrowing costs, consistent with the unique interaction prediction of the financing cost channel. Markup estimates are constructed at the firm-product level using the De Loecker et al. (2016) methodology applied to Chilean manufacturing survey data, which requires quantity-based information on inputs and outputs to avoid revenue-based measurement confounds. The extensive fixed effects structure (seller × product, buyer-country × product, and seller × buyer-country-year fixed effects) addresses omitted variable concerns.

Q4. How does the paper handle endogeneity of markups?

The paper instruments for firm-product markups using plant-product level physical productivity, which is a supply-side technological variable that affects markups through the cost side (more productive firms have lower marginal costs and thus higher markups for a given price) but is unlikely to directly affect payment choice; the IV results are quantitatively similar to OLS, supporting the causal interpretation of the markup effect on trade credit. Because markups estimated with revenue data can conflate productivity with demand shocks (the ‘De Loecker critique’), the Chilean quantity-based data are particularly valuable: firm-product quantities and input prices are directly observed in the manufacturing survey, enabling markup estimates that are free of revenue confounds.
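
The logic of instrumenting can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the data are synthetic, the variable roles (instrument standing in for physical productivity, regressor for the markup, confounder for an unobserved demand shock) are hypothetical, and the simple ratio-of-covariances estimator is a textbook single-instrument IV, not the paper's specification.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Synthetic data: a confounder moves both the regressor and the outcome."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # stands in for physical productivity
        confounder = rng.gauss(0, 1)   # unobserved, e.g. a demand shock
        regressor = instrument + confounder + rng.gauss(0, 1)  # stands in for the markup
        outcome = true_effect * regressor + confounder + rng.gauss(0, 1)
        z.append(instrument)
        x.append(regressor)
        y.append(outcome)
    return z, x, y


z, x, y = simulate()
# OLS slope: contaminated by the confounder.
ols_estimate = covariance(x, y) / covariance(x, x)
# IV slope: uses only the variation in x that is driven by the instrument.
iv_estimate = covariance(z, y) / covariance(z, x)
```

In this synthetic example the confounder is strong, so OLS overstates the true effect of 2 while the IV estimate recovers it. The paper's finding that IV and OLS are quantitatively similar suggests that such bias is limited in its setting.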

Key concepts

financing cost channel of trade credit: the mechanism by which trade credit reduces the total bank borrowing needed for a transaction (because the seller borrows only production costs rather than the buyer borrowing the full invoice price), thereby lowering financing costs when diversion risk creates a borrowing-deposit rate wedge; the paper’s central contribution, distinct from competition-based explanations of trade credit provision.

diversion risk and borrowing-deposit rate wedge: the risk that borrowers divert borrowed funds, which causes banks to charge a borrowing rate above the deposit rate; the spread between these rates determines the per-dollar financing cost saved by switching from cash in advance to trade credit, amplifying the role of markups in payment choice.

De Loecker et al. (2016) markup estimation: a methodology for estimating markups at the firm-product level using quantity-based production data (physical inputs and outputs) rather than revenue data, avoiding the confound between productivity and demand shocks; used here to obtain the Chilean firm-product markup estimates.

Optimal Taxation of Inflation

Thu, 01 Jan 2026 00:00:00 +0000

This paper analyzes the effectiveness of a tax on inflation policy (TIP), a fiscal instrument that would require firms to pay a tax proportional to the increase in their prices, as a complement to conventional monetary policy in a New Keynesian framework with multiple sources of inflation. The central result is that combining TIP with conventional monetary policy can implement the first-best allocation in which inflation is zero and the output gap is closed at all times under any path of shocks. Policy instruments should completely specialize: monetary policy should track the neutral rate of interest (addressing demand and productivity shocks by keeping output at its efficient level), while TIP should rise with markup and inflation expectation shocks. In contrast to the 1970s view that saw TIP as a substitute for monetary policy, the paper shows TIP to be a complement. TIP corrects an externality in firms’ pricing decisions without exacerbating relative price distortions. Simulations suggest that a reasonably calibrated TIP could lower the variance of inflation by 45% and of output by 44% relative to a Taylor-rule-only regime.

In depth

Q1. What is TIP and what externality does it correct?

TIP (tax on inflation policy) is a fiscal instrument that requires firms to pay a tax proportional to the increase in their prices, and it corrects an externality in firms’ pricing decisions created by markup and inflation expectation shocks that cause private and social returns to price increases to diverge. When shocks to markups or inflation expectations create strategic price-setting incentives, firms’ individually optimal price increases exceed the socially optimal level; TIP re-aligns private with social valuations by making price increases costly. The proposal originated with Wallich and Weintraub (1971) and was widely discussed in the 1970s, but was absent from recent policy discourse until this paper revived it in a microfounded framework.

Q2. What is the complete-specialization result?

Monetary policy and TIP should completely specialize: monetary policy should track the neutral rate of interest—varying with aggregate demand and productivity shocks to keep output at its efficient level—while TIP should respond to markup and inflation expectation shocks, addressing the externalities those shocks create in firms’ pricing. This sharp division of labor arises because each instrument is best suited to a different source of inflation: monetary policy’s power lies in aggregate demand management, while TIP directly corrects the pricing externality. Under complete specialization, the first-best allocation with zero inflation and zero output gap can be implemented under any shock path.

Q3. Does TIP exacerbate relative price distortions?

In contrast with price controls, TIP is found not to exacerbate distortions in relative prices, because TIP is linear in price increases and symmetric across firms, so it does not prevent efficient relative price adjustments across sectors. In an extension with sector-specific TFP shocks requiring relative price adjustments, the paper shows analytically (under some conditions) and numerically (more generally) that TIP has no effect on relative prices across sectors. Firms that face negative productivity shocks moderate their price increases, while firms that otherwise would not change prices are incentivized to decrease them to earn a subsidy, keeping the relative price structure broadly intact.

Q4. How large are the stabilization gains from TIP?

Calibrated simulations show that the stabilization gains from using TIP alongside a Taylor rule are substantial: a reasonably calibrated TIP could lower the variance of inflation by 45% and of output by 44%, with gains especially large for markup and inflation expectation shocks. Welfare gains from TIP are smaller for TFP and demand shocks because the reduction in inflation volatility is partially offset by higher output gap volatility. These quantitative results are based on a calibrated New Keynesian model and are presented as illustrative magnitudes rather than precise empirical estimates.

Q5. What equivalent instruments does the paper consider?

The paper shows a formal equivalence between TIP, production/payroll subsidies (the more traditional tools for markup distortions), a feebate (combining a tax on price increases with a rebate to all firms), and a market for inflation permits. Subsidies can also implement the first best but entail large and persistent fiscal costs; the feebate provides incentives without increasing the average tax burden; the market for inflation permits (proposed by Lerner, 1978) minimizes fiscal authority involvement. TIP is distinguished from these alternatives by its directness and its non-distortionary effect on relative prices.
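
A small numerical sketch may clarify how a feebate keeps the incentive while leaving the average burden unchanged. It assumes, purely for illustration, that the tax is linear in each firm's price change and that the rebate is a uniform lump sum equal to the average tax collected; the function names, the 0.5 rate, and the price changes are hypothetical.

```python
def tip_liability(price_changes, tax_rate):
    """Tax owed by each firm, proportional to its price change.

    A negative price change yields a negative liability, i.e. a subsidy.
    """
    return [tax_rate * change for change in price_changes]


def feebate_liability(price_changes, tax_rate):
    """TIP plus a uniform rebate equal to the average tax collected.

    Assumption: the rebate is lump-sum and identical across firms, so the
    marginal incentive (tax_rate per unit of price increase) is unchanged.
    """
    taxes = tip_liability(price_changes, tax_rate)
    rebate = sum(taxes) / len(taxes)
    return [tax - rebate for tax in taxes]


# Hypothetical price changes (in percent) for four firms.
changes = [4.0, 2.0, 0.0, -2.0]
tip = tip_liability(changes, tax_rate=0.5)
feebate = feebate_liability(changes, tax_rate=0.5)
```

In this example the feebate payments sum to zero across firms, while the difference in liability between any two firms is the same as under the plain tax.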

Key concepts

tax on inflation policy (TIP): a fiscal instrument requiring firms to pay a tax proportional to the increase in their prices, designed to internalize the externality that individual firms’ price increases impose on aggregate inflation; first proposed by Wallich and Weintraub (1971).

inflation externality: the divergence between private and social returns to a firm’s price increase created by markup or inflation expectation shocks; private returns include the markup gain, while social costs include the contribution to aggregate inflation, which TIP is designed to correct.

complete specialization: the optimal policy regime in which monetary policy exclusively addresses demand and productivity shocks (by tracking the neutral rate) while TIP exclusively addresses markup and inflation expectation shocks; enables the first-best allocation.

feebate: an instrument equivalent to TIP that combines a tax on price increases with a rebate distributed to all firms, providing anti-inflation incentives without increasing the average firm tax burden.

A Welfare Analysis of Policies Impacting Climate Change

This paper extends and applies the marginal value of public funds (MVPF) framework to evaluate the welfare consequences of 96 climate-related tax and spending policies in the United States. The MVPF is a benefit-cost ratio in which the numerator captures all benefits to individuals (measured by their willingness to pay) and the denominator captures net government costs; policies with higher MVPFs are better spending policies, while those with lower MVPFs are more efficient revenue-raising instruments.

The sample covers policies rigorously evaluated using quasi-experimental or experimental methods drawn from 18 major economics journals between January 1999 and December 2023. Policies fall into three primary categories: subsidies (wind production tax credits, residential solar, electric vehicles, hybrid vehicles, vehicle buybacks, appliance rebates, and weatherization), nudges and marketing, and revenue raisers (gasoline taxes, other fuel taxes, cap-and-trade). A selected set of international aid policies is also analyzed. The analysis applies a harmonized method for translating behavioral changes into emissions changes — using the EPA’s AVERT model for electricity-sector emissions — and a consistent set of externality valuations, including an EPA 2023 social cost of carbon (SCC) of $193 per ton of CO2 in 2020 (rising over time), with robustness checks at $76, $337, and $1,367.

The primary methodological contribution is a new sufficient statistics approach to quantifying learning-by-doing (LBD) externalities. When marginal cost of production is an isoelastic function of cumulative production and demand is an isoelastic function of price, the time path of production satisfies a second-order ordinary differential equation whose solution yields society’s willingness to pay for LBD spillovers. LBD generates two types of externalities: a price externality (lower future consumer prices) and an environmental externality (increased future take-up of clean goods). The approach requires four inputs: price elasticity of demand, elasticity of marginal cost with respect to cumulative production, cumulative production at the time of the subsidy, and product cost at the time of the subsidy.

The three main empirical findings are as follows. First, subsidies for production that directly displaces dirty electricity generation have the highest MVPFs. Wind production tax credits have an MVPF of 3.85 without LBD, rising to 5.87 with LBD. Residential solar subsidies have an MVPF of 1.45 without LBD, rising to 3.86 with LBD. EV subsidies have an MVPF of approximately 1.4 with LBD and approximately 1 without it. Consumer subsidies for appliances, weatherization, vehicle retirement, and hybrid vehicles have MVPFs around 1. Second, conservation nudges targeting electricity consumption can deliver MVPFs exceeding 5 in regions with relatively dirty electric grids, but fall below 1 in cleaner-grid regions such as California and the Northeast — and their effectiveness is expected to decline as grids decarbonize. Third, fuel taxes (gasoline, diesel, jet fuel) and cap-and-trade permit reductions are efficient revenue raisers, with nearly all having MVPFs below 1 and most below 0.7, reflecting the Pigouvian logic that current tax rates fall below the associated environmental externalities. Cap-and-trade permit reductions can produce MVPFs below zero, meaning revenue is raised while providing net positive welfare to individuals.

The paper also constructs three cost-per-ton metrics — resource cost per ton, government cost per ton, and social cost per ton — and shows they can yield substantively different and sometimes opposite rankings relative to each other and to the MVPF. For example, EV subsidies carry a government cost per ton of $1,356 (among the highest in the sample) yet an MVPF above most consumer subsidies, because that metric omits non-CO2 benefits including LBD effects. The scope of the analysis is US historical policy, with the MVPF comparison most informative when social welfare weights across beneficiary groups are treated as roughly equal.

Q: What is the MVPF framework and how does it differ from cost-per-ton analysis? A: The MVPF equals benefits to individuals (sum of willingness to pay) divided by net cost to the government. It is designed for a decision-maker maximizing social welfare subject to a budget constraint, whereas cost-per-ton metrics serve a decision-maker minimizing cost subject to a fixed CO2 reduction target. A higher MVPF means more welfare gain per dollar spent; a lower MVPF means less welfare cost per dollar of revenue raised.

Q: What are the three cost-per-ton definitions the paper distinguishes, and why do they differ? A: Resource cost per ton measures the economic resources consumed per ton of CO2 abated, independent of subsidy incidence; government cost per ton measures net government outlays per ton, omitting all non-CO2 benefits; social cost per ton subtracts non-CO2 benefits from government costs. For appliance rebates, these three values are -$2, $474, and an intermediate figure — a range that reflects whether inframarginal transfers and non-CO2 co-benefits are counted.

Q: What is the new methodological contribution regarding learning by doing? A: The paper derives a sufficient statistics result showing that when marginal production cost is an isoelastic function of cumulative production and demand is isoelastic in price, the time path of production follows a second-order ordinary differential equation. Solving this equation yields society’s willingness to pay for LBD spillovers from four observable parameters: demand price elasticity, the LBD elasticity of marginal cost with respect to cumulative production, cumulative production at the subsidy date, and unit cost at that date. This allows LBD benefits to be incorporated into both MVPF and cost-per-ton calculations without requiring a fully calibrated dynamic model.

Q: What LBD elasticities does the paper use, and where do they come from? A: Drawing on Way et al. (2022), a 1% increase in cumulative solar production is associated with a 0.319% price reduction; the corresponding reductions are 0.194% for wind and 0.421% for EV batteries. These elasticities (0.319, 0.194, and 0.421) are treated as the isoelastic parameter in the sufficient statistics formula.
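
To give a sense of magnitude, the sketch below evaluates an isoelastic learning curve at the elasticities quoted above. The doubling scenario and the function name are illustrative assumptions; the paper's sufficient statistics formula itself is not reproduced here.

```python
def cost_after_learning(initial_cost, initial_cumulative, new_cumulative, lbd_elasticity):
    """Isoelastic learning curve: cost falls by `lbd_elasticity` percent
    for each 1 percent rise in cumulative production."""
    return initial_cost * (new_cumulative / initial_cumulative) ** (-lbd_elasticity)


# Elasticities quoted above; the doubling scenario is purely illustrative.
LBD_ELASTICITY = {"solar": 0.319, "wind": 0.194, "ev_batteries": 0.421}

# Cost relative to today's cost after cumulative production doubles.
cost_ratio_after_doubling = {
    tech: cost_after_learning(1.0, 1.0, 2.0, elasticity)
    for tech, elasticity in LBD_ELASTICITY.items()
}
```

With these elasticities, a doubling of cumulative production lowers cost by roughly 20% for solar, 13% for wind, and 25% for EV batteries.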

Q: How does LBD affect the MVPF estimates for wind, solar, and EVs specifically? A: For wind production tax credits, the MVPF rises from 3.85 to 5.87 when LBD is included. For residential solar, it rises from 1.45 to 3.86. For EV subsidies, the MVPF rises from approximately 1 to approximately 1.4. Without LBD, EV subsidies are in line with other consumer subsidies; LBD is the primary reason EVs outperform that group.

Q: What is the baseline social cost of carbon used, and how sensitive are results to alternative values? A: The baseline SCC is $193 per ton of CO2 in 2020, following EPA 2023 guidance at a 2% discount rate. Robustness checks use $76, $337, and $1,367. Higher SCC values raise the MVPF of all subsidies in the sample, but the relative ordering, with wind PTCs above the consumer subsidies, remains consistent across the full range.

Q: How are EV subsidies evaluated, and what accounts for their MVPF exceeding other consumer subsidies? A: The analysis uses the California EFMP program studied by Muehlegger and Rapson (2022), which finds a price elasticity of demand of -2.1 and 85% pass-through to consumers (15% captured by dealers). A $1 subsidy generates $0.85 in consumer WTP, $0.15 in dealer WTP, $0.17 in CO2 co-benefits, $0.05 in local pollution and accident co-benefits, offset by $0.10 in damages from increased electricity generation. Most benefits are non-environmental (inframarginal transfers and LBD effects on future vehicle prices), which is why the government cost per ton of $1,356 appears high while the MVPF is approximately 1.4.

Q: What drives the high MVPFs for nudges in dirty-grid regions, and what is the implication for the future? A: Conservation nudges in dirty-grid areas have high MVPFs (exceeding 5) because each kilowatt-hour of reduced consumption displaces generation from high-emission sources, amplifying the environmental benefit per dollar of program cost. In cleaner-grid regions like California and the Northeast, the same nudge displaces lower-emission generation, pushing the MVPF below 1. As grids decarbonize nationwide, the paper notes that nudge MVPFs will decline over time.

Q: How do cap-and-trade permit reductions compare to fuel taxes as revenue-raising instruments? A: Nearly all fuel taxes (gasoline, diesel, jet fuel) have MVPFs below 1, and most are below 0.7, meaning that most impose a welfare cost of less than $0.70 per dollar of revenue raised. Cap-and-trade permit reductions can have MVPFs below zero, meaning they can raise revenue while simultaneously providing net positive welfare gains to individuals because environmental benefits from reduced emissions outweigh the permit costs borne by emitters.

Q: What do the international subsidy findings suggest, and what are their limitations? A: Subsidies for efficient charcoal cookstoves in Kenya (Berkouwer and Dean 2022) generate US-specific gains from CO2 reductions that are 37 times the net cost of the subsidy; including global benefits raises the MVPF to 323. However, the paper flags substantial uncertainty: estimated policy impacts vary widely within similar international categories, and the US-specific MVPF is highly sensitive to assumptions about the incidence of the social cost of carbon on US residents and US government tax revenue.

Q: Why does the social cost per ton metric give opposite rankings within wind, solar, and EVs relative to the MVPF? A: EVs have a social cost per ton of -$415 versus -$32 for wind PTCs, making EVs appear superior on that metric — the reverse of the MVPF ordering. The paper explains that when SCPT values are negative (policies that abate CO2 while also yielding positive non-CO2 net benefits), the metric loses its Lagrange multiplier interpretation: increased non-CO2 benefits make SCPT more negative while increased abatement makes it less negative, preventing meaningful cross-policy comparisons.
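
The comparability problem with negative values can be seen in a two-policy example. The numbers below are hypothetical and chosen only to isolate the effect of abatement; the function follows the definition of social cost per ton given above.

```python
def social_cost_per_ton(government_cost, non_co2_benefits, tons_abated):
    """Government cost net of non-CO2 benefits, per ton of CO2 abated."""
    return (government_cost - non_co2_benefits) / tons_abated


# Hypothetical policies with identical costs and non-CO2 benefits.
policy_a = social_cost_per_ton(government_cost=100, non_co2_benefits=200, tons_abated=1)
policy_b = social_cost_per_ton(government_cost=100, non_co2_benefits=200, tons_abated=2)
# policy_b abates twice as much CO2 at the same net cost, yet its SCPT
# is less negative than policy_a's.
```

Here the policy that abates more looks worse on the metric, which is why a ranking by negative SCPT values is not informative.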

Q: What is the overall policy ranking implied by the MVPF analysis? A: From highest to lowest MVPF: international clean energy subsidies > wind production tax credits > residential solar subsidies > energy conservation nudges (dirty grids) > EV subsidies > consumer appliance and weatherization subsidies > hybrid vehicle subsidies > vehicle buyback rebates > energy conservation nudges (clean grids) > revenue raisers (gasoline taxes, other fuel taxes, cap-and-trade). The paper notes that raising $1 of government revenue through gas taxes (MVPF ~0.67) and spending it on wind PTCs (MVPF ~5.87) generates $5.20 in net welfare benefits to individuals (5.87 minus 0.67), assuming equal social welfare weights across groups.
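
The reallocation arithmetic can be written out as a short sketch. The function name is hypothetical; the MVPF values are the ones quoted above, and equal social welfare weights are assumed as in the text.

```python
def reallocation_gain(mvpf_spending, mvpf_revenue, dollars=1.0):
    """Net welfare change from raising `dollars` with one policy and
    spending them on another, assuming equal social welfare weights."""
    return dollars * (mvpf_spending - mvpf_revenue)


# Wind PTCs (with LBD) financed by gas taxes, per dollar of revenue.
gain = reallocation_gain(mvpf_spending=5.87, mvpf_revenue=0.67)
```

Each dollar spent delivers 5.87 in willingness to pay, while each dollar raised costs individuals 0.67, so the difference is the net gain per dollar moved.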

Marginal Value of Public Funds (MVPF): A benefit-cost ratio equal to the sum of individuals’ willingness to pay for a policy divided by its net cost to the government. Policies with higher MVPFs deliver greater welfare gains per dollar spent; those with lower MVPFs impose lower welfare costs per dollar of revenue raised. Used to compare spending and revenue-raising policies on a common welfare-maximizing basis.

Learning-by-Doing (LBD) Externality: The spillover by which current production of a technology lowers its future marginal cost, generating future consumer surplus (price externality) and additional future uptake with associated environmental benefits (environmental externality). Treated in this paper as an uninternalized external benefit of subsidizing current production.

Sufficient Statistics Approach to LBD: The paper’s methodological contribution — showing that when marginal cost is an isoelastic function of cumulative production and demand is isoelastic in price, the LBD welfare benefit can be computed from four observables: the demand price elasticity, the LBD cost elasticity, cumulative production at subsidy date, and unit cost at subsidy date, without requiring a fully specified dynamic model.

Resource Cost per Ton (RCPT): Economic resources consumed to produce and use a product, divided by tons of CO2 abated. Appropriate for private firms minimizing abatement cost; independent of subsidy take-up rates and inframarginal transfers.

Government Cost per Ton (GCPT): Net government outlay per ton of CO2 abated. The correct metric for a government focused exclusively on CO2 reduction at minimum fiscal cost; omits all non-CO2 welfare impacts, including co-benefits and LBD effects.

Social Cost per Ton (SCPT): Government cost net of all non-CO2 benefits, per ton of CO2 abated. Intended to capture the social cost of abatement, but loses its Lagrange multiplier interpretation when values are negative, preventing valid cross-policy comparisons in that region.

Social Cost of Carbon (SCC): The monetized damage from one additional ton of CO2 emissions. Baseline value of $193 per ton in 2020 from EPA 2023 at a 2% discount rate, rising over time. A key parameter driving MVPF levels across all policy categories; robustness checked at $76, $337, and $1,367.

Pigouvian Efficiency of Environmental Taxes: The paper finds that most fuel taxes have MVPFs below 0.7 because current tax rates fall below the associated Pigouvian optimum. Taxing polluting goods raises revenue while reducing a pre-existing negative externality, so the welfare cost of the revenue is less than one dollar per dollar raised.

Additionality and Asymmetric Information in Environmental Markets: Evidence from Conservation Auctions

This paper investigates the problem of additionality — the likelihood that a conservation action is marginal to (i.e., caused by) an incentive — in the United States Department of Agriculture’s Conservation Reserve Program (CRP), one of the largest and most mature Payments for Ecosystem Services (PES) mechanisms in the world. The CRP pays landowners $1.6–$1.8 billion per year under 10-year contracts to retire cropland and plant grass mixes, trees, or wildlife habitats, using a discriminatory scoring auction in which landowners submit bids on a menu of heterogeneous contracts ranked by a scoring rule.

The central argument is that additionality represents a form of asymmetric information. Landowners possess private knowledge about their counterfactual land use (whether they would have conserved anyway), while the auction screens only on their private cost of accepting the contract. Because lower-cost landowners are lower-cost partly because they expect to conserve regardless of the CRP, cost and additionality are positively correlated — generating adverse selection: the least costly participants to purchase are the least socially valuable. The status quo scoring rule implicitly assumes all landowners are fully additional (tau = 1), an assumption the paper tests and rejects.

The authors construct a dataset linking confidential administrative CRP bid data across seven auctions from 2009 to 2021 to satellite-derived land use classifications from the Cropland Data Layer (30m resolution) and USDA administrative land use reports. They exploit a regression discontinuity (RD) in contract awards around the winning score threshold to estimate the causal effect of CRP contracts on land use at the margin. The first stage is close to one. The key finding is that CRP contracts reduce cropping by approximately eight percentage points at the margin, whereas the 100%-additional benchmark predicts a reduction of roughly 33 percentage points (matching the share of land covered by a contract at the margin). Therefore, only approximately one quarter (22–29%) of marginal auction winners are additional, meaning that roughly three-quarters would have conserved without the CRP contract.

To test for adverse selection, the authors use the 82% of rejected bidders in the 2016 auction (the most restrictive) for whom counterfactual land use is observed, constructing a landowner-specific additionality measure. They document a systematic positive correlation between bid rental rates (reflecting higher costs) and additionality, which persists conditional on rich observable characteristics including prior land use interacted with soil productivity. Contract choice further reveals additionality: tree-related contract bidders exhibit substantially lower additionality than base grassland contract bidders.

To quantify welfare implications, the authors develop and estimate a joint structural model of bidding and additionality. Costs are inferred via revealed preferences in optimal bidding (following the empirical auctions literature), and additionality is estimated as a conditional expectation function of observable characteristics and unobserved costs, matched to observed land use among rejected bidders via Method of Simulated Moments. Social benefits are taken from the CRP literature and USDA revealed preferences.

Key welfare findings: (1) Despite widespread non-additionality and adverse selection, a hypothetical uniform-price market for the base conservation contract generates social welfare gains of $14.37 per acre-year at the socially-optimal price. Setting price equal to the full social benefit B — ignoring counterfactual land use — causes welfare losses of $12.68 per acre-year, nearly eliminating the gains. (2) The status quo auction generates social welfare gains of approximately $120 million per auction relative to no market, but implements only 12% of the gains achievable under the efficient allocation. (3) Simple modifications to the scoring rule that incorporate expected additionality — via uniform adjustments and market-size reductions — close 37% of the gap between the status quo and the efficient allocation, increasing social welfare by over $300 million per auction. Nearly all gains arise from incorporating additionality into the scoring rule. These modifications are described as implementable by the USDA in practice.
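
As a back-of-envelope consistency check on findings (2) and (3), the rounded figures above can be combined as follows. This is an inference from the summary's rounded numbers, not a calculation reported by the paper, and the function names are hypothetical.

```python
def implied_efficient_gain(status_quo_gain, share_of_efficient):
    """Efficient-allocation gain implied by the status quo's share of it."""
    return status_quo_gain / share_of_efficient


def gain_from_closing_gap(status_quo_gain, efficient_gain, share_of_gap_closed):
    """Welfare increase from closing part of the gap to the efficient allocation."""
    return share_of_gap_closed * (efficient_gain - status_quo_gain)


# Rounded figures quoted above, in millions of dollars per auction.
status_quo = 120.0
efficient = implied_efficient_gain(status_quo, 0.12)
improvement = gain_from_closing_gap(status_quo, efficient, 0.37)
```

With these inputs the implied efficient gain is about $1,000 million per auction and closing 37% of the gap yields about $326 million, which is in line with the reported increase of over $300 million.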

Q: What is additionality, and why does it matter for conservation markets? A: Additionality is defined as the expected impact of contracting on a landowner’s conservation action — i.e., the probability that a landowner would not have conserved absent the incentive. Social surplus depends on both a landowner’s cost of accepting a contract and her additionality, but market mechanisms screen only on cost. When the lowest-cost participants are the least additional, standard procurement mechanisms fail to implement the efficient allocation, undermining the environmental and fiscal effectiveness of conservation programs.
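
The selection problem can be illustrated with a small simulation. Everything in it is a hypothetical assumption made for illustration (the uniform distribution of additionality, the linear cost relationship, the noise level, and the 30% acceptance share); it is not the paper's structural model.

```python
import random


def simulate_landowners(n=10000, seed=0):
    """Hypothetical population in which cost rises with additionality."""
    rng = random.Random(seed)
    landowners = []
    for _ in range(n):
        additionality = rng.random()                   # chance of not conserving absent a contract
        cost = 100 * additionality + rng.gauss(0, 20)  # forgone crop income plus noise
        landowners.append((cost, additionality))
    return landowners


def mean_additionality(landowners):
    """Average additionality across a list of (cost, additionality) pairs."""
    return sum(additionality for _, additionality in landowners) / len(landowners)


population = simulate_landowners()
# A cost-only mechanism accepts the cheapest bidders (here, the cheapest 30 percent).
winners = sorted(population)[: int(0.3 * len(population))]

population_mean = mean_additionality(population)
winner_mean = mean_additionality(winners)
```

Because the mechanism ranks on cost alone, the accepted landowners in this simulation have markedly lower average additionality than the population they are drawn from.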

Q: What is the rate of additionality at the margin of CRP contract awards? A: Approximately one quarter (22–29% depending on specification) of marginal auction winners are additional. The RD design shows contracts reduce cropping by about eight percentage points at the margin, compared to the 100%-additional benchmark of approximately 33 percentage points (the share of land covered by the contract at the margin). This implies three-quarters of marginal winners would have conserved without a CRP contract.
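
The one-quarter figure follows from a simple ratio, sketched below with the rounded point estimates quoted above. The function name is hypothetical, and the 22–29% range in the text reflects alternative specifications that this calculation does not reproduce.

```python
def share_additional(estimated_effect_pp, full_additionality_benchmark_pp):
    """Ratio of the RD estimate to the effect expected if every
    marginal winner were additional."""
    return estimated_effect_pp / full_additionality_benchmark_pp


# Rounded figures quoted above: about 8 and about 33 percentage points.
share = share_additional(8, 33)
non_additional_share = 1 - share
```

The ratio is about 0.24, so roughly three-quarters of marginal winners are non-additional.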

Q: What is the empirical evidence for adverse selection? A: Among rejected bidders in the 2016 auction — where additionality is directly observed for 82% of bidders — there is a systematic positive correlation between bid rental rates (reflecting higher costs of accepting the contract) and additionality. This correlation persists conditional on rich observable characteristics, including prior land use interacted with soil productivity estimates. Contract choice also reveals additionality: bidders selecting tree-related contracts have substantially lower additionality than those choosing base grassland contracts.

Q: How does soil productivity relate to additionality? A: USDA-constructed soil productivity estimates, which approximate the earning potential of a parcel, are predictive of additionality in practice, consistent with theory. Higher soil productivity is associated with lower additionality — landowners with less productive land are more likely to conserve regardless of the CRP. Soil productivity is not currently incorporated into the CRP scoring rule to rank bidders.

Q: How is the RD design validated? A: The histogram of normalized score distributions shows no bunching at the winning threshold, validating that bidders do not know the exact ex-post threshold realization. Pre-period RD coefficients are indistinguishable from zero in both the remote sensing and administrative land use data. The first stage (share of bidders with a CRP contract just above the threshold) is close to one. Treatment effect magnitudes are stable over the 10-year contract period with no evidence of attenuation, and there are no spillovers to non-bid fields.

Q: What do the social welfare calculations show for a uniform-price market? A: Despite widespread non-additionality and adverse selection, a hypothetical uniform-price market for the base conservation contract generates social welfare gains of $14.37 per acre-year at the socially-optimal uniform price. However, setting price equal to the full social benefit B — as the status quo implicitly does by assuming tau = 1 — causes welfare losses of $12.68 per acre-year, nearly eliminating all gains.

Q: How does the status quo auction perform relative to the efficient benchmark? A: The status quo auction generates social welfare gains of approximately $120 million per auction relative to no market. Gains under the efficient allocation, which awards contracts based on both landowner costs and expected social benefits (incorporating additionality), would be substantially larger: the status quo implements only 12% of the social welfare gains achievable under the efficient allocation.

Q: Can the efficient allocation be implemented by any mechanism? A: Not necessarily. Implementing the efficient allocation requires that the expected net social surplus function B·tau(c) - c be monotonically decreasing in cost, so that a standard incentive-compatible auction can rank bidders appropriately. If lower-cost landowners are sufficiently less additional that the allocation rule is non-monotone in cost, no incentive-compatible mechanism can implement the efficient allocation (per Myerson 1981). Empirically, the authors find that for the base contract the efficient allocation is in the implementable case (similar to their Figure 1a), but implementing it exactly via an incentive-compatible auction remains complex.
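The monotonicity condition can be checked numerically on a cost grid. The sketch below is illustrative only: the benefit B, the cost grid, and both additionality functions are hypothetical, chosen to show one implementable case and one case where surplus rises with cost.

```python
import numpy as np

def surplus(costs, B, tau):
    """Expected net social surplus B * tau(c) - c evaluated on a cost grid."""
    return B * tau(costs) - costs

def is_decreasing(values):
    """True if the sequence is strictly decreasing."""
    return bool(np.all(np.diff(values) < 0))

B = 100.0                              # hypothetical social benefit per contract
costs = np.linspace(0.0, 100.0, 201)   # hypothetical cost grid

# Mild adverse selection: additionality rises slowly with cost (B * tau'(c) < 1).
tau_mild = lambda c: 0.2 + 0.004 * c
# Strong adverse selection: additionality rises fast enough that surplus increases with cost.
tau_steep = lambda c: 0.1 + 0.015 * np.minimum(c, 60.0)

mild_ok = is_decreasing(surplus(costs, B, tau_mild))
steep_ok = is_decreasing(surplus(costs, B, tau_steep))
print(mild_ok, steep_ok)   # prints "True False"
```

In the first case a cost-ranking auction orders bidders the same way the planner would; in the second it does not.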

Q: What alternative auction designs are proposed, and how much do they improve welfare? A: The authors propose alternative scoring rules that incorporate expected additionality — through uniform adjustments to the scoring rule, reductions in market size, and differentiation among heterogeneously additional landowners based on observables such as soil productivity and contract choice. These simple modifications close 37% of the gap between the status quo and the efficient allocation, increasing social welfare by over $300 million per auction. Nearly all gains come from incorporating additionality into the scoring rule, with a large share accruing through simple uniform adjustments.
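The headline dollar figures are mutually consistent, as a back-of-envelope calculation shows. The inputs are the rounded numbers quoted in this summary, so the outputs are approximate.

```python
status_quo_gain = 120.0     # $ million per auction, relative to no market
share_of_efficient = 0.12   # status quo implements 12% of the efficient gains
gap_closed = 0.37           # share of the gap closed by the modified scoring rules

efficient_gain = status_quo_gain / share_of_efficient   # about $1,000 million
gap = efficient_gain - status_quo_gain                   # about $880 million
improvement = gap_closed * gap                           # about $326 million

print(round(efficient_gain), round(gap), round(improvement))
```

The implied improvement of roughly $326 million matches the "over $300 million per auction" reported above.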

Q: How is the structural model of bidding estimated? A: Estimation proceeds in three steps. First, beliefs about the winning score threshold distribution are estimated by simulating auctions via resampling (following Hortacsu 2000). Second, landowner costs are estimated via Maximum Simulated Likelihood using revealed preference inequalities from optimal bidding in the scoring auction. Third, the additionality conditional expectation function is estimated via Method of Simulated Moments, matching observed additionality levels, its distribution across rejected bidders, its covariance with scores, and its distribution by contract choice.
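The first step can be sketched as a bootstrap over bidders. This is not the authors' code: the acceptance rule (accept in descending score order until a fixed acreage cap binds), the function name, and the synthetic inputs are all assumptions made for illustration.

```python
import numpy as np

def simulate_thresholds(scores, acres, acreage_cap, n_draws=1000, seed=0):
    """Bootstrap the distribution of the winning-score threshold.

    Assumption (illustrative, not from the paper): bids are accepted in
    descending score order until a fixed acreage cap binds.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    acres = np.asarray(acres, dtype=float)
    n = len(scores)
    thresholds = np.empty(n_draws)
    for d in range(n_draws):
        idx = rng.integers(0, n, size=n)           # resample bidders with replacement
        order = np.argsort(-scores[idx])           # highest score first
        s, a = scores[idx][order], acres[idx][order]
        accepted = np.cumsum(a) <= acreage_cap     # bids that fit under the cap
        # Threshold = lowest accepted score; if no bid fits, nothing wins.
        thresholds[d] = s[accepted][-1] if accepted.any() else np.inf
    return thresholds

# Synthetic example data (hypothetical scores and acreages).
rng = np.random.default_rng(1)
scores = rng.normal(250, 30, size=500)
acres = rng.uniform(20, 200, size=500)
th = simulate_thresholds(scores, acres, acreage_cap=10_000, n_draws=200)
print(th.mean(), th.std())
```

The spread of the simulated thresholds is what makes the cutoff uncertain from a bidder's point of view, which is the object bidders' beliefs are defined over.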

Q: What sources of scoring rule variation identify the model? A: Three sources are used. A mid-mechanism policy change in the 2021 auction added carbon sequestration payments differentially across contracts, providing two bids from the same bidders under different scoring rules. A policy change around 2011 shifted Wildlife Priority Zone (WPZ) bonus points to be contract-specific. Air Quality Zone (AQZ) status shifts the level of the score. These sources provide variation in relative payments across contracts, though the authors note the variation is modest and rely also on parametric extrapolation.

Q: What assumptions are required for identification and how robust are results? A: Key assumptions include perfect compliance (validated by inspection of over 1,000 aerial photographs), no spillovers to non-bid fields (validated in Table 2), and stability of the additionality function tau(z,c,kappa) across auction years. The authors assess robustness to alternative functional forms of tau, conduct a non-parametric inversion exercise across cost quantiles, and construct alternative scoring rules using cross-auction and cross-tract variation to probe the stability assumption. Model-implied additionality at the RD margin (23%) closely matches the empirical RD estimate.

Q: Are the adverse selection and additionality findings specific to the 2016 auction? A: The 2016 auction provides the most complete view because bid fields are observed and 82% of bidders are rejected. But cross-auction evidence replicates the core patterns. RD estimates exploiting threshold variation across auctions show additionality ranging from 10–20% among lower bidders to 40–50% among higher bidders across auctions, consistent with adverse selection. Tree-contract null RD effects replicate across all auctions. Cross-tract cropping rates show similar observable heterogeneity across auctions.

Q: What is the social welfare impact of the market for conservation existing at all? A: The impact is theoretically ambiguous, because non-additional landowners may receive transfers without generating social value and adverse selection may tilt the market toward low-additionality participants. Empirically, despite these concerns, the base contract yields positive social welfare gains of $14.37 per acre-year at the socially-optimal uniform price, indicating that conservation markets of this type can improve welfare even in the presence of substantial non-additionality and adverse selection.

Additionality: The expected impact of contracting on a landowner’s conservation action — formally, tau(c) = E[1 - a_i0 | c = c_i], the probability that a landowner would not have conserved absent the incentive. A landowner is additional if she would have cropped without the CRP contract; the social benefit of contracting depends only on this incremental conservation impact.

Adverse Selection: The positive correlation between a landowner’s cost of accepting a contract and her additionality. Landowners have low costs partly because they expected to conserve regardless of the program, so lower-cost participants are less socially valuable. This upward-sloping contract value curve mirrors adverse selection in insurance markets as modeled by Einav, Finkelstein, and Cullen (2010).

Contract Value Curve: The function B·tau(F^{-1}_C(q)) plotting the expected social value of contracting at each quantile q of the cost distribution. It lies below the social benefit B due to non-additionality and slopes upward due to adverse selection. The vertical distance between the contract value and marginal cost curves equals expected social surplus B·tau(c) - c.

Efficient Allocation: The allocation that maximizes expected social surplus B·tau(c) - c by awarding contracts to landowners for whom this quantity is positive. Implementing this allocation via an incentive-compatible mechanism requires that B·tau(c) - c be monotonically decreasing in cost; if not, no standard mechanism can achieve it.

Scoring Rule: The known function s(b_i, z^s_i) that converts a landowner’s multi-dimensional bid (rental rate and contract choice) and observed characteristics into a score, determining contract awards. The status quo scoring rule implicitly assumes full additionality (tau = 1), ranking bidders as if all conservation actions are marginal to the incentive.

Balance-Sheet Policy and the Term Premium: High-Frequency Evidence

When a central bank shrinks its balance sheet, how much do long-term interest rates actually move — and through which channel? Using minute-by-minute market data around balance-sheet announcements, the authors estimate that much of the long-rate response works through the term premium rather than through changed expectations of future short rates. The result is an estimate for their 2009–2024 sample under their identifying assumptions — evidence consistent with a term-premium channel, not a universal constant.

Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.

In depth

Q1. Does balance-sheet policy move long rates through the term premium or through expected short rates?

The paper estimates that a substantial share of the long-rate response operates through the term premium, with a smaller role for revised short-rate expectations — though it frames this as identification within their window, not a structural decomposition that holds in all regimes. This sits against a literature that has split the response into a signaling channel and a portfolio-balance channel; the contribution here is using intraday yields to isolate the announcement effect from contaminating macro news.

Q2. How is the effect identified, and why high-frequency?

By measuring yield changes in narrow windows around scheduled balance-sheet announcements, so that other macroeconomic news is unlikely to move rates within the window. The maintained assumption is that within a tight enough window, the announcement is the dominant shock — a standard high-frequency identification premise. The authors note the assumption is weaker around unscheduled communications, and restrict the main sample accordingly.
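A window-based yield change can be computed as follows. This is a minimal sketch: the 10-minute and 20-minute window edges, the function name, and the tick data are hypothetical, since the summary does not state the window length the authors use.

```python
from datetime import datetime, timedelta

def window_change(ticks, announcement,
                  before=timedelta(minutes=10), after=timedelta(minutes=20)):
    """Yield change across a narrow window around an announcement.

    ticks: list of (timestamp, yield) pairs sorted by time.
    Uses the last quote at or before each window edge.
    """
    def last_quote_at(t):
        eligible = [y for ts, y in ticks if ts <= t]
        if not eligible:
            raise ValueError("no quote available before window edge")
        return eligible[-1]
    return last_quote_at(announcement + after) - last_quote_at(announcement - before)

t0 = datetime(2020, 1, 1, 14, 0)   # hypothetical announcement time
ticks = [
    (t0 - timedelta(minutes=30), 2.00),
    (t0 - timedelta(minutes=12), 2.01),   # last quote before the window opens
    (t0 - timedelta(minutes=5), 2.02),    # inside the window, not used as the pre-quote
    (t0 + timedelta(minutes=3), 2.08),
    (t0 + timedelta(minutes=18), 2.07),   # last quote before the window closes
    (t0 + timedelta(minutes=45), 2.10),   # after the window, ignored
]
change = window_change(ticks, t0)
print(round(change, 4))   # prints 0.06
```

Narrowing the window trades off contamination from other news against the risk of missing part of the market's reaction.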

Q3. What does this imply for the pace of balance-sheet runoff?

If transmission runs through the term premium, the pace and predictability of runoff plausibly matter for long rates — but the paper presents this as suggestive, stopping short of a calibrated policy rule. The reading is that quantity and communication interact, consistent with prior work on announcement effects, rather than that runoff has a single mechanical effect on yields.

Key concepts

term premium: The extra return investors require for holding a long-term bond instead of rolling over short-term ones — here, the part of long rates not explained by expected future short rates.
balance-sheet policy: A central bank changing the size or composition of its asset holdings (expansion via purchases, runoff via “quantitative tightening”) as a policy tool distinct from setting the short-term rate.
high-frequency identification: Inferring a policy action’s effect from price moves in a very short window around the announcement, on the assumption that little else moves markets inside that window.
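The term-premium definition above corresponds to the standard decomposition of an n-period yield. The notation here is assumed for illustration (y for the long yield, i for the short rate, TP for the term premium) and is not taken from the paper:

```latex
y_t^{(n)} \;=\;
\underbrace{\frac{1}{n}\sum_{j=0}^{n-1}\mathbb{E}_t\!\left[i_{t+j}\right]}_{\text{expected short-rate path}}
\;+\;
\underbrace{TP_t^{(n)}}_{\text{term premium}},
\qquad
\Delta y_t^{(n)} \;=\; \Delta\!\left(\frac{1}{n}\sum_{j=0}^{n-1}\mathbb{E}_t\!\left[i_{t+j}\right]\right) + \Delta TP_t^{(n)} .
```

The second expression is the split of an announcement-window yield change into an expectations component and a term-premium component, which is the distinction Q1 turns on.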

Bottom-Up Markup Fluctuations

Layer 1 — Overview

Research Question

The paper asks how firm-level, sector-level, and aggregate markups comove with output at different levels of aggregation, and whether a single structural model can reconcile seemingly contradictory empirical findings about markup cyclicality that arise when researchers use different aggregation schemes.

Model

The authors build a granular macroeconomic model featuring oligopolistic competition with a nested constant-elasticity-of-substitution (CES) demand structure following Atkeson and Burstein (2008). The economy contains N sectors, each with a discrete number of firms competing under Cournot oligopoly with flexible prices. Firm-level markups are endogenously increasing in within-sector market shares: under Cournot, the sectoral markup is a simple function of the sector’s Herfindahl-Hirschman index (HHI), and the aggregate markup is a function of the expenditure-share-weighted average of sectoral HHIs. Firm-level productivity follows a discretized random growth (Gibrat’s law) process as in Carvalho and Grassi (2019), generating fat-tailed firm-size distributions and granular aggregate fluctuations. The baseline calibration features only idiosyncratic firm-level productivity shocks and abstracts from aggregate shocks, because—in the model—aggregate shocks that move all firms proportionately do not affect relative market shares and hence do not affect markups.

Data

The empirical analysis uses French administrative firm-level data from the FICUS-FARE datasets covering the universe of French firms from 1994 to 2019, yielding approximately 9.38 million firm-year observations across 26 years, 22 two-digit sectors, and 275 five-digit NAF sectors. Firm-level markups are estimated following De Loecker and Warzynski (2012) using a translog production function estimated by GMM (following De Ridder et al. 2024) on a subsample of 220,733 firm-year observations for which physical output quantity is available from the Enquête Annuelle de Production survey (2009-2019). Using quantity rather than revenue as the output measure avoids the measurement biases documented in Bond et al. (2021).

Main Findings and Quantitative Magnitudes

Markup-market-share relationship (firm level): Regressions of the change in the inverse firm markup on the change in firm market share yield a negative and significant coefficient of approximately -0.268 to -0.293 (depending on fixed-effect specification), consistent with the model prediction that markups rise with market share. Sector-level analogues yield a slope of the change in inverse sector markup on the change in sector HHI of approximately -0.37, which is simultaneously a calibration target (implying sigma = 1.8 given epsilon = 5) and an empirical moment the model closely matches (model counterpart: -0.36).
Within-between decomposition of sector markup changes: In the model under Cournot competition, changes in firm-level markups (the “within” term) account for exactly 50% of changes in sector-level markups, with between-firm reallocation accounting for the other 50%. In the French data, for the median sector, the within term accounts for 59% of changes in sector markups (interquartile range across sectors: 34%-81%).
Firm-level markup cyclicality with sector output (heterogeneous by size): The average firm’s markup is countercyclical with respect to own-sector output (beta_1 approximately -0.073 in levels specification), but this relationship reverses for large firms: firms with market shares roughly above 10% (top 0.1% of the market-share distribution) have procyclical markups (interaction coefficient beta_2 approximately 0.574 in levels). The model qualitatively and roughly quantitatively reproduces this heterogeneity.
Sector-level markup cyclicality with sector output (procyclical): Following Nekarda and Ramey (2013), sector markup changes comove positively and significantly with sector output changes: estimated coefficient of 0.160 (standard error 0.040) in first-differences. The calibrated model yields a median coefficient of 0.139 (std dev 0.057 across 5,000 simulated 25-year samples), close to the data. Consistently, sector concentration (HHI) is also procyclical with sector output (estimated coefficient 0.332, std error 0.067 in first-differences).
Sector-level markup cyclicality with aggregate output (acyclical to weakly countercyclical): Following Bils et al. (2018), the comovement between sector markups and aggregate output is fragile in sign and significance: the French data yields a point estimate of -0.239 (std error 0.116) in first-differences, marginally significant (t-stat 2.06) and with sign sensitive to detrending method. The model without aggregate shocks predicts positive comovement (median coefficient 0.165) that is not statistically different from zero across samples. Adding aggregate productivity shocks (calibrated to match French aggregate output volatility) brings the model-implied coefficient close to zero (median 0.008), with 20-30% of 25-year simulated samples displaying countercyclical sectoral markups relative to GDP—consistent with the ambiguity in the data.
Aggregate output volatility: The baseline calibration with only granular firm-level shocks generates a standard deviation of detrended aggregate output of 0.83%, equal to 26% of the 3.16% observed in the French data. (The comparable granular ratio from Carvalho and Grassi 2019 for a perfectly competitive US model is 30%.) Variable markups dampen granular aggregate volatility: the standard deviation of aggregate output under variable markups is 0.87 times that under heterogeneous-but-constant markups (95% CI: 0.82-0.97), because incomplete pass-through reduces the effective weight of large firms in the price index.
Aggregate markup volatility: In the data, the relative standard deviation of aggregate markup to aggregate output is 0.40-0.50 (depending on detrending). The model generates a relative volatility of 0.36 (median across samples). The correlation between aggregate markup and output in the data is at most 0.06; the model without aggregate shocks implies a counterfactually large median correlation of 0.91, which falls to 0.27 when aggregate TFP shocks are superimposed (with 16% of 25-year samples displaying countercyclical aggregate markups).

Scope Conditions

Results pertain to French private-sector firms (including formerly government-owned firms, most of which were privatized during the sample period) across manufacturing and some non-manufacturing sectors at the national-market level. The analysis abstracts from import competition (market shares are computed relative to all French firms in the sector), local geographic markets (relevant for non-tradeable goods where national-level shares understate local concentration), and multi-product firm structure. Findings are for a flexible-price model driven by idiosyncratic productivity shocks; the paper explicitly discusses how nominal rigidities would further strengthen procyclicality at the sector level.

In depth

Q1. What is the central mechanism by which granular firm-level shocks generate markup cyclicality?

A: Because markups are endogenously increasing in within-sector market shares under oligopolistic competition, a firm that receives a positive productivity shock gains market share and therefore raises its markup, while its competitors lose market share and lower their markups. The net effect on the sectoral markup depends on the shocked firm’s initial size: a positive shock to a sufficiently large firm (above a threshold market share) raises the sectoral markup, while a positive shock to a small firm lowers it. Since sectoral expansions in a granular economy are disproportionately driven by large firms, sector output and sector markup tend to comove positively in the medium run.

Q2. Why does the sign of markup cyclicality differ depending on the level of aggregation?

A: Sector-level markups react only to within-sector idiosyncratic shocks, so sectors that happen to be driven by large-firm booms display positive comovement between sector markup and sector output. However, a given sector’s markup is uncorrelated with aggregate output movements coming from other sectors. In small samples (such as 25-year windows), whether a sector’s markup comoves positively or negatively with aggregate output depends on whether the sector happens to lead or lag the aggregate cycle. Over sufficiently long samples, the model implies positive comovement of sector markups with aggregate output, but in finite samples the relationship is indeterminate. This asymmetry across aggregation levels explains why researchers using different reduced-form specifications in the same dataset can reach opposing conclusions about procyclicality versus countercyclicality.

Q3. What is the within-between decomposition of sectoral markup changes and what does it imply quantitatively?

A: Changes in the inverse sectoral markup can be decomposed into (i) a within term—changes in firm-level markups holding market shares fixed—and (ii) a between term—changes in market shares holding firm-level markups fixed. Under Cournot competition, the within and between terms are analytically equal in every period, so each accounts for exactly 50% of the change in sectoral markups; this 50-50 split holds globally (not only to first order). In the French data, for the median sector, within-firm markup changes account for 59% of sector markup changes (interquartile range across sectors: 34%-81%), close to but slightly above the model’s 50% prediction.
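The equality of the two terms can be verified numerically. The sketch below makes two assumptions that are not spelled out in this summary: the firm-level Cournot inverse markup takes the standard Atkeson-Burstein form 1/mu_i = (1 - 1/epsilon) - (1/sigma - 1/epsilon) s_i, and the decomposition uses midpoint weights (which makes it exact). The share vectors are hypothetical.

```python
import numpy as np

EPS, SIGMA = 5.0, 1.8   # elasticities quoted in the summary

def inverse_markup(share):
    """Assumed Cournot inverse markup under nested CES (Atkeson-Burstein form)."""
    return (1 - 1 / EPS) - (1 / SIGMA - 1 / EPS) * share

def sector_inverse_markup(shares):
    """Share-weighted sum of firm inverse markups (inverse of the harmonic mean)."""
    shares = np.asarray(shares, dtype=float)
    return float(np.sum(shares * inverse_markup(shares)))

def within_between(s0, s1):
    """Midpoint decomposition of the change in sum_i s_i / mu_i."""
    s0, s1 = np.asarray(s0, dtype=float), np.asarray(s1, dtype=float)
    m0, m1 = inverse_markup(s0), inverse_markup(s1)
    within = float(np.sum(0.5 * (s0 + s1) * (m1 - m0)))    # markup changes at average shares
    between = float(np.sum(0.5 * (m0 + m1) * (s1 - s0)))   # share changes at average inverse markups
    return within, between

s_before = [0.50, 0.30, 0.20]   # hypothetical market shares
s_after = [0.60, 0.25, 0.15]    # the largest firm gains share
w, b = within_between(s_before, s_after)
total = sector_inverse_markup(s_after) - sector_inverse_markup(s_before)
print(w, b, total)
```

Because the inverse markup is linear in the share and share changes sum to zero, both terms reduce to the same expression, which is why the split holds for large changes and not only to first order.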

Q4. How do variable markups affect granular aggregate output volatility relative to a model with constant markups?

A: Variable markups (endogenous pass-through that is decreasing in firm size) reduce granular aggregate output volatility relative to a model where markups are heterogeneous but fixed. The intuition is that larger firms have lower pass-through rates, so their productivity shocks translate into smaller price changes and therefore smaller output responses than they would under constant markups—effectively reducing the weight of large firms in the aggregate price index in a way similar to a decline in market concentration. Quantitatively, using first-order approximations around equilibrium distributions from the calibrated model, the standard deviation of aggregate output under variable markups is 0.87 times that under heterogeneous-but-constant markups (95% confidence interval: 0.82-0.97). The overall standard deviation under variable and heterogeneous markups is only 1.02 times that under homogeneous and constant markups (95% CI: 0.99-1.14), meaning markup heterogeneity and variability together have limited net effects on aggregate output volatility.

Q5. What does the model predict for firm-level markup cyclicality, and how heterogeneous is this across firm size?

A: Proposition 4 states that, in the asymptotic limit, firm-level markups comove positively with own-sector output for firms with market shares above a threshold, and negatively for firms below it. This occurs because large firms have a disproportionate impact on sector-level price and output (when the product of market share and pass-through rate is increasing in size), so large-firm shocks simultaneously drive sector expansions and raise large-firm markups while compressing small-firm markups. In the French data, the average firm’s markup is countercyclical with respect to sector output (beta_1 approximately -0.073 in log-levels with firm and year fixed effects), but firms with market shares above roughly 10% (top 0.1% of the distribution, since the average market share is only 0.07%) display procyclical markups (interaction coefficient beta_2 approximately 0.574). The model reproduces this qualitative pattern and the order of magnitude of these estimates.
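The reported coefficients imply a market share at which the sign of the markup response flips. This sketch assumes a linear interaction specification (response = beta_1 + beta_2 × market share, with the share measured as a fraction); the summary does not state the exact regression form, so treat the threshold as indicative.

```python
beta_1 = -0.073   # coefficient on sector output (average firm)
beta_2 = 0.574    # coefficient on market share x sector output

def markup_cyclicality(share):
    """Implied response of a firm's markup to sector output at a given market share."""
    return beta_1 + beta_2 * share

threshold_share = -beta_1 / beta_2   # share at which the response changes sign
print(f"{threshold_share:.1%}")      # prints "12.7%"
```

The implied threshold of about 13% is in line with the "roughly 10%" figure quoted above, and a firm at the average share of 0.07% sits well inside the countercyclical range.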

Q6. How does the paper calibrate the key demand elasticities, and what are the resulting pass-through implications?

A: The within-sector substitution elasticity is set to epsilon = 5, a standard value. The cross-sector substitution elasticity sigma is calibrated to match the slope of the inverse sector markup on sector HHI in first-differences. The empirical slope is -0.37; under the model, the slope equals -(epsilon/sigma - 1)/epsilon = -(1/sigma - 1/epsilon), and given epsilon = 5, sigma = 1.8 delivers a model counterpart of -0.36. These parameter values imply own-cost pass-through rates that are decreasing in firm size; for large firms (with market share >= 57%, approximately the top 0.004% of the distribution), the implied pass-through rate is 0.63, within the confidence intervals reported in Amiti, Itskhoki, and Konings (2019) for large Belgian firms.

Q7. Why do aggregate productivity shocks not affect markups in the model, and what are the implications for aggregate markup cyclicality?

A: In the model, firm-level markups are functions of within-sector market shares, not the level of productivity. An aggregate shock that shifts all firms’ productivity proportionately leaves relative market shares unchanged and therefore leaves all markups unchanged. This means aggregate shocks increase aggregate output volatility but leave markup volatility unchanged, reducing the correlation between aggregate markup and aggregate output. When aggregate TFP shocks are added to match French aggregate output volatility, the model-implied median correlation between aggregate markup and output falls from 0.91 (without aggregate shocks) to 0.27 (with aggregate shocks), while 16% of 25-year simulated samples display countercyclical aggregate markups—more consistent with the weak and fragile empirical relationship.
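The size of the drop in correlation can be approximated by hand. If the added aggregate component is independent of markups, the covariance between markup and output is unchanged and only the output standard deviation grows. The sketch below applies that identity to the rounded figures quoted in this summary; the reported statistics are medians across simulated samples, so the match is approximate.

```python
corr_granular = 0.91        # median corr(markup, output), granular shocks only
sd_output_granular = 0.83   # % s.d. of output, granular shocks only
sd_output_total = 3.16      # % s.d. of output once aggregate shocks are added

# Unchanged covariance, larger output s.d. in the denominator:
# corr_new = corr_old * sd_old / sd_new
corr_with_aggregate = corr_granular * sd_output_granular / sd_output_total
print(round(corr_with_aggregate, 2))   # prints 0.24
```

The result of about 0.24 is close to the model's reported median of 0.27.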

Q8. How does the paper address the potential measurement-error bias in the negative correlation between markups and marginal costs?

A: Since marginal cost is computed as price divided by estimated markup, regressing market shares or markups on marginal costs risks spurious correlation via measurement error in the markup (which appears on both sides of the regression). The authors address this concern by constructing an instrumental variable for marginal cost based on firm-specific energy intensity interacted with energy price changes, following Ganapati, Shapiro, and Walker (2020). Table A10 confirms that instrumenting for marginal cost yields negative effects on both markup and market share with larger point estimates than the OLS specifications in Table 4, validating the baseline findings.

Q9. Is the 50-50 within-between decomposition of sectoral markup changes robust to the choice of competition mode?

A: No. The exact 50-50 split of within and between terms in sectoral markup changes is a specific property of Cournot competition and holds globally (not just as a first-order approximation). Under Bertrand competition, the within and between terms are generally not equal to each other. The paper derives analytic results under both competition modes and focuses on Cournot for quantitative work because it generates more markup variation and better matches the estimated pass-through rates and markup-size relationship.

Q10. What do model simulations imply for the magnitude and cyclicality of aggregate markups versus the data, and what is the role of variable versus constant markups?

A: In the data (detrended), the standard deviation of aggregate markup is 1.27% with a relative volatility (to output) of 0.40 and a correlation with output of 0.03. The baseline model with only granular shocks yields a median markup standard deviation of 0.30%, relative volatility of 0.36, and correlation with output of 0.91. The model with aggregate shocks added yields median markup standard deviation of 0.30%, relative volatility of 0.09, and correlation of 0.27. Counterfactually fixing markups at their initial heterogeneous levels while keeping the same market shares and shock variance yields aggregate markup standard deviation approximately 0.93 times the variable-markup value (standard deviation of markups under variable markups is 1.08 times that under constant markups, with a 95% CI of 1.00-1.18), and a correlation with output of 0.92 versus 0.87 under variable markups. Overall, the magnitude and cyclicality of aggregate markups are not substantially different between variable and constant-markup specifications.

Q11. How does the paper reconcile its findings with prior literature on markup cyclicality (Bils et al. 2018 vs. Nekarda and Ramey 2013)?

A: Nekarda and Ramey (2013) find procyclical sector markups with respect to sector output in US data, a result replicated in French data (beta approximately 0.160). Bils, Klenow, and Malin (2018) find countercyclical sector markups with respect to aggregate output in US data. Both results can be generated simultaneously in the model: sector markups are positively correlated with own-sector output because granular booms in a sector are driven by large-firm expansions that raise sector markups; however, a given sector’s markup is weakly and ambiguously correlated with aggregate output because aggregate fluctuations reflect shocks across many sectors, only a small part of which originate in that sector. The model can therefore simultaneously predict procyclicality with respect to sector output and an acyclical-to-weakly-countercyclical relationship with aggregate output, which explains why both empirical findings can be correct.

Q12. What are the data limitations and how do they affect the interpretation of results?

A: Three limitations are noted. First, market shares are computed relative to total revenue of all French firms in the sector without accounting for imports, so foreign competition is ignored and domestic concentration may be overestimated. Second, revenues are reported at the national level, so for non-tradeable goods (whose relevant market is local) the paper underestimates true local market concentration, attenuating the markup-concentration relationship in those sectors. Third, the model abstracts from entry and exit (the number of firms per sector is held fixed at sector-year averages), though Appendix D demonstrates robustness of main empirical results to restricting the sample to continuing firms.

Key Concepts

Granular macroeconomic model: A model in which the economy consists of a finite (large but discrete) number of firms, so that idiosyncratic firm-level shocks to large firms do not average out and instead generate aggregate fluctuations. In the paper’s usage, granularity means that sectoral and aggregate business-cycle fluctuations are driven primarily by shocks to the largest firms, which also have the highest markups and market shares.

Nested CES demand structure (Atkeson-Burstein): A two-level constant-elasticity-of-substitution aggregation where the final good aggregates N sectors with cross-sector elasticity sigma, and each sector aggregates the output of its N_k firms with within-sector elasticity epsilon > sigma. This structure generates firm-level markups that are endogenously increasing in within-sector market shares (under both Cournot and Bertrand competition) and yields closed-form expressions for sector-level markups as a function of sector HHI and aggregate markups as a function of the expenditure-weighted average of sector HHIs.

Markup elasticity with respect to market share (Gamma_ki): Under Cournot competition, the elasticity of firm i’s markup with respect to its market share (d log mu_ki / d log s_ki), equal to (epsilon/sigma - 1)s_ki / ((epsilon - 1) - (epsilon/sigma - 1)s_ki). This is strictly positive for epsilon > sigma and increasing in market share, implying that larger firms have markups that are more responsive to changes in their competitive position.

Pass-through rate (alpha_ki): The fraction of an idiosyncratic cost shock that is passed into the firm’s price relative to the sectoral price index, given by 1/(1 + (epsilon-1)Gamma_ki). Pass-through is decreasing in market share (larger firms have lower pass-through), which dampens their price response to own shocks and mutes the impact of large-firm shocks on aggregate price volatility—acting like a reduction in market concentration.

Within-between decomposition of sector markup changes: The change in inverse sector markup decomposed into (i) a within term measuring changes in firm-level markups holding market shares fixed, and (ii) a between term measuring reallocation of market shares across firms with heterogeneous markups. Under Cournot competition, these two terms are exactly equal (each 50%) for any firm-level shocks—a result that holds globally (not merely as a first-order approximation)—because the forces that increase the within term (higher markup sensitivity) also raise heterogeneity between firms (increasing the between term).

Sectoral markup (mu_kt): Defined as the ratio of sectoral revenues to total wage payments in the sector, equal to the harmonic mean of firm-level markups weighted by market shares. Under Cournot competition, this is a simple increasing function of the sector’s HHI: mu_kt = (epsilon/(epsilon-1)) * [1 - ((epsilon/sigma - 1)/(epsilon-1)) * HHI_kt]^(-1). This mapping between concentration and the markup (the price-cost wedge) gives the central empirical prediction tested at the sector level.
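
As a rough check of the HHI mapping, the sketch below compares the sectoral formula with the share-weighted harmonic mean of firm-level markups. The firm-level markup function is an assumed standard Cournot form (not stated in this summary), the HHI is measured on a 0 to 1 scale, and the shares and parameters are hypothetical.

```python
def firm_markup(share, eps, sigma):
    """Assumed firm-level Cournot markup in the nested CES model."""
    return (eps / (eps - 1.0)) / (1.0 - (eps / sigma - 1.0) / (eps - 1.0) * share)

def sector_markup_from_hhi(hhi, eps, sigma):
    """Sectoral markup as a function of the sector HHI, as given in the text."""
    return (eps / (eps - 1.0)) / (1.0 - (eps / sigma - 1.0) / (eps - 1.0) * hhi)

def harmonic_mean_markup(shares, eps, sigma):
    """Share-weighted harmonic mean of firm-level markups."""
    return 1.0 / sum(s / firm_markup(s, eps, sigma) for s in shares)

EPS, SIGMA = 10.0, 1.5          # hypothetical parameters
shares = [0.50, 0.30, 0.20]     # hypothetical market shares
hhi = sum(s * s for s in shares)

print(sector_markup_from_hhi(hhi, EPS, SIGMA))
print(harmonic_mean_markup(shares, EPS, SIGMA))
```

Both routes give the same number under these assumptions, because the inverse firm markup is linear in the share and the share-weighted sum of shares is the HHI.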

Markup cyclicality (at different aggregation levels): The comovement between markups and output, which the paper distinguishes sharply across three levels: (i) firm markup vs. own-sector output—countercyclical for small firms, procyclical for large firms; (ii) sector markup vs. own-sector output—procyclical (positive covariance) under conditions proven in Proposition 3; (iii) sector markup vs. aggregate output—theoretically positive over long samples but ambiguous and close to zero in short samples, because aggregate output also reflects shocks to other sectors whose markups are uncorrelated with the focal sector’s markups. The paper’s central insight is that the same underlying model generates all three empirical patterns simultaneously.

Central bank communication by ??? The economics of monetary policy leaks

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

This paper investigates the economics of monetary policy leaks — anonymous disclosures of confidential information by insiders to the media — focusing on three central questions: (1) Are leaks random accidents, strategic individual disclosures, or institutionally authorized “plants”? (2) Do leaks shape public (financial market) views, and by how much? (3) Can attributed (named) communication by central bank officials mitigate the effects of leaks?

Data and Setting

The authors study the Eurosystem (ECB and euro area National Central Banks) over January 2002 to December 2021. Their primary data source is a novel database of 368 unique policy-relevant leaks — assembled by manually filtering and classifying more than a million news items from Reuters, Bloomberg, and Market News International archives — with precise minute-level timestamps. Topics covered include: policy rates (178 leaks), unconventional monetary policy/UMP (207 leaks), economic growth (47), inflation (41), and euro exchange rate (36); individual leaks may cover multiple topics. They complement this with a dataset of 7,883 attributable public statements by ECB Governing Council members, identified via keyword filtering and machine learning classification of the Reuters News Archive.

Methodology

The paper employs four main empirical strategies. First, high-frequency event studies using asymmetric windows (5 minutes before to 30 minutes after an event) compare absolute market reactions in OIS rates across the full term structure (3M to 10Y) and in the EURO STOXX 50 across leaks, 5,000 randomly sampled placebo events, and attributable statements. Second, Poisson regression models relate the number of leaks per policy meeting to proxies for Governing Council disagreement (Italian-German sovereign yield spread, inter-quartile range of national inflation rates, number of attributable statements per meeting) and a dummy for quarterly macroeconomic projection releases. Third, a regression framework tests whether leaks move market expectations toward the subsequent policy outcome — identifying whether leaks are informative about the direction of policy. Fourth, an augmented version of the Tillmann (2021) model relates end-of-day changes in longer-term OIS rates to high-frequency monetary policy surprises, interacted with dummies for post-announcement leaks and attributable statements.

Main Findings

Incidence and timing. The number of Eurosystem leaks peaked at 36 in 2019 (more than four per policy meeting on average) before declining by more than one third following the start of Christine Lagarde’s presidency in November 2019. Leaks cluster around policy meetings and, since 2015, have shifted notably from before meetings to after meetings, a shift driven by leaks related to UMP. Leaks occur even during the ECB’s quiet period, when policy-makers are formally restricted from public statements on policy-sensitive topics.

Leaks are not accidents. Poisson regressions reveal that the number of leaks per meeting is significantly and positively associated with proxies for Governing Council disagreement: every additional percentage point in the Italian-German sovereign yield spread is associated with approximately half an additional leak per meeting. The probability of a policy change increases by four to six percentage points with each additional pre-meeting leak (statistically significant at the 5% or 10% level). The Poisson specification explains around 15% of the variation in leak counts.

Market impact. Market movements around leaks are up to 85% larger than those around placebo events. Leaks trigger market reactions that are consistently larger than those of attributable statements by individual Governing Council members across the entire OIS term structure and in equities — a result robust to controlling for distance to policy meetings. Rate leaks mainly move the short and medium end of the yield curve; UMP leaks affect the long end and equities. Leaks about general economic conditions (growth, inflation, exchange rate) produce little statistically significant market response.

Leaks are uninformative about policy direction. Conditional on a pre-meeting leak occurring, the average leak does not move market rates closer to the levels prevailing directly after the subsequent policy announcement. By contrast, attributable statements systematically do reduce this distance. This asymmetry implies that leaks predominantly reflect minority opinions within the Governing Council. Consistent with this, leaks counteract prevailing trends in market expectations at the short end of the yield curve (as established by a negative coefficient on the interaction between the prevailing seven-day pre-leak trend and the leak dummy).

Leaks are not plants; attributed communication mitigates their effects. Post-announcement leaks dampen the transmission of monetary policy surprises to longer-term rates (negative and significant interaction coefficient in the augmented Tillmann framework). Attributed statements by ECB Executive Board members, by contrast, systematically move in the direction opposite to the preceding leak across most of the yield curve, partially reversing leak-induced market moves. More intense pre-leak attributable communication is also associated with lower market impact of the subsequent leak, across most maturities. These results jointly indicate that most Eurosystem leaks originate from individual insiders with minority opinions rather than constituting institutional plants.

Scope Conditions

Results pertain to the Eurosystem committee setting, where decision-making is broadly consensus-based and voting records are not published; they may not fully generalize to institutions with concentrated decision-making power. The study measures effects on financial markets, not broader public opinion.

In depth

Q1. How is a “leak” defined in this paper, and how are Eurosystem leaks identified empirically?

A leak is defined as a disclosure of confidential information by an insider to the media with an expectation of anonymity. Eurosystem leaks are identified from Reuters, Bloomberg, and Market News International archives (2002–2021) using keyword-driven pre-filtering followed by manual classification of “candidate” items. The resulting database contains 1,253 news items that aggregate to 368 unique policy-relevant leaks with minute-level timestamps. Policy-relevant leaks touch on: policy rates, unconventional monetary policy tools, economic growth, inflation, or the euro exchange rate; leaks about local economic conditions, banking regulation, or managerial appointments are excluded.

Q2. What are the broad trends in the number and topic composition of Eurosystem leaks over 2002–2021?

The number of leaks rose sharply in the second half of the sample, peaking at 36 in 2019 (more than four per meeting on average). Since Christine Lagarde took over the ECB presidency in November 2019, leaks fell by more than one third from that peak. The topic composition shifted substantially over time: policy-rate leaks predominated in the earlier period, while leaks related to UMP came to dominate in the 2015–2021 sub-period.

Q3. How does the timing of leaks within the policy meeting cycle change across sub-periods?

In the full sample, leaks cluster in the run-up to policy meetings and immediately following announcement days (both on the announcement day itself and the following Friday). Since 2015, a notable shift occurs from pre-meeting to post-meeting timing, driven specifically by leaks related to UMP. The authors attribute this shift to the expectation-management role of UMP: post-meeting leaks allow dissenting insiders to reshape market expectations that are otherwise guided by official press releases and press conferences.

Q4. What regression evidence supports the view that leaks are not random accidents?

Poisson regressions of the number of leaks per meeting on disagreement proxies find significant positive coefficients on: the lagged Italian-German sovereign yield spread (about half a leak more per meeting for each additional percentage point of spread), the inter-quartile range of national inflation rates, and the number of attributable statements per meeting. Meetings coinciding with the release of quarterly macroeconomic projections also attract significantly more leaks. These results are robust to replacing the disagreement proxies with a binary dissent index based on Q&A sessions at ECB press conferences (Tillmann, 2021), even after excluding disagreement-related leaks from the dependent variable to address endogeneity. The model explains about 15% of the variation in leak counts.

Q5. Does the number of pre-meeting leaks predict policy changes?

Yes. The probability of a monetary policy change increases by four to six percentage points with each additional pre-meeting leak (significant at the 5% or 10% level). This signal about the likelihood of a change (not its direction) is hard to square with the random accidents hypothesis.

Q6. How large are the financial market reactions to leaks relative to placebo events and to attributable statements?

Market movements around leaks are up to 85% larger than the average size of market reactions to 5,000 randomly sampled placebo events. When leaks are compared directly to attributable statements (with leaks as the baseline and fixed effects for year, month, weekday, and hour), average absolute market moves around leaks are consistently larger across the entire term structure of OIS rates and for the EURO STOXX 50. This result is robust to differences in distance to policy meetings, with size differences across the full term structure persisting for periods far from meetings; near meetings, differences narrow but the average market reaction to leaks never falls below that to attributable statements.

Q7. Do the market effects of leaks differ by topic?

Yes. Leaks about policy rates primarily move the short and medium end of the yield curve. Leaks about UMP tools affect the long end of the curve and equities. Leaks about general economic conditions (growth, inflation, euro exchange rate) do not produce statistically significant market reactions, consistent with the interpretation that economic condition leaks require more interpretation before their implications for the policy path become apparent.

Q8. Do leaks move market expectations in the direction of the subsequent policy outcome?

No. The average pre-meeting leak does not reduce the absolute distance of market rates to post-announcement levels. This result holds across maturities from 3M to 10Y and is robust to separating leaks inside and outside the ECB’s quiet period. Attributable statements, by contrast, systematically reduce this distance (Table 7). The failure of leaks to align expectations with outcomes is interpreted as evidence that leaks predominantly reflect minority views within the Governing Council rather than information held by the decisive voter.

Q9. Do leaks counteract or reinforce prevailing trends in market expectations?

Leaks counteract prevailing trends. The regression of market reactions to leaks and placebo events on the seven-day pre-event trend reveals a significantly negative interaction between the trend and the leak dummy at the short end of the yield curve. This result is driven specifically by leaks about policy rates.

Q10. Do post-announcement leaks dampen the transmission of monetary policy surprises to longer-term rates?

Yes. In the augmented Tillmann (2021) framework, the interaction of the high-frequency 2Y monetary policy surprise with a dummy for post-announcement leaks is negative and significant for 2Y, 5Y, and 10Y OIS rates. In contrast, the interaction with a dummy for post-announcement attributable statements is positive and significant across maturities, indicating that attributed communication reinforces the official policy signal. These two results jointly show that leaks weaken official policy announcements while attributed communication strengthens them.

Q11. Does more intense pre-leak attributable communication reduce the market impact of subsequent leaks?

Yes. Using an intensity measure that weights each attributable statement by the inverse of its distance in hours to the subsequent leak (covering a window from 36 hours to 30 minutes before the leak), the paper finds a significant negative relationship between pre-leak communication intensity and the absolute market reaction to the leak, controlling for year, month, weekday, and hour fixed effects. This holds across most maturities.

Q12. Does the market impact evidence support the “plant” hypothesis?

No. If leaks were institutional plants intended to prepare markets for new policy, one would expect the ECB Executive Board, which controls official communication, to subsequently reinforce the signal from leaks. Instead, attributable statements by ECB-affiliated Governing Council members are systematically negatively correlated with the market direction of the preceding leak across the yield curve, with significant coefficients at medium maturities. NCB Governor statements show weaker and more ambiguous effects, possibly because their statements generate smaller average market movements rather than because NCB Governors are unwilling to counteract leaks.

Q13. Why do markets react to leaks even though leaks are generally uninformative about policy outcomes?

The paper offers three candidate explanations: (1) automated trading algorithms that do not distinguish between attributed and anonymous communication; (2) leaks serve as a coordination device in the spirit of Morris and Shin (2002), amplifying even noisy signals; (3) media-reporting models such as Nimark (2014) and Chahrour et al. (2021) predict that “man-bites-dog” news — unusual events such as revelations of committee disagreement — shift beliefs beyond their true information content. Leaks are unusual both in frequency (far less common than attributed statements) and in content (they reveal disagreement that rarely surfaces in official communication).

Q14. What are the implications for the measurement of monetary policy shocks from high-frequency identification?

The paper notes that Eurosystem leaks frequently occur shortly before or after official policy announcements. Pre-announcement leaks can shift market expectations before the start of standard event windows, reducing the measured surprise component of official announcements. Post-meeting leaks dampen the end-of-day effects of announcements. In both cases, standard high-frequency surprise instruments extracted from official announcements alone may miss the full extent of new information available to market participants, suggesting that accounting for leaks could improve the relevance of high-frequency instruments used in monetary policy identification.

Q15. What are the implications for the design of central bank quiet periods?

The ECB’s quiet period ends with the policy announcement, whereas the Federal Reserve’s extends to the day after the meeting. Based on the finding that post-announcement leaks dampen policy announcement effects while post-announcement attributed statements reinforce them, the paper suggests that permitting attributed communication shortly after policy decisions may help mitigate the market impact of post-announcement leaks.

Key Concepts

Monetary policy leak (“sources story”): In this paper, a leak is defined as a disclosure of confidential information emanating from an insider within the Eurosystem (ECB or NCB staff or policy-makers) that is transmitted to financial media with an expectation of anonymity for the source. The paper excludes whistle-blower cases and focuses on leaks where anonymity keeps attention on the content rather than the identity of the source. Leaks are distinct from “plants” (formally authorized institutional disclosures intended to advance the institution’s goals) and from “pleaks” (the middle ground).

Plant: An authorized or semi-authorized anonymous disclosure of confidential information made for the purpose of advancing the public institution’s own goals and interests, as distinct from a leak that originates from an individual insider’s personal agenda. The paper tests and rejects the plant hypothesis for most Eurosystem leaks on the basis that ECB Executive Board members’ attributed statements systematically counteract the market impact of leaks.

Single voice principle: The ECB’s communication norm requiring that Governing Council members discuss and resolve disagreements internally while publicly representing the official policy stance. This principle creates a setting where individual members with minority views may resort to anonymous communication as a way to express dissent “off-protocol.”

Quiet period (purdah): The ECB’s rule requiring policy-makers to refrain from public statements on policy-related topics in the seven days before each Governing Council monetary policy meeting. Leaks cluster during this period despite the restriction, supporting the non-random interpretation of leaks.

Attributable (named) statement: A public statement clearly attributed to a specific, named member of the ECB Governing Council, reported as a breaking-news headline. Attributable statements serve both as a comparison benchmark for measuring the market impact of leaks and as a mitigation instrument when they counteract leak-induced market moves.

Pre-leak communication intensity (lambda): The paper’s measure of the intensity of attributable communication in the 36-hour window before a given leak, defined as the sum of inverse time distances (in hours) from each attributable statement to the leak. A higher value means more recent and/or more numerous attributed statements precede the leak.
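
A minimal sketch of how such an intensity measure could be computed is shown below. The function name, the inclusive treatment of the window boundaries, and the example timestamps are assumptions for illustration; the window limits (36 hours to 30 minutes before the leak) follow the description in Q11.

```python
from datetime import datetime, timedelta

def pre_leak_intensity(leak_time, statement_times, max_hours=36.0, min_hours=0.5):
    """Sum of inverse distances (in hours) from attributable statements to a leak.

    Only statements made between max_hours and min_hours before the leak count.
    Inclusive window boundaries are an assumption of this sketch.
    """
    total = 0.0
    for t in statement_times:
        hours = (leak_time - t).total_seconds() / 3600.0
        if min_hours <= hours <= max_hours:
            total += 1.0 / hours
    return total

# Hypothetical timestamps, for illustration only.
leak = datetime(2019, 9, 12, 16, 0)
statements = [
    leak - timedelta(hours=1),     # counts with weight 1/1
    leak - timedelta(hours=4),     # counts with weight 1/4
    leak - timedelta(hours=40),    # too early, outside the 36-hour window
    leak - timedelta(minutes=10),  # too close, inside the last 30 minutes
]
print(pre_leak_intensity(leak, statements))  # 1.25
```

The inverse-distance weighting means that a single statement one hour before the leak contributes as much as four statements made four hours before it.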

High-frequency event study window: The paper uses an asymmetric window starting 5 minutes before and ending 30 minutes after a leak’s timestamp. Market reactions are measured as the change in the median OIS quote between the 10 minutes before the window opens and the 10 minutes after it closes. The same methodology is applied to leaks and attributable statements to ensure comparability across communication types.
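
The window logic can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: quotes are taken as a list of (timestamp, rate) pairs, the two 10-minute measurement intervals are read as lying immediately outside the event window, and the example quotes are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median

def window_reaction(quotes, event_time, pre=5, post=30, span=10):
    """Change in the median quote around an asymmetric event window.

    quotes: list of (timestamp, rate) pairs.
    The window runs from `pre` minutes before to `post` minutes after the event;
    medians are taken over the `span` minutes just before and just after it.
    Returns None if either measurement interval has no quotes.
    """
    start = event_time - timedelta(minutes=pre)
    end = event_time + timedelta(minutes=post)
    width = timedelta(minutes=span)
    before = [q for t, q in quotes if start - width <= t < start]
    after = [q for t, q in quotes if end < t <= end + width]
    if not before or not after:
        return None
    return median(after) - median(before)

# Hypothetical quotes around a hypothetical event at 14:00.
t0 = datetime(2019, 9, 12, 14, 0)
quotes = [
    (t0 - timedelta(minutes=14), 0.10),
    (t0 - timedelta(minutes=10), 0.11),
    (t0 - timedelta(minutes=6), 0.10),
    (t0 + timedelta(minutes=2), 0.13),   # inside the window, ignored
    (t0 + timedelta(minutes=32), 0.14),
    (t0 + timedelta(minutes=35), 0.15),
    (t0 + timedelta(minutes=39), 0.15),
]
reaction = window_reaction(quotes, t0)
print(round(abs(reaction), 4))  # absolute reaction, as compared in the paper
```

Using medians over short intervals rather than single quotes makes the measure less sensitive to an isolated stale or erroneous quote.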

Post-announcement leak dummy: An indicator taking the value of one if at least one leak occurs between the end of the official ECB monetary policy announcement window (15:50 CET) and end of trading hours on the announcement day. Used in the augmented Tillmann (2021) regression to measure whether leaks dampen the transmission of monetary policy surprises to longer-term rates.
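
A sketch of how this indicator could be constructed is given below. The 15:50 CET cutoff comes from the definition above; the end-of-trading time, the function name, and the example timestamps are hypothetical, and timestamps are assumed to be naive datetimes already expressed in CET.

```python
from datetime import date, datetime, time

ANNOUNCEMENT_WINDOW_END = time(15, 50)  # end of the announcement window (CET), from the text
TRADING_END = time(17, 30)              # hypothetical end of trading hours

def post_announcement_leak_dummy(announcement_day, leak_times, trading_end=TRADING_END):
    """Return 1 if any leak falls after the announcement window on the announcement day."""
    for ts in leak_times:
        same_day = ts.date() == announcement_day
        if same_day and ANNOUNCEMENT_WINDOW_END < ts.time() <= trading_end:
            return 1
    return 0

# Hypothetical example dates and leak timestamps.
day = date(2019, 9, 12)
print(post_announcement_leak_dummy(day, [datetime(2019, 9, 12, 16, 10)]))  # 1
print(post_announcement_leak_dummy(day, [datetime(2019, 9, 12, 14, 0)]))   # 0
```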

Community Engagement and Public Safety: Evidence from Crime Enforcement Targeting Immigrants

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies how immigration enforcement affects public safety, asking two questions: (1) what is the effect of increased enforcement on criminal victimization, and (2) how does increased enforcement affect victims’ willingness to report crimes to police? The authors exploit the staggered rollout of the U.S. Secure Communities (SC) program, the largest expansion of interior immigration enforcement in U.S. history, across counties between 2008 and 2013. SC expanded information sharing between local police and federal immigration authorities, and honored ICE detainer requests increased by over 50% following program activation.

The primary data source is the restricted-access National Crime Victimization Survey (NCVS), which measures victimizations independently of whether they were reported to police and includes respondent ethnicity. This allows the authors to separately estimate effects on underlying crime incidence and on reporting behavior for Hispanic and non-Hispanic individuals. The empirical strategy uses a staggered difference-in-differences design following Sun and Abraham (2021), comparing earlier-treated counties to the last 25% of counties to activate SC, with estimates run separately by ethnicity.

The main findings run contrary to the stated policy goal of improving public safety. Among Hispanic individuals, SC caused a statistically significant 0.15 percentage point increase in monthly victimization (a 16% increase relative to the pre-period baseline of 0.9 percent), implying approximately 1.3 million additional crimes against Hispanics in the two years following program activation. The increase is concentrated primarily in property crimes (a statistically significant 15% increase), with a similarly sized (15%) but imprecisely estimated increase in violent crime victimizations. The victimization increase is larger for Hispanic females (0.23 percentage points, or 25%) and in counties with higher shares of non-citizen Hispanic residents.

Simultaneously, SC caused a 9.5 percentage point decline in the likelihood that Hispanic victims report incidents to police, a 30% decline relative to the pre-period mean reporting rate of 33 percent. This reporting decline is primarily driven by a 34% decline in the reporting of property offenses. No changes in victimization or reporting are found for non-Hispanic individuals in the aggregate, though non-Hispanic individuals in neighborhoods with high Hispanic population shares do experience higher victimization rates after SC.

Critically, reported crime rates (the product of victimization and reporting) are unchanged for both Hispanic and non-Hispanic individuals, explaining why prior studies using administrative reported-crime data found null effects of SC. The null effect on reported crime masks two large, opposing causal forces.

The authors provide evidence that the decline in crime reporting is the primary driver of the increase in victimization. Cohorts with larger reporting declines experienced larger victimization increases, and a decomposition exercise shows the reporting decline is substantially more important than concurrent SC-induced changes in unemployment, wages, female-headed household shares, and the male immigrant share. Supporting data from 75 police departments confirm no change in 911 call volumes or total arrest volumes, while showing a decline in the Hispanic share of arrestees in both Hispanic and non-Hispanic neighborhoods — consistent with reduced reporting leading to reduced apprehension of offenders, with offending shifting toward non-Hispanic individuals.

Scope conditions: results are estimated for the population residing in counties exceeding 100,000 residents (representing 61% of total U.S. population and 69% of the Hispanic population), excluding southern border counties and states that actively resisted SC implementation (Illinois, Massachusetts, New York). Effects apply to all Hispanic respondents — citizens and non-citizens — consistent with prior evidence that citizen Hispanics respond to immigration enforcement out of concern for non-citizen contacts.

Q: What was the Secure Communities program and how was it implemented? A: SC was a federal program launched in 2008 that required fingerprints of individuals booked into local jails to be forwarded not only to the FBI but also to the Department of Homeland Security, enabling automatic screening for immigration violations. Local authorities could not prevent federal officials from learning of an arrestee’s immigration status. The program rolled out county-by-county between October 2008 and January 2013 due to technological constraints and resource bottlenecks, generating the staggered variation used for identification.

Q: How large was the first-stage effect on actual immigration enforcement? A: County-level honored ICE detainer requests increased by over 50% following SC activation, with a similar 40% increase in all detainer requests. The number of honored detainers nationwide doubled between 2008 and 2012. Over 90% of detainers and removals in any given month were for individuals of Hispanic ethnicity.

Q: What is the main finding on Hispanic victimization? A: SC caused a 0.15 percentage point increase in monthly Hispanic victimization rates, a 16% increase relative to the pre-period baseline of 0.9 percent. This translates to approximately 1.3 million additional crimes against Hispanics over the two years following program activation, calculated by multiplying the monthly effect by 24 months and by the 35.3 million Hispanics in the sample counties.
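
The back-of-the-envelope calculation can be reproduced from the rounded figures quoted in this summary. The variable names are illustrative, and because the inputs are rounded, the relative increase comes out slightly different from the 16% reported in the paper.

```python
monthly_effect_pp = 0.15           # percentage point increase in monthly victimization
baseline_pct = 0.9                 # pre-period monthly victimization rate, in percent
months = 24                        # two years following activation
hispanic_population = 35_300_000   # Hispanics in the sample counties

# Relative increase implied by the rounded inputs (roughly 16 to 17 percent).
relative_increase = monthly_effect_pp / baseline_pct

# Additional victimizations: monthly effect x months x population.
additional_crimes = monthly_effect_pp / 100 * months * hispanic_population

print(f"relative increase: {relative_increase:.1%}")
print(f"additional crimes over two years: {additional_crimes:,.0f}")
```

The product is about 1.27 million, which rounds to the 1.3 million figure cited above.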

Q: What is the main finding on Hispanic crime reporting? A: SC caused a 9.5 percentage point decline in the likelihood that Hispanic victims report incidents to police, a 30% decline relative to the pre-period mean reporting rate of 33 percent. This decline occurred relatively quickly after activation and was concentrated in property offenses, where reporting fell by 34%.

Q: Why do reported crime rates show no change despite large shifts in victimization and reporting? A: Reported crime rates — the probability of being victimized and reporting the crime — are unchanged because the 16% increase in victimization and the 30% decline in reporting are approximately offsetting in magnitude. This explains why prior work using administrative police data (Miles and Cox 2014; Treyger et al. 2014; Hines and Peri 2019) found null effects of SC on reported crime: those data sources cannot separately identify the two underlying changes.

Q: Does SC affect non-Hispanic individuals? A: In the aggregate, SC has no statistically significant effect on non-Hispanic victimization or reporting. However, non-Hispanic individuals living in neighborhoods with high Hispanic population shares do experience victimization increases, and in those neighborhoods their reporting rates also decline slightly. Re-weighting non-Hispanic respondents to match the county composition of Hispanic respondents yields an 8% increase in non-Hispanic victimization, suggesting spillover effects in Hispanic-dense areas.

Q: What mechanism links the reporting decline to the victimization increase? A: The authors argue that reduced victim reporting lowers the probability that offenders are apprehended, thereby reducing the cost of committing crimes. They demonstrate this through two analyses: first, cohorts of counties with larger reporting declines experienced larger victimization increases; second, a decomposition shows the reporting channel is substantially more important than concurrent SC-induced changes in unemployment, wages, female-headed household shares, and the male immigrant share of the population.

Q: What do the police administrative data show about offender composition? A: Data from 75 police departments show no change in 911 call volumes or total arrest volumes following SC — consistent with the NCVS finding of unchanged reported crime rates. However, the Hispanic share of arrestees declined after SC, with a 1.5 percentage point drop in Hispanic neighborhoods (off a base of 54%), suggesting the rise in offending was more concentrated among non-Hispanic offenders as reduced reporting lowered expected punishment probabilities.

Q: How does the victimization effect vary by gender? A: The victimization point estimate for Hispanic males is 0.085 percentage points and imprecisely estimated (SE = 0.088). For Hispanic females, the effect is over 2.5 times larger at 0.23 percentage points, a 25% increase. The decline in reporting is comparable in magnitude across male and female Hispanic victims, suggesting fear of enforcement is similar by gender but that females disproportionately bear the crime burden.

Q: How does the victimization effect vary by neighborhood non-citizen Hispanic share? A: Victimization effects for Hispanics are relatively constant across neighborhood types but are higher — around 25% — in neighborhoods with the highest shares of non-citizen Hispanics. Counties with higher non-citizen Hispanic shares also exhibit higher ICE removal rates, indicating greater total enforcement, and these counties have higher victimization effects. Reporting declines among Hispanics appear relatively uniform across neighborhood types.

Q: Could survey attrition or compositional changes explain the results? A: The authors rule this out through several tests. First, SC has no statistically significant effect on household survey response rates, even in Census tracts above the 90th percentile of Hispanic share. A worst-case bias calculation implies attrition could account for at most 26% of the victimization effect. Second, re-estimating using predicted victimization (based on pre-SC demographics) yields precise null effects, indicating the increase is not driven by compositional change. Third, results are stable when restricting to respondents present at all survey waves or using individual fixed effects.

Q: Could the reporting decline be mechanical — reflecting a change in the types of crimes committed rather than behavioral change? A: The authors test this by constructing predicted reporting rates using pre-SC incident characteristics. The largest alternative estimate is -1.45 percentage points, over six times smaller than the estimated main reporting effect of 9.5 percentage points, ruling out crime composition change as the primary explanation. Results also hold when focusing on always-respondents and using individual fixed effects, ruling out entry of low-reporting individuals into the survey.

Q: How robust are the results to alternative empirical strategies? A: Results are robust to including states that resisted SC (with somewhat smaller magnitudes as expected), alternative population cutoffs, TWFE specifications, the Borusyak et al. (2021) and Callaway and Sant’Anna (2021) estimators (which yield larger point estimates), a triple-differences specification using non-Hispanics as an additional control group, and the inclusion of time-varying unemployment rates. The dynamic event-study plots show parallel pre-trends across all specifications.

Q: What are the policy implications of the null effect on aggregate victimization? A: The authors’ estimates rule out declines in aggregate victimization larger than 3.3%, indicating SC did not generate meaningful improvements in aggregate public safety. This contradicts the stated mission of immigration enforcement agencies. The findings imply that policies targeting immigrant communities can generate public safety costs through trust erosion that outweigh any deterrence or incapacitation benefits.

Secure Communities (SC): A federal program launched in 2008 requiring automatic sharing of fingerprints from local jail bookings with the Department of Homeland Security, enabling identification of unauthorized immigrants among local arrestees and triggering ICE detainer requests; the largest expansion of interior immigration enforcement in U.S. history.

Chilling effect: The mechanism by which immigration enforcement raises the perceived cost of contacting law enforcement for immigrant victims and witnesses — through fear that they, a family member, or neighbor will be detained or deported — thereby reducing willingness to report crimes independently of any change in underlying criminality.

Victimization rate: The likelihood that an individual is the victim of a crime in a given period, measured via the NCVS independently of whether the crime was reported to police; the paper’s primary measure of public safety.

Reporting rate: The likelihood that a criminal victimization is reported by the victim to the police, measured as a share of all crime incidents; distinct from victimization rate and central to the paper’s decomposition of reported crime into its two components.

Reported crime rate: The joint probability of being victimized and reporting the crime, analogous to measures available in administrative police data such as the FBI UCR; this outcome masks the opposing effects of SC on victimization and reporting.

Honored detainer: An ICE detainer request that results in a transfer of the arrested individual to ICE custody; the paper’s preferred measure of immigration enforcement intensity because it is available both before and after SC activation and is more directly linked to deportation actions than all detainer requests.

Decomposition of victimization increase: The paper’s procedure for quantifying the relative importance of the reporting channel (reduced probability of apprehension) versus other SC-induced social and economic changes (unemployment, wages, female-headed households, male immigrant share) in explaining the rise in Hispanic victimization.
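
The relationship among the victimization, reporting, and reported crime rates defined above can be illustrated with a small sketch. The numbers are hypothetical and chosen only to show how a rise in victimization and a fall in reporting can offset each other in the reported crime rate; they are not estimates from the paper.

```python
def reported_crime_rate(victimization_rate, reporting_rate):
    """Joint probability of being victimized and reporting the crime."""
    return victimization_rate * reporting_rate

# Hypothetical values, for illustration only (not taken from the paper).
before = reported_crime_rate(victimization_rate=0.10, reporting_rate=0.40)
after = reported_crime_rate(victimization_rate=0.125, reporting_rate=0.32)

print(f"before: {before:.3f}, after: {after:.3f}")
```

In this made-up case victimization rises by a quarter while the reported crime rate is unchanged, which is the sense in which administrative police data can mask the two opposing effects.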

Consumer Credit and the Incidence of Tariffs: Evidence from the Auto Industry

Overview

Research Question. Do import tariffs affect consumer credit terms, and does focusing solely on goods prices understate tariff pass-through to consumers? The paper also asks whether vertical integration – specifically, the ownership of a captive finance subsidiary – expands the channels through which manufacturers can pass on cost shocks, and whether tariff incidence falls disproportionately on consumers with less elastic credit demand or in areas with lower credit market competition.

Setting. The Trump administration’s 2018 metal tariffs – a 25 percent tariff on steel and a 10 percent tariff on aluminum – created a large and largely unanticipated cost shock for US auto manufacturers who are heavy consumers of both metals across their supply chains. Crucially, auto manufacturers own captive finance subsidiaries (e.g., Ford Credit, GM Financial, Honda Finance) that originate consumer auto loans alongside independent noncaptive lenders (banks, credit unions, independent finance companies). Because noncaptive lenders had no direct exposure to the metal tariffs, they serve as a natural control group in a difference-in-differences design.

Data. The primary data source is Regulation AB II, which requires issuers of public auto loan asset-backed securities to report loan-level information monthly to the SEC. The final sample covers 1,973,639 auto loans originated between January 2017 and December 2018 across 14 lenders (8 captive, 6 noncaptive). Vehicle invoice price data come from Regulation AB II; consumer sales price data come from the Texas Department of Motor Vehicles (covering approximately 3.9 million vehicle transactions in 2017-2018). Population credit bureau data from Equifax are used for representativeness checks and HHI construction.

Empirical Strategy. The baseline difference-in-differences compares captive auto loans to otherwise-identical noncaptive auto loans originated in the same state, the same quarter, for the same vehicle make-model-condition, and to borrowers in similar income and credit score bins. Parallel pre-trends tests confirm no economically meaningful differential pre-trends across captive and noncaptive lenders for any outcome variable.

Main Findings.

Interest Rate Pass-Through. Relative to noncaptive lenders, captive lenders increased average interest rates by 26 basis points following the tariff announcement, representing a 10 percent increase relative to the pretreatment captive mean of 252 basis points. This corresponds to an average present value increase in total loan payments of $179 per loan (discounted at 5 percent for an average $26,914 principal with 66-month maturity). By the fourth quarter of 2018, the dynamic estimate reaches 48 basis points – nearly double the pooled average – as metal prices continued to rise. The increase is concentrated among more-exposed captive lenders (those whose manufacturers operate two or more domestic production plants), not less-exposed captive lenders (primarily BMW, Mercedes-Benz, Volkswagen), ruling out captive-specific omitted variables.
Non-Price Loan Terms. There is no economically significant change in captive loan amounts, maturities, or loan-to-value ratios following the tariffs. Captive lenders responded to the tariff shock exclusively by raising interest rates, consistent with prior evidence that auto loan demand is less sensitive to interest rates than to non-price terms.
Vehicle Prices. Invoice prices for makes with greater domestic production rose by approximately 1.0 percent (relative to makes with less domestic production), and consumer sales prices rose by approximately 0.7 percent ($225 average increase relative to a pretreatment mean of $32,206) for these same makes.
Relative Magnitude of Pass-Through Channels. After accounting for estimated spillover effects on noncaptive lenders of 7 basis points, the spillover-adjusted estimate implies captive interest rates rose by 33 basis points on average, corresponding to $227 per loan in present value terms. Interest rate pass-through is estimated to be almost two-thirds as large as vehicle price pass-through, meaning that focusing solely on vehicle prices would underestimate tariff incidence on consumers by approximately 37 percent. The population-weighted average cost increase per vehicle is $146 – roughly equally split between higher vehicle prices ($74) and higher financing costs ($72).
Intensive vs. Extensive Margin. The composition of captive borrowers did not deteriorate following the tariffs: average household incomes of captive borrowers increased slightly (economically small), credit scores were unchanged, and future default rates showed no significant change. This confirms that the interest rate increase reflects tariff pass-through to inframarginal borrowers along the intensive margin, not a shift in borrower composition.
Heterogeneity by Credit Demand Elasticity. Pass-through via interest rates was higher for borrowers with lower incomes (33 basis points vs. 20 basis points for higher-income consumers), lower credit scores (36 basis points vs. 15 basis points), and smaller loan amounts (36 basis points vs. 12 basis points). These groups are proxies for less elastic credit demand, consistent with theoretical predictions that cost pass-through is larger where demand is less price sensitive.
Heterogeneity by Market Competition. Tariff pass-through via interest rates was higher in states with lower credit market competition (as measured by state-level Herfindahl-Hirschman Index). Consumers in the lowest competition decile experienced an average captive interest rate increase of 41 basis points, compared to 24 basis points for consumers in the highest competition decile. This 17 basis point differential implies that interest rate pass-through was approximately 88 percent as large as vehicle price pass-through in less competitive markets, versus 57 percent in more competitive markets.
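
The per-loan present value figures quoted in the findings above can be approximately reproduced with a standard amortizing-loan calculation. This is a sketch, not the authors' procedure: it assumes a fully amortizing loan with monthly compounding and uses the 252 basis point pretreatment captive mean as the base rate. The function names are illustrative.

```python
def monthly_payment(principal, annual_rate, months):
    """Level payment on a fully amortizing loan with monthly compounding."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

def pv_of_rate_increase(principal, months, base_rate, bump_bp, discount_rate=0.05):
    """Present value of the extra payments caused by a rate increase of bump_bp."""
    extra = (monthly_payment(principal, base_rate + bump_bp / 10_000, months)
             - monthly_payment(principal, base_rate, months))
    d = discount_rate / 12
    annuity_factor = (1 - (1 + d) ** -months) / d
    return extra * annuity_factor

# Principal, maturity, and discount rate are from the summary; the 2.52% base
# rate and the monthly compounding convention are assumptions of this sketch.
pv_unadjusted = pv_of_rate_increase(26_914, 66, 0.0252, 26)   # 26 bp estimate
pv_adjusted = pv_of_rate_increase(26_914, 66, 0.0252, 33)     # 33 bp, spillover-adjusted
print(round(pv_unadjusted), round(pv_adjusted))
```

Under these assumptions the two values land close to the $179 and $227 per-loan figures reported in the summary.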

In depth

Q1. What is a captive finance subsidiary, and why does it create a novel channel for tariff pass-through?

A captive finance subsidiary is a wholly owned lending unit of an auto manufacturer (e.g., Ford Credit, GM Financial, American Honda Finance) whose primary purpose is to finance the sale of the manufacturer’s vehicles. Because the captive lender and the manufacturing unit share a parent company, a cost shock to the manufacturing side – such as higher steel and aluminum prices from the tariffs – can be passed on to consumers not only through higher vehicle prices but also through worse financing terms offered by the captive. Prior studies documented tariff pass-through to goods prices but found limited evidence of pass-through to consumer prices; this paper shows that the bundling of a product with captive financing creates a second, previously unmeasured channel. The institutional structure also facilitates “price shrouding”: because consumers are less attentive to financing costs than vehicle sticker prices, captive lenders can exploit this inattention to pass on cost shocks along the financing margin.

Q2. Why is the auto loan market a particularly suitable setting for studying this question?

The auto loan market provides three key advantages. First, both captive lenders (directly exposed to metal tariffs via manufacturing) and noncaptive lenders (with no direct tariff exposure) compete for the same borrowers on the same vehicle purchases, creating a clean within-vehicle, within-period control group. Second, the Regulation AB II data contain vehicle make-model-condition information, allowing the authors to hold vehicle choice fixed and isolate tariff pass-through to loan terms separately from any vehicle switching by consumers. Third, the indirect dealer-intermediated financing process means that consumers typically do not observe the full set of lender bids, weakening their ability to actively arbitrage between captive and noncaptive loan offers.

Q3. What is the Regulation AB II data, and how representative is it?

Under Regulation AB II (effective November 2016), issuers of publicly offered auto loan asset-backed securities must report monthly loan-level data to the SEC, including interest rates, loan amounts, maturities, vehicle characteristics, borrower credit scores and incomes, and loan performance. The final sample covers approximately 8 percent of all open auto loans in the United States and around 30 percent of the total auto loan portfolios of the 14 sampled lenders. Average loan characteristics in the Regulation AB II data closely match population credit bureau data from Equifax, indicating that securitization selection is not a major concern. Average credit scores and incomes are slightly higher in Regulation AB II than in the population, primarily because small banks and credit unions that serve riskier borrowers do not access public securitization markets.

Q4. What is the baseline empirical specification and what identifying variation does it use?

The baseline is a difference-in-differences regression comparing captive loans (treated) to noncaptive loans (control) before and after January 2018 (the date of the Department of Commerce’s initial tariff recommendation, chosen conservatively). The regression includes lender fixed effects, vehicle make-model-condition x origination quarter fixed effects, state x origination quarter fixed effects, $25,000 income bin x origination quarter fixed effects, and 10-point credit score bin x origination quarter fixed effects. The coefficient of interest is estimated using within-lender variation after netting out common vehicle-level shocks, state-level shocks, and shocks common across income and credit score cells. This granular fixed effect structure ensures that the estimate compares captive and noncaptive loans for exactly the same vehicle, in the same state, in the same quarter, to borrowers with similar incomes and credit scores.
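
The core comparison can be sketched in its simplest 2x2 form. The example below uses synthetic data with made-up parameter values and omits the paper's fixed effects entirely, so it illustrates only the difference-in-differences logic, not the authors' specification.

```python
import random
from statistics import mean

def did_estimate(loans):
    """2x2 difference-in-differences on a list of (captive, post, rate) tuples."""
    def cell_mean(captive, post):
        return mean(r for c, p, r in loans if c == captive and p == post)
    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change

# Synthetic data: every parameter value here is hypothetical.
rng = random.Random(0)
TRUE_EFFECT = 0.26  # percentage points, i.e. 26 basis points
loans = []
for captive in (True, False):
    for post in (True, False):
        for _ in range(5000):
            rate = 2.5 if captive else 4.5                 # level gap between lender types
            rate += 0.30 if post else 0.0                  # shock common to all lenders
            rate += TRUE_EFFECT if (captive and post) else 0.0
            rate += rng.gauss(0, 0.5)                      # idiosyncratic noise
            loans.append((captive, post, rate))

estimate = did_estimate(loans)
print(f"DiD estimate: {estimate:.3f} percentage points")
```

The level gap between lender types and the common post-period shock both difference out, leaving an estimate near the simulated effect. The fixed effects in the paper's regression apply the same logic within narrow vehicle, state, quarter, income, and credit score cells.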

Q5. What are the main coefficient estimates on interest rates, and how do they evolve dynamically?

In the full sample, the pooled difference-in-differences estimate is 26 basis points (t = 2.75), representing a 10 percent increase relative to the pretreatment captive mean of 252 basis points. Excluding subvented (subsidized) loans, the estimate is 29 basis points (t = 2.85). Dynamically, captive interest rates started rising within one quarter of the treatment date and continued increasing alongside metal prices, reaching a terminal coefficient of 48 basis points in the fourth quarter of 2018 – nearly double the pooled average. Consistent with the parallel trends assumption, there is no economically significant evidence of differential pre-trends across captive and noncaptive loans in the pretreatment period.

Q6. How do the authors validate that noncaptive lenders constitute a valid counterfactual?

Four alternative specifications are presented. First, when splitting captive lenders by tariff exposure (more exposed: Ford, GM-AmeriCredit, Honda, Toyota; less exposed: BMW, Mercedes-Benz, Volkswagen), only more-exposed captive lenders show a significant increase in interest rates (30 basis points; t = 3.37), while less-exposed captive lenders show no significant increase (-18 basis points; t = -1.33). This rules out captive-specific correlated omitted variables. Second, the authors add interactions of the treatment indicator with changes in the Fed Funds rate and 1-, 5-, and 10-year Treasury yields; results are unchanged in magnitude, ruling out differential sensitivity to the rising interest rate environment of 2018. Third, using CarMax (a noncaptive lender that also sells and finances vehicles but does not participate in DealerTrack) as the sole control group yields similar results. Fourth, lender-specific borrowing cost controls do not attenuate the estimates.

Q7. Did captive lenders adjust any non-price loan terms in response to the tariffs?

No. Columns 2-4 of Table 3 document that loan amounts, maturities, and loan-to-value ratios showed no economically significant changes for captive lenders relative to noncaptive lenders following the tariffs. Some coefficient estimates in the full sample are statistically significant but economically small, and they lose significance or flip signs once subvented loans are excluded. The event study plots confirm no meaningful pre-trends and no meaningful post-treatment changes in non-price terms. The authors note that this is consistent with prior evidence that auto loan demand is less sensitive to interest rates than to maturity, making interest rates the optimal margin along which to pass through costs.

Q8. How do the authors rule out that the increase in captive interest rates reflects a change in borrower composition rather than intensive-margin pass-through?

The authors estimate a separate regression (equation 4) with log household income, log credit score, and future default rate as outcomes. Relative to noncaptive borrowers, captive borrowers experienced a small but positive increase in average household income (Gamma = 0.012, t = 3.25), no significant change in credit scores (Gamma = 0.001, t = 1.13), and no significant change in 12-month or 24-month default rates. The income increase is of the wrong sign and too small in magnitude to explain the observed interest rate increase from a risk-based pricing perspective. Additionally, captive loan origination volumes declined 6.7 percent after the tariffs, inconsistent with a demand surge driving the interest rate increase.

Q9. How do the authors rule out alternative explanations including demand surges, borrowing cost increases, securitization changes, and dealer markup changes?

For demand surges: vehicle sales volumes showed no noticeable increase following the tariff announcement, and captive loan originations actually declined. For differential borrowing costs: controlling for lender-specific CDS spreads and other borrowing cost measures does not attenuate the main estimate. For securitization changes: combining Regulation AB II and credit bureau data, the authors find no significant change in captive lenders’ securitization rates, the ratio of securitized to total loan amounts, maturities, or monthly payments. For dealer markup changes: noncaptive loans are also subject to dealer markups, so common changes are absorbed in the DiD; additionally, subvented loans (which dealers cannot mark up) also show higher captive interest rates post-tariff, ruling out differential markup changes. For interest rate sensitivity differentials: controlling for changes in risk-free rates does not alter results. For prepayment responses: 12-month and 24-month prepayment rates show no significant change for captive loans.

Q10. How do the authors measure vehicle price pass-through, and what data do they use?

To measure invoice price pass-through, the authors use Regulation AB II data (which contains the invoice price for new vehicles) and estimate a regression comparing the change in log invoice prices for makes with a higher proportion of US-assembled vehicles versus those with lower domestic production, controlling for vehicle make-model fixed effects and price bin x quarter fixed effects. Invoice prices rose approximately 1.0 percent for more-exposed makes. For consumer sales price pass-through, the authors use Texas DMV data (1,819,498 new and 2,105,938 used vehicle transactions in 2017-2018) with the same identification strategy. Sales prices rose approximately 0.7 percent ($225 average increase) for more-exposed makes. Both effects are robust to defining exposure at either the make level or the make-model level.

Q11. How is the overall pass-through rate decomposed between the interest rate and vehicle price channels?

The authors define total tariff pass-through as the sum of interest rate pass-through (change in aggregate captive financing costs divided by aggregate production cost increase) and vehicle price pass-through (change in aggregate new vehicle sales revenue divided by aggregate production cost increase). Taking the ratio of these two components allows them to estimate the relative importance of each channel without needing to directly measure production costs. With a captive loan penetration rate (M) of 0.59, a per-loan present value financing cost increase of $179 (unadjusted) or $227 (adjusted for 7 basis point spillover effect on noncaptives), and a $225 average vehicle price increase, the spillover-adjusted estimate implies interest rate pass-through is almost two-thirds as large as vehicle price pass-through. Focusing solely on vehicle prices would underestimate tariff incidence on consumers by approximately 37 percent. The population-weighted average total cost increase is $146 per vehicle, roughly equally split between vehicle prices ($74) and financing costs ($72).
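
The arithmetic behind the relative-magnitude statement can be sketched as follows. This is one reading of the description above, in which the common production cost denominator cancels. It reproduces the quoted figures, but the function names and the exact formula are assumptions rather than the paper's own notation.

```python
def relative_pass_through(penetration, pv_financing_per_loan, price_increase_per_vehicle):
    """Interest rate pass-through relative to vehicle price pass-through.

    Both channels share the same denominator (the production cost increase),
    so it cancels when taking the ratio.
    """
    return penetration * pv_financing_per_loan / price_increase_per_vehicle

def share_missed_by_prices_only(ratio):
    """Share of total pass-through that a vehicle-price-only analysis would miss."""
    return ratio / (1 + ratio)

# Inputs quoted in the summary: M = 0.59, $227 per loan (spillover-adjusted),
# and a $225 average vehicle price increase.
ratio = relative_pass_through(0.59, 227, 225)
missed = share_missed_by_prices_only(ratio)
print(f"ratio: {ratio:.3f}, share missed: {missed:.3f}")
```

With these inputs the ratio is about 0.60 ("almost two-thirds") and the missed share is about 0.37, in line with the 37 percent figure.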

Q12. How large is the estimated aggregate impact of the tariffs on consumer financing costs?

Using population data of approximately 50 million vehicles sold annually in the United States and a population-weighted average financing cost increase of $72 per vehicle, the authors estimate that the tariffs resulted in approximately $3.6 billion (= 50,000,000 x $72) in additional present value financing costs each year. For reference, Flaaen, Hortacsu, and Tintelnot (2020) estimated that the 2018 tariffs on washing machines led to $1.5 billion in additional annual consumer costs.

Q13. How does tariff pass-through via interest rates vary with the elasticity of borrowers’ credit demand?

The triple-differences results show monotonically higher pass-through for borrowers with less elastic credit demand. Lower-income borrowers (below median) experienced an average captive interest rate increase of 33 basis points versus 20 basis points for higher-income borrowers. Lower-credit-score borrowers experienced an increase of 36 basis points versus 15 basis points for higher-credit-score borrowers. Borrowers with smaller loan amounts (below median) experienced an increase of 36 basis points versus 12 basis points for larger loan amounts. Within income quartiles, consumers in the lowest income quartile experienced a 37 basis point increase compared to 17 basis points in the highest quartile. These patterns are not driven by changes in borrower composition, as default rates show no significant change across any of these subgroups.

Q14. How does credit market competition affect tariff pass-through via interest rates?

States with lower credit market competition (higher Herfindahl-Hirschman Index, constructed from pretreatment lender market shares) experienced higher interest rate pass-through. Comparing above- versus below-median HHI states, the difference is 5 basis points (28 vs. 23 basis points), statistically significant at the 10 percent level. When restricting to the tails of the competition distribution, the difference is substantially larger: consumers in the lowest competition decile experienced an average increase of 41 basis points versus 24 basis points for consumers in the highest competition decile – a 17 basis point differential. This implies interest rate pass-through was 88 percent as large as vehicle price pass-through in less competitive markets versus 57 percent in more competitive markets, consistent with theoretical predictions that firm-specific cost shocks generate higher pass-through when competition is weaker.
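
For readers unfamiliar with the index, a minimal sketch of an HHI computation follows. The lender names and loan counts are made up, and measuring market share by loan count (rather than, say, loan balances) is an assumption of this example, not something the summary specifies.

```python
from collections import Counter

def hhi(loan_lenders):
    """Herfindahl-Hirschman Index on a 0-1 scale.

    loan_lenders is a list with one lender name per loan; each lender's
    market share is its fraction of loans, and the index is the sum of
    squared shares.
    """
    counts = Counter(loan_lenders)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

# Hypothetical states: names and counts are illustrative only.
concentrated = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
competitive = ["A"] * 25 + ["B"] * 25 + ["C"] * 25 + ["D"] * 25
print(hhi(concentrated), hhi(competitive))
```

A higher value means fewer lenders hold more of the market, so the concentrated example scores above the evenly split one. This is why the summary treats HHI as an inverse measure of competition.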

Q15. Why do captive lenders spread interest rate increases broadly across vehicle types rather than targeting directly tariff-exposed new vehicle models?

The authors find that captive interest rates increased for both new and used vehicles, and that within more-exposed captive lenders, interest rate increases were not concentrated in domestically produced vehicle models. This is consistent with the hypothesis that firms spread cost shocks across multiple goods and business segments (as documented in the industrial organization literature for multiproduct firms). The authors argue this occurs because vehicles of different makes and models are substitutes for each other (making vehicle-specific price increases costlier in terms of demand loss), whereas auto loans are complementary to vehicle purchases and are offered as an add-on to the sales transaction. This bundled structure, combined with consumer inattention to financing terms, makes it optimal to spread the cost shock across the loan book rather than concentrating it in specific vehicle models.

Key Concepts

Captive Finance Subsidiary: A wholly owned lending unit of a manufacturer (e.g., Ford Credit, GM Financial) whose primary purpose is to originate loans and leases to finance the sale of the manufacturer’s own products. Unlike independent noncaptive lenders, captive lenders are vertically integrated with the manufacturing unit and can, in principle, use financing terms as an additional margin to pass through manufacturing-side cost shocks to consumers.

Tariff Pass-Through (Interest Rate Channel): The extent to which an input cost increase caused by an import tariff is transmitted to consumers via higher interest rates charged by captive lenders, rather than (or in addition to) higher goods prices. The paper defines interest rate pass-through as the ratio of the aggregate present value increase in captive financing costs to the aggregate increase in manufacturing production costs.

Intensive vs. Extensive Lending Margin: The distinction between raising loan prices charged to existing (inframarginal) borrowers (intensive margin) versus changing the pool of borrowers served or lending standards (extensive margin). The paper argues that the observed increase in captive interest rates reflects intensive-margin pass-through because borrower incomes, credit scores, and future default rates did not change significantly after the tariffs.

Price Shrouding: The practice of making price increases less salient to consumers by embedding them in a less-scrutinized component of a bundled transaction. In the auto market, because consumers are documented to be less sensitive to increases in financing costs than to vehicle sticker prices, captive lenders can pass on cost shocks through interest rates with less demand response than if they raised vehicle prices by an equivalent amount. The paper treats this as a key mechanism enabling the financing pass-through channel.

Subvented (Subsidized) Loan: A promotional auto loan offered at a below-market interest rate, often tied to specific vehicle models or sales events (e.g., “1.99 percent APR for well-qualified borrowers”). Subvented loans are typically fixed by the manufacturer and cannot be marked up by dealers. The paper uses the subsample of non-subvented loans as a robustness check and to isolate tariff pass-through from seasonal variation in promotional financing.

Captive Loan Penetration Rate (M): The ratio of captive auto loans originated to new vehicles produced and sold, used in the paper’s decomposition of total tariff pass-through into the interest rate and vehicle price channels. Estimated at approximately 0.59 from population data, this parameter determines how the aggregate present value financing cost increase scales relative to the aggregate vehicle sales price increase when computing the relative importance of the two pass-through channels.

Herfindahl-Hirschman Index (HHI) as Market Competition Measure: The paper constructs state-level HHIs based on pretreatment lender market shares in each state using population credit bureau data, as an inverse measure of credit market competition. Local (direct) auto lending markets exhibit meaningful geographic variation in HHI, in contrast to the largely national scope of indirect (dealer-arranged) lending. The paper uses this variation to test whether pass-through is higher in less competitive credit markets, consistent with theoretical predictions for firm-specific cost shocks.

Consumer durables and monetary policy according to HANK

Overview

Research Question

Consumer durables account for a disproportionately large share of household expenditure fluctuations despite their small share of total private consumption. Two stylized facts motivate the paper: (1) durable expenditure is far more interest-rate sensitive than nondurable expenditure following monetary policy shocks, and (2) durable and nondurable expenditures comove positively and persistently—both reaching trough in the same quarter. Standard two-sector New Keynesian models struggle to generate this positive conditional comovement because asymmetric sectoral price rigidity induces large relative-price movements that push the two sectors in opposite directions. This paper asks what model features are necessary and sufficient to reproduce both the sectoral comovement pattern and the hump-shaped aggregate dynamics observed in the data, and how the answer changes across households sorted by liquid asset holdings.

Data and Methodology

Empirical identification. The authors employ a local projection instrumental variables (LP-IV) strategy using Romer-Romer monetary policy shocks updated by Wieland and Yang (2020), over the sample 1969:Q1–2007:Q3. Impulse response functions (IRFs) are normalized to a cumulative 100 basis-point increase in the Federal Funds Rate over five years. Household-level evidence is drawn from the Consumer Expenditure Survey (CEX) and the Survey of Consumer Finances (SCF); households are classified as liquidity-constrained if liquid assets are below $1,000.

Model. The authors develop a two-sector Heterogeneous Agent New Keynesian (HANK) model in which households maximize utility over nondurable consumption and a durable stock (Cobb-Douglas aggregation), face convex adjustment costs on durable purchases, and update expectations infrequently in the Mankiw-Reis sense (probability of not updating: Xi = 0.918 per period). The general equilibrium version features asymmetric Rotemberg price stickiness (Calvo probability 0.671 for nondurables, 0.797 for durables), nominal wage stickiness (Calvo 0.802), and a Taylor rule with inflation coefficient 1.105, output coefficient 1.440, and smoothing 0.988.
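
To give a feel for these stickiness parameters, the sketch below converts each reported probability into an expected duration. It assumes each number is a per-quarter probability of not adjusting (or not updating) and that adjustment opportunities are independent across periods, so durations follow a geometric distribution. Both are assumptions of this illustration.

```python
def expected_duration(prob_no_change):
    """Mean number of periods between changes when each period's
    probability of no change is prob_no_change (geometric distribution)."""
    return 1 / (1 - prob_no_change)

def share_updated_by(prob_no_update, periods):
    """Share of agents that have updated at least once within `periods` periods."""
    return 1 - prob_no_update ** periods

# Probabilities as reported in the summary; treating the period as a quarter
# is an assumption of this sketch.
params = {
    "information (Xi)": 0.918,
    "nondurable prices": 0.671,
    "durable prices": 0.797,
    "wages": 0.802,
}
for name, p in params.items():
    print(f"{name}: {expected_duration(p):.1f} quarters")

print(f"updated within 8 quarters: {share_updated_by(0.918, 8):.2f}")
```

Under this reading, households revise expectations about every 12 quarters on average, and roughly half have updated within 8 quarters, which may help build intuition for the delayed, hump-shaped responses the model is designed to match.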

Main Findings and Quantitative Magnitudes

Sectoral magnitude gap. At trough (approximately 8 quarters after the shock), the durable expenditure response to monetary tightening is an order of magnitude larger than the nondurable response—a fact the calibrated HANK model is designed to match.
Positive comovement. Both durable and nondurable expenditures contract and reach trough in the same quarter, contradicting TANK models (Monacelli 2009) in which savers shift portfolios toward durables and generate negative comovement for that group.
Relative-price dynamics. The relative price of durables rises following monetary tightening (nondurables deflate more), but the rise is modest and cannot overturn the positive comovement result.
Role of the direct interest-rate effect. Across liquid-asset groups, the direct effect accounts for 73–87% of the cumulated durable expenditure response and 37–91% of the cumulated nondurable expenditure response. This direct channel—operating through intertemporal substitution—is quantitatively first-order for durables in a way it is not in standard single-sector HANK models where income effects dominate.
Role of sticky information. A full-information HANK variant produces a counterfactually high durable elasticity (35.24 times the baseline) and no hump-shaped dynamics. Infrequent information updating (Xi = 0.918) is essential to match the hump-shaped propagation of both sectoral and aggregate expenditures.
Income effects and fiscal policy. For a fiscal subsidy specifically targeting durable purchases, intertemporal substitution incentives generate a large shift toward durables and, without income effects, a counterfactual crowding-out of nondurable spending. Income effects are essential to protect nondurable spending, and the aggregate consumption effect of such a policy is at best modest—consistent with Mian and Sufi’s (2012) evidence on cash-for-clunkers.

Scope Conditions

All empirical results are conditional on the LP-IV sample 1969:Q1–2007:Q3 and on Romer-Romer shocks as updated by Wieland and Yang (2020). The household-level comovement result is established for both liquidity-constrained (liquid assets below $1,000) and unconstrained savers using CEX/SCF data. Model quantitative results are specific to the calibration targeting moments from Fagereng et al. (2021) marginal propensities and BEA depreciation data (delta = 0.054).

In depth

Q1. What is the core empirical puzzle the paper addresses, and why do standard models fail?

Standard two-sector New Keynesian models predict that asymmetric sectoral price stickiness generates large relative-price movements between durables and nondurables following a monetary shock. These relative-price shifts tend to produce negative conditional comovement—when durables contract, nondurables expand—contradicting the data. The authors document that both categories exhibit positive and persistent comovement, both reaching their trough at approximately 8 quarters, which standard models cannot replicate.

Q2. What are the key empirical facts established via LP-IV?

Using Romer-Romer shocks over 1969:Q1–2007:Q3, normalized to a cumulative 100bp Federal Funds Rate increase, the authors find: (1) aggregate expenditure follows a hump-shaped contraction with trough at roughly 8 quarters; (2) the durable expenditure response is an order of magnitude larger than the nondurable response at trough; (3) both categories reach their trough in the same quarter; and (4) the relative price of durables rises modestly after monetary tightening (nondurables deflate more), but not enough to reverse comovement.

Q3. How is the partial equilibrium model calibrated, and which moments does it target?

Key calibrated parameters include CRRA sigma = 2.640, Cobb-Douglas weight on nondurables theta = 0.607 (implying durable expenditure share 0.193), adjustment cost alpha = 8.299, information stickiness Xi = 0.918, depreciation rate delta = 0.054, steady-state real rate r = 0.03/4, discount factor beta = 0.915 (matching a 30% share of liquidity-constrained households with liquid assets-to-income ratio of 0.26), and borrowing wedge kappa = 0.05. Moments matched include quarterly MPC on nondurables (22.94%), quarterly MPX on durables (24.15%), interest-rate elasticity of durable expenditure (3.35, within the empirical range of 1.1–5.0), price elasticity of durable demand (29.59), and durable stock skewness relative to nondurable consumption (0.695, consistent with Bertola et al. 2005).

Q4. How does the paper decompose monetary policy transmission?

The paper decomposes transmission into three channels: (1) the direct effect of real interest rate changes, which operates through intertemporal substitution and accounts for the quantitatively largest share of the durable response; (2) the relative-price effect, which is modest and redistributive but cannot overturn positive comovement; and (3) pure income effects, which are key for persistence of the nondurable response but not for the sign of comovement.

Q5. What do counterfactual models reveal about the role of each model ingredient?

A sticky-information RANK produces positive comovement but the dynamics are front-loaded and less inertial than in the data. A sticky-information TANK delivers results similar to RANK—income effects do not qualitatively change the story. A full-information HANK produces a counterfactually high durable interest-rate elasticity (35.24 times the baseline) and no hump-shaped dynamics, demonstrating that sticky information is the ingredient generating realistic propagation, not heterogeneity per se.

Q6. What does the household-level evidence from CEX and SCF show about comovement across the wealth distribution?

When households are classified as liquidity-constrained if their liquid assets are below $1,000, the LP-IV estimates show positive comovement between durables and nondurables for both constrained and unconstrained households. This contradicts TANK models (Monacelli 2009), in which savers shift portfolios toward durables following a monetary shock, generating negative comovement for the saver group. After controlling for income and relative prices, the direct interest-rate effect operates uniformly across financial status groups.

Q7. How does the direct effect vary across liquid asset groups quantitatively?

Decomposing across four liquid asset groups (below $1k, $1k–$10k, $10k–$20k, above $20k), the direct effect accounts for 73–87% of the cumulated durable expenditure response and 37–91% of the cumulated nondurable expenditure response. Income effects are more important for nondurable spending prolongation among liquidity-constrained households, but the direct channel dominates durable expenditure for all groups.

Q8. How does the general equilibrium two-sector HANK model differ from the partial equilibrium setup?

The GE model adds asymmetric sectoral price stickiness (Calvo probabilities 0.671 for nondurables and 0.797 for durables), nominal wage stickiness (Calvo 0.802), a Taylor rule (inflation coefficient 1.105, output coefficient 1.440, smoothing 0.988), and fiscal lump-sum taxes responding to debt (coefficient 0.191). These features generate the relative-price dynamics observed in the data while preserving the positive comovement result.

Q9. What does the fiscal policy application reveal about the role of income effects?

A fiscal subsidy targeting durable purchases generates a much larger shift in the relative price of durables than monetary policy does. Without income effects, intertemporal substitution dominates and nondurable spending falls—a counterfactual result inconsistent with the data. With income effects present, nondurable spending is protected. The aggregate consumption effect of such a durable-targeted fiscal policy is at best modest, consistent with Mian and Sufi’s (2012) evidence from the cash-for-clunkers program.

Q10. What is the broader implication for the literature on HANK versus RANK transmission?

In standard single-sector HANK models, income effects (the indirect channel) typically dominate monetary transmission. The presence of consumer durables restores a quantitatively important role for the direct interest-rate channel, which operates through intertemporal substitution in durable purchases. This rebalances the direct-versus-indirect decomposition relative to the conventional HANK wisdom and shows that the durable goods sector is essential to understanding the full transmission mechanism.

Key Concepts

Sectoral comovement (conditional on monetary policy shocks): The empirical regularity that durable and nondurable expenditures both contract following monetary tightening and reach their respective troughs in the same quarter. In this paper, comovement is defined conditional on identified monetary policy shocks (LP-IV with Romer-Romer instruments), not unconditionally. Standard two-sector NK models predict negative conditional comovement due to relative-price effects; replicating positive comovement is the central discipline imposed on the model.

Direct effect (of real interest rate changes): The component of monetary transmission that operates through the intertemporal substitution incentive induced by changes in the real interest rate, holding income and relative prices fixed. Distinct from the income effect (indirect channel) and the relative-price effect. In this paper’s decomposition, the direct effect accounts for 73–87% of the cumulated durable expenditure response across liquid-asset groups.

Sticky information (Mankiw-Reis): Households update their information sets infrequently, with probability (1 - Xi) per period; Xi = 0.918 means only about 8.2% of households update each quarter. This mechanism is essential in the model for generating the hump-shaped, inertial impulse response dynamics observed in the data. Without it (full-information HANK), the durable elasticity is counterfactually large (35.24 times baseline) and dynamics are front-loaded.
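
The update probability implied by Xi can be checked with a few lines of arithmetic. The sketch below assumes that each household draws an independent chance to update every quarter with probability 1 - Xi; the variable and function names are illustrative and do not come from the paper.

```python
# Sticky-information arithmetic, assuming each household updates its
# information set independently each quarter with probability 1 - XI.
XI = 0.918  # information stickiness reported in the calibration

update_prob = 1.0 - XI          # share of households updating per quarter
mean_wait = 1.0 / update_prob   # expected quarters between updates (geometric waiting time)


def share_not_updated(quarters: int) -> float:
    """Share of households that have not updated after `quarters` quarters."""
    return XI ** quarters


print(f"update probability per quarter: {update_prob:.3f}")
print(f"expected quarters between updates: {mean_wait:.1f}")
print(f"share not yet updated after 8 quarters: {share_not_updated(8):.3f}")
```

Under this assumption, about half of households have still not updated at the 8-quarter horizon, which gives a rough sense of why responses build slowly.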

MPX (Marginal Propensity to Expend on durables): Analogous to the MPC for nondurables, the MPX measures the additional durable expenditure flow induced by an income windfall. Calibrated to 24.15% quarterly, matching estimates from Fagereng et al. (2021). Distinct from the MPC because durable purchases represent investment in a stock, not immediate consumption flow.

Liquidity-constrained households: Households with liquid assets below $1,000, identified in the CEX and SCF. In the model, the 30% share of such households is targeted by the discount factor (beta = 0.915) and the borrowing wedge (kappa = 0.05). The paper’s key finding is that positive comovement holds for both constrained and unconstrained households, contradicting TANK predictions.

HANK (Heterogeneous Agent New Keynesian model): A New Keynesian general equilibrium model in which households are heterogeneous in their liquid asset holdings (and thus differ in whether borrowing constraints bind), so that the distribution of assets matters for aggregate dynamics. Distinguished from RANK (Representative Agent NK) and TANK (Two-Agent NK, which approximates heterogeneity with one unconstrained and one hand-to-mouth agent). In this paper, HANK is extended to a two-sector setting with durables and nondurables.

Convex adjustment costs on durable purchases: A cost of adjusting the durable stock that is convex in the size of the adjustment (calibrated parameter alpha = 8.299). This smooths the durable expenditure response and prevents counterfactually sharp jumps in durable purchases following interest rate changes, contributing to realistic propagation dynamics alongside sticky information.

Contextually Private Mechanisms

Haupt and Hitzig introduce a framework for comparing the privacy properties of different mechanism protocols. The core research question is: when a designer commits to implementing a social choice rule, how much superfluous private information must they inevitably learn about agents, and how should they design the elicitation protocol to minimize that exposure?

The setting is a finite-player extensive-form game in which a designer elicits agents’ private types through a dynamic protocol to compute a social choice function. The authors explicitly exclude cryptographic tools and trusted mediators, working under the minimal assumption that the designer learns information if and only if an agent discloses it. This assumption is motivated by the historical prevalence of live dynamic auction formats — ascending formats at Sotheby’s, descending formats at Aalsmeer, oral ascending formats used by the U.S. Forest Service for timber, multi-round clock auctions for radio-spectrum allocation — and by settings where mediating technology is unavailable or costly.

The central object is the contextual privacy violation. A protocol produces a contextual privacy violation for agent i at type profile θ if the designer can distinguish θ_i from some alternative type θ’_i while holding other agents’ types fixed, yet the social choice rule assigns the same outcome at both profiles. Violations are defined at the level of individual agent–state pairs, not aggregated ex ante. A protocol is fully contextually private if it produces no violations; it is maximally contextually private if its set of violations is inclusion-minimal among all protocols that implement the same rule.
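
The definition can be made concrete on a small example. The sketch below is a toy illustration, not the authors' construction: it assumes two bidders, a hypothetical three-point bid grid, a second-price rule with ties broken toward agent 0, and a protocol modeled simply as a function returning what the designer knows at the end (here, direct revelation of the full profile).

```python
from itertools import product

TYPES = (1, 2, 3)   # hypothetical bid grid
AGENTS = (0, 1)


def second_price(theta):
    """Toy rule: highest bid wins (ties go to agent 0); winner pays the other bid."""
    winner = 0 if theta[0] >= theta[1] else 1
    return (winner, theta[1 - winner])


def direct_revelation(theta):
    """Designer's final information under direct revelation: the full profile."""
    return theta


def violations(rule, protocol):
    """Return the (agent, profile) pairs at which the protocol produces a violation."""
    found = set()
    for theta in product(TYPES, repeat=len(AGENTS)):
        for i in AGENTS:
            for alt in TYPES:
                if alt == theta[i]:
                    continue
                other = theta[:i] + (alt,) + theta[i + 1:]
                distinguished = protocol(theta) != protocol(other)
                same_outcome = rule(theta) == rule(other)
                if distinguished and same_outcome:
                    found.add((i, theta))
    return found


v = violations(second_price, direct_revelation)
for agent, profile in sorted(v):
    print(f"agent {agent} at profile {profile}")
```

In this toy example every violation belongs to the winning bidder: the designer learns the winner's exact bid even though the outcome depends only on the fact that the bid exceeds the rival's.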

The main characterization result (Theorem 1) connects privacy to pivotality: a social choice function admits a fully contextually private protocol if and only if, on every product subset of the type space where agents are collectively pivotal, at least one agent is individually pivotal. The contrapositive is what drives the paper’s impossibility results: whenever a rule contains a region where no single agent’s report changes the outcome but a group’s joint report does, any implementing protocol must produce contextual privacy violations.

Using this characterization, the authors establish that the first-price auction rule (Proposition 2) and serial dictatorship (Proposition 3) admit fully contextually private protocols. Conversely, k-item Vickrey auction rules (Proposition 4) and any stable school-choice rule (Proposition 5) do not admit fully contextually private protocols, because these rules contain type-space regions where agents are only collectively — not individually — pivotal.

For k-item Vickrey auctions, the authors study maximally contextually private protocols. They establish (Proposition 6) that, for a class of social choice rules on totally ordered type spaces that contains k-item Vickrey auctions, it is without loss to consider only protocols consisting of threshold queries that are monotonically increasing or decreasing after an initial guess. This reduction identifies two key design dimensions: the initial query posed to each agent, and the order in which agents are queried.

The main constructive result (Theorem 2) proves that an ascending-join protocol is maximally contextually private for the k-item Vickrey auction. Proposition 7 formalizes the sense in which this protocol protects privacy by delaying queries to certain bidders — it repeatedly asks agents whether they can rule out a particular outcome, and postpones questioning agents whose privacy it is protecting.

The authors also show (Proposition 19) that the ascending-join protocol is minimally relatively informative among protocols that are maximally contextually private. Extensions cover group contextual privacy (Proposition 11) and individual contextual privacy (Proposition 8), showing that individual contextual privacy violations equal the union of contextual privacy violations and nonbossiness violations.

Q: What is a contextual privacy violation, precisely? A: A protocol produces a contextual privacy violation for agent i at type profile θ if the designer can distinguish θ_i from some alternative type θ’_i — holding all other agents’ types fixed — yet the social choice rule assigns the same outcome at both profiles. The violation is defined at the level of individual agent–state pairs. A single additional superfluous distinction at the same (i, θ) pair does not register as a second violation; the framework records whether any unnecessary disclosure occurs for that agent at that state, not the degree of overexposure.

Q: How does contextual privacy differ from relative informativeness? A: Relative informativeness compares two protocols by whether one distinguishes every pair of type profiles the other does, treating all disclosures as equally undesirable. Contextual privacy conditions the notion of a “violation” on the social choice rule: a distinction between θ_i and θ’_i counts as a violation only when the rule assigns the same outcome at both profiles. Relative informativeness thus penalizes the designer for learning information that is necessary to implement the rule, whereas contextual privacy imposes no penalty for learning pivotal information.

Q: What is the pivotality characterization (Theorem 1)? A: A social choice function admits a fully contextually private protocol if and only if, on every product subset of the type space where agents are collectively pivotal, at least one agent is individually pivotal. The necessity direction shows that if a collectively pivotal set exists where no agent is individually pivotal, any implementing iterative partition must contain an earliest node that distinguishes two type profiles leading to the same outcome. The sufficiency direction constructs a contextually private protocol inductively by always querying an individually pivotal agent, ensuring every distinction implies a different outcome.

Q: Which social choice rules admit fully contextually private protocols? A: The first-price auction rule (Proposition 2) and serial dictatorship (Proposition 3) admit fully contextually private protocols. The authors use Theorem 1 to show this: in both rules, any collectively pivotal region contains an individually pivotal agent. By contrast, k-item Vickrey auction rules (Proposition 4), any stable school-choice rule (Proposition 5), efficient allocations in housing assignment, and generalized median voting rules (Section B) do not admit fully contextually private protocols.

Q: Why do k-item Vickrey auctions fail full contextual privacy? A: Proposition 4 shows that k-item Vickrey auctions for k ≥ 1 do not admit fully contextually private protocols. The argument uses the necessary conditions from Theorem 1 (Corollaries 1 and 2): the Vickrey payment rule creates type-space regions where multiple agents together determine the price but no single agent is individually pivotal over the price, so any protocol implementing the Vickrey rule must produce violations for at least some agents at some type profiles.

Q: What is the ascending-join protocol and what does Theorem 2 establish? A: The ascending-join protocol is a specific dynamic elicitation protocol for k-item Vickrey auctions that repeatedly asks agents whether they can rule out a particular outcome, structured as threshold queries ascending from an initial guess. Theorem 2 proves that the ascending-join protocol is maximally contextually private for the k-item Vickrey auction. Proposition 7 formalizes the protection mechanism: the protocol delays queries to the bidders whose privacy it is protecting, querying them only when their responses become necessary for determining the outcome.

Q: What does Proposition 6 establish about the structure of maximally contextually private protocols? A: For a class of social choice rules on totally ordered type spaces that contains k-item Vickrey auctions, Proposition 6 shows it is without loss of generality to consider only protocols consisting of threshold queries that are monotonically increasing or decreasing in the threshold after an initial guess. This result serves as a theoretical reduction (enabling proofs that certain protocols are maximally private) and as a practical design principle (identifying the initial query and the ordering of agents as the two key design dimensions).

Q: How does contextual privacy relate to obviously dominant strategies? A: The paper treats privacy properties and incentive properties as largely orthogonal questions, to be analyzed separately. For the ascending-join protocol specifically, the authors verify obvious dominance, the most demanding incentive notion they consider. Obvious dominance requires that at every history, the worst-case payoff from the prescribed action is at least as high as the best-case payoff from any deviation. This analysis proceeds after the contextual privacy properties of the protocol are established.

Q: What is group contextual privacy and why do the authors focus on individual-level violations instead? A: Group contextual privacy requires that whenever the designer learns any property of the joint type profile, that property must affect the outcome. The authors show (Proposition 11) that a protocol is fully group contextually private if and only if every query rules out at least one outcome. They argue this standard is extremely demanding and produces a very coarse partial order: improving in the group privacy order requires restructuring the entire protocol tree rather than making agent- or state-specific improvements. They also note that normative accounts of privacy, including Nissenbaum’s contextual integrity theory, center on individual rather than group information.

Q: How does individual contextual privacy relate to nonbossiness? A: Individual contextual privacy (Proposition 8) requires that if two type profiles differing only in agent i’s type are distinguished, they must lead to different allocations for agent i — presuming a private allocation domain. The paper shows that the set of individual contextual privacy violations equals the union of contextual privacy violations and nonbossiness violations: individual contextual privacy is violated precisely when either (a) agent i’s superfluous type information is revealed, or (b) agent i is “bossy” — able to change others’ outcomes without changing their own.

Q: What is the relationship between the ascending-join protocol and minimal relative informativeness? A: Proposition 19 shows that the ascending-join protocol is not only maximally contextually private but also minimally relatively informative among protocols that are maximally contextually private. That is, among all maximally contextually private protocols, the ascending-join protocol reveals the smallest total amount of information about the type profile in the relative informativeness order. This establishes relative informativeness as a useful refinement for selecting among contextually privacy-equivalent protocols.

Q: What motivates the exclusion of cryptographic tools and trusted mediators from the framework? A: The authors work under the minimal assumption that the designer learns information if and only if an agent directly discloses it — no commitment to forget, anonymize, or cryptographically conceal. They motivate this on two grounds: first, many real-world auction formats are live and dynamic with no mediating technology; second, advanced cryptography is often costly in time, money, or computation, and studying the no-mediator benchmark can explain the historical prevalence of dynamic protocols and inform auction design in environments where cryptography may become unavailable (for example, due to quantum computing). The authors cite a Danish sugar-beet auction as a case where designers themselves questioned whether full multiparty computation was necessary.

Contextual privacy violation: A protocol produces a contextual privacy violation for agent i at type profile θ if the designer can distinguish θ_i from some alternative type θ’_i — holding other agents’ types fixed — yet the social choice rule assigns the same outcome at both profiles. The violation is assigned at the level of individual agent–state pairs.

Maximally contextually private protocol: A protocol whose set of contextual privacy violations is inclusion-minimal among all protocols that implement the same social choice rule. Equivalently, no other implementing protocol produces a strict subset of its violations, so the protocol lies on the Pareto frontier of implementation and contextual privacy.

Iterative partition: A directed rooted tree whose nodes are subsets of the type space, where each non-leaf node is split into children by partitioning on a single agent’s type. Any protocol is equivalent (in terms of what the designer learns) to a partitional protocol induced by an iterative partition (Proposition 1).

Individual pivotality: On a product set of type profiles, agent i is individually pivotal if there exist two subsets of agent i’s types such that every type profile from one subset leads to a different outcome than every type profile from the other subset, holding others’ types fixed.

Collective pivotality: Agents are collectively pivotal on a product set if there exist two type profiles in that set with different outcomes. Collective pivotality without any agent being individually pivotal is precisely the condition that forces contextual privacy violations (Theorem 1).

Ascending-join protocol: A specific dynamic protocol for k-item Vickrey auctions that poses threshold queries in ascending order after an initial guess, repeatedly asking agents whether they can rule out a particular outcome. It is maximally contextually private (Theorem 2) and minimally relatively informative among maximally contextually private protocols (Proposition 19), and it achieves privacy protection by delaying queries to agents whose privacy it protects (Proposition 7).

Relative informativeness: A partial order on protocols defined by: protocol P is less relatively informative than P’ if every pair of type profiles P distinguishes is also distinguished by P’. Unlike contextual privacy, relative informativeness treats all disclosures as equally undesirable and does not condition on the social choice rule. The paper positions it as a useful refinement for selecting among contextually privacy-equivalent protocols.
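
Because relative informativeness is only a partial order, some protocols cannot be ranked against each other. The sketch below illustrates this on a hypothetical two-agent type space, with each protocol modeled as a function from the type profile to the designer's final information. The three example protocols are invented for illustration and are not taken from the paper.

```python
from itertools import combinations, product

PROFILES = list(product((1, 2, 3), repeat=2))  # hypothetical two-agent type space


def distinguished_pairs(protocol):
    """All unordered pairs of profiles the designer can tell apart."""
    return {
        frozenset((a, b))
        for a, b in combinations(PROFILES, 2)
        if protocol(a) != protocol(b)
    }


def less_informative(p, q):
    """True if every pair of profiles p distinguishes is also distinguished by q."""
    return distinguished_pairs(p) <= distinguished_pairs(q)


def full(theta):
    """Reveals the whole profile."""
    return theta


def winner_only(theta):
    """Reveals only whether agent 0's type is at least agent 1's."""
    return theta[0] >= theta[1]


def max_only(theta):
    """Reveals only the highest type."""
    return max(theta)


print(less_informative(winner_only, full))      # coarser protocol vs. full revelation
print(less_informative(winner_only, max_only))  # neither direction holds for this pair
print(less_informative(max_only, winner_only))
```

The last two comparisons both fail, so `winner_only` and `max_only` are unranked. Each distinguishes some pair of profiles that the other pools.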

Defying Distance? The Provision of Medical Services in the Digital Age

This paper asks whether digital platforms can improve healthcare outcomes by enabling needs-based matching between patients and physicians unconstrained by geography. Amanda Dahlstrand studies digital primary care in Sweden during 2016-2018, exploiting nationwide conditional random assignment between approximately 200,000 patients and 143 doctors employed by Europe’s largest digital primary care provider. Patients who selected the “first available doctor” option (82% of first visits) were effectively randomized to a doctor within each 3-hour shift-by-date stratum, generating quasi-experimental variation free of the patient-doctor sorting that confounds identification in physical primary care.

The paper defines three observable dimensions of primary care physician skill: (1) identifying risky patients and triaging them to higher levels of care, measured by whether patients subsequently have an avoidable hospitalization within 90 days; (2) providing guideline-consistent treatment, measured by counter-guideline antibiotic prescriptions; and (3) leaving patients sufficiently informed so they do not unnecessarily seek additional in-person care within the following week. Doctor skill in each dimension is estimated via a value-added framework in a hold-out sample (Sample 1, the first 600 randomized consultations per doctor), using empirical Bayes shrinkage to reduce noise. Complementarities between doctor skill and patient risk are then estimated in a disjoint main sample (Sample 2).

A central finding is that doctor skill is task-specific rather than governed by a single latent ability: skills across the three tasks are not positively correlated, meaning doctors within general practice have individual “specializations.” A patient ranked in the top 1% of avoidable hospitalization risk who is matched to a doctor ranked in the top 10% at reducing avoidable hospitalizations experiences a 90% reduction in that adverse outcome, relative to a patient with the same risk profile matched to the worst-performing doctor. Patients not estimated as risky show effects indistinguishable from zero when matched to the same high-skilled doctors, establishing a strong complementarity between doctor type and patient risk.

Using the Average Match Function framework of Graham, Imbens, and Ridder (2014, 2020), the paper evaluates counterfactual reallocation policies. Reallocating only 2% of patients — those in the top 1% of predicted avoidable hospitalization risk — to doctors in the top 10% of triage skill reduces aggregate avoidable hospitalizations by 20% relative to random assignment, without adversely affecting counter-guideline prescriptions or other measured outcomes. Doctor skills across outcomes are not positively correlated, so this reallocation does not generate meaningful trade-offs. The paper benchmarks this matching policy against a selective hiring/expansion policy in which doctors with above-median skill in three tasks expand their hours by up to 70% at the expense of below-median peers; that policy yields no significant reduction in avoidable hospitalizations and only a 4% reduction in counter-guideline prescriptions — smaller gains than matching and harder to implement.

The paper also documents that physical primary care quality is worse in lower-income and more deprived areas of Sweden (a negative relationship between deprivation index and patient-reported experience is statistically significant at the 1% level in a cross-section of roughly 120-150 primary care centers in Region Skane). Because the estimated risk of avoidable hospitalization and prior avoidable hospitalizations are concentrated in the lower end of the income distribution, needs-based digital matching reallocates triage skill toward lower-income patients, severing the correlation between local area income and service quality. Simulating positive assortative matching on patient income and doctor skill — approximating existing healthcare inequalities — leads to more avoidable hospitalizations than random assignment, because the most vulnerable patients tend to be the poorest. Scope conditions: findings derive from a single digital primary care provider in Sweden, 2016-2018, pre-pandemic, covering conditions amenable to video consultation and a patient pool younger and somewhat more urban than the average Swedish citizen.

Q: What is the key identification strategy, and why is it valid in this setting but not in physical primary care? A: Patients who selected the “drop in” (first available doctor) option — 82% of first visits — were assigned to whichever certified doctor was next in the roster within a 3-hour shift-by-date stratum, a by-product of the first-come-first-served queue. Neither patients nor doctors could intervene in this digital process. The author validates the assumption by regressing doctor characteristics on patient characteristics controlling for shift-by-date fixed effects and finds characteristics are balanced. In physical primary care, endemic patient-doctor sorting means doctors do not meet a common support of patient types, preventing causal identification of doctor effects.

Q: How are doctor skill estimates constructed and why does the split-sample matter? A: Doctor skill in each task is estimated as an empirical Bayes-shrunk random effect from a value-added regression on Sample 1, each doctor’s first 600 randomized consultations (40% of the sample). Sample 2 (60%) is entirely disjoint and used to estimate complementarities between doctor skill and patient risk. The split-sample design prevents overfitting: doctor skill was estimated on different patients than those in Sample 2. The Durbin-Wu-Hausman test does not reject random effects (p = 0.16).
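
To make the shrinkage step concrete, the sketch below implements a textbook method-of-moments empirical Bayes estimator on raw doctor means. It is an assumption-laden simplification, not the author's estimator: the paper uses random effects from a value-added regression within shift-by-date strata, whereas this version has no controls, and the input data and the name `eb_shrink` are hypothetical.

```python
from statistics import mean, pvariance


def eb_shrink(outcomes_by_doctor):
    """Shrink each doctor's raw mean outcome toward the grand mean.

    outcomes_by_doctor maps a doctor id to a list of patient outcomes
    (hypothetical input; in practice these would be residualized outcomes).
    """
    raw = {d: mean(y) for d, y in outcomes_by_doctor.items()}
    n = {d: len(y) for d, y in outcomes_by_doctor.items()}
    grand = mean(list(raw.values()))

    # Noise variance: pooled within-doctor variance of outcomes.
    within_ss = sum(
        sum((v - raw[d]) ** 2 for v in y) for d, y in outcomes_by_doctor.items()
    )
    sigma2_e = within_ss / (sum(n.values()) - len(raw))

    # Signal variance: variance of raw means minus average sampling noise, floored at 0.
    noise = mean([sigma2_e / n[d] for d in raw])
    sigma2_u = max(0.0, pvariance(list(raw.values())) - noise)

    shrunk = {}
    for d in raw:
        # Reliability is lower for doctors with fewer consultations.
        reliability = sigma2_u / (sigma2_u + sigma2_e / n[d]) if sigma2_u > 0 else 0.0
        shrunk[d] = grand + reliability * (raw[d] - grand)
    return shrunk


# Illustrative data only: doctor "a" has few consultations, "b" and "c" have many.
sample = {
    "a": [1, 0, 1, 1],
    "b": [0] * 30 + [1] * 10,
    "c": [0, 1] * 20,
}
estimates = eb_shrink(sample)
for doctor, value in estimates.items():
    print(doctor, round(value, 3))
```

Doctor "a" has the most extreme raw mean but the fewest consultations, so that estimate is pulled furthest toward the grand mean. This is the noise-reduction role that shrinkage plays in the skill estimates.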

Q: What is the main quantitative result on avoidable hospitalization matching? A: Matching a patient ranked in the top 1% of predicted avoidable hospitalization risk to a doctor ranked in the top 10% at reducing avoidable hospitalizations could reduce that patient’s avoidable hospitalizations by 90%, relative to a match with the worst-performing doctor in that skill. At the aggregate level, reallocating only 2% of patients (those in the top 1% risk group) to high-triage-skill doctors reduces avoidable hospitalizations across the full patient population by 20% compared to random assignment.

Q: Does the avoidable hospitalization reallocation harm other outcomes? A: No. The paper explicitly evaluates the Average Reallocation Effect on counter-guideline prescriptions and additional in-person care seeking when optimizing for avoidable hospitalizations, and finds no significant adverse effects on these other outcomes. The author attributes this to the fact that doctor skills across tasks are not positively correlated, so reallocating triage-skilled doctors does not systematically remove skill from other dimensions.

Q: How does matching compare to selective hiring and hour expansion as a policy? A: Even expanding the working hours of doctors with above-median skill across three tasks by as much as 70% yields no significant reduction in avoidable hospitalizations and only a 4% reduction in counter-guideline prescriptions — both smaller gains than the matching policy. Matching outperforms hiring expansion because patients have heterogeneous needs that can be identified from prior healthcare records, and doctors have differentiated skill sets relevant to some patients but not others.

Q: What is the evidence that doctor skills are task-specific rather than reflecting a single latent ability? A: The estimated doctor effects across the three tasks — triaging to avoid hospitalizations, guideline-consistent antibiotic prescribing, and minimizing unnecessary follow-up care — are not positively correlated with one another. This means a doctor who is effective at one task is not systematically effective at others, indicating individual specializations within general practice that are not accounted for in standard primary care organization.

Q: How is patient risk for avoidable hospitalizations measured? A: A propensity score is estimated from pre-digital physical healthcare data (2013-2015) by regressing the past number of avoidable hospitalizations on demographic and healthcare utilization variables (including age, a disease index of chronic diagnoses, and previous hospitalizations), all of which are already available in patient medical records. Patients in the top 1% of predicted risk scores are classified as “risky.” Patients in the risky group had on average 0.35 avoidable hospitalizations in the prior 3 years, versus 0.01 for non-risky patients.

Q: What is the distributional (equity) implication of needs-based matching versus income-assortative matching? A: Estimated risk of avoidable hospitalization and the count of prior avoidable hospitalizations are concentrated in the lower end of the income distribution. Needs-based matching therefore reallocates triage skill toward lower-income patients. Simulating positive assortative matching on patient income and doctor skill — approximating observed inequalities in physical care — produces more avoidable hospitalizations than random assignment, because the most vulnerable patients are often the poorest. Needs-based digital matching can sever the link between local area income and service quality.

Q: How does digital care usage sort by income and demographics in the data? A: At the extensive margin, the deprivation index (Care Need Index) is similar among digital users and non-users in Region Skane. However, at the intensive margin, individuals with a higher deprivation index who use the digital service have more appointments in it; similarly, lower-income users use the service more intensively. Digital care users are younger than non-users and are more likely to live in cities than the average Swedish citizen.

Q: What are avoidable hospitalizations and why are they the primary outcome? A: Avoidable hospitalizations (also called hospitalizations for ambulatory care sensitive conditions) are hospital admissions defined in the medical literature as preventable by adequate and timely primary care. They are coded using ICD-10 diagnosis codes listed in Page et al. (2007). The most common diagnoses in the 90-day post-consultation window are respiratory and genitourinary, conditions commonly treated in digital care. The outcome is rare (0.2% of patients in the sample), but high-stakes: an estimated 1.1 potential life years are lost per avoidable hospitalization, and in Sweden they cost an estimated SEK 7.1 billion (~$820 million) annually (7% of inpatient curative and rehabilitative care costs).

Q: What is the scope of the counter-guideline antibiotic prescription outcome? A: Non-adherence is coded against 16 guidelines from Sweden’s strategic programme against antibiotic resistance (Strama 2017, 2019), all designed to limit or narrow antibiotic use. The measured rate of non-adherence is described as quite low by international standards; the CDC estimates 28% of US antibiotic prescriptions are unnecessary, while the author’s sample rate is 2%. The guidelines require doctors to sometimes refuse patients who request antibiotics, introducing a behavioral compliance dimension to this skill.

Q: What are the costs and feasibility considerations for implementing needs-based digital matching? A: The paper characterizes matching as a “resource-neutral” policy because it reallocates existing doctors without hiring or training. The primary costs are a small increase in waiting time for some patients and the costs of importing data and developing the matching algorithm. Because the algorithm handles patient-doctor allocation while doctors retain all clinical decision-making, the policy functions as a complement to human skill rather than a substitute, which the author argues makes it less subject to “algorithm aversion.”

Q: Why does the paper restrict to each patient’s first digital consultation only? A: The first visit is the one subject to conditional random assignment; subsequent visits could reflect endogenous selection by patients who preferred a particular doctor or outcome. Using only first visits eliminates this concern. The restriction reduces the sample from approximately 378,000 to 210,171 patients (56% of the original), paired with 143 doctors who each had at least 600 randomized consultations.

Conditional random assignment: The allocation mechanism by which patients selecting the “first available doctor” option in digital primary care were assigned to whichever certified doctor was next in the shift roster, conditional on 3-hour shift-by-date strata — a by-product of the first-come-first-served queue rather than an intended experimental design.

Average Match Function (AMF): The conditional mean of a patient outcome given observable doctor type and patient type under random assignment, β(x,w) = E[Y|X=x, W=w], which serves as the building block for evaluating counterfactual reallocation policies.

Average Reallocation Effect (ARE): The difference in expected patient outcomes between a counterfactual doctor-patient assignment and the status quo random assignment, taking into account the externality on the patient from whom a high-skilled doctor is moved.
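
As a rough illustration of how the AMF and ARE fit together, the sketch below estimates the AMF as cell means over discrete doctor and patient types and then compares a counterfactual assignment with the random-assignment status quo. The type labels, toy records, and function names are illustrative assumptions, not the paper's data or code. Both assignments use the same marginal shares of doctors and patients, so moving a high-skill doctor toward one patient necessarily moves that doctor away from another.

```python
from collections import defaultdict

def average_match_function(records):
    """Estimate beta(x, w) = E[Y | X=x, W=w] as cell means.

    records: iterable of (doctor_type, patient_type, outcome) tuples
    observed under (conditionally) random assignment.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for doctor_type, patient_type, outcome in records:
        cell = (doctor_type, patient_type)
        sums[cell] += outcome
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def expected_outcome(amf, shares):
    """Expected outcome of an assignment given as shares over (x, w) cells."""
    return sum(share * amf[cell] for cell, share in shares.items())

def average_reallocation_effect(amf, counterfactual, status_quo):
    """Counterfactual expected outcome minus status-quo expected outcome."""
    return expected_outcome(amf, counterfactual) - expected_outcome(amf, status_quo)

# Hypothetical toy data: outcome 1 = avoidable hospitalization.
records = (
    [("high", "risky", y) for y in (0, 0, 0, 1)]
    + [("low", "risky", y) for y in (1, 0, 1, 0)]
    + [("high", "safe", 0)] * 4
    + [("low", "safe", 0)] * 4
)
amf = average_match_function(records)

# Half the doctors are high-skill and half the patients are risky in both cases.
random_assignment = {
    ("high", "risky"): 0.25, ("low", "risky"): 0.25,
    ("high", "safe"): 0.25, ("low", "safe"): 0.25,
}
needs_based = {("high", "risky"): 0.5, ("low", "safe"): 0.5}

are = average_reallocation_effect(amf, needs_based, random_assignment)
print(are)  # -0.0625: fewer avoidable hospitalizations per patient in this toy case
```

A negative ARE here means the reallocation lowers the expected rate of the adverse outcome relative to random assignment.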

Task-specific doctor skill: The paper’s finding that primary care physician effectiveness is not governed by a single latent ability but varies across distinct tasks — triage/risk prediction, guideline-consistent prescribing, and minimizing unnecessary follow-up care — with skills across tasks not positively correlated.

Avoidable hospitalization: A hospital admission coded to a diagnosis (per Page et al. 2007 ICD-10 classification) defined in the medical literature as preventable by adequate and timely primary care, used as the primary high-stakes outcome measure (0.2% incidence in the sample within 90 days of a digital consultation).

Counter-guideline prescription: A prescription of an antibiotic in violation of one of 16 guidelines from Sweden’s Strama antibiotic resistance programme, all of which are designed to limit use or require narrower-spectrum first-line antibiotics; used as the primary guideline-adherence outcome (2% incidence in the sample).

Empirical Bayes shrinkage: A procedure applied to raw doctor value-added estimates in which the noisy estimate of doctor quality is multiplied by the ratio of signal variance to total (signal plus noise) variance, yielding a best linear predictor of the underlying doctor random effect and reducing noise from small-sample estimation.
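
The shrinkage step described above can be sketched in a few lines. This is a minimal illustration that assumes the raw doctor effects are already demeaned (so they shrink toward zero), that each doctor's noise variance is known, and that the signal variance has been estimated elsewhere; the function name and the numbers are hypothetical.

```python
def shrink(raw_estimates, noise_variances, signal_variance):
    """Multiply each raw estimate by signal / (signal + noise) variance."""
    shrunk = []
    for estimate, noise_variance in zip(raw_estimates, noise_variances):
        reliability = signal_variance / (signal_variance + noise_variance)
        shrunk.append(reliability * estimate)
    return shrunk

# Three doctors with the same raw estimate but increasingly noisy measurement.
raw = [1.0, 1.0, 1.0]
noise = [1.0, 4.0, 12.0]
shrunk = shrink(raw, noise, signal_variance=4.0)
print(shrunk)  # reliabilities of 0.8, 0.5, and 0.25 applied to the raw estimates
```

Doctors with fewer consultations typically have larger noise variance, so their estimates are pulled further toward the mean.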

Designing Dynamic Reassignment Mechanisms: Evidence from GP Allocation

This paper studies the design of dynamic reassignment mechanisms—centralized systems that must not only provide good initial matches but also accommodate changes in agents’ preferences over time. The empirical setting is Norway’s system for allocating patients to general practitioners (GPs), where every individual is assigned a specific GP whose panel has a binding capacity cap. Since 2016, Norway has allowed patients to join waitlists for oversubscribed GPs while retaining their spot on their current GP’s panel, with reassignment proceeding strictly first-come, first-served (FCFS) as vacancies arise.

The paper makes three contributions. First, it provides direct evidence of unrealized gains from trade: in December 2019, 15 percent of the 133,332 patients then standing on waitlists could have been immediately reassigned via a single run of the Top-Trading Cycles (TTC) algorithm, which identifies not only bilateral swaps but arbitrary cycles. A mechanical simulation holding patient choices fixed shows that running TTC monthly from November 2016 through December 2019 would have left 23 percent fewer patients on waitlists by end-2019, with average waiting times among reassigned patients 29 percent shorter.

Second, the paper introduces a dynamic TTC mechanism and clarifies why static properties do not carry over. In the static case, TTC is both strategy-proof and Pareto-improving (Shapley and Scarf, 1974; Roth, 1982). In a dynamic setting, neither property holds. Repeated TTC is not strategy-proof because patients’ GP choices affect how long they wait. More importantly, TTC may leave some patients worse off: a panel slot that would have gone to the first person on a waitlist under FCFS may instead go to a later-arriving patient who can form a trading cycle, effectively de-prioritizing patients whose GPs are undersubscribed. In the mechanical simulation, 4.5 percent of patients face longer waiting times under TTC.

Third, the paper estimates a structural model of patient attention and GP choice using monthly Norwegian administrative data covering 4.78 million patients and 6,470 GP panels (2014–2019), restricting estimation to the Trondelag region (approximately 8 percent of the country). The model specifies: a Poisson attention process (patients consider switching only when an attention shock arrives); preferences over GPs as a function of travel time, GP fixed effects, and match characteristics; and a belief model mapping observed waitlist lengths into expected waiting times. Parameters are recovered via a Gibbs sampler with Metropolis-Hastings for the discount rate. Key estimates: the annual discount factor is approximately 0.91; a female patient under 45 would travel 7.3 minutes farther to see a female GP (6.3 minutes for a female patient over 45); GP fixed effects have a standard deviation of 31 minutes’ travel-time equivalent; idiosyncratic taste shocks have a standard deviation of 12.6 minutes.

The paper then simulates a stationary equilibrium for each counterfactual mechanism. Under the status quo in stationary equilibrium, 9.4 percent of patients are on a waitlist, 82.2 percent of GPs have a waitlist, and average expected waiting time is 16.7 months. Introducing TTC reduces average waiting time to 14.1 months and raises mean patient welfare by the equivalent of 0.75 minutes’ travel time (more than 13 percent of the gain achievable under a no-capacity-constraints benchmark). Over half of this gain (0.4 minutes) comes directly from patients obtaining geographically closer GPs. Benefits are concentrated among younger patients, female patients, and recent movers; rural patients gain 2.1 minutes. However, patients with undersubscribed GPs face waiting times that rise from 16.7 to 22.8 months and are worse off by the perpetuity equivalent of 0.8 minutes.

Two modified mechanisms are evaluated. Deferred Acceptance (DA), which strictly respects FCFS priority, achieves essentially no improvement over the status quo, illustrating a fundamental trade-off between eliminating envy and exploiting gains from trade. A “TTC with Priority” (TTCP) mechanism, which gives priority for panel vacancies to patients with undersubscribed GPs before running TTC, achieves 61 percent of TTC’s welfare gains (0.46 minutes flow payoff; 1.08 minutes NPV) while leaving patients with undersubscribed GPs no worse off than under the status quo. A benchmark simulation eliminating waitlists altogether raises mean welfare slightly (0.19 minutes) but lowers median welfare (−0.60 minutes), with gains concentrated among highly mismatched patients.

Q: What is the core market failure the paper documents? A: Norway’s waitlist mechanism assigns panel vacancies strictly first-come, first-served without allowing patients to trade. This creates a “double coincidence of wants” problem: patients can simultaneously be on each other’s waitlists but cannot swap. In December 2019, 15 percent of 133,332 waiting patients could have been immediately reassigned via a single TTC run. A mechanical simulation shows that monthly TTC would have left 23 percent fewer patients on waitlists by end-2019 and reduced average realized waiting times among reassigned patients by 29 percent.

Q: Why does TTC fail to be strategy-proof in a dynamic setting? A: In the static case, TTC is strategy-proof: truthful reporting is a dominant strategy, and every agent receives an assignment at least as good as their endowment. In a dynamic setting, a patient’s choice of GP determines not only which GP they receive but also how long they wait: patients who choose less-demanded GPs reach the front of the waitlist faster. This creates incentives to misreport preferences strategically, breaking strategy-proofness. The paper shows this formally and builds it into the equilibrium model by requiring patients to optimize over both GP choice and expected waiting time.

Q: Why does dynamic TTC harm some patients relative to the status quo? A: Under FCFS, the first person on a waitlist is guaranteed the next available slot on the target GP’s panel. Under TTC, a patient who arrived later but whose current GP is oversubscribed can form a trading cycle that redirects that slot, effectively jumping the queue. Patients with undersubscribed GPs — whose panel endowment is not a scarce resource that others want — cannot form cycles and are systematically de-prioritized. In the stationary equilibrium, their expected waiting time rises from 16.7 to 22.8 months, and they are worse off by the perpetuity equivalent of 0.8 minutes’ travel time.

Q: What are the main parameter estimates and what do they imply? A: The annual discount factor is estimated at approximately 0.91 once GP fixed effects are included (rising to near 0.95 without them, because more desirable GPs have longer waitlists). Gender homophily is worth 7.3 minutes of travel time for female patients under 45 and 6.3 minutes for female patients over 45. Age homophily is worth approximately 1 minute. The standard deviation of GP fixed effects is 31 minutes and the standard deviation of idiosyncratic shocks is 12.6 minutes, both in travel-time equivalents, indicating substantial differentiation across GPs and across patients’ idiosyncratic tastes.

Q: How important are moves as a driver of GP switching? A: Moves are the dominant driver. Among non-movers, older men consider switching just once every 25 years; temporary residents consider switching approximately once every 7.5 years (1.084 percent per month). Among patients who moved more than 30 minutes, a temporary resident has an 18.59 percent monthly probability of considering switching in the month of or month after the move. For a permanent resident making a long-distance move, the cumulative attention probability over the 8 months surrounding the move rises to 34 percent (versus 22 percent for a short-distance move). In the data, 26 percent of waitlist users moved municipality during 2017–2019, versus 6 percent of non-switchers.

Q: What does the stationary equilibrium under the status quo look like? A: In the long-run stationary equilibrium, 9.4 percent of patients are on a waitlist, 82.2 percent of GPs have a waitlist, and the average expected waiting time to switch GPs is 16.7 months. Each month, 2,299 patients on average draw attention shocks; 85.2 percent of these choose to join a waitlist, while the remainder either switch to an open GP or stay with their current GP. The average attentive patient expects to successfully obtain their chosen GP after 16.8 months.

Q: What are the distributional consequences of TTC across patient subgroups? A: Female patients benefit especially because they are more likely to be attentive (and thus use waitlists) than males. Recent movers gain 2.3 minutes’ travel-time equivalent. Patients who have never moved still gain 1.0 minutes. Rural patients gain 2.1 minutes (larger than average), reflecting their longer baseline travel times and greater geographic mismatch potential. Urban patients also benefit but less so. The one group that is harmed is patients with undersubscribed GPs, who face longer waits and a welfare loss of 0.8 minutes perpetuity equivalent.

Q: Why does the Deferred Acceptance mechanism fail to improve on the status quo? A: DA strictly respects FCFS waiting-time priority: no patient may be reassigned to a GP for whom another patient has been waiting longer. This means DA can only execute swaps in which all patients ahead of each participant on their respective waitlists are also reassigned in the same month. In practice, this virtually never occurs, so DA reassigns almost no patients earlier than the status quo Waitlists mechanism. The result illustrates a fundamental trade-off: fully respecting FCFS priority eliminates nearly all gains from trade.

Q: How does TTCP restore fairness while preserving most of the efficiency gains? A: TTCP modifies TTC by prioritizing patients with undersubscribed GPs over those with oversubscribed GPs when assigning panel vacancies, while still respecting the constraint that patients cannot be assigned a GP they prefer less than their current one. This gives patients with undersubscribed GPs a compensating advantage in the queue that offsets their inability to trade via cycles. TTCP achieves 0.46 minutes’ mean flow payoff improvement versus 0.75 for TTC (61 percent of TTC’s gains), and an NPV measure of 1.08 minutes versus 1.25 for TTC. Patients with undersubscribed GPs are left no worse off than under the status quo.

Q: What happens when waitlists are eliminated entirely? A: Under No Waitlists, attentive patients may only choose among GPs with open panels at the moment of attention. Mean welfare rises slightly (0.19 minutes) because patients spend less time mismatched while waiting, but median welfare falls by 0.60 minutes. The gains are concentrated among a minority of highly mismatched patients who prefer limited choice with no waiting over broader choice with long waits, while most patients prefer the option to wait for a more preferred GP. The authors note this may partly explain why formal waitlists are rare in other primary care systems.

Q: What is the welfare benchmark and how large are the gains? A: The benchmark is a “No Caps” scenario in which all panel caps are removed, representing the maximum achievable improvement. The mean welfare gain from TTC (0.75 minutes) represents more than 13 percent of this upper bound. The “Truthful TTC” benchmark, where patients submit full preference lists, yields 1.04 minutes, but its gains are also concentrated: the median patient is no better off than under the status quo Waitlists mechanism.

Q: What are the scope conditions for these findings? A: The demand model is estimated on the Trondelag region of Norway (approximately 8 percent of the national population) over 2017–2019, a period when waitlists were growing rapidly rather than in steady state. Counterfactual comparisons are made in a stationary equilibrium calibrated to Trondelag. The model excludes patients under 16 (whose enrollment is managed by parents). The partially capitated payment structure and fixed panel caps are institutional features specific to Norway, though similar systems exist in Canada, the UK, Italy, and Sweden. GP characteristics are held fixed in the model. The analysis abstracts from health outcomes, focusing on preference-based welfare from GP assignment.

Top-Trading Cycles (TTC) algorithm: A centralized reassignment algorithm that takes agents’ preference lists and objects’ priority lists as inputs, has each agent “point to” their preferred object and each object “point to” its highest-priority current or waiting agent, identifies cycles of mutual pointing, and executes the trades in those cycles simultaneously. In the static case, TTC is both Pareto-improving (every participant receives an assignment at least as good as their endowment) and strategy-proof. In the dynamic setting studied here, neither property holds.

Dynamic TTC mechanism: A mechanism that runs the TTC algorithm repeatedly at the end of each period after naturally arising vacancies have been filled from waitlists. Because patients’ GP choices affect how long they wait — not only which GP they receive — this mechanism is not strategy-proof and may leave patients with undersubscribed GPs worse off than under strictly FCFS waitlists.

TTC with Priority (TTCP): A modified version of dynamic TTC that changes the priority ordering so that patients with undersubscribed current GPs are prioritized above patients with oversubscribed GPs when panel vacancies are allocated. This modification preserves patients’ endowment rights but compensates the group harmed by standard TTC. In the paper’s simulations, TTCP achieves 61 percent of TTC’s mean welfare gains while leaving patients with undersubscribed GPs no worse off than under the status quo.

Patient attention model: A model in which patients consider switching GPs only when they receive a Poisson-distributed attention shock. Attention rates vary by observable characteristics (age, gender, temporary vs. permanent residency, whether and how far the patient recently moved). The model interprets any switch request as evidence of both an attention shock and a preference for the requested GP over the current one. Patients who do not request switches may be either inattentive or attentive but satisfied.

Horizontal differentiation (GP preference heterogeneity): The extent to which different patients prefer different GPs for reasons unrelated to overall GP quality, primarily geographic proximity, gender homophily (worth 7.3 travel-time-equivalent minutes for female patients under 45 and 6.3 minutes for those over 45), and age similarity (approximately 1 minute). Horizontal differentiation is the fundamental source of gains from trade: if all patients preferred the same GP, there would be no mutual-benefit swaps to find.

Deferred Acceptance (DA) algorithm: The patient-proposing DA algorithm, which strictly respects FCFS waiting-time priority: no patient may be reassigned ahead of another patient who has been waiting longer for the same GP. In the dynamic context, DA achieves essentially no welfare improvement over the status quo because its strict respect for priority eliminates nearly all trading opportunities, illustrating the trade-off between envy-freeness and efficiency.

Double coincidence of wants: The situation in which two (or more) patients are simultaneously on each other’s waitlists and would mutually benefit from trading GP assignments, but cannot do so under the current mechanism because there is no vacancy on either panel. The paper’s direct evidence of this phenomenon — 15 percent of waiters could be immediately reassigned via one TTC run — motivates the counterfactual analysis.
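
To make the cycle-finding idea concrete, here is a minimal sketch of the classic static Top-Trading Cycles procedure in which each patient holds exactly one panel slot and each slot points to its current holder. This is a simplification of the mechanism described above: it ignores waitlist priorities, vacancies, and repeated runs, and the patient and GP labels are hypothetical.

```python
def top_trading_cycles(endowment, preferences):
    """Static TTC where every agent owns exactly one object.

    endowment: dict agent -> object currently held.
    preferences: dict agent -> list of objects, most preferred first.
    Objects ranked below the agent's own endowment are never assigned.
    """
    owner = {obj: agent for agent, obj in endowment.items()}
    remaining = set(endowment)
    assignment = {}

    def target(agent):
        # Favorite object still in the market; fall back to own endowment.
        for obj in preferences.get(agent, []):
            if obj == endowment[agent]:
                break
            if obj in owner and owner[obj] in remaining:
                return obj
        return endowment[agent]

    while remaining:
        # Follow pointers (agent -> object -> owner) until an agent repeats.
        agent = min(remaining)
        position, path = {}, []
        while agent not in position:
            position[agent] = len(path)
            path.append(agent)
            agent = owner[target(agent)]
        cycle = path[position[agent]:]
        # Everyone in the cycle receives the object they point to.
        for member in cycle:
            assignment[member] = target(member)
        remaining -= set(cycle)
    return assignment

# A, B, and C form a three-way cycle; no bilateral swap exists among them.
# D holds a slot nobody asks for, so D cannot trade and keeps g4.
endowment = {"A": "g1", "B": "g2", "C": "g3", "D": "g4"}
preferences = {"A": ["g2"], "B": ["g3"], "C": ["g1"], "D": ["g1"]}
result = top_trading_cycles(endowment, preferences)
print(result)
```

In this toy case A, B, and C each obtain their requested GP through a single three-way trade, while D, whose current GP is not in demand, is left unchanged. That mirrors the position of patients with undersubscribed GPs in the dynamic setting.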

Education and the Margins of Cyclical Adjustment in the Labor Market

Overview

Research question. This paper asks how the cyclical sensitivity of wages varies with workers’ educational attainment, what mechanisms drive the differences, and what the welfare consequences are of ignoring this heterogeneity. The starting point is a well-known asymmetry: less-educated workers have much higher and more volatile job separation rates, yet the standard macroeconomic literature has treated wages as roughly acyclical for a representative worker. Doniger asks whether this employment-centric picture is incomplete—and finds that it is, in a direction opposite to what the employment pattern would suggest.

Data and methodology. The paper uses two primary data sources: the National Longitudinal Survey of Youth 1979 (NLSY), which provides detailed job histories enabling identification of current and completed employer tenure, and the Current Population Survey (CPS) from 1995 to 2020, used both for employment flow statistics and, via biennial Job Tenure Supplements, for replication of the main wage findings. The sample is restricted throughout to males with 0–30 years of potential experience, following the conventions of the user-cost-of-labor (UCL) literature (Kudlyak, 2014; Basu and House, 2016). Workers are grouped into three educational categories: less than high school, high school or some college, and bachelor’s degree or more.

A key methodological contribution is a new, more parsimonious estimator for the cyclical sensitivity of the UCL. Rather than the multi-step indicator-variable approach of Kudlyak (2014), the paper recovers the UCL sensitivity from interaction terms between a flexible function of tenure and the cyclical position at the time of hiring, estimated within an augmented Mincer regression. This estimator admits higher-frequency identification, enables transparent inference via the delta method, and facilitates nonparametric impulse response estimation via the Jorda (2005) local projection method. Cyclical position is measured primarily as the deviation of the unemployment rate from an HP-filtered trend (lambda = 100,000), with robustness checks using the Hamilton (2018) filter and GDP-based detrending.
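
For readers unfamiliar with the detrending step, the sketch below computes an HP-filter trend and the implied cyclical deviation for a short series. It is a minimal pure-Python illustration that solves the filter's first-order condition with a dense linear solve; the toy unemployment series and the function name are assumptions, and applied work would normally rely on a tested library routine.

```python
def hp_trend(series, lam):
    """HP trend: solves (I + lam * D'D) trend = series, D = second differences."""
    n = len(series)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        stencil = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, di in stencil:
            for j, dj in stencil:
                a[i][j] += lam * di * dj
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row[:] + [float(value)] for row, value in zip(a, series)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * trend[c] for c in range(i + 1, n))
        trend[i] = (m[i][n] - tail) / m[i][i]
    return trend

# Hypothetical unemployment rates (percent); lambda as in the text.
unemployment = [5.0, 5.2, 5.1, 6.0, 7.5, 8.0, 7.2, 6.5, 6.0, 5.5]
trend = hp_trend(unemployment, lam=100_000)
cycle = [u - t for u, t in zip(unemployment, trend)]
print([round(c, 2) for c in cycle])
```

With a smoothing parameter this large the trend is close to a straight line, so most of the movement in the series is attributed to the cyclical component.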

Main findings — employment. Monthly separation rates from the CPS (1995–2020) show that workers with less than a high school degree separate at a rate of 9.4 percent per month, more than twice the 3.4 percent rate for workers with a bachelor’s degree or more, regardless of cyclical position. The volatility of the separation rate (measured by the time-series standard deviation) is also larger for the least educated (1.7) than for the most educated (0.6). All sub-components of separation (to unemployment, to inactivity, and job-to-job transitions) exhibit the same ordering. In response to a 100 basis point monetary policy contraction (Romer and Romer, 2004 shocks), employment of workers with less than a high school education falls significantly, while employment of workers with a bachelor’s degree or more is statistically unaffected.

Main findings — wages. Using the NLSY, the cyclical sensitivity of the UCL to a 1 percentage point deviation of the unemployment rate from trend is estimated at approximately −15.5 percent for workers with a bachelor’s degree or more, −4.9 percent for high school or some college workers, and −1.4 percent (statistically indistinguishable from zero) for workers without a high school degree. In contrast, average hourly earnings (AHE) show much smaller and more compressed differences across education groups (−1.4, −1.1, and −1.0 percent respectively). The pattern of increasing procyclicality with education holds for new hires’ wages (NHW) as well but is considerably less stark than for the UCL. Replication in the CPS confirms the ordering: UCL sensitivities are −7.0 percent for college graduates, −2.9 percent for high school or some college, and effectively zero for those without a high school degree.

Mechanism. Counterfactual decompositions show that differences in the cyclical sensitivity of the wage-tenure profile, not just differences in job duration (separation rates), account for the vast majority of the divergence across education groups. When separation rates are held constant across groups, the UCL sensitivity of the college-educated falls in magnitude from −15.5 to −13.0 percent; when wage-tenure profile sensitivities are held constant, it falls to −6.3 percent, and the ordering across groups largely disappears. This finding is consistent with implicit contracting theory (Thomas and Worrall, 1988): longer expected employment durations for the more educated make it optimal to defer a greater share of the wage response to shocks over time, rendering near-term rigidities functionally less binding and producing more persistent effects of hiring-period conditions on subsequent wages.

Robustness. After controlling for cyclical sorting in match quality using the Hagedorn and Manovskii (2013) proxies (cumulated market tightness during tenure and leading up to the present job), the UCL sensitivity for college graduates falls modestly to −12.4 percent, confirming that match-quality composition effects account for only a minority of the documented pattern. The monetary policy shock analysis (Romer-Romer shocks identified from Greenbook forecast errors) yields a 35 percent decrease in the UCL for the most educated at the two-year horizon following a 100 basis point contraction, with no discernible effect for the least educated.

Welfare consequences. Using a stylized New Keynesian model extended to two labor varieties with heterogeneous wage flexibility, the paper shows that ignoring the documented heterogeneity leads to underestimating the welfare costs of business cycle fluctuations by more than 15 percent under the baseline calibration (unit Frisch elasticity and unit elasticity of intertemporal substitution). Conditional on this model, the welfare loss due to fluctuations for the least educated is more than 15 times larger than for the most educated. The paper explicitly notes this is a conservative lower bound, because the model assumes pooled household consumption, and admitting idiosyncratic consumption risk would disproportionately burden less-educated workers who bear adjustment on the extensive (employment) rather than intensive (wage) margin.

In depth

Q1. What is the user cost of labor (UCL), and why does the paper use it rather than average hourly earnings or new hires’ wages?

The UCL, formalized by Kudlyak (2014), is the present discounted value of wage payments an employer expects to make to a worker over the duration of the employment relationship, net of the continuation value of retaining that worker. It equals the new hire’s wage plus the expected wage wedge—the discounted stream of future wage differences between workers hired in the current period versus workers hired one period later. Unlike average hourly earnings or new hires’ wages, the UCL captures the persistent effects of macroeconomic conditions at the time of hiring on all future remitted wages, making it the appropriate allocative wage concept from a macroeconomic standpoint. The paper documents that AHE understates the cyclicality of wages for all groups but especially for the most educated, because AHE omits the highly cyclically sensitive expected wage wedge that characterizes college-educated employment relationships.
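
The verbal definition above can be written compactly. The notation below is assumed here for illustration and is not taken from the summary: $w_{t,\tau}$ is the wage paid at date $\tau$ to a worker hired at date $t$, $\beta$ is the discount factor, and $\delta$ is the per-period separation probability, treated as constant for simplicity.

```latex
\[
  \mathrm{UCL}_t
  \;=\;
  \underbrace{w_{t,t}}_{\text{new hire's wage}}
  \;+\;
  \underbrace{\mathbb{E}_t \sum_{\tau = t+1}^{\infty}
    \bigl(\beta (1-\delta)\bigr)^{\tau - t}
    \bigl(w_{t,\tau} - w_{t+1,\tau}\bigr)}_{\text{expected wage wedge}}
\]
```

A lower separation probability raises the weight on distant wage differences, which is one reason the wedge can matter more for groups with longer job durations.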

Q2. How does the paper’s new estimator for the cyclical sensitivity of the UCL differ from the existing method, and what does this enable?

The existing Kudlyak (2014)/Basu and House (2016) method recovers the UCL by estimating a very large set of date-of-hire × current-date indicator interactions, constructing a time series of the UCL, and then analyzing that series. This multi-step procedure loses covariances across steps and makes cross-sectional disaggregation or high-frequency identification impractical. The new method instead estimates the UCL sensitivity directly from coefficients on the interaction between a flexible tenure function and the cyclical position at hiring, estimated within a single augmented Mincer regression. The UCL semi-elasticity is recovered analytically from these coefficients via a formula that sums discounted weighted differences in the tenure-interaction coefficients across the tenure horizon. This single-step approach allows transparent inference via the delta method, enables fully interacted specifications for heterogeneous subgroups, permits the hiring-date frequency (e.g., weekly in NLSY) to differ from the wage observation frequency (annual or biennial), and permits estimation from repeated cross-sections, all of which were infeasible in the prior approach.
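
The inference advantage of a single regression can be illustrated with a small sketch. When a statistic is a weighted sum of coefficients from one regression, its delta-method variance is the quadratic form of the weights with the coefficient covariance matrix, which requires the cross-coefficient covariances that a multi-step procedure discards. The coefficients, covariance matrix, and weights below are hypothetical placeholders and do not reproduce the paper's formula.

```python
import math

def linear_combination_with_se(coefficients, covariance, weights):
    """Return (estimate, standard error) for sum_k weights[k] * coefficients[k]."""
    n = len(coefficients)
    estimate = sum(weights[k] * coefficients[k] for k in range(n))
    variance = sum(
        weights[i] * covariance[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return estimate, math.sqrt(variance)

# Hypothetical tenure-interaction coefficients and their covariance matrix.
coefficients = [-3.0, -2.0, -1.0]
covariance = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 4.0],
]
weights = [1.0, 0.5, 0.25]  # placeholder discount-style weights

estimate, standard_error = linear_combination_with_se(coefficients, covariance, weights)
print(estimate, round(standard_error, 4))
```

Dropping the off-diagonal covariance of 0.5 in this toy case would understate the variance, which is the kind of error a single-step estimator avoids.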

Q3. What are the quantitative magnitudes of the education gradient in UCL cyclicality, and how do they compare across wage measures?

Using the NLSY with unemployment deviations from HP-filtered trend as the cyclical indicator: the UCL sensitivity is −15.5 percent (se 3.86) for workers with a bachelor’s degree or more, −4.9 percent (se 1.52) for high school or some college, and −1.4 percent (se 2.48, statistically insignificant) for those without a high school degree. By contrast, new hires’ wages show sensitivities of −3.4, −1.8, and −1.2 percent respectively, and average hourly earnings show −1.4, −1.1, and −1.0 percent. The gradient is largest and most statistically significant for the UCL, indicating that the bulk of the education gap in cyclical wage sensitivity operates through the persistent effect of hiring-period conditions on subsequent wages rather than through the contemporaneous wage alone.

Q4. What mechanism accounts for the UCL gradient — differential job durations or differential sensitivity of the wage-tenure profile?

The paper decomposes the UCL into the new hire’s wage and the expected wage wedge, and performs counterfactual exercises holding either separation rates or wage-tenure profile sensitivities constant across education groups (Table 3). Holding separation rates constant while allowing wage-tenure profiles to differ reduces the college-educated UCL sensitivity only modestly, from −15.5 to −13.0 percent; holding wage-tenure profile sensitivities constant while allowing separation rates to differ reduces the college-educated sensitivity to −6.3 percent and compresses the education gradient substantially. Thus, differential sensitivity of the wage-tenure profile (the degree to which wages continue to respond to hiring-period conditions over the course of the job) is the primary driver of the UCL gradient, with differential separation rates playing a secondary but non-trivial role. This finding confirms the prediction of Thomas and Worrall (1988) that lower separation rates support greater use of deferred payment and intertemporal risk sharing in optimal wage contracts.

Q5. How does the paper rule out cyclical sorting in match quality as the explanation for the UCL gradient?

Workers hired during recessions may be of systematically lower match quality, producing persistently lower wages not because wages are more cyclically sensitive for the same quality match but because recession hires are worse matches. Using the Hagedorn and Manovskii (2013) proxies for match quality, namely cumulated market tightness during the worker’s tenure on the present job (mjob) and on all prior jobs leading to it (mctj), the paper augments the wage regression with full interactions between these proxies and the tenure-cyclicality terms. After controlling for match quality, the UCL sensitivity for college graduates falls from −15.5 to −12.4 percent (se 5.56); the point estimate remains large, statistically significant, and well above the estimates for lower-education groups. Figure 4 shows that match-quality adjustment primarily affects the first two years of the wage-tenure profile, after which the bias from cyclical sorting fades, confirming that scarring in remuneration for college graduates hired in recessions persists beyond what sorting can explain.

Q6. What do monetary policy shocks reveal about the education gradient in wage sensitivity?

Monetary policy shocks (identified from Greenbook forecast errors as in Romer and Romer, 2004) subject all labor markets to the same aggregate demand shock simultaneously, providing a cleaner test of differential responsiveness than cyclical regressions that may conflate demand composition and supply factors. Using Jorda (2005) local projections, a 100 basis point monetary policy contraction is associated with a 35 percent decrease in the UCL for workers with a bachelor’s degree or more at the two-year horizon, with statistically insignificant effects on the UCL of workers without a high school degree. The employment results are symmetric: less-educated workers’ employment falls significantly after a monetary contraction, while college-educated workers’ employment is unaffected. This cross-validation using monetary policy shocks supports the main thesis that more-educated workers absorb aggregate demand variation through the wage margin, while less-educated workers absorb it through the employment margin.

Q7. How do acyclical wages for the least educated affect interpretation of the existing macro literature on wage rigidity?

The aggregate finding of Kudlyak (2014) and Basu and House (2016), that the UCL is more procyclical than new hires’ wages or average hourly earnings and therefore casts doubt on wage rigidity as an amplification mechanism, holds only for educated workers. The paper finds that the UCL for workers without a high school degree is statistically acyclical by all three wage measures. This result restores a potential role for nominal wage rigidity in generating amplification and persistence of shocks for less-educated labor markets, including in the Diamond-Mortensen-Pisarides class of search models criticized by Kudlyak (2014) and in New Keynesian models criticized by Basu and House (2016). The paper therefore reconciles the literature on wage rigidity with the empirical finding of cyclical employment volatility concentrated among the less educated.

Q8. What is the welfare calculation, and what are its key results and limitations?

The welfare exercise uses a parsimonious New Keynesian model with two labor varieties (capturing more- and less-educated workers) and price and wage rigidities. The model is extended to admit heterogeneous wage flexibility, and the welfare costs of fluctuations are evaluated following the second-order approximation method of Galí et al. (2007). Under the baseline calibration (unit Frisch elasticity, unit elasticity of intertemporal substitution), the heterogeneous-worker economy incurs welfare costs of fluctuations that exceed those of the output-gap-equivalent representative agent economy by more than 15 percent. The welfare loss of the least-educated workers is more than 15 times that of the most educated. The paper explicitly characterizes this as a conservative lower bound: the model assumes pooled household consumption (within varieties), which implies equal consumption sensitivity across education groups, whereas in reality less-educated workers face income loss on the extensive margin without the wage smoothing available to the more educated. Relaxing this assumption, as in Krusell et al. (2009), could yield welfare losses an order of magnitude larger.

Q9. What does the CPS replication add, and what are its limitations relative to the NLSY baseline?

The CPS replication (Table 7) confirms the main ordering: UCL sensitivities are −7.0, −2.9, and approximately 0 percent for college graduates, high school or some college, and less than high school, respectively. This rules out the concern that the NLSY findings are artifacts of the single aging cohort that characterizes the NLSY 1979. However, the CPS must be treated as a repeated cross-section because the tenure data are only available biennially and individual-level panel linkage across tenure supplement waves is infeasible. As a result, the CPS estimates cannot include individual fixed effects and must rely more heavily on observable controls (industry, occupation) to absorb cyclical variation in workforce composition. The CPS also precludes the match-quality controls of Hagedorn and Manovskii (2013). Despite these limitations, the main qualitative and directional findings replicate.

Q10. What policy implications does the paper draw for monetary policy?

The paper argues that because less-educated workers bear adjustment to aggregate demand shocks disproportionately through the employment margin while their wages are acyclical, welfare assessments that focus on the aggregate output gap underweight the costs borne by less-educated workers. The paper suggests that re-optimizing the monetary policy rule to account for documented heterogeneity would entail placing greater weight on the unemployment rate of the least-educated when measuring the output gap. More broadly, the K-shaped nature of labor market adjustment across education groups (wage scarring for the educated versus employment volatility for the less educated) implies that policies targeting either margin in isolation will miss welfare costs concentrated in the other group.

Key Concepts

User Cost of Labor (UCL). The allocative wage from the employer’s perspective, defined as the present discounted value of expected future wage payments to a worker hired at date t, net of the continuation value of retaining that worker in the next period. Formally, $UCL_t = w_{t,t} + E_t\left[\sum_{j=1}^{\infty} \beta^j (1-s)^j \left(w_{t+j,t} - w_{t+j,t+1}\right)\right]$, where $w_{\tau,h}$ is the wage paid in period $\tau$ to a worker hired in period $h$, $\beta$ is the discount factor, and $s$ is the separation rate. The expression decomposes the UCL into the new hire’s wage and the expected wage wedge. In this paper’s usage, the UCL is the appropriate measure of the cyclical impact of shocks on labor costs because it captures persistent effects of hiring-period conditions on the entire subsequent wage sequence, not just the contemporaneous wage.
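As a rough illustration of the decomposition, the sketch below computes the UCL from a small array of wages indexed by period and hire date. The array layout, the truncation of the sum at the last available period, the function name `user_cost_of_labor`, and all numerical values are assumptions for illustration, with `beta` standing for the discount factor and `s` for the separation rate.

```python
import numpy as np

def user_cost_of_labor(w, t, beta, s):
    """Return (UCL, new-hire wage, expected wage wedge) for hire date t.

    w[tau, h] is the wage paid in period tau to a worker hired in period h;
    only entries with tau >= h are meaningful. The sum over future periods
    is truncated at the last period available in w.
    """
    n_periods = w.shape[0]
    new_hire_wage = w[t, t]
    wedge = 0.0
    for j in range(1, n_periods - t):
        discount = (beta * (1.0 - s)) ** j
        wedge += discount * (w[t + j, t] - w[t + j, t + 1])
    return new_hire_wage + wedge, new_hire_wage, wedge

# Hypothetical wages: a worker hired in period 0 (a downturn) earns 0.95 in
# every period, while a worker hired in period 1 earns 1.00.
w = np.full((4, 4), np.nan)
w[:, 0] = 0.95
w[1:, 1] = 1.00

ucl, new_hire_wage, wedge = user_cost_of_labor(w, t=0, beta=0.96, s=0.10)
```

Here the wedge is negative because the period-0 hire is paid less than the period-1 hire in every later period, so the UCL falls below the new hire's wage. This is the sense in which the UCL can be more cyclical than the new-hire wage alone.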

Expected Wage Wedge (EWW). The component of the UCL beyond the new hire’s wage: the discounted stream of differences between wages a worker hired at date t will receive in future periods and the wages a worker hired one period later would receive in those same future periods. The EWW is non-zero whenever wages are history-dependent, i.e., whenever macroeconomic conditions at the time of hiring affect future remitted wages. The paper finds that the EWW is larger in magnitude, more negative, and more persistent for more-educated workers conditional on being hired during a cyclical downturn.

Self-enforcing implicit wage contract. A labor contract in which the sequence of remitted wages is not pinned down period-by-period by spot-market forces but instead reflects an intertemporal risk-sharing arrangement between employer and worker that is sustained by the mutual benefit of the ongoing employment relationship. In this paper’s framework (drawing on Thomas and Worrall, 1988), lower separation rates make longer planning horizons feasible, which in turn expands the scope for deferring wage adjustments across time, effectively allowing more-educated workers and their employers to smooth the effects of cyclical shocks over longer horizons than is possible for less-educated workers with shorter expected job durations.

Cyclical sorting / match quality bias. The compositional concern that workers hired during recessions may be of systematically different (in this context, lower) match quality than those hired during booms, so that the persistent wage depression observed for recession hires could reflect poor match quality rather than cyclically sensitive wages for equivalent-quality matches. The paper uses the Hagedorn and Manovskii (2013) proxies (cumulated labor market tightness during the current job and prior employment history) to control for cyclical variation in match quality and assess the residual sensitivity of the UCL for average-quality matches.

Extensive versus intensive margin of labor market adjustment. The distinction between adjustment through changes in the number of workers employed (extensive margin: hiring and separation) versus adjustment through changes in wages or hours conditional on employment (intensive margin). A central finding of the paper is that less-educated workers bear cyclical adjustment disproportionately on the extensive margin (more volatile separation rates, employment losses following monetary contractions) while their wages are acyclical, whereas more-educated workers exhibit the reverse: stable employment but highly cyclically sensitive wages, especially as measured by the UCL.

Wage scarring. The persistent negative effect of hiring-period macroeconomic conditions on wages throughout the subsequent employment spell, beyond what is explained by contemporaneous market conditions. In this paper’s context, wage scarring is concentrated among more-educated workers: being hired when the unemployment rate is one percentage point above trend is associated with wages that remain depressed for several years, with the depression being larger and more persistent for college-educated workers than for those with less education. This is demonstrated via the expected wage wedge profiles in Figure 3 and is confirmed to survive controls for match-quality sorting.

Output-gap-equivalent representative agent economy. A conceptual benchmark constructed in the paper’s welfare analysis: a single-worker-type New Keynesian economy whose wage and labor supply elasticities are set equal to the output-elasticity-weighted averages of the two labor variety types in the heterogeneous economy. The paper shows that the heterogeneous-worker economy and this representative-agent benchmark produce identical aggregate output gap and price level paths (under Cobb-Douglas production, earnings elasticities are identical across varieties), but welfare diverges because period utility is more volatile for the variety with more rigid wages. The 15 percent excess welfare cost of the heterogeneous economy relative to this benchmark is the paper’s headline welfare result.

Energy Transitions in Regulated Markets

This paper asks how rate-of-return (RoR) regulation in U.S. electricity markets affects the speed and efficiency of energy transitions, specifically the transition from coal to combined-cycle natural gas (CCNG) generation driven by fracking-induced cost declines. The authors build and estimate a structural model of regulated utility behavior in which utilities optimize investment, retirement, and hourly operations decisions against an incentive structure set by state Public Utility Commissions (PUCs).

The regulatory environment combines two instruments: (1) an allowable rate of return that is decreasing in consumer electricity rates (incentive regulation), parameterized as s = (r/r₀)^{-γ}, where higher γ penalizes high-cost outcomes more severely; and (2) a “used-and-useful” standard in which a coal plant’s contribution to the rate base depends on its capacity utilization via a logit function. These two instruments create a tension: utilities want to lower costs to earn a higher RoR, but also want to run existing coal plants—even when uneconomical—to prove they are “used and useful” and thus maximize their rate base and profits.

The authors estimate the model using publicly available EIA and EPA CEMS data spanning 2006–2017, covering 39 unique regulated utilities in the Eastern Interconnection across more than 4 million utility-hour observations (459 utility-years). Structural parameters are recovered via a nested fixed-point indirect inference approach that matches simulated regression coefficients to actual data; investment and retirement costs are estimated with a GMM nested fixed-point approach.

Key reduced-form findings confirm the model’s two core mechanisms. First, a 10% increase in total variable costs is associated with a 2.5% decrease in variable profits per MW of capacity (with utility fixed effects), consistent with incentive regulation. Second, regulated utilities reduce coal generation by only a statistically insignificant 4.2 percentage points when coal fuel costs exceed import prices, compared to 16.1 percentage points for restructured utilities—consistent with regulated utilities running coal out-of-dispatch order to preserve used-and-useful status.

In counterfactual simulations that impose 2018–20 natural gas prices ($2.01/MMBtu versus the 2006 price of $7.24/MMBtu) on utilities with their 2006 capital stocks, regulated utilities retire only 53% of coal capacity over 30 years and increase CCNG capacity by 296%, whereas a cost minimizer would retire most coal capacity while increasing CCNG by only 58%. The Averch-Johnson over-investment effect dominates: regulated utilities over-invest in CCNG while simultaneously over-using legacy coal.

Carbon taxes on regulated utilities reduce short-run coal generation only 48% as much as when imposed on a cost minimizer (because the used-and-useful incentive partially offsets the carbon price signal), but in the long run result in 68% lower coal capacity and 77% lower coal generation relative to baseline by year 30—larger effects than for the cost minimizer. Eliminating the coal usage incentive (μ₂ = 0) produces 82% lower coal capacity and 92% lower coal generation over 30 years but requires utility variable profits to fall by over $300 million, threatening reliability without compensating transfers.

Scope conditions: Results apply to regulated (non-restructured) utilities in the Eastern Interconnection, 2006–2017. The model estimates the coal-to-CCNG transition only; it explicitly does not model the ongoing transition to renewables and storage due to insufficient data variation.

In depth

Q1. What is the central research question?

The paper asks whether and how rate-of-return regulation in U.S. electricity markets slows energy transitions, and what alternative regulatory structures or carbon tax policies could accelerate the transition away from coal. It addresses this both theoretically—through a structural model of regulated utility behavior—and empirically, through estimation and counterfactual simulation using data on 39 regulated utilities over 2006–2017.

Q2. What are the two key regulatory instruments in the model, and what distortions do they create?

The first instrument is incentive regulation: the allowable rate of return declines as consumer electricity rates rise (s = (r/r₀)^{-γ}), so utilities have an incentive to lower costs. The second is the used-and-useful standard: a coal plant’s contribution to the rate base depends on its capacity utilization via a logit function, creating an incentive to run coal plants even when their fuel costs exceed import prices. Together, these instruments generate a tension between cost-reduction incentives and legacy-capacity-preservation incentives, causing the regulated utility to both over-invest in new CCNG capacity (Averch-Johnson effect) and over-use existing coal capacity relative to the cost-minimizing benchmark.
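The two instruments can be sketched numerically. The function for s follows the formula in the text; the logit for the used-and-useful weight is a hypothetical functional form with intercept `mu1` and slope `mu2` (the summary does not give the paper's exact parameterization), and every numerical value below is illustrative rather than an estimate.

```python
import math

def allowed_return_multiplier(rate, base_rate, gamma):
    """s = (r / r0) ** (-gamma): equals 1 at r = r0 and falls as rates rise when gamma > 0."""
    return (rate / base_rate) ** (-gamma)

def coal_usage_weight(utilization, mu1, mu2):
    """Hypothetical logit in capacity utilization (between 0 and 1); mu2 is the slope."""
    return 1.0 / (1.0 + math.exp(-(mu1 + mu2 * utilization)))

# Illustrative values only; none of these are estimates from the paper.
s_low = allowed_return_multiplier(rate=70.0, base_rate=80.0, gamma=0.5)
s_high = allowed_return_multiplier(rate=90.0, base_rate=80.0, gamma=0.5)

w_idle = coal_usage_weight(0.0, mu1=-1.0, mu2=3.0)
w_full = coal_usage_weight(1.0, mu1=-1.0, mu2=3.0)

# With a zero slope, utilization no longer affects the weight.
w_flat_idle = coal_usage_weight(0.0, mu1=-1.0, mu2=0.0)
w_flat_full = coal_usage_weight(1.0, mu1=-1.0, mu2=0.0)
```

The first function rewards lower consumer rates with a higher allowed return, while a positive slope in the second rewards running coal. The tension described above comes from these two incentives pulling in opposite directions when coal is the high-cost source.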

Q3. What does the reduced-form evidence show about uneconomical coal usage?

In a triple-difference specification, regulated utilities reduce coal generation by only 4.2 percentage points (statistically insignificant) when coal fuel costs exceed import prices, compared to a 16.1 percentage point reduction for restructured utilities. CCNG generation responds similarly under both regulatory regimes (21.1 vs. 19.7 percentage points), confirming that the distortion is specific to legacy coal under RoR regulation and not a general feature of high-cost generation. The six states with the largest responsiveness of coal usage to low market prices are all restructured states; out-of-dispatch-order coal generation also correlates strongly with utility ownership share across states.

Q4. What do the structural parameter estimates reveal about the rate base?

Each MW of CCNG capacity increases the rate base by $229,000. When fully utilized, each MW of coal capacity contributes 1.144 times as much as CCNG. When coal is not fully used, unused coal capacity contributes only 40% as much to the rate base as CCNG. NGT capacity contributes 79% more to the rate base than CCNG per MW. Operations cost estimates include O&M costs of $12.89/MWh for coal, $8.82/MWh for CCNG, and $44.63/MWh for NGT; a 100 MW coal ramp in one hour costs $4,770 versus $3,860 for CCNG.
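The rate-base figures above can be combined in a short worked calculation. The sketch splits coal capacity linearly into used and unused MW, which is a simplifying assumption of this example (the paper models the contribution through a logit in utilization); the function name `rate_base` and the 100 MW plant are hypothetical.

```python
CCNG_VALUE_PER_MW = 229_000.0  # dollars of rate base per MW of CCNG (from the text)
COAL_FULL_MULT = 1.144         # fully utilized coal, relative to CCNG
COAL_IDLE_MULT = 0.40          # unused coal, relative to CCNG
NGT_MULT = 1.79                # NGT contributes 79% more than CCNG

def rate_base(ccng_mw, coal_mw, coal_utilization, ngt_mw):
    """Rate base in dollars. Coal is split linearly into used and unused MW,
    an assumption of this sketch rather than the paper's logit."""
    used = coal_mw * coal_utilization
    unused = coal_mw - used
    coal_units = COAL_FULL_MULT * used + COAL_IDLE_MULT * unused
    return CCNG_VALUE_PER_MW * (ccng_mw + coal_units + NGT_MULT * ngt_mw)

# Hypothetical 100 MW coal plant, idle versus fully utilized.
idle = rate_base(ccng_mw=0, coal_mw=100, coal_utilization=0.0, ngt_mw=0)
full = rate_base(ccng_mw=0, coal_mw=100, coal_utilization=1.0, ngt_mw=0)
gain_per_mw = (full - idle) / 100
```

Using the endpoint figures from the text, moving one MW of coal from unused to fully used raises the rate base by (1.144 − 0.40) × $229,000, or roughly $170,000. That gap is the incentive to keep running legacy coal.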

Q5. What happens in the 30-year long-run counterfactual under the baseline regulated utility?

Facing a sudden drop to 2018–20 natural gas prices ($2.01/MMBtu vs. $7.24/MMBtu in 2006), regulated utilities retire 53% of coal capacity and increase CCNG capacity by 296% over 30 years. The Averch-Johnson over-investment effect dominates: utilities invest heavily in CCNG while retaining and using legacy coal far longer than a cost minimizer would. The social planner effectively eliminates coal generation immediately (99% reduction in the first period) and retires almost all coal capacity over the horizon.

Q6. How does a cost minimizer behave relative to the regulated utility in the same long-run counterfactual?

A cost minimizer immediately reduces coal generation by 50% in the first period and retires most coal capacity over 30 years while increasing CCNG capacity by only 58%—versus the regulated utility’s 296% CCNG increase. Thirty years after the shock, the cost minimizer has retired 71% more coal capacity than the regulated utility. The cost minimizer’s much smaller CCNG expansion reflects that it does not face Averch-Johnson incentives to over-invest in rate-base capital.

Q7. What is the short-run vs. long-run impact of carbon taxes on regulated utilities compared to cost minimizers?

In the short run, carbon taxes on regulated utilities reduce coal generation only 48% as much as when imposed on a cost minimizer (34% vs. ~100% in immediate generation drop), because the used-and-useful incentive counteracts the carbon price signal. In the long run (30-year horizon), however, carbon taxes on regulated utilities result in 68% lower coal capacity and 77% lower coal generation relative to baseline—larger percentage reductions than for a cost minimizer—because the regulatory structure amplifies the retirement incentive over time once carbon costs erode the economic rationale for keeping coal in the rate base.

Q8. What is the short-run operations counterfactual finding for carbon taxes in the sample period?

Using each utility-year in the analysis sample, imposing carbon taxes on regulated utilities reduces carbon costs by only about $500 million relative to baseline—41% of the $1.3 billion carbon cost savings from imposing the same carbon taxes on a cost minimizer. Despite this limited carbon reduction, electricity rates nearly triple from $77.58/MWh to $224.18/MWh under the regulated utility with carbon taxes, as the utility passes through most carbon costs to consumers; regulated utility variable profits also fall by over $500 million.

Q9. What happens when the coal usage incentive is eliminated (μ₂ = 0)?

Setting the coal usage incentive parameter μ₂ = 0 (eliminating the logit slope on capacity utilization) causes coal capacity to fall 82% and coal generation to fall 92% relative to baseline over 30 years—a slightly larger generation decline than for the cost minimizer. However, this comes at the cost of more than twice the CCNG capacity due to the Averch-Johnson effect, and requires utility variable profits to fall by over $300 million, raising reliability concerns unless accompanied by compensating transfers.

Q10. How does the paper’s mechanism relate to observed differences in coal exit rates between regulated and restructured states?

Between 2006 and 2018, 26.0% of coal capacity exited in restructured states versus only 17.2% in regulated states—a gap the authors attribute primarily to the used-and-useful incentive structure in RoR regulation. The structural model quantifies how this regulatory feature specifically distorts coal usage and retirement decisions; it is not explained by demand or cost differences across states, as confirmed by the triple-difference evidence showing the gap is specific to coal (not CCNG) and to regulated (not restructured) utilities.

Q11. Why does the paper argue that alternative regulatory adjustments are insufficient to replicate cost-minimizing transitions?

Changing regulatory parameters—such as increasing the coal usage incentive or adjusting the electricity rate penalty—does not come close to replicating the speed of the energy transition under a cost minimizer in the long-run simulations. Regulatory adjustments that do approach cost-minimizing outcomes (such as eliminating μ₂) require large reductions in utility variable profits sufficient to risk reliability, consistent with why the 2022 Inflation Reduction Act relied on substantial investment transfers rather than carbon taxes as its primary clean energy instrument.

Q12. What is the paper’s identification strategy?

Identification exploits the sharp, exogenous decline in natural gas fuel prices from fracking, which had heterogeneous implications across utilities depending on their initial capital mixes (coal-heavy vs. CCNG-heavy). By comparing investment, retirement, and operations decisions across utilities and over time—particularly between utilities that had CCNG exposure before the price decline and those that did not—the authors recover the structural regulatory and cost parameters. The IV specification for reduced-form evidence uses the current natural gas price interacted with the utility’s initial CCNG generation share as an instrument for fuel and import costs.

Q13. What are the paper’s explicit limitations?

The paper estimates the coal-to-CCNG transition only and cannot speak to the transition to renewables and storage, because there is insufficient variation in the data to identify how regulators would treat CCNG as a legacy technology subject to used-and-useful standards, or how renewables and storage would contribute to the rate base. The authors note that over-investment in CCNG capacity may create future stranded asset problems for ratepayers and that usage incentives for CCNG are likely to further hinder the transition to renewables—but these are conjectures rather than estimated findings.

Key Concepts

Rate-of-return (RoR) regulation: A regulatory structure in which the PUC sets electricity rates so that utility revenues cover total variable costs plus an allowable return on the utility’s rate base (capital stock), with the allowable return parameterized as s = (r/r₀)^{-γ}, declining as consumer electricity rates rise.

Used-and-useful standard: A prudence criterion under which a capital asset’s contribution to the rate base depends on its capacity utilization, modeled as a logit function of the generation-to-capacity ratio; fully used coal capacity contributes 1.144 times as much as CCNG per MW, while unused coal contributes only 40% as much.

Rate base: The capital stock on which the PUC grants the utility its allowable rate of return; adjusted by prudence and used-and-useful assessments and described in the paper as “at best an arduous task” to quantify precisely.

Averch-Johnson (AJ) over-investment effect: The tendency of regulated utilities to over-invest in capital because profits are proportional to the rate base; in this paper’s setting, this causes regulated utilities to increase CCNG capacity by 296% over 30 years following the natural gas price shock, compared to 58% for a cost minimizer.

Incentive regulation: A modification of cost-plus RoR regulation in which the allowable rate of return declines as electricity rates rise; it provides efficiency incentives for cost reduction but does not achieve first-best outcomes and is insufficient to overcome the used-and-useful distortion for legacy coal.

Out-of-dispatch-order generation: Running a generation unit when its fuel costs exceed the market import price; regulated utilities engage in this behavior with coal plants to maintain used-and-useful status and rate base contribution, whereas restructured utilities do not face this incentive.

Nested fixed-point indirect inference: The estimation approach used to recover structural regulatory and operations parameters by minimizing the distance between regression coefficients from actual data and those from model-simulated data via a non-linear parameter search.
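The indirect-inference idea can be shown with a toy example: choose the structural parameter so that a regression coefficient from simulated data matches the same coefficient from the observed data. The structural model, the auxiliary regression, the grid search, and all names below are hypothetical stand-ins. In particular, the toy has no inner fixed point, whereas the paper's procedure solves the utility's problem inside each parameter evaluation.

```python
import numpy as np

def ols_slope(x, y):
    """Auxiliary statistic: slope from regressing y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simulate(theta, x, eps):
    """Hypothetical structural model: y = tanh(theta * x) + 0.2 * eps."""
    return np.tanh(theta * x) + 0.2 * eps

rng = np.random.default_rng(1)
n = 2000
theta_true = 1.0

# "Observed" data, generated here from the toy model at theta_true.
x_data = rng.normal(size=n)
y_data = simulate(theta_true, x_data, rng.normal(size=n))
b_data = ols_slope(x_data, y_data)

# Simulation draws are fixed once and reused for every candidate theta.
x_sim = rng.normal(size=n)
eps_sim = rng.normal(size=n)

# Grid search for the theta whose simulated slope is closest to the data slope.
grid = np.linspace(0.1, 3.0, 291)
distance = np.array([
    (ols_slope(x_sim, simulate(th, x_sim, eps_sim)) - b_data) ** 2
    for th in grid
])
theta_hat = grid[np.argmin(distance)]
```

Holding the simulation draws fixed across candidate parameters keeps the distance function smooth in theta, so the search is not chasing simulation noise.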

Evaluating macroeconomic outcomes under asymmetries: Expectations matter

Layer 1 — Overview

Research Question

This paper investigates whether and how assumptions about household and firm expectations alter the macroeconomic implications of asymmetries commonly embedded in macroeconomic models. Specifically, it asks: when a model features a nonlinearity — such as an asymmetric monetary policy rule or a nonlinear Phillips curve — do the longer-run average outcomes and the distributional properties of inflation and unemployment depend on whether agents have rational expectations (RE, accounting for the possibility of future shocks) versus perfect foresight (PF, not anticipating future shocks)?

Methodology

The paper works within a standard three-equation New Keynesian model comprising an IS curve (linking the unemployment gap to the policy rate and the natural rate of interest via Okun’s law with coefficient c ≈ 2), a forward-looking Phillips curve, and a monetary policy rule. The model is parameterized at a quarterly frequency with β = 0.99, κ = 0.01, φπ = 1.5, φu = −0.25, shock persistence ρ_rn = 0.9, and shock standard deviation σ_rn = 0.0025 (calibrated to match a 1-percentage-point standard deviation of the unemployment gap under the symmetric baseline rule).

The key methodological distinction is the specification of the expectations operator. Under RE, agents use the true stochastic transition matrix for the natural rate (approximated via the Rouwenhorst method with 105 grid points). Under PF, agents instead use a transition matrix that always places probability one on the steady-state value of the natural rate next period — i.e., they do not anticipate future shocks. The model is solved globally with a discrete state space projection (parameterized expectations) method, applied identically to RE and PF cases. The authors first derive analytical results in a simplified three-state environment and then present numerical results from 3,000 simulations of 1,000 periods each.

Two types of asymmetry serve as case studies: (i) an asymmetric monetary policy rule — the “Shortfalls rule” — under which the central bank does not tighten in response to a tight labor market (negative unemployment gap), in the spirit of the FOMC’s 2020 framework update; and (ii) a nonlinear (kinked) Phillips curve that steepens by a factor of three when the labor market is tight (unemployment gap < 0), consistent with empirical evidence in Smith, Timmermann, and Wright (2025).

Main Findings

The core finding is that the sign and magnitude of longer-run average outcomes under asymmetric macroeconomic environments can differ substantially — and can even reverse — depending on whether agents have rational expectations or perfect foresight.

For the Shortfalls rule, under PF the model implies a longer-run tradeoff: average unemployment gap is −0.32 percentage points and average inflation gap is +0.25 annualized percentage points relative to the symmetric Deviations rule. PF thus suggests policymakers can lower average unemployment at modest inflationary cost. Under RE, however, this apparent tradeoff disappears entirely: the average unemployment gap is essentially zero (−0.05 percentage points) while average inflation is elevated by approximately 1.02 annualized percentage points. The gap in average inflation outcomes between RE and PF thus exceeds one percentage point, and the labor market benefit implied by PF is absent under RE.

For the nonlinear Phillips curve (under a symmetric Deviations rule with φu = 0), the results again diverge across expectations assumptions, and the direction of the effects reverses. Under PF, the kinked Phillips curve implies average inflation of +0.41 annualized percentage points and a comparatively small unemployment gap (+0.30 percentage points). Under RE, the average inflation gap is essentially zero while the average unemployment gap rises to +0.63 percentage points, the opposite directional pattern from PF.

The mechanism driving the RE–PF divergence is the interaction between forward-looking price-setters and an inflation-stabilizing central bank. Under RE, anticipated future episodes in which the asymmetry may bind (e.g., the Shortfalls rule providing accommodation, or the Phillips curve steepening) cause firms to set higher prices today. The central bank responds to the resulting pickup in inflation expectations with tighter policy, generating a persistent contractionary offset. This channel is absent under PF because agents expect no future shocks.

Robustness

The main conclusions are robust across three extensions: (i) Bounded rationality (following Gabaix 2020, with m_br = 0.97): outcomes move toward the PF case, confirming that what matters is the degree to which agents internalize the probability of future shocks; (ii) Cost-push shocks instead of natural rate shocks: the RE–PF divergence under a Shortfalls rule is broadly similar in direction and magnitude to the baseline; (iii) Alternative shock specifications: the qualitative conclusions are maintained.

Crucially, under the symmetric Deviations rule the RE and PF solutions are identical in all cases, confirming that the divergence is specific to models with macroeconomic asymmetries, not an artifact of the solution method.

In depth

Q1. What is the central methodological claim about perfect foresight solutions in asymmetric models?

The paper argues that in macroeconomic models with asymmetries or nonlinearities, perfect foresight solutions — in which agents do not account for the possibility that future shocks may occur — can yield longer-run average outcomes and distributions that differ from their rational expectations counterparts in magnitude and potentially in sign. The paper is explicit that this is not a critique of PF methods per se, as PF is often necessary for estimating larger models; rather, the point is that researchers should check the robustness of conclusions about longer-run averages using simplified models solvable under both approaches.

Q2. How is the difference between RE and PF operationalized in the model?

The sole technical distinction lies in the specification of the conditional expectations operator Et. Under RE, this operator uses the true stochastic Markov transition matrix for the natural rate (P^RE), which assigns positive probability to all feasible future states. Under PF, agents use a degenerate transition matrix (P^PF) that assigns probability one to the mean value of the natural rate next period regardless of the current state — effectively, agents expect no future innovations. The same global solution method (discrete state space projection with 105 Rouwenhorst grid points) is applied to both, so differences in equilibrium outcomes are entirely attributable to the expectation specification.
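The two transition matrices can be sketched as follows. The Rouwenhorst construction is the standard one for an AR(1) process; treating 0.0025 as the innovation standard deviation, centering the grid on zero as the steady state, and the function names are assumptions of this sketch.

```python
import numpy as np

def rouwenhorst(n, rho, sigma_eps):
    """Rouwenhorst discretization of x' = rho * x + eps, eps ~ N(0, sigma_eps**2).

    Returns an evenly spaced grid centered on zero and an n x n transition matrix.
    """
    p = (1.0 + rho) / 2.0
    P = np.array([[p, 1.0 - p], [1.0 - p, p]])
    for m in range(3, n + 1):
        P_new = np.zeros((m, m))
        P_new[:-1, :-1] += p * P
        P_new[:-1, 1:] += (1.0 - p) * P
        P_new[1:, :-1] += (1.0 - p) * P
        P_new[1:, 1:] += p * P
        P_new[1:-1, :] /= 2.0  # interior rows receive two contributions
        P = P_new
    sigma_x = sigma_eps / np.sqrt(1.0 - rho ** 2)
    psi = np.sqrt(n - 1) * sigma_x
    grid = np.linspace(-psi, psi, n)
    return grid, P

def perfect_foresight_matrix(n):
    """Degenerate matrix: probability one on the middle (steady-state) grid
    point next period, whatever the current state. Assumes n is odd."""
    P = np.zeros((n, n))
    P[:, n // 2] = 1.0
    return P

grid, P_re = rouwenhorst(n=105, rho=0.9, sigma_eps=0.0025)
P_pf = perfect_foresight_matrix(105)

# Conditional expectations of next period's natural-rate deviation.
expected_re = P_re @ grid  # equals rho * grid under the Rouwenhorst matrix
expected_pf = P_pf @ grid  # zero (the steady state) in every current state
```

Because both matrices live on the same grid, a solver can be run twice with only the matrix swapped, which matches the description that the expectations operator is the sole difference between the two cases.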

Q3. What are the analytical results for the Shortfalls rule in the simplified three-state model?

In the simplified environment with the natural rate taking three equiprobable values (low, steady-state, high) and no persistence, the analytical solution shows that under PF the average unemployment gap is −Δ/(1 + φπκ) < 0 and the average inflation gap is Δκ/(1 + φπκ) > 0, where Δ parameterizes the degree of additional accommodation in the high-demand state. Under RE, the average unemployment gap is exactly zero and the average inflation gap is Δ/(φπ − 1) > 0. The inflation gap under RE exceeds that under PF by Δ(1 + κ)/[(φπ − 1)(1 + φπκ)] > 0, and the unemployment gap under RE exceeds that under PF by Δ/(1 + φπκ) > 0. Thus, PF spuriously implies an exploitable long-run tradeoff that vanishes under RE.
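The closed-form expressions can be checked numerically. The sketch below evaluates them at the calibration quoted in the overview (φπ = 1.5, κ = 0.01) with a hypothetical value of Δ, and confirms that the stated RE minus PF differences follow from the four averages; the function name and the chosen Δ are illustrative.

```python
def shortfalls_averages(delta, phi_pi, kappa):
    """Average unemployment gap (u) and inflation gap (pi) under PF and RE
    in the simplified three-state model, using the formulas in the text."""
    pf = {
        "u": -delta / (1.0 + phi_pi * kappa),
        "pi": delta * kappa / (1.0 + phi_pi * kappa),
    }
    re = {
        "u": 0.0,
        "pi": delta / (phi_pi - 1.0),
    }
    return pf, re

phi_pi, kappa = 1.5, 0.01  # calibration from the overview
delta = 0.001              # hypothetical degree of extra accommodation

pf, re = shortfalls_averages(delta, phi_pi, kappa)

inflation_difference = re["pi"] - pf["pi"]
unemployment_difference = re["u"] - pf["u"]

# Closed-form differences stated in the text.
inflation_formula = delta * (1.0 + kappa) / ((phi_pi - 1.0) * (1.0 + phi_pi * kappa))
unemployment_formula = delta / (1.0 + phi_pi * kappa)
```

Combining the two inflation expressions over the common denominator (φπ − 1)(1 + φπκ) gives a numerator of Δ[(1 + φπκ) − κ(φπ − 1)] = Δ(1 + κ), which is the stated difference.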

Q4. What are the analytical results for the nonlinear Phillips curve in the simplified model, and how do the directions of the effects compare to the Shortfalls rule case?

Under PF with a nonlinear (kinked) Phillips curve, the average inflation gap is positive (= Δpc > 0) while the average unemployment gap is zero. Under RE, the signs reverse: the average unemployment gap is positive (= Δpc/κ > 0) and the average inflation gap is zero. The difference is ūRE − ūPF = Δpc/κ > 0 and π̄RE − π̄PF = −Δpc < 0. This sign reversal relative to the Shortfalls rule case illustrates that the directional error introduced by PF is not uniform but depends on the specific asymmetry — the key feature is always the absence, under PF, of the forward-looking price-setting channel interacting with monetary policy.

Q5. What is the quantitative magnitude of the RE–PF divergence in the numerical model for the Shortfalls rule?

In the fully parameterized numerical model (Table 2), under a Shortfalls rule the average inflation gap is 1.02 annualized percentage points under RE versus 0.25 annualized percentage points under PF — a difference of roughly 0.77 percentage points. The average unemployment gap is −0.05 percentage points under RE versus −0.32 percentage points under PF — a difference of 0.27 percentage points. The paper also notes that model-implied averages for inflation and nominal interest rates “under perfect foresight can easily differ by at least one percentage point from their rational expectations counterparts.”

Q6. How do the simulated distributions differ between RE and PF under a Shortfalls rule?

Under PF, the simulated distributions of unemployment and inflation gaps exhibit a pronounced kink near the steady-state value (zero gap), reflecting the asymmetric treatment of expansions and contractions. Under RE, the distributions are substantially more symmetric, and the inflation distribution is shifted to the right (mean of 1.0 versus 0.25 under PF). Standard deviations of the unemployment and inflation gaps are somewhat larger under PF (1.42 and 1.10, respectively) than under RE (1.33 and 1.03), because under RE the contractionary force from inflation expectations moderates the amplitude of fluctuations. These distributional differences have direct implications for how policymakers interpret the risks associated with state-contingent policies.

Q7. What is the role of the forward-looking pricing–central bank interaction in generating RE–PF differences?

The key mechanism is as follows: under RE, the possibility that the asymmetry may bind in the future (e.g., a positive demand shock triggering more accommodation under the Shortfalls rule, or a tight labor market steepening the Phillips curve) causes forward-looking firms to raise prices today in anticipation of future inflation. This increase in current inflation leads the central bank — whose mandate includes inflation stabilization — to raise policy rates, generating a contractionary offset even when the economy is not currently in the high-demand state. Under PF, agents do not form these anticipatory expectations, so this channel is entirely absent, and the asymmetry affects outcomes only when it directly binds.

Q8. Does the RE–PF divergence arise under a symmetric Deviations rule?

No. The paper shows analytically and numerically that when the monetary policy rule is symmetric (the Deviations rule, responding equally to deviations above and below target), the RE and PF solutions are identical. Unemployment and inflation gaps are both zero on average under either expectations assumption, and the policy rate gap is essentially zero (0.01 annualized percentage points) in both cases. This equivalence result confirms that the RE–PF divergence is not an artifact of the solution method or parameterization but is specifically generated by the interaction between an asymmetry and agents’ forward-looking behavior.

Q9. What do the bounded rationality results imply about the mechanism?

The extension following Gabaix (2020), with a myopia parameter m_br = 0.97, produces results that lie between the full-RE and PF cases: the adoption of the Shortfalls rule yields average unemployment of −0.26 percentage points (intermediate between RE’s −0.05 and PF’s −0.32) and average inflation of 0.62 annualized percentage points (between RE’s 1.02 and PF’s 0.25). This gradient confirms that the key driver is the extent to which agents internalize the probability of future shocks: the more forward-looking agents are, the more strongly the anticipatory pricing channel operates and the less favorable (and more inflationary) the apparent policy tradeoff becomes.
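One way to read these numbers is to ask how far the bounded-rationality averages sit along the path from PF to RE. The sketch below uses only the figures quoted above; the helper name is illustrative and not taken from the paper.

```python
def share_of_gap_closed(pf, br, re):
    """Fraction of the PF-to-RE distance covered by the bounded-rationality value."""
    return (br - pf) / (re - pf)


# Averages under the Shortfalls rule, as quoted in the text.
unemployment_share = share_of_gap_closed(pf=-0.32, br=-0.26, re=-0.05)
inflation_share = share_of_gap_closed(pf=0.25, br=0.62, re=1.02)
```

With these inputs the bounded-rationality case covers roughly 22 percent of the PF-to-RE distance for unemployment and roughly 48 percent for inflation, so both values lie strictly between the two polar cases.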

Q10. What are the results for the nonlinear Phillips curve in the numerical model?

Under the numerically calibrated nonlinear Phillips curve model (Panel B.3 of Table 3, with the slope increasing by a factor of three when the unemployment gap is negative), the average unemployment gap under RE is 0.63 percentage points versus 0.30 under PF, and the average inflation gap under RE is essentially zero (0.01 annualized percentage points) versus 0.41 under PF. The authors note that “the average outcomes for both unemployment and inflation can differ by roughly 0.3 to 0.4 percentage points between rational expectations and perfect foresight” in this case.

Q11. What is the paper’s advice for researchers who must use perfect foresight methods?

The paper explicitly states that PF methods remain valuable, especially for estimating or simulating larger models with heterogeneity at the micro level where RE solutions are computationally prohibitive. The authors recommend that researchers relying on PF to solve larger models “check the robustness of their conclusions on longer-run averages and the distribution of outcomes using simplified models which can be solved under both perfect foresight and rational expectations.” To support this, the authors provide multiple versions of code for solving simple macroeconomic models under various asymmetries and expectations assumptions.

Q12. How does the paper position its contribution relative to prior work on RE vs. PF in asymmetric models?

The paper acknowledges that Adam and Billi (2007) and Nakov (2008) previously documented that, at the zero lower bound, households’ anticipation of future ZLB episodes leads to lower average inflation — an RE–PF difference in the spirit of this paper’s findings. However, the paper’s contribution is to show that the sign and quantitative implications of a given asymmetry can change depending on the expectations assumption, and to systematically characterize this sensitivity across multiple types of asymmetry (asymmetric policy rules and nonlinear Phillips curves). The paper also categorizes the existing literature by expectations assumptions in Table A.1, showing that many papers examining macroeconomic asymmetries use only one approach.

Key Concepts

Shortfalls Rule: A monetary policy rule, motivated by the FOMC’s 2020 Statement on Longer-Run Goals and Monetary Policy Strategy, under which the central bank responds only to shortfalls of employment from its maximum level; that is, it does not tighten policy in response to a tight labor market (negative unemployment gap) during an expansion. Formally, i_t = φπ π_t + φu u_t when u_t ≥ 0 (labor market slack), and i_t = φπ π_t when u_t < 0 (labor market tight). Contrasts with the symmetric Deviations rule, which responds to deviations of employment in both directions.
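The piecewise definition can be written as a pair of small functions. This is a sketch under stated assumptions: the coefficient values are hypothetical, and the negative sign on phi_u (so that slack lowers the policy rate) is an assumed convention rather than something stated in the summary.

```python
def shortfalls_rule(pi_gap, u_gap, phi_pi, phi_u):
    """Policy rate gap that responds to unemployment only under slack (u_gap >= 0)."""
    if u_gap >= 0:
        return phi_pi * pi_gap + phi_u * u_gap
    return phi_pi * pi_gap


def deviations_rule(pi_gap, u_gap, phi_pi, phi_u):
    """Symmetric benchmark: responds to the unemployment gap whatever its sign."""
    return phi_pi * pi_gap + phi_u * u_gap


# Hypothetical coefficients; phi_u < 0 is an assumed sign convention.
PHI_PI, PHI_U = 1.5, -1.0
slack_shortfalls = shortfalls_rule(0.5, 1.0, PHI_PI, PHI_U)
slack_deviations = deviations_rule(0.5, 1.0, PHI_PI, PHI_U)
tight_shortfalls = shortfalls_rule(0.5, -1.0, PHI_PI, PHI_U)
tight_deviations = deviations_rule(0.5, -1.0, PHI_PI, PHI_U)
```

The two rules coincide whenever the labor market is slack and differ only in the tight state, where the Deviations rule adds the extra tightening term that the Shortfalls rule omits.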

Deviations Rule: A symmetric monetary policy rule in which the central bank responds to the unemployment gap regardless of its sign — tightening in expansions and easing in contractions. Serves as the baseline against which the Shortfalls rule is compared, and as the case in which RE and PF solutions are identical.

Perfect Foresight (PF) Equilibrium: An equilibrium in which agents solve their optimization problems assuming that no future shocks will occur — they expect all endogenous variables to converge to their longer-run (steady-state) values next period, regardless of the current state. In the paper’s notation, the PF transition matrix P^PF assigns probability one to the mean state next period. In linear models, PF and RE yield identical outcomes; in models with asymmetries, they diverge.

Rational Expectations (RE) Equilibrium: An equilibrium in which households and firms correctly account for the full stochastic distribution of future shocks in forming their expectations. Agents use the true Markov transition matrix P^RE for the natural rate process. This allows forward-looking pricing behavior to incorporate the possibility that the economy may enter states in which asymmetries bind in the future.
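The difference between the two expectations assumptions comes down to which transition matrix agents use. Below is a minimal sketch for the three-state, no-persistence case described in the simplified model; the state-contingent inflation values are hypothetical and serve only to show how the two matrices produce different expectations.

```python
def re_transition_iid(probs):
    """RE matrix without persistence: every row equals the true distribution."""
    return [list(probs) for _ in probs]


def pf_transition(n_states, mean_index):
    """PF matrix: from any state, probability one on the mean state next period."""
    return [[1.0 if j == mean_index else 0.0 for j in range(n_states)]
            for _ in range(n_states)]


def expected_next(P, values, state):
    """Expected value next period given the current state and matrix P."""
    return sum(p * v for p, v in zip(P[state], values))


# States ordered (low, steady-state, high); the mean state has index 1.
P_re = re_transition_iid([1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0])
P_pf = pf_transition(3, mean_index=1)

# Hypothetical inflation gaps by state: an asymmetry that binds only in the high state.
inflation_by_state = [0.0, 0.0, 0.6]
exp_re = expected_next(P_re, inflation_by_state, state=1)
exp_pf = expected_next(P_pf, inflation_by_state, state=1)
```

Under RE the agent in the steady state expects positive inflation next period because the high state has positive probability; under PF the same agent expects exactly zero.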

Nonlinear (Kinked) Phillips Curve: A Phillips curve in which the slope coefficient κ̃_t is state-contingent, increasing when the unemployment gap is negative (labor market is tight). In the paper’s numerical implementation, the slope triples (κ̃_t = 3κ) when u_t < 0, consistent with empirical evidence in Smith, Timmermann, and Wright (2025) on structural breaks in the Phillips curve. The nonlinearity generates an asymmetric inflationary response: an unemployment gap of a given absolute size moves inflation by more when the labor market is tight than when it is slack.
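The state-contingent slope is simple to express in code. The sketch below assumes a hypothetical baseline slope and, for the contemporaneous term, an assumed sign convention in which inflation pressure equals minus the slope times the unemployment gap; the forward-looking expectation term of the Phillips curve is deliberately left out.

```python
def phillips_slope(u_gap, kappa, tight_multiplier=3.0):
    """Slope that is multiplied by tight_multiplier when the unemployment gap is negative."""
    return kappa * tight_multiplier if u_gap < 0 else kappa


def slack_term(u_gap, kappa, tight_multiplier=3.0):
    """Contemporaneous inflation pressure, assuming the form -slope * u_gap."""
    return -phillips_slope(u_gap, kappa, tight_multiplier) * u_gap


KAPPA = 0.1  # hypothetical baseline slope, not the paper's calibration
pressure_tight = slack_term(-1.0, KAPPA)
pressure_slack = slack_term(1.0, KAPPA)
```

A one-point tight gap raises inflation pressure three times as much as a one-point slack gap lowers it, which is the asymmetry the entry describes.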

Stochastic Steady State: The equilibrium to which the economy converges in the absence of additional shocks, taking into account the stochastic nature of the environment (i.e., accounting for the possibility of future shocks). Used as the initial condition for computing impulse response functions under RE. Contrasts with the deterministic steady state (zero gaps), which serves as the initial condition under PF.

Parameterized Expectations (Global Solution) Method: The numerical solution algorithm used in the paper to solve for equilibrium policy functions for unemployment and inflation gaps over the state space. Implemented identically for RE and PF cases, differing only in the transition matrix used. Applied with 105 Rouwenhorst grid points for the natural rate. The paper shows this method is orders of magnitude faster than the more common shooting algorithm (0.04 seconds vs. 10.8 seconds) while yielding identical policy functions.
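The summary mentions 105 Rouwenhorst grid points without describing the construction. For readers unfamiliar with it, the following is a sketch of the standard Rouwenhorst discretization of an AR(1) process in pure Python. The persistence rho and innovation standard deviation sigma_eps are placeholder values, not the paper's calibration, and the paper's own code may be organized differently.

```python
import math


def rouwenhorst(n, rho, sigma_eps):
    """Discretize y' = rho * y + eps, eps ~ N(0, sigma_eps^2), on n grid points.

    Returns (grid, P), where P[i][j] is the probability of moving from i to j.
    """
    if n < 2:
        raise ValueError("need at least two grid points")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    p = (1.0 + rho) / 2.0
    P = [[p, 1.0 - p], [1.0 - p, p]]
    for m in range(3, n + 1):
        new = [[0.0] * m for _ in range(m)]
        for i in range(m - 1):
            for j in range(m - 1):
                x = P[i][j]
                new[i][j] += p * x
                new[i][j + 1] += (1.0 - p) * x
                new[i + 1][j] += (1.0 - p) * x
                new[i + 1][j + 1] += p * x
        # Interior rows receive two sets of contributions, so halve them.
        for i in range(1, m - 1):
            new[i] = [v / 2.0 for v in new[i]]
        P = new
    sigma_y = sigma_eps / math.sqrt(1.0 - rho ** 2)
    half_width = math.sqrt(n - 1) * sigma_y
    step = 2.0 * half_width / (n - 1)
    grid = [-half_width + k * step for k in range(n)]
    return grid, P


# Placeholder persistence and volatility; only the grid size comes from the text.
grid, P = rouwenhorst(105, rho=0.9, sigma_eps=0.01)
```

Each row of P sums to one and the grid is symmetric around zero, so the same grid can be paired with either the RE matrix P or a PF matrix that places all mass on the middle state.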

Bounded Rationality (Gabaix 2020): An extension of the baseline model in which agents discount the influence of future expectations by a myopia parameter m_br ∈ (0, 1), applied to both the IS curve and the Phillips curve. The parameter m_br = 0.97 (following McKay, Nakamura, and Steinsson 2017) limits the degree to which distant future states affect current decisions. Produces outcomes intermediate between full RE and PF, confirming that the key dimension of variation is the extent to which agents internalize the probability of future shocks.

Genetic Prediction and Adverse Selection


This paper asks how much adverse selection would arise in critical illness insurance (CII) markets if consumers can observe polygenic indexes (PGIs) — genetic risk scores derived from millions of genetic variants — while insurers are legally barred from using genetic information. The authors develop an econometric method that measures selection under current PGI technology, then extends identification to expected future PGI accuracy using heritability bounds, even though future PGIs are not yet observable in data.

The primary dataset is the UK Biobank (UKB), comprising approximately 446,570 genotyped individuals of European-like ancestry linked to NHS electronic health records. The authors study seven single-disease CII contracts (Alzheimer’s disease, breast cancer, coronary artery disease, colorectal cancer, prostate cancer, schizophrenia, and type 2 diabetes) and multiple-disease bundled contracts paying a lump sum upon onset. The econometric model assumes a probit disease probability and a Gaussian PGI structure, and identification relies on published heritability estimates to pin down future PGI predictive power. The key selection metric is the implicit tax proposed by Hendren (2013): the percentage markup a marginal consumer must pay above her actuarially fair price due to adverse selection. The authors use the minimum implicit tax up to the 80th percentile of risk (t80) as their summary statistic, with market unraveling benchmarked at t80 between 43% and 83% in prior literature.

The paper reports three main findings, all scoped to a population of 35-year-olds in the standard insurer risk class (those whose predicted risk falls within 0.75–1.25 times the population mean).

First, under current PGI technology with full consumer adoption, selection is noticeable but heterogeneous across diseases. t80 ranges from 17.9% for coronary artery disease to 117.9% for Alzheimer’s disease. Coronary artery disease and colorectal cancer fall in the middle of the no-unraveling range; breast cancer, schizophrenia, and type 2 diabetes fall between the no-unraveling and unraveling ranges; Alzheimer’s disease and prostate cancer (t80 = 59.8%) reach or exceed the unraveling range. The current prostate cancer PGI explains 9.9% of liability variance, adding 8.3 percentage points over the 22.9% explained by non-genetic covariates.

Second, under expected future PGI accuracy (bounded below by SNP heritability and above by twin heritability), selection becomes potentially crippling. Under the lower bound (Scenario 3L), t80 ranges from 57.5% for breast cancer to above 1,000% for Alzheimer’s. Under the upper bound (Scenario 3U), t80 exceeds 100% for all seven single-disease contracts and exceeds 1,000% for three of them. For prostate cancer, the reference case, t80 reaches 86.8% under Scenario 3L and 426.9% under Scenario 3U, both above the 43–83% unraveling benchmark range. For multiple-disease male contracts, t80 = 30.8% under current technology, rising to 54.4% (Scenario 3L) and 243.9% (Scenario 3U).

Third, variation in selection across contracts is driven primarily by three factors: the predictive power of the future PGI, its incremental predictive power over non-genetic covariates, and disease prevalence. Alzheimer’s and schizophrenia, which combine high heritability with low prevalence, display the highest implicit taxes; breast and colorectal cancer, which have lower SNP heritability and lower incremental R2, display the lowest.

These findings are corroborated by a calibrated Akerlof-Einav-Finkelstein equilibrium model using HRS data: current PGI availability reduces equilibrium market quantity from 30% to 21.4%; future PGI availability drives equilibrium quantity to zero in a full adverse selection death spiral. Partial take-up robustness checks show that even at 50% consumer adoption, selection remains problematically high under future PGI accuracy for most contracts. The analysis is restricted to individuals of European-like ancestry due to data availability constraints.

Q: What is the core market failure the paper analyzes? A: The paper analyzes adverse selection arising from an asymmetric information gap: consumers can observe PGI-based disease risk predictions from consumer genetic tests (e.g., 23andMe), while insurers in many jurisdictions are legally prohibited from requesting or using genetic information. This creates a situation where high-risk consumers have private information allowing them to sort into insurance, driving up average claims costs and potentially unraveling the market.

Q: What is a polygenic index (PGI) and why does it differ from classical genetic testing? A: A PGI is a weighted sum of millions of genetic variants (typically over one million) each with individually tiny effects, constructed using effect-size estimates from genome-wide association studies (GWASs). This contrasts with traditional genetic testing focused on rare single-gene mutations (e.g., BRCA for breast cancer or PKD for kidney disease), which are rare, explain small shares of population-level disease variance, and can largely be inferred from family history. PGIs target common polygenic diseases and are the primary driver of the adverse selection concern because they aggregate diffuse genetic signals into a meaningful risk prediction.

Q: What are the current PGI R2 values for the seven diseases studied? A: Estimated on the liability scale in the UKB, current PGI R2 values are: Alzheimer’s disease 7.1%, breast cancer 6.7%, coronary artery disease 2.5%, colorectal cancer 2.2%, prostate cancer 9.9%, schizophrenia 4.9%, and type 2 diabetes 7.4%. These represent the share of liability variance explained by each disease’s current PGI in the study sample.

Q: How does the paper identify the degree of selection under future PGI technology that does not yet exist in the data? A: The identification strategy combines three elements: the normality of PGI distributions, the relationship between current and future PGIs (the current PGI is modeled as a noisy version of the future PGI with an independent Gaussian error), and published heritability estimates that bound the future PGI’s predictive power. Theorem 1 establishes that under five stated assumptions — including a probit disease model and known future R2 from heritability studies — the full joint distribution of loss, current PGI, future PGI, and non-genetic covariates is identified from observed data.

Q: What heritability bounds are used for the future PGI scenarios, and why two bounds? A: Scenario 3L sets future PGI R2 equal to each disease’s SNP heritability (estimated from common genetic variants), which the authors treat as a conservative lower bound because future PGIs will also incorporate rarer variants with better effect-size precision. Scenario 3U sets future PGI R2 equal to twin heritability, treating it as an upper bound since the theoretical maximum predictive power of a PGI is the trait’s narrow-sense heritability. For prostate cancer, these bounds are 18.0% (SNP) and 57.0% (twin); for Alzheimer’s, SNP heritability is 33.1% and twin heritability is 58%.
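The noisy-proxy structure (current PGI equals future PGI plus independent Gaussian error) implies a simple link between the two R2 values: the current R2 equals the future R2 times the squared correlation between the two PGIs. The sketch below backs out that correlation for prostate cancer, using the 9.9% current PGI R2 reported in the summary together with the two bounds above. It assumes the error in the current PGI is independent of everything else and that the quoted R2 values are raw shares of liability variance; the paper's exact construction may differ.

```python
import math


def implied_correlation(r2_current, r2_future):
    """Correlation between current and future PGI implied by the noisy-proxy model."""
    if not 0.0 < r2_current <= r2_future:
        raise ValueError("expected 0 < r2_current <= r2_future")
    return math.sqrt(r2_current / r2_future)


# Prostate cancer figures quoted in the summary.
corr_scenario_3l = implied_correlation(0.099, 0.180)  # SNP heritability bound
corr_scenario_3u = implied_correlation(0.099, 0.570)  # twin heritability bound
```

Under these assumptions the current prostate cancer PGI would correlate at about 0.74 with a future PGI at the SNP bound and about 0.42 with one at the twin bound, which gives a sense of how much noise the current index still carries.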

Q: What is the implicit tax and how is it used as a benchmark? A: The implicit tax t(r) for a consumer with private risk r equals the percentage by which her insurance cost exceeds her own actuarially fair price when she must pool with all consumers of equal or higher risk. It measures how much the marginal buyer overpays due to adverse selection. The authors follow Hendren (2013) in reporting t80, the minimum implicit tax up to the 80th percentile of risk. Hendren’s benchmarks are t80 between 7% and 35% for markets that did not unravel and between 43% and 83% for markets that had unraveled.
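The verbal definition can be turned into a small calculation on a sample of private risks. This is a simplified sketch, not the paper's estimator: it treats the pooled price as the average risk of everyone at or above r and compares it with r directly, whereas the published definition may be stated in different terms (for example odds rather than probabilities) and is applied to an estimated continuous distribution. The risk values are made up for illustration.

```python
def implicit_tax(risks, r):
    """Markup of the pooled price (mean risk among those with risk >= r) over r."""
    pool = [x for x in risks if x >= r]
    return (sum(pool) / len(pool)) / r - 1.0


def min_implicit_tax(risks, percentile=80):
    """Minimum implicit tax over consumers up to the given risk percentile."""
    ordered = sorted(risks)
    # Integer ceiling of len * percentile / 100, with at least one candidate.
    cutoff = max(1, (len(ordered) * percentile + 99) // 100)
    return min(implicit_tax(ordered, r) for r in ordered[:cutoff])


# Hypothetical private risks for five consumers in one risk class.
sample_risks = [0.01, 0.02, 0.03, 0.04, 0.10]
t80 = min_implicit_tax(sample_risks, percentile=80)
```

In this toy sample the minimum is reached at the consumer with risk 0.04, who would pay 75 percent above her fair price when pooled with the single higher-risk consumer, so t80 = 0.75.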

Q: What are the single-disease contract results under current PGI technology (Scenario 2)? A: With full consumer adoption of current PGI technology, t80 ranges from 17.9% for coronary artery disease to 117.9% for Alzheimer’s disease. Coronary artery disease (17.9%) and colorectal cancer (26.5%) fall in the middle of Hendren’s no-unraveling range. Breast cancer (36.9%), schizophrenia (42.1%), and type 2 diabetes (37.0%) fall between the no-unraveling and unraveling ranges. Alzheimer’s disease (117.9%) and prostate cancer (59.8%) reach or exceed the unraveling range.

Q: What are the single-disease contract results under future PGI technology? A: Under the lower bound (Scenario 3L, R2 = SNP heritability), t80 ranges from 57.5% for breast cancer to above 1,000% for Alzheimer’s disease. Under the upper bound (Scenario 3U, R2 = twin heritability), t80 exceeds 100% for all seven contracts and exceeds 1,000% for three (Alzheimer’s, schizophrenia, and at least one other). These figures substantially exceed Hendren’s unraveled-market benchmarks for virtually all contracts.

Q: What drives cross-disease variation in the implicit tax? A: The authors identify three main drivers: the expected accuracy of future PGI (higher heritability → higher implicit tax), the incremental predictive power of the future PGI over non-genetic covariates observable by insurers (more incremental information → more adverse selection), and disease prevalence (lower prevalence concentrates risk heterogeneity, amplifying selection). Alzheimer’s disease and schizophrenia — high heritability and low prevalence — have the highest implicit taxes. Breast and colorectal cancers — lower SNP heritability and lower incremental R2 — have the lowest.

Q: What do the multiple-disease bundled contract results show? A: For the male multiple-disease contract under Scenario 2 (current PGI), t80 = 30.8%, comparable to Hendren’s no-unraveling range. Under Scenario 3L, t80 = 54.4%; under Scenario 3U, t80 = 243.9%, both in or above the unraveling range. The female contract yields qualitatively similar results. Implicit taxes in bundled contracts are generally lower than in single-disease contracts, suggesting some diversification of genetic risk across diseases.

Q: What does the calibrated equilibrium model find? A: Using an Akerlof (1970) / Einav-Finkelstein-Cullen (2010) supply-and-demand model calibrated to match a 30% market participation rate and a 50% loss ratio in the UK CII market, and using HRS data on individual risk aversion, the model finds that current PGI availability reduces equilibrium quantity from 30% to 21.4%. Future PGI availability (both Scenario 3L and 3U) drives equilibrium quantity to zero — a complete adverse selection death spiral with no trade.

Q: How robust are results to partial consumer adoption of genetic testing? A: At 10% consumer take-up, selection is low regardless of PGI accuracy. At 50% take-up, selection remains problematically high for all single-disease contracts under future PGI accuracy (Scenarios 3L and 3U). For multiple-disease contracts at 50% take-up, t80 falls just below Hendren’s unraveling threshold under Scenario 3L but enters the unraveling range under Scenario 3U. This suggests market problems would materialize once predictive power exceeds the SNP heritability bound and take-up exceeds roughly 50%.

Q: What role do risk preferences play, and do they confound the results? A: The authors test whether risk tolerance correlates with disease risk in the UKB using a self-reported general risk tolerance measure. They find extremely low correlations between risk tolerance and each disease. This is consistent with the low correlation between relative risk aversion and disease risk in the HRS calibration, and supports the conclusion that correlation between risk and risk preferences is unlikely to meaningfully affect the main results.

Q: What is the paper’s assessment of preventive treatment as a mitigating factor? A: The authors acknowledge that genetic testing could enable personalized preventive medicine, which would reduce actual disease incidence among high-risk individuals. However, they argue this is unlikely to substantially affect their main findings because the most commonly covered diseases under CII are cancers, for which preventive behaviors have bounded effectiveness.

Q: What are the paper’s policy implications? A: The paper situates the genetic information problem within the standard regulatory framework for selection markets, distinguishing laissez-faire (allow genetic underwriting — efficient but potentially unfair to high-risk consumers), government provision (unattractive for non-essential CII), and managed competition (community rating combined with subsidies and risk adjustment). The authors argue that a full ban on genetic underwriting — the current policy in many countries — may become untenable as PGI accuracy improves, because it generates potentially crippling adverse selection. Some level of community rating may remain desirable for redistribution, but needs to be paired with subsidies or risk adjustment to prevent market collapse.

Q: What are the main data and scope limitations? A: The analysis is restricted to individuals of European-like ancestry because most large GWASs were conducted in European ancestry samples and PGIs perform poorly across ancestries. The UKB sample was aged 40–69 at recruitment and the analysis adjusts for age-dependent covariates; the HRS replication uses approximately 20,000 individuals. The equilibrium model ignores moral hazard and uses a parsimonious binary loss framework. The paper does not specify a timeline for when PGI accuracy will reach heritability bounds.

Polygenic Index (PGI): A weighted sum of an individual’s genetic variants across the genome (typically over one million variants), constructed using effect-size estimates from a genome-wide association study (GWAS) conducted in an independent sample. It is a noisy proxy for the individual’s true additive genetic factor for a disease, and its predictive power is bounded above by the trait’s narrow-sense heritability.

Implicit Tax: A measure of adverse selection defined by Hendren (2013) as the percentage by which a consumer with private risk r must overpay relative to her own actuarially fair price if she is pooled with all consumers of equal or higher risk. The minimum implicit tax up to the 80th percentile of risk (t80) serves as the paper’s primary summary statistic; t80 above roughly 43% is associated with market unraveling in prior literature.

SNP Heritability: The share of variance in a disease’s liability attributable to the set of common genetic variants (SNPs) used in heritability estimation. Used in this paper as a conservative lower bound on the predictive power of future PGIs, because future PGIs will additionally capture rarer variants.

Twin Heritability: An estimate of a trait’s narrow-sense (additive) heritability computed by comparing resemblance of monozygotic twins (sharing 100% of their genomes) to dizygotic twins (sharing ~50% on average). Used as an upper bound on future PGI predictive power, since heritability is the theoretical maximum R2 for a PGI.

Standard Risk Class: The set of consumers whose predicted disease risk (based on non-genetic covariates observable to insurers) falls between 0.75 and 1.25 times the population-wide average risk, following standard insurance underwriting practice. Insurers charge the same premium to all consumers in this class; any variation in risk within the class due to private genetic information constitutes the source of adverse selection analyzed in this paper.
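The class boundaries amount to a simple filter on predicted risk. The sketch below uses made-up predicted risks and assumes the population-wide average is computed over the same list, which is an illustrative simplification.

```python
def in_standard_class(predicted_risk, population_mean, lower=0.75, upper=1.25):
    """True when predicted risk lies within [lower, upper] times the population mean."""
    return lower * population_mean <= predicted_risk <= upper * population_mean


# Hypothetical predicted risks based on non-genetic covariates.
predicted = [0.02, 0.04, 0.05, 0.09]
mean_risk = sum(predicted) / len(predicted)
standard_class = [r for r in predicted if in_standard_class(r, mean_risk)]
```

Only the consumers with predicted risks of 0.04 and 0.05 fall in the standard class here; any genetic information they hold privately would then vary within a group that pays a single premium.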

Private Risk Function: The probability ρ(g, w) of contracting the disease conditional on both the consumer’s observed PGI g and non-genetic factors w. Contrasted with the non-genetic private risk function π(w), which conditions only on non-genetic covariates. The dispersion of the private risk distribution across consumers in the same risk class determines the degree of adverse selection.

Adverse Selection Death Spiral: The Akerlof (1970) mechanism in which high-risk consumers disproportionately purchase insurance, causing insurers to raise premiums, which deters low-risk consumers, which further raises the average risk of purchasers, ultimately driving equilibrium quantity to zero. The paper’s calibrated equilibrium model finds this outcome under future PGI accuracy for the HRS CAD contract.
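The spiral can be illustrated with a toy iteration of break-even pricing. This is not the paper's calibrated model, which relies on risk aversion estimates from the HRS; here every consumer is simply assumed to accept any price up to her own risk times (1 + wtp_markup), and both the risks and the markup are hypothetical.

```python
def equilibrium_buyers(risks, wtp_markup, max_rounds=100):
    """Iterate average-cost pricing until the set of buyers stops shrinking."""
    buyers = list(risks)
    for _ in range(max_rounds):
        if not buyers:
            return buyers
        # Break-even price per unit of cover equals the buyers' average risk.
        price = sum(buyers) / len(buyers)
        staying = [r for r in buyers if r * (1.0 + wtp_markup) >= price]
        if len(staying) == len(buyers):
            return buyers
        buyers = staying
    return buyers


# Dispersed private risks: the pool shrinks round by round.
dispersed = equilibrium_buyers([0.01, 0.02, 0.03, 0.04, 0.10], wtp_markup=0.2)
# Compressed private risks: everyone stays at the pooled price.
compressed = equilibrium_buyers([0.036, 0.038, 0.04, 0.042, 0.044], wtp_markup=0.2)
```

With dispersed risks the pool shrinks to the single riskiest consumer, the finite-sample analogue of quantity going to zero, while with compressed risks the whole class stays insured. More accurate private risk information moves a market from the second case toward the first.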

Germs in the Family: The Short- and Long-Term Consequences of Intra-Household Disease Spread


This paper studies the short- and long-term consequences of intra-household respiratory disease transmission from older to younger siblings in Danish families. The central research questions are: (1) how do respiratory illnesses spread from preschool-aged older siblings to younger infant siblings during the first year of life, and (2) how does respiratory disease exposure during infancy causally affect younger siblings’ long-term economic, human capital, and health outcomes?

The study uses population-level Danish administrative data covering 1,230,180 children from 37 birth cohorts (1981–2017), linking records from the National Patient Register, income and labor market registers, education registers, and psychiatric care registers. The identification strategy combines birth order variation in respiratory disease vulnerability with within-municipality variation in local respiratory disease prevalence among children aged 13–71 months. The authors construct a municipality-level disease exposure index—cumulative respiratory hospitalizations per 100 children aged 13–71 months in a child’s municipality over their first 12 months of life—and estimate the differential effect of this index on younger versus older siblings, controlling for municipality fixed effects, birth year-month fixed effects, and an extensive set of individual and family background characteristics.

The descriptive findings are stark: younger siblings have 2–3 times higher rates of hospitalization for acute respiratory conditions during their first year of life compared to older siblings at the same age, with the gap largest at ages two and three months. The gap is larger for winter births, shorter birth spacing, and when older siblings attend childcare centers—all patterns consistent with the older sibling serving as a disease vector.

On the causal estimates, moving from the 25th to the 75th percentile of the disease exposure index distribution increases the younger sibling’s acute respiratory hospitalizations in the first year of life by 0.023 (32.9 percent above the sample mean), with effects more than twice as large for exposure in the first six months compared to the second six months.

In the long run, an interquartile increase in first-year respiratory disease exposure reduces younger siblings’ wage earnings (conditional on employment) at ages 25–32 by 0.8 percent and total income by 0.8 percent, and reduces their income percentile rank by 0.3 percentage points. There is no significant effect on labor force participation at the extensive margin. Effects on earnings are approximately twice as large when exposure is measured in the first six months of life. These earnings effects are comparable in magnitude to those from a 10 percent reduction in birth weight or a 9 percent increase in ambient air pollution at birth, and correspond to roughly two-thirds of the adult earnings impact of in utero exposure to the 1918 Spanish Influenza. When the disease index interaction is included, the main birth order coefficient declines by approximately 70 percent, suggesting intra-household disease transmission is an important channel underlying the documented birth order earnings disadvantage.

Additional findings include: a 0.5 percentage point reduction in high school graduation and a 0.6 percentage point reduction in college graduation (interquartile effects); a 0.01 standard deviation penalty in ninth grade Danish test scores; a 20 percent increase (0.016 per hundred per year) in chronic respiratory hospitalizations at ages 16–26; and a 6.1 percent increase (0.5 additional visits per hundred per year) in psychiatric clinic visits at ages 16–26. Breastfeeding mitigates short-term effects, with 15 months of breastfeeding sufficient to entirely offset the elevated hospitalization risk.

Scope conditions: findings apply to second-born relative to first-born children in Danish sibling pairs with at least 11 months birth spacing; long-term estimates are net of parental compensatory responses and any immunity benefits, and thus represent lower bounds of the uncompensated biological impact of respiratory illness in infancy.

Q: What is the magnitude of the birth order gap in acute respiratory hospitalizations during infancy, and what patterns support an intra-household transmission mechanism? A: Younger siblings have 2–3 times higher hospitalization rates for acute respiratory conditions in the first year of life compared to older siblings at the same age, with the gap especially large at ages two and three months. The gap is larger for winter births (when respiratory viruses circulate more), for siblings with shorter birth spacing, and when the older sibling attends a childcare center. Hospitalizations for non-infectious digestive diseases and injuries show no analogous birth order differences, ruling out differential parental healthcare-seeking as an explanation.

Q: How is the disease exposure index constructed and what variation does it exploit? A: The index is the cumulative count of acute respiratory hospitalizations per 100 children aged 13–71 months in a child’s municipality over their first 12 months of life, with the older sibling excluded from the count when applicable. It exploits irregular spatial and temporal waves of respiratory viruses (such as RSV and influenza) across Danish municipalities. The interquartile range of this index captures meaningful variation in community disease burden faced by infants across different places and years.
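The construction can be sketched for a single municipality as follows. The summary does not say whether the cumulative figure is a sum of monthly rates or a total count divided by an average population; this sketch assumes the former. The monthly series, the function name, and the handling of the older sibling (subtracted from the numerator only) are all illustrative assumptions.

```python
def exposure_index(hospitalizations, children, start, window=12,
                   sibling_hospitalizations=None):
    """Sum of monthly respiratory hospitalizations per 100 children aged 13-71 months.

    hospitalizations and children are monthly series for one municipality;
    start is the index of the child's first month of life.
    """
    end = start + window
    if len(hospitalizations) != len(children):
        raise ValueError("series must have the same length")
    if start < 0 or end > len(hospitalizations):
        raise ValueError("series must cover the full exposure window")
    total = 0.0
    for t in range(start, end):
        count = hospitalizations[t]
        if sibling_hospitalizations is not None:
            count -= sibling_hospitalizations[t]  # leave out the older sibling
        total += 100.0 * count / children[t]
    return total


# Hypothetical monthly data for one municipality.
hosp = [2, 1, 0, 3, 4, 2, 1, 0, 0, 1, 2, 3]
kids = [400] * 12
sibling = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

index_all = exposure_index(hosp, kids, start=0)
index_excl = exposure_index(hosp, kids, start=0, sibling_hospitalizations=sibling)
```

In this example the index is 4.75 hospitalizations per 100 children over the first year, falling to 4.5 once the older sibling's own hospitalization is left out.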

Q: What is the first-stage relationship between the disease index and infant hospitalizations? A: Moving from the 25th to the 75th percentile of the disease index increases younger siblings’ acute respiratory hospitalizations in the first year of life by 0.023 (a 32.9 percent increase relative to the sample mean), while the effect on older siblings is substantially smaller. The interaction coefficient in the preferred specification implies that one additional hospitalization per 100 community children aged 13–71 months raises the younger sibling’s hospitalization count by 0.012 more than the older sibling’s. Effects are more than twice as large for exposure in the first compared to the second six months of life.
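Because the interquartile effect is reported both in levels and relative to the sample mean, the baseline it is measured against can be backed out. This is a back-of-the-envelope sketch using the two rounded figures quoted above, so the result is approximate; the helper name is illustrative.

```python
def implied_baseline(absolute_effect, relative_effect):
    """Baseline mean implied by an effect reported in both levels and relative terms."""
    if relative_effect <= 0:
        raise ValueError("relative_effect must be positive")
    return absolute_effect / relative_effect


# 0.023 additional hospitalizations is described as 32.9 percent of the sample mean.
baseline_hospitalizations = implied_baseline(0.023, 0.329)
```

The implied baseline is roughly 0.07 acute respiratory hospitalizations per child in the first year of life.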

Q: What are the estimated long-term effects on adult earnings, and how do they compare to benchmarks in the literature? A: An interquartile increase in first-year respiratory disease exposure reduces younger siblings’ wage earnings at ages 25–32 by 0.8 percent and total income by 0.8 percent, with a 0.3 percentage point reduction in income percentile rank. These magnitudes are comparable to a 1 percent earnings reduction from a 10 percent birth weight reduction (Black et al., 2007), a 1 percent earnings reduction from a 9 percent increase in ambient air pollution (Isen et al., 2017b), and roughly two-thirds of the in utero Spanish Influenza effect (Almond, 2006).

Q: Does the birth order earnings disadvantage reflect intra-household disease transmission? A: When the interaction between birth order and the disease index is excluded, the regression finds a 1.9 percent birth order earnings disadvantage for second-born children (consistent with the 1.2–4.2 percent range in Black et al., 2005). When the interaction is included, the main birth order coefficient declines by approximately 70 percent, suggesting that disease transmission from older to younger siblings is an important channel driving the birth order earnings penalty.

Q: Are effects larger for exposure in the first versus second six months of life? A: Yes, consistently across all outcomes. The interaction coefficient for acute respiratory hospitalizations is more than twice as large when exposure is measured in the first versus second six months. Effects on wage earnings are approximately 60 percent larger for first-half exposure, and effects on income rank are two to three times larger. This is consistent with biomedical evidence that infants’ immune systems mature around six months when solid food introduction begins.

Q: What are the effects on educational outcomes? A: An interquartile increase in first-year respiratory disease exposure reduces the likelihood of high school graduation by 0.5 percentage points (0.6 percent at the sample mean) and college graduation by 0.6 percentage points (1.7 percent at the sample mean), with effects approximately 60 percent larger when measuring first-half exposure. A 0.01 standard deviation reduction in ninth grade Danish test scores is also found. A back-of-the-envelope calculation using Danish returns to schooling suggests the reduction in educational attainment can explain approximately half of the estimated earnings effect.

Q: What are the effects on chronic respiratory and mental health outcomes? A: An interquartile increase in first-year exposure increases chronic respiratory hospitalizations (asthma, COPD) at ages 16–26 by 0.016 per hundred per year (20 percent above the sample mean), with significant increases also apparent at ages one to two. For mental health, the same exposure is associated with 0.5 additional psychiatric clinic visits per hundred per year at ages 16–26 (6.1 percent above the sample mean), with effects becoming more significant in the early twenties. The mental health effects estimated in this paper are smaller than those estimated for more extreme fetal and early childhood shocks such as Ramadan exposure or maternal bereavement.

Q: What does the acute respiratory trajectory look like beyond infancy? A: Elevated acute respiratory hospitalizations persist at age one, then there is a reduction at ages two to three consistent with an immunity formation hypothesis, but this protective effect disappears by age four. There is no significant increase or decrease in acute respiratory hospitalizations at older ages, in contrast to the persistent increase found for chronic respiratory conditions.

Q: What heterogeneity is found in short-term effects? A: Effects on infant respiratory hospitalizations are larger for low birth weight children, for male infants (consistent with the fragile male hypothesis), for siblings with shorter birth spacing, and for sibling pairs where the older child attends childcare. The monotonic decline in effect size with increasing birth spacing is the opposite of what would be predicted if differential parental time investment were the main mechanism, supporting intra-household disease spread as the operative channel.

Q: What is the role of breastfeeding as a moderator? A: Using supplementary data on breastfeeding duration (covering 2009–2016, matched to 7.6 percent of the sample), the authors find that the impact of disease exposure on younger siblings’ infancy hospitalizations declines significantly with longer breastfeeding duration. A linear specification implies that 15 months of breastfeeding entirely offsets the elevated hospitalization risk from higher disease exposure. Second-born children breastfed for less than half a month are particularly vulnerable to acute respiratory infections.

Q: How do the authors validate the identifying assumption? A: Three validation exercises are used. First, results are robust to adding municipality-specific linear and quadratic trends and maternal fixed effects. Second, using family background characteristics as outcomes in the interaction regression, at most two of fourteen coefficients are significant in any specification, and all effect sizes are less than one percent of sample means. Third, using alternative disease indices based on non-infectious digestive diseases and injuries shows no differential effects for younger siblings, ruling out a parental healthcare-seeking confound.

Q: What are the policy implications? A: The authors highlight breastfeeding support policies (paid family leave, workplace lactation accommodations), RSV vaccination campaigns for pregnant women and monoclonal antibody prophylaxis for infants, sick pay regulations, and childcare attendance policies as levers to reduce infant respiratory disease burden. They argue that current cost-benefit evaluations of such policies likely undercount the long-term human capital and earnings benefits. The COVID-19 pandemic illustrates the mechanism: restrictions reduced RSV spread during 2020 potentially benefiting infants with older siblings, while the subsequent RSV surge in 2021–2022 may have exposed later cohorts to above-average disease burden.

Respiratory Disease Exposure Index: A municipality-level cumulative measure of acute respiratory hospitalizations per 100 children aged 13–71 months assigned to each child over their first 12 months of life (or first and second six months separately), designed to proxy for community respiratory disease burden faced by infants from slightly older children, with the child’s own older sibling excluded from the count.

Intra-Household Disease Transmission: The mechanism by which preschool-aged older siblings, exposed to respiratory viruses in group childcare settings, bring home those viruses and infect younger infant siblings who are in a vulnerable stage of immune and brain development, creating a within-family externality in health outcomes.

Differential Birth Order Effect (Identification): The quasi-experimental design exploits the interaction between birth order (younger siblings are more exposed to older siblings’ illnesses) and local disease prevalence variation to identify causal impacts, netting out the main effects of both birth order and local disease environment through municipality and birth year-month fixed effects.
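
The logic of this design can be sketched as a two-by-two comparison. The Python snippet below is a deliberately simplified illustration: the paper uses a continuous disease index with fixed effects, whereas this sketch collapses exposure into "high" and "low" cells, and all outcome values are hypothetical.

```python
def differential_effect(outcomes):
    """Difference-in-differences across birth order and local exposure.

    outcomes maps (sibling, exposure) to a mean outcome, for example infant
    hospitalizations per child. Returns the extra response of younger
    siblings to high exposure, net of the response of older siblings.
    """
    younger_gap = outcomes[("younger", "high")] - outcomes[("younger", "low")]
    older_gap = outcomes[("older", "high")] - outcomes[("older", "low")]
    return younger_gap - older_gap


# Hypothetical cell means, chosen only to illustrate the arithmetic.
cells = {
    ("younger", "high"): 0.090,
    ("younger", "low"): 0.067,
    ("older", "high"): 0.060,
    ("older", "low"): 0.055,
}
extra_younger_response = differential_effect(cells)
```

Anything that shifts both siblings equally in high-exposure places or periods cancels in the subtraction, which is the role played by the fixed effects in the source's regression.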

Immunity Formation Hypothesis: The conjecture that early respiratory disease exposure may have a protective effect on later acute respiratory illness through immune system training; supported in the data by reduced acute hospitalizations at ages two to three, though this protection disappears by age four and does not prevent chronic respiratory disease development.

Dynamic Complementarities with Sibling Health Spillovers: An extension of the Cunha-Heckman framework. Standard models incorporate investment complementarities across time periods for a given child; this paper’s findings imply that sibling health spillovers also create differential returns to early-life health investments by birth order, an asymmetry between older and younger siblings that existing theoretical models do not incorporate.

Net Long-Term Effects: The estimated long-run impacts incorporate not only the direct biological effects of respiratory illness on the younger sibling but also any parental compensatory responses and immunity benefits; thus they represent lower bounds of the uncompensated biological impact, as parental compensation would attenuate the measured sibling difference.

Heterogeneity and the Macro-Economic Effects of Changes in Loan-to-Value Limits

Layer 1 — Overview

Research Question

De Veirman and de Jong develop a new approach to estimating the macroeconomic effects of changes in regulatory loan-to-value (LTV) limits on mortgage loans. The central questions are: (1) how do changes in an LTV cap translate into changes in the average LTV and, through that channel, into house prices and real output; and (2) how do heterogeneity in the cross-sectional LTV distribution, non-linearity, and asymmetry shape those effects?

Motivation and Gap

Prior empirical literature on macroprudential LTV policy typically pools across countries using coded indicator variables, which imposes the restriction that all LTV policy actions have the same effect regardless of the size of the change or the position of the limit relative to the distribution. Standard TANK models with homogeneous borrowers imply either full symmetry or threshold asymmetry precisely at the point where the constraint ceases to bind. The authors are the first to relate borrower heterogeneity to non-linearity and asymmetry in LTV policy effects.

Data and Setting

The empirical application focuses on the Netherlands, which introduced an LTV cap of 106 percent on August 1, 2011, subsequently reduced in annual one-percentage-point steps to 100 percent by January 2018. Cross-sectional LTV distributions are constructed from the De Nederlandsche Bank Loan Level Data (LLD), covering 77-81 percent of outstanding Dutch mortgage debt in 2012Q4-2014Q4, restricted to borrowers aged 35 or younger as a proxy for first-time buyers. A survey fielded in January 2016 across the CentERpanel and LISS panel (7,943 respondents combined; 2,238 usable observations after cleaning) measures LTV at the time of first home purchase and yields an annual average LTV series spanning 1979-2015. This survey-based series, together with the log relative house price, log real GDP, and the real mortgage rate, enters a four-variable Vector Error Correction Model (VECM) estimated over 1981-2015, with a single cointegrating vector identified by Johansen maximum likelihood.

Methodology

The authors’ core innovation is to translate changes in the LTV cap into changes in the cross-sectional average LTV by applying each successive cap level to the underlying distribution: observations above the cap are moved to the cap value (with adjustments for exceptions in the ex post variant). These implied annual changes in the average LTV serve as a succession of impulses fed into the VECM. Two variants are implemented: an ex ante approach using only the pre-cap 2010M8-2011M7 distribution, and an ex post approach that uses the most recent empirical distribution prior to each cap change. The Cholesky identification ordering is [LTV, house prices, GDP, mortgage rate].
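
The capping step described above can be sketched in a few lines of Python. This is a minimal illustration of the ex ante variant only (one fixed distribution, no exceptions); the function names and the ten-loan sample are hypothetical and the numbers are not the Dutch data.

```python
def capped_mean(ltvs, cap):
    """Average LTV after every observation above the cap is moved to the cap."""
    return sum(min(x, cap) for x in ltvs) / len(ltvs)


def cap_impulses(ltvs, caps):
    """Change in the average LTV implied by each successive cap level.

    All caps are applied to the same underlying distribution, as in the
    ex ante approach; the first impulse is measured from the uncapped mean.
    """
    impulses = []
    previous = sum(ltvs) / len(ltvs)
    for cap in caps:
        current = capped_mean(ltvs, cap)
        impulses.append(current - previous)
        previous = current
    return impulses


sample = [60, 75, 85, 90, 95, 100, 102, 104, 108, 112]  # hypothetical LTVs (percent)
caps = [106, 105, 104, 103, 102, 101, 100]              # Dutch cap sequence from the text
impulses = cap_impulses(sample, caps)
cumulative_change = sum(impulses)
```

Each element of `impulses` is the kind of annual shock that would be fed into the VECM; the cumulative change equals the gap between the uncapped mean and the mean under the final cap.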

Main Findings with Quantitative Magnitudes

Non-trivial macroeconomic effects of Dutch LTV policy: Under the ex post approach (the preferred estimate), the imposition of the cap at 106 percent in 2011 and its gradual reduction to 100 percent by 2018 imply, twenty years after the first shock, that relative house prices are 4.84 percent lower and real GDP is 1.15 percent lower than they would have been in the absence of the cap sequence. The bulk of these responses materializes within ten years, at 4.18 percent and 1.05 percent respectively.
Non-linearity: For a given underlying distribution, changes in the cap have progressively larger effects as the cap tightens. In the ex ante approach, the fraction of households constrained by the cap rises from approximately 20 percent at a limit of 105 percent to approximately 40 percent at a limit of 100 percent. A 10 percentage point tightening from 110 to 100 percent implies a long-run relative house price response of 6.12 percent, while a tightening from 100 to 90 percent implies a response of 14.27 percent — a pronounced non-linearity traceable to the substantial mass of observations in the 90-110 range of the Dutch distribution.
Heterogeneity matters substantially: In mean-preserving comparisons using Pearson-family approximations to the pre-cap Dutch distribution, the macroeconomic effects of the actual Dutch LTV policy sequence are 2.58 times larger in the high standard deviation case (standard deviation 25 percent above the Dutch baseline of 17.09) than in the low standard deviation case (standard deviation 25 percent below). Specifically, twenty-year house price responses are 12.34 percent (high SD) versus 4.79 percent (low SD), and GDP responses are 2.93 percent versus 1.14 percent.
Asymmetry is conditional on the position of the cap relative to the distribution: For the Dutch distribution, symmetry is a good approximation for LTV limits at around 80 percent or lower, where the cap is binding for the bulk of households. Asymmetry is pronounced for higher levels. At an initial cap of 100 percent, the absolute effect of a ten-percentage-point tightening is 2.33 times that of a ten-percentage-point loosening. At 80 percent, the asymmetry ratio is only 1.17. Tightenings have smaller effects when they start from a point where few households are constrained; conversely, loosenings can have larger effects when starting from a point where many are constrained.
Homogeneity assumption understates effects above the mean LTV: Under the homogeneous-borrower benchmark (all borrowers at the Dutch mean of 93.72 percent), asymmetry is infinite at cap levels of 100 and 95 percent but zero at other levels — a feature that causes effects to be entirely absent for caps above the mean. In the heterogeneous Dutch setting, an increase in the LTV limit from 95 to 105 percent raises house prices by 10.72 percent in the long run; the homogeneous case implies no effect at all.

Scope Conditions and Caveats

The paper does not address welfare or financial stability effects. The VECM impulse responses do not establish economic causality. Anticipation effects — if households front-loaded high-LTV purchases before the cap — would cause the procedure to overstate the effect. The LTI robustness check (which smooths the loan-to-income ratio due to noisy survey responses) yields twenty-year responses of 3.32 percent (house prices) and 0.74 percent (GDP), somewhat lower than the baseline, indicating that not controlling for LTI tends to overstate the LTV-macroeconomy connection. The approach requires a usable pre-cap or recent-prior LTV distribution; it is not directly portable to settings where a loosening is studied and no recent pre-cap distribution is available.

In depth

Q1. What is the fundamental identification challenge this paper faces, and how does the proposed approach address it?

A: The standard challenge is that LTV caps are changed infrequently and have no long time series suitable for regression, so panel studies typically pool countries and use coded dummy variables that impose size-independence of effects. The authors bypass this by using the cross-sectional LTV distribution itself: they measure how each cap level would truncate the underlying distribution and track the implied change in the cross-sectional mean LTV, which is then fed as a shock into a time-series VECM. This approach does not require the cap to have been in place previously, imposes no cross-country coefficient restrictions, and explicitly accounts for the size of the policy change.

Q2. What are the ex ante and ex post approaches to translating cap changes into average LTV changes, and how do their cumulative estimates differ?

A: The ex ante approach applies all successive cap levels to the single pre-cap distribution of 2010M8-2011M7 (after correcting for the June 2011 sales-tax reduction from 6 to 2 percent), without allowing for exceptions. The ex post approach uses the most recent empirical distribution prior to each cap change and accounts for the observed share of borrowers above the cap as exceptions. The ex ante approach yields a cumulative decline in the average LTV of 3.08 percentage points over 2011-2018; the ex post approach yields 1.96 percentage points, roughly one percentage point less. The difference is largely concentrated in 2011-2012 and stems from the ex ante approach not accounting for exceptions to the cap.

Q3. How does the paper correct for the coincident 2011 sales-tax reduction, and why does this matter?

A: In June 2011, the Dutch sales tax on housing purchases fell from 6 to 2 percent, approximately coinciding with the August 2011 imposition of the LTV cap. Without correction, the observed drop in high LTVs in the 106-cap period would conflate the two policy changes. The authors apply a tiered correction: LTVs at or below 100 percent are left unchanged (the data show no notable change in that range); LTVs between 100 and 110 percent are reduced proportionally to the share of total closing costs attributable to the tax; LTVs at or above 110 percent are reduced by the full magnitude of the tax decline. This yields the “tax-adjusted pre-cap distribution” with a mean of 93.72 percent, down from 94.46 percent in the unadjusted data.

Q4. Why does the fraction of constrained households matter so much, and how does it drive non-linearity?

A: The key mechanism is that the average LTV changes when and only when the cap binds for a given borrower. The larger the share of borrowers whose LTV (in the counterfactual uncapped distribution) would exceed the cap, the larger the share of individual LTVs that move in lockstep with any change in the cap, and therefore the larger the aggregate average LTV response and, through the VECM, the house price and GDP response. As the Dutch cap tightened from 105 to 100 percent, the constrained fraction rose from roughly 20 percent to roughly 40 percent, and the annual implied decline in the average LTV grew from 22 basis points to 42 basis points — illustrating monotonically increasing non-linearity within the ex ante approach.
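
The link between the constrained share and the size of the average-LTV response can be made concrete. For a tightening of the cap by a given step, the fall in the average LTV must lie between the constrained share before the change times the step and the constrained share after the change times the step. This is consistent in magnitude with the source's figures, where shares near 20 and 40 percent go with annual declines of 22 and 42 basis points. The sketch below checks this bound on a hypothetical sample; the function names are illustrative.

```python
def share_above(ltvs, cap):
    """Fraction of borrowers whose uncapped LTV strictly exceeds the cap."""
    return sum(1 for x in ltvs if x > cap) / len(ltvs)


def mean_decline(ltvs, old_cap, new_cap):
    """Fall in the average LTV when the cap tightens from old_cap to new_cap."""
    before = sum(min(x, old_cap) for x in ltvs) / len(ltvs)
    after = sum(min(x, new_cap) for x in ltvs) / len(ltvs)
    return before - after


def decline_bounds(ltvs, old_cap, new_cap):
    """Lower and upper bounds on the decline implied by the constrained shares."""
    step = old_cap - new_cap
    return share_above(ltvs, old_cap) * step, share_above(ltvs, new_cap) * step


sample = [60, 75, 85, 90, 95, 100, 102, 104, 108, 112]  # hypothetical LTVs (percent)
early = mean_decline(sample, 105, 104)   # few borrowers constrained
late = mean_decline(sample, 101, 100)    # more borrowers constrained
```

In this toy sample the later one-point step moves twice as many borrowers as the earlier one, so the decline in the average doubles, which is the non-linearity the answer describes.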

Q5. How does the survey design address the risk of selection bias relative to alternative data sources such as the American Housing Survey?

A: The survey, fielded in January 2016 across both the CentERpanel and LISS panel, asks retrospectively about respondents’ first home purchase, irrespective of whether they still reside there. This avoids the selection bias in the American Housing Survey, where the first-time-buyer flag captures only those still living in the first home — disproportionately selecting homes that are traded less frequently. A single-wave design also avoids the methodological discontinuities that arise from combining multiple survey waves. The resulting series covers 2,238 observations over 1979-2015 (average 60.49 per year).

Q6. What does the VECM cointegration evidence suggest about the long-run relationship between LTV, house prices, GDP, and the real mortgage rate?

A: Augmented Dickey-Fuller tests do not reject a unit root in any of the four series in levels, while all four are stationary in first differences (log relative house price inflation is a borderline case when an intercept is included). Both the Johansen L-Max and Trace tests reject no cointegration at the 1 percent level, and neither test indicates more than one cointegrating vector. The authors therefore estimate a single-cointegrating-vector VECM with one lag (selected by the Schwarz Information Criterion) over 1981-2015. The long-run relation is normalized so that the coefficient on the log relative house price is one.

Q7. What do the impulse responses in the baseline VECM specification imply for the long-run macro effects of Dutch LTV policy?

A: Under the preferred ex post approach, twenty years after the first shock in 2011 the VECM implies that relative house prices are 4.84 percent lower and real GDP is 1.15 percent lower than the no-cap counterfactual. The bulk of the response materializes within ten years, with house prices 4.18 percent lower and GDP 1.05 percent lower at the ten-year horizon. The twenty-year real mortgage rate response is positive but negligibly small. When the ex ante approach is used instead, responses are larger owing to the larger cumulative LTV impulse.

Q8. How does the paper conduct the mean-preserving heterogeneity exercise, and what are the key quantitative results?

A: The authors generate Pearson-family distributions that match the first four moments of the Dutch pre-cap distribution (mean 93.72, standard deviation 17.09, skewness -1.16, kurtosis 5.97 under the convention that a normal has kurtosis 3), truncated to support (0, 200]. Two alternative distributions are constructed with standard deviations 25 percent below (12.97) and 25 percent above (21.61) the Pearson proxy, holding mean, skewness, and kurtosis constant. The same VECM and Cholesky ordering are applied. Twenty-year house price responses are 12.34 percent (high SD), 8.46 percent (Pearson proxy), and 4.79 percent (low SD). Twenty-year GDP responses are 2.93, 2.01, and 1.14 percent respectively. The ratio of high-to-low-SD responses is 2.58 for both variables.
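
The idea of changing dispersion while holding the other moments fixed can be illustrated without fitting a Pearson distribution. The sketch below swaps in a simpler technique, rescaling deviations around the mean of a sample, which preserves the mean, skewness, and kurtosis exactly and scales the standard deviation by the chosen factor. It ignores the truncation to (0, 200] used in the paper, and the sample and function names are hypothetical.

```python
from statistics import fmean, pstdev


def rescale_dispersion(values, factor):
    """Stretch or shrink deviations from the mean by a constant factor."""
    m = fmean(values)
    return [m + factor * (x - m) for x in values]


def standardized_moment(values, k):
    """k-th standardized moment: k=3 is skewness, k=4 is kurtosis."""
    m, s = fmean(values), pstdev(values)
    return fmean([((x - m) / s) ** k for x in values])


sample = [60, 75, 85, 90, 95, 100, 102, 104, 108, 112]  # hypothetical LTVs (percent)
low_sd = rescale_dispersion(sample, 0.75)
high_sd = rescale_dispersion(sample, 1.25)

# Arithmetic on the figures quoted in the answer above.
implied_proxy_sd = (12.97 / 0.75, 21.61 / 1.25)  # both close to 17.29
house_price_ratio = 12.34 / 4.79                 # about 2.58
gdp_ratio = 2.93 / 1.14                          # about 2.57 from rounded inputs
```

The two scenario standard deviations both imply a Pearson-proxy standard deviation of about 17.29, slightly above the empirical 17.09. The GDP ratio computed from the rounded responses comes out marginally below the quoted 2.58, presumably because the source works with unrounded values.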

Q9. How does asymmetry vary across different initial levels of the LTV cap for the Dutch distribution, and what is the intuition?

A: At a starting cap of 100 percent, a ten-percentage-point tightening produces a long-run house price response 2.33 times larger (in absolute value) than a ten-percentage-point easing from the same starting point. At 80 percent the asymmetry ratio falls to 1.17, meaning the effects of tightening and easing are nearly symmetric. The intuition is that at 80 percent the cap is binding for the bulk of the distribution, so both tightenings and easings move a similarly large fraction of borrowers and have large, roughly comparable effects. At 100 percent, far fewer borrowers are currently constrained, so an easing from 100 to 110 moves almost no one whereas a tightening from 100 to 90 moves substantially more.
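
A small sketch can show how the asymmetry ratio falls as the starting cap moves deeper into the distribution. It computes the ratio on changes in the average LTV rather than on house price responses, which assumes that the VECM maps an LTV impulse into a long-run response proportionally. The sample is hypothetical, so the resulting ratios only illustrate the pattern and are not the paper's estimates.

```python
def capped_mean(ltvs, cap):
    """Average LTV after every observation above the cap is moved to the cap."""
    return sum(min(x, cap) for x in ltvs) / len(ltvs)


def asymmetry_ratio(ltvs, cap, step=10.0):
    """Absolute effect of tightening by `step` relative to easing by `step`."""
    base = capped_mean(ltvs, cap)
    tightening = base - capped_mean(ltvs, cap - step)
    easing = capped_mean(ltvs, cap + step) - base
    if easing == 0:
        # Easing moves no one: infinite asymmetry if tightening binds at all.
        return float("inf") if tightening > 0 else float("nan")
    return tightening / easing


sample = [60, 75, 85, 90, 95, 100, 102, 104, 108, 112]  # hypothetical LTVs (percent)
ratio_at_100 = asymmetry_ratio(sample, 100)
ratio_at_80 = asymmetry_ratio(sample, 80)
```

At a cap of 80 almost every borrower in the toy sample is constrained, so tightening and easing move nearly the same mass and the ratio is close to one. At 100 fewer borrowers sit above the cap and the ratio is larger.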

Q10. What does the comparison of the heterogeneous-borrower and homogeneous-borrower cases reveal about the implications for TANK and HANK models?

A: Under the homogeneous benchmark (all borrowers at the mean Dutch LTV of 93.72 percent), changes in the cap produce infinite asymmetry at cap levels of 100 and 95 percent (tightening has a full effect, easing has zero effect) but no effect at all, and hence no asymmetry, at higher starting caps, where a ten-percentage-point change in either direction leaves the cap above the mean. For example, an increase in the cap from 95 to 105 percent has no effect in the homogeneous case but raises house prices by 10.72 percent in the heterogeneous case. In sum, homogeneous-borrower models, including TANK frameworks and linearized models with always-binding constraints such as Iacoviello (2005), overstate asymmetry in a narrow range around the mean LTV and simultaneously understate the effects of cap changes above the mean LTV. The results are more consistent with heterogeneous-agent frameworks, though the authors note they are not aware of any existing HANK paper that investigates asymmetry and non-linearity specifically in response to changes in the borrowing limit.
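
The homogeneous benchmark is simple enough to verify directly. In the sketch below every borrower has the mean LTV of 93.72 percent reported in the source, so the average LTV is just the smaller of that value and the cap. The function names are illustrative.

```python
MEAN_LTV = 93.72  # Dutch tax-adjusted pre-cap mean reported in the source


def homogeneous_average(cap, ltv=MEAN_LTV):
    """Average LTV when all borrowers share the same uncapped LTV."""
    return min(ltv, cap)


def average_change(old_cap, new_cap):
    """Change in the average LTV when the cap moves from old_cap to new_cap."""
    return homogeneous_average(new_cap) - homogeneous_average(old_cap)


tighten_from_100 = average_change(100, 90)   # binds: average falls to 90
ease_from_100 = average_change(100, 110)     # cap already above the mean
ease_from_95 = average_change(95, 105)       # the example quoted above
tighten_from_105 = average_change(105, 95)   # cap stays above the mean
```

Only the tightening from 100 to 90 changes the average (by 3.72 points); the other three changes leave it untouched, which is why asymmetry is infinite near the mean and effects vanish entirely above it.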

Q11. What do the robustness checks show about sensitivity of results to LTV measurement choices?

A: The results are robust to all alternative Cholesky orderings, to using the real mortgage rate computed as the nominal rate minus current (rather than two-year moving average) inflation, to using the computed LTV without cross-checking, and to using the directly reported LTV after cross-checking. The most notable alternative is the directly reported LTV without cross-checking, which yields a twenty-year house price response of 3.81 percent and a GDP response of 0.72 percent (ex post approach), somewhat lower than the baseline of 4.84 and 1.15 percent but in the same direction. A further robustness check using an LTV series that extrapolates 2011-2015 values from the Loan Level Data yields larger estimates (cumulative twenty-year house price response of 6.65 percent and GDP response of 1.40 percent), reflecting the LLD series’ more moderate drop in 2014.

Q12. What is the policy implication regarding the importance of distributional information for gauging LTV policy effects?

A: The results imply that knowing the mean of the LTV distribution is not sufficient for estimating the effects of cap changes: the variance — and specifically the fraction of borrowers constrained by the cap — is critical. This is analogous in spirit to the finding of Krueger, Mitman, and Perri (2016) that matching the tails of the wealth distribution, and not just the mean, is essential for determining the aggregate consumption effects of shocks. Existing empirical literature that focuses on the first moment of the LTV distribution will therefore systematically mismeasure the macro effects of LTV limits, and the direction of the bias depends on where the cap stands relative to the distribution.

Key Concepts

Loan-to-value (LTV) cap / limit: The regulatory maximum on the ratio of total mortgage loan amount to the purchase price of the property (excluding buyer-incurred closing costs such as sales taxes and notary fees). In the Netherlands, this was set at 106 percent from August 2011 and reduced annually by one percentage point to 100 percent by January 2018. The paper explicitly distinguishes the cap (the regulatory threshold) from the average LTV (the cross-sectional mean of the distribution, which the cap may or may not bind for all borrowers).

Underlying (or pre-cap) LTV distribution: The cross-sectional distribution of LTV ratios that would prevail in the absence of any LTV cap — approximated in the paper by the empirical distribution in the twelve months before the cap was introduced (2010M8-2011M7, adjusted for the June 2011 sales-tax cut). The shape, mean, and variance of this distribution determine the fraction of borrowers who are constrained by any given cap level and therefore govern the magnitude and symmetry of policy effects.

Mean-preserving change in heterogeneity: A change in the standard deviation of the LTV distribution that holds the mean (and, in the paper’s stylized scenarios, also the skewness and kurtosis) constant. The paper uses this construct to isolate the effect of dispersion per se on the macroeconomic consequences of cap changes, showing that a 25 percent increase in the standard deviation relative to the Dutch baseline more than doubles the macro effects relative to a 25 percent decrease.

Ex ante approach: The method of translating cap changes into average LTV changes that uses only the pre-cap distribution, applying successive cap levels to that single distribution. It does not require an LTV cap to have been in place and is therefore applicable for prospective analysis. It does not account for exceptions to the cap.

Ex post approach: The method that uses the most recent empirical LTV distribution preceding each cap change as the proxy for the counterfactual uncapped distribution, and that explicitly accounts for the observed share of borrowers above the cap (treated as exceptions). Preferred by the authors when feasible because it incorporates information about how the underlying distribution has evolved for reasons unrelated to the current cap change.

Asymmetry ratio: The ratio of the absolute value of the long-run house price (or GDP) response to a ten-percentage-point tightening in the cap to the absolute value of the response to a ten-percentage-point easing from the same initial cap level. A ratio exceeding one indicates that tightenings have larger effects than easings of equal magnitude from the same starting point. In the paper, this ratio is shown to depend critically on where the initial cap sits relative to the underlying distribution.

Non-linearity in LTV effects: The property that changes in the cap from a lower starting point have larger macroeconomic effects than changes from a higher starting point, for a given underlying distribution. This arises because the fraction of constrained borrowers increases as the cap is tightened, so a further tightening moves a larger share of individual LTVs. In the paper, this is documented through the increasing year-on-year effects in Table 1 and the large difference between the house price response to a tightening from 110 to 100 percent (6.12 percent) versus from 100 to 90 percent (14.27 percent).

Pearson system (as used in this paper): A parametric family of distributions in which every combination of the first four moments (mean, variance, skewness, kurtosis) corresponds to a unique distribution. The authors use it to construct smooth approximations to the empirical Dutch distribution with the same mean, skewness, and kurtosis but varying standard deviations, enabling a controlled comparison of heterogeneity scenarios.

How Do Rising U.S. Interest Rates Affect Emerging and Developing Economies? It Depends

Layer 1 — Overview

Research Question

This paper examines how the effects of rising U.S. interest rates on emerging market and developing economies (EMDEs) depend on the underlying source of the interest rate increase. Specifically, it asks: what mix of inflation, reaction, and real shocks has driven changes in U.S. interest rates in recent years; how do these different shock types affect EMDE financial markets, capital flows, borrowing costs, and fiscal outcomes; and how do they affect the likelihood of EMDE financial crises?

Motivation and Context

Written in late 2022 against the backdrop of the Federal Reserve’s most aggressive tightening cycle since the 1990s, the paper argues that the standard practice of treating all interest rate increases as equivalent is misleading. Whether rising U.S. rates reflect strengthening growth, rising inflation expectations, or a perceived hawkish shift in the Fed’s reaction function carries very different implications for EMDEs already burdened by post-COVID debt at record highs and scarring from the pandemic.

Methodology

Three distinct empirical approaches are used, chosen to match the data frequency and parsimony requirements of each research question.

A sign-restricted Bayesian VAR model with stochastic volatility is estimated on monthly U.S. data (January 1982 - September 2022) using four variables: 2-year Treasury yield, 10-year Treasury yield, S&P 500 index, and 5-year breakeven inflation expectations. Sign restrictions identify three shocks: (i) real shocks raise yields, equity prices, and inflation expectations; (ii) inflation shocks raise yields and inflation expectations but lower equity prices; (iii) reaction shocks raise yields but lower both equity prices and inflation expectations.
Panel local projection models (Jordà 2005) are estimated at quarterly frequency for 17-38 EMDEs over 1997Q2-2019Q4, excluding the 2008Q4-2009Q4 global financial crisis and the COVID-19 pandemic. The models link the VAR-identified quarterly shock series (normalized to represent a 25-basis-point move in the 2-year yield) to EMDE financial, real, and fiscal variables, including local-currency bond yields, EMBI+ sovereign spreads, capital flows, real GDP components, CPI inflation, the real effective exchange rate, primary fiscal balance, government revenues, expenditures, gross debt, and debt composition.
A panel logit model with random effects is estimated on annual data for 139 EMDEs over 1985-2018, linking the three shock types to the probability of banking, currency, and sovereign debt crises (as defined by Laeven and Valencia 2020).
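
The sign restrictions in the first approach amount to a small lookup table. The Python sketch below encodes that table and labels a candidate response by its sign pattern. It is not the estimation procedure itself, which in a sign-restricted Bayesian VAR involves drawing candidate decompositions and retaining those that satisfy the restrictions; the dictionary keys and function names are hypothetical.

```python
# Sign pattern of each shock's effect, as described in the text above.
SIGN_PATTERNS = {
    "real":      {"yields": 1, "equities": 1,  "inflation_expectations": 1},
    "inflation": {"yields": 1, "equities": -1, "inflation_expectations": 1},
    "reaction":  {"yields": 1, "equities": -1, "inflation_expectations": -1},
}


def sign(value):
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def classify_shock(response):
    """Label a response by its sign pattern, or None if no pattern matches.

    response maps the three keys used in SIGN_PATTERNS to impact responses.
    """
    observed = {name: sign(value) for name, value in response.items()}
    for label, pattern in SIGN_PATTERNS.items():
        if observed == pattern:
            return label
    return None


example = classify_shock(
    {"yields": 0.25, "equities": -1.2, "inflation_expectations": -0.05}
)
```

A pattern with higher yields, higher equity prices, and lower inflation expectations matches none of the three shocks, so the function returns `None` for it.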

Key Findings

Shock decomposition: Real shocks account for the largest share of variance in 2-year U.S. yields over the full sample (39 percent at a 10-month horizon); inflation shocks explain 14 percent and reaction shocks 13 percent. However, since the start of 2022, reaction and inflation shocks together account for approximately three-quarters of the cumulative increase in yields, with real shocks playing a negligible role.

Financial market and macroeconomic spillovers: Conditional on a 25-basis-point shock, reaction shocks produce significantly adverse EMDE outcomes: widening sovereign spreads (EMBI+), declining capital flows, real exchange rate depreciation, and, unlike inflation shocks, statistically significant declines in private consumption and fixed investment. Inflation shocks raise domestic EMDE CPI significantly. By contrast, real shocks are associated with declining sovereign spreads, rising capital flows, real exchange rate appreciation, and higher real exports, with other real GDP components unaffected.

Fiscal outcomes: In response to inflation and especially reaction shocks, EMDE governments improve their primary balances almost exclusively through expenditure cuts, consistent with tighter credit availability constraining fiscal space. Real shocks also improve primary balances, but through both revenue gains and expenditure reductions. Government debt declines in response to all three shock types, though the decline is statistically significant only for real shocks.

Debt composition: Reaction shocks shift debt composition toward shorter maturities and foreign-currency instruments (the latter reflecting exchange rate depreciation mechanically raising the local-currency value of foreign-currency debt). Real shocks shift composition toward longer maturities and higher external creditor participation, consistent with improved market access.

Heterogeneity by credit rating: Investment-grade and noninvestment-grade EMDEs show broadly similar responses to reaction shocks, with the exception of statistically larger yield responses for noninvestment-grade economies. The paper notes this finding contrasts with several prior studies that find stronger fundamentals buffer spillovers.

Crisis probabilities: A 25-basis-point increase in 2-year U.S. yields driven by a reaction shock almost doubles the baseline probability of financial crisis in the average EMDE, from 3.5 percent to 6.6 percent. Extrapolating the nonlinear logit relationship to the 114-basis-point reaction-shock-driven increase in 2-year yields that occurred from January through September 2022 implies the probability of financial crisis in the average EMDE rising approximately 36 percentage points, to nearly 40 percent. The paper cautions that no comparable yield episode occurred in the 1985-2018 estimation sample, so this extrapolation carries substantial uncertainty. Inflation shocks are associated with only small, statistically insignificant changes in crisis probability; real shocks reduce the probability of sovereign debt crisis while raising currency crisis probability by less than reaction shocks do.

Historical episode analysis: The 2013 taper tantrum was dominated by reaction shocks, causing 10-year yields to rise by approximately 100 basis points; sovereign spreads widened by 60 basis points in the May-June 2013 window and capital flows dropped sharply. The 2022 tightening episode was driven by reaction and inflation shocks (reaction shocks adding 114 basis points to 2-year yields through September 2022), with five-year breakeven inflation expectations breaching 3 percent for the first time in the two-decade history of the series. The 2004-2006 build-up to the global financial crisis involved a mix of all three shock types with real shocks prominent, and EMDE financial conditions remained broadly benign.

In depth

Q1. How are the three shock types identified, and what makes this identification strategy credible?

The identification uses sign restrictions imposed on a Bayesian VAR with stochastic volatility. A real shock is identified as one that simultaneously raises 2-year yields, 10-year yields, S&P 500 equity prices, and inflation expectations. An inflation shock raises all yields and inflation expectations but lowers equity prices; the equity decline signals that higher rates are not accompanied by stronger growth prospects. A reaction shock raises all yields but lowers both equity prices and inflation expectations; the fall in inflation expectations distinguishes it from an inflation shock and signals that markets perceive the Fed is tightening beyond what current inflation warrants. Covering both short- and long-maturity yields in the sign restrictions ensures the identified shocks capture both conventional and unconventional (e.g., quantitative easing tapering) policy moves.
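
The sign pattern described above can be written as a small lookup. The Python sketch below only illustrates the classification logic; it is not the paper's Bayesian VAR estimation. The function name `classify_shock` and its arguments are hypothetical, and in the actual method the restrictions are imposed on model impulse responses rather than on raw data changes.

```python
def classify_shock(two_year, ten_year, equity, breakeven):
    """Label a shock by the signs of its impact responses.

    Arguments are the (hypothetical) impact responses of the 2-year yield,
    10-year yield, equity prices, and inflation expectations.
    Only yield-raising patterns are labeled, matching the description above.
    """
    if not (two_year > 0 and ten_year > 0):
        return "unclassified"
    if equity > 0 and breakeven > 0:
        return "real"
    if equity < 0 and breakeven > 0:
        return "inflation"
    if equity < 0 and breakeven < 0:
        return "reaction"
    return "unclassified"  # e.g. equity up while inflation expectations fall


# Yields up, equity down, inflation expectations down.
print(classify_shock(0.25, 0.10, -1.2, -0.05))
```

Patterns that match none of the three restrictions fall into the unclassified bucket, which corresponds loosely to the residual share of yield variation that the identified shocks leave unexplained.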

Q2. How much of the variation in 2-year U.S. yields does each shock type explain?

At a 10-month horizon, real shocks explain 39 percent of the forecast error variance in 2-year U.S. Treasury yields, making them the dominant driver over the full sample (January 1982 - September 2022). Inflation shocks account for 14 percent and reaction shocks for 13 percent. Together the three identified shocks explain roughly two-thirds of total yield variation; the remaining one-third reflects residual or unclassified movements.

Q3. How did the composition of shocks driving 2-year yields change from 2021 into 2022?

Starting in September 2021, as inflation mounted and the Fed pivoted toward aggressive tightening, reaction and inflation shocks became the dominant drivers of 2-year yield increases. By September 2022, reaction and inflation shocks together accounted for approximately three-quarters of the cumulative increase in yields from the beginning of 2022, with reaction shocks alone contributing 114 basis points to the 2-year yield.

Q4. What are the financial market effects of a 25-basis-point reaction shock on EMDEs?

Reaction shocks produce significant adverse effects on EMDE financial markets within one quarter: 10-year local-currency government bond yields rise significantly, EMBI+ sovereign spreads widen significantly, capital flows decline significantly, and the real effective exchange rate depreciates significantly. Short-term (3-month) yields and equity prices also deteriorate, but these movements are not statistically significant at conventional levels.

Q5. How do financial market effects of inflation shocks compare to reaction shocks?

Inflation shocks generate adverse directional effects similar to those of reaction shocks (rising 10-year yields, declining capital flows, real exchange rate depreciation, and falling equity prices), but with the notable difference that, except for equity prices, these effects are generally not statistically significant. The paper thus finds that reaction shocks are more potent drivers of EMDE financial market tightening than inflation shocks.

Q6. How do real shocks affect EMDE financial conditions?

Real shocks produce outcomes broadly opposite to those from inflation and reaction shocks. They are associated with significant declines in EMBI+ sovereign spreads, significant increases in capital flows, significant real effective exchange rate appreciation, and significant increases in equity prices. Ten-year government bond yields do rise, consistent with global bond market integration, but this occurs alongside improving risk sentiment, not financial stress.

Q7. What are the macroeconomic (real activity) effects of the three shock types?

Reaction shocks produce a statistically significant decline in real GDP components, particularly in private consumption expenditure and gross fixed capital formation (fixed investment), within one quarter. Real shocks lead to higher real exports, consistent with beneficial demand spillovers from stronger U.S. activity, while leaving other GDP components unchanged. Inflation shocks induce a large and statistically significant increase in domestic EMDE CPI inflation, while real shocks reduce it; neither produces significant real GDP effects beyond the export channel.

Q8. How do EMDE fiscal balances respond differently to the three shock types?

Both inflation and especially reaction shocks are followed by an improvement in the EMDE primary balance (smaller deficit or larger surplus), achieved almost exclusively through declines in government expenditure. The paper attributes this to tighter credit availability and higher borrowing costs constraining fiscal space. Real shocks also improve primary balances, but the mechanism differs: both revenue increases and expenditure decreases contribute to the improvement. Declines in gross government debt occur in response to all three shocks but are statistically significant only for real shocks.

Q9. How does the composition of government debt shift in response to the different shocks?

Following inflation and reaction shocks, debt held by external creditors declines significantly as a share of total government debt, consistent with reduced access to global credit markets. Short-term debt eventually rises following both shock types. Foreign-currency debt rises considerably following reaction shocks, likely reflecting the mechanical effect of currency depreciation boosting the local-currency value of pre-existing foreign-currency obligations. Conversely, following real shocks, external creditor participation rises significantly (improved market access), foreign-currency debt shares remain broadly stable, and short-term debt declines significantly (consistent with maturity extension by fiscal authorities seeking to minimize rollover risk under favourable conditions).

Q10. Do investment-grade and noninvestment-grade EMDEs respond differently to reaction shocks?

The paper finds little evidence of important differences between investment-grade and noninvestment-grade EMDEs in their responses to reaction shocks across most variables. Noninvestment-grade economies do show statistically larger increases in 10-year bond yields, and larger increases in EMBI+ spreads and 3-month yields than investment-grade economies, though the latter two differences are not statistically distinguishable. For fiscal, GDP, and capital flow outcomes, the two groups respond similarly. The paper notes this finding is inconsistent with several prior studies but consistent with others, concluding the role of fundamentals remains unresolved.

Q11. How does the probability of financial crisis in EMDEs respond to the three shock types?

In the baseline (explanatory variables at sample means), the average EMDE faces a 3.5 percent probability of experiencing any type of financial crisis in a given year, with currency and banking crises the most common and sovereign debt crisis the least. Reaction shocks drive by far the largest increase: a 25-basis-point increase in 2-year yields from a reaction shock almost doubles the crisis probability to 6.6 percent. Inflation shocks produce small and statistically insignificant effects. Real shocks reduce the probability of sovereign debt crisis (consistent with their benign effects on financial markets) while raising currency crisis probability by less than reaction shocks.

Q12. What does the nonlinear logit relationship imply for the 2022 tightening cycle specifically?

Because the logit function is nonlinear, a doubling of the shock size leads to a more-than-proportional increase in crisis probability. Applying the estimated model to the 114-basis-point reaction-shock contribution to 2-year yields from January to September 2022, the model implies that the probability of financial crisis in the average EMDE increased by approximately 36 percentage points, to nearly 40 percent. The paper emphasizes this estimate carries wide uncertainty because no comparable yield increase occurred during the 1985-2018 estimation period, placing this extrapolation well outside the sample’s support.
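
The more-than-proportional scaling can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes a single-index logit pinned to the two probabilities quoted in this summary (3.5 percent with no shock, 6.6 percent after a 25-basis-point reaction shock). This two-point calibration is an assumption made for illustration and is not the paper's estimated model, which includes other explanatory variables and random effects; the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


# Figures quoted in the summary.
BASELINE = 0.035    # crisis probability with no shock
AFTER_25BP = 0.066  # crisis probability after a 25 bp reaction shock

# Assumption: p(x) = sigmoid(a + b * x), with x the shock size in basis points.
a = logit(BASELINE)
b = (logit(AFTER_25BP) - a) / 25.0


def crisis_probability(shock_bp):
    """Probability implied by the two-point logit calibration."""
    return sigmoid(a + b * shock_bp)


def linear_extrapolation(shock_bp):
    """Probability if the 25 bp effect simply scaled one-for-one."""
    return BASELINE + (AFTER_25BP - BASELINE) * shock_bp / 25.0


for bp in (25, 50, 114):
    print(bp, round(crisis_probability(bp), 3), round(linear_extrapolation(bp), 3))
```

Under these assumptions the 114-basis-point value comes out in the low 40 percent range, against about 18 percent from linear scaling. The gap from the paper's reported figure of nearly 40 percent is expected given how much simpler this calibration is than the estimated model.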

Q13. What crisis dynamics were already materializing in 2022 consistent with the model predictions?

By the time of writing (late 2022), seven EMDEs had experienced currency depreciations of at least 30 percent against the U.S. dollar, meeting the Laeven and Valencia (2020) threshold for a currency crisis, and 21 EMDEs had reached agreements with the IMF for additional financing. The paper notes these developments had occurred despite standard macroeconomic factors (interest rate differentials and flight-to-safety flows) not fully explaining the magnitude of depreciations.

Q14. What robustness tests were conducted, and did they alter the main conclusions?

The VAR decomposition was re-estimated using weekly rather than monthly data. The three-shock model was simplified to two shocks (real versus monetary, combining inflation and reaction). The VAR was extended to include real GDP and PCE inflation with contemporaneous exclusion restrictions to insulate shock identification from current macroeconomic conditions. Inflation expectations were replaced with the Haubrich, Pennacchi, and Ritchken (2012) model-based measure throughout, rather than only pre-2003. For the crisis probability models, panel probit with random effects and panel logit with fixed effects were estimated alongside the baseline panel logit with random effects. In all cases, the results were not materially different: inflation and reaction shocks remained more adverse than real shocks for EMDE financial and fiscal variables, and only reaction shocks produced statistically significant increases in overall crisis probability. One noteworthy robustness finding: when combining inflation and reaction into a single monetary shock, the relative importance of the inflation component appears somewhat larger than when the two are separated.

Q15. What are this paper’s main contributions relative to existing literature?

The paper makes three stated contributions. First, it is the first to decompose the evolution of U.S. interest rates over the COVID-19 pandemic recession, subsequent recovery, and 2021-22 inflation surge into the separate contributions of real, inflation, and reaction shocks. Second, it extends prior work on EMDE spillovers (e.g., Arteta et al. 2015; Hoek, Kamin, and Yoldas 2021, 2022) by showing how different shock types affect government budget balances, revenues, expenditures, and debt composition, and by expanding the EMDE country sample. Third, it is the first to examine how real, inflation, and reaction shocks differentially affect the probability of banking, currency, and sovereign debt crises in EMDEs.

Key Concepts

Reaction shock: In this paper’s framework, a change in U.S. interest rates caused by a perceived shift in the Federal Reserve’s reaction function toward a more hawkish policy stance. Identified as a shock that raises both 2-year and 10-year Treasury yields while simultaneously lowering equity prices and lowering inflation expectations. The fall in inflation expectations distinguishes this shock from an inflation shock and signals that markets believe the Fed is tightening beyond what current inflation alone would warrant.

Inflation shock: A change in U.S. interest rates caused by rising expectations of U.S. inflation. Identified as a shock that raises both yields and inflation expectations but lowers equity prices. The equity decline signals that higher rates reflect inflationary pressure rather than improved growth prospects.

Real shock: A change in U.S. interest rates driven by improved prospects for U.S. real economic activity. Identified as a shock that simultaneously raises both yields, equity prices, and inflation expectations. The equity increase distinguishes this shock from the other two and signals that higher rates are accompanied by strengthening U.S. growth.

Sign-restricted Bayesian VAR with stochastic volatility: The paper’s primary model for decomposing U.S. yield movements. Sign restrictions on four variables (2-year yield, 10-year yield, S&P 500, 5-year inflation expectations) identify the three shock types without requiring timing restrictions. Stochastic volatility is incorporated to handle the heteroskedastic financial data and the COVID-19 period’s unusual size and nature; the model covers January 1982 to September 2022 at monthly frequency.

Panel local projection (Jorda 2005): The empirical framework linking the VAR-identified shock series to EMDE outcomes at quarterly frequency. Direct estimation of impulse responses at each horizon h avoids the misspecification accumulated in iterated VAR forecasts and permits straightforward incorporation of state-dependent (investment-grade vs. noninvestment-grade) heterogeneity via a dummy-variable interaction specification.
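
The horizon-by-horizon logic of a local projection can be sketched on simulated data. The Python example below is a deliberately simplified single-series version with no country fixed effects, controls, or interaction terms, so it is not the paper's panel specification; the response weights in `TRUE_IRF`, the noise level, and all function names are made up for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def local_projection(shock, outcome, max_h):
    """Estimate the response at each horizon h by regressing outcome[t+h] on shock[t]."""
    irf = []
    for h in range(max_h + 1):
        x = shock[: len(shock) - h]  # shocks that still have an outcome h periods ahead
        y = outcome[h:]              # outcomes h periods after each shock
        irf.append(ols_slope(x, y))
    return irf


# Simulated data: the outcome responds to the shock with known (made-up) weights.
rng = random.Random(0)
TRUE_IRF = [1.0, 0.6, 0.3]
T = 5000
shock = [rng.gauss(0.0, 1.0) for _ in range(T)]
outcome = [
    sum(w * shock[t - j] for j, w in enumerate(TRUE_IRF) if t - j >= 0)
    + rng.gauss(0.0, 0.5)
    for t in range(T)
]

estimated_irf = local_projection(shock, outcome, max_h=3)
print([round(v, 2) for v in estimated_irf])
```

Each horizon gets its own regression, so an error in estimating one horizon's response does not feed into the next, which is the property the definition above contrasts with iterated VAR forecasts.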

Capital flows (as used in this paper): Defined specifically as increases in net portfolio and other investment liabilities of EMDEs, excluding foreign direct investment liabilities. This definition isolates the more volatile, financially driven flows rather than the longer-horizon FDI component.

Financial crisis typology (Laeven and Valencia 2020): The crisis classification underlying the logit analysis. Sovereign debt crises are defined as a government default or restructuring of debt owed to private creditors. Banking crises require significant distress in the banking system combined with significant policy intervention measures. Currency crises are defined as a sharp nominal depreciation of at least 30 percent against the U.S. dollar. The paper uses these definitions from Laeven and Valencia (2020), extended through 2018 in Kose et al. (2021).

Primary budget balance improvement via expenditure compression: In the paper’s framework, the fiscal adjustment mechanism triggered specifically by inflation and reaction shocks: EMDE governments improve their primary balance (reduce deficits or increase surpluses) almost exclusively by cutting expenditures, rather than raising revenues, as a response to the credit tightening and higher borrowing costs associated with adverse U.S. interest rate shocks.

Ideological Alignment and Evidence-Based Policy Adoption

This paper investigates how the ideological alignment between knowledge-disseminating institutions and policymakers affects the adoption of evidence-based policies. The core research question is whether, and through which mechanisms, the ideology of the messenger — rather than the content of the message — determines whether local policymakers act on rigorous research evidence.

The authors conduct a country-wide randomized controlled trial (RCT) across 5,678 touristic Spanish municipalities. The policy recommendation derives from Hinnosaar et al. (2021), an RCT demonstrating that minor improvements to municipalities’ Wikipedia pages (adding photographs, local festival information, touristic landmark details) increased overnight tourist stays by 9%. This policy was chosen because it is ideologically neutral, low cost, within local policymakers’ remit, and its implementation is directly traceable via Wikipedia edit histories.

Municipalities were randomized into five treatment arms and a control group (approximately 950 municipalities each), stratified by ruling party ideology, population, and touristic accommodation count. Three arms received the same policy brief endorsed by: (1) an ideologically aligned think tank (FAES for right-wing municipalities, Fundación Alternativas for left-wing), (2) the ideologically opposite think tank, or (3) an ideologically nonsalient researcher from the London School of Economics. Two further arms received links to newspaper articles covering the same research from either an ideologically aligned outlet (El Mundo for right, Eldiario.es for left) or an ideologically opposite outlet. The control group received no information. The experiment ran from May to December 2022, with multiple reminder emails sent across the period.
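
A stratified assignment of this kind can be sketched as follows. The Python example is a minimal illustration, assuming units are shuffled within each stratum and then dealt out to the six groups in turn; the paper's exact randomization procedure is not described in this summary, and the arm labels, the `stratified_assign` function, and the toy municipality data are all hypothetical.

```python
import random
from collections import defaultdict

ARMS = [
    "aligned_think_tank", "opposite_think_tank", "nonsalient_researcher",
    "aligned_newspaper", "opposite_newspaper", "control",
]


def stratified_assign(units, stratum_of, arms=ARMS, seed=0):
    """Shuffle units within each stratum, then deal them out to arms in turn."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        # Random starting arm so leftover units do not always land in the first arms.
        offset = rng.randrange(len(arms))
        for i, unit in enumerate(members):
            assignment[unit] = arms[(i + offset) % len(arms)]
    return assignment


# Toy municipalities: (id, ruling-party ideology, population band).
units = [
    (i, "left" if i % 2 else "right", "large" if i % 3 == 0 else "small")
    for i in range(600)
]
assignment = stratified_assign(units, stratum_of=lambda u: (u[1], u[2]))
```

Dealing units out in turn keeps arm sizes within one unit of each other inside every stratum, so each arm is balanced on the stratifying variables by construction.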

The main outcome is a binary indicator for whether a municipality’s Wikipedia page was changed in line with the recommended guidelines during the study period, coded blind to treatment status by two independent coders.

Key findings: Pooled across all treatment arms, information provision increased the probability of policy adoption by approximately 0.98 percentage points (a 38% relative increase over the control group baseline), but this effect falls short of conventional significance thresholds (p-value = 0.13). The aggregate effect masks sharp heterogeneity by ideological alignment. When the informing institution’s ideology aligns with the policymaker’s, policy adoption increases by 1.68 percentage points (think tank) and 1.67 percentage points (newspaper) relative to the control group, equivalent to relative increases of 66% and 65%, respectively, both statistically significant at the 5% level. By contrast, information from an ideologically opposite institution produces a coefficient that is negligible and statistically indistinguishable from zero, indicating that misaligned information is no more effective than receiving no information at all. The ideologically nonsalient LSE researcher arm produced an intermediate effect (0.94 percentage points, 37% relative increase), but the p-value (0.27) exceeds conventional thresholds, and the effect is not statistically distinguishable from either the aligned or the control condition. Policy briefs and newspaper articles are equally effective when ideologically aligned (difference of 0.1 percentage points, p-value = 0.82).

To decompose mechanisms, the authors propose a three-stage framework: (1) selective exposure to information, (2) belief updating, and (3) policy implementation. Email click-through rates (access to the full policy brief or article once the informing institution is revealed) do not differ significantly across treatment arms, ruling out selective exposure as the operative mechanism. A post-intervention online survey experiment with 1,600 policymakers from 1,196 municipalities shows that those receiving information from an aligned or nonsalient institution updated their beliefs about policy effectiveness significantly more than those receiving information from an opposite institution, implicating belief updating as one operative channel. However, comparing the survey experiment (where nonsalient and aligned treatments produce similar belief updating) with the main experiment (where the aligned arm’s estimated effect is nearly twice that of the nonsalient arm, though the two are not statistically distinguishable) suggests that ideological alignment also affects the third stage, policy implementation, beyond mere belief updating.

The estimated monetary cost of ideological misalignment is 2,192 euros per municipality per year, calculated using the impact of Wikipedia changes on touristic revenues from Hinnosaar et al. (2021).

Scope conditions: The setting is Spanish local government, and the policy is explicitly non-ideological, low-cost, and easily implemented. Generalizability to ideologically charged or costly policies is not established. Left-wing municipalities show larger responses to aligned information, though this heterogeneity is not statistically significant at conventional levels.

Q: What is the baseline rate of policy adoption in the control group, and what does the aligned-institution treatment achieve in absolute terms?

A: The paper reports that ideologically aligned institutions increase the share of municipalities implementing recommended Wikipedia changes by 1.68 percentage points (think tank) and 1.67 percentage points (newspaper) relative to the control group. Working backward from the stated 66% and 65% relative increases, this implies a control group baseline adoption rate of approximately 2.5 percent. The aligned effects are statistically significant at the 5% level.
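
The back-of-the-envelope step is simply the effect in percentage points divided by the relative increase. The Python sketch below applies it to every effect and relative-increase pair quoted in this summary as a consistency check; the dictionary keys and the `implied_baseline` helper are illustrative names, not the paper's.

```python
# (effect in percentage points, relative increase) as quoted in the summary.
REPORTED = {
    "pooled": (0.98, 0.38),
    "aligned_think_tank": (1.68, 0.66),
    "aligned_newspaper": (1.67, 0.65),
    "nonsalient_researcher": (0.94, 0.37),
}


def implied_baseline(effect_pp, relative_increase):
    """Control-group adoption rate (percent) implied by effect = baseline * relative increase."""
    return effect_pp / relative_increase


baselines = {arm: implied_baseline(*vals) for arm, vals in REPORTED.items()}
for arm, value in baselines.items():
    print(f"{arm}: {value:.2f}%")
```

All four pairs imply a baseline between roughly 2.5 and 2.6 percent, so the quoted effects and relative increases are mutually consistent up to rounding.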

Q: Does information from an ideologically opposite institution have any effect on policy adoption?

A: No. The coefficient for opposite-ideology treatment arms is negligible in magnitude, closely resembling the near-zero coefficients from the placebo analysis conducted for the same months in 2019 (pre-intervention). The authors conclude that receiving information from an ideologically opposite institution is statistically indistinguishable from receiving no information at all. This null result is consistent across heterogeneity analyses by mayor ideology, municipality population, Wikipedia page length, and party type.

Q: How does the ideologically nonsalient (LSE researcher) treatment compare to aligned and opposite arms?

A: The nonsalient arm increases policy adoption by 0.94 percentage points (a 37% relative increase), approximately half the effect of the aligned arm (1.68 percentage points). However, the p-value is 0.27, and the effect is not statistically different from either the aligned arm (p-value = 0.34) or the control group at conventional confidence levels. The result should therefore be interpreted with caution.

Q: Are policy briefs or newspaper articles more effective in promoting policy adoption?

A: Neither format is significantly more effective than the other. Conditional on ideological alignment, the difference between policy brief and newspaper article effects is 0.1 percentage points with a p-value of 0.82. Both are equally effective when ideologically aligned with the receiving policymaker, a finding the authors describe as a novel contribution to the policy communication literature.

Q: Does ideological alignment affect whether policymakers choose to access the full information (selective exposure)?

A: No. Click-through rates on the links to policy briefs or newspaper articles — measured after policymakers have seen the informing institution’s identity — do not differ significantly across treatment arms. The observed average click-through rate is 6.42%. This null result is consistent with the hypothesis that policymakers do not strategically filter information acquisition based on the messenger’s ideology, at least for non-ideological policies.

Q: What does the survey experiment reveal about belief updating?

A: In the post-intervention survey experiment with 1,600 policymakers, participants first reported beliefs about a purportedly beneficial (but actually harmful) policy, then were randomly assigned to receive information about its negative effects from an aligned, opposite, or nonsalient think tank. Those receiving information from an aligned or nonsalient institution updated their beliefs significantly more than those receiving information from an ideologically opposite institution. This implicates belief updating, rather than selective exposure, as a channel through which ideological alignment affects policy adoption.

Q: Why do the authors conclude that ideological alignment also affects the third stage (policy implementation) beyond belief updating?

A: In the survey experiment, aligned and nonsalient institutions produce statistically similar belief updating. Yet in the main field experiment, the aligned arm’s estimated effect on adoption is nearly twice that of the nonsalient arm (1.68 vs. 0.94 percentage points), although this difference is not statistically significant. The authors interpret this gap as suggestive evidence that ideological alignment affects policy implementation through channels beyond belief updating (such as career concerns, party cues, or the political economy of implementation), though they acknowledge the evidence is indirect.

Q: What is the estimated economic cost of ideological misalignment?

A: The authors estimate a cost of 2,192 euros per municipality per year attributable to ideological misalignment between the informing institution and the receiving policymaker. This calculation uses the estimated impact of Wikipedia changes on touristic revenues from Hinnosaar et al. (2021). It measures the marginal cost of disseminating the research evidence through an ideologically opposite rather than an aligned institution, not the cost of failing to implement the policy.

Q: How did outside researchers’ predictions compare to actual results?

A: Researchers surveyed on the Social Science Prediction Platform correctly anticipated the rank ordering of treatment effectiveness (aligned > nonsalient > opposite > control) but substantially overestimated adoption rates in every arm. They predicted relative increases of 144%, 103%, and 48% for aligned, nonsalient, and opposite conditions respectively, compared to actual relative increases of roughly 65%, 37%, and ~0%. Email opening rates were the most accurately predicted (49% predicted vs. 38% actual). The results highlight the difficulty of translating evidence into policy even for simple, low-cost interventions.

Q: What are the main threats to validity and how are they addressed?

A: Three main threats are considered. First, differential email opening rates across treatment arms: addressed by showing the informing institution was revealed only after email opening, and confirmed by finding no significant differences in opening rates across groups. Second, spillovers between municipalities: the endline survey shows only 5 of 236 control-group respondents reported receiving any information from external sources; spillover distance analyses in Table D.II find no significant effect on control municipalities’ adoption rates. Third, contamination bias in multi-arm RCTs with strata fixed effects: addressed by replicating main results using the Goldsmith-Pinkham et al. (2022) method, yielding nearly identical estimates.

Q: What heterogeneity is observed across left- and right-wing municipalities?

A: The positive effect of receiving information from an ideologically aligned institution appears larger for left-wing municipalities, with coefficients approximately three times larger than for right-wing municipalities, but this difference is not statistically significant at conventional confidence levels. The authors caution that the strength of ideological alignment may differ systematically between the partner think tanks on the left and right, making direct comparisons between left- and right-wing effects difficult to interpret cleanly.

Q: How does the paper relate to prior work on evidence-based policymaking?

A: The closest prior work is Hjort et al. (2021) and Mehmood et al. (2024), which examine the impact of scientific evidence access on actual policy adoption, and DellaVigna and Kim (2022), which identifies ideology as a factor in the diffusion of innovative policies across governments. The present paper’s main contribution is being the first to isolate the causal effect of ideological alignment on policy adoption using a large-scale field experiment with real, authoritative ideological institutions — rather than surveys or hypothetical scenarios — while using a non-ideological policy recommendation to avoid confounding messenger ideology with policy ideology.

Ideological alignment: In this paper’s usage, the congruence between the political ideology of the institution disseminating research evidence (think tank or newspaper) and the political ideology of the local government receiving that information. Alignment is operationalized by matching right-wing municipalities with right-leaning institutions (FAES, El Mundo) and left-wing municipalities with left-leaning institutions (Fundación Alternativas, Eldiario.es).

Evidence-based policy adoption: The actual implementation by local policymakers of a policy recommendation derived from published peer-reviewed research — measured here as whether a municipality’s Wikipedia page was edited in line with specific recommended guidelines during the study period, not merely expressed intention or stated support.

Knowledge brokers: Institutions, such as think tanks, that serve as intermediaries between academic researchers and policymakers, translating and disseminating research findings in accessible formats (policy briefs) to bridge the gap between evidence and policy.

Nonsalient ideology: A condition in which the informing institution carries no salient or recognizable partisan affiliation, operationalized here by a foreign research university professor (LSE) whose institutional identity does not carry a clear left-right signal in the Spanish political context.

Three-stage policy adoption framework: The authors’ conceptual structure positing that ideology can interfere at three sequential stages: (1) selective exposure — whether policymakers choose to access information once the messenger’s ideology is revealed; (2) belief updating — whether policymakers revise their assessment of a policy’s effectiveness upon receiving evidence; and (3) policy implementation — whether policymakers act on updated beliefs to adopt the policy.

Selective exposure: The tendency of individuals to avoid information from sources whose ideology conflicts with their own prior beliefs; in this paper, operationalized as differential click-through rates on links to policy briefs or news articles after the informing institution’s identity is revealed.

Motivated reasoning: A documented tendency, also observed in policymakers, to reject or discount evidence that contradicts ideologically held prior beliefs — the mechanism proposed to explain why opposite-ideology information fails to update beliefs as effectively as aligned-ideology information.

Intergenerational Impacts of Secondary Education: Experimental Evidence from Ghana

This paper provides experimental evidence on the intergenerational impacts of secondary education subsidies in a low-income context, leveraging a randomized controlled trial (RCT) conducted in rural Ghana with a 15-year longitudinal follow-up. The study exploits a 2008 scholarship lottery in which 682 students — drawn from 2,064 rural youth who had been admitted to public senior high school but had not enrolled due to financial constraints — were randomly selected to receive four-year secondary school scholarships covering full tuition and fees. Scholarship receipt increased senior high school completion by 27–28 percentage points for both men and women (from 39.8% to 67.2% for women; from 49.7% to 77.9% for men), and raised average years of education by 1.33 years.

The central research question is whether secondary education subsidies generate intergenerational benefits — specifically, whether children of scholarship recipients have better survival and cognitive development outcomes — and what mechanisms drive any such effects.

For female scholarship recipients, the scholarship significantly altered fertility timing and partnership. By 2013, female recipients were 6.9 percentage points less likely to have ever been pregnant (on a control-group base of 48.3%), with the decline driven almost entirely by a 7 percentage point (17%) reduction in unwanted pregnancies. Though total fertility eventually caught up by 2022, recipients were still less likely to be married or cohabiting as of 2019 and were significantly more likely to have a partner with tertiary education.

Children of female scholarship recipients experienced substantially lower mortality. Among children of control-group female respondents, 3.5% died before age one and 4.0% before age three. These rates fell to 1.7% (p=0.028) and 2.2% (p=0.065), respectively, among children of female recipients: reductions of roughly 51% in under-one mortality and 45% in under-three mortality.

Child cognitive development gains emerge only once children reach school age. Children of female recipients show no significant cognitive score differences at 18 months, 2.5 years, or 3.5 years, but score 0.238 standard deviations higher at age five (p=0.005) and 0.252 standard deviations higher at age seven (p=0.035). Effects span language, math and numeracy, spatial reasoning, and executive function, but not socio-cognitive development. These effect sizes fall between the 75th and 80th percentile of RCT-based educational intervention effect sizes in low- and middle-income countries.

The primary mechanism is not higher income or greater monetary investment in children. The study finds no significant treatment effect on household SES index (0.107 SDs, p=0.103), no impact on formal schooling inputs, and no difference in parental aspirations or knowledge of child stimulation’s importance. Instead, more-educated mothers seek more prenatal care, engage in more preventive health behaviors, and — critically — spend more time interacting with their children in stimulating ways. Day-long LENA (Language Environment Analysis) recordings at 18 months confirm 20% more adult-child conversational turns per minute (effect size 0.068, p=0.005) and 17% more child vocalizations per minute (effect size 0.32, p=0.014) for children of female recipients.

For male scholarship recipients, no analogous intergenerational benefits appear. Their partners are not more educated (in fact slightly less educated on tertiary rates), their children show no mortality improvement, and their children’s cognitive scores are if anything lower at age five (point estimate -0.22, p=0.069). The absence of effects is attributed to children of male scholarship recipients having caregivers (overwhelmingly their mothers) with no more education than in the control group, and to those children being 8.7 percentage points less likely to live with their father.

A cost-benefit analysis finds internal rates of return (IRR) of 27%–76% for a female-only means-tested scholarship program and 20%–51% for a mixed-gender program. The cost per under-three death averted ($15,184 for female-only) places the scholarship program within the range of the 10th-percentile most cost-effective WHO-recommended child health interventions.

Scope conditions: the study estimates effects for students who qualified for senior high school but faced binding financial constraints in rural Ghana in 2008 — a population that is well-prepared academically but economically disadvantaged. Results may not generalize to students who would not have qualified for secondary school or to contexts where financial barriers are not binding.

Q: What was the experimental design and who was in the study sample? A: In 2008, 2,064 rural Ghanaian students who had been admitted to senior high school (SHS) but had not enrolled — typically due to inability to pay fees — were sampled. After a baseline survey, 682 were randomly selected (approximately one-third) by lottery to receive a four-year scholarship covering full tuition and fees for a day (non-boarding) student, stratified by district, school, gender, and exam-year cohort. The two-thirds comparison group received no scholarship. Students were on average 17 years old at baseline and just over 31 at the last follow-up in Spring 2023.

Q: How large was the scholarship’s effect on educational attainment? A: Scholarship receipt raised SHS completion from 39.8% to 67.2% among women (a 69% increase) and from 49.7% to 77.9% among men (a 57% increase). Overall, the scholarship led to an average of 1.33 more years of education. For women only, it also significantly raised tertiary education: by 2023, scholarship receipt increased tertiary completion by 10.8 percentage points for women, but had no significant tertiary effect for men.
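The percentage-point and relative gains quoted in this answer follow directly from the reported completion rates. A minimal Python sketch of that arithmetic, using only figures from this summary (the helper name `gains` is illustrative and not from the paper):

```python
# SHS completion rates in percent, as reported in the summary: (control, scholarship).
rates = {"women": (39.8, 67.2), "men": (49.7, 77.9)}

def gains(control, treated):
    """Return (percentage-point gain, relative gain in percent of the control rate)."""
    pp = treated - control
    rel = 100 * pp / control
    return round(pp, 1), round(rel)

for group, (control, treated) in rates.items():
    pp, rel = gains(control, treated)
    print(f"{group}: +{pp} percentage points, +{rel}% relative to control")
```

The percentage-point gains are similar for women and men, but the relative gain is larger for women because their control-group completion rate is lower.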

Q: What were the effects on fertility and family formation for female scholarship recipients? A: By 2013, female recipients were 6.9 percentage points less likely to have ever been pregnant (base: 48.3% in control), driven almost entirely by a 7 percentage point (17%) reduction in unwanted pregnancies. By 2019, recipients were still 6 percentage points less likely to have started childbearing and had 0.152 fewer children on average (p=0.065). Total fertility eventually caught up by 2022. By 2016, female recipients were 12.1 percentage points (24% of control mean) less likely to have ever lived with a partner, and by 2019 were 6.2 percentage points less likely to be married or cohabiting. Conditional on having a partner, they were more likely to have a partner who completed tertiary education, a difference significant at the 10% level (p=0.071).

Q: What were the effects on fertility and family formation for male scholarship recipients? A: Male recipients showed few changes in fertility or marriage behavior. They were 7.8 percentage points (30% of control mean) more likely to still be living with their parents as of 2019. Their partners were not more educated; in the cognitive games subsample, treatment actually reduced the share of partners with tertiary education by 3.6 percentage points from a control base of 4.3%.

Q: What were the child mortality results for children of female scholarship recipients? A: Among children of female control respondents, 3.5% died before age one and 4.0% before age three. These fell to 1.7% (p=0.028) and 2.2% (p=0.065), respectively, among children of female recipients — approximately a halving of under-one and under-three mortality. These point estimates are robust to varying the covariates (linear vs. fixed effects for birth year, dropping or adding controls). After multiple-hypothesis testing adjustment using the Romano-Wolf step-down procedure, the p-value for survived-to-one rises from 0.028 to 0.119.

Q: What were the child mortality results for children of male scholarship recipients? A: The estimated effects for children of male recipients were smaller and statistically insignificant: a 1.4 percentage point increase in survived-to-one (p=0.161) and 0.9 percentage points in survived-to-three (p=0.549). These estimates are not significantly different from those for female recipients. Results were sensitive to sample perturbations given the smaller sample: only 26 of 1,016 children of male respondents died before age one.

Q: What child cognitive development gains did children of female scholarship recipients show, and at what ages? A: No significant differences emerged at 18 months (-0.066 SDs, p=0.489), 2.5 years (-0.024 SDs, p=0.850), or 3.5 years (0.026 SDs, p=0.736). Significant gains appeared at age five (0.238 SDs, p=0.005) and age seven (0.252 SDs, p=0.035). Effects span language (0.15 SDs at five; 0.27 SDs at seven), math and numeracy (0.15 SDs; 0.26 SDs), spatial reasoning (0.20 SDs; 0.12 SDs), and executive function (0.25 SDs; 0.20 SDs), but not socio-cognitive development. These effect sizes fall between the 75th and 80th percentile of educational RCT effect sizes in low- and middle-income countries.

Q: What cognitive development effects did children of male scholarship recipients show? A: No significant positive effects emerged at any age. Point estimates were negative at all ages except 18 months, and marginally significantly negative at age five (-0.22 SDs, p=0.069). The difference in treatment effects between children of male and female recipients is statistically significant at age five (p=0.005).

Q: Why do cognitive gains appear only at age five and not earlier? A: The authors offer three interpretations: first, that the cognitive tests for younger children are noisier instruments (cross-sectional and longitudinal correlations within domains are much lower for 1.5-year tests than 5-year tests); second, that impacts on cognitive development may take time to materialize; third, that marginal survivors in the treatment group may start with a cognitive deficit (e.g., surviving a cerebral malaria episode), and maternal education effects require time to overcome this initial handicap. Gains concentrate on skills underlying literacy and numeracy, consistent with more educated mothers bridging home and school environments.

Q: What is the primary mechanism driving intergenerational effects? A: The primary mechanism is changes in parenting behaviors, not income. Female recipients do not invest more money in children (no significant difference in SES index or child investment index). Instead, they seek more prenatal care, engage in significantly more preventive health behaviors, and interact more with their children in cognitively stimulating ways. Day-long LENA recordings at 18 months show 20% more conversational turns per minute (effect size 0.068, p=0.005) and 17% more child vocalizations per minute (effect size 0.32, p=0.014). Caregiver reports confirm more playing, singing, and doing simple mathematics with children.

Q: Does the income effect of scholarship receipt explain the child outcomes? A: No. Duflo et al. (2024) find no significant earnings impacts until 2019 or later, meaning children tested at ages five and seven by 2023 largely grew up before their mothers’ earnings improved. The household SES index shows only a statistically insignificant 0.107 SD gain (p=0.103), indistinguishable from the effect for children of male recipients. There is also no evidence of a quality-quantity trade-off: caregivers of recipients’ children do not have fewer children to care for.

Q: Does the increase in maternal age at birth explain the child mortality reduction? A: It is not the primary driver. Maternal age at birth increases by only 0.349 years on average (p=0.142) for children of female recipients, and 0.64 years for first-born children (p=0.040). Point estimates on mortality for first-born children are somewhat smaller than for the full sample, suggesting maternal age is not the main channel. Moreover, maternal age at birth falls for children of male recipients yet their survival point estimates are positive, which further argues against maternal age as the primary mechanism.

Q: How does the education of the primary caregiver mediate the results? A: For 84% of children in the sample, the primary caregiver is the child’s mother. Children of female scholarship recipients have caregivers who are 25 percentage points more likely to have completed secondary school and 5 percentage points more likely to have completed tertiary education. Children of male scholarship recipients have caregivers with no more education than the control group, because the recipients’ partners — the typical caregivers — are not more educated. Treatment effects for female recipients are not altered when father’s education is added as a control, confirming maternal education as the main driver.

Q: What threat to validity arises from co-residence of the father? A: Children of male scholarship recipients are 8.7 percentage points less likely to live with their father (p=0.024), compared to no such effect for children of female recipients (92% of whom live with their scholarship-recipient mother). LENA recordings show negative treatment effects for children of male recipients — fewer adult words and conversational turns — consistent with father absence mechanically reducing auditory engagement and possibly leaving single mothers less time to verbally interact with each child.

Q: How are multiple-hypothesis testing concerns addressed? A: The pre-analysis plan pre-specified child survival and child cognitive development as primary outcomes. The authors apply the Romano-Wolf step-down procedure for multiple hypothesis testing adjustment. After adjustment, the p-value for survived-to-one for children of female recipients rises from 0.028 to 0.119; the cognitive development effects at age five and seven remain significant.

Q: How does the study address potential sample selection bias in the child outcomes sample? A: The authors use entropy balancing (Hainmueller, 2012) to reweight observations so that baseline (2008) characteristics are balanced between treatment and control within the subsample of recipients who had children. Results are qualitatively unchanged for both female and male recipients. The authors also note that children of female recipients are younger on average (4.71 months, p=0.067), which is why the study collects data at fixed age windows (14-22 months, 2.5 years, 3.5 years, 5 years, 7 years) rather than in a single cross-sectional wave.

Q: What is the cost-effectiveness and cost-benefit result for secondary school scholarships? A: Social costs are estimated at $585 per recipient for a mixed-gender program and $505 for a female-only program (combining school fees, materials, and foregone wages). The cost per under-three death averted is $23,582 for mixed-gender and $15,184 for female-only — placing the female-only program within the range of the 10th-percentile most cost-effective WHO-recommended child health interventions. The IRR is 27%–76% for a female-only means-tested scholarship program and 20%–51% for a mixed-gender program. These are likely conservative, as they exclude welfare gains from avoiding unwanted pregnancies, greater female agency, and recipient health benefits.
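One way to read the two cost figures together is to back out how many under-three deaths the program would have to avert per 1,000 recipients. The Python sketch below does this under a loudly stated assumption: that cost per death averted equals social cost per recipient divided by deaths averted per recipient. The paper's own calculation may differ (for example in discounting or in how children per recipient are counted), so the output is illustrative only.

```python
# Figures reported in the summary (USD).
programs = {
    "female-only": {"cost_per_recipient": 505, "cost_per_death_averted": 15184},
    "mixed-gender": {"cost_per_recipient": 585, "cost_per_death_averted": 23582},
}

def implied_deaths_averted_per_1000(cost_per_recipient, cost_per_death_averted):
    # ASSUMPTION (not from the paper):
    # cost per death averted = cost per recipient / deaths averted per recipient.
    return 1000 * cost_per_recipient / cost_per_death_averted

implied = {
    name: implied_deaths_averted_per_1000(**figures)
    for name, figures in programs.items()
}

for name, value in implied.items():
    print(f"{name}: about {value:.1f} under-three deaths averted per 1,000 recipients")
```

Under this reading, the female-only program is more cost-effective both because it costs less per recipient and because it averts more deaths per recipient.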

Q: What is the scope of the experiment and to what population do findings generalize? A: The study estimates ITT effects for students in rural Ghana who qualified for SHS on exam performance but faced binding financial constraints in 2008 — a population that is academically prepared but economically disadvantaged. Results do not directly apply to students who would not have qualified, to contexts without binding financial barriers, or to settings where secondary school quality or the marriage market differs substantially. The study also cannot yet observe complete fertility, since scholarship-lottery participants were only 31 years old on average at last follow-up.

LENA (Language Environment Analysis): A day-long recording device worn by a child that uses speech recognition software to generate count-based metrics — adult word count, adult-child conversational turns, and child vocalizations per minute — providing an objective measure of the child’s auditory environment and caregiver engagement quality without reliance on self-report.

IRT Score (Item Response Theory Score): A latent-trait measure of child cognitive ability estimated from a one-parameter logistic model applied to binary correct/incorrect responses across cognitive game questions. The model assigns a difficulty level to each question and a latent ability to each child; the ability estimates are then standardized. Used as the primary cognitive development outcome across age windows.
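To make the one-parameter logistic model concrete, here is a minimal Python sketch. It is a simplification and not the authors' estimation code: item difficulties are taken as known (the paper estimates difficulties and abilities together), and the three items and the response pattern are hypothetical.

```python
import math

def p_correct(theta, b):
    """1PL (Rasch) probability that a child with ability theta answers
    an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iters=25):
    """Newton-Raphson maximum-likelihood estimate of one child's ability,
    holding item difficulties fixed (a simplifying assumption).
    The response list needs at least one 0 and one 1 for a finite estimate."""
    theta = 0.0
    for _ in range(iters):
        probs = [p_correct(theta, b) for b in difficulties]
        score = sum(y - p for y, p in zip(responses, probs))  # gradient of log-likelihood
        info = sum(p * (1.0 - p) for p in probs)              # Fisher information
        theta += score / info
    return theta

# Hypothetical example: three items of increasing difficulty, two answered correctly.
difficulties = [-1.0, 0.0, 1.0]
responses = [1, 1, 0]
theta_hat = estimate_ability(responses, difficulties)
print(f"estimated ability: {theta_hat:.2f}")
```

At the estimate, the expected number of correct answers equals the observed number, which is the defining property of the 1PL maximum-likelihood solution. Standardizing across children would be a separate final step.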

Incarceration Effect: The hypothesis that education delays fertility mechanically only while students are in school (analogous to incarceration preventing activity), with no persistent effect once they exit. The authors rule this out by showing that the fertility gap between female treatment and control groups persists well after the majority of scholarship recipients have graduated.

Quality-Quantity Trade-off (Becker 1991): The economic framework predicting that more educated parents, facing higher opportunity costs of children and lower costs of investing in child quality, will have fewer but better-invested-in children. The authors find delayed and reduced fertility but do not find that recipients have fewer children to care for in the cognitive assessment sample, suggesting the child quality gains operate primarily through parenting practices rather than resource concentration.

Intent-to-Treat (ITT) Effect: The treatment effect estimated by comparing all lottery winners to all losers regardless of whether winners actually enrolled, which captures the effect of the scholarship offer (including compliance costs). The cost-benefit analysis uses ITT estimates, so the cost of subsidizing inframarginal students who would have attended anyway is incorporated.

Entropy Balancing: A reweighting procedure (Hainmueller, 2012) that assigns weights to observations in the control group so that the weighted distribution of baseline covariates matches that of the treatment group, used to assess whether imbalances in the subsample of participants who had children drive the results. The authors apply this as a robustness check for both mortality and cognitive development outcomes.
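The reweighting idea can be sketched in a few lines of Python for the simplest case of a single covariate and a single moment (the mean). This is an illustration of the exponential-tilting solution behind entropy balancing, not the authors' implementation; the ages and the target mean below are hypothetical, and real applications balance many covariates at once.

```python
import math

def _weights(control_x, target_mean, lam):
    """Exponentially tilted weights, normalized to sum to one."""
    raw = [math.exp(lam * (x - target_mean)) for x in control_x]
    total = sum(raw)
    return [r / total for r in raw]

def entropy_balance_1d(control_x, target_mean, iters=50, tol=1e-12):
    """Find control-group weights whose weighted mean equals target_mean
    while staying as close as possible (in entropy terms) to uniform weights.
    Solved by Newton's method on the single tilting parameter lam."""
    lam = 0.0
    for _ in range(iters):
        w = _weights(control_x, target_mean, lam)
        gap = sum(wi * (x - target_mean) for wi, x in zip(w, control_x))
        if abs(gap) < tol:
            break
        # Derivative of the gap with respect to lam is the weighted variance.
        var = sum(wi * (x - target_mean) ** 2 for wi, x in zip(w, control_x)) - gap ** 2
        lam -= gap / var
    return _weights(control_x, target_mean, lam)

# Hypothetical baseline ages for control units and a hypothetical treatment-group mean.
control_ages = [14, 15, 16, 17, 18, 19]
treated_mean_age = 17.0
weights = entropy_balance_1d(control_ages, treated_mean_age)
weighted_mean = sum(w * x for w, x in zip(weights, control_ages))
print(f"weighted control mean: {weighted_mean:.4f}")
```

Units with covariate values that are under-represented relative to the target receive larger weights; here older control units are weighted up so the weighted control mean matches the target.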

Unwanted Pregnancy: A pregnancy reported by the respondent as unplanned at the time of conception. The authors use this measure to distinguish a fertility reduction driven by a change in desired fertility from one driven by fewer unintended out-of-wedlock pregnancies. The scholarship’s early fertility impact is almost entirely a reduction in unwanted pregnancies (7 percentage point decline, 17% reduction).

Labor Market Shocks and Monetary Policy

Overview

Research question. The paper asks two related questions: (1) How much, and through which channels, do employer-to-employer (EE) worker transitions affect macroeconomic outcomes — particularly inflation? (2) What is the optimal monetary policy within a class of Taylor rules when EE flows are taken explicitly into account?

Motivation. Standard monetary policy frameworks condition on the unemployment rate as the primary labor market slack measure and underemphasize the “quality” dimension of employment. The paper documents a striking empirical pattern: the 2016–2019 recovery and the 2021–2022 recovery from COVID-19 featured nearly identical declines in the unemployment rate, yet exhibited dramatically different EE rate dynamics and inflation outcomes. During 2016–2019, the EE rate remained flat despite a roughly 25 percent decline in the unemployment rate from trend. During 2021–2022, the EE rate rose by around 8 percent above trend over a comparable unemployment decline. Correspondingly, unit labor cost (ULC) growth reached approximately 6 percent during the COVID-19 recovery when unemployment fell below 4 percent, compared with only about 2 percent ULC growth in the 2016–2019 period at similar unemployment levels.

Methodology. The authors develop a Heterogeneous Agent New Keynesian (HANK) model with a frictional labor market featuring on-the-job search (OJS). Workers are heterogeneous in wealth (mutual fund shares), human capital, match-specific productivity, and endogenous piece-rate wages. Human capital stochastically appreciates when employed and depreciates when unemployed, capturing scarring effects and job-stayer wage growth. Wage determination follows a Bertrand competition protocol based on flow output: workers switch to higher-productivity matches and extract the full surplus from the new firm, while outside offers from lower-productivity firms can still trigger rebargaining with the incumbent firm and raise the piece rate without a job switch. Three vertically integrated sectors — labor services, intermediate goods, and final goods — are linked so that the real price of labor services pl is the real marginal cost for intermediate firms and the sole driver of inflation in the New Keynesian Phillips curve (absent aggregate productivity shocks). The economy is subject to AR(1) shocks to the discount rate β (demand), aggregate labor productivity z (supply), and OJS efficiency ν (the relative search efficiency of employed workers). The model is solved using the Sequence-Space Jacobian (SSJ) method, extended to handle discretized worker distributions as direct inputs to equilibrium conditions.

The model is calibrated to U.S. pre-Great Recession data (2004–2006), targeting the fraction of hand-to-mouth individuals (16 percent of SIPP sample), unemployment rate (5.1 percent), EU separation rate (3.8 percent quarterly), EE rate (2 percent quarterly from LEHD), earnings drop upon job loss (35 percent), wage growth of job switchers (9 percent), and the labor share (0.67). Shock processes are estimated by minimizing deviations from empirical correlations and standard deviations of output, unemployment, EE rate, and inflation over 1995:Q3–2008:Q4.

Main findings — positive analysis. Shocks to OJS efficiency account for 43.1 percent of fluctuations in inflation in the variance decomposition, and 78.7 percent of fluctuations in the EE rate. The mechanism is that higher OJS efficiency lowers the expected match value EJ for labor services firms through three channels: (i) a compositional shift toward employed job seekers who extract the entire match surplus, (ii) shorter expected match duration as workers face higher poaching probabilities, and (iii) more frequent wage rebargaining where outside offers bid up wages without accompanying productivity gains. To maintain the free-entry condition, the real price of labor services pl must rise, increasing the real marginal cost and inflation. This direct labor market effect explains 139 percent of the total increase in pl; general equilibrium effects through reduced tightness θ (which raises expected match values by making vacancies easier to fill and workers less likely to be poached) contribute −42 percent, partially offsetting the direct effect; the remainder (3 percent) comes from real rate changes driven by the monetary policy reaction.

In two historical simulations, muted OJS efficiency during 2016–2019 generated approximately 0.23 percentage points lower annualized inflation at the peak relative to a counterfactual economy with the same unemployment path but an endogenously rising EE rate. Conversely, elevated OJS efficiency during 2021–2022 generated approximately 0.56 percentage points higher annualized inflation compared to the flat-EE-rate counterfactual. The paper notes that strong worker mobility accounts for roughly 10 percent of the approximately 6 percentage point total rise in annual inflation during the COVID-19 recovery episode.

An important cross-model comparison shows that the Representative Agent New Keynesian (RANK) version of the model overestimates the decline in demand, output, and labor market tightness upon a positive OJS shock, and underestimates the rise in real rate, marginal cost, and inflation. Household heterogeneity is therefore quantitatively important: hand-to-mouth households’ demand responds directly to labor income increases from job switches, mitigating the demand decline and amplifying inflation.

Main findings — normative analysis. The optimal monetary policy within an augmented Taylor rule — adding an EE gap term ΦEE(EEt − EE*) alongside the standard inflation and unemployment gap terms — prescribes Φ*_u = −3.18 and Φ*_EE = 2.22 (with Φπ fixed at 1.5). This yields a 78.7 percent reduction in the central bank loss relative to the baseline Taylor rule. A policy that ignores EE dynamics and optimizes only the unemployment gap coefficient (finding Φu = −2.71, ΦEE = 0) produces a 12 percent larger central bank loss than the full optimal policy. In terms of welfare, the optimal policy delivers 0.16 percent additional lifetime consumption equivalent in the aggregate. Workers at the bottom of the match quality distribution gain the most (0.24 percent), as do the unemployed (0.20 percent), while those at the top of the wealth distribution gain the least due to larger share price fluctuations under the more aggressive policy.

Scope conditions. Results are derived conditional on a dual-mandate central bank objective (variance of inflation and output gaps), within a class of Taylor-type rules (not fully optimal Ramsey policy), under first-order approximation around a non-stochastic steady state. The historical simulations abstract from supply shocks active in the normative exercises and assume the economy starts from steady state in 2016.

In depth

Q1. What is the OJS efficiency shock, and how does it differ from a standard demand or supply shock?

An OJS efficiency shock is modeled as a time-varying shift in νt, the relative job search efficiency of employed workers compared with unemployed workers. Unlike demand shocks (discount rate β innovations) and productivity shocks (aggregate z innovations), which move inflation and unemployment in opposite directions under standard New Keynesian logic (divine coincidence), OJS efficiency shocks move inflation and unemployment in the same direction: a positive OJS shock raises inflation while also raising unemployment (because the higher real rate induced by the central bank’s reaction reduces demand and employment). This makes OJS shocks behave like cost-push shocks and introduces a genuine policy trade-off for a dual-mandate central bank.

Q2. What are the three mechanisms through which higher OJS efficiency raises the real price of labor services, and what is the quantitative contribution of each?

The decomposition (Figure 8) shows that the direct effect of ν on EJ — encompassing the composition channel (more employed job seekers who extract the full surplus), the match-duration channel (shorter expected match lives), and the wage rebargaining channel (outside offers raise wages without productivity gains) — explains 139 percent of the total increase in pl. The general equilibrium reduction in labor market tightness θ, which raises EJ and partially offsets the cost increase, explains −42 percent in total: −18 percent through increased supply of labor services L (productivity-enhancing job switches improve the match distribution) and −24 percent through reduced output Y (lower aggregate demand). Real rate effects account for the remaining 3 percent net (8 percent from the inflation channel and −5 percent from the unemployment channel). Labor market effects in total therefore explain 97 percent of the marginal cost increase.
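The shares in this decomposition can exceed 100 percent individually because some channels push in the opposite direction. A small Python check, using only the percentages quoted in this answer, confirms that they are internally consistent (the dictionary labels are paraphrases, not the paper's terminology):

```python
# Contributions to the total increase in the real price of labor services, in percent.
contributions = {
    "direct effect of OJS efficiency on EJ": 139,
    "tightness: higher labor services supply L": -18,
    "tightness: lower output Y": -24,
    "real rate: inflation channel": 8,
    "real rate: unemployment channel": -5,
}

tightness = sum(v for k, v in contributions.items() if k.startswith("tightness"))
real_rate = sum(v for k, v in contributions.items() if k.startswith("real rate"))
labor_market = contributions["direct effect of OJS efficiency on EJ"] + tightness
total = sum(contributions.values())

print(f"tightness effects: {tightness}%")
print(f"real rate effects: {real_rate}%")
print(f"labor market effects in total: {labor_market}%")
print(f"all channels: {total}%")
```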

Q3. Does the positive relationship between EE rates and inflation require wage increases upon job switches?

No. The paper demonstrates (Section 2.4.2, Figure 3) that even when the piece rate for workers hired from unemployment is set to α = 0.95 (so that outside offers have negligible wage effects), a positive OJS efficiency shock still generates a decline in output and a rise in inflation in both the RANK and TANK models. Quantitatively, the inflation response is similar across the baseline and near-zero composition-channel specifications, confirming that the shorter expected match duration is the primary driver of the increase in the real price of labor services. The match duration channel operates independently of wage increases: firms anticipate shorter matches and require a higher flow price to break even on vacancy costs.

Q4. How does household heterogeneity change the quantitative effects of OJS shocks relative to the RANK benchmark?

Under a constant real rate, in the RANK model a higher OJS efficiency increases the real price of labor services and inflation but has no effect on aggregate demand or output (because higher labor income for the PIH household is exactly offset by lower firm profits). In the TANK model, hand-to-mouth households consume their entire labor income, so the rise in labor income from job switches directly boosts their demand, raising output and tightness and further amplifying inflation. Under an endogenous real rate, the RANK model overestimates the decline in demand and output, and underestimates the rise in real rate and inflation, compared with the TANK model. The TANK model requires a substantially larger equilibrium real rate increase to contain inflation because HtM households’ demand is less elastic to the real rate than PIH households'.

Q5. How are the shock processes estimated, and what does the variance decomposition show?

The six AR(1) parameters governing β, z, and ν (three persistence parameters ρj and three standard deviations σj) are estimated by minimizing the sum of squared deviations between model-generated and empirical moments: the autocorrelation of output; correlations of the unemployment rate, EE rate, and inflation with output; and standard deviations of output, unemployment rate, EE rate, and inflation. Data cover 1995:Q3–2008:Q4. Estimated values are ρβ = 0.909, ρz = 0.332, ρν = 0.936 and σβ = 0.001, σz = 0.002, σν = 0.003. The variance decomposition (Table 4) assigns 43.1 percent of inflation variance to OJS efficiency shocks ν, 52.0 percent to demand shocks β, and 4.9 percent to productivity shocks z.
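To give a feel for what these estimates imply, the Python sketch below computes the half-life and stationary standard deviation of each shock. It assumes each shock, measured as a deviation from its steady-state value, follows x_t = ρ x_{t−1} + ε_t with innovation standard deviation σ; the exact specification in the paper (for example, logs versus levels) is not stated in this summary, so treat that form as an assumption.

```python
import math

# Estimated AR(1) parameters from the summary: (persistence rho, innovation s.d. sigma).
shocks = {
    "beta (demand)": (0.909, 0.001),
    "z (productivity)": (0.332, 0.002),
    "nu (OJS efficiency)": (0.936, 0.003),
}

def half_life(rho):
    """Quarters until the response to an innovation decays to half its initial size."""
    return math.log(0.5) / math.log(rho)

def unconditional_sd(rho, sigma):
    """Stationary s.d. of x_t = rho * x_{t-1} + eps_t (ASSUMED process form)."""
    return sigma / math.sqrt(1.0 - rho ** 2)

def impulse_response(rho, sigma, horizon):
    """Path of x after a one-s.d. innovation at t = 0, with no further shocks."""
    return [sigma * rho ** t for t in range(horizon + 1)]

for name, (rho, sigma) in shocks.items():
    print(f"{name}: half-life {half_life(rho):.1f} quarters, "
          f"stationary s.d. {unconditional_sd(rho, sigma):.4f}")
```

Under this assumed form, the OJS efficiency shock is the most persistent of the three and the productivity shock dies out within about a quarter.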

Q6. How is the “missing inflation” during 2016–2019 quantified, and what is the counterfactual?

The exercise simulates two economies both replicating the same unemployment path — a 15 percent decline in unemployment relative to its 5.2 percent steady state, spread linearly over 16 quarters, followed by mean reversion. The first economy uses only positive demand shocks, which generate an endogenously rising EE rate consistent with the historical unemployment-EE correlation. The second economy additionally introduces negative OJS efficiency shocks to keep the EE rate unchanged, as observed in the data during 2016–2019. Annualized inflation in the second economy is 0.23 percentage points lower at the peak (16 quarters after the shock), implying that had the EE rate risen normally, inflation would have been around 2 percent in 2019 rather than the observed 1.8 percent.

Q7. How is the inflationary role of elevated EE transitions during 2021–2022 quantified?

Using the same unemployment path as the 2016–2019 exercise, the COVID-19 recovery economy combines positive demand shocks with positive OJS efficiency shocks to replicate the observed 0.16 percentage point (8 percent above trend) increase in the EE rate. Comparing this economy to the flat-EE-rate economy from the prior exercise, the elevated EE rate generates 0.56 percentage points higher annualized inflation. Because annual inflation rose approximately 6 percentage points in the data during this episode, the model attributes roughly 10 percent of the total inflation increase to strong worker mobility.
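The two ratios in this answer can be checked with a few lines of Python using only the figures quoted here. The variable names are illustrative.

```python
# Figures quoted in the summary.
ee_rise_pp = 0.16              # rise in the EE rate, percentage points
ee_rise_relative = 0.08        # the same rise expressed relative to trend (8 percent)
extra_inflation_pp = 0.56      # extra annualized inflation vs. flat-EE counterfactual
total_inflation_rise_pp = 6.0  # approximate rise in annual inflation in the data

# Trend EE rate implied by the two ways of stating the EE increase.
implied_trend_ee = ee_rise_pp / ee_rise_relative

# Share of the observed inflation rise attributed to elevated worker mobility.
mobility_share_pct = 100 * extra_inflation_pp / total_inflation_rise_pp

print(f"implied trend EE rate: {implied_trend_ee:.1f} percent per quarter")
print(f"share of inflation rise: {mobility_share_pct:.1f} percent")
```

The implied trend EE rate matches the 2 percent quarterly calibration target, and the share of just over 9 percent is what the summary rounds to "roughly 10 percent".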

Q8. What are the optimal Taylor rule coefficients when EE dynamics are included, and what is the welfare cost of ignoring them?

The optimal policy over the augmented Taylor rule it = i* + Φπ(πt − π*) + Φu(ut − u*) + ΦEE(EEt − EE*), with Φπ fixed at 1.5 and a dual-mandate loss function W = var(πt − π*) + 0.25·var(Yt − Y*), prescribes Φ*_u = −3.18 and Φ*_EE = 2.22. This reduces the central bank loss by 78.7 percent relative to the baseline rule (Φu = −0.25, ΦEE = 0). If the EE gap term is excluded and only the unemployment gap coefficient is re-optimized (finding Φu = −2.71), the central bank loss is 12 percent higher than under the full optimal policy.
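
A minimal sketch of the augmented rule and the loss function, assuming gaps are supplied as plain numbers and using the population variance. The function names and the normalization of the baseline loss to 1 are illustrative choices; the coefficients and percentages are those reported above.

```python
from statistics import pvariance

BASELINE = {"phi_u": -0.25, "phi_ee": 0.0}
OPTIMAL = {"phi_u": -3.18, "phi_ee": 2.22}


def taylor_rate(pi_gap, u_gap, ee_gap, i_star=0.0, phi_pi=1.5, phi_u=-0.25, phi_ee=0.0):
    """Policy rate implied by the augmented Taylor rule for given gaps."""
    return i_star + phi_pi * pi_gap + phi_u * u_gap + phi_ee * ee_gap


def central_bank_loss(pi_gaps, y_gaps, weight_y=0.25):
    """Dual-mandate loss: var(inflation gap) + 0.25 * var(output gap)."""
    return pvariance(pi_gaps) + weight_y * pvariance(y_gaps)


# Relative losses implied by the reported percentages (baseline normalized to 1).
loss_optimal = 1.0 - 0.787
loss_u_only = loss_optimal * 1.12          # 12 percent above the full optimum
reduction_u_only = 1.0 - loss_u_only       # reduction relative to the baseline rule

print(taylor_rate(0.01, -0.01, 0.001, **OPTIMAL))
print(f"u-only rule cuts the baseline loss by about {reduction_u_only:.1%}")
```

Read this way, re-optimizing the unemployment coefficient alone already recovers most of the gain, and the EE term accounts for the remaining difference between the two optimized rules.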

Q9. How does the optimal policy affect macroeconomic volatility, and who gains most from it?

Table 5 shows that the optimal policy substantially reduces volatility of inflation (standard deviation falls from 0.0013 to 0.0011), output (0.0059 to 0.0020), consumption (0.0059 to 0.0020), unemployment (0.0047 to 0.0013), labor market tightness (0.0600 to 0.0175), and the real marginal cost pl (0.0203 to 0.0081), at the cost of higher real rate volatility (0.0019 to 0.0033) and share price volatility (0.1975 to 0.3051). In terms of welfare (Table 6), the unemployed gain 0.20 percent in lifetime consumption equivalents (versus 0.15 percent for the employed), workers at the bottom quintile of match quality gain 0.24 percent (versus 0.16 percent at the top), and wealth-poor individuals in the bottom share quintile gain 0.23 percent (versus 0.11 percent at the top, whose gains are eroded by larger share price fluctuations).
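
The Table 5 figures quoted above are easier to compare as percentage changes. The sketch below only re-expresses the reported standard deviations; the dictionary layout and names are illustrative.

```python
# (baseline rule, optimal rule) standard deviations as quoted from Table 5.
TABLE5_STD = {
    "inflation": (0.0013, 0.0011),
    "output": (0.0059, 0.0020),
    "consumption": (0.0059, 0.0020),
    "unemployment": (0.0047, 0.0013),
    "tightness": (0.0600, 0.0175),
    "real marginal cost": (0.0203, 0.0081),
    "real rate": (0.0019, 0.0033),
    "share price": (0.1975, 0.3051),
}


def pct_change(baseline: float, optimal: float) -> float:
    """Percentage change in volatility when moving to the optimal rule."""
    return 100.0 * (optimal - baseline) / baseline


changes = {name: pct_change(b, o) for name, (b, o) in TABLE5_STD.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```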

Q10. How does the model extend the SSJ computational method, and why is this extension necessary?

The standard SSJ method of Auclert, Bardoczy, Rognlie, and Straub (2021) handles settings where only scalar aggregates enter equilibrium conditions in sequence space. In this model, the discretized distributions of employed workers µE(h, x) and unemployed workers µU(h) at the job search stage enter directly into the expected match value EJ (because human capital and current match productivity determine output and wage levels upon new contacts), and the distribution λE(h, x, α) at the production stage enters into labor services firm profits ΓS. The authors treat worker distributions as histograms and compute Jacobians for each mass point, combining the SSJ method with Reiter (2009)-style projection. This substantially increases computation time but remains feasible, extending the SSJ method to multi-stage models with search frictions where endogenous distributions are state variables.

Q11. What are the three sources of wage growth in the HANK model, and what is their relevance for inflation dynamics?

First, human capital h stochastically appreciates during employment (at rate πE = 0.018 per quarter, calibrated to annual job-stayer wage growth of approximately 2 percent), raising wages through a higher piece-rate base. Second, job switches to higher-productivity matches yield wage increases as the worker extracts the full surplus from the new firm (the new piece rate equals x/x’, the ratio of old to new match productivity). Third, outside offers with productivity x’ satisfying αx < x’ < x — not good enough to trigger a switch but better than the current bargaining threat — cause the incumbent firm to raise the piece rate to x’/x via rebargaining, increasing wages without a job change. The second and third channels are the ones directly affected by OJS efficiency shocks and are inflationary: they raise labor costs beyond productivity gains.
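
The second and third channels can be written as a small update rule for the piece rate when an outside offer arrives. This is a sketch of the rules as stated in this summary; the `Match` class and function name are illustrative, and the treatment of an offer exactly equal to x (left unchanged here) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    x: float      # current match productivity
    alpha: float  # piece rate: share of match output paid to the worker


def respond_to_offer(match: Match, x_new: float) -> tuple[Match, str]:
    """Apply the switching and rebargaining rules to an outside offer x_new."""
    if x_new > match.x:
        # Better match: the worker moves and the new piece rate is x / x'.
        return Match(x=x_new, alpha=match.x / x_new), "switch"
    if match.alpha * match.x < x_new < match.x:
        # Credible threat only: the incumbent raises the piece rate to x' / x.
        return Match(x=match.x, alpha=x_new / match.x), "rebargain"
    # Offer too weak to matter (or exactly equal to x, by assumption).
    return match, "no change"


current = Match(x=1.0, alpha=0.8)
print(respond_to_offer(current, 1.25))  # switch
print(respond_to_offer(current, 0.90))  # rebargain
print(respond_to_offer(current, 0.70))  # no change
```

In both active cases the product of piece rate and productivity rises to the productivity of the losing bidder, which is the sense in which these channels raise pay without a matching productivity gain for the worker's human capital.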

Q12. Why do OJS shocks operate through a match duration channel even without wage increases?

When OJS efficiency ν rises, each employed worker faces a higher probability νtf(θt) of contacting another firm each period. Even if wages do not change upon contact (as in the α = 0.95 robustness exercise), a labor services firm posting a vacancy expects that any match it forms will be shorter-lived: the worker is more likely to be poached in the future. This lowers the expected present discounted value of the match for the firm, reducing EJ. To satisfy the free-entry condition (expected profit = vacancy cost κ), the price of labor services pl must rise, increasing the real marginal cost and inflation. Figure 3 shows an inflationary response under α = 0.95 that is nearly identical to the baseline response, isolating this match-duration mechanism.
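
The logic can be illustrated with a deliberately stripped-down match value: a constant flow profit that ends with a constant per-period separation probability. This is not the paper's value function (which depends on h, x, and α); the discount factor, the recruiting cost, and both separation probabilities below are hypothetical numbers chosen only to show the direction of the effect.

```python
def match_value(flow_profit: float, discount: float, separation_prob: float) -> float:
    """Value J solving J = flow_profit + discount * (1 - separation_prob) * J."""
    return flow_profit / (1.0 - discount * (1.0 - separation_prob))


def flow_profit_for_free_entry(target_value: float, discount: float, separation_prob: float) -> float:
    """Flow profit needed for the match value to equal the expected recruiting cost."""
    return target_value * (1.0 - discount * (1.0 - separation_prob))


DISCOUNT = 0.99  # hypothetical quarterly discount factor
TARGET = 1.0     # hypothetical expected recruiting cost that free entry must cover

low_poaching = flow_profit_for_free_entry(TARGET, DISCOUNT, separation_prob=0.05)
high_poaching = flow_profit_for_free_entry(TARGET, DISCOUNT, separation_prob=0.07)
print(low_poaching, high_poaching)
```

With wages held fixed, the only way to deliver the higher required flow profit is a higher price of labor services, which is the channel described above.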

Key Concepts

OJS efficiency shock (νt shock). A time-varying shift in the relative job search efficiency of employed workers compared with unemployed workers. Modeled as an AR(1) process for νt (estimated persistence ρν = 0.936). An increase in νt raises the probability that employed workers contact outside firms each period, boosting the EE rate. In the model, this acts as a cost-push shock: it raises inflation and unemployment simultaneously, breaking divine coincidence and creating a policy trade-off for a dual-mandate central bank.

Expected match value (EJt). The ex-ante expected value to a labor services firm of a filled vacancy, conditional on contacting a worker, defined as a weighted average of match values J across the pool of job seekers (unemployed and employed). The free-entry condition Vt = κ/q(θt) = EJt pins down the real price of labor services pl: when EJt declines (due to shorter match durations or compositional shifts toward high-surplus-extracting workers), pl must rise to maintain zero expected profit for vacancy posters.

Composition channel. The mechanism by which a rise in OJS efficiency shifts the composition of the job-seeker pool toward employed workers, who (under Bertrand competition) extract the entire flow surplus of a new match and receive wage equal to plF(h,x). Since firms receive zero rent from poached workers, an increase in the fraction of employed in the applicant pool lowers EJt and requires a compensatory increase in pl.

Match duration channel. When OJS efficiency ν rises, each existing match faces a higher probability of dissolution because the worker is more likely to be poached. The reduced expected match duration lowers the present discounted value of a match for the firm (even holding wages fixed), reducing EJt and raising pl. The α = 0.95 robustness exercise, in which wage increases upon job switches are near zero, shows this channel to be the primary driver of inflation.

Piece-rate α (endogenous). The share of match output F(h,x) that the worker receives as wage, determined through Bertrand competition on flow output following Postel-Vinay and Robin (2002). A worker hired from unemployment starts at α = x̄/x’ (where x̄ is the lowest match productivity). Job switches to higher-x’ firms reset α = x/x’. Rebargaining upon a credible outside offer from a firm with αx < x̃ < x raises α to x̃/x. The piece rate endogenizes wage dynamics for switchers, stayers, and job losers, allowing the model to discipline these moments in the data.

Divine coincidence (and its breakdown under OJS shocks). In standard New Keynesian models, demand and productivity shocks move inflation and unemployment gaps in opposite directions, so stabilizing inflation also stabilizes the output gap. OJS efficiency shocks break this property: they generate simultaneous increases in inflation and unemployment, introducing a genuine trade-off between the two mandates and making EE-augmented Taylor rules welfare-improving relative to rules that respond only to unemployment.

Sequence-Space Jacobian (SSJ) method with distributed worker states. An extension of the Auclert, Bardoczy, Rognlie, and Straub (2021) computational method to settings where discretized distributions of workers (µE(h,x) and µU(h)) enter directly into equilibrium conditions — specifically into the free-entry condition via EJt and into firm profits. The authors treat distributions as histograms and compute Jacobians for each mass point, combining SSJ with Reiter (2009)-style projection to efficiently solve for transitional dynamics under aggregate uncertainty.

Life-Cycle Wages and Human Capital Investments: Selection and Missing Data

Layer 1 – Overview

Research Question

This paper asks how wage inequalities build up over the life cycle when individual wage trajectories are plagued by interruptions in private-sector participation, and when the standard Missing At Random (MAR) assumption used to handle those gaps may be violated. Specifically, it asks: what is the causal effect of career interruptions on both the level and the dispersion of wages after twenty years of potential experience, and does endogeneity of those interruptions matter for the dispersion result?

Data and Sample

The empirical analysis uses the 2011 DADS Grand Format-EDP panel, a French administrative dataset merging social security records (DADS) and census extracts (EDP). The working sample covers males who entered the private sector between 1985 and 1992, aged 16-30 at entry, and observed through 2011. The authors require at least 15 years of observed private-sector wages, yielding a working sample of 7,004 males and 137,315 person-year observations. Education is grouped into four levels (high-school dropouts, high-school graduates, some college, college graduates). Participation outside the private sector – including public-sector employment, self-employment, unemployment, and non-employment – constitutes the “alternative sector” and generates missing wage observations. On average, cumulative duration outside the private sector is 3.7 years, and the average number of interruptions is 1.44.

Model and Methodology

The paper builds on a structural Ben Porath (1967) human capital model extended to two sectors (private sector and an alternative sector), yielding a reduced-form log-wage equation with five individual-specific coefficients: an intercept (initial human capital), a linear trend in potential experience (growth rate), a curvature term in potential experience (Mincer concavity), the cumulative years of interruptions, and a curvature term in interruptions. Because parameters are individual-specific, the wage equation is a random-coefficient model estimated with a fixed-effects approach.

Selection into the private sector is addressed not by a standard MAR assumption but by a weaker “Missing At Random Conditionally On Factors” (MARCOF) assumption. Sector-preference shocks, human capital prices, and depreciation rates are each decomposed into a common factor (time-varying) and an individual factor loading, plus a residual that is mean-independent of factors and loadings. Conditional on factors and factor loadings, wage residuals and sector choices are independent, making covariates – including the interruption variables – exogenous. The preferred specification includes two unobserved factors, selected by four of six Bai-Ng (2002) information criteria.

Estimation proceeds via an Expectation-Maximization (EM) algorithm adapted from Bai (2009) and Song (2013), with initial values from Moon and Weidner (2018)’s nuclear-norm convex estimator. Because individual parameters converge at rate sqrt(T) and summary statistics of their distributions suffer from incidental-parameter bias, the authors use bias-correction methods from Jochmans and Weidner (2019) for quantiles and inter-decile ranges, and from Arellano and Bonhomme (2012) for variances. Monte Carlo experiments confirm that variances remain poorly corrected even when T > 20, so the paper focuses on inter-decile ranges as the dispersion measure.

Counterfactual “average structural functions” (Blundell and Powell, 2003) are constructed by holding individual parameters fixed and manipulating the history of interruptions. These compare four scenarios: the observed benchmark, the counterfactual with no interruptions (potential wage), the counterfactual with no current-period selection, and both combined.

Main Findings

Downward bias from omitting interruptions and factors. Omitting interruption variables and unobserved factors strongly downward biases estimated returns to experience after 20 years. Most of this bias is attributable to interruptions rather than to the interactive factor effects: selectivity is mainly captured through the interruption channel, not through residual factor structure.
Effect on mean wages. Potential experience increases log wages by approximately 65% over 20 years, consistent with cross-country evidence from homogeneous Mincer equations. The average cost of interruptions after 20 years is approximately 10% of log wages. Reassigning interruptions to the beginning of the working life has a persistent negative effect on mean log wages that never fully recovers over 20 years, while reassigning them to the end increases mean wages above the no-interruption benchmark at every experience level.
Effect on wage dispersion – a new stylized fact. Interruptions decrease, not increase, the inter-decile range of log wages after 20 years. After 20 years, with an average interruption duration of 2.47 years, interruptions decrease the inter-decile range by 0.52 log points (approximately 38%). This compression operates differentially: the 90th percentile falls by 0.34 and the 10th percentile rises by 0.18.
Endogeneity explains the dispersion compression. When years of interruption are randomly reassigned across time (holding total interruption years fixed), the inter-decile range diverges upward from the observed benchmark after about 5 years. This shows that the dispersion-reducing effect of actual interruptions is due to the endogenous timing of those interruptions – specifically to the negative correlation between the timing of interruptions and potential log wages – rather than to the correlation between the structural coefficients on interruptions and potential wages (which is also negative, with a Spearman rank correlation of -0.32 between eta_i1 and eta_i3). Endogenously chosen interruptions smooth inequality over time.
Current-period selection is negligible. Current-period selection into private-sector employment has no statistically significant effect on median, mean, variance, or inter-decile range of wages at any experience level, as confirmed by the small inter-decile range of the interactive factor component.

Scope Conditions

Results pertain to cohorts of French males entering the private sector between 1985 and 1992, restricted to those with at least 15 observed private-sector years. The French context is distinctive: wage inequality in the working population was stable over 1985-2011, driven in part by minimum wage policy and payroll tax exemptions for lower-skilled workers, in contrast to rising inequality in the United States and Germany. Results on timing of interruptions (eta_i3 and eta_i4) are identified only for individuals with at least two interruptions followed by re-entry (roughly those with K_T >= 2). The paper does not analyze female wages.

In depth

Q1. What is the structural model and how does it generate a reduced-form wage equation?

The model is a Ben Porath (1967) two-sector human capital model in which individuals divide time between investing in human capital and earning wages in either the private sector (e) or an alternative sector (n). Human capital accumulation in each sector has a sector-specific return rate (rho^s) and depreciation rate (lambda^s_t). Period utility is log income minus a quadratic investment cost, plus a sector preference shock. Because the problem is log-linear, solving the dynamic program backwards yields closed-form optimal investments that are linear in the individual-specific terminal value of human capital (kappa). The resulting log-wage equation (Proposition 5) is a function of five terms, each with an individual-specific coefficient: an intercept (eta_i0), a linear trend in potential experience t (eta_i1), a geometric curvature term beta^{-t} (eta_i2), cumulative years of interruptions x^(3)_it (eta_i3), and a curvature term in interruptions x^(4)_it (eta_i4). This provides a tractable random-coefficient structure.
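
A minimal sketch of the five-term part of the wage equation, assuming the terms enter additively and leaving out the factor component and the residual. The class name, the value of `BETA`, and the example coefficients are hypothetical; x^(3) and x^(4) are taken as given inputs because their exact construction is not spelled out in this summary.

```python
from dataclasses import dataclass

BETA = 0.95  # hypothetical discount factor, for illustration only


@dataclass(frozen=True)
class Coefficients:
    eta0: float  # intercept (initial log human capital)
    eta1: float  # linear trend in potential experience
    eta2: float  # loading on the geometric curvature term beta^{-t}
    eta3: float  # loading on cumulative years of interruptions x3
    eta4: float  # loading on the curvature term in interruptions x4


def log_wage(c: Coefficients, t: float, x3: float, x4: float, beta: float = BETA) -> float:
    """Five-term log wage, omitting the factor component and the residual."""
    return c.eta0 + c.eta1 * t + c.eta2 * beta ** (-t) + c.eta3 * x3 + c.eta4 * x4


def interruption_cost(c: Coefficients, t: float, x3: float, x4: float, beta: float = BETA) -> float:
    """Potential log wage (no interruptions) minus the log wage with interruptions."""
    return log_wage(c, t, 0.0, 0.0, beta) - log_wage(c, t, x3, x4, beta)


worker = Coefficients(eta0=2.0, eta1=0.04, eta2=-0.01, eta3=-0.03, eta4=0.01)
print(interruption_cost(worker, t=20, x3=2.0, x4=1.5))
```

Setting the interruption variables to zero while holding the coefficients fixed is the same manipulation that underlies the no-interruption counterfactual described in the overview.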

Q2. What is the MARCOF assumption and why is it weaker than MAR?

MARCOF – Missing At Random Conditionally On Factors – posits that sector-preference shocks, human capital prices, and depreciation rates each follow factor structures: a common time-varying factor (phi_t) multiplied by an individual loading (theta_i) plus an i.i.d. residual. The residuals are assumed mean-independent of factors and loadings, and independent over time. Under standard MAR, missingness is assumed independent of outcomes conditional on observables alone. Under MARCOF, residuals in the wage equation and the sector choice equation are independent conditional on (unobserved) factors and factor loadings. This is weaker than MAR because it allows the unobservable determinants of wages and participation to share common factors, accommodating the high persistence observed in human capital stocks (20-year lag correlation of 0.28, far above the geometric decay benchmark of 0.024).

Q3. How are the individual-specific parameters identified?

Under exogenous selection (or, under MARCOF, conditional on factors), identification of eta_i0, eta_i1, and eta_i2 requires variation in potential experience within the individual’s time series. Identification of eta_i3 and eta_i4 separately requires individuals to experience at least two spells out of the private sector each followed by re-entry (at least four transitions, so K_T >= 2). An individual with only one interruption spell generates proportional variation in x^(3) and x^(4), so only a linear combination of eta_i3 and eta_i4 is identified. The “flat spot” approach – using the observed fact that individuals aged 50-55 have stopped investing in human capital – separately identifies time, cohort, and age effects and provides the restriction that factors are orthogonal to the level, trend, and curvature in potential experience.

Q4. What do the distributions of estimated individual-specific coefficients look like?

Focusing on the main (two-factor) specification with bias correction: the median of the growth parameter eta_i1 is positive (consistent with rising wages with experience) and the median of the curvature parameter eta_i2 is negative (consistent with concavity). However, heterogeneity is substantial: the 90th percentile of eta_i1 is 6.2 times the median, and the first quartile of eta_i1 is negative (implying declining potential wages for a non-negligible share). For the interruption coefficients eta_i3 (years of interruptions) and eta_i4 (curvature), bias-corrected medians are close to zero in the sub-sample with at least two interruptions, but dispersion is large and symmetric around zero. Bias correction reduces the 90th percentile of eta_i1 by approximately 20% and reduces the absolute 10th percentile of eta_i3 by approximately 27%.

Q5. How important are interruptions relative to potential experience and factors in explaining wage variation?

A wage decomposition using inter-decile ranges (preferred over variance due to bias) shows that the potential experience component is the largest contributor to wage dispersion, followed by the interruption component (described as “sizable”), while factors play a minor role. Crucially, the potential experience and interruption components are highly negatively rank-correlated: the Spearman rank correlation between the growth coefficient eta_i1 and the interruption coefficient eta_i3 is -0.32. This negative correlation is central to understanding why interruptions compress dispersion rather than expanding it.
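
For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation of the ranks of two series. The sketch below is a plain standard-library implementation with average ranks for ties; the example inputs are toy numbers, not estimates from the paper.

```python
def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Toy coefficients: higher growth loosely paired with more negative interruption effects.
toy_eta1 = [0.01, 0.02, 0.03, 0.04, 0.05]
toy_eta3 = [-0.01, 0.00, -0.03, -0.02, -0.04]
print(spearman(toy_eta1, toy_eta3))
```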

Q6. What is the finding on the effect of interruptions on mean wages, and what does the timing experiment show?

After 20 years, the average cost of interruptions (relative to a counterfactual of no interruptions) is approximately 10% of log wages. The timing of interruptions matters: reassigning interruptions to the beginning of the working life causes a persistent loss in mean log wages that does not fully recover over the 20-year horizon, while reassigning them to the end raises mean log wages above the no-interruption level at every experience level. For median wages, the early-interruption loss is eventually recovered (median log wages do catch up), but the mean does not catch up. These asymmetries are consistent with early interruptions having a larger negative effect on human capital accumulation due to the geometric structure of investment returns.

Q7. What is the key finding on wage dispersion and what explains it?

Interruptions compress the inter-decile range of log wages by 0.52 log points (approximately 38%) after 20 years, with average interruption duration of 2.47 years. This compression is asymmetric: the 90th percentile of wages falls by 0.34 and the 10th percentile rises by 0.18. The dispersion-reducing effect is established by comparing the benchmark (observed interruptions) to the counterfactual of no interruptions. When interruptions are instead randomly reassigned across time (holding total interruption duration fixed), the inter-decile range diverges upward from the benchmark starting around 5 years of experience. This demonstrates that the compression is due to the endogenous timing of interruptions – individuals who have high potential wages tend to time their interruptions in ways that reduce the measured spread of actual wages – rather than to the negative structural coefficient (eta_i3 < 0 for high-wage workers on average).
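
The reported percentile shifts add up to the change in the inter-decile range, as the short check below shows. The implied level of the reference inter-decile range is a back-of-envelope inference that assumes the 38% is measured relative to the no-interruption counterfactual; the summary does not state the base explicitly.

```python
p90_change = -0.34  # change in the 90th percentile of log wages (from the text)
p10_change = 0.18   # change in the 10th percentile of log wages (from the text)

# The inter-decile range is P90 minus P10, so its change is the difference of the two.
idr_change = p90_change - p10_change

relative_change = 0.38  # approximate relative compression (from the text)

# Assumption: the 38% is relative to the no-interruption inter-decile range.
implied_reference_idr = abs(idr_change) / relative_change
implied_observed_idr = implied_reference_idr + idr_change

print(f"change in inter-decile range: {idr_change:.2f} log points")
print(f"implied reference range: about {implied_reference_idr:.2f} log points")
```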

Q8. How does the paper handle the incidental parameter problem for distributional statistics?

Because individual parameters are estimated at rate sqrt(T) and the panel is unbalanced (some individuals observed for as few as 15 years while the model has up to 7 individual parameters), standard distributional statistics like the variance suffer from substantial incidental parameter bias. Monte Carlo experiments show that bias-corrected variance estimates remain strongly biased even at T > 20. Inter-decile ranges are better behaved and the Jochmans and Weidner (2019) bias-correction procedure reduces their bias satisfactorily. This is why the paper reports inter-decile ranges as its primary dispersion measure rather than variances. The bias in corrected inter-decile ranges is at most approximately 10% of the uncorrected estimate.

Q9. What does the paper show about the MAR assumption in the context of this data?

The results directly challenge the MAR assumption that is standard in the life-cycle earnings literature. Under MAR, interruptions would be treated as random conditional on observables, and their endogeneity would be ignored. The paper shows that treating interruptions as endogenous (through the MARCOF and structural model approach) changes two results. First, estimated returns to experience change substantially: there is a strong downward bias when interruptions and factors are omitted. Second, the sign of the effect of interruptions on dispersion reverses: if interruptions were exogenous and randomly reassigned, dispersion would be higher than observed, so the actual compression is a result of endogenous timing. The conclusion is that MAR assumptions produce systematically misleading pictures of life-cycle wage inequality dynamics.

Q10. What are the robustness and external validity considerations?

The working sample excludes individuals observed fewer than 15 years. A robustness exercise compares the subsample observed 10-14 years to a censored version of the 20+ subsample with matched marginal distributions of observation counts. Median profiles for the uncensored and censored 20+ samples are similar, and inter-decile ranges are slightly more dispersed in the censored sample only for potential experience greater than 7. However, the 10-14 year sample shows substantially different patterns – larger median gaps between benchmark and no-interruption cases, and a larger inter-decile range – consistent with lower private-sector returns to human capital for that group. The authors conclude that selection into the 15+ working sample matters, and results are explicitly restricted to that working sample. The French context (stable aggregate wage inequality, minimum wage policy) limits direct comparability to countries with rising inequality.

Key Concepts

MARCOF (Missing At Random Conditionally On Factors): The paper’s central identifying assumption, weaker than standard MAR. It posits that sector-preference shocks, human capital prices, and depreciation rates follow factor structures (a common time-varying factor multiplied by an individual loading, plus an i.i.d. residual), and that residuals are mean-independent of factors, loadings, and their own histories. Conditional on factors and loadings, wage residuals and sector-choice residuals are independent, making selection exogenous.

Interactive effects / factor structure for selection: An approach in which unobserved confounders are modeled as a bilinear product of time-varying common factors (phi_t) and individual factor loadings (theta_i). This allows flexible correlation between wage processes and participation choices without requiring exclusion restrictions or instrumental variables. The paper’s preferred specification uses two unobserved factors identified by Bai-Ng information criteria.

Average structural functions: Objects defined by Blundell and Powell (2003) that integrate counterfactual outcomes (wages evaluated at a manipulated interruption history) over the distribution of individual-specific parameters. They allow estimation of the causal impact of a change in interruption timing or presence while holding individual structural parameters fixed, under identification conditions analogous to those of Chernozhukov et al. (2013).

Individual-specific coefficients (random coefficients): The five parameters (eta_i0, eta_i1, eta_i2, eta_i3, eta_i4) governing each individual’s wage equation, with structural interpretations: initial log human capital, return to potential experience, curvature (Mincer concavity), effect of cumulative interruption years, and curvature in interruptions. Their individual-specificity is the source of the incidental parameter problem for distributional statistics.

Flat spot approach: An identification device (from Heckman, Lochner, and Taber, 1998; Bowlus and Robinson, 2012) that uses median wages of workers aged 50-55 – who are assumed to have stopped investing in human capital – as consistent estimates of human capital prices by education group and year. This separates the volume of human capital from its price, and provides the restriction identifying the level, trend, and curvature factors from the time-varying unobserved factors phi_t.

Interruption variables x^(3) and x^(4): Reduced-form variables derived from the structural model summarizing the history of private-sector participation gaps. x^(3)_it is the cumulative number of periods spent in the alternative sector prior to date t; x^(4)_it is a geometric-weighted version of those interruptions that reflects the timing (early vs. late) through the discount factor beta. They enter the wage equation with individual-specific coefficients that are identified only for workers with at least two complete interruption spells.

Mincer dip: A U-shaped profile in wage variance (or inter-decile range) over potential experience, predicted by the Ben Porath model because high-return workers invest more at the start of their careers (reducing current wages), causing their wage profile to cross below then above low-return workers. Estimated in this paper at approximately 5 years of potential experience under the main specification.

Incidental parameter bias in distributional statistics: The bias that arises when estimating moments or quantiles of the distribution of individual-specific parameters that converge at rate sqrt(T) rather than sqrt(N). The paper shows through Monte Carlo experiments that variance estimates remain substantially biased even after Arellano-Bonhomme (2012) correction when T > 20, while inter-decile ranges corrected by Jochmans-Weidner (2019) are more reliable.

Life-cycle worker flows and cross-country differences in aggregate employment

Layer 1 — Overview

Research question. The paper asks: what are the sources of cross-country differences in aggregate employment across European economies, and which types of worker flows — between employment (E), unemployment (U), and nonparticipation (N) — drive those differences? The authors pay particular attention to heterogeneity by gender and age, motivated by the observation that cross-country employment dispersion is concentrated among women, youth, and older workers, and that a large portion of the dispersion is traceable to differences in labor force participation rather than unemployment rates alone.

Data. The empirical analysis draws on microdata from the EU Statistics on Income and Living Conditions (EU-SILC), an annual survey covering 32 European countries for 2004–2019. Germany is covered using the German Socio-Economic Panel (GSOEP, 2003–2018) because GSOEP longitudinal coverage begins earlier. The combined sample contains 7,064,306 individual-year observations for 2,221,672 individuals. Labor force status is recorded monthly via a retrospective calendar; transition probabilities are estimated at the quarterly frequency after correcting for measurement error (a “de-NUN-ification” procedure following Elsby et al. [2015]) and time-aggregation bias (Shimer [2012]).

Methodology — empirical. Six quarterly transition probabilities among E, U, and N are estimated by gender and single year of age (16–65). The life-cycle profile of each probability is extracted nonparametrically by regressing age-time cells on age and time dummies, removing business-cycle variation. To decompose cross-country employment differences into contributions of the six transition rates while handling the path-dependence of the decomposition (6! = 720 possible orderings), the authors apply the Shapley-Owen decomposition, which assigns to each transition rate its average marginal contribution across all orderings. An initial first-pass decomposition allocates the aggregate employment gap between any two countries into three parts: demographics, initial conditions (distribution across E, U, N at age 16), and transition probabilities. Transition probabilities account for 93–105% of the cross-country variance in aggregate employment, while demographics and initial conditions together explain less than 10%.
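
The ordering problem and its Shapley-style resolution can be sketched on a toy outcome. The example below swaps each of the six transition rates from one country's value to another's in every one of the 720 orderings and averages the marginal effects. The outcome used here is the steady-state employment share of a three-state Markov chain, which is a simplified stand-in for the paper's life-cycle aggregate employment; the two sets of rates are hypothetical.

```python
from itertools import permutations
from math import factorial


def steady_state_employment(p: dict[str, float]) -> float:
    """Stationary employment share of a 3-state (E, U, N) Markov chain.

    Uses the closed form for a three-state chain: each state's weight is the
    sum over the three spanning trees directed into that state.
    """
    w_e = p["UE"] * p["NE"] + p["UE"] * p["NU"] + p["NE"] * p["UN"]
    w_u = p["EU"] * p["NU"] + p["EU"] * p["NE"] + p["NU"] * p["EN"]
    w_n = p["EN"] * p["UN"] + p["EN"] * p["UE"] + p["UN"] * p["EU"]
    return w_e / (w_e + w_u + w_n)


def shapley_decomposition(outcome, base: dict, target: dict) -> dict[str, float]:
    """Average marginal contribution of each input across all orderings."""
    names = list(base)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(base)
        previous = outcome(current)
        for name in order:
            current[name] = target[name]   # switch one rate to the target country
            value = outcome(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))       # 6! = 720 for six transition rates
    return {name: total / n_orders for name, total in totals.items()}


# Hypothetical quarterly transition probabilities for two countries.
country_a = {"EU": 0.02, "EN": 0.02, "UE": 0.25, "UN": 0.10, "NE": 0.05, "NU": 0.04}
country_b = {"EU": 0.04, "EN": 0.03, "UE": 0.20, "UN": 0.12, "NE": 0.03, "NU": 0.04}

contributions = shapley_decomposition(steady_state_employment, country_a, country_b)
gap = steady_state_employment(country_b) - steady_state_employment(country_a)
```

By construction the six contributions sum exactly to the employment gap, which is what makes the decomposition independent of any single arbitrary ordering.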

Methodology — structural model. The authors build a life-cycle Diamond-Mortensen-Pissarides (DMP) model with three labor market states, calibrated separately by gender and country for France, Germany, Italy, Spain, and the U.K. — the five largest economies in the sample. A key feature is that all primitives (technology, search and matching) are age-independent; life-cycle variation in worker flows arises endogenously from the finite retirement horizon and from two search margins: (i) an intensive margin — variable search intensity s in [0,1] chosen optimally each period — and (ii) an extensive margin — the endogenous labor force participation decision modeled as a discrete choice with i.i.d. extreme-value utility shocks. The model also incorporates permanent match quality (an experience good revealed stochastically with probability alpha per period following Jovanovic [1979]), transitory match-quality shocks (persistent AR(1) process), exogenous job-destruction shocks (per-period probability delta), a two-tier UI system, a two-tier EPL system capturing temporary vs. permanent contracts, and proportional value-added and social-security taxes.
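
With i.i.d. extreme-value shocks, a binary discrete choice has a logit form. The sketch below assumes type-I extreme-value shocks with a common scale parameter; the function name, the scale value, and the example values are illustrative and not taken from the paper.

```python
import math


def participation_prob(value_in: float, value_out: float, scale: float = 1.0) -> float:
    """Probability of choosing participation under type-I extreme-value shocks.

    value_in and value_out are the deterministic parts of the two option values;
    scale is the (assumed) scale parameter of the utility shocks.
    """
    return 1.0 / (1.0 + math.exp(-(value_in - value_out) / scale))


# A smaller shock scale makes the choice respond more sharply to the value gap.
for scale in (0.5, 1.0, 2.0):
    print(scale, participation_prob(value_in=1.2, value_out=1.0, scale=scale))
```

One practical consequence of this formulation is that participation responds smoothly to changes in the value gap, rather than jumping at a threshold.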

Main empirical findings.

For male workers, employment-to-unemployment (EU) transitions account for approximately half of the cross-country variance in aggregate male employment across all 32 countries, rising to about three-quarters when looking at the five largest economies, and exceeding 85% for prime-age males (ages 25–54). Transitions in the reverse direction (UE) explain less than 30% of the variance across all 32 countries and play almost no role among the five largest economies. The labor force participation margin (combining NE and EN transitions) explains a non-negligible 25–30% of the aggregate male employment gap.
For female workers, at least half of the cross-country variance in employment is explained by participation-related flows, primarily transitions from nonparticipation to employment (NE). In the full 32-country sample, NE alone explains 65% of the variance in female employment rates across all ages (16–65). Its role is somewhat smaller in the five largest economies, where EN transitions also play a larger role. Crucially, the sum of NE and EN variance contributions for women is at least as large as the sum of UE and EU contributions, underlining the indispensability of a three-state model.

Main quantitative (model-based) findings. The model decomposes cross-country employment differences into technology (the distribution of permanent match quality, job-separation risk delta, and information frictions alpha), search parameters (vacancy costs, non-work utility, search-cost parameters), and policies (UI generosity, firing costs, taxes). The total employment variance across the five economies and two gender groups is 0.36 percentage points squared. Technology differences over-explain this variance (contribution of 0.65), while policies play almost no role (contribution of -0.04) and search parameters have a negative variance contribution (-0.25); the three contributions sum to the total (0.65 - 0.04 - 0.25 = 0.36). The negative sign of the search and policy contributions reflects the negative cross-country correlation between these factors and technology: countries with high employment rates (e.g., France) tend to have more generous UI and higher taxes, so the model attributes their high employment to compensating technology advantages. For individual countries: France is about 4.4 percentage points above the cross-country benchmark, driven by technology and partly offset by the highest replacement ratios and labor tax rates in the sample (67% and 56%, respectively). Spain is about 7 percentage points below the benchmark, driven by the lowest measured labor productivity (78% of Germany’s level) and the highest employment outflow rates (~4–5% per quarter vs. ~2% in France).

The channels through which technology affects employment are predominantly the employment inflows, not outflows. The exogenous job-separation risk delta affects aggregate employment mostly through its impact on expected duration of future employment spells, which reduces search incentives and job-finding rates from both unemployment and nonparticipation, and lowers labor force attachment. Similarly, mean permanent match quality (mu_x) and labor taxes (tau_ss) operate mainly through the inflow margin. Technology effects are amplified by search effort margins, particularly for women and youth: women face higher non-work utility (interpreted as labor-market frictions or opportunity costs), implying a lower employment surplus and therefore a higher surplus elasticity; for young workers, the long remaining horizon amplifies the effect of technology variations on discounted lifetime earnings, generating relatively higher search-effort responses.

Scope conditions. The analysis is confined to European countries. The structural decomposition covers only the five largest European economies. The authors acknowledge that parameters labeled as “job-separation risk” may also capture employment protection and temporary contracts not explicitly modeled, or non-monetary quit motives, so the attribution to “technology” should be interpreted with that caveat in mind. The model operates in a complete-markets, no-savings environment without on-the-job search.

In depth

Q1. What fraction of cross-country employment variance is explained by transition probabilities vs. demographics and initial conditions?

A: In the full 32-country sample, transition probabilities account for 94.7% of the cross-country variance in aggregate male employment and 99.9% for female employment. In the five largest economies, the corresponding figures are 93.5% (men) and 104.9% (women) — the slight excess above 100% reflects the negative contribution of initial conditions for women. Demographics and initial conditions together explain less than 10% of the variance, with somewhat larger demographic effects in Baltic and Eastern European countries, plausibly due to emigration-driven changes in age composition.

Q2. For male workers, which specific transition probability dominates the cross-country employment variance, and how does this vary by age and across country groupings?

A: EU (employment-to-unemployment) transitions account for approximately 51% of the cross-country variance in aggregate male employment (ages 16–65) across all 32 countries, rising to 77% in the five largest economies, and to 89% for prime-age males (ages 25–54) in the same group. By contrast, UE (job-finding from unemployment) explains at most 29% across all 32 countries and virtually nothing in the five largest economies. For prime-age men, EU remains dominant throughout; toward the end of the working life, EN (employment-to-nonparticipation) transitions become the main driver as workers move into retirement.

Q3. For female workers, what is the primary driver of cross-country employment variance, and does the pattern differ from men?

A: For women, transitions from nonparticipation to employment (NE) explain 65% of the cross-country variance in female employment across all ages in the 32-country sample. This dominance is more concentrated at ages 20–30, when participation entry is particularly heterogeneous across countries, likely reflecting fertility and child-rearing patterns. The sum of NE and EN contributions for women equals or exceeds the combined UE and EU contributions in both country groupings, demonstrating a fundamentally different demographic structure of employment differences for women relative to men.

Q4. How does the model generate life-cycle variation in transition rates despite having age-independent primitives?

A: The model produces age-varying transition rates through two mechanisms operating on age-independent fundamentals. First, variable search intensity declines as workers age because the remaining time to retirement shortens, reducing the expected lifetime returns to job search — the “horizon effect” (Cheron et al. [2011, 2013]). This mechanism explains virtually all of the life-cycle variation in the NE job-finding rate and an overwhelmingly large share of the variation in the UE rate, as shown by counterfactual exercises that fix search intensity at its life-cycle average. Second, information frictions about permanent match quality generate declining separation rates over the working life: young workers disproportionately hold matches with unrevealed quality and thus face higher reallocation risk upon quality revelation; as workers age, their employment share shifts toward matches with revealed quality, which have lower separation rates due to sorting.

Q5. What does the structural decomposition (Table 7) reveal about the role of technology vs. policies in explaining cross-country employment differences?

A: The variance decomposition in Table 7 shows that technology parameters (permanent match-quality distribution, job-separation risk delta, and match-quality revelation probability alpha) account for a variance contribution of 0.65 (against total employment variance of 0.36), over-explaining the cross-country dispersion. Labor market policies (UI benefits, firing costs, taxes) have a near-zero variance contribution of -0.04. Search parameters contribute -0.25. The result that policies explain little does not mean they have no level effect: in simple comparative statics, the model predicts that more generous UI and higher labor taxes lower employment. However, in the cross-country calibration, countries with higher employment rates tend to have more interventionist policies, so the cross-country correlation between policies and technology masks individual policy effects at the variance level.
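A toy calculation can show how one component may "over-explain" variance while others enter negatively. The sketch below assumes a covariance-based allocation, in which each component's contribution is its covariance with the total; this is one common convention and may not be the exact rule used in the paper. The country-level numbers are hypothetical, chosen only so that technology is negatively correlated with the other two components.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Hypothetical employment contributions (in pp) for four countries.
components = {
    "technology": [4.0, 1.0, -2.0, -3.0],
    "search": [-1.5, -0.5, 1.0, 1.0],
    "policy": [-0.5, 0.0, 0.0, 0.5],
}
total = [sum(vals) for vals in zip(*components.values())]

# Assumed allocation rule: contribution_k = Cov(component_k, total).
contributions = {name: covariance(vals, total) for name, vals in components.items()}
total_variance = covariance(total, total)

# The figures quoted in the summary also add up to the reported total.
reported = {"technology": 0.65, "policy": -0.04, "search": -0.25}
reported_total = round(sum(reported.values()), 2)
```

In this toy, the technology contribution exceeds the total variance and the search and policy contributions are negative, yet the three still sum to the total, which mirrors the sign pattern reported for Table 7.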

Q6. How do technology effects propagate to employment differences through worker flows, and why is the inflow channel dominant?

A: Table 8 decomposes employment elasticities with respect to delta (job-separation risk), mu_x (mean log permanent match quality), and tau_ss (social security tax rate) into contributions from (i) the NE job-finding rate, (ii) the share of nonemployed in the labor force (labor force attachment, u-tilde), (iii) the differential between UE and NE rates, and (iv) the employment outflow rate (pEO). At the aggregate level, the separation risk delta has an employment elasticity of -0.28, of which the outflow contribution (dpEO = -0.08) is smaller in absolute magnitude than the sum of inflow contributions (dpNE = -0.06, du-tilde = -0.07, dpDelta = -0.06). Mean match quality mu_x has an employment elasticity of 0.53, primarily mediated through inflows. The mechanism is that changes in delta or mu_x alter expected lifetime earnings, which in turn change search incentives and participation decisions, generating correlated movements in job-finding rates and labor force attachment that amplify the employment impact beyond what a simple outflow change would imply.
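The inflow-versus-outflow comparison for delta can be checked with the four channel values quoted above. The snippet is plain arithmetic; the dictionary keys are shorthand labels, not the paper's notation.

```python
# Channel contributions to the aggregate employment elasticity of delta,
# as quoted in the answer above.
inflow_channels = {"dpNE": -0.06, "du_tilde": -0.07, "dpDelta": -0.06}
outflow_channel = -0.08  # dpEO

inflow_total = sum(inflow_channels.values())
channel_sum = inflow_total + outflow_channel
inflow_share = inflow_total / channel_sum

print(f"inflow total:  {inflow_total:.2f}")
print(f"channel sum:   {channel_sum:.2f}")
print(f"inflow share:  {inflow_share:.0%}")
```

The channels sum to -0.27 against a reported elasticity of -0.28; the small gap is presumably rounding in the quoted figures. On these numbers the inflow channels carry roughly 70% of the total.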

Q7. Why do women and youth show larger search-effort responses to technology variations?

A: For women, the calibrated non-work utility yo is higher than for men in all five countries (interpreted as extra costs and wedges on the returns to working), which implies a smaller employment surplus. A smaller surplus generates a higher elasticity of surplus with respect to parameter changes, and since search intensity and participation decisions depend on expected surplus, women exhibit larger employment elasticities to technology variations. The aggregate employment elasticity with respect to delta is -0.39 for women vs. -0.19 for men; for mu_x, it is 0.78 for women vs. 0.33 for men. For youth (ages 20–29), the long remaining horizon amplifies the effect of technology changes on discounted expected lifetime earnings, which in turn amplifies participation incentives: for delta, the labor force attachment channel (du-tilde) contributes -0.13 for youth compared to -0.07 at the aggregate, and the total employment elasticity (dE) is -0.31 for youth vs. -0.28 at the aggregate.

Q8. What is the quantitative role of individual technology sub-components (match quality, job-separation risk, information frictions)?

A: Panel B of Table 7 breaks down technology into three sub-components. Match quality (mean mu_x and variance sigma^2_x) and job-separation risk (delta) are the key drivers; the match-quality revelation probability (alpha, “match revelation”) plays almost no independent role (variance contribution approximately 0.00). For France, the primary positive technology contributor is mean match quality (consistent with France’s labor productivity slightly above the German benchmark). For Germany and the U.K., the low job-separation risk is the primary positive contributor. For Spain, the high job-separation risk — calibrated to match Spain’s employment outflow rate of around 4–5% per quarter versus 2% in France — is the main negative contributor, reflecting the widespread prevalence of temporary contracts.

Q9. What role do labor market policies play at the country-specific level, even though they explain little cross-country variance?

A: Panel C of Table 7 shows that employment protection legislation plays almost no role for any country. Labor taxes are quantitatively important: they contribute positively to the relatively high employment rate in the U.K., the country with the lowest social security contribution rate (about 20%). In France, where labor taxes exceed 50% of the average wage, the policy contribution is strongly negative, partly offsetting the large positive technology contribution. UI benefits lower aggregate employment: Italy, with calibrated UI benefits lower than France’s, has a smaller employment gap vis-a-vis the benchmark partly because of this. The finding that policies explain little variance while having large individual-country effects is explained by the negative cross-country correlation: countries with generous policies also tend to have favorable technology, so policy and technology contributions partially offset each other in the variance decomposition.

Q10. How does the model fit untargeted moments, particularly the empirical Shapley-Owen variance decomposition?

A: The model is calibrated to aggregate transition rates by gender, and to moments describing labor productivity, vacancy rates, and policy targets. Despite having age-independent primitives, the calibrated model captures the empirical life-cycle profiles of transition rates as untargeted moments: declining NE and UE rates with age, rising EN rates near retirement, and the hump-shaped patterns. More stringently, the model replicates the empirical Shapley-Owen variance decomposition: it correctly predicts that EU separations account for most of the employment variance for men, and that NE inflows are relatively more important for women and youth. A notable limitation is that the model overshoots the UN (unemployment-to-nonparticipation) transition rate for a significant share of data points — but the authors note that flows between U and N play almost no role in cross-country employment variance.

Q11. What is the “horizon effect” and how does it operate in this model?

A: The horizon effect, coined by Cheron et al. [2011, 2013] in a two-state (E/U) DMP model, refers to the phenomenon that as workers approach retirement, the expected returns to job search fall because the remaining period of employment is shorter. This reduces search intensity from both unemployment and nonparticipation, lowering job-finding rates, and in the present model also affects the match-acceptance probability: workers near retirement find it optimal to remain in unemployment to collect UI benefits rather than accept a job offer, further reducing the UE rate. The current paper generalizes this effect to a three-state setting by incorporating the labor force participation margin alongside search intensity, generating plausible declining job-finding rates and increasing EN rates at older ages from age-independent parameters.

Q12. How does the paper handle the gender dimension in the model calibration?

A: The model assumes that men and women share the same production and matching technology parameters within a country (A, cv, delta, alpha, mu_x, sigma^2_x, sigma^2_z), but allows the search-cost and non-work-utility parameters (ceu, cnu, cu, kappa_u, kappa_n, yo) to differ by gender. The gender-specific search parameters are identified from the gender-specific transition rates: for example, kappa_u (marginal search cost in unemployment) for women is inferred from the female UE transition rate, relative to the normalization for men. The non-work utility yo is consistently higher for women in all five countries, rationalizing lower female employment through a lower employment surplus. This generates a higher surplus elasticity for women, which in turn explains why women’s employment is more responsive to technology variations across countries.

Key Concepts

Shapley-Owen Decomposition. A method from cooperative game theory (Shapley [1953], Owen [1977]) used here to decompose cross-country differences in employment into contributions of individual worker-flow transition rates (or structural parameters). It computes the marginal contribution of each component averaged over all 6! = 720 orderings of the six transition rates, yielding a unique, symmetric, exact decomposition that sums to the total employment gap. Unlike sequential decompositions, it is path-independent.
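A minimal sketch of the ordering-average idea follows, assuming a plain Shapley allocation over individual factors (the Owen extension to grouped factors is not shown). The outcome function `toy_value` and the two "countries" are hypothetical; in the paper's application the factors would be the six transition rates and the outcome would be aggregate employment.

```python
from itertools import permutations

def shapley_decomposition(value, baseline, target):
    """Average marginal contribution of moving each factor from baseline to target."""
    names = list(baseline)
    contrib = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = value(current)
        for name in order:
            current[name] = target[name]      # switch one factor at a time
            updated = value(current)
            contrib[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in contrib.items()}

# Hypothetical outcome with an interaction term, so the switching order matters.
def toy_value(rates):
    return rates["a"] * rates["b"] + rates["c"]

country_1 = {"a": 1.0, "b": 2.0, "c": 0.5}
country_2 = {"a": 3.0, "b": 4.0, "c": 1.0}

shares = shapley_decomposition(toy_value, country_1, country_2)
gap = toy_value(country_2) - toy_value(country_1)
```

Each ordering telescopes from the baseline to the target, so the averaged contributions add up exactly to the gap regardless of interactions. With three factors there are 3! = 6 orderings; with six transition rates the same loop would visit 720.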

Extensive Margin of Search Effort. The binary labor force participation decision: whether a nonemployed worker enters the unemployment state (and thus accesses the superior search technology at a flow cost) or remains in nonparticipation. In the paper’s model, this is captured as a discrete choice between states U and N, governed by i.i.d. extreme-value utility shocks, yielding a closed-form logit participation probability.
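The closed form that extreme-value shocks deliver for a binary choice is the standard logit formula. The sketch below assumes the shocks have a common scale parameter; the names `value_u`, `value_n`, and `scale` are illustrative and not taken from the paper.

```python
import math

def participation_probability(value_u, value_n, scale=1.0):
    """Probability of choosing U over N under i.i.d. type-I extreme-value shocks.

    value_u, value_n: deterministic parts of the value of each state (assumed names).
    scale: dispersion of the utility shocks; larger values flatten the response.
    """
    return 1.0 / (1.0 + math.exp(-(value_u - value_n) / scale))

# Equal values give a 50/50 split; a higher value of unemployment raises participation.
p_equal = participation_probability(1.0, 1.0)
p_higher = participation_probability(1.5, 1.0)
p_noisy = participation_probability(1.5, 1.0, scale=5.0)
```

A larger `scale` pulls the probability toward one half, so participation responds less sharply to differences in the value of the two states.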

Intensive Margin of Search Effort. The continuous choice of search intensity s in [0,1] by nonemployed workers (both unemployed and nonparticipants), which scales the probability of meeting a vacancy per period. The optimal intensity equates the marginal cost of search (convex in s) to the marginal benefit (the expected surplus from meeting a firm times the contact rate). Search intensity declines with age because the remaining working life shortens, reducing the discounted value of a job.
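The first-order condition and its link to the remaining horizon can be sketched under two labeled assumptions: a power cost function c(s) = kappa * s^(1 + eta) / (1 + eta), and a job value equal to a constant flow surplus discounted over the periods left until retirement. Neither functional form nor any parameter value comes from the paper.

```python
def optimal_search_intensity(contact_rate, surplus, kappa=1.0, eta=1.0):
    """Solve kappa * s**eta = contact_rate * surplus for s, clipped to [0, 1]."""
    if surplus <= 0:
        return 0.0
    interior = (contact_rate * surplus / kappa) ** (1.0 / eta)
    return min(1.0, interior)

def discounted_job_value(flow_surplus, periods_left, discount=0.99):
    """Present value of a constant flow surplus received for periods_left periods."""
    return flow_surplus * (1 - discount ** periods_left) / (1 - discount)

# Hypothetical quarterly calibration: 160, 80, 40 and 8 quarters before retirement.
horizons = (160, 80, 40, 8)
profile = [
    optimal_search_intensity(0.3, discounted_job_value(0.05, t), kappa=2.0)
    for t in horizons
]
```

As the horizon shortens, the discounted value of a job falls, and the intensity that equates marginal cost and marginal benefit falls with it, even though every parameter is the same at all ages.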

Permanent Match Quality (x). A time-invariant, match-specific productivity component drawn from a log-normal distribution upon meeting a firm, but initially unobserved by both worker and firm (an experience good). With per-period probability alpha, the quality is revealed; prior to revelation, the parties form expectations over the distribution. Revelation triggers reallocation of bad matches, generating a negative relation between job tenure and separation probability (following Jovanovic [1979]).

Horizon Effect. The mechanism by which workers reduce search effort as they approach retirement because the expected present value of future employment spells shortens. In this paper the concept, coined by Cheron et al. [2011, 2013] in a two-state DMP setting, is extended to include the labor force participation margin: near-retirement workers not only search less intensively but also become more likely to choose nonparticipation (or to remain unemployed to collect benefits rather than accept a job), generating the observed life-cycle decline in job-finding rates from age-independent parameters.

Technology Parameters (theta). In the paper’s structural decomposition, “technology” refers specifically to the vector (mu_x, sigma^2_x, alpha, delta) — the mean and variance of log permanent match quality, the match-quality revelation probability, and the exogenous job-destruction probability. These are contrasted with search-cost parameters (phi) and policy parameters (psi). The label “technology” is acknowledged to potentially also capture employment protection and quit motives not explicitly modeled.

Life-Cycle DMP Model. A finite-horizon version of the Diamond-Mortensen-Pissarides search-and-matching framework in which workers live for J periods, all primitives are age-independent, and life-cycle variation in worker flows arises endogenously from the interaction of the finite horizon with search intensity, labor force participation, and match-learning mechanisms. The model distinguishes three labor market states (E, U, N) and uses Nash bargaining to split the employment surplus.

Manipulation-Robust Prediction

This paper addresses the problem of algorithmic manipulation: when consequential decisions are encoded in machine learning algorithms, individuals strategically alter their behavior to achieve desired outcomes, undermining the predictive validity of the algorithm. The authors develop a “strategy-robust” approach to training decision rules that explicitly models the incentives and costs of manipulation, producing rules that remain stable even when fully transparent. They then deploy and evaluate this approach in a large field experiment in Kenya — the first real-world implementation and evaluation of such a strategy-robust empirical decision rule.

The theoretical framework considers a policymaker who observes training data with features x_i and optimal decisions y_i, and wishes to estimate a decision rule to apply to new instances where behavior may be manipulated. While the standard approach (OLS or LASSO) selects a rule optimal for the training distribution, the strategy-robust approach models how individuals will adjust behavior in response to the incentive structure implied by any given rule. Under linear decision rules and quadratic manipulation costs, each individual shifts behavior by C_i^{-1} * beta away from their “bliss level,” where C_i captures individual- and behavior-specific manipulation costs. The strategy-robust estimator finds the rule that minimizes prediction error in the counterfactual world where people manipulate — a “Stackelberg” solution that commits the policymaker to a rule while anticipating equilibrium behavioral responses. Unlike LASSO, which penalizes all features equally without regard to their manipulability, the strategy-robust approach attenuates the weight on features that are both easily manipulated and subject to manipulation noise.

The empirical setting is a smartphone app (“Smart Sensing”) deployed to 1,557 participants in Nairobi, Kenya, in collaboration with the Busara Center. The app passively collected over 1,000 behavioral indicators (calls, texts, app usage, mobility, etc.) and delivered weekly financial “challenges” that rewarded participants based on decision rules randomly assigned to them. Average weekly payouts were calibrated to approximate typical digital credit loan amounts in Kenya at the time (approximately $4.80). The experiment has two phases: a training phase using control (beta = 0) and simple single-behavior incentive rules to estimate manipulation cost parameters via GMM, and an implementation phase using complex multi-feature decision rules to compare strategy-robust versus LASSO classifiers.

The main findings are as follows. First, participants demonstrably manipulate behavior: a joint F-test rejects the null hypothesis that the diagonal (own-behavior) incentive effects are all zero (p < 0.001). The number of texts sent was 49 times more responsive to incentives than the number of people called during the workday. Outgoing communications are cheaper to manipulate than incoming, and simple behaviors (e.g., average talk time) are more manipulable than complex ones (e.g., standard deviation of talk time). Individuals who self-report higher tech skills find manipulation 9% easier on average, and the 90th percentile of gaming ability finds manipulation twice as easy as the 10th percentile.

Second, in the implementation phase, strategy-robust decision rules outperform LASSO when the decision rule is made transparent to participants. Across all pooled outcomes, strategy-robust rules reduce RMSE by 11% (p = 0.024) relative to LASSO under transparency. For the single income-prediction outcome alone, the improvement is 5% ($0.19 RMSE reduction) but not statistically significant (p = 0.507).

Third, the framework enables estimation of the “cost of transparency.” Making naive LASSO rules transparent lowers performance by 23%. Switching to strategy-robust rules under full transparency reduces that performance decline to 9.2% — a 60% reduction in the cost of transparency. The model predicts this cost to be 9.8%, close to the implemented value of 11.3%.

The scope of the findings is bounded by the linear model with quadratic manipulation costs, a particular population of Kenyan smartphone users, and financial incentive magnitudes comparable to small digital credit loans. The mechanism relies on experimentally estimating manipulation cost parameters, though the authors also show that expert elicitation provides a correlated but noisier substitute (correlation 0.30 with experimental estimates).

Q: What is the core market failure the paper addresses, and why do standard fixes fail?

A: Standard machine learning training assumes the relationship between observed features and outcomes is stable, but implementing a consequential decision rule creates incentives for individuals to manipulate the features on which the rule is based (Goodhart’s Law; Lucas critique). The two common industry responses — restricting to “stable” predictors and keeping rules secret — are inadequate: restricting predictors amounts to a dogmatic prior that manipulation costs are either infinite or zero, while secrecy is increasingly at odds with demands for algorithmic transparency and fails anyway when sophisticated actors reverse-engineer the rule. Periodic retraining treats manipulation as generic covariate shift, can produce non-converging oscillations, and requires observing mistakes before learning from them.

Q: How does the strategy-robust estimator differ from OLS and LASSO?

A: OLS maximizes fit within the unincentivized training sample but ignores that implementing beta will shift behavior; LASSO adds a regularization penalty but still assumes behavior remains fixed at bliss levels and so penalizes all features equally regardless of manipulability. The strategy-robust estimator replaces each individual’s observed behavior x_i with their anticipated counterfactual behavior x_tilde_i(beta) = x_i + C_i^{-1} * beta, and finds the beta that minimizes prediction error in this manipulated distribution — a Stackelberg equilibrium. It attenuates features that are easily manipulated or subject to high manipulation noise, shifting weight toward harder-to-manipulate features even when the latter are less predictive in the training data.
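A toy version of the estimator may help fix ideas. The sketch below is not the paper's implementation: it assumes a diagonal cost matrix, individual costs of the form C_i = C / g_i with a gaming-ability scalar g_i, a noise-free outcome y = x1 + x2, an intercept that is profiled out, and a coarse grid search in place of a proper optimizer. All data and parameter values are hypothetical.

```python
from itertools import product
from statistics import pvariance

# Full factorial toy sample: bliss behaviors x1, x2 and gaming ability g are uncorrelated.
data = list(product((-1.0, 1.0), (-1.0, 1.0), (0.5, 1.5)))
COSTS = (0.5, 50.0)  # assumed diagonal costs: feature 1 is cheap to manipulate

def outcome(x1, x2):
    return x1 + x2   # assumed relationship at bliss levels

def counterfactual_loss(beta):
    """Residual variance when behavior responds to beta: x_tilde = x + g * C^{-1} * beta."""
    residuals = []
    for x1, x2, g in data:
        xt1 = x1 + g * beta[0] / COSTS[0]
        xt2 = x2 + g * beta[1] / COSTS[1]
        residuals.append(outcome(x1, x2) - beta[0] * xt1 - beta[1] * xt2)
    return pvariance(residuals)  # using the variance profiles out the intercept

beta_naive = (1.0, 1.0)                    # exact OLS fit on unmanipulated data
grid = [i / 20 for i in range(31)]         # candidate coefficients 0.00 .. 1.50
beta_robust = min(product(grid, grid), key=counterfactual_loss)

naive_loss = counterfactual_loss(beta_naive)
robust_loss = counterfactual_loss(beta_robust)
```

In this toy, the rule chosen for the manipulated distribution puts less weight on the cheap-to-manipulate feature than on the costly one, and its counterfactual loss is lower than that of the naive rule, even though the naive rule fits the training data perfectly.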

Q: What are the three ways the strategy-robust estimator differs from standard estimators?

A: First, it anticipates level shifts in behavior: behaviors respond to beta, so observed training behaviors are replaced by counterfactual manipulated behaviors. Second, it accounts for signaling and noise: when manipulation ability correlates with the outcome of interest, manipulation can be informative about type (as in Spence 1973), but unobserved heterogeneity in gaming ability that is unrelated to outcomes introduces noise that attenuates coefficients on manipulable behaviors. Third, it achieves subgame perfection by anticipating how behaviors would respond to off-path deviations in beta, rather than assuming behaviors are fixed when beta deviates — yielding a Stackelberg rather than a one-step best-response solution.

Q: How were manipulation cost parameters estimated in the Kenya experiment?

A: In the training phase, each participant was randomly assigned to simple single-behavior incentive rules (e.g., “earn 12 Ksh. per incoming call this week, up to 250 Ksh.”) or control rules (beta = 0). This random variation in per-behavior incentives identifies how sensitive each behavior vector is to incentives, enabling GMM estimation of individual and behavior-specific cost parameters C and the heterogeneity scaling parameter omega. Off-diagonal elements of C were regularized to zero due to noisy estimation; diagonal elements used LASSO penalization with lambda = 1.0 set by cross-validation. Observable heterogeneity was allowed to vary with self-reported tech skills, which explained the most variation in preliminary analysis.

Q: What patterns were found in manipulation costs across behaviors?

A: Outgoing communications are cheaper to manipulate than incoming communications. Text messages, being relatively cheap to send, are more manipulable than calls. Simple behaviors such as average call duration are more manipulable than complex behaviors such as the standard deviation of talk time. Cross-behavior elasticities exist but are mostly noisy: 94.5% of off-diagonal incentive effects are not statistically significant at the 5% level, 3.6% are significantly positive, and 1.8% are significantly negative.

Q: How large is heterogeneity in gaming ability, and what predicts it?

A: Individuals who self-report advanced or higher tech skills find it on average 9% easier to manipulate behaviors. Including unobserved heterogeneity, the 90th percentile of gaming ability finds manipulation twice as easy as the 10th percentile. Much of the heterogeneity arises from unobservables not captured by observables in the model.

Q: What happened when the naive LASSO rule was made transparent versus when the strategy-robust rule was made transparent?

A: Under the transparent treatment, participants received the full coefficients of the decision rule plus access to an interactive earnings calculator. Making naive LASSO rules transparent lowered performance by 23% relative to the opaque naive rule (pooled-outcome RMSE rises from $3.780 to $4.641). Switching to strategy-robust rules under full transparency reduced the performance decline to 9.2%, corresponding to a 60% reduction in the cost of transparency. The model predicted this cost to be 9.8%, which is close to the implemented value of 11.3%.
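The headline percentages follow from the quoted figures. The snippet below assumes that the lower RMSE ($3.780) belongs to the opaque naive rule and the higher one ($4.641) to the transparent naive rule, which is the reading consistent with a 23% performance loss.

```python
# Pooled-outcome RMSE for the naive LASSO rule (assignment of labels is an assumption).
rmse_opaque_naive = 3.780
rmse_transparent_naive = 4.641

# Cost of transparency for the naive rule: proportional increase in RMSE.
naive_cost = rmse_transparent_naive / rmse_opaque_naive - 1

# Reported decline under transparent strategy-robust rules, and the implied reduction.
robust_cost = 0.092
reduction = 1 - robust_cost / 0.23

print(f"naive cost of transparency: {naive_cost:.1%}")
print(f"reduction from strategy-robust rules: {reduction:.0%}")
```

The RMSE ratio gives about 22.8%, which rounds to the reported 23%, and 9.2% against 23% is a 60% reduction.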

Q: What does the reduced-form evidence on behavior change under complex decision rules show?

A: Under the opaque treatment, participants’ behavioral responses to complex decision rules were largely statistically insignificant and often in the wrong direction: only 38.5% of estimated behavioral effects are in the same direction as the incentivized behavior. Under the transparent treatment, 75.4% of point-estimated effects are in the same direction as the incentive, confirming that transparency is a prerequisite for meaningful manipulation in this setting.

Q: How does the paper compare strategy-robust estimation to iterative retraining?

A: Simulation results show that iterative retraining of a naive LASSO model approaches the performance of the strategy-robust method after approximately 4 iterations. However, simulated performance of iterative retraining then begins to deteriorate; for the intelligence outcome, performance eventually falls below baseline performance before any retraining began. This illustrates that myopic best responses can produce non-convergent or suboptimal dynamics, while the strategy-robust approach finds the equilibrium rule directly.
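How myopic retraining can fail to settle is easy to see in a one-feature toy, which is not the paper's simulation. It assumes the outcome equals the bliss behavior, manipulation of the form x_tilde = x + g * beta / cost with heterogeneous gaming ability g, and a naive OLS slope refit each round on the behavior induced by the previous rule. All values are hypothetical.

```python
from itertools import product
from statistics import fmean

# Toy sample of (bliss behavior x, gaming ability g), uncorrelated by construction.
data = list(product((-1.0, 1.0), (0.5, 1.5)))
COST = 0.2  # assumed manipulation cost; lower values mean stronger responses

def ols_slope(xs, ys):
    """Slope of a one-regressor OLS fit with intercept."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def retrain(beta):
    """Refit the naive slope on behavior manipulated in response to the deployed beta."""
    manipulated = [x + g * beta / COST for x, g in data]
    outcomes = [x for x, _ in data]  # assumed: outcome equals bliss behavior
    return ols_slope(manipulated, outcomes)

path = [1.0]  # start from the rule that fits unmanipulated data exactly
for _ in range(6):
    path.append(retrain(path[-1]))
```

In this toy, a high coefficient invites heavy manipulation, the refit then shrinks the coefficient sharply, the weak incentive removes most manipulation, and the next refit raises the coefficient again. Over the rounds simulated the path keeps alternating between a low and a high value rather than settling.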

Q: How does the paper compare strategy-robust estimation to the “intuitive” approach of simply excluding highly manipulable features?

A: The intuitive approach of excluding features above a manipulability threshold reduces predicted manipulability but also discards useful predictors. In some cases, the exclusions leave LASSO with no behaviors predictive enough to include, reducing performance. The strategy-robust approach can extract signal even from manipulable behaviors by adjusting their weights to account for manipulation noise, and outperforms the intuitive exclusion approach in the simulations reported in the Supplemental Appendix.

Q: Can manipulation costs be estimated without an experiment?

A: The authors briefly explore expert elicitation as a nonexperimental alternative: 171 individuals were surveyed to predict how Kenyans would manipulate phone behaviors when incentivized. Experts generally predicted lower costs (more manipulability) than observed experimentally, but the correlation between expert predictions and experimental estimates is 0.30. Using expert-elicited costs to train the strategy-robust model improved simulated performance substantially for one focal outcome and had an inconsequential negative effect for the other. Costs can also potentially be estimated from market prices and first principles when a structural model of underlying manipulations is available.

Q: What is the paper’s interpretation of its results through the lens of the Lucas critique?

A: The paper frames its contribution as a machine learning interpretation of Lucas (1976): just as implementing an economic policy changes the behavioral relationships on which the policy was calibrated, implementing a predictive decision rule beta changes the distribution of the very features the rule is based on. The key insight is that this counterfactual world has predictable structure — including a feature in the model tends to induce manipulation in that feature of a magnitude directly related to beta — so counterfactual fit can be estimated and rules can be optimized to perform well in the equilibrium they induce.

Q: What are the policy implications for algorithmic transparency?

A: The framework allows a policymaker to quantify and reduce the performance cost of transparency. The estimated equilibrium cost of transparency is roughly 10% when using strategy-robust rules, substantially less than the approximately 23% cost of making naive rules transparent. This means that strategy-robust rules can be disclosed, satisfying demands for a “right to explanation” under regulations such as GDPR, while losing far less performance than naive rules lose when disclosed.

Strategy-robust decision rule: A decision rule trained to anticipate that individuals will manipulate the features on which it is based, by replacing observed training behaviors with anticipated counterfactual manipulated behaviors in the loss function. It yields a Stackelberg equilibrium in which the policymaker commits to a rule while correctly forecasting the equilibrium behavioral response.

Manipulation costs (C_i): Individual- and behavior-specific quadratic costs that determine how far an individual shifts behavior from their bliss level in response to the incentive implied by a decision rule’s coefficient vector beta. Higher costs imply less behavioral response; costs are parameterized to allow separable heterogeneity by person and by behavior.

Bliss level (x_i): An individual’s unincentivized behavior — the behavior they would exhibit absent any decision rule (i.e., when beta = 0). Estimated from control periods in the experiment.

Gaming ability (gamma_i): Individual-level scaling factor for manipulation costs; a higher value means lower costs and easier manipulation. Modeled as a function of observable characteristics (e.g., self-reported tech skills) and unobservable heterogeneity.

Counterfactual fit: Predictive fit evaluated in the counterfactual state of the world where the decision rule is implemented and agents manipulate their features in response. The strategy-robust approach maximizes counterfactual fit, sacrificing within-sample fit (as measured on unmanipulated training data) to improve performance in deployment.

Cost of transparency: The reduction in predictive performance of a decision rule when its coefficients are disclosed to the individuals being evaluated. In the experiment, disclosure reduces performance of naive LASSO rules by 23% and strategy-robust rules by 9.2%, implying strategy-robust rules reduce the cost of transparency by 60%.

Stackelberg equilibrium: The solution concept in which the policymaker (leader) commits to a decision rule, correctly anticipating the best-response behavior of individuals (followers), rather than taking behavior as fixed or updating myopically. The strategy-robust estimator implements this equilibrium concept.

Performative prediction: The broader phenomenon, drawing on Perdomo et al. (2020), whereby a decision rule changes the distribution of the data it is applied to. The paper’s strategy-robust approach is an empirically estimable solution within this framework.
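The glossary entries for manipulation costs, bliss level, gaming ability, and counterfactual fit can be tied together in a small sketch. It assumes a linear benefit beta·x and a quadratic cost scaled by 1/gamma_i, so the best response is bliss + gamma_i * beta_k / c_k; that functional form, the three individuals, and all numbers are illustrative assumptions, not the paper's estimates.

```python
def best_response(bliss, beta, costs, gamma=1.0):
    """Manipulated behavior under an ASSUMED quadratic cost (c_k / gamma) / 2 * (x - bliss)^2."""
    return [x + gamma * b / c for x, b, c in zip(bliss, beta, costs)]


def mse(beta, people, costs, manipulated):
    """Mean squared error of the rule beta, on bliss or on manipulated behavior."""
    total = 0.0
    for person in people:
        x = person["bliss"]
        if manipulated:
            x = best_response(x, beta, costs, person["gamma"])
        prediction = sum(b * xi for b, xi in zip(beta, x))
        total += (person["y"] - prediction) ** 2
    return total / len(people)


# Hypothetical individuals: the outcome y equals the second bliss behavior.
people = [
    {"bliss": [1.0, 2.0], "gamma": 0.5, "y": 2.0},
    {"bliss": [2.0, 1.0], "gamma": 2.0, "y": 1.0},
    {"bliss": [3.0, 4.0], "gamma": 1.0, "y": 4.0},
]
costs = [0.25, 4.0]  # hypothetical: first behavior cheap to manipulate, second costly

naive_rule = [0.0, 1.0]   # fits the unmanipulated data perfectly
shrunk_rule = [0.0, 0.9]  # slightly worse in sample, less incentive to manipulate

for name, rule in [("naive", naive_rule), ("shrunk", shrunk_rule)]:
    print(name,
          "within-sample MSE:", round(mse(rule, people, costs, False), 4),
          "counterfactual MSE:", round(mse(rule, people, costs, True), 4))
```

In this toy setup the shrunk rule gives up within-sample fit but does better once behavior responds, which is the trade-off the counterfactual-fit entry describes. A full strategy-robust estimator would search over beta rather than compare two fixed rules.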

Marriage, Fertility, and Cultural Integration in Italy

Bisin and Tura study the cultural integration of immigrants in Italy by estimating a structural model of marital matching embedded with intra-household decisions — fertility, socialization of children, and divorce — along cultural-ethnic lines. The central research question is how to decompose the demand for integration (from immigrants) and the supply of cultural acceptance (from natives) in explaining the pace and heterogeneity of cultural convergence.

The empirical analysis exploits administrative individual-level data from ISTAT’s ADELE Laboratory covering the universe of marriages formed in Italy from 1995 to 2012 and the universe of births and separations over the same period. After matching marriage, birth, and separation records, the final sample comprises more than 4 million marriages, representing 92.6% of all marriages celebrated in Italy over the period. Seven cultural-ethnic groups are studied: Italian (majority), Europe-EU15, Other Europe, North Africa–Middle East, Sub-Saharan Africa, East Asia, and Latin America. The model is a transferable-utility (TU) frictionless marriage market in which the joint marital surplus depends on a systematic component — itself the outcome of a collective household decision problem — and an idiosyncratic component capturing unobserved individual heterogeneity (following Choo and Siow, 2006). Parameters are estimated via method of moments, with identification drawing on cross-sectional variation across ethnic-group pairings and across Italy’s 20 administrative regions. Cultural socialization is proxied by language transmission (whether Italian is spoken at home with children).
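For readers unfamiliar with the Choo and Siow (2006) framework cited above, its standard per-partner gains-to-marriage statistic is ln(mu_ij / sqrt(mu_i0 * mu_0j)), where mu_ij counts marriages between types i and j and mu_i0, mu_0j count the unmarried of each type. The sketch below computes only that textbook statistic on hypothetical counts; the paper's own estimator embeds a collective household problem and is estimated by method of moments, so its surplus definition may differ.

```python
import math


def gains_to_marriage(married_ij, single_i, single_j):
    """Choo-Siow (2006) per-partner gains: ln(mu_ij / sqrt(mu_i0 * mu_0j))."""
    return math.log(married_ij / math.sqrt(single_i * single_j))


# Hypothetical counts, not taken from the ISTAT data.
homogamous = gains_to_marriage(married_ij=900, single_i=100, single_j=100)
heterogamous = gains_to_marriage(married_ij=30, single_i=100, single_j=3000)

print(f"homogamous pairing gains:   {homogamous:.3f}")
print(f"heterogamous pairing gains: {heterogamous:.3f}")
```

Higher values indicate pairings that occur more often than the pools of unmarried individuals of each type would suggest, which is how assortative mating shows up in this statistic.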

The data confirm strong positive assortative mating along cultural-ethnic lines, with particularly high homogamy rates for Sub-Saharan African and East Asian minorities. Homogamous minority households show notably lower rates of Italian-language use at home — for East Asian parents, 20% in a homogamous marriage versus 92% in a heterogamous marriage. Heterogamous marriages have higher separation rates (7.5% for mixed families with at least one Italian spouse versus 6.4% for homogamous Italian couples) and lower fertility.

The estimated cultural intolerance parameters — measuring the psychological value a parent places on socializing a child to his/her own ethnic identity relative to a child acquiring a different identity — are strictly positive, asymmetric across directions, and highly heterogeneous across groups. North Africa–Middle East immigrants exhibit the highest minority intolerance (estimated at 97.85), more than six times that of Europe-EU15 immigrants (6.69). Latin America (93.13), Sub-Saharan Africa (87.08), and East Asia (81.22) also show high intolerance. On the native side, Italian intolerance is highest toward Sub-Saharan African immigrants (78.23) and lowest toward Europe-EU15 immigrants.

Long-run simulations over successive generations show that all minorities eventually converge to the Italian majority along the language dimension, but at heterogeneous rates. Seventy-five percent of second-generation immigrants speak Italian at home with their children (one-generation integration rate). Europe-EU15 and Other Europe minorities converge almost completely within a single generation. Latin America shows the slowest path, with only 70% integration after four generations. East Asia and Sub-Saharan Africa also integrate more slowly, driven respectively by high fertility rates and strong selection into homogamous marriages.

A counterintuitive counterfactual result is central to the paper: if Italian cultural intolerance were reduced to zero (full acceptance), cultural integration of minorities would slow by 15 percentage points over a generation (from 93% to 78% by the third generation). The mechanism is that greater native acceptance enables immigrants to sustain their own language even within heterogamous (mixed) marriages, increasing demand for such marriages and raising minority fertility, thereby preserving cultural distinctiveness.

Finally, doubling immigration inflows while holding population shares constant reduces third-generation integration from 93% to 86% (a 7-percentage-point reduction). Effects are concentrated among Sub-Saharan African (20-percentage-point reduction) and East Asian (6-percentage-point reduction) minorities, with little impact on European and North African minorities. When inflows are reweighted toward Sub-Saharan African and East Asian groups, integration losses for those minorities range from 20 to 60 percentage points by the third generation.

Q: What is the paper’s core methodological contribution?

A: The paper embeds a collective household decision problem (covering fertility, socialization, and divorce) within a transferable-utility frictionless marriage matching framework. This allows marital utility to emerge endogenously from intra-household decisions rather than being specified exogenously. The key innovation is that socialization incentives and technologies differ systematically between homogamous and heterogamous marriages, and these differences feed back into marital matching and long-run cultural dynamics.

Q: What does “cultural intolerance” mean in this model, and how is it identified?

A: Cultural intolerance is the psychological value a parent obtains from socializing a child to his/her own ethnic identity, relative to having a child adopt a different cultural-ethnic identity. It is the main parameter driving socialization effort and resistance to cultural integration. Identification relies on two sources of cross-sectional variation: differences in matching patterns, fertility, separation, and socialization rates across cultural-ethnic group pairings, and exogenous variation in the ethnic composition of the regional population across Italy’s 20 administrative regions.

Q: How heterogeneous are the estimated cultural intolerance parameters across minority groups?

A: The parameters are highly heterogeneous. North Africa–Middle East immigrants have the highest estimated minority intolerance (97.85), more than six times the EU15 estimate (6.69). Latin America (93.13), Sub-Saharan Africa (87.08), and East Asia (81.22) are also substantially higher than EU15. The matrix is asymmetric: Italian intolerance toward Sub-Saharan Africans (78.23) is higher than toward North Africans (67.88), even though those two groups show comparable minority intolerance levels.

Q: What are the three mechanisms beyond intolerance parameters that explain heterogeneous integration dynamics?

A: First, selection into homogamous marriages: Sub-Saharan Africa’s particularly strong selection into homogamy gives those households access to superior coordinated socialization technology, sustaining cultural heterogeneity despite similar intolerance levels to other groups. Second, fertility rates: East Asian minorities have particularly high estimated fertility, which amplifies the transmission of their cultural identity across generations. Third, socialization effectiveness in heterogamous marriages: Latin American immigrants are uniquely able to socialize children to their own language even when married to native Italians, making their integration the slowest despite being in many mixed marriages.

Q: What is the counterintuitive result about Italian cultural intolerance and integration speed?

A: Lowering Italian cultural intolerance to zero would reduce minority integration by 15 percentage points over one generation, with third-generation integration falling from 93% to 78%. The intuition is that higher native acceptance enables immigrants to maintain their own language more effectively within heterogamous marriages, which in turn increases immigrant demand for intermarriage with natives and raises minority fertility, both of which slow cultural convergence rather than accelerating it.

Q: How do divorce dynamics differ between homogamous and heterogamous households?

A: Heterogamous households exhibit higher separation rates than culturally homogeneous unions: 7.5% for mixed families with at least one Italian spouse versus 6.4% for homogamous Italian couples. In the model, divorce by heterogamous households can be a strategic choice by mothers with high cultural intolerance, since custody grants single mothers greater unilateral control over socialization. Divorce probabilities are decreasing in the number of children for both family types. Interestingly, heterogamous households invest more in socialization when divorced than when married, because the high-intolerance parent can act without spousal opposition.

Q: How well does the model fit the data?

A: The raw correlation between predicted and observed gains to marriage is 0.84. The correlation between predicted and observed foreign-language socialization rates is 0.83, for both homogamous and heterogamous families. The dataset covers 92.5% of all marriages in Italy from 1995 to 2012, representing over 4 million marriages matched with birth and separation records at a 98.5% one-to-one match rate.

Q: What happens to cultural integration when immigration inflows are doubled with an overweighting of North Africa–Middle East, Sub-Saharan Africa, and East Asian immigrants?

A: North Africa–Middle East immigrants reduce third-generation convergence by only 4 percentage points. By contrast, East Asian and Sub-Saharan African minorities produce integration losses ranging from 20 to 60 percentage points by the third generation. This wide range reflects how the interaction between high fertility, strong homogamy selection, and effective socialization in heterogamous marriages amplifies cultural persistence when these groups constitute a larger share of inflows.

Q: What is the one-generation cultural integration rate, and which groups diverge most from it?

A: Seventy-five percent of second-generation immigrants speak Italian at home with their children, constituting the one-generation baseline integration rate. Europe-EU15 and Other Europe minorities converge almost completely within one generation, as does North Africa–Middle East. Latin America diverges most sharply downward, with only 70% integration even after four generations, and shows a partial retreat from integration in the first generation. Sub-Saharan Africa and East Asia also fall below the 75% one-generation benchmark.

Q: How does the paper relate to the debate on native labor market effects of immigration?

A: The paper notes that sizeable negative labor market effects of immigration on natives are far from well-documented in the empirical literature, with results ranging from negative wage effects (Borjas) to positive or heterogeneous effects (Card, Ottaviano-Peri, Dustmann et al.). The authors therefore focus on the cultural externalities channel, which they argue better explains voter opposition to immigration, and study cultural integration structurally rather than examining wage outcomes.

Cultural intolerance: The psychological value a parent obtains from socializing a child to his/her own ethnic identity, relative to having a child adopt a different cultural-ethnic identity. It is specific to the household type (homogamous vs. heterogamous) and is the primary parameter measuring the strength of a group’s resistance to cultural integration.

Cultural socialization / language transmission: The costly investments parents make to transmit their own cultural-ethnic traits to children. In the empirical model, socialization is proxied by whether a parent speaks his/her own non-Italian language at home with children. Socialization technologies are more efficient in homogamous (same-ethnicity) marriages than heterogamous ones.

Homogamous vs. heterogamous marriage: A homogamous marriage is one in which both spouses share the same cultural-ethnic identity; a heterogamous marriage is one in which spouses differ. The distinction is load-bearing throughout the model: homogamous households have coordinated socialization incentives and superior technology, higher fertility, and lower separation rates.

Transferable utility (TU) matching: A marriage market framework in which utility is transferable between spouses, so that the equilibrium allocation maximizes aggregate marital surplus and equilibrium transfers are determined by outside options. The model is frictionless, meaning matching is driven purely by preferences over the characteristics of potential spouses.

Cultural integration (language dimension): In the paper’s long-run simulations, cultural integration is defined as the share of second- (or later-) generation immigrants who speak Italian at home with their own children. It is the empirical outcome used to track convergence to the majoritarian culture across generations.

Assortative mating along cultural-ethnic lines: The tendency for individuals to match with spouses of the same cultural-ethnic group. The paper finds positive assortative mating for all groups, with particularly strong homogamy for Sub-Saharan African and East Asian minorities, and explains it as the equilibrium outcome of the TU matching model given cultural intolerance preferences.

Socialization technology asymmetry: The model’s assumption that homogamous married parents hold a more efficient socialization technology than heterogamous parents, but that divorced heterogamous households invest more in socialization than married heterogamous ones, because the high-intolerance parent can act unilaterally without spousal opposition.

Measuring and Mitigating Racial Disparities in Tax Audits

Overview

Research Question. Do Black taxpayers face higher IRS audit rates than non-Black taxpayers, despite race-blind audit selection? And if so, why — and what would mitigation look like?

Data and Methodology. The authors use comprehensive administrative microdata covering approximately 148 million individual income tax returns and 780,627 operational audits for tax year 2014, supplemented with 71,878 research audits from the IRS National Research Program (NRP) pooled over 2010-2014. Because neither the researchers nor the IRS observe taxpayer race, the authors employ Bayesian Improved First Name Surname Geocoding (BIFSG), which imputes the probability that a taxpayer is Black from first name, surname, and Census Block Group. They develop a novel partial identification strategy: two estimators (a probabilistic estimator and a linear estimator) that, under conditions verified using a matched North Carolina voter-registration dataset containing self-reported race, asymptotically bound the true racial audit disparity from below and above respectively. To address the selective labels problem — underreporting is observable only for audited returns — the authors combine operational audit data with NRP random-sample audits to simulate counterfactual audit selection algorithms.

Main Findings.

Magnitude of the disparity. The probabilistic estimator implies a racial audit disparity of 0.81 percentage points; the linear estimator implies 1.34 percentage points. Against a base audit rate of 0.54% for the overall U.S. population in 2014, these bounds imply that Black taxpayers are audited at between 2.9 and 4.7 times the rate of non-Black taxpayers.

Role of the EITC. The disparity is concentrated among EITC claimants. The estimated disparity within the EITC population is 1.96 to 2.90 percentage points, compared to only 0.10 to 0.18 percentage points among non-EITC claimants. In relative terms, Black EITC claimants are audited at 2.9 to 4.4 times the rate of non-Black EITC claimants. A formal decomposition attributes 70-73% of the overall disparity to higher audit rates among Black EITC claimants, 20-21% to racial differences in EITC claiming rates, and 7-8% to differential audit rates among non-EITC filers. Within EITC claimants, 78.5% of the observed audit disparity is attributable to the Dependent Database (DDb) program.

Source of the disparity: algorithmic objective. Using counterfactual audit selection algorithms estimated on NRP data, the authors find that allocating EITC audits to maximize detected total underreporting (from any source) would produce audit rates of 0.74% for Black EITC claimants versus 1.63% for non-Black EITC claimants, reversing the disparity. In contrast, the status quo, which prioritizes detecting overclaimed refundable credits, yields 3.00% for Black claimants versus 1.04% for non-Black claimants. The primary driver is a racial difference in which types of noncompliance are most prevalent: dependent-claiming errors are more common among Black EITC claimants (dependent error rate of 26.6% vs. 16.3% for non-Black), while the highest-dollar noncompliance, underreported business income, is disproportionately concentrated among non-Black EITC claimants. An algorithm focused on refundable credit overclaims implicitly targets dependent errors and therefore selects Black taxpayers at higher rates.

Prediction model bias. Even conditional on the refundable-credit objective, the status quo disparity (1.96 p.p.) exceeds the disparity that would arise under an oracle that uses actual rather than predicted refundable credit overclaims (1.08 p.p.), suggesting that prediction errors are unevenly distributed by race. The refundable credit prediction algorithm generates a disparity of 1.75 p.p., approximately 60% larger than the oracle. The authors find suggestive evidence of missingness in birth certificate data (paternal information is disproportionately missing for children claimed on Black taxpayers’ returns) and differential predictive accuracy in the DDb risk score across race.

Operational consequences. Switching the objective from refundable credit overclaims to total underreporting would shift the composition of audited returns from predominantly dependent-eligibility issues (80% of refundable credit oracle-selected returns contain a dependent error) toward business income (86% of total-underreporting oracle-selected returns have business income underreporting). EITC returns with substantial business income (gross receipts above $25,000) cost on average $369.70 to audit versus $23.09 for other EITC returns. Holding the audit rate fixed, the switch would raise average examination costs by nearly an order of magnitude, while also increasing detected underreporting (mean adjustment of $22,578 per return under the total underreporting oracle versus $9,595 under the refundable credit oracle).

Scope Conditions. Results pertain primarily to tax year 2014. The paper finds similar patterns for tax years 2010, 2012, 2016, and 2018. The analysis covers Black versus non-Black taxpayers; disparities for other racial and ethnic groups are not the focus. The selective labels identification strategy relies on the NRP random-audit sample and the bounding conditions verified in the North Carolina matched data.

In depth

Q1. Why can’t the disparity be attributed simply to Black taxpayers being more likely to claim the EITC, combined with EITC claimants facing higher audit rates generally?

The authors test this directly by estimating racial audit disparities separately within EITC claimants and non-claimants. If differential EITC claiming rates were the full explanation, the within-EITC disparity would be close to zero. Instead, the disparity among EITC claimants (1.96-2.90 p.p.) is larger in absolute terms than the overall disparity (0.81-1.34 p.p.), indicating that Black EITC claimants face substantially higher audit rates than non-Black EITC claimants even holding EITC claimant status fixed. The formal decomposition attributes 70-73% of the overall disparity to differential audit rates within the EITC claimant population, not to differential claiming rates across the population.

Q2. How does the partial identification strategy work, and what are its key identifying assumptions?

The authors derive two estimators of the racial audit disparity that use BIFSG-imputed race probabilities rather than observed race. The probabilistic estimator weights each taxpayer’s contribution by their estimated probability of being Black; it is downward-biased when there is a positive residual covariance between audits and true race after conditioning on imputed race (E[Cov(Y,B|b)] > 0). The linear estimator regresses audit status on imputed race probability; it is upward-biased when there is a positive residual covariance between audits and imputed race after conditioning on true race (E[Cov(Y,b|B)] > 0). When both covariance terms are positive, the probabilistic and linear estimates bound the true disparity from below and above, respectively. The authors verify that both covariance terms are positive and statistically significant (p < 0.01) in the matched North Carolina dataset, both for the full population and for the EITC population specifically.
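The bounding logic can be checked on synthetic data. The sketch below implements the two estimators as described (a probability-weighted difference in audit rates, and an OLS slope of audit status on imputed probability) and compares them with the true disparity, which is observable here only because race is simulated. The data-generating process, sample size, and all parameter values are made up for illustration and are not calibrated to IRS or North Carolina data.

```python
import random


def probabilistic_estimator(y, b):
    """Gap between audit rates weighted by b and by 1 - b."""
    rate_black = sum(yi * bi for yi, bi in zip(y, b)) / sum(b)
    rate_other = sum(yi * (1 - bi) for yi, bi in zip(y, b)) / sum(1 - bi for bi in b)
    return rate_black - rate_other


def linear_estimator(y, b):
    """OLS slope from regressing audit status y on imputed probability b."""
    n = len(y)
    mean_y, mean_b = sum(y) / n, sum(b) / n
    cov = sum((yi - mean_y) * (bi - mean_b) for yi, bi in zip(y, b))
    var = sum((bi - mean_b) ** 2 for bi in b)
    return cov / var


def true_disparity(y, race):
    """D = E[Y | B = 1] - E[Y | B = 0]; needs observed race."""
    n_black = sum(race)
    audited_black = sum(yi for yi, ri in zip(y, race) if ri == 1)
    audited_other = sum(yi for yi, ri in zip(y, race) if ri == 0)
    return audited_black / n_black - audited_other / (len(race) - n_black)


def simulate(n=200_000, seed=0):
    """Synthetic population; every parameter is a hypothetical choice."""
    rng = random.Random(seed)
    y, race, b = [], [], []
    for _ in range(n):
        bi = rng.betavariate(0.5, 3.0)      # imputed Pr[Black]
        ri = 1 if rng.random() < bi else 0  # true race, calibrated to bi
        # Audit probability rises with true race and with imputed probability,
        # so both covariance conditions hold by construction.
        p_audit = 0.01 + 0.02 * ri + 0.02 * bi
        y.append(1 if rng.random() < p_audit else 0)
        race.append(ri)
        b.append(bi)
    return y, race, b


y, race, b = simulate()
lower = probabilistic_estimator(y, b)
truth = true_disparity(y, race)
upper = linear_estimator(y, b)
print(f"probabilistic: {lower:.4f}  true: {truth:.4f}  linear: {upper:.4f}")
```

Because audits here depend on both true race and imputed probability, the probabilistic estimate should fall below the true disparity and the linear estimate above it, up to sampling noise.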

Q3. Does the racial audit disparity within EITC claimants disappear when comparing taxpayers with similar levels of underreporting?

No. The authors use NRP data to estimate audit rates by race within each underreporting decile among EITC claimants. Within every decile of the underreporting distribution, the estimated audit rate for Black taxpayers exceeds that for non-Black taxpayers. An oracle algorithm that selects returns in descending order of actual underreporting produces an audit rate of 0.74% for Black EITC claimants and 1.63% for non-Black EITC claimants — the opposite of the status quo pattern (3.00% for Black, 1.04% for non-Black). This rules out total-dollar underreporting as the primary driver of the observed disparity.

Q4. Why does focusing audit selection on refundable credit overclaims specifically lead to higher audit rates for Black taxpayers?

Two mechanisms operate simultaneously. First, EITC eligibility is linked to children, so detecting erroneously claimed dependents generates large refundable credit adjustments. The dependent error rate is higher among Black EITC claimants than non-Black EITC claimants (26.6% vs. 16.3% in the probabilistic estimate, or 30.8% vs. 15.4% in the linear estimate). Second, the highest-dollar noncompliance via underreported business income is disproportionately concentrated among non-Black EITC claimants: among EITC claimants in the top 1% of business income underreporting, the probabilistic estimate shows 0.05% are Black compared to 0.21% non-Black. An algorithm aimed at refundable credit overclaims implicitly targets dependent errors and therefore selects Black taxpayers at higher rates; one aimed at total underreporting would prioritize business income underreporting instead and therefore select non-Black taxpayers at higher rates.

Q5. How do the simulated algorithms compare to the actual IRS algorithms?

The authors cannot directly replicate the IRS’s confidential DDb algorithm, but they provide three pieces of evidence that their refundable credit prediction algorithm is a reasonable proxy. First, public governmental documents describe DDb’s stated goal as identifying taxpayers who do not meet refundable credit eligibility requirements. Second, when selecting audits based on predicted refundable credit overclaims using largely the same features available to IRS, the authors generate a disparity (1.75 p.p.) close to the status quo disparity (1.96 p.p.). Third, operational audits of EITC returns are strongly associated with their predicted refundable credit overclaims measure but show a much weaker association with predicted total underreporting.

Q6. What does the status quo disparity exceeding the refundable credit oracle disparity reveal about prediction model design?

The status quo disparity (1.96 p.p.) is approximately 80% larger than the disparity that would arise if the IRS were perfectly informed about actual refundable credit overclaims and selected accordingly (oracle disparity: 1.08 p.p.). The refundable credit prediction algorithm generates a disparity of 1.75 p.p., approximately 60% larger than the oracle. This gap between the oracle and prediction disparity is consistent with prediction errors being distributed unevenly by race. The authors find that birth certificates of children claimed on Black taxpayers’ returns are substantially more likely to be missing paternal identity information, which may reduce the predictive accuracy of the DDb model for this population. They provide suggestive evidence that modifying the predictive features used could reduce the disparity without substantially degrading credit overclaim detection.
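The "approximately 80%" and "approximately 60%" comparisons can be reproduced from the three disparities quoted above. A minimal sketch; the helper name is illustrative.

```python
def percent_larger(value, benchmark):
    """How much larger value is than benchmark, in percent."""
    return 100 * (value - benchmark) / benchmark


oracle = 1.08      # disparity under the refundable credit oracle, p.p.
status_quo = 1.96  # observed disparity among EITC claimants, p.p.
predicted = 1.75   # disparity under the authors' prediction algorithm, p.p.

print(f"status quo vs oracle: {percent_larger(status_quo, oracle):.0f}% larger")
print(f"prediction vs oracle: {percent_larger(predicted, oracle):.0f}% larger")
```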

Q7. What are the downstream operational consequences of switching the algorithmic objective?

Switching from refundable credit overclaims to total underreporting would shift audited issues from dependent eligibility (80% of refundable credit oracle-selected returns have a dependent error) toward business income (86% of total underreporting oracle-selected returns have business income underreporting). Auditing business income returns is substantially more resource-intensive: $369.70 per return on average for returns with gross receipts above $25,000, versus $23.09 for other EITC returns. Holding the current EITC audit rate fixed, the share of audited returns with substantial business income would rise from 3% to 93%, raising total examination costs by nearly an order of magnitude. However, because total detected underreporting per audited return would also rise substantially (mean of $22,578 vs. $9,595), the increase in detected noncompliance would exceed the increase in audit costs, and the qualitative pattern persists even when accounting for higher per-return costs.
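The "nearly an order of magnitude" claim can be reproduced from the per-return costs and the business-income shares quoted above. The sketch below is back-of-the-envelope arithmetic, not the paper's cost accounting. In particular, it treats the two oracle mean adjustments as stand-ins for detected underreporting before and after the switch, which is a simplifying assumption.

```python
COST_BUSINESS = 369.70  # average audit cost, EITC returns with gross receipts above $25,000
COST_OTHER = 23.09      # average audit cost, other EITC returns


def average_audit_cost(business_share):
    """Mean cost per audited return given the share with substantial business income."""
    return business_share * COST_BUSINESS + (1 - business_share) * COST_OTHER


cost_before = average_audit_cost(0.03)
cost_after = average_audit_cost(0.93)
cost_ratio = cost_after / cost_before

# Mean adjustments per return under the two oracles (assumed proxies, see lead-in).
extra_detected = 22_578 - 9_595
extra_cost = cost_after - cost_before

print(f"average cost per audit: ${cost_before:.2f} -> ${cost_after:.2f} ({cost_ratio:.1f}x)")
print(f"extra detected per return: ${extra_detected:,} vs extra cost: ${extra_cost:.2f}")
```

On these figures the per-return cost rises roughly tenfold, while the additional detected underreporting per return is far larger in dollar terms than the additional examination cost.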

Q8. Is the disparity consistent across years, and is it driven by a particular audit type?

The authors find comparable audit disparities for tax years 2010, 2012, 2016, and 2018, confirming the 2014 results are not year-specific. The disparity is concentrated in correspondence audits: the estimated disparity in correspondence audit rates is 0.804-1.328 p.p. for the full population, while the disparity in field/office audit rates is only 0.010-0.016 p.p. The disparity is present in both pre-refund and post-refund audits, though pre-refund audits show a larger disparity even among correspondence audits alone. Among EITC claimants, the correspondence audit channel is nearly entirely responsible for the group-level disparity.

Q9. What heterogeneity exists within EITC claimants?

The disparity is especially pronounced among unmarried male EITC claimants with dependents: among this subgroup, the audit rate for Black men exceeds the audit rate for non-Black men by more than 4 percentage points, and both are an order of magnitude above the overall U.S. population audit rate. Disparities are smaller among joint filers, unmarried women, and unmarried men without dependents, though the ratio of Black to non-Black audit rates remains substantial across all subgroups. The concentration of the disparity among unmarried men with dependents is consistent with the role of dependent-claiming errors, which are more likely to arise in family structures characterized by nonmarital cohabitation — a pattern more prevalent among Black Americans due to lower marriage rates.

Q10. Can the disparity be attributed to disparate treatment — i.e., race-conscious selection?

The authors rule out disparate treatment for the EITC population. The DDb audit selection process for EITC returns is automated (no manual review), and IRS does not use race or geography as an input into audit selection. The disparity is therefore the product of disparate impact: race-neutral selection criteria interact with racially correlated patterns of tax return characteristics to produce differential audit rates. For higher-income non-EITC taxpayers, where audit selection may involve human classifiers, the authors cannot rule out disparate treatment.

Key Concepts

Audit Disparity (D). Defined in the paper as D = E[Y|B=1] - E[Y|B=0], the difference in audit rates between Black taxpayers (B=1) and non-Black taxpayers (B=0). This is a group-level difference in selection rates, not conditional on any other characteristic, and is the primary estimand throughout.

Probabilistic Disparity Estimator. An estimator that calculates group-specific audit rates by weighting each taxpayer’s contribution by their BIFSG-imputed probability of being Black (or non-Black). It is shown to be downward-biased when E[Cov(Y,B|b)] > 0, i.e., when there is residual positive association between true race and audits after conditioning on imputed race.

Linear Disparity Estimator. An estimator based on regressing audit status (Y) on BIFSG-imputed race probability (b). It is shown to be upward-biased when E[Cov(Y,b|B)] > 0, i.e., when imputed race probability predicts audits even after conditioning on true race. Together, the probabilistic and linear estimators form bounds on the true disparity under conditions verified empirically.

BIFSG (Bayesian Improved First Name Surname Geocoding). A probabilistic race imputation method that uses Bayes rule under a conditional independence assumption (first name, surname, and geography are independent given race) to compute Pr[Black | first name, surname, Census Block Group]. Applied here to all 148 million tax returns; calibrated and validated against matched North Carolina voter registration data with self-reported race.

Selective Labels Problem. The problem that noncompliance (underreporting) is observed only for returns selected for audit, not for the full filing population. In this paper it means the IRS cannot directly observe the underreporting distribution for unaudited returns. The authors address this using NRP random-audit data, which allows estimation of the unaudited underreporting distribution and construction of counterfactual selection algorithms.

Algorithmic Objective. The paper distinguishes between (1) the prediction component of audit selection — which model to use to forecast noncompliance — and (2) the objective component — what type of noncompliance to predict and pursue (overclaimed refundable credits versus total underreporting from any source). The paper finds that the objective, not just prediction error, is an independent driver of the racial audit disparity.

Dependent Database (DDb) Program. The IRS’s primary EITC audit selection program, responsible for approximately 75% of audited EITC returns in 2014. DDb flags returns based on rules, heuristics, and proprietary risk scores, with the stated goal of identifying taxpayers who do not meet refundable credit eligibility requirements. Selection through DDb is fully automated, without human classifier review.

National Research Program (NRP). A stratified random sample audit program through which the IRS conducts near-line-by-line examinations of a small fraction of the filing population each year (approximately 2% of audited returns in 2014). The paper pools 71,878 NRP audits from 2010-2014 to identify the distribution of underreporting in the full EITC filing population and to estimate counterfactual selection algorithms.

Merger Effects and Antitrust Enforcement: Evidence from US Consumer Packaged Goods

This paper by Bhattacharya, Illanes, and Stillerman makes two contributions to the debate over US antitrust enforcement stringency. First, it documents the price, quantity, and assortment effects of a comprehensive set of consummated mergers in US consumer packaged goods (CPG). Second, it develops and estimates a model of agency enforcement decisions to quantify antitrust stringency and simulate counterfactual outcomes under stricter regimes.

Data and scope. The analysis covers 129 product markets across 47 transactions in US CPG from 2006 to 2017, using the NielsenIQ Retail Scanner Dataset (covering 35,000–50,000 stores and 2.6–4.5 million UPCs). The sample comprises all deals valued at $280 million or more in which both the acquirer and target sold products in at least one overlapping product market-DMA. Geographic markets are NielsenIQ designated market areas (DMAs). The sample is defined this way to avoid the selection bias that arises from studying only mergers that attracted press attention or were litigation targets.

Identification strategy. The empirical approach is a before-after event study within geography and product. For each merger, a brand-specific linear time trend is estimated from the 36 months prior to the merger announcement, controlling for UPC-DMA fixed effects, month-of-year fixed effects, input cost indices, and log median household income. Post-merger outcomes (24 months after completion) are measured as deviations from the extrapolated pre-merger trend. The identifying assumption is that secular demand and cost trends are gradual and well-captured by a linear trend. Pre-trend placebo tests show no significant departures from trend in the pre-period, and randomized-date placebos confirm that the linear trend is a better predictor of post-period outcomes under random merger dates than under actual merger dates, supporting the interpretation that observed post-period departures reflect merger effects.
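
A stripped-down sketch of the trend-extrapolation idea follows. It fits a linear trend to pre-period log prices for a single hypothetical brand and averages the post-period deviations from the extrapolated line. It omits the fixed effects, cost indices, and income controls described above, and all numbers and function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trend_deviation(pre, post):
    """Mean post-period deviation from the extrapolated pre-period linear trend.

    pre and post are lists of (month, log_price) pairs.
    """
    intercept, slope = fit_line([m for m, _ in pre], [p for _, p in pre])
    gaps = [p - (intercept + slope * m) for m, p in post]
    return sum(gaps) / len(gaps)


# Hypothetical brand: log price drifts up 0.001 per month for 36 pre-period months,
# then sits 0.03 above that trend for 24 post-period months.
pre = [(m, 1.0 + 0.001 * m) for m in range(-36, 0)]
post = [(m, 1.0 + 0.001 * m + 0.03) for m in range(1, 25)]
effect = trend_deviation(pre, post)
print(round(effect, 4))
```

Because the toy series follows the trend exactly before the merger, the recovered effect equals the 0.03 log-point shift that was built in. In real data the estimate also absorbs any post-period shock that a linear trend does not capture, which is what the placebo tests are meant to probe.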

Price effects. The average price effect of consummated CPG mergers is small: across specifications, estimates range from -0.6% to 1.0%, with a baseline mean of 0.3%. However, heterogeneity is substantial. The standard deviation of merger-level price effects is 4.0–7.5 percentage points. In the baseline specification, the first quartile of price effects is -2.1% and the third quartile is 3.7%. Merging and non-merging party price changes are positively correlated (correlation = 0.49), consistent with strategic complementarity. Thirty-six percent of mergers lead both groups to lower prices; 36% lead both groups to raise prices.

Quantity and assortment effects. Total quantities fall on average by 0.4–1.0% across specifications, with 60% of mergers producing quantity reductions. Merging parties exhibit a larger average quantity decline of 6.4%. Mergers also lead to a 2.7% average reduction in the number of stores served by merging parties, a 2.2% reduction in the number of brands sold in a DMA by merging parties, and a 3.2% reduction for non-merging parties. Brands with less than 5% of the merged entity’s sales are 6 percentage points more likely to be dropped post-merger.

Enforcement model. To interpret these outcomes relative to enforcement, the authors develop a model in which the agency receives a noisy signal of a merger’s price effect and challenges the merger if the posterior mean exceeds a threshold that is decreasing in deal size. They estimate the model by maximum likelihood using data on enforcement actions (6 mergers receiving remedies, 4 withdrawn under antitrust pressure) and realized price changes. The estimated sales-weighted average threshold is 4.8–6.3%: agencies act as if they challenge CPG mergers only when they expect a price increase exceeding this level. The posterior standard deviation of the agency’s assessment is 2.5–3.2 pp for aggregate prices and 4.1–4.8 pp for merging-party prices.

Counterfactual stringency. Tightening the threshold from approximately 6.1% to 2.5% would roughly quadruple the challenge probability (from 0.075 to 0.30), reduce aggregate price changes of consummated mergers by approximately 1.4 pp, and lower the share of allowed mergers that are anti-competitive from roughly 50% to 35%. Critically, type I errors (blocking pro-competitive mergers) remain negligible at thresholds down to approximately 3%; even at a 0% threshold, only 10% of blocked mergers would be type I errors. The primary cost of tighter enforcement is a significantly larger agency workload, not an increase in blocked pro-competitive mergers.

Scope conditions. Results pertain specifically to large CPG mergers (deal size ≥ $280 million) sold through US retail outlets, 2006–2017. Findings on structural presumptions show DHHI and merging share have predictive value for price changes, but structural metrics alone explain less than 10% of the variance in price effects (adjusted R-squared never exceeds 10% even with third-order interactions).

Q: What is the average price effect of consummated CPG mergers and how should it be interpreted? A: Across specifications, the average price effect is between -0.6% and 1.0%, with a baseline mean of 0.3%. This small average does not imply that enforcement is strict: Carlton (2009) shows that with perfect foresight, the largest observed price change — not the average — would indicate stringency. Because agencies face uncertainty, the distribution of realized price changes reflects both inframarginal approved mergers and the noise in agency forecasts.

Q: How large is the heterogeneity in merger price effects? A: The standard deviation of merger-level price effects is 4.0–7.5 percentage points across specifications. In the baseline, the first quartile of price effects is -2.1% and the third quartile is 3.7% for all parties combined. Merging parties specifically show a first quartile of -3.2% and third quartile of 3.7%, meaning a full quarter of mergers raise merging-party prices by more than 3.7%.

Q: How do merging and non-merging party prices co-move? A: Price changes for merging and non-merging parties are positively correlated (correlation = 0.49, s.e. = 0.08), consistent with strategic complementarity in pricing. Thirty-six percent of mergers lead both groups to lower prices, 36% lead both to raise prices, 13% cause merging parties to lower while non-merging parties raise, and 15% cause the reverse. The timing evidence shows merging-party prices begin changing upon merger completion, with rivals following suit.

Q: What happens to quantities following mergers? A: Total quantities fall on average between 0.4% and 1.0% across specifications, with 60% of mergers producing quantity reductions. Merging parties bear the bulk of quantity adjustment, with an average quantity decline of 6.4% and a standard deviation and interquartile range both around 30 pp. Non-merging party quantity changes are much less variable. The correlation between merging and non-merging party quantity changes is 0.36 (s.e. 0.08), which is positive — at odds with theoretical predictions from demand systems with the “type aggregation property” (Nocke and Schutz, 2018, 2024), where mergers should produce negatively correlated quantity changes.

Q: What non-price competitive responses do mergers trigger? A: Merging parties reduce the number of stores they serve by 2.7% on average, though in 38% of mergers store networks expand. Both merging and non-merging parties reduce product portfolios: merging parties drop the number of brands in a DMA by 2.2% on average and non-merging parties by 3.2%. Brands most likely to be dropped are those with less than 5% of the merged entity’s sales (6 pp more likely to be dropped), brands in small DMAs, and brands with small DMA shares.

Q: Do the Merger Guidelines’ structural presumptions (HHI, DHHI, merging share) predict price effects? A: DHHI and merging share have statistically significant but quantitatively modest predictive power. A 100-point increase in average DHHI is associated with a 0.2 pp increase in merging-party price changes and 0.3 pp for non-merging parties. Price effects are significantly larger when merging share exceeds 30%. However, structural metrics alone explain very little variance: adjusted R-squared never exceeds 10% even with third-order interactions of HHI, DHHI, merging share, private label share, and market size. Within-merger, DHHI is positively correlated with local price changes, and markets with DHHI above 200 exhibit significantly higher price effects than those below.

Q: How do the authors model antitrust enforcement and identify its stringency? A: The agency observes a noisy signal of a merger’s price effect, forms a posterior distribution combining a normally distributed prior (mean X’beta, standard deviation sigma_p*) with a normally distributed signal error (standard deviation sigma_epsilon), and challenges the merger if the posterior mean exceeds a threshold that is decreasing in deal size. The model is estimated by maximum likelihood: for approved mergers, the realized price change is observed; for withdrawn/remedied mergers, the posterior mean must have exceeded the threshold. Six mergers (from four deals) received remedies for horizontal market power concerns and four mergers (from two deals) were withdrawn under antitrust pressure, forming the challenged set.
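
The agency’s updating step is the standard normal-normal formula, sketched below. The prior mean, standard deviations, signal, and thresholds are hypothetical inputs in percentage points, and the sketch takes the threshold as given rather than modeling its dependence on deal size.

```python
import math


def agency_posterior(prior_mean, prior_sd, signal, signal_sd):
    """Combine a normal prior with a normal signal; return (posterior mean, posterior sd)."""
    weight = prior_sd**2 / (prior_sd**2 + signal_sd**2)  # weight placed on the signal
    post_mean = weight * signal + (1 - weight) * prior_mean
    post_sd = math.sqrt(prior_sd**2 * signal_sd**2 / (prior_sd**2 + signal_sd**2))
    return post_mean, post_sd


def challenges(prior_mean, prior_sd, signal, signal_sd, threshold):
    """The agency challenges when the posterior mean exceeds the threshold."""
    post_mean, _ = agency_posterior(prior_mean, prior_sd, signal, signal_sd)
    return post_mean > threshold


# Hypothetical merger: prior mean 1.0 (sd 4.0), signal 8.0 (sd 3.0).
post_mean, post_sd = agency_posterior(1.0, 4.0, 8.0, 3.0)
challenged_at_current = challenges(1.0, 4.0, 8.0, 3.0, threshold=6.1)
challenged_at_strict = challenges(1.0, 4.0, 8.0, 3.0, threshold=2.5)
print(round(post_mean, 2), round(post_sd, 2), challenged_at_current, challenged_at_strict)
```

In this made-up case the posterior mean is 5.48 with a posterior standard deviation of 2.4, so the merger clears a 6.1% threshold but would be challenged at 2.5%.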

Q: What is the estimated enforcement threshold and how does it vary across mergers? A: The sales-weighted average threshold is 4.8–6.3% using aggregate price changes and 6.6–7.8% using merging-party price changes. The threshold is lower for larger mergers: a 10% increase in merging-party sales is associated with an approximately 0.06 pp decrease in the threshold. The first quartile of thresholds across mergers is 4.5–5.6% and the third quartile is 5.6–6.9%, reflecting that the agencies apply stricter standards to larger deals.

Q: How accurate are the agencies’ forecasts of merger price effects? A: Using only the prior (structural characteristics), the agency’s accuracy in classifying mergers as anti-competitive versus pro-competitive is 56% (s.e. 3 pp). Adding the signal increases accuracy to 83% (s.e. 9 pp). The correlation between the prior mean and the true price change is 0.29 (s.e. 0.08); the correlation between the posterior mean and the true price change is 0.85 (s.e. 0.15). The posterior standard deviation is 2.5–3.2 pp for aggregate price changes and 4.1–4.8 pp for merging-party price changes.

Q: What would happen under stricter antitrust enforcement? A: Tightening the average threshold from 6.1% to 2.5% would raise the challenge probability from approximately 0.075 to 0.30 — roughly quadrupling it — and would reduce aggregate price changes of consummated mergers by approximately 1.4 pp (from roughly 0.2% to -1.2%). Moving to a 0% threshold would result in challenges to 57% of mergers, with 60–70% of consummated mergers then causing price decreases.

Q: How large are type I and type II errors at the current and counterfactual thresholds? A: At the current threshold (~6.1%), approximately 50% of allowed mergers are type II errors (anti-competitive mergers that should have been challenged). Type I errors (pro-competitive mergers wrongly blocked) are negligible at the current threshold and only become non-trivial starting around a 3% threshold. At a 2.5% threshold, the type II error share falls to 35%; at a 0% threshold, to 16%, while type I errors reach 10% of blocked mergers. The primary trade-off of stricter enforcement is therefore a larger agency workload, not an increase in blocking pro-competitive mergers.

Q: What identification strategy is used and how is it validated? A: The strategy is a within-product, within-geography before-after comparison using a brand-specific linear pre-merger trend as the counterfactual. Validation proceeds through three checks: (1) coefficient plots from an extended event study show no significant pre-trends after controlling for the linear trend; (2) a plot of brand trends against estimated price effects shows little explanatory power (a statistically significant negative correlation of small magnitude, which is not consistent with results being driven by trend extrapolation); (3) placebo tests randomizing merger dates within the same markets yield a distribution of estimated effects that is centered at zero and narrower than the true distribution, and the post-period mean squared prediction error is significantly higher under the true merger dates than under the placebo dates, confirming that the linear trend is a better predictor under randomly assigned merger dates than under true dates.

Q: Why do the authors not use alternative control group approaches? A: Non-merging firms in the same market are rejected as controls because they may strategically respond to the merger. Synthetic controls using similar-industry untreated markets are rejected because deals often treat multiple similar markets (ruling out natural donors) and estimates prove sensitive to individual donors. Geographic controls (markets where merging parties have small shares) are rejected because they omit all 39 national mergers, untreated markets are not randomly selected, and regional pricing by non-merging parties could propagate effects into untreated regions, biasing estimates toward zero.

Merger retrospective. In this paper’s usage, an ex-post empirical study of the price, quantity, and assortment effects of a consummated merger, using pre-merger trends as the counterfactual, as opposed to forward-looking merger simulation.

Enforcement stringency. The marginal price increase at which the antitrust agency would expect to challenge a merger. Measured here as the sales-weighted average posterior-mean threshold: the value above which the agency acts as if it would propose a remedy, estimated at 4.8–6.3% for US CPG mergers.

Type I error (antitrust). The mistake of challenging (blocking) a merger that would have reduced prices (a pro-competitive merger). In the model, this occurs when an adverse signal pushes the agency’s posterior mean above the threshold for a merger whose true price effect is negative.

Type II error (antitrust). The mistake of allowing a merger that increases prices (an anti-competitive merger). In the model, this occurs when the agency’s posterior mean falls below the threshold for a merger whose true price effect is positive. Estimated at approximately 50% of allowed mergers at the current enforcement threshold.

Structural presumptions. The HHI-based rules in the 2010 and 2023 Merger Guidelines that create a presumption of competitive harm when DHHI and post-merger HHI exceed specified thresholds (e.g., DHHI > 200 and post-merger HHI > 2,500 for the “red zone” of the 2010 Guidelines). The paper finds DHHI and merging share have statistically significant but low explanatory power (adjusted R-squared below 10%) for actual price changes.
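
For readers less familiar with the metrics, the sketch below computes HHI and DHHI for a hypothetical four-firm market and applies the example thresholds quoted above. The shares and the function names are illustrative assumptions.

```python
def hhi(shares):
    """Herfindahl-Hirschman index from market shares in percentage points."""
    return sum(s * s for s in shares)


def merger_screen(shares, i, j, dhhi_cut=200, hhi_cut=2500):
    """Return (post-merger HHI, DHHI, flagged) when firms i and j merge."""
    merged = [s for k, s in enumerate(shares) if k not in (i, j)]
    merged.append(shares[i] + shares[j])
    post = hhi(merged)
    dhhi = post - hhi(shares)  # algebraically equal to 2 * shares[i] * shares[j]
    return post, dhhi, dhhi > dhhi_cut and post > hhi_cut


# Hypothetical market with shares of 30, 20, 25, and 25 percent; firms 0 and 1 merge.
post_hhi, dhhi, flagged = merger_screen([30, 20, 25, 25], 0, 1)
print(post_hhi, dhhi, flagged)
```

Here the post-merger HHI is 3,750 and DHHI is 1,200, so the hypothetical deal is flagged. The paper’s point is that such screens, while informative, explain only a small share of the variance in realized price effects.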

Prior and signal (in the enforcement model). The agency’s prior is a normal distribution over the merger’s true price effect, parameterized by structural characteristics (HHI, DHHI). The signal is a noisy draw centered on the true price effect, capturing information gathered through due diligence (e.g., evidence of efficiencies). The posterior mean — combining prior and signal — determines whether the agency challenges the merger.

Product market-deal pair (merger). The unit of observation in the empirical analysis: a specific NielsenIQ product module (e.g., soluble coffee) within a specific acquisition transaction (e.g., a food conglomerate merger). The sample contains 129 such pairs across 47 deals.

Optimal Public Transportation Networks: Evidence from the World's Largest Bus Rapid Transit System in Jakarta

This paper studies how commuter preferences over wait times, travel times, and transfers should shape the design of urban bus networks, using the world’s largest Bus Rapid Transit (BRT) system — TransJakarta in Jakarta, Indonesia — as the empirical laboratory. The setting provides unusually rich identification: between January 2016 and February 2020, TransJakarta launched 93 new BRT and non-BRT feeder routes in a staggered, city-wide expansion, during which the operating bus fleet more than doubled from roughly 700 to over 1,600 vehicles. The authors combine over 500 million smart-card tap records, GPS tracking of every bus at 5–10 second intervals, and anonymized smartphone location data covering 35 million weekday trips from 2.3 million devices.

The paper proceeds in three steps. First, the authors classify new route launches into three event types and estimate their causal impact on ridership via difference-in-differences. Event 1, a new direct connection between an origin-destination pair already served by transfer only, with no travel-time improvement, raises BRT ridership by 0.16 log points. Event 2, a new direct connection that also reduces travel time (by 0.29 log points on average), raises ridership by 0.27 log points. Event 3, additional buses on an already-directly-connected pair, increases the bus arrival rate by 0.32 log points, reduces wait times, and raises ridership by 0.09 log points, implying a ridership elasticity with respect to wait times of approximately −0.29 for BRT. For non-BRT routes the implied wait-time elasticity is −1.05, raising the possibility of multiple equilibria in service levels. Crucially, none of the three event types produces a detectable increase in aggregate trip volumes measured by smartphone data, implying that the ridership gains reflect modal substitution toward the bus rather than trip generation.

Second, the authors estimate a structural demand model. At its core is a route-choice model in which bus arrivals follow independent Poisson processes, so wait times are exponentially distributed and idiosyncratic. This formulation avoids the red-bus/blue-bus aggregation problem endemic to logit models. Commuters are also allowed to be partially inattentive to routes whose travel time exceeds the fastest available option by more than an estimated threshold. Structural parameters are recovered by classical minimum distance, matching seven reduced-form moments. Key findings: wait time is valued 2.4 times more than time on the bus for BRT routes, and 4.2 times more for non-BRT routes. There is no additional transfer penalty beyond the wait time and travel time costs of the second leg. Commuters pay significantly less attention to options with travel time more than roughly 34–44 percent above the fastest option in their choice set.

Third, the authors use the estimated preference parameters to characterize optimal bus networks. Because the optimization problem is high-dimensional (418 grid cells, 1,536 possible edges, yielding on the order of 10^500 configurations) and exhibits neither global convexity nor simple complementarity, they reformulate the social planner’s problem as a discrete choice over networks with additive logit shocks — effectively sampling from a multinomial logit distribution via simulated annealing. The result: optimal networks cover approximately 66 percent of grid cells versus 42 percent under the actual TransJakarta network, and would give 91 percent of Jakarta residents bus access versus 73 percent currently. Bus frequency in the city center is somewhat lower in the optimal network. Despite commuters’ high sensitivity to wait times, the current network concentrates too many buses in the city center where wait times are already short, rather than extending reach to underserved areas. Comparative statics show that doubling the wait-time cost parameter produces much more concentrated optimal networks (23 percent of origin-destination pairs connected, 41 percent fewer than baseline), while increasing the transfer penalty by the equivalent of 15 minutes of wait time raises the direct-connection share of served pairs from 12 to 16 percent.

Q: What are the three event types and why are they analytically distinct?

A: Event 1 is the launch of the first direct route between an origin-destination pair already connected by transfer, where the direct route is not faster than the existing transfer option; it isolates the effect of directness absent a travel-time change. Event 2 is the same but with a faster direct route (average reduction of 0.29 log points in travel time), combining directness and speed improvements. Event 3 is the launch of a new route that overlaps an existing direct route, increasing bus frequency and cutting wait times (arrival rate up 0.32 log points) without substantially changing travel time or directness. The three events together provide variation across the key dimensions — directness, speed, and frequency — needed to separately identify commuter preference parameters.

Q: What are the main ridership effects and how large are they in levels?

A: For BRT routes, Event 1 raises ridership by 0.16 log points (approximately 19 additional riders per week for a treated origin-destination pair with a baseline of 111 weekly riders), Event 2 by 0.27 log points (approximately 24 additional riders per week), and Event 3 by 0.09 log points (approximately 20 additional riders per week). For non-BRT routes, proportional effects are larger but level effects are similar: Event 1 yields roughly 34 additional weekly riders, Event 2 roughly 21, and Event 3 roughly 15. Event-study graphs show clear, discrete jumps in ridership at route launch with no pre-trends, and some gradual adjustment in the months following.
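
The conversion from log points to riders can be checked with a one-line calculation. The sketch below uses only the Event 1 figures quoted above (baseline of 111 weekly riders, 0.16 log points); the baselines for the other events are not given in this summary, so they are not reproduced. The function name is illustrative.

```python
import math


def extra_riders(baseline_riders, log_point_effect):
    """Level change implied by a log-point treatment effect at a given baseline."""
    return baseline_riders * (math.exp(log_point_effect) - 1)


# BRT Event 1: baseline of 111 weekly riders and a 0.16 log-point effect.
event1_extra = extra_riders(111, 0.16)
print(round(event1_extra, 1))
```

This gives roughly 19.3 additional weekly riders, matching the reported figure of approximately 19.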

Q: What does the paper find about aggregate trip generation versus modal substitution?

A: Using smartphone location data to measure all trips regardless of mode, the authors find no statistically significant increase in aggregate trip volumes for any of the three event types. For BRT Event 1, the estimated aggregate-trip coefficient is −0.008 with a standard error of 0.051, allowing rejection at the 95 percent level of any positive impact above roughly 0.091 log points, which is below the precisely estimated 0.11 log-point bus ridership effect in the same sample. The authors interpret this as evidence that the ridership gains over the 10-month post-event window reflect substitution from private modes (motorcycles, cars, taxis) toward TransJakarta rather than trip generation, and they use this null result to justify holding destination choices fixed in the structural model.

Q: How does the model avoid the red-bus/blue-bus aggregation problem?

A: The paper’s route-choice model assumes bus arrivals follow independent Poisson processes, so wait times are exponentially distributed. A key proposition (Proposition 1) proves that splitting one route into two identical routes with half the buses each produces exactly the same choice probabilities and expected utility as the original single route — because the sum of two independent Poisson processes is itself Poisson with the summed rate. Standard logit models fail this invariance because splitting a route creates two options with independent error draws, artificially inflating expected utility. The invariance property is essential for the optimal network design exercise, where the planner freely reallocates buses across routes.
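
The invariance can be illustrated in the simplest case, where the routes are identical and the commuter boards whichever bus arrives first. The sketch below is a simplification under those assumptions, not the paper’s full model with heterogeneous travel times and inattention. It contrasts the Poisson result with a logit inclusive value that holds systematic utility fixed. Rates and utilities are made-up numbers.

```python
import math


def expected_wait(arrival_rates):
    """Expected wait for the first bus when routes arrive as independent Poisson processes.

    The minimum of independent exponential waits is exponential with the summed rate.
    """
    return 1.0 / sum(arrival_rates)


def logit_inclusive_value(utilities):
    """Expected maximum utility (up to an additive constant) under standard logit shocks."""
    return math.log(sum(math.exp(u) for u in utilities))


# Hypothetical route with 6 buses per hour, then split into two identical routes of 3 each.
wait_single = expected_wait([6.0])
wait_split = expected_wait([3.0, 3.0])

# The same relabeling in a logit model: two copies of an option with utility -1.0.
logit_gain = logit_inclusive_value([-1.0, -1.0]) - logit_inclusive_value([-1.0])
print(wait_single, wait_split, round(logit_gain, 3))
```

The expected wait is one sixth of an hour either way, while the logit inclusive value rises by log 2 (about 0.693) purely because the option was duplicated.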

Q: What are the estimated preference parameters and what do they imply about commuter behavior?

A: The paper estimates that wait time is valued 2.4 times more than time on the bus for BRT routes and 4.2 times more for non-BRT routes. There is no additional transfer disutility beyond the wait time and travel time costs implied by the extra leg. Commuters become substantially inattentive to routes with travel time more than approximately 34 percent above the fastest available option (BRT threshold) or 44 percent (non-BRT). The high relative cost of waiting versus riding reflects both the discomfort of waiting at exposed non-BRT stops and the fact that TransJakarta runs without a published schedule, so commuters cannot minimize wait time by timing arrivals.

Q: What explains the non-BRT wait-time elasticity exceeding 1 in absolute value?

A: For non-BRT routes, Event 3 raises ridership by 0.450 log points while raising the bus arrival rate by 0.425 log points, yielding an implied elasticity of ridership with respect to wait times of −1.05. Because the baseline arrival rate for non-BRT treated pairs is 2–4 times lower than for BRT pairs, the absolute reduction in wait time per additional bus is much larger. An elasticity exceeding 1 in absolute value implies that adding buses on some non-BRT routes could increase ridership enough to maintain or even raise average ridership per bus (the extreme form of the Mohring effect), suggesting the possibility of a high-ridership/low-wait-time equilibrium distinct from the current low-ridership/high-wait-time one.
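
The arithmetic behind the elasticities can be sketched as follows, assuming expected wait time is inversely proportional to the arrival rate (as it is under Poisson arrivals) and that the number of buses scales with the arrival rate. The inputs are the rounded log-point figures quoted in this summary; the function names are illustrative.

```python
def wait_time_elasticity(dlog_ridership, dlog_arrival_rate):
    """Ridership elasticity with respect to wait time.

    Assumes dlog(wait) = -dlog(arrival rate), i.e. expected wait is 1 / arrival rate.
    """
    return dlog_ridership / (-dlog_arrival_rate)


def dlog_riders_per_bus(dlog_ridership, dlog_arrival_rate):
    """Change in log ridership per bus, treating buses as proportional to the arrival rate."""
    return dlog_ridership - dlog_arrival_rate


non_brt = wait_time_elasticity(0.450, 0.425)
brt = wait_time_elasticity(0.09, 0.32)
non_brt_per_bus = dlog_riders_per_bus(0.450, 0.425)
brt_per_bus = dlog_riders_per_bus(0.09, 0.32)
print(round(non_brt, 2), round(brt, 2), round(non_brt_per_bus, 3), round(brt_per_bus, 3))
```

With rounded inputs the ratios come out near −1.06 and −0.28, within rounding of the reported −1.05 and −0.29. Ridership per bus rises slightly for non-BRT routes and falls for BRT routes, which is the sense in which only the non-BRT case reaches the extreme form of the Mohring effect.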

Q: How is the optimal network characterized and what algorithm is used?

A: The social planner chooses a network to maximize utilitarian welfare (average expected utility across all commuters) from the estimated demand model, plus a network-level logit shock capturing cost and other factors outside the model. This transforms the combinatorially explosive optimization into sampling from a multinomial logit distribution over networks, which the authors approximate using simulated annealing. They run the algorithm multiple times to obtain a sample of networks drawn asymptotically from the planner’s distribution, then estimate optimal network characteristics and comparative statics from sample analogs. The theoretical framework is general and, the authors note, applicable to other high-dimensional spatial planning problems where welfare differences can be computed for pairs of counterfactuals.
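
A toy version of the sampling idea is sketched below. It runs a Metropolis chain with a cooling schedule over binary edge choices; at any fixed temperature such a chain targets a distribution proportional to exp(welfare / temperature), which is a logit over networks. The six-edge problem, the welfare function, and every parameter value are hypothetical stand-ins, not the paper’s demand model or algorithm settings.

```python
import math
import random
from itertools import product

# Hypothetical toy problem: 6 candidate edges with made-up benefits.
EDGE_BENEFIT = [3.0, 2.5, 2.0, 1.0, 0.5, 0.2]
SPREAD_COST = 0.4  # made-up penalty for spreading a fixed fleet over more edges


def welfare(network):
    """Toy welfare: summed edge benefits minus a convex cost in the number of edges."""
    k = sum(network)
    return sum(v for v, x in zip(EDGE_BENEFIT, network) if x) - SPREAD_COST * k * k


def anneal(steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis sampling with geometric cooling; returns (final state, best state seen)."""
    rng = random.Random(seed)
    current = tuple(0 for _ in EDGE_BENEFIT)
    best = current
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(len(current))  # propose flipping one edge on or off
        proposal = current[:i] + (1 - current[i],) + current[i + 1:]
        delta = welfare(proposal) - welfare(current)
        # Accept improvements always, and deteriorations with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
            if welfare(current) > welfare(best):
                best = current
    return current, best


final_network, best_network = anneal()
# Brute force is feasible only because the toy has 2**6 = 64 networks.
brute_force_best = max(product((0, 1), repeat=len(EDGE_BENEFIT)), key=welfare)
print(best_network, welfare(best_network), welfare(brute_force_best))
```

In the toy problem the result can be compared with brute-force enumeration. In the paper’s setting that comparison is impossible, which is why the authors work with a sample of networks from repeated runs rather than a single claimed optimum.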

Q: How does the optimal network differ from the current TransJakarta network?

A: The typical optimal network covers approximately 66 percent of 2 km grid cells versus 42 percent for the actual network, and 91 percent of Jakarta residents would have bus access versus 73 percent currently. The optimal network reduces bus frequency in the city center relative to the current network, accepting longer wait times there in order to extend reach to peripheral areas. The paper finds no tension between distributional and efficiency concerns in this setting: expanding coverage improves both aggregate welfare and access for underserved areas.

Q: What do the comparative statics reveal about the sensitivity of optimal network design to preference parameters?

A: Doubling the wait-time cost parameter leads to substantially more concentrated optimal networks: only 23 percent of origin-destination pairs are connected, 41 percent fewer than in the baseline optimal network. This is because higher wait-time costs make it more valuable to concentrate buses on fewer routes to achieve short headways. Increasing the transfer penalty by the equivalent of 15 minutes of wait time raises the share of connected location pairs with a direct (non-transfer) connection from 12 to 16 percent. These comparative statics link micro-level preference parameters to macro-level network topology, clarifying which parameters most influence design choices.

Q: How does the paper validate the destination imputation from tap-in-only smart card data?

A: For the subset of BRT stations where tap-out is enforced (36 percent of stations), the authors estimate bivariate regressions of imputed daily ridership shares against actual observed ridership shares, obtaining an R-squared of 0.85. They also show robustness by varying the grid cell size from 500 meters to 2 kilometers, finding no systematic decline in treatment effect magnitudes, which rules out large displacement effects within the network as an explanation for the results.

Q: Does the response to network improvements vary by local poverty rates?

A: The authors interact all six event types (the three events, estimated separately for BRT and non-BRT routes) with an indicator for above-median poverty rate at the origin grid cell (from SMERU 2014 data), controlling for population. They find no clear pattern of heterogeneity by income level: richer and poorer areas respond similarly to service improvements. The paper notes this absence of heterogeneity as relevant context for interpreting optimal network design: the case for extending reach is not offset by a differential preference for frequency among poorer commuters.

Mohring Effect: The externality arising from ridership responsiveness to wait times — more riders justify more buses, which reduce wait times for all riders, further increasing ridership. The paper estimates a BRT wait-time elasticity of −0.29, confirming the effect operates in Jakarta; for non-BRT the elasticity of −1.05 suggests the possibility of multiple equilibria in service levels.

Negative Exponential Distribution Model (Daganzo 1979): The route-choice model used in the paper, in which bus arrivals on each route follow independent Poisson processes and wait times are exponentially distributed. The model is invariant to aggregation of identical routes (avoids the red-bus/blue-bus problem) and yields tractable closed-form expressions for choice probabilities and expected utility.

Partial Inattention: The model feature whereby commuters assign near-zero effective arrival rates to bus options whose travel time exceeds the fastest available option by more than an estimated threshold (34–44 percent depending on route type). Captures the empirical finding that commuters in a large, complex network do not appear to consider all available options.

Event Types (1, 2, 3): The paper’s taxonomy of service improvements induced by new route launches. Event 1 isolates the value of directness (new direct route, no speed gain). Event 2 combines directness and speed (new direct route that is also faster). Event 3 isolates the value of frequency (additional buses on an already-direct route, reducing wait time without changing travel time).

Optimal Network Characterization via Social Planner’s Logit: The paper’s approach to the combinatorially intractable network optimization problem. The planner is modeled as making a logit discrete choice over all possible networks, with welfare from the demand model plus a network-level idiosyncratic shock. Sampling via simulated annealing yields estimates of optimal network characteristics and comparative statics without requiring identification of a single globally optimal network.

Network Concentration vs. Extensiveness Tradeoff: The core design tension the paper formalizes — for a fixed bus fleet, concentrating buses on fewer routes reduces wait times on served routes but leaves more areas without coverage, while spreading buses across more routes extends reach at the cost of longer headways. The estimated preference parameters (high wait-time sensitivity) make this tradeoff non-trivial; nonetheless, the paper finds the current network is too concentrated relative to the optimum.

Policy Biases in a Model with Labor‐Market Frictions

Layer 1 — Overview

Research Question

Dennis and Kirsanova ask whether shocks to labor-market matching efficiency and worker bargaining power pose a significant problem for monetary policy, and whether the inability to commit (discretion versus commitment) generates important stabilization bias in a model with labor-market matching frictions. They also examine how several popular simple monetary policy rules perform in response to these and other shocks.

Model and Methodology

The paper develops a fully nonlinear DSGE model featuring: (1) a goods market characterized by monopolistic competition and Rotemberg-style quadratic price-adjustment costs; and (2) a labor market characterized by a constant-returns-to-scale matching function (Mortensen-Pissarides) and Nash bargaining over wages and hours worked. Because the flex-price equilibrium is inefficient — owing to both monopolistic competition and the matching friction — a linear-quadratic approximation is not valid for the discretionary policy problem, and the authors solve the model using Smolyak sparse-grid methods with Chebyshev polynomial basis functions.

The model is calibrated to quarterly U.S. data. Key parameter values include: discount factor β = 0.99 (annualized real interest rate ≈ 4 percent), elasticity of substitution across goods ε = 11 (steady-state markup of 10 percent), price-adjustment cost φ = 80, quarterly separation rate δ = 0.12, job-finding rate f = 0.65 (delivering an employment rate close to 0.94 and an unemployment rate near 5.95 percent in steady state), elasticity of matching function with respect to unemployment ξ = 0.72, and workers’ mean bargaining power equal to ξ = 0.72 (satisfying the Hosios condition at steady state). Five AR(1) shocks are included: aggregate technology (persistence 0.95, standard deviation 0.008), matching efficiency (persistence 0.80, standard deviation 0.032), bargaining power (persistence 0.80, standard deviation 0.028), consumption preference (persistence 0.70, standard deviation 0.006), and elasticity of substitution (persistence 0.85, standard deviation 0.12).
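
Two of the parenthetical values above follow from simple arithmetic, sketched here. The formulas are the standard textbook ones (quarterly compounding and the monopolistic-competition markup ε/(ε − 1)), which is an assumption about how the summary's figures were derived; the variable names are illustrative.

```python
beta = 0.99      # quarterly discount factor
epsilon = 11.0   # elasticity of substitution across goods

# Steady-state real rate: 1/beta per quarter, compounded over four quarters.
annual_real_rate = (1.0 / beta) ** 4 - 1.0

# Gross markup under monopolistic competition, and its net counterpart.
gross_markup = epsilon / (epsilon - 1.0)
net_markup = gross_markup - 1.0
```

Under these assumptions the annual real rate comes out at roughly 4.1 percent and the net markup at 10 percent, in line with the values quoted.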

Main Findings

The central finding is that optimal monetary policy — whether conducted under commitment (Ramsey) or discretion — is highly efficient at responding to labor-market shocks, producing impulse responses that closely replicate the flex-price equilibrium for real variables. Specifically, in response to matching efficiency shocks and bargaining power shocks, the commitment and discretionary equilibria both track the flex-price equilibrium closely for output, consumption, employment, tightness, and the real wage.

Discretion generates a pronounced inflation bias of approximately 1.82 percent per annum — large but not implausible — but does not generate a meaningful stabilization bias for the class of shocks studied (technology, matching efficiency, bargaining power, and consumption preference). The one exception is the elasticity of substitution shock (analogous to a markup shock in linearized models): for this shock, the impulse responses under discretion diverge noticeably from those under commitment, revealing a discretionary stabilization bias — consistent with conventional New Keynesian results.

Regarding simple rules, strict inflation targeting (SIT) performs closely in line with commitment and discretion for all shocks. The two Taylor-type rules — one responding to inflation and output growth, the other to inflation and the unemployment rate — generate substantially greater volatility in inflation and the nominal interest rate relative to optimal policy. The unemployment-gap Taylor rule is the worst performer among the three simple rules; nevertheless, all three simple rules produce household welfare outcomes close to those under optimal monetary policy. The suboptimality of the simple rules is most evident in nominal variables, particularly inflation and the nominal interest rate, and less evident in real variables — though labor-market inefficiencies under the Taylor-type rules do emerge in response to matching efficiency and bargaining power shocks, with hours worked and the real wage deviating noticeably from flex-price outcomes.

The probability of encountering the zero lower bound is, for all policies considered, considerably less than 0.5 percent across one million simulated observations, suggesting that ZLB concerns are not material for the shocks under study.

Scope Conditions

These results hold within the context of a model with a fixed labor force (no participation margin), balanced-budget fiscal authority, no capital accumulation, and Nash bargaining over both wages and hours. The Hosios condition is satisfied at steady state (though the authors report that relaxing it has little effect on results). The analysis abstracts from the zero lower bound constraint when solving the model.

In depth

Q1. What is the Hosios condition and what role does it play in this model?

The Hosios condition requires that workers’ bargaining power equal the elasticity of matches with respect to unemployment in the matching function (ξ = 0.72). When the condition holds, bargaining is efficient in the sense that the decentralized search equilibrium replicates the social planner’s allocation. The authors impose it at steady state (mean bargaining power equal to ξ = 0.72) so that the flex-price equilibrium is distorted only by monopolistic competition, not by inefficient search. The authors state they also analyzed versions where the Hosios condition does not hold and found it had little effect on results.

Q2. How are matching efficiency shocks transmitted through the economy, and how does optimal policy respond?

An improvement in matching efficiency raises the rate at which vacancies are filled and the unemployed find jobs, increasing employment from existing vacancy and unemployment levels. Employment rises, unemployment falls, labor market tightness increases, and the real wage rises. Firms substitute toward more workers (extensive margin) and away from hours-per-worker (intensive margin), so hours worked per employee decline even as aggregate hours rise. Both commitment and discretion track the flex-price equilibrium closely for all these real variables. Some difference is visible in inflation: under discretion the real wage rises by more than under commitment, pushing real marginal costs and inflation higher in the short run.

Q3. How does a bargaining power shock affect the economy under optimal monetary policy?

An increase in worker bargaining power shifts the match surplus toward workers, raising real wages and hours worked per employee. Firms, receiving a smaller surplus share, post fewer vacancies and hire fewer workers, leading to a decline in employment, a fall in labor market tightness, and a rise in unemployment. The employment decline is large enough to lower household income, goods production, and aggregate consumption. Under both commitment and discretion, the real economy tracks the flex-price equilibrium closely. Notable differences between commitment and discretion appear in inflation: under discretion, the inflation response on impact is larger and more persistent than under commitment, and monetary policy tightens more aggressively (higher nominal rate) under discretion.

Q4. What is the key difference between the commitment and discretionary equilibria, and why is stabilization bias mostly absent?

Commitment (Ramsey) policy differs from discretionary policy primarily in the level of inflation, not in the dynamics of the real economy. Discretion generates an inflation bias of approximately 1.82 percent per annum. However, the impulse responses for real variables (output, consumption, employment, tightness, real wage) under commitment and discretion are very similar to each other and to the flex-price equilibrium for four of the five shocks. This indicates that forward guidance — which commitment provides and discretion does not — is not an important factor in this model’s response to these shocks. The intuition is that the economy’s fluctuations in response to matching efficiency and bargaining power shocks are largely efficient, so the central bank needs only to avoid creating additional distortions, which both commitment and discretion achieve.

Q5. What distinguishes the elasticity of substitution shock from the other shocks in terms of policy performance?

The elasticity of substitution shock behaves similarly to a markup shock in linearized models: an increase in substitutability reduces firms’ monopolistic power, lowers the price markup, raises output and consumption, increases hours worked, posted vacancies, employment, and the real wage. For this shock, the impulse responses under discretion diverge noticeably from those under commitment — the decline in inflation is larger and more persistent under discretion than under commitment, and the nominal interest rate response differs in sign across policies. This is the only shock in the model for which a meaningful discretionary stabilization bias is evident, consistent with conventional wisdom from linearized New Keynesian models that markup shocks generate stabilization bias.

Q6. How do the three simple rules compare with optimal policy for labor-market shocks?

Strict inflation targeting (SIT) behaves similarly to commitment and discretion and hence closely replicates the flex-price equilibrium for all five shocks. The two Taylor-type rules — one responding to inflation and output growth (parameterized with φ_π = 2.5, φ_y = 0.5/4) and one responding to inflation and the unemployment rate (φ_π = 2.5, φ_u = 1.5/4) — both generate substantially more volatility in inflation and the nominal interest rate relative to optimal policy. The unemployment-gap Taylor rule generally results in inflation moving more in response to shocks and in the economy returning more slowly to baseline, making it the worst-performing simple rule. However, all three simple rules produce welfare outcomes close to those under optimal policy; the suboptimality of the Taylor-type rules is most evident in nominal rather than real variables.

Q7. Does the zero lower bound (ZLB) pose a concern under any of the policies studied?

Based on simulating one million observations from each model, the unconditional probability of encountering the ZLB is very small — well below 0.5 percent — for all policies considered. The commitment policy has a ZLB probability of approximately 0.077 percent, reflecting its near-zero average inflation. Discretion’s positive inflation bias of 1.82 percent reduces the ZLB probability to approximately 0.001 percent. The Taylor-type rules — especially the unemployment-gap rule (ZLB probability approximately 0.296 percent) — have higher probabilities than discretion, though these remain very small. These results suggest that for the shocks analyzed, violations of the ZLB are extremely unlikely.

Q8. What are the steady-state and stochastic simulation mean outcomes, and how do they compare across regimes?

The deterministic steady-state unemployment rate is approximately 5.95 percent, rising slightly to a mean of 6.04 percent in the stochastic flex-price economy. The stochastic means for output, consumption, employment, and the real wage are all slightly below their deterministic steady states across all regimes, because in the absence of capital households respond to increased volatility by substituting away from labor toward leisure (precautionary leisure) rather than precautionary saving. Mean outcomes for real variables under discretion (e.g., output mean ≈ 0.3730, unemployment mean ≈ 6.025 percent) and commitment (output mean ≈ 0.3729, unemployment mean ≈ 6.028 percent) are very similar to each other and to the flex-price means (output mean ≈ 0.3728, unemployment mean ≈ 6.038 percent). The key difference is in inflation: commitment delivers near-zero mean inflation (≈ 0.00043 percent annually) while discretion delivers ≈ 1.82 percent annually.

Q9. Why is a nonlinear solution method used, and what does this allow the paper to capture that log-linearized approaches cannot?

The nonlinear solution is required because the flex-price equilibrium is not efficient (monopolistic competition and the matching friction both create distortions), so the discretionary policy problem cannot be formulated as a linear-quadratic problem. The nonlinear approach allows the paper to analyze both level biases (the steady-state inflation bias) and stabilization biases (the dynamic response to shocks) in a unified framework — something that log-linearization around the efficient steady state would preclude. Related papers by Furlanetto and Groshenny (2016) and Zhang (2017) focus on log-linearized models and the natural rate of unemployment; this paper focuses instead on optimal policy and policy biases.

Q10. What role does the consumption preference shock play, and how does it differ from the other shocks?

The consumption preference shock is the only shock in the model that acts somewhat like a demand shock. A one standard deviation increase raises the utility obtained from consumption, leading households to increase consumption and hours worked (at a slightly lower real wage), which induces firms to post more vacancies and raise employment. Most of the labor market response comes through higher hours rather than higher employment. Both commitment and discretionary policy cope well with this shock — the real economy closely tracks the flex-price equilibrium — because the shock has relatively little impact on inflation (inflation declines slightly due to lower real marginal costs from the lower real wage). The nominal interest rate rises because the increase in the real interest rate (driven by households’ desire to borrow) more than offsets the decline in inflation.

Key Concepts

Matching efficiency shock: A stochastic shock to the parameter m_t in the constant-returns-to-scale matching function M_t = m_t · u_t^ξ · v_t^(1−ξ), which governs the overall rate at which unemployed workers and posted vacancies are matched. A decline in m_t reduces the number of matches formed at any given levels of unemployment and vacancies, raising unemployment and reducing employment. The paper treats this as an empirically relevant shock motivated by evidence of a sustained decline in aggregate matching efficiency during the Great Recession.

Discretionary inflation bias: The tendency for a central bank conducting policy without the ability to commit to produce systematically higher inflation than would occur under a commitment (Ramsey) regime. In this model, discretion generates an annualized inflation rate of approximately 1.82 percent, while commitment produces near-zero average inflation. This reflects the time-inconsistency problem (Kydland and Prescott, 1977; Barro and Gordon, 1983) arising from the interaction of monopolistic competition and price stickiness.

Stabilization bias: A distortion that arises under discretionary policy, in which the central bank’s inability to commit leads it to respond to shocks in a manner that departs from optimal commitment responses, producing suboptimal dynamics for real variables in addition to the inflation bias. In this paper, stabilization bias is found to be largely absent for matching efficiency, bargaining power, technology, and consumption preference shocks, but is present for the elasticity of substitution shock.

Hosios condition: The condition, derived in Hosios (1990), that efficient decentralized search-and-matching equilibrium requires workers’ bargaining power to equal the elasticity of matches with respect to the unemployment rate (ξ). In the paper’s calibration, mean bargaining power is set equal to ξ = 0.72. When the condition holds, the flex-price equilibrium replicates the social planner’s allocation in the labor market; deviations cause either excessive or insufficient vacancy posting.

Labor market tightness (θ): Defined as the ratio of vacancies to unemployed searchers, θ_t = v_t/u_t. When tightness is high, the labor market is tight and firms have difficulty filling vacancies (low job-filling rate q(θ)) while workers find jobs easily (high job-finding rate f(θ)). Tightness is the key state variable linking vacancy posting decisions by firms to employment dynamics and wage bargaining outcomes.
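
The link between tightness and the two rates can be sketched directly from the Cobb-Douglas matching function given under "Matching efficiency shock". The function names are illustrative, and the numerical inputs in the usage lines are hypothetical rather than the paper's steady-state values.

```python
XI = 0.72  # elasticity of matches with respect to unemployment (calibration)


def matches(m, u, v, xi=XI):
    """Matching function M = m * u**xi * v**(1 - xi)."""
    return m * u ** xi * v ** (1.0 - xi)


def job_finding_rate(m, theta, xi=XI):
    """f(theta) = M / u = m * theta**(1 - xi), increasing in tightness."""
    return m * theta ** (1.0 - xi)


def job_filling_rate(m, theta, xi=XI):
    """q(theta) = M / v = m * theta**(-xi), decreasing in tightness."""
    return m * theta ** (-xi)


# Hypothetical inputs for illustration only.
m, u, v = 0.6, 0.06, 0.03
theta = v / u
f_rate = job_finding_rate(m, theta)
q_rate = job_filling_rate(m, theta)
```

Because the function has constant returns to scale, both rates depend on u and v only through their ratio θ, and a fall in m lowers both rates at any given θ.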

Bargaining power shock: A stochastic shock to the worker’s share of the Nash bargaining surplus, which follows an AR(1) process. The Hosios condition holds at steady state but is violated when the shock is realized. A positive shock shifts surplus from firms to workers, raising real wages, depressing vacancy posting, and reducing employment, while a negative shock has the reverse effect.

Rotemberg price-adjustment cost: A quadratic cost φ/2 * (π_t)^2 * y_t paid by firms when they change prices, creating price stickiness through a smooth convex cost rather than the random, staggered price resetting of Calvo pricing. This creates a role for monetary policy and generates a nonlinear Phillips curve. The coefficient φ is set to 80, based on the estimate in Ireland (2001).

Flex-price equilibrium: The benchmark equilibrium in which prices are fully flexible and bargaining is efficient (Hosios condition satisfied exactly). In this equilibrium there is no role for monetary policy over the price-adjustment margin, and the economy responds to shocks in a manner that is efficient conditional on the remaining frictions (monopolistic competition and the matching friction). The paper uses deviations of commitment and discretionary outcomes from this benchmark to measure the efficiency of optimal monetary policy.

Professional Motivations in the Public Sector: Evidence from Police Officers

This paper studies how public sector workers balance professional motivations against private economic concerns, using arrest decisions by Dallas Police Department (DPD) officers as the empirical laboratory. The central institutional feature exploited is that arrests made near the end of an officer’s shift typically require the officer to stay and work overtime, generating private costs that must be weighed against the professional benefits of making an arrest (e.g., crime reduction or duty fulfillment). The paper further leverages variation from DPD’s “secondary employment” program: approximately 30% of officers held a registered second job at some point during 2019–2021, and on days when a second job is scheduled after the police shift, the opportunity cost of late-shift policing is higher.

The data cover all DPD arrests from January 2015 to December 2021, linked to officer shift assignments, charge types, prosecutorial outcomes (whether the Dallas County Attorney chose to prosecute), and second-job schedules. The sample excludes traffic violations and arrests without shift information. The authors observe wide variation in prosecution rates by charge type: drug and gang offenses exceed 70%, property and violent crimes run 30–50%, and minor charges fall below 20%.

Four main findings emerge. First, arrest rates fall sharply in the last 30–40 minutes of a shift, with the decline most pronounced for drug and gang charges (approximately 50% drop in arrest rate) and smallest for violent charges, consistent with officers having more discretion over the former. Second, arrests that do occur late in the shift are of higher quality: conditional on being made, they are approximately 1.5–2.5 percentage points more likely to result in prosecution than arrests made earlier, with the quality premium larger in more discretionary charge categories (drugs/gang > property > violent). Third, on days when an officer has a second job scheduled, arrest rates are lower by roughly 5–10% relative to baseline across the full shift, with effects concentrated in the second half; and the conditional probability of prosecution on those days is 1–2 percentage points higher than on non-second-job days. The second-job effect appears even earlier in the shift than the overtime effect alone, consistent with the second job magnifying the opportunity cost mechanism.

Fourth, the authors estimate a dynamic structural model of the arrest decision. At each moment of the shift the officer chooses whether to arrest, trading off a professional benefit b_p against a private cost c(t, secondjob) that rises when overtime begins and rises further on second-job days. Structural estimates indicate the overtime cost is large enough to reduce the expected professional value of an arrest in the final 30 minutes of the shift by roughly 20–30%. The additional second-job cost reduces expected professional value by a further 10–20%. Counterfactual simulation implies that eliminating the overtime cost would increase overall arrests by approximately 5–8%, a magnitude the authors describe as economically significant. Welfare analysis shows that the desirability of high overtime costs depends on whether citizens weight quantity of arrests or quality: under quality-weighted preferences the current overtime-cost regime may be socially optimal because officers self-select toward arrests they perceive as likely to result in prosecution; under quantity preferences, reducing overtime costs would increase police activity.

The identification strategy relies on within-officer variation in second-job scheduling, absorbing officer fixed effects (and officer-by-month fixed effects in robustness checks) and time fixed effects. The key identifying assumption is that second-job days are not systematically assigned to low-crime or low-patrol days. Supporting evidence includes balance tests showing second-job status is uncorrelated with local crime call patterns conditional on fixed effects, and the observation that officers who take second jobs do not exhibit a systematically different enforcement style (measured by arrest patterns across the shift) relative to officers who do not.

Scope conditions: results are from a single medium-sized urban police department (approximately 3,000 officers) in Dallas, Texas, a city described as diverse by race, income, and political affiliation. The department is 29% Black, 43% Hispanic, 27% White, and 15% female. Generalizability to other jurisdictions or institutional structures is not established by this study.

Q: What is the main research question? A: The paper asks how public sector workers balance professional motivations (e.g., crime reduction, duty fulfillment) against private economic concerns (e.g., overtime costs, opportunity costs from second jobs). It uses police arrest decisions as the empirical setting because the shift-end timing of arrests generates a clear, observable private cost that varies within officer across days.

Q: What is the key institutional feature that generates identification? A: Arrests made near the end of a shift typically require the arresting officer to stay past the shift and work overtime. This creates a personal cost — more time, delayed transition to off-duty activities — that makes late-shift arrests more costly without changing their professional value. The DPD secondary employment program adds a second source of variation: on days when an officer has a registered second job scheduled after the police shift, the opportunity cost of any arrest (and especially a late-shift arrest) is higher.

Q: How large is the drop in arrest rates near shift end? A: The baseline arrest rate declines by approximately 0.12 percentage points per six-minute time bucket in the last 30 minutes of the shift, or about 5% relative to the mean arrest rate of 2.3 percent. The drop is most dramatic for drug and gang charges, where the arrest rate falls by approximately 50%, and smallest for violent charges, where officers appear to arrest regardless of shift timing.

Q: How does arrest quality change near shift end? A: Arrests made in the last 30 minutes of a shift are approximately 1.5–2.5 percentage points more likely to result in prosecution than arrests made earlier in the shift, after controlling for charge type composition and officer fixed effects. The quality premium is larger in more discretionary charge categories (drugs/gang, then property, then violent), consistent with officers becoming more selective to avoid overtime costs on arrests unlikely to result in prosecution.

Q: Does the shift-end drop reflect officer fatigue or overtime cost? A: The paper argues that two pieces of evidence point to overtime cost rather than fatigue alone. First, arrest rates increase sharply after the official shift end, when the officer is already earning overtime pay; if fatigue were the mechanism, arrests would also decline post-shift. Second, on second-job days arrest rates fall earlier in the shift and by more, consistent with higher opportunity costs rather than accumulated fatigue.

Q: What is the effect of having a second job scheduled on arrest rates? A: Having a second job scheduled reduces arrest rates by roughly 5–10% relative to the baseline across the full shift, with effects concentrated in the second half. The reduction is even larger in the final 30 minutes, consistent with the second job amplifying the overtime cost mechanism.

Q: What is the effect of second-job days on arrest quality? A: Arrests made on second-job days are 1–2 percentage points more likely to result in prosecution compared to arrests on non-second-job days, after controlling for time of day, charge type composition, and officer fixed effects. This parallels the shift-end quality effect and is consistent with officers applying higher selectivity thresholds when opportunity costs are elevated.

Q: How is the second-job variation used for identification? A: The main specification compares the same officer’s behavior on shifts where a second job is scheduled versus shifts where it is not, absorbing officer fixed effects and time fixed effects. The identifying assumption is that second-job scheduling is uncorrelated with unobservable determinants of enforcement intensity conditional on fixed effects. The authors support this with balance tests showing second-job status is not predicted by lagged activity measures or contemporaneous crime call patterns.
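
The within-officer comparison can be sketched as a demeaning (fixed-effects) estimator. This is a simplified illustration that absorbs officer fixed effects only and omits the time fixed effects the paper also includes; the function name and the six made-up observations are hypothetical and are not the paper's data.

```python
from collections import defaultdict


def within_estimate(officer_ids, x, y):
    """OLS slope of y on x after subtracting each officer's own means,
    which is equivalent to including officer fixed effects."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for o, xi, yi in zip(officer_ids, x, y):
        s = sums[o]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for o, xi, yi in zip(officer_ids, x, y):
        sx, sy, n = sums[o]
        dx, dy = xi - sx / n, yi - sy / n
        num += dx * dy
        den += dx * dx
    return num / den


# Made-up illustration: two officers with different baseline arrest rates,
# each arresting 0.2 points less on second-job days.
officers = ["A", "A", "A", "B", "B", "B"]
second_job = [0, 1, 0, 1, 0, 1]
arrests = [2.5, 2.3, 2.5, 1.8, 2.0, 1.8]
effect = within_estimate(officers, second_job, arrests)
```

Because each officer is compared only with themselves, the difference in baseline arrest rates between officers A and B does not contaminate the estimated second-job effect.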

Q: What does the dynamic structural model add? A: The structural model formalizes the arrest decision as a dynamic problem where the officer compares the professional benefit b_p of an arrest to the private cost c(t, secondjob), which rises discontinuously when overtime begins and rises further on second-job days. Estimating the model by matching moments (baseline arrest rates, shift-timing patterns, quality changes, second-job effects) yields preference parameters. The model enables counterfactual and welfare analysis that the reduced-form estimates alone cannot provide.
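
The selection logic behind the model can be illustrated with a static threshold rule. This is a deliberately simplified sketch, not the paper's dynamic model: the functional form of the cost, the 30-minute window, the parameter values, and the evenly spaced case strengths are all hypothetical.

```python
def private_cost(minutes_left, second_job, c_ot=0.3, c_sj=0.15, window=30):
    """Hypothetical cost c(t, secondjob): zero until an arrest would spill
    into overtime, then c_ot, plus c_sj on second-job days."""
    if minutes_left > window:
        return 0.0
    return c_ot + (c_sj if second_job else 0.0)


def arrest_outcomes(case_strengths, minutes_left, second_job, b_p=1.0):
    """Arrest when professional benefit times perceived case strength
    exceeds the private cost; return (arrest share, mean strength)."""
    cost = private_cost(minutes_left, second_job)
    made = [s for s in case_strengths if b_p * s > cost]
    share = len(made) / len(case_strengths)
    quality = sum(made) / len(made) if made else float("nan")
    return share, quality


# Evenly spaced perceived prosecution probabilities (illustrative only).
strengths = [i / 10 for i in range(1, 10)]
early = arrest_outcomes(strengths, minutes_left=240, second_job=False)
late = arrest_outcomes(strengths, minutes_left=20, second_job=False)
late_sj = arrest_outcomes(strengths, minutes_left=20, second_job=True)
```

In this sketch a higher private cost raises the threshold, so fewer arrests are made and those that remain have higher average case strength, mirroring the quantity and quality patterns reported in the reduced-form results.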

Q: What are the structural estimates of overtime and second-job costs? A: The overtime cost c_ot is estimated to be large enough that arresting someone in the final 30 minutes of the shift reduces the expected professional value of that arrest by roughly 20–30%. The additional second-job cost c_sj reduces expected professional value by a further 10–20%. Both estimates are described as statistically precise.

Q: What does the counterfactual removal of overtime costs imply for arrests? A: Eliminating the overtime cost is estimated to increase overall arrests by approximately 5–8%, which the authors characterize as economically significant. This implies that officers’ private costs have a first-order impact on the quantity of law enforcement activity.

Q: What does the welfare analysis conclude about overtime costs? A: The welfare effect of eliminating overtime costs depends on citizen preferences. Under quality-weighted preferences — where citizens value the probability that an arrest results in prosecution — the current overtime-cost regime may be socially optimal because it induces officers to self-select toward arrests they perceive as likely to stick. Under quantity preferences — where citizens value the total number of arrests per period — reducing overtime costs would increase police activity and benefit citizens.

Q: What are the scope conditions of the study? A: The study is conducted entirely within the Dallas Police Department, a single medium-sized urban department with approximately 3,000 officers. Dallas is described as a diverse city by race, income, and political affiliation, and the department itself is relatively diverse (29% Black, 43% Hispanic, 27% White, 15% female). The findings may not generalize to departments with different overtime rules, labor contracts, or institutional cultures.

professional motivations: The non-pecuniary benefits officers derive from making arrests, such as crime reduction, duty fulfillment, or the legitimacy of their work; modeled as a professional benefit b_p that motivates arrest independent of financial compensation.

private costs of arrest: The personal costs borne by officers when making an arrest, chiefly the overtime cost when an arrest extends the shift past its scheduled end, and the opportunity cost on days when a second job is scheduled. These costs are distinct from professional motivations and respond to economic incentives.

arrest quality: The conditional probability that an arrest results in prosecution by the Dallas County Attorney’s office; used as a revealed-preference measure of the officer’s assessment of arrest strength. Higher arrest quality near shift end reflects greater selectivity under elevated private costs.

secondary employment (second job): A formal DPD program allowing officers to register as certified police officers for private security work after their primary shift. Approximately 30% of DPD officers held a second job at some point during 2019–2021. The scheduled second job raises the opportunity cost of late-shift primary-shift arrests and provides a second source of variation in private costs.

overtime cost: The cost incurred when an arrest requires an officer to remain past the end of the scheduled shift to complete paperwork and processing. Modeled as c_ot per period spent in overtime, this cost is the primary mechanism reducing late-shift arrest rates and increasing arrest selectivity.

dynamic model of arrest decisions: A structural model in which officers decide each moment whether to arrest, balancing professional benefit against private cost as a function of shift timing and second-job status. Estimated by minimum distance on moments from the data; used to recover preference parameters and conduct counterfactual welfare analysis.

Racial Disparities in Housing Returns

Layer 1 — Overview

Research Question

This paper estimates the racial/ethnic gap in realized housing returns using administrative data on individual housing transactions, and investigates the mechanisms that generate those gaps. The central question is: why do Black and Hispanic homeowners accumulate less housing wealth than White homeowners, even as minority homeownership rates have risen substantially over the last century?

Data and Methodology

The authors merge three primary data sources. First, a nationwide panel of residential property records from ATTOM covering 146.8 million arm’s-length home purchases from 1990 to 2020, which records transaction prices, mortgage characteristics, and property-level identifiers. Second, Home Mortgage Disclosure Act (HMDA) records, which contain self-reported race and ethnicity for mortgage applicants. Third, supplementary administrative sources including McDash mortgage servicing records, Equifax credit bureau data, Fannie Mae/Freddie Mac/ABSNet modification records, and the Survey of Income and Program Participation (SIPP). After applying sample restrictions — including requiring an observed purchase price, a linked HMDA record, an arm’s-length repeat sale, a combined loan-to-value ratio of at most 102.5%, and an ownership spell of at least 12 months — the baseline analysis sample comprises 13.6 million ownership spells for Black, Hispanic, and White homeowners who purchased homes with a mortgage between 1990 and 2016 in 40 states. Ownership spells unsold by March 2020 have their value imputed using the FHFA county-level house price index, a procedure that is conservative in that it understates racial gaps.

The authors construct two complementary return measures. The unlevered return compares the annualized ratio of sale price to purchase price. The levered return (internal rate of return) sets the net present value of all homeowner cash flows — down payment, monthly mortgage payments, implicit rent, maintenance, taxes, insurance, transaction costs, and limited liability in foreclosure — equal to zero.

Main Findings

Among mortgaged home purchases, mean annual unlevered returns are 0.5% for Black homeowners, 0.6% for Hispanic homeowners, and 2.8% for White homeowners, implying Black-White and Hispanic-White gaps of approximately 2.3 percentage points per year. Mean annual levered returns are 1.6%, −3.0%, and 6.6% for Black, Hispanic, and White homeowners respectively, yielding gaps of 5.0 and 9.6 percentage points. After adjusting for the approximately one-fourth of purchases made in cash (for which no racial gap is found), preferred estimates of the unlevered gap are 1.9 (Black-White) and 1.4 (Hispanic-White) percentage points.

Distressed sales — foreclosures and short sales — statistically account for the entire gap in returns. Within non-distressed sales, the Black-White gap in annual unlevered returns falls to less than 40 basis points, and the Hispanic-White gap reverses sign. Two distinct factors drive the role of distressed sales: (1) Black and Hispanic homeowners are approximately twice as likely as White homeowners to experience a distressed sale, and (2) minority homeowners live in neighborhoods where distressed sale price discounts are larger — estimated at 39%–40% for Black and Hispanic homeowners versus 28% for White homeowners. A Blinder-Oaxaca decomposition indicates that equalizing distressed sale rates (holding the distressed sale penalty fixed) would eliminate 84.6% of the Black-White unlevered returns gap and 133.6% of the Hispanic-White gap, confirming that the frequency margin dominates the severity margin.

A counterfactual wealth-accumulation exercise using PSID data shows that equalizing housing returns reduces the Black-White gap in housing wealth at retirement by 37%. Equalizing first-time purchase rates reduces the gap by only 1%, illustrating that promoting homeownership without addressing the returns gap is largely ineffective. Equalizing both returns and purchase rates reduces the gap by 49%.

Mechanisms

Approximately one-third of the gap in unlevered returns can be explained by purchase year and county fixed effects, with much of this timing effect attributable to the Great Recession. Controlling additionally for income, family structure, gender, and leverage reduces the gap by a further ~0.3 percentage points, leaving a substantial residual. About half of the racial gap in mortgage default can be attributed to observable credit risk (family structure, income, leverage, credit score). The remainder is associated with unobservable liquidity shortfalls and income instability: median liquid wealth among Black and Hispanic homeowners is $2,400 and $5,400 respectively, and minority homeowners are 2–4 percentage points more likely to transition to unemployment conditional on pre-unemployment income. Using quasi-experimental variation from adjustable-rate mortgage resets, the paper shows that in response to a 10% increase in monthly payments, White homeowners increase 90-day mortgage default by 3.0 percentage points after 12 months, while Black and Hispanic homeowners show increases of 4.5 and 7.1 percentage points respectively — excess sensitivity that is not captured by credit scores. The early-2000s credit supply expansion through private securitization and portfolio lending channels (as distinct from GSE/FHA) contributed to 61.5% of the 6.2-percentage-point increase in the Black-White distressed-sale gap between the 2002 and 2006 purchase cohorts, and 52.0% of the 12.2-percentage-point increase in the Hispanic-White gap. Evidence from the National Survey of Mortgage Originations suggests that Black homeowners hold overoptimistic expectations about future house price growth and income growth relative to their realized outcomes, which may explain why high-risk minority households do not self-select out of homeownership.

Scope Conditions

Results pertain to mortgaged home purchases (approximately three-fourths of all purchases) by Black, Hispanic, and White homeowners in 40 states (non-disclosure states excluded), with primary coverage from 2000 to 2016. No racial gap in returns is found for cash purchases. The racial gap in non-distressed returns is small and not economically meaningful, so the findings specifically pertain to the realized-return distribution that includes the distressed-sale tail.

In depth

Q1. How large is the racial gap in housing returns, and how does it compare to previously documented racial disparities in housing costs?

A: Among mortgaged purchases, Black and Hispanic homeowners each realize annual unlevered returns approximately 2.3 percentage points lower than White homeowners; levered return gaps are 5.0 percentage points (Black-White) and 9.6 percentage points (Hispanic-White). In dollar terms, this translates to a difference of roughly $5,920 per year for the average Black homeowner and $6,762 per year for the average Hispanic homeowner on a ten-year holding horizon. These gaps are an order of magnitude larger than previously documented racial disparities in housing costs, such as post-origination interest rate disparities of about 40 basis points (~$500 annually for a $200,000 home) or inflated property tax assessments amounting to $300–$390 per year.

Q2. What is the role of distressed sales in explaining racial gaps in returns, and how do frequency versus severity contribute?

A: Distressed sales statistically account for nearly the entire racial gap in realized housing returns. Within non-distressed sales, the Black-White unlevered gap falls to less than 40 basis points and the Hispanic-White gap inverts. Two channels operate: (1) Black and Hispanic homeowners are approximately twice as likely as White homeowners to experience a distressed sale; and (2) within distressed sales, minority homeowners realize lower returns because they tend to live in neighborhoods with larger distressed-sale price discounts (estimated at 39–40% below imputed market value for Black and Hispanic homeowners, vs. 28% for White homeowners). A Blinder-Oaxaca decomposition indicates that equalizing distressed sale frequency (holding severity fixed) would close 84.6% of the Black-White gap and 133.6% of the Hispanic-White gap, so the frequency margin is quantitatively dominant.

Q3. Are racial differences in house price appreciation responsible for the gap in non-distressed returns?

A: No. Among non-distressed sales, realized returns closely track county-level FHFA house price index growth for Black, Hispanic, and White homeowners alike, essentially one-for-one regardless of race. There is no economically meaningful racial gap in house price appreciation conditional on avoiding a distressed sale. This finding implies that the gap in average realized returns is not generated by differential neighborhood-level appreciation but rather by the incidence of distressed sales and the price penalties they entail.

Q4. How much of the racial gap in housing returns can be explained by observable homeowner characteristics such as income, family structure, and leverage?

A: Controlling for county and purchase year fixed effects reduces the raw Black-White and Hispanic-White unlevered returns gaps from 2.3 to 1.5 and 1.6 percentage points, respectively. Additionally controlling for income, family structure (gender and co-applicant status), and leverage reduces the gap by a further ~0.3 percentage points. Even among the ostensibly safest group — high-income couples with low leverage — the Black-White (Hispanic-White) gap in unlevered returns is 0.7 (0.5) percentage points. Among high-leverage, low-income, single-male homeowners the gap is 1.8 (1.7) percentage points. Gaps exist within every demographic subgroup, and neighborhoods (Census tract fixed effects) explain roughly half of the remaining gap for Black homeowners and one-third for Hispanic homeowners, but substantial residual gaps persist even within neighborhood.

Q5. What observable credit risk characteristics explain racial differences in mortgage default?

A: Raw racial gaps in 90-day mortgage delinquency are 2.6 percentage points (Black-White) and 1.8 percentage points (Hispanic-White). Controlling for purchase year and county reduces these to 2.2 and 1.6 percentage points respectively. Controlling for family structure, income, leverage, and credit score reduces the gaps to 0.98 and 0.94 percentage points — implying that observable characteristics explain approximately 55% and 41% of the Black-White and Hispanic-White default gaps respectively. Credit scores contribute the most explanatory power among these controls, while mortgage contract characteristics (a test of differential lender treatment) contribute negligibly.
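The "share explained" figures in this answer can be reproduced with simple arithmetic. The sketch below is not from the paper; it assumes the baseline is the gap after purchase-year and county controls (2.2 and 1.6 percentage points), since that is the baseline that matches the reported 55% and 41%. The function name is illustrative.

```python
def share_explained(gap_before: float, gap_after: float) -> float:
    """Fraction of a gap removed by adding controls."""
    if gap_before == 0:
        raise ValueError("baseline gap must be nonzero")
    return 1.0 - gap_after / gap_before

# Gaps in 90-day delinquency (percentage points), as quoted in the summary.
# Assumption: the baseline is the gap after year and county controls.
black_white = share_explained(2.2, 0.98)
hispanic_white = share_explained(1.6, 0.94)

print(f"Black-White share explained:    {black_white:.1%}")
print(f"Hispanic-White share explained: {hispanic_white:.1%}")
```

Measured against the raw gaps (2.6 and 1.8) instead, the same calculation would give larger shares, so the choice of baseline matters when comparing such figures across studies.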

Q6. What is the evidence that liquidity and income instability — factors not observable to lenders — explain the residual racial gap in default?

A: Survey data from SIPP reveal that median liquid wealth (bank accounts, stocks, bonds) for Black and Hispanic homeowners is only $2,400 and $5,400 respectively, while minority homeowners are 2–4 percentage points more likely to transition to unemployment conditional on pre-unemployment income. In SIPP mortgage delinquency regressions, controlling for liquidity, job loss in the prior year, and income reduces the Black-White coefficient by about 30% and the Hispanic-White coefficient by about 41% (and 29% and 70% respectively when also controlling for income level, current loan-to-value, and family composition). In administrative data using ARM payment resets as liquidity shocks, a 10% increase in monthly payments raises 90-day default by 3.0 percentage points for White homeowners, 4.5 percentage points for Black homeowners, and 7.1 percentage points for Hispanic homeowners after 12 months. This excess sensitivity is not substantially reduced by controlling for credit scores, income, or leverage — indicating that the liquidity risk of minority homeowners is largely unobservable to lenders at origination.

Q7. Is there evidence that strategic default explains higher minority distress rates?

A: No meaningful evidence supports strategic default as a driver of excess minority distress. Using quasi-experimental variation in ex-post leverage from diverging option ARM indices (following Gupta and Hansman 2022), the paper finds large causal impacts of leverage on default but no evidence that these impacts are larger for minority homeowners. Separate survey evidence from the NSMO shows a statistically insignificant Black-White difference of 0.05 percentage points (s.e. 0.65) in agreement that “it is okay to default if it is in the borrower’s financial interest” (relative to a White mean of 6.1%). The absence of larger leverage-driven default responses combined with the presence of larger payment-shock-driven responses points specifically to liquidity — not strategic behavior — as the relevant mechanism.

Q8. What is the evidence for information frictions contributing to excess minority homeownership risk?

A: Black homeowners in the NSMO report future house price expectations that are 0.07 standard deviations more optimistic than those of White homeowners, conditional on past price experiences, yet realized house price growth in the subsequent two years is actually 1.1 percentage points lower for Black homeowners. Although Black homeowners are 2.8 percentage points more likely to report past personal financial crises, their stated expectations about future financial crises are similar to those of White homeowners, despite 90-day default rates that are 2.5 percentage points higher in the first two years post-origination. Black homeowners also report income growth expectations 0.3 standard deviations higher than those of White homeowners, while SIPP and CPS data show minorities are more likely to experience income losses. These patterns of overoptimistic expectations relative to realized outcomes are consistent with information frictions causing high-risk minority households to suboptimally select into homeownership.

Q9. How much of the racial gap in distress can be attributed to the early-2000s credit supply expansion?

A: The paper identifies the expansion as concentrated in portfolio loans and privately securitized mortgages, which are distinct from GSE/FHA mortgages that did not exhibit a comparable supply increase. Between the 2002 and 2006 purchase cohorts, the Black-White gap in distressed sales rose by 6.2 percentage points overall but only 2.4 percentage points among GSE/FHA loans. A decomposition using this contrast attributes 61.5% of the overall 6.2-percentage-point increase to the credit supply expansion. Analogously, 52.0% of the 12.2-percentage-point increase in the Hispanic-White gap between 2002 and 2006 is attributed to credit supply. Within-race decompositions find that credit supply accounts for 42%, 30%, and 35% of the increase in distress relative to 2002 for Black, Hispanic, and White homeowners respectively, for mortgages originated 2004–2006.

Q10. What is the implied contribution of the returns gap to the racial wealth gap?

A: Using a simple wealth accumulation model calibrated to PSID data on first-time homebuyer rates and home values (average first home for Black households: $142,587; for White households: $208,621), the paper finds an estimated Black-White gap in housing wealth at retirement of $169,389 versus an observed PSID gap of $182,771. Equalizing housing returns would reduce this gap by 37%. In contrast, equalizing first-time purchase rates alone reduces the gap by only about 1%, because low returns nullify the benefit of purchasing earlier. Equalizing both returns and purchase rates reduces the gap by 49%. Housing wealth in the primary home constitutes 43% of total net wealth for the average retirement-age Black household in PSID, implying the returns gap explains a quantitatively large share of the overall racial wealth gap.

Q11. What do the COVID-19 pandemic forbearance experience and mortgage modification evidence imply for policy?

A: Quasi-experimental estimates using servicer-level variation in modification propensity show that mortgage modifications cause economically large increases in housing returns for Black, Hispanic, and White homeowners alike, suggesting that since minority homeowners are more likely to become distressed, expanded modifications would disproportionately benefit them. The pandemic experience provides macroeconomic confirmation: after the onset of COVID-19 forbearance and foreclosure moratoria in March 2020, the Black-White gap in unlevered returns and distressed sales fell by approximately half, while the Hispanic-White gap (whose pre-pandemic distress convergence was already underway) remained comparatively stable. Administratively, Black homeowners who default are already 3–7 percentage points more likely than observationally similar White homeowners to receive a modification, even controlling for neighborhood and servicer, suggesting servicers partially internalize the larger distressed-sale discounts in minority neighborhoods.

Q12. Are neighborhood-level factors — specifically distressed-sale price discounts from illiquid real estate markets — important for explaining racial heterogeneity in returns conditional on distress?

A: Yes. Using MLS data on median days-on-market as a measure of real estate market thickness, the paper shows that distressed sale discounts are substantially larger in less-liquid markets, with the discounts experienced by Black homeowners approximately 13 percentage points deeper in the least-thick markets relative to the thickest. Black and Hispanic homeowners are disproportionately likely to realize distressed sales in thin markets. Regular sale returns are not affected by market thickness. This establishes that neighborhood market illiquidity is a second-order channel through which neighborhood-level factors contribute to the racial gap, primarily by amplifying the severity of distressed sale penalties rather than by affecting ordinary house price appreciation.

Key Concepts

Distressed sale: In this paper’s usage, an ownership spell that ends in either a foreclosure (where a lender seizes and sells the property after payment default) or a short sale (where the lender allows the homeowner to sell for less than the outstanding mortgage balance without holding the homeowner liable for the deficiency). Distressed sales are the central mediating factor between race and housing returns.

Unlevered return: The annualized ratio of sale price to purchase price, capturing property-level capital gains without reference to the financing structure. Computed as (P_sale / P_purchase)^(1/T) − 1. Does not capture leverage amplification or limited homeowner liability in foreclosure.
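A minimal sketch of the unlevered return formula above, with hypothetical prices and holding periods chosen only for illustration (they are not figures from the paper):

```python
def unlevered_return(p_purchase: float, p_sale: float, years: float) -> float:
    """Annualized unlevered return: (P_sale / P_purchase)^(1/T) - 1."""
    if p_purchase <= 0 or p_sale <= 0 or years <= 0:
        raise ValueError("prices and holding period must be positive")
    return (p_sale / p_purchase) ** (1.0 / years) - 1.0

# Hypothetical ownership spell: bought at 200,000, sold at 250,000 after 8 years.
regular = unlevered_return(200_000, 250_000, 8)

# Same purchase, but ending in a hypothetical distressed sale at 150,000.
distressed = unlevered_return(200_000, 150_000, 8)

print(f"regular sale:    {regular:.2%} per year")
print(f"distressed sale: {distressed:.2%} per year")
```

Because the measure depends only on the two prices and the holding period, a distressed-sale discount feeds directly into a lower (here negative) annualized return.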

Levered return (internal rate of return): The discount rate that sets the net present value of all homeowner cash flows to zero, including down payment at purchase; monthly payments (principal, interest, taxes, insurance, maintenance); implicit rent; and the net proceeds at sale (property sale price minus outstanding principal balance, subject to a floor of $0.01 capturing limited liability). This measure accounts for both the amplifying effect of leverage on gains and the homeowner’s limited liability in underwater foreclosures.
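The levered return can be sketched as a root-finding problem on the net present value. The example below is a simplified illustration, not the paper's implementation: it uses annual rather than monthly periods, collapses payments and implicit rent into a single hypothetical net annual flow, and solves by bisection, which assumes exactly one sign change of NPV inside the bracket. All numbers and function names are hypothetical.

```python
def npv(rate, cash_flows):
    """Net present value of cash_flows[t] received at the end of period t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10, max_iter=200):
    """Bisection for the rate with NPV = 0; needs a sign change on [lo, hi]."""
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if hi - lo < tol:
            return mid
        if f_lo * f_mid <= 0:      # root lies in [lo, mid]
            hi = mid
        else:                      # root lies in [mid, hi]
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0

def spell_cash_flows(down_payment, years, sale_price, balance_at_sale,
                     net_annual_flow=0.0):
    """Homeowner cash flows with limited liability: sale proceeds floored at 0.01."""
    flows = [-down_payment] + [net_annual_flow] * years
    flows[-1] += max(sale_price - balance_at_sale, 0.01)
    return flows

# Hypothetical spells: 20,000 down, 5 years, 170,000 balance remaining at sale.
irr_regular = irr(spell_cash_flows(20_000, 5, 230_000, 170_000))
irr_distressed = irr(spell_cash_flows(20_000, 5, 150_000, 170_000))
print(f"{irr_regular:.2%} {irr_distressed:.2%}")
```

The floor on sale proceeds keeps the terminal cash flow positive for an underwater sale, so the return is well defined but sharply negative in that case.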

Distressed sale frequency versus severity: The two distinct components through which distressed sales generate racial gaps. Frequency refers to the higher probability that a minority homeowner’s ownership spell terminates in a distressed sale. Severity refers to the larger price discount at distressed sale that minority homeowners experience, concentrated in neighborhoods with illiquid real estate markets. The paper’s decomposition finds frequency is the dominant margin.

Unobservable liquidity risk: Default risk arising from insufficient liquid wealth (cash, bank deposits, liquid securities) and income instability that is not captured by credit scores or other characteristics observable to lenders at mortgage origination. The paper’s ARM-reset event study shows this risk generates excess minority default responses even conditional on credit score and income.

Information friction (overoptimism): The tendency of minority homeowners, particularly Black homeowners, to hold expectations about future house prices, personal financial crises, and income growth that are more optimistic than their realized outcomes and than observationally similar White homeowners’ expectations. The paper uses this to explain why high-risk minority households do not self-select out of homeownership despite the high cost of distressed sales.

Credit supply channel: The mechanism by which the early-2000s expansion of private securitization and portfolio lending — channels that exhibited substantially greater growth among Black and Hispanic borrowers than among White borrowers — contributed to increased rates of minority distress during the Great Recession. Distinguished from GSE/FHA channels that did not exhibit comparable credit expansion and serve as the counterfactual.

Random Utility with Unobservable Alternatives

This paper addresses a foundational gap in the random utility model (RUM) literature: existing axiomatizations by Falmagne (1978) and McFadden and Richter (1990) assume that whenever a menu is observed, the choice frequencies of all alternatives in that menu are observable. In practice, the choice frequencies of some alternatives are routinely missing. The paper derives the full testable implications of the random utility model for such incomplete datasets, delivering a finite, nonredundant system of linear inequalities as a necessary and sufficient condition for RU-rationalizability.

The empirical backdrop motivates the formal contribution directly. In transportation choice (bus, train, walk, drive), revenue data from transit operators can reveal the market shares of bus and train but not walking or driving without survey data. In school choice, governments observe enrollment across public schools but may lack data on private school selections. In market-share analysis, private firms may not disclose sales figures. In each case, researchers typically aggregate all unobservable alternatives into a single “outside option,” treating it as one composite choice. The paper calls this the outside option approach and establishes its formal limitations.

The main theorem (Theorem 3.2) states that an incomplete dataset is RU-rationalizable if and only if two conditions hold jointly. The first is the classical nonnegativity of Block-Marschak (BM) polynomials, which appears in Falmagne’s original characterization and requires that certain inclusion-exclusion quantities over observed choice frequencies are nonnegative. The second is a novel balance condition: for any “essential test collection” of choice sets, a specific net signed sum of BM polynomials across observable arcs crossing the boundary of that collection must be nonnegative. This second condition captures the informational content that is lost when unobservable alternatives are collapsed. The characterization is nonredundant in the strong sense that removing any single inequality from either condition produces a strictly weaker system: every inequality is independently binding for some dataset.

The limitation of the outside option approach is made precise by Proposition 3.5: the reduced dataset formed by the outside option approach is RU-rationalizable whenever the original incomplete dataset satisfies condition (i) and condition (ii) for singleton essential test collections only. Consequently, if the original data violates condition (ii) for non-singleton essential test collections — meaning it is not genuinely RU-rationalizable — the outside option approach will nonetheless return a verdict of rationalizability. False acceptance of the random utility model is therefore possible under the outside option approach.

The proofs translate the rationalizability problem into a network flow problem on the hypercube lattice over subsets of alternatives, following Fiorini (2004). Each path from the empty set to the full alternative set corresponds to a linear order (ranking). The key methodological innovation is applying a feasibility theorem from network flow theory — specifically a generalization drawing on the max-flow min-cut theorem — to derive the necessary and sufficient conditions in the incomplete-data setting.

The paper also provides an efficient algorithm for computing tight bounds on unobservable choice frequencies, formulated as a minimum-cost transshipment problem. Because the constraint matrix is totally unimodular (it is the incidence matrix of a network), the network simplex algorithm applies directly. Applied to a lottery-choice dataset from McCausland et al. (2020) — 141 participants each choosing from subsets of five lotteries, with choices made six times per choice set — the authors treat two of the five lotteries as unobservable and compare bound widths. Their method yields significantly tighter bounds than the outside option approach and, critically, correctly identifies that lottery 4 is more desirable than lottery 3 among the unobservable alternatives. The outside option approach yields identical trivial bounds for both lotteries and thus cannot distinguish their relative desirability at all.

Q: What is the central research question? A: The paper asks: what are the testable implications of the random utility model when the choice frequencies of some alternatives are unobservable? The goal is a necessary and sufficient condition for RU-rationalizability under incomplete observation, along with a demonstration of what is lost when the standard outside option approach is used instead.

Q: What is the random utility model and why is it the focus? A: The random utility model posits a probability distribution over strict rankings of alternatives; each individual’s preferences correspond to one ranking. It is a cornerstone of discrete choice analysis in economics. Falmagne (1978) and McFadden-Richter (1990) characterized it under full observability of choice frequencies, making the extension to incomplete data a natural and practically important frontier.

Q: What does “incomplete dataset” mean formally in this paper? A: An incomplete dataset is a nonnegative vector of choice frequencies satisfying: (i) for menus composed entirely of observable alternatives, frequencies sum to one; (ii) for menus that include at least one unobservable alternative, the sum of observable-alternative frequencies is at most one. The gap between the sum and one corresponds to the unobserved probability mass on unobservable alternatives.
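The two defining conditions can be checked mechanically. The sketch below assumes a hypothetical encoding in which the dataset is a dictionary from (menu, observable alternative) pairs to frequencies; the function name and the example numbers are illustrative only.

```python
def is_incomplete_dataset(rho, unobservable, tol=1e-9):
    """Check nonnegativity plus the sum conditions (i) and (ii) on each menu.

    rho maps (menu, alternative) -> frequency, with entries only for
    observable alternatives; menus are frozensets.
    """
    menus = {menu for menu, _ in rho}
    for menu in menus:
        total = 0.0
        for x in menu:
            if x in unobservable:
                continue
            freq = rho.get((menu, x))
            if freq is None or freq < -tol:
                return False               # missing or negative frequency
            total += freq
        if menu & unobservable:
            if total > 1.0 + tol:          # (ii): observable mass at most one
                return False
        elif abs(total - 1.0) > tol:       # (i): fully observable menus sum to one
            return False
    return True

# Hypothetical example: alternatives a, b observable; z unobservable.
AB, ABZ = frozenset("ab"), frozenset("abz")
valid = {(AB, "a"): 0.6, (AB, "b"): 0.4, (ABZ, "a"): 0.5, (ABZ, "b"): 0.3}
invalid = {(AB, "a"): 0.6, (AB, "b"): 0.4, (ABZ, "a"): 0.8, (ABZ, "b"): 0.3}

print(is_incomplete_dataset(valid, frozenset("z")))
print(is_incomplete_dataset(invalid, frozenset("z")))
```

In the valid example the residual 0.2 on the menu {a, b, z} is the unobserved mass attributable to z.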

Q: What are Block-Marschak polynomials and why do they appear? A: The Block-Marschak (BM) polynomial K(rho, D, x) is defined by inclusion-exclusion: it sums, with alternating signs, the choice frequency of alternative x over all supersets E of D. In Falmagne’s complete-data characterization, nonnegativity of all BM polynomials is necessary and sufficient for RU-rationalizability. In the incomplete-data setting, nonnegativity of BM polynomials remains necessary but is no longer sufficient.
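As an illustration of the inclusion-exclusion definition, the sketch below computes BM polynomials for a complete dataset generated from a hypothetical distribution over three rankings. The sign convention (-1)^|E \ D| is the standard one and is assumed here; the function names and the example distribution are not from the paper.

```python
from itertools import combinations

def supersets(D, X):
    """Yield every set E with D a subset of E and E a subset of X."""
    rest = sorted(X - D)
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            yield D | frozenset(extra)

def bm_polynomial(rho, D, x, X):
    """K(rho, D, x): alternating-sign sum of rho(E, x) over supersets E of D."""
    return sum((-1) ** len(E - D) * rho[(E, x)] for E in supersets(D, X))

def choice_frequencies(ranking_probs, X):
    """Complete dataset induced by a distribution over best-first rankings."""
    rho = {}
    for k in range(1, len(X) + 1):
        for menu in combinations(sorted(X), k):
            E = frozenset(menu)
            for x in E:
                # x is chosen from E when it is the best-ranked element of E.
                rho[(E, x)] = sum(p for r, p in ranking_probs.items()
                                  if min(E, key=r.index) == x)
    return rho

# Hypothetical random utility model on three alternatives.
X = frozenset("abc")
ranking_probs = {("a", "b", "c"): 0.5, ("b", "c", "a"): 0.3, ("c", "a", "b"): 0.2}
rho = choice_frequencies(ranking_probs, X)

k_ab_a = bm_polynomial(rho, frozenset("ab"), "a", X)  # mass on ranking c > a > b
k_a_a = bm_polynomial(rho, frozenset("a"), "a", X)    # mass on rankings with a last
print(round(k_ab_a, 10), round(k_a_a, 10))
```

For data generated by a random utility model, each BM polynomial equals the probability that x is the best element of D while every alternative outside D ranks above x, which is why all of them are nonnegative in this example.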

Q: What is the novel condition in Theorem 3.2 beyond BM nonnegativity? A: Condition (ii) of Theorem 3.2 requires that for any “essential test collection” C of choice sets, the net observable outflow (the sum of BM polynomials on arcs leaving C minus the sum on observable arcs entering C) is nonnegative. This balance condition captures the constraint that unobservable flow must be nonnegative on every cut of the network corresponding to an essential test collection.

Q: What makes the characterization nonredundant, and why does nonredundancy matter? A: The characterization is nonredundant in the sense that for every individual inequality in conditions (i) and (ii), there exists an incomplete dataset that violates only that inequality and satisfies all others. This is established as part (b) of Theorem 3.2. Nonredundancy is essential for identifying precisely which inequalities the outside option approach discards: without it, some of the novel condition (ii) inequalities might be implied by others, and the argument that the outside option approach loses independent information would not hold.

Q: What does the outside option approach actually discard? A: Proposition 3.5 shows that the outside option approach retains only condition (i) (BM nonnegativity) and condition (ii) for singleton essential test collections. All condition (ii) inequalities corresponding to non-singleton essential test collections are discarded. Because the characterization is nonredundant, each discarded inequality is a genuinely independent constraint, meaning a dataset can violate any one of them while satisfying all others — including all conditions the outside option approach checks.

Q: Can the outside option approach produce a false acceptance of the random utility model? A: Yes. If the true incomplete dataset violates condition (ii) for some non-singleton essential test collection but satisfies all other conditions of Theorem 3.2 — including all conditions the outside option approach checks — then the original dataset is not RU-rationalizable, but the reduced dataset formed by collapsing unobservables into one outside option is RU-rationalizable. Researchers using the outside option approach would therefore erroneously conclude that the data-generating process follows a random utility model.

Q: How is the problem translated into a network flow problem? A: The authors build a directed network on the power set of alternatives, with arcs from D to D union {x} for each alternative x not in D, source at the empty set, and terminal at the full set X. Each source-to-terminal path corresponds to a unique linear order. A probability distribution over rankings corresponds to a flow, with flow conservation at interior nodes and total flow equal to one. The BM polynomial of an observable arc equals the required flow on that arc. Feasibility of this flow — guaranteed by a theorem generalizing max-flow min-cut — is equivalent to RU-rationalizability.
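The correspondence between rankings, paths, and flows can be sketched on a three-alternative example. The code below assumes one particular orientation convention (each path adds alternatives from worst-ranked to best-ranked), which the summary does not specify; under that assumption the flow on the arc from D to D union {x} is the probability that x is best in D union {x} and everything outside ranks above x. Names and numbers are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def path_of(ranking):
    """Source-to-terminal path for a best-first ranking (adds worst first)."""
    nodes = [frozenset()]
    for x in reversed(ranking):
        nodes.append(nodes[-1] | {x})
    return tuple(nodes)

def arc_flows(ranking_probs):
    """Push each ranking's probability along its path and sum per arc."""
    flows = defaultdict(float)
    for ranking, p in ranking_probs.items():
        path = path_of(ranking)
        for u, v in zip(path, path[1:]):
            flows[(u, v)] += p
    return dict(flows)

def conservation_violations(flows, X, tol=1e-12):
    """Interior nodes where inflow and outflow differ by more than tol."""
    net = defaultdict(float)
    for (u, v), f in flows.items():
        net[u] -= f
        net[v] += f
    ends = (frozenset(), frozenset(X))
    return {n: d for n, d in net.items() if n not in ends and abs(d) > tol}

X = frozenset("abc")
# Each of the 3! linear orders maps to a distinct path.
n_paths = len({path_of(r) for r in permutations(sorted(X))})

# Hypothetical distribution over rankings (best first).
ranking_probs = {("a", "b", "c"): 0.5, ("b", "c", "a"): 0.3, ("c", "a", "b"): 0.2}
flows = arc_flows(ranking_probs)
source_outflow = sum(f for (u, _), f in flows.items() if u == frozenset())

print(n_paths, source_outflow, conservation_violations(flows, X))
```

Rationalizability asks the converse question: given prescribed values on the observable arcs only, does some nonnegative flow with these conservation properties exist?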

Q: What is the algorithmic contribution for bounding unobservable choice frequencies? A: The bounds problem is formulated as a minimum-cost transshipment problem on the same network. Because the constraint matrix is the incidence matrix of a network (totally unimodular), the network simplex algorithm applies and yields exact solutions efficiently. The algorithm produces tight upper and lower bounds for each unobservable choice frequency by optimizing the flow subject to all feasibility constraints from Theorem 3.2.

Q: How does the paper demonstrate tighter bounds empirically? A: The paper applies its method to a lottery stochastic choice dataset from McCausland et al. (2020), involving 141 participants choosing from subsets of five lotteries, with six repeated choices per choice set. The authors treat two of the five lotteries as unobservable. Their network-flow bounds are significantly tighter than the trivial bounds from the outside option approach. Specifically, their method correctly identifies that lottery 4 is more desirable than lottery 3 among the unobservable alternatives, a distinction the outside option approach cannot draw because it assigns identical trivial bounds to both lotteries.

Q: What is the monotonicity-based lower bound for unobservable choice frequencies? A: Under monotonicity (a weaker condition than full RU-rationalizability), the lower bound L(x*) for the choice frequency of unobservable alternative x* from menu D is the sum over observable alternatives a of the difference rho(D \ {x*}, a) minus rho(D, a), when D \ {x*} is in the domain. This lower bound is larger when removing x* from the menu substantially increases observable choice frequencies, indicating that x* was drawing demand away from observables and is therefore relatively desirable.
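A small sketch of this lower bound, reading the smaller menu as D with x* removed. The dictionary encoding, function name, and frequencies are hypothetical and serve only to show the arithmetic.

```python
def monotonicity_lower_bound(rho, D, x_star, unobservable):
    """Sum over observable a of rho(D minus {x*}, a) - rho(D, a)."""
    smaller = D - {x_star}
    observable = [a for a in smaller if a not in unobservable]
    return sum(rho[(smaller, a)] - rho[(D, a)] for a in observable)

# Hypothetical data: a and b observable, z unobservable.
AB, ABZ = frozenset("ab"), frozenset("abz")
rho = {
    (AB, "a"): 0.6, (AB, "b"): 0.4,    # menu without z
    (ABZ, "a"): 0.5, (ABZ, "b"): 0.3,  # menu with z: observables sum to 0.8
}

bound = monotonicity_lower_bound(rho, ABZ, "z", frozenset("z"))
print(round(bound, 10))
```

Here removing z raises the observable frequencies by 0.1 each, so z must have been chosen from {a, b, z} with frequency at least 0.2. In this example z is the only unobservable alternative, so the bound coincides with the residual mass; with several unobservables it would generally be smaller than the residual.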

Q: How does this paper relate to McFadden-Richter (1990)? A: McFadden and Richter (1990) allow for menus to be unobserved but require that when a menu is observed, all its alternative frequencies are observed — a distinct setup from the present paper. Their characterization also involves infinitely many inequalities and is redundant. The present paper’s characterization uses finitely many inequalities and is nonredundant, making it more tractable both theoretically and computationally.

Q: What is the scope of the model regarding which alternatives are unobservable? A: The paper focuses on the case where the set of unobservable alternatives X* is fixed and consistent across all menus: a given alternative is either always observable or always unobservable. The domain of choice sets D is assumed to be an upper set (if a menu is in D, all supersets are too). The paper does not handle cases where observability of an alternative varies by menu.

Incomplete dataset: A nonnegative vector of choice frequencies in which, for menus containing unobservable alternatives, the observable frequencies sum to at most one (not exactly one), with the residual mass attributable to unobservable alternatives.

Block-Marschak (BM) polynomial: An inclusion-exclusion quantity K(rho, D, x) defined as the alternating-sign sum of rho(E, x) over all supersets E of D; its nonnegativity is the classical Falmagne condition for RU-rationalizability under complete observation.

Essential test collection: A collection C of choice sets used to define the novel balance condition in Theorem 3.2; for each such C, the net observable outflow of BM polynomial values across the boundary of C must be nonnegative for RU-rationalizability.

Outside option approach: The empirical practice of aggregating all unobservable alternatives into a single composite “outside option,” so that all remaining choice frequencies sum to a value below one and the residual is assigned to that composite. This approach retains only a subset of the testable implications of the random utility model.

Nonredundant characterization: A system of inequality conditions in which no single inequality is implied by the conjunction of all others; every inequality is independently binding for some dataset. This property is essential for identifying precisely which implications the outside option approach discards.

Network flow representation: A directed network on the power set of alternatives (source: empty set, terminal: full set X) in which each source-to-terminal path encodes a linear order, flow conservation corresponds to probability conservation, and feasibility of a flow with prescribed values on observable arcs is equivalent to RU-rationalizability.

Minimum-cost transshipment problem: The optimization problem used to compute tight bounds on unobservable choice frequencies; tractable via the network simplex algorithm because the constraint matrix is totally unimodular (the incidence matrix of a network).

Rationing by Race

Singh and Venkataramani ask whether resource scarcity causes discriminatory rationing of health care by patient race, with patient death as the starkest possible outcome of biased allocation decisions. They examine 107,221 inpatient admissions from 2015 to 2018 at two large urban academic teaching hospitals (each with over 500 beds) in a Southeastern U.S. city with a sizable Black population. Black patients accounted for 60% of admissions, were on average younger (52 vs. 59 years), more likely to be female (65% vs. 50%), and had similar comorbidity burdens and baseline in-hospital death rates (approximately 2% for both groups), but waited over two hours longer on average for an inpatient bed and were 27% less likely to be admitted to the ICU.

The authors exploit quasi-exogenous hour-to-hour variation in hospital capacity strain — measured as the share of inpatient beds occupied at the hour of a patient’s arrival — which clinical and qualitative literature establishes is difficult to predict even day-to-day. Capacity strain is coded in hospital-specific deciles (beds filled ranged from 69–78% in decile 1 to 91–95% in decile 10). The core regression interacts patient race with strain decile, controlling for hospital-specific hour-of-day, day-of-week, month-of-year, and year fixed effects; physician-of-record fixed effects; and a rich vector of patient characteristics including Elixhauser comorbidity indices, insurance status, and vital signs. Identification rests on the assumption that strain at the hour of arrival is conditionally independent of unobserved patient characteristics correlated with race — an assumption validated through balance tests on demographics, comorbidities, vital signs, machine-learning-derived admission themes, and selective discharge patterns.

The main finding is that in-hospital mortality rises for Black patients but not for White patients as hospitals approach capacity. At the tenth decile of strain, Black patients face a mortality rate 0.7 percentage points higher than White patients — a 47.6% relative increase over the 1.47% White mortality rate at the same decile. A pooled difference-in-differences estimate implies that approximately 15% of Black patient deaths at high strain (decile 10) would not have occurred had Black patients faced the same strain-mortality relationship as White patients (coefficient 0.0052, p = 0.025). This pattern is concentrated among patients with the greatest ex ante medical need as measured by above-median Elixhauser mortality index scores (a score with AUC of 0.92 for predicting in-hospital mortality) and, in qualitatively similar but less precisely estimated form, by abnormal vital signs at arrival.

The authors identify wait time for an inpatient bed as the primary mechanism. At all levels of capacity strain, high-need Black patients wait longer than low-need White patients, a pattern the authors characterize as a striking inversion of any need-based allocation principle. Racial disparities in wait times widen further at the highest decile of strain, exactly mirroring the mortality pattern. As an additional, more suggestive mechanism, the authors analyze free-text clinical documentation (the Reason for Admission field) using descriptive text features (time to completion, character count, average word length), sentiment analysis (subjectivity and polarity scores via TextBlob), and adjective counts. Documentation for Black patients exhibits features consistent with lower provider effort at all strain levels (shorter notes and shorter times to completion), and the subjectivity of notes and adjective counts diverge further by race at the highest strain decile, with White patients receiving increasingly detailed and descriptive notes as strain rises.

The findings are robust across specifications ranging from sparse models (age, gender, hospital fixed effects only) to fully saturated ones (DRG fixed effects, interactions of all controls with race and strain), and to replacing Elixhauser index composites with their 31 individual comorbidity components. The authors explicitly scope their findings to a pre-COVID-19 period (2015–2018), while noting that pandemic-era record capacity strain and racial disparities in health outcomes suggest de facto race-based rationing may have been far more severe during COVID-19.

Q: What is the central research question and why is the health care setting chosen? A: The paper asks whether increasing resource scarcity causes discriminatory rationing on the basis of race in consequential, high-stakes real-world decisions. Health care is chosen because it is high-stakes (patient death is the outcome), has a long documented history of racial discrimination at both provider and system levels, and offers uniquely detailed time-stamped electronic health record data that enables identification from hour-to-hour variation in capacity strain — a finer temporal resolution than most prior work.

Q: How is hospital capacity strain measured and what is the identifying variation? A: Strain is measured as the total number of patients occupying inpatient beds at the specific hour of a patient’s arrival, converted into hospital-specific deciles. The first decile corresponds to 69–78% of beds filled and the tenth decile to 91–95%. The identifying variation is residual hour-to-hour fluctuation in this measure after removing hospital-specific hour-of-day, day-of-week, month-of-year, and year fixed effects, which absorbs all predictable capacity patterns. Clinical and qualitative evidence establishes that even day-to-day strain is difficult to anticipate, making hour-to-hour residual variation plausibly as-if random.
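
A minimal sketch of how hospital-specific strain deciles could be coded from hourly occupancy is shown below. The function name, the evenly spaced occupancy values, and the hospital labels are hypothetical; the paper's actual construction may differ in detail (for example, in how ties are handled).

```python
import numpy as np


def strain_deciles(occupancy_share, hospital_id):
    """Assign each arrival a decile (1-10) of occupancy within its own hospital."""
    occupancy_share = np.asarray(occupancy_share, dtype=float)
    hospital_id = np.asarray(hospital_id)
    deciles = np.empty(len(occupancy_share), dtype=int)
    for h in np.unique(hospital_id):
        mask = hospital_id == h
        # Nine interior cut points computed from this hospital's own distribution.
        cuts = np.quantile(occupancy_share[mask], np.linspace(0.1, 0.9, 9))
        deciles[mask] = np.searchsorted(cuts, occupancy_share[mask], side="right") + 1
    return deciles


# Hypothetical occupancy at arrival for two hospitals with different ranges.
occupancy = np.concatenate([np.linspace(0.69, 0.95, 100), np.linspace(0.60, 0.90, 100)])
hospital = np.array(["A"] * 100 + ["B"] * 100)
deciles = strain_deciles(occupancy, hospital)
```

Because the cut points are computed separately for each hospital, decile 10 means "unusually full for this hospital" rather than a common occupancy threshold.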

Q: What are the main mortality findings, and how large are the racial disparities at peak strain? A: At the tenth decile of capacity strain, Black patients face a mortality rate 0.7 percentage points higher than White patients, representing a 47.6% relative increase over the 1.47% White mortality rate at that decile. The pooled difference-in-differences estimate (comparing decile 10 to deciles 1–9) implies that approximately 15% of Black patient deaths at high strain would not have occurred if Black patients had the same strain-mortality relationship as White patients (coefficient 0.0052, p = 0.025). White patient mortality does not increase at high strain; if anything, small (imprecisely estimated) decreases appear at deciles 7–9.
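
The relative-increase figure follows directly from the two reported numbers; a quick check of the arithmetic, with variable names chosen only for illustration:

```python
white_mortality_pct = 1.47   # White mortality rate at decile 10, in percent
racial_gap_pp = 0.7          # Black-White gap at decile 10, in percentage points

# Relative increase = gap divided by the White baseline rate.
relative_increase_pct = 100 * racial_gap_pp / white_mortality_pct
print(round(relative_increase_pct, 1))  # 47.6
```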

Q: Which patients drive the racial mortality disparity? A: The disparity is concentrated among patients with above-median Elixhauser mortality index scores — the ex ante sickest patients. The Elixhauser Mortality Index has a predictive AUC of 0.92 for in-hospital mortality. At decile 10, high-need Black patients experience a sharp increase in mortality not seen for high-need White patients or for low-need Black patients. Qualitatively similar but less precisely estimated results appear when acute need is measured by abnormal vital signs at arrival, with the difference that the triple interaction (race × strain × high-need vitals) is not statistically significant, consistent with vital signs being noisier proxies for severity than the Elixhauser indices.

Q: How do the authors validate the identifying assumption that strain is conditionally independent of patient composition by race? A: They document five types of supporting evidence: (i) the distribution of Black and White patients across hours of arrival and across strain deciles is nearly identical; (ii) regressions of patient demographics, all five Elixhauser comorbidity measures, and five vital signs abnormalities on race × strain interactions show no significant differential selection by race at different strain levels; (iii) machine-learning (Latent Dirichlet Allocation) topic themes from free-text admission notes change similarly by strain for Black and White patients; (iv) there is no evidence of selective discharge to hospice care by race and strain, with point estimates running counter to the hypothesis; and (v) strain is computed at time of arrival to the hospital rather than time of admission to an inpatient bed, preserving exogeneity.

Q: What is the primary identified mechanism for the mortality finding? A: Wait time for an inpatient bed is the primary mechanism. Black patients experience greater increases in wait times as strain rises compared to White patients, with the clearest divergence at decile 10 — exactly mirroring the mortality pattern. More strikingly, at every decile of strain (including decile 1, when beds are most abundant), high-need Black patients wait longer for a bed than low-need White patients, implying that the disparity is not solely a product of logistical constraints but reflects ingrained factors in clinical protocols, likely including implicit or explicit provider bias.

Q: What does the wait time evidence reveal about the role of medical need vs. race in allocation decisions? A: At lower strain levels, low-need patients appropriately wait longer than high-need patients. However, at higher strain levels (deciles 8–10) this need-based gap almost entirely disappears, while the racial gap in wait times persists. The gap between high-need Black and low-need White patients is larger than the gap between high-need and low-need patients of the same race, meaning race is a stronger predictor of wait times than medical need. This pattern is consistent with the paper’s conceptual framework in which increasing strain reduces providers’ ability to accurately assess medical need while increasing the weight assigned to racial identity.

Q: How is provider effort measured and what are the findings? A: Provider effort is inferred from features of free-text Reason for Admission documentation: time to completion, character count, average word length, TextBlob subjectivity and polarity scores, and adjective counts. Across all strain levels, Black patients’ documentation exhibits features consistent with lower effort — shorter completion times (providers less likely to defer documentation for clinical tasks), shorter notes with fewer characters and shorter words. At the highest strain decile, subjectivity scores for Black patients’ notes increase relative to White patients’ (driven by both rising Black and falling White subjectivity), and White patients receive more adjectives as strain rises while Black patients’ adjective counts do not increase. Polarity scores remain stable by race and strain.

Q: What do the documentation patterns suggest about compensatory behavior by providers? A: The authors speculate that providers may anticipate reduced care quality at high strain and compensate by becoming more conscientious with White patients — writing longer, more detailed, more descriptive notes as strain increases, and potentially exerting greater care effort correlated with these documentation improvements. This protective compensatory behavior appears substantially less pronounced or absent for Black patients, which the authors suggest may translate into the small imprecisely estimated decrease in White patient mortality at higher strain deciles. They explicitly characterize this interpretation as speculative and requiring further investigation.

Q: How robust are the main mortality findings to specification choices? A: The mortality findings hold across: (i) sparse models with only age, gender, and hospital/year fixed effects; (ii) linear probability and logistic models; (iii) models with DRG fixed effects to compare within-diagnosis; (iv) models interacting all control variables with patient race and strain; (v) models replacing the Elixhauser composite index with its 31 individual comorbidity components; and (vi) models additionally controlling for five individual abnormal vital sign indicators. Results are substantively unchanged across all these specifications.

Q: What additional care intensity measures are examined and what do they show? A: The authors also examine ICU admission, ICU length of stay, total inpatient length of stay, and inpatient charges. They find no strain-related racial disparities on these margins. However, they note that unconditionally (across all strain levels), Black patients receive fewer resources on average — they are 27% less likely to be admitted to the ICU. The authors treat these care intensity measures as harder to interpret because both over- and under-provision can harm patients, and thus view them as less informative for their research question.

Q: What conceptual framework guides the empirical predictions? A: The framework models providers as assessing perceived medical need Nij(t) = Ni × exp(−γ × S(t)), where the parameter γ captures the diminishing ability to accurately assess true need as strain S(t) rises. Simultaneously, the racial weight Rij(t) = Ri × φ(S(t)) increases with strain through the function φ, which is increasing in S(t). When γ = 0 and φ = 0, allocation is race-neutral and need-based. When both are positive, increasing strain simultaneously degrades need assessment and amplifies reliance on racial identity in allocation decisions. This is the paper’s core prediction, which is confirmed empirically.
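
The two expressions in this framework can be sketched numerically. The summary does not give a functional form for φ, so the linear form below (φ(S) = phi_slope × S) is purely an assumption for illustration, as are the function names and parameter values.

```python
import math


def perceived_need(true_need, strain, gamma):
    """N(t) = N_i * exp(-gamma * S(t)): perceived need decays as strain rises."""
    return true_need * math.exp(-gamma * strain)


def racial_weight(base_weight, strain, phi_slope):
    """R(t) = R_i * phi(S(t)), with the ASSUMED linear form phi(S) = phi_slope * S."""
    return base_weight * phi_slope * strain


low_strain, high_strain = 0.70, 0.95  # hypothetical occupancy shares

# Race-neutral benchmark: gamma = 0 and phi = 0.
neutral_need = perceived_need(10.0, high_strain, gamma=0.0)    # equals true need
neutral_weight = racial_weight(1.0, high_strain, phi_slope=0.0)  # zero weight on race

# Both parameters positive: need is under-assessed and race gains weight as strain rises.
need_low = perceived_need(10.0, low_strain, gamma=2.0)
need_high = perceived_need(10.0, high_strain, gamma=2.0)
weight_low = racial_weight(1.0, low_strain, phi_slope=0.5)
weight_high = racial_weight(1.0, high_strain, phi_slope=0.5)
```

Any increasing φ would produce the same qualitative pattern; the linear form is only the simplest choice.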

Q: How do the findings relate to the COVID-19 pandemic? A: The data predate COVID-19 (2015–2018). The authors argue that pandemic conditions — record hospital capacity strain (especially in hospitals serving Black patients), extreme provider burnout, and documented racial disparities in health access — suggest race-based rationing may have been considerably more severe during COVID-19. The paper also contextualizes its findings within the pandemic-era debate over whether explicit race-based triage protocols were ethical or legal, arguing that de facto rationing by race appears to occur in ordinary care settings under typical stressors irrespective of that normative debate.

Q: What policy interventions do the authors suggest? A: The authors propose: increasing provider awareness of implicit biases; developing new algorithms to improve triage decisions for high-mortality-risk patients who might otherwise be overlooked; correcting existing care algorithms with documented racial bias; building provider peer networks to reduce biased treatment decisions; supporting patient self-advocacy; improving capacity prediction systems (as spurred by COVID-19); and creating load-shifting protocols and inter-hospital transfer networks to prevent resources from being stretched beyond capacity during high-strain periods.

Capacity strain: The state of a hospital when a high share of inpatient beds are occupied, measured here at the hour of patient arrival as hospital-specific deciles of bed occupancy (ranging from 69–78% full at decile 1 to 91–95% full at decile 10); the paper’s primary measure of resource scarcity.

Rationing by race: The paper’s term for the phenomenon whereby, as resource scarcity deepens, allocation decisions increasingly reflect patient racial identity rather than medical need — a form of discriminatory rationing that the authors distinguish from explicit (de jure) race-based triage and document as de facto practice.

Perceived need (N*): In the paper’s conceptual framework, the provider’s assessment of a patient’s medical need, which deviates from true need Ni by the factor exp(−γ × S(t)) as strain S(t) increases; captures the provider team’s diminishing ability or willingness to accurately assess true medical need under cognitive and resource constraints.

Racial weight (R*): The weight assigned to a patient’s racial identity in allocation decisions, modeled as Ri × φ(S(t)), where the function φ is increasing in capacity strain; represents the potential for discrimination — from implicit bias, algorithmic bias, reduced patient advocacy, or provider-patient social distance — to intensify as strain rises.

Wait time inversion: The condition, documented throughout the paper, where high-need Black patients wait longer for an inpatient bed than low-need White patients at every decile of capacity strain, including decile 1 when resources are most abundant — inverting the normative principle that greater medical need should yield faster access to care.

Elixhauser Mortality Index: A widely validated composite score of patient comorbid conditions used to predict in-hospital mortality (AUC = 0.92); used in this paper as the primary measure of chronic medical need, with patients split at the median into relatively sick (above median) and relatively healthy (below median) groups.

Provider effort (inferred): An unobserved construct inferred in this paper from features of free-text clinical documentation in the Reason for Admission field, including time to note completion, character count, average word length, TextBlob subjectivity and polarity scores, and adjective counts; features argued to reflect how much attention, detail, and care a provider invested in documenting — and by extension, in assessing — a patient’s condition.

Risk Sharing Tests and Covariate Shocks: Drought, Floods, and Pests in Uganda

This paper identifies and corrects a fundamental flaw in the standard methodology for testing efficient risk-sharing when shocks are covariate (affecting common prices rather than only individual incomes). The standard Townsend (1994) approach infers marginal utilities of expenditure (MUEs) from total expenditures, which implicitly assumes homothetic preferences, specifically Constant Relative Risk Aversion (CRRA), under which all goods have unitary income elasticities and a single scalar price index captures all price effects. Ligon demonstrates that this assumption causes the standard test to fail when applied to covariate shocks such as droughts, floods, and agricultural pests, because these shocks change relative prices in ways that cannot be captured by a single price index. The perverse consequence is that in Ugandan data, every covariate shock (drought, floods, pests, and adverse prices) appears to improve household welfare under the CRRA specification (positive coefficients of 0.046, 0.097, 0.095, and 0.103 respectively, all significant at p<0.01), a result the paper argues is mechanically induced by the mis-specification rather than reflecting reality.

The paper makes two core theoretical contributions. First, it characterizes the complete class of preferences that permit MUE inference from expenditure data alone — specifically, requiring that item-level expenditures be “lambda-separable” (additively separable in the MUE and prices). Solving the resulting functional equations yields exactly two families of semiparametric demand systems: Constant Frisch Elasticity (CFE) demands (a generalization of CRRA) and Generalized Stone-Geary demands. Only CFE demands are tractable for panel estimation. Second, the paper shows that under CFE preferences, log expenditures on each good j follow the system: log x^j_it = a_j(p_t) + g_j(z_it) + beta_j * w_it + epsilon^j_it, where beta_j is the good-specific Frisch elasticity and w_it = -log lambda_it is the negative log MUE. This allows price effects to enter flexibly through good-time fixed effects rather than a single index, and MUEs to be recovered via factor analysis on the residual covariance matrix.

The empirical work uses eight waves of the Ugandan National Panel Surveys (2005–2020), an unbalanced panel of 5,601 distinct households yielding 22,791 usable household-year observations across 41 consumption goods (primarily food items). Uganda is divided into four regional markets, producing 32 market-year cells and 1,312 market-year-good dummies. Estimated Frisch elasticities vary substantially across goods — passion fruit is roughly three times as income elastic as cassava — emphatically rejecting the hypothesis of equal elasticities required by CRRA.

Using CFE-estimated MUEs, the risk-sharing test shows that none of the four covariate shocks has a significant effect on welfare (CFE coefficients: drought 0.010, floods 0.035, pests 0.041, adverse prices -0.043, all insignificant). The pattern holds across all time windows from 0–12 months: 42 of 52 covariate shock coefficients are significant and positive in the CRRA specification, versus only 4 of 52 in the CFE specification — barely above the 2.6 false positives expected under the null. These findings indicate that the welfare impacts of covariate shocks in Uganda operate primarily through the common price channel rather than through idiosyncratic income variation, meaning they are broadly shared within market-regions. Idiosyncratic income shocks, by contrast, show the expected pattern: they reduce welfare significantly in both specifications (CFE: 0.050***, CRRA: 0.071***), and health shocks are significant only in CFE (−0.059**).

Q: Why does the standard CRRA risk-sharing test fail for covariate shocks? A: Under CRRA preferences, MUEs depend on total expenditures only through a single scalar price index pi(p). When a covariate shock raises prices of inelastic goods (primarily food), total food expenditures increase even as actual consumption quantities fall. Because risk-sharing tests based on CRRA total expenditures cannot separate this price effect from a welfare improvement, the shock appears to raise welfare. The disturbance term in the CRRA TWFE regression depends on the very prices affected by covariate shocks, violating the exclusion restriction.

Q: What is the lambda-separability condition, and why does it matter? A: Lambda-separability requires that for each good j, some transformation phi_j of expenditures on that good can be written as the sum of a function of prices and a function of the MUE: phi_j(x_j(p,lambda)) = a_j(p) + b_j(lambda). This property is necessary for time fixed effects to absorb price variation and household fixed effects to absorb Pareto weights, which is the identification strategy behind all TWFE risk-sharing tests. Without it, no panel estimator using only expenditure data can consistently recover MUEs.

Q: What are the two demand families that satisfy lambda-separability, and what distinguishes them? A: Theorem 1 establishes that rationalizable lambda-separable demands must belong to either the Constant Frisch Elasticity (CFE) family or the Generalized Stone-Geary family. In CFE demands, log expenditures on each good equal the log of a price function minus beta_j times log lambda, where beta_j is a good-specific constant Frisch elasticity. The Stone-Geary family has a more complex nonlinear form that does not lend itself to linear estimation of log MUEs, making CFE the tractable choice. Both families nest CRRA as the special case where all beta_j are equal.

Q: How are MUEs estimated from the CFE system in practice? A: Estimation proceeds in two steps. First, log expenditures on each good are regressed on good-time-market effects and household demographic controls to obtain residuals. Second, the covariance matrix of these residuals has the factor structure Sigma = Var(w) * beta*beta’ + Psi, where beta is the vector of Frisch elasticities; the rank-one matrix beta*beta’ is recovered from the sample covariance matrix via factor analysis, and household-level MUEs are then obtained by regression using the estimated beta as generated regressors.
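
The two-step logic can be sketched on simulated data. This is not the paper's estimator: the first step is reduced to demeaning by good (a stand-in for good-time-market effects with a single cell and no demographics), and the factor-analysis step is replaced by the leading eigenvector of the sample covariance matrix, which recovers beta up to scale when the noise is homoskedastic. All parameter values and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_households, n_goods = 2000, 6

# Simulated truth: good-specific Frisch elasticities and household w = -log(MUE).
beta_true = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 2.0])
w_true = rng.normal(size=n_households)
good_effects = rng.normal(size=n_goods)
noise = 0.3 * rng.normal(size=(n_households, n_goods))
log_x = good_effects + np.outer(w_true, beta_true) + noise

# Step 1 (simplified): remove good-level effects by demeaning each column.
residuals = log_x - log_x.mean(axis=0)

# Step 2 (swapped-in technique): leading eigenvector of the residual covariance
# as a principal-component approximation to the rank-one factor beta*beta'.
sigma = np.cov(residuals, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
leading = eigvecs[:, -1]
if leading.sum() < 0:                      # fix the arbitrary sign
    leading = -leading
beta_hat = leading / leading.mean()        # scale normalization: mean elasticity = 1

# Household-level w: regress each household's residuals on beta_hat (no intercept).
w_hat = residuals @ beta_hat / (beta_hat @ beta_hat)
corr = np.corrcoef(w_hat, w_true)[0, 1]
```

The scale of beta and w is not separately identified, so the sketch normalizes the mean elasticity to one; only relative elasticities and the ranking of households by w are meaningful here.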

Q: What do the estimated Frisch elasticities reveal about preferences in Uganda? A: The Frisch elasticities beta_j vary substantially across the 41 goods in the Ugandan sample. Starchy staples and salt are least elastic (lowest beta_j), while fresh milk, sweet bananas, coffee, oranges, and passion fruit exhibit high elasticities — passion fruit is roughly three times as income elastic as cassava. The hypothesis that all elasticities are equal (the CRRA restriction) is easily rejected, providing direct evidence against homothetic preferences in this population.

Q: What direct evidence does the paper provide that droughts, floods, and pests are genuinely covariate and harmful? A: About 39% of Ugandan households reported drought in the 2005–06 round. Among drought reporters, 92% said it affected their production, 80% said it affected their income, and 50% said it affected their consumption. Drought, pests, and adverse prices (but not floods) led to statistically significant increases in local farmgate prices. Among markets experiencing covariate shocks, 82%, 74%, 44%, and 53% of t-tests rejected equality of relative food prices for drought, floods, pests, and adverse prices respectively. Dietary diversity and intake of vitamin B-12 (from animal-source foods) declined significantly following covariate shocks.

Q: How do households cope differently with covariate versus idiosyncratic shocks? A: Households experiencing covariate shocks primarily relied on self-insurance: 51% of drought-affected households reduced consumption and 45% drew on savings, with increased labor supply also reported. In contrast, households experiencing idiosyncratic shocks most often relied on help from friends and family (52%). This behavioral difference is consistent with the finding that covariate shocks affect welfare mainly through common price channels that are not individually insurable through social networks, while idiosyncratic shocks are partially absorbed via informal transfers.

Q: What do the CFE results imply about the nature of insurance against covariate shocks in Uganda? A: The CFE regression finds that none of the four covariate shocks (drought, floods, pests, adverse prices) has a statistically significant effect on household MUEs when time-market fixed effects are included. This implies that the welfare impact of covariate shocks is transmitted primarily through common price changes that affect all households in a market-region symmetrically, rather than through idiosyncratic income variation. Effectively, covariate shocks are “shared” within market-regions — but through price deterioration affecting everyone, not through informal transfers.

Q: How robust are the results across different shock time windows? A: Figure 3 shows that for the CRRA specification, any prior covariate shock 3–12 months earlier has a significant positive effect on log consumption in every month, while for the CFE specification no shock window produces a significant effect on w. In the full tabulation across all shock types and windows (Tables 4 and 5), 42 of 52 covariate shock coefficients are significant and positive in CRRA versus only 4 of 52 in CFE — the latter barely exceeding the 2.6 false positives expected under the null hypothesis of full insurance.
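
The 2.6 figure is consistent with 52 tests at a 5% significance level. The sketch below reproduces that arithmetic and, under the added assumption that the 52 tests are independent (which the summary does not claim), gives a rough sense of how unremarkable 4 rejections would be under the null.

```python
from math import comb

n_coefficients = 52
alpha = 0.05  # assumption: nominal 5% level, implied by 2.6 / 52

expected_false_positives = n_coefficients * alpha  # about 2.6


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumes independent tests."""
    return 1.0 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


# Chance of 4 or more spurious rejections out of 52 under the null.
p_four_or_more = prob_at_least(4, n_coefficients, alpha)
```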

Q: What are the policy implications of these findings for relief program design? A: Because covariate shocks affect welfare mainly through common prices within market-regions, relief programs should target communities rather than individual households, since the burden is broadly shared and not concentrated. Policies that integrate markets across regions of Uganda or connect Ugandan markets to broader African or world markets would reduce the price impact of local covariate shocks. Targeted household transfers would be less effective than interventions that stabilize regional prices or supply.

Q: What broader applicability do CFE MUEs have beyond risk-sharing tests? A: Since MUE construction is independent of the risk-sharing hypothesis, CFE-estimated MUEs can be used to estimate and test any dynamic life-cycle model that puts structure on the evolution of MUEs over time, including consumption Euler equations, intertemporal marginal rates of substitution calculations, and household bargaining models. The CFE approach requires only the same expenditure data used in the standard CRRA approach and therefore serves as a more general drop-in replacement across all settings where CRRA MUEs are currently employed.

Marginal Utility of Expenditure (MUE): The Lagrange multiplier lambda on the household budget constraint in the consumer’s optimization problem; the object whose proportionality across households (log lambda_it = log mu_t - log theta_i) characterizes efficient risk-sharing. It is a function of budget, prices, and household characteristics — not reducible to a scalar function of total expenditure except under special preference restrictions.

Lambda-separability: A property of Frischian expenditures on good j such that some transformation phi_j(x_j) can be written as the sum of a function of prices and a function of the MUE alone — phi_j(x_j(p,lambda)) = a_j(p) + b_j(lambda). This is the necessary and sufficient condition for using time fixed effects to control for prices and household fixed effects to control for Pareto weights in a TWFE risk-sharing regression based solely on expenditure data.

Constant Frisch Elasticity (CFE) expenditure system: The tractable member of the two demand families satisfying lambda-separability, characterized by log x^j_it = a_j(p_t) + g_j(z_it) + beta_j * w_it + epsilon^j_it, where beta_j is a good-specific constant elasticity of expenditures with respect to MUE. Nests CRRA as the special case of equal beta_j across all goods, but admits nonhomothetic preferences and fully flexible relative-price responses.

Frischian demands: Demands expressed as functions of prices and the MUE lambda rather than prices and budget — f(p, lambda). Homogeneous of degree zero in (p, 1/lambda), equivalently written f(p*lambda). This representation is central to the lambda-separability characterization because it separates the role of the budget (via lambda) from the role of prices directly.

Covariate shock: In this paper’s usage, a shock that affects prices common to all households in a market-region — not merely a shock affecting many households simultaneously. The key analytical distinction is that idiosyncratic shocks change individual budgets without changing prices, while covariate shocks change prices, which is what causes the standard CRRA test to fail.

Nonhomothetic preferences: Preferences for which expenditure shares vary with income (budget), so no single scalar price index can fully represent the welfare impact of price changes. The paper confirms nonhomotheticity in the Ugandan data through widely varying Frisch elasticities, and argues this is the root cause of the CRRA test’s failure for covariate shocks — a problem that does not arise when shocks are idiosyncratic and leave prices unchanged.

Rural Migrants and Urban Informality: Evidence From Brazil

Overview

Research Question. Does rural-urban migration increase or decrease urban informality, and through what mechanisms — and does the answer depend on the time horizon?

Setting and Data. The paper studies internal migration in Brazil over 2000–2010. The empirical analysis combines: (i) two waves of the Decennial Population Census (2000 and 2010) covering working-age adults (ages 15–64) across 3,548 Minimum Comparable Areas (MCAs); (ii) the universe of formal firms and workers from the matched employer-employee administrative dataset RAIS (1997–2018); (iii) the ECINF informal firm survey (2003); and (iv) the annual National Household Survey (PNAD, 2001–2009) for year-on-year short-run analysis in 700 identifiable municipalities. Internal immigration to the average urban destination was large: 17.6 percent overall over the decade, 7 percent for state-to-state migration.

Empirical Design. The authors use a shift-share instrumental variable (IV) design. The shares are pre-existing migration networks (migrant flows by origin-destination pair, 1995–2000). The shifts are drought shocks constructed from the Standardized Precipitation-Evapotranspiration Index (SPEI) interacted with agricultural crop calendars and the value share of each crop in each origin municipality — accumulated over the 2000–2010 decade. A second independent instrument uses international commodity price shocks as push factors (following a China-analogous construction); the two instruments are nearly uncorrelated across origins (0.007) and only weakly correlated across destinations (-0.3), providing an independent validation.
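The shift-share construction can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: the origin and destination labels, the share values, and the shock values are all hypothetical, and normalizing shares to sum to one within each destination is an assumption.

```python
# Hypothetical toy data: two origins (o1, o2) and two destinations (d1, d2).
# shares[d][o]: fraction of destination d's 1995-2000 migrants from origin o.
shares = {
    "d1": {"o1": 0.8, "o2": 0.2},
    "d2": {"o1": 0.1, "o2": 0.9},
}
# shocks[o]: accumulated 2000-2010 drought exposure at origin o (made-up units).
shocks = {"o1": 2.0, "o2": 0.5}


def shift_share(shares, shocks):
    """Predicted push-driven inflow: sum over origins of share times shock."""
    return {
        d: sum(w * shocks[o] for o, w in by_origin.items())
        for d, by_origin in shares.items()
    }


instrument = shift_share(shares, shocks)
print(instrument)
```

In this toy data, d1 draws most of its network from the harder-hit origin o1, so its predicted inflow is larger than that of d2.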

Long-Run Findings (decadal changes, 2000–2010). A one-percentage-point increase in the immigration rate (equal to 18.5 percent of a standard deviation):

Increases the share of workers in formal wage employment by 0.27 percentage points (a 1.2 percent increase from the mean of 23 percent).
Decreases the share in informal wage employment by 0.29 percentage points (a 2.9 percent decrease from the mean of 10 percent).
Has no effect on overall wage employment, unemployment, or self-employment — the formalization effect is a reallocation from informal to formal jobs, not net job creation.
Reduces formal sector wages by 0.6 percent, with no effect on informal wages.
Increases the number of formal establishments by 1.6 percent and the number of formal jobs by 2 percent.
Raises gross firm entry by 2.8 percent and gross firm exit by 3 percent (higher churn), with effects stable or slightly increasing through 2017–18.

These firm-creation effects are not driven by migrants starting businesses: migrants are not more likely to be business owners in high-immigration municipalities.

Short-Run Findings. Using year-on-year specifications with the PNAD (2001–2009), the authors replicate the results in the prior literature: municipalities receiving more migrants experience a reduction in formal wage employment, with no change in informal employment or non-employment — so the share of informal jobs rises. These short-run informality-increasing effects coexist with the long-run formalization results, and are not a sample artifact (the long-run results are unchanged when restricted to the same 700 PNAD municipalities).

Mechanism: Downward Nominal Wage Rigidity (DNWR). DNWR in the formal sector is the key mechanism reconciling short- and long-run effects. In Brazil, nominal wage cuts were illegal, and the national minimum wage rose regularly during the 2000s. Two municipality-level DNWR proxies are used: (i) the Kaitz index (national minimum wage / municipality median wage in 2000), where a higher value indicates more binding rigidity; (ii) the share of workers with negative year-on-year nominal wage changes (from RAIS, 1997–2000), where a lower share indicates more binding rigidity. In municipalities with higher DNWR: the positive formalization effects of immigration are smaller or fully muted; non-employment increases; and formal wages decline less. These cross-sectional patterns echo the Harris-Todaro-Fields prediction, and are consistent with DNWR being more binding in the short run (when nominal rigidities bind) than in the long run (when inflation and worker turnover allow real wage adjustment).
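The two DNWR proxies are simple ratios, and a minimal sketch may make their construction concrete. All wage figures below are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
def kaitz_index(national_min_wage, local_median_wage):
    """Kaitz index: national minimum wage relative to the local median wage."""
    return national_min_wage / local_median_wage


def share_negative_changes(wage_pairs):
    """Share of workers whose nominal wage fell from one year to the next.

    wage_pairs is a list of (wage_previous_year, wage_current_year) tuples.
    """
    cuts = sum(1 for previous, current in wage_pairs if current < previous)
    return cuts / len(wage_pairs)


# Hypothetical municipality: minimum wage 150, local median wage 300.
kaitz = kaitz_index(150.0, 300.0)

# Hypothetical worker-level nominal wages in two consecutive years.
pairs = [(300.0, 310.0), (200.0, 200.0), (250.0, 240.0), (400.0, 420.0)]
negative_share = share_negative_changes(pairs)
print(kaitz, negative_share)
```

A municipality with a high Kaitz index or a low share of nominal wage cuts would be classified as facing more binding rigidity.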

Model. The paper develops and estimates a dynamic model of firm dynamics and informality, extending the canonical Hopenhayn framework with (i) two margins of informality — the extensive margin (whether a firm registers) and the intensive margin (whether a registered formal firm hires workers formally) — and (ii) heterogeneous long-run productivity parameters (nu) that generate firm-specific life-cycle growth profiles. Formal firms cannot revert to informality; informal firms can formalize by paying the cost differential between formal and informal entry costs.

Counterfactuals. A simulated once-and-for-all 10 percent labor supply shock (approximately the 80th percentile of observed immigration shocks) produces: a 4.1 percent decline in the share of informal workers (IV: 7.5 percent); a 16.1 percent increase in formal firms (IV: 21.1 percent); and a 3.4 percent wage decline (IV: 5 percent). Of the increase in formal firms, 40 percent is accounted for by formalization of previously informal firms, highlighting the stepping-stone role of informality that a static or dual-economy model would miss. Average firm productivity declines by 1.4 percent due to worsening firm composition (the share of formal firms in the lowest productivity quartile rises by more than 4 percentage points). A counterfactual that nearly eliminates the extensive margin of informality (via steep enforcement costs) raises total output by 8.6 percent vs. 7 percent in the baseline shock, and increases average firm productivity by 2.1 percent vs. a decline of 1.4 percent — at the cost of displacing the least productive informal firms.

Scope Conditions. Results pertain to internal (not international) migration; drought-induced migrants do not change the skill composition of the labor force at destination, justifying a homogeneous worker assumption. The formalization effects hold for migrants and non-migrants separately, and for high- and low-skilled workers separately. The model is calibrated to the average urban destination in Brazil, not a spatial general equilibrium.

In depth

Q1. What is the identification strategy, and what are the key threats to validity the authors address?

The authors use a shift-share IV where shifts are drought shocks at origin municipalities (constructed from SPEI x crop calendar x crop revenue share, accumulated over 2000–2010) and shares are pre-2000 migration networks. Threats addressed: (i) pre-trends — no evidence of differential pre-trends in firm outcomes between 1997–98 and 1999–2000; (ii) demand channel — controlling for local drought shocks and distance-weighted neighboring shocks leaves results unchanged; (iii) capital reallocation — adding a bank-network-based shift-share control (following prior literature) does not change results; (iv) agricultural processing linkages — results hold after excluding agricultural firms and food/beverage/tobacco manufacturers; (v) migration persistence — controlling for baseline log population and 1995–2000 migration rates leaves results unchanged. The commodity-price-shock instrument provides an independent validation, yielding similar results despite near-zero cross-origin correlation with drought shocks and only -0.3 correlation across destinations.

Q2. How do the authors reconcile the long-run formalization result with the short-run informality-increasing result, and what role does DNWR play?

DNWR is the key mechanism. Nominal wage cuts are illegal in Brazil’s formal sector, and the minimum wage rose through the 2000s, making DNWR binding especially in the short run. In the year-on-year specification (PNAD, 2001–2009), immigration reduces formal wage employment with no change in informal employment, raising the informal share — consistent with prior literature. Over the decade, inflation and worker turnover permit real formal wage adjustment, enabling formal sector expansion. Cross-sectional heterogeneity confirms this: in municipalities with above-median Kaitz index or below-median share of negative wage changes, the formalization effect of immigration is smaller or zero, and non-employment rises — precisely the Harris-Todaro-Fields prediction for rigid-wage environments.

Q3. What is the exact magnitude of the firm-level effects and how persistent are they?

A one-percentage-point increase in the immigration rate increases formal establishments by 1.6 percent, formal jobs by 2 percent, firm entry by 2.8 percent, and firm exit by 3 percent — all decadal effects (1999–2000 to 2011–12). Effects on firms, entry, exit, and jobs remain stable or slightly increasing through 2017–18 as estimated using RAIS panel data, with no evidence of pre-trends (effects near zero in 1997–98 to 1999–2000 period). The effect on firm-level average wages is negative (consistent with the worker-level wage effect) but not statistically significant.

Q4. Are migrants themselves the source of new formal firm creation?

No. The authors directly test and reject this channel. Migrants are not more likely to be business owners — either of small firms (fewer than 5 employees) or larger firms (6 or more employees) — in municipalities that receive more immigration. The increase in formal firm entry is driven by non-migrants responding to cheaper labor.

Q5. What are the two margins of informality in the model, and why does the intensive margin matter for the migration-formality nexus?

The extensive margin is whether a firm registers formally (firm-level binary). The intensive margin is whether a formally registered firm hires workers without formal labor contracts (worker-level, within formal firms). The intensive margin is crucial because it links formal firms to migrants: newly arrived migrants may take informal jobs within formal firms, allowing formal firm creation to respond to the immigration shock even before the labor market fully formalizes. In the transition dynamics after an immigration shock with DNWR, new formal firms tend to be small and lower-productivity, and hire a substantial fraction of their workforce informally — so labor informality hovers near its initial level for several years even as firm informality declines quickly.

Q6. What fraction of the increase in formal firms in the counterfactual comes from stepping-stone formalization versus new formal entry?

In the baseline 10 percent labor supply counterfactual, approximately 40 percent of the increase in the number of formal firms comes from formalization of previously informal firms across their life cycles. The remaining 60 percent comes from new formal firm creation. A static framework would miss the stepping-stone channel entirely and substantially underestimate total formalization.
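As a rough arithmetic sketch, the split can be applied to the 16.1 percent model-implied increase in formal firms reported for the same counterfactual. Applying the approximate 40 percent share directly to the percent increase is an assumption of this illustration.

```python
# Figures quoted in the text for the baseline 10 percent labor supply shock.
formal_firm_growth_pct = 16.1   # model-implied increase in formal firms
stepping_stone_share = 0.40     # approximate share due to formalization

# Split the total increase into the two channels, in points of growth.
from_formalization = formal_firm_growth_pct * stepping_stone_share
from_new_entry = formal_firm_growth_pct - from_formalization
print(round(from_formalization, 2), round(from_new_entry, 2))
```

Under that reading, roughly 6.4 points of the 16.1 percent increase come from previously informal firms formalizing, and roughly 9.7 points from new formal entrants.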

Q7. How does the model’s calibration pin down the cost structure of informal vs. formal firms?

The model is calibrated using a two-step minimum distance procedure. First-step parameters include the persistence of formal firms’ productivity process (estimated from RAIS: rho_f = 0.92), and statutory tax rates (payroll tax tau_w = 0.375; revenue VAT tau_y = 0.293). Second-step parameters (12 total, including entry costs, exogenous death rates, productivity dispersion, and cost-function curvatures for both margins of informality) are estimated by minimizing the distance between simulated and observed moments from RAIS (2003 cross-section for static moments; 2000–2011 panel for growth moments) and ECINF (informal firms with up to 5 employees, 2003). Key calibrated values: formal entry costs are more than twice informal entry costs and correspond to over 30 times the 2003 monthly national minimum wage; the informal sector exogenous death rate (delta_i = 0.148) is more than twice the formal rate; productivity variance and persistence are similar across sectors.
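The logic of minimum distance estimation can be shown with a one-parameter toy. This is a sketch under strong assumptions, not the paper's procedure: the mapping from a death rate to survival moments, the identity weighting, and the grid search are all illustrative stand-ins, and the "observed" moments are generated from the quoted value delta_i = 0.148 rather than from data.

```python
def simulated_moments(delta):
    """Toy model: with a constant exogenous death rate delta, the share of
    firms surviving 1 and 5 years is (1 - delta) and (1 - delta) ** 5."""
    return [(1.0 - delta), (1.0 - delta) ** 5]


def distance(delta, observed):
    """Sum of squared gaps between simulated and observed moments
    (identity weighting matrix, an assumption of this sketch)."""
    return sum((s - o) ** 2 for s, o in zip(simulated_moments(delta), observed))


def estimate_delta(observed, steps=1000):
    """Grid search over delta in [0, 1] for the minimum-distance value."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda d: distance(d, observed))


# "Observed" moments generated at delta = 0.148 for illustration only.
observed = simulated_moments(0.148)
delta_hat = estimate_delta(observed)
print(delta_hat)
```

The estimator recovers the parameter that generated the moments; with 12 second-step parameters, as in the paper, a numerical optimizer would replace the grid.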

Q8. What happens to firm productivity and output per worker in the long-run counterfactual?

Average firm productivity declines by 1.4 percent despite lower informality. The composition of formal firms worsens: the share of firms in the lowest productivity quartile rises by more than 4 percentage points, while the share in the top quartile falls by about 3 percentage points. Total output and tax revenues increase (7 and 8.6 percent, respectively), but both decline in per capita terms. The authors note these are likely lower bounds because the model assumes no technological differences between formal and informal sectors and no differential capital access.

Q9. What does the enforcement counterfactual reveal about the dual role of informality?

When the extensive margin of informality is nearly shut down (by making the informal cost function very steep), a 10 percent labor supply shock produces: output increase of 8.6 percent (vs. 7 percent with informality present); average firm productivity increase of 2.1 percent (vs. decline of 1.4 percent); much higher tax revenues due to greater formality. However, this comes at the cost of a sizable reduction in total firm count as the least productive informal firms are displaced. This illustrates the dual role: in the short run, the informal sector acts as an employment buffer and stepping-stone, which is more important when formal wage rigidity is stronger; but in the long run, it dampens aggregate economic benefits from immigration by sheltering low-productivity firms.

Q10. Do the results hold for both migrants and non-migrants, and across skill levels?

Yes. Appendix results show similar employment and wage effects for migrants and non-migrants separately, though formal wage declines are more pronounced for non-migrants. Results are also similar for high- and low-skilled workers — which the authors attribute to the fact that drought-induced migration does not change the skill composition of the workforce at destination (confirmed empirically). Price-shock-induced migrants differ: they are more likely to be young and male, and do change workforce composition, providing a different set of compliers that strengthens external validity.

Q11. How does the paper relate to the “startup deficit” literature on demographic decline?

The paper’s findings are the mirror image of the US startup deficit literature, which argues that demographic slowdown reduced firm entry, labor reallocation, and employment growth. The magnitudes are comparable in scale: the US startup deficit corresponds to a 5-percentage-point decline in firm entry between 1980 and 2012, while the rural-urban migration shocks studied here produce first-order effects on firm entry of similar or larger magnitude (2.8 percent per percentage point of immigration rate), suggesting labor supply growth is a primary driver of formal firm dynamics in both directions.

Key Concepts

Downward Nominal Wage Rigidity (DNWR). In the paper’s usage, the binding constraint that formal sector wages cannot be cut in nominal terms — in Brazil, both legal prohibition of nominal wage cuts and a rising national minimum wage. DNWR is the paper’s central mechanism explaining why immigration increases informality in the short run (wages cannot adjust) but reduces it over the decade (inflation and turnover permit real adjustment). Measured empirically via the municipality-level Kaitz index (national minimum wage / local median wage) and via the share of workers with negative year-on-year nominal wage changes in RAIS.

Extensive Margin of Informality. Whether a firm is registered with the government (formal) or not (informal). In the model, informal firms can avoid taxes but face a size-increasing cost of informality and the option to formalize by paying the difference in entry costs. This margin captures the firm’s legal registration status.

Intensive Margin of Informality. Whether a formally registered firm hires individual workers with or without formal labor contracts (signed work booklet, carteira de trabalho). Formal firms face increasing costs for informal hiring but exploit this margin for lower-cost labor, especially when small or young. This margin is critical because it links formal firms to migration-induced informal labor supply and allows formal firms to absorb migrants before full wage adjustment occurs.

Stepping-Stone Role of Informality. The paper’s term for the dynamic channel through which the informal sector facilitates transitions to formality for both firms and workers. Informal firms accumulate productivity experience and formalize when productivity crosses the formalization threshold; informal workers within formal firms transition to formal contracts as firms grow. In the counterfactuals, 40 percent of the increase in formal firms following a labor supply shock is attributable to this channel. The stepping-stone role is most valuable during the short-run period of formal wage rigidity.

Shift-Share Instrumental Variable. The identification design combining pre-existing migration network shares (fraction of prior migrants to destination d from each origin o, computed 1995–2000) with exogenous push shocks at origin (drought shocks or commodity price shocks). The instrument predicts which destination municipalities receive more migrants based purely on exogenous origin-level shocks, purging the endogeneity from migrants self-selecting into prosperous cities.

Minimum Comparable Area (MCA). The paper’s geographic unit of analysis: a harmonized aggregation of Brazilian municipalities whose administrative borders changed during the study period, yielding 3,548 stable units covering all urban destinations studied. The authors call these “municipalities” for convenience.

Harris-Todaro-Fields Framework. The theoretical benchmark against which the paper’s results are compared — the view (from Harris and Todaro 1970 and Fields) that rural-urban migration increases urban unemployment or informality because DNWR prevents the formal sector from absorbing migrants, who instead queue for formal jobs or enter the informal sector. The paper shows this prediction holds in the short run and in high-DNWR municipalities, but not in the long run where real wage adjustment occurs.

Screening and Segmenting: A Consumer Surplus Perspective

Mon, 01 Jan 0001 00:00:00 +0000

Bergemann, Heumann, and Wang study consumer surplus when a monopolist simultaneously engages in second-degree price discrimination (screening consumers within each market segment through quality-differentiated menus) and third-degree price discrimination (offering different menus across segments). The central question is which market segmentation maximizes aggregate consumer surplus, and under what conditions any segmentation benefits consumers at all.

The model features a monopolist selling vertically differentiated goods of quality q at strictly convex cost c(q) to a continuum of buyers with privately known values v drawn from an aggregate market m*. A segmentation is any decomposition of m* into submarkets, each receiving a profit-maximizing screening menu. The seller observes segment identity but not individual values. The problem of finding the consumer-optimal segmentation is, on its face, an optimization over distributions of distributions — an infinite-dimensional object.

The paper’s central methodological contribution is a dramatic dimensional reduction. Theorem 1 establishes that the maximum consumer surplus achievable by any segmentation equals the maximum of the expected local information rent, u(v,h) = h·Q(v−h), over all inverse hazard rate functions h satisfying a majorization constraint h ≺ h* (where h* is the aggregate market’s inverse hazard rate). The local information rent captures both the extensive margin (h measures the mass of higher-value buyers per unit of value-v buyers who earn rent from v’s allocation) and the intensive margin (Q(v−h) is the quality allocated to value v, decreasing in h as distortion increases). The two margins trade off: raising h widens the base of rent-earning buyers but worsens allocative distortion, making u(v,h) hump-shaped in h with an interior maximizer h̄(v).

The consumer-optimal segmentation has a striking structural property: every buyer of a given value v receives the same quality in every segment in which they appear, even though the monopolist could in principle offer different qualities across segments. Prices, however, differ across segments for identical buyers. This holds because the optimal segmentation is always a uniform segmentation — one in which the inverse hazard rate hm(v) is equalized across all segments containing value v.

Under log-concavity of both aggregate demand (equivalently, a non-increasing aggregate inverse hazard rate h*(v), satisfied by uniform, normal, logistic, and exponential distributions) and the supply function Q(v) (equivalent to c’’’(q)q/c’’(q) ≥ −1, satisfied by all power cost functions), the optimal segmentation takes a transparent two-regime form (Proposition 3): for values below a threshold v̂ where h*(v̂) = h̄(v̂), the inverse hazard rate is reduced to h̄(v) by concentrating low-value buyers; for values above v̂, the aggregate market is left unchanged. The resulting segments are nested convex intervals [vm, v̄], all sharing the same upper bound v̄, with pricing differing across segments only by a quality-independent base price Tm that increases with vm (Theorem 2).

Corollary 3 delivers the sharpest policy-relevant finding: under log-concave demand and supply, zero segmentation is optimal (no segmentation improves on the aggregate market for consumers) if and only if h*(v̲) ≤ h̄(v̲) at the lowest value v̲. For iso-elastic costs c(q) = q^γ/γ (γ > 1), this becomes η*(v̲) ≤ γ/(1−γ), where η*(v̲) is the aggregate demand elasticity at the bottom of the distribution. When demand is sufficiently elastic relative to supply, the monopolist’s screening already provides near-optimal consumer rents and no redistribution of buyers across segments can improve them. More elastic supply (lower γ) shrinks the set of markets where zero segmentation is optimal (Proposition 4, Zγ’ ⊂ Zγ for γ’ < γ); more inelastic supply (higher γ) expands it, and in the limit γ → ∞ zero segmentation is optimal only when the aggregate allocation itself is efficient.

For iso-elastic costs, the optimal segmentation assigns each segment a Pareto distribution below v̂ with shape parameter α = γ/(γ−1), and the aggregate market above v̂ (Corollary 1). Each segment’s demand elasticity equals the constant γ/(1−γ) below v̂ and the aggregate elasticity above (Corollary 2): the supply elasticity 1/(γ−1) determines how elastic demand must be made within segments to counteract monopoly distortions. The paper also extends the framework to adverse selection (where seller cost rises with buyer type), with the full reduction to inverse hazard rate optimization preserved when the rate of increase in adverse selection satisfies τ’’(v)v/τ’(v) ∈ [0,1] (Proposition 5).

Q: What is the local information rent and why is it central? A: The local information rent is u(v,h) = h·Q(v−h), where h is the inverse hazard rate at value v and Q is the inverse marginal cost (supply) function (equation 9). The factor h captures the extensive margin — the mass of higher-value buyers per unit of value-v buyers who earn rent from v’s quality allocation — while Q(v−h) captures the intensive margin — the quality allocated to v via the virtual value v−h, which falls as h rises. Because u is hump-shaped in h, there is an interior rent-maximizing inverse hazard rate h̄(v) for each value. Lemma 2 establishes that in every regular market, total consumer surplus equals the integral of u(v,hm(v))dFm(v), so the entire segmentation problem reduces to choosing h.
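For a concrete case, the hump shape can be checked numerically under the iso-elastic cost c(q) = q^γ/γ used elsewhere in this summary. The sketch below assumes Q(x) = x^(1/(γ−1)) for x ≥ 0 (marginal cost set equal to x); the closed form h = v(γ−1)/γ comes from the first-order condition of h·(v−h)^(1/(γ−1)) and is a derivation made for this illustration, not a formula quoted from the paper. Function names are hypothetical.

```python
def supply(x, gamma):
    """Q(x) for c(q) = q**gamma / gamma: solves q**(gamma - 1) = x."""
    return max(x, 0.0) ** (1.0 / (gamma - 1.0))


def local_rent(v, h, gamma):
    """Local information rent u(v, h) = h * Q(v - h)."""
    return h * supply(v - h, gamma)


def argmax_rent_grid(v, gamma, n=20000):
    """Grid search for the rent-maximizing h on [0, v]."""
    grid = [v * i / n for i in range(n + 1)]
    return max(grid, key=lambda h: local_rent(v, h, gamma))


def h_bar_closed_form(v, gamma):
    """First-order condition under iso-elastic cost: h = v * (gamma - 1) / gamma."""
    return v * (gamma - 1.0) / gamma


v, gamma = 2.0, 3.0
h_grid = argmax_rent_grid(v, gamma)
h_exact = h_bar_closed_form(v, gamma)
print(h_grid, h_exact)
```

The rent is zero at both h = 0 (no higher-value buyers earn rent) and h = v (quality is distorted to zero), and the grid maximizer agrees with the closed form up to the grid step.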

Q: What is the majorization constraint and what exactly does it characterize? A: The majorization constraint h ≺ h* requires that for all v ∈ V, the integral from v to v̄ of [h*(t) − h(t)]dF*(t) ≥ 0 (equation 18). Proposition 1 shows that for any segmentation σ, the average inverse hazard rate hσ must satisfy hσ ≺ h*. A partial converse holds: given h ≺ h* under regularity conditions, a uniform segmentation implementing h exists. The constraint is strictly weaker than the pointwise bound h ≤ h* available in the binary case because it permits h to exceed h* at some values (dilution) provided it falls sufficiently below h* at higher values (concentration) to keep every upper-tail integral non-negative.
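A discrete check can illustrate how the constraint differs from a pointwise bound. The sketch below takes the constraint as an inequality on upper tails: for every v, the F*-weighted sum of h* − h over values at or above v must be non-negative, which is the reading under which h may exceed h* at low values when it falls below h* higher up. The three-value market and both candidate functions are hypothetical.

```python
def satisfies_tail_majorization(weights, h, h_star, tol=1e-12):
    """Check that the weighted sum of (h_star - h) over every upper tail is >= 0.

    All lists are ordered from the lowest value to the highest; weights[i]
    approximates the aggregate probability mass dF* at the i-th value.
    """
    tail = 0.0
    for w, a, b in zip(reversed(weights), reversed(h), reversed(h_star)):
        tail += (b - a) * w
        if tail < -tol:
            return False
    return True


# Hypothetical three-value market with equal masses.
weights = [1 / 3, 1 / 3, 1 / 3]
h_star = [2.0, 1.0, 0.0]

diluted = [3.0, 0.0, 0.0]    # above h_star at the bottom, below it higher up
too_high = [3.0, 1.0, 0.0]   # above h_star at the bottom with no offset

ok = satisfies_tail_majorization(weights, diluted, h_star)
bad = satisfies_tail_majorization(weights, too_high, h_star)
print(ok, bad)
```

The first candidate violates the pointwise bound at the lowest value yet passes, because the shortfall at the middle value offsets it; the second has no offsetting shortfall and fails.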

Q: What are concentration and dilution, and how do they interact? A: Concentration gathers buyers of a given value into fewer segments, lowering their inverse hazard rate below h*(v). Dilution raises the inverse hazard rate of value v by placing v in segments where immediately higher values are missing — creating gaps in the support — thereby increasing the support increment Δm(v) and hence hm(v) (equation 12). Dilution at v requires that values just above v have already been concentrated elsewhere to create the gaps; concentration thus enables dilution, linking the two tools. With only binary values, only concentration is available; with a continuum, dilution can strictly expand achievable consumer surplus by permitting h to exceed h* at low values.

Q: What does Theorem 1 establish and why is it a major simplification? A: Theorem 1 states that the maximum consumer surplus over all segmentations of m* equals the maximum of ∫u(v,h(v))dF*(v) over all h satisfying the majorization constraint h ≺ h* (equation 25). The original problem maximizes over distributions on the infinite-dimensional space of probability measures on V; the reduced problem is a standard optimal control problem over a single real-valued function h: V → R+, amenable to Karush-Kuhn-Tucker methods and often yielding closed-form solutions. Furthermore, every optimal segmentation is a uniform segmentation implementing some h solving the reduced problem, so the reduction is exact. The optimal h always satisfies regularity (h’(v) ≤ 1), meaning v − h(v) is non-decreasing, which ensures segments in the optimal uniform segmentation are themselves regular.

Q: What is the structural property of consumer-optimal segmentations regarding quality across segments? A: In any consumer-optimal segmentation, every buyer of value v receives the same quality in every segment in which they appear (the uniform quality property following from Theorem 1). This holds because the optimal inverse hazard rate h(v) is equalized across segments (uniform segmentation), and quality in a regular market is qm(v) = Q(v − hm(v)), which depends on the market only through hm(v). Prices, however, differ across segments for identical buyers: the monopolist does not redesign its product line across segments but adjusts only quality-independent base prices. This is counterintuitive because nothing in the monopolist’s problem requires quality uniformity — it emerges purely from the consumer surplus maximization.

Q: What conditions guarantee the simple two-regime convex segmentation structure? A: Log-concavity of aggregate demand — equivalently, h*(v) non-increasing in v, satisfied by uniform, normal, logistic, and exponential families — and log-concavity of the supply function Q(v), equivalent to c’’’(q)q/c’’(q) ≥ −1, together guarantee the structure of Proposition 3 and Theorem 2. Under these conditions, h̄(v) is strictly increasing in v (log-concave supply) while h*(v) is decreasing (log-concave demand), so they cross exactly once at v̂. The optimal h equals h̄(v) below v̂ and h*(v) above. Only concentration (not dilution) is ever used because log-concave supply makes u concave in h and log-concave demand ensures monotone ordering of marginal local information rents across values, so the binding majorization constraint becomes the pointwise constraint at the bottom.
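The two-regime rule can be traced through a simple parametric case. The sketch assumes exponential aggregate demand with a hypothetical rate parameter (so h* is constant at 1/rate) and iso-elastic cost, for which the rent-maximizing rate h̄(v) = v(γ−1)/γ follows from the first-order condition of h·(v−h)^(1/(γ−1)); that closed form is derived for this illustration rather than quoted from the paper.

```python
def h_star_exponential(v, rate):
    """Inverse hazard rate of an exponential distribution: constant 1 / rate."""
    return 1.0 / rate


def h_bar_isoelastic(v, gamma):
    """Rent-maximizing inverse hazard rate for c(q) = q**gamma / gamma."""
    return v * (gamma - 1.0) / gamma


def threshold(rate, gamma):
    """Crossing point v_hat where h_bar(v_hat) equals h_star(v_hat)."""
    return gamma / (rate * (gamma - 1.0))


def optimal_h(v, rate, gamma):
    """Two-regime rule: h_bar below the threshold, h_star at or above it."""
    if v < threshold(rate, gamma):
        return h_bar_isoelastic(v, gamma)
    return h_star_exponential(v, rate)


rate, gamma = 1.0, 2.0
v_hat = threshold(rate, gamma)
print(v_hat, optimal_h(1.0, rate, gamma), optimal_h(3.0, rate, gamma))
```

With these parameters the threshold is 2: below it the inverse hazard rate is pulled down to v/2, and above it the aggregate rate of 1 is left unchanged, so the resulting function never exceeds h*.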

Q: What is the structure of convex segmentations and their menus (Theorem 2)? A: Under log-concave demand and supply, the consumer-optimal segmentation consists of segments m with absolutely continuous supports [vm, v̄] for varying lower bounds vm ≤ v̂, all sharing the same upper bound v̄ (Part 1 of Theorem 2). Pricing across these segments differs only by a quality-independent base price Tm that is increasing in vm — more concentrated segments (lower vm) face a lower base price and carry higher information rents — while the quality menu p(q) is uniform across segments (Part 2). Equivalently, the monopolist offers nested menus all sharing the same efficient upper bound quality Q(v̄), differing in how far down the menu is extended and in the price of the lowest offered quality.

Q: What do Corollaries 1 and 2 say for iso-elastic cost functions? A: With iso-elastic cost c(q) = q^γ/γ (γ > 1) and log-concave demand, the consumer-optimal segmentation assigns each segment a Pareto distribution with shape parameter α = γ/(γ−1) below the threshold v̂, and the aggregate distribution above v̂ (Corollary 1). This delivers a constant demand elasticity of γ/(1−γ) within each segment below v̂, matching the aggregate market’s elasticity above v̂ (Corollary 2). The Pareto shape — and thus the degree of demand manipulation — is determined entirely by the supply elasticity 1/(γ−1): more elastic supply (lower γ) mandates a higher shape parameter α and more elastic within-segment demand to counteract larger monopoly distortions.
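The link between the Pareto shape and the constant elasticity can be verified numerically. The sketch below uses the standard Pareto survival function as the demand curve; the lower bound v_min = 1, the evaluation price, and the choice γ = 3 are hypothetical.

```python
import math


def pareto_demand(p, alpha, v_min=1.0):
    """Share of buyers with value at least p under a Pareto distribution."""
    return 1.0 if p <= v_min else (v_min / p) ** alpha


def demand_elasticity(p, alpha, eps=1e-4):
    """Numerical d ln D / d ln p via a central difference in logs."""
    up = math.log(pareto_demand(p * math.exp(eps), alpha))
    down = math.log(pareto_demand(p * math.exp(-eps), alpha))
    return (up - down) / (2.0 * eps)


gamma = 3.0
alpha = gamma / (gamma - 1.0)       # shape parameter from Corollary 1
elasticity = demand_elasticity(2.0, alpha)
target = gamma / (1.0 - gamma)      # constant elasticity from Corollary 2
print(elasticity, target)
```

Both numbers equal −1.5 for γ = 3: a Pareto segment with shape γ/(γ−1) has demand elasticity γ/(1−γ) at every price above its lower bound.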

Q: When is zero segmentation optimal, and what is the precise elasticity condition? A: Under log-concave demand and supply, zero segmentation is optimal if and only if h*(v̲) ≤ h̄(v̲) — the aggregate inverse hazard rate at the lowest value already lies at or below its rent-maximizing level (Corollary 3). Since h* is decreasing under log-concavity, this condition at v̲ implies it holds everywhere, so the designer cannot improve rents at any value. For iso-elastic cost, the condition becomes η*(v̲) ≤ γ/(1−γ): aggregate demand elasticity at the bottom must be at least as large in magnitude as one plus the supply elasticity. For a Pareto aggregate distribution with shape parameter α, zero segmentation is optimal when α ≥ γ/(γ−1).
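The Pareto version of the condition is easy to evaluate directly. The following sketch encodes the inequality α ≥ γ/(γ−1) quoted above; the function names and the example parameter values are hypothetical.

```python
def zero_segmentation_threshold(gamma):
    """Smallest Pareto shape alpha for which zero segmentation is optimal."""
    if gamma <= 1.0:
        raise ValueError("iso-elastic cost requires gamma > 1")
    return gamma / (gamma - 1.0)


def zero_segmentation_optimal(alpha, gamma):
    """Condition quoted in the text: alpha >= gamma / (gamma - 1)."""
    return alpha >= zero_segmentation_threshold(gamma)


# Hypothetical Pareto aggregate market with shape alpha = 2.
alpha = 2.0
verdicts = {g: zero_segmentation_optimal(alpha, g) for g in (1.5, 2.0, 3.0)}
print(verdicts)
```

For this market, segmentation can help consumers when γ = 1.5 (threshold 3) but not when γ is 2 or 3. The threshold falls as γ rises, so the set of markets where zero segmentation is optimal grows with γ.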

Q: How does supply elasticity govern the scope for beneficial segmentation (Proposition 4)? A: Proposition 4 establishes that for iso-elastic cost, the set of markets Zγ where zero segmentation is optimal is nested and expands with γ: for any γ’ < γ, Zγ’ ⊂ Zγ. More elastic supply (lower γ) amplifies monopoly distortions and enlarges the set of markets where segmentation benefits consumers; more inelastic supply (higher γ) makes quality provision rigid, reducing segmentation’s scope. In the limit γ → ∞ (approaching unit demand), zero segmentation is optimal only if the aggregate allocation is already efficient. This limit also means very inelastic supply, so the potential benefits from segmentation have shrunk toward zero simultaneously.

Q: How does this paper compare to and depart from Haghpanah and Siegel (2023)? A: Haghpanah and Siegel (2023) showed that in generic markets with a finite number of goods, some segmentation always improves consumer surplus relative to the aggregate market. This paper shows that with a continuum of qualities, this universal improvement result fails: Corollary 3 identifies a large, non-degenerate class of markets satisfying Haghpanah and Siegel’s genericity conditions where zero segmentation is optimal for consumers. The discrepancy arises because the log-concave supply condition (equation 27) is violated in finite-good environments — Haghpanah and Siegel explicitly provide a counterexample showing their result fails with a continuum of goods. This paper characterizes exactly when the finite-good gains vanish as the quality space becomes continuous, providing the precise elasticity conditions.

Q: What changes and what is preserved when extending to adverse selection? A: In the adverse selection specification, buyer net value v is private and the seller’s cost per unit is τ(v) − v, increasing in v when τ’(v) > 1. The local information rent becomes w(v,h) = u(v, τ’(v)·h), where adverse selection enters by amplifying the effective inverse hazard rate by τ’(v) (equation 40). Proposition 5 confirms that the full reduction to majorization-constrained optimization over h goes through, and the optimal segmentation features more elastic within-segment demand when adverse selection is more severe. The reduction requires τ’’(v)v/τ’(v) ∈ [0,1] (equation 39), bounding the rate of increase of adverse selection severity; if this fails, the key inequality (35) driving the optimality of uniform segmentations may break down.

Q: What are the policy implications for regulation of price discrimination? A: The results imply that blanket restrictions on market segmentation may harm consumers by preventing welfare-enhancing price discrimination in markets where demand is sufficiently inelastic relative to supply (the region outside the zero-segmentation condition). In markets satisfying η*(v̲) ≤ γ/(1−γ), allowing segmentation yields no consumer benefit, so restrictions are harmless to consumers. The key policy-relevant primitives are demand and supply elasticities, which are in principle measurable. The findings also imply that the welfare effects of data-driven personalized pricing depend critically on the interaction between consumer heterogeneity (demand shape) and cost structure (supply elasticity), rather than on the degree of segmentation per se.

Local information rent: u(v,h) = h·Q(v−h), the total consumer surplus generated per unit mass of buyers at value v as a function of the inverse hazard rate h. The factor h is the extensive margin (mass of higher-value buyers per unit of value-v buyers who earn rent) and Q(v−h) is the intensive margin (quality allocated to v via the virtual value v−h). It is hump-shaped in h with interior maximizer h̄(v), and the segmentation problem reduces entirely to maximizing its expectation.
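As a rough numerical illustration of the hump shape, the sketch below evaluates u(v,h) = h·Q(v−h) on a grid. It assumes the iso-elastic cost c(q) = q^γ/γ with γ > 1 and an allocation rule Q(φ) = max(φ, 0)^(1/(γ−1)), i.e. quality set where marginal cost equals the virtual value. That allocation rule, the grid, the parameter values, and the function names are illustrative assumptions, not taken from the paper.

```python
def quality(phi: float, gamma: float) -> float:
    """Assumed allocation: marginal cost q**(gamma - 1) equals virtual value phi."""
    return max(phi, 0.0) ** (1.0 / (gamma - 1.0))


def local_rent(v: float, h: float, gamma: float) -> float:
    """Local information rent u(v, h) = h * Q(v - h)."""
    return h * quality(v - h, gamma)


def argmax_rent(v: float, gamma: float, steps: int = 10_000) -> float:
    """Grid search over h in [0, v] for the maximizer of u(v, h)."""
    grid = [v * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda h: local_rent(v, h, gamma))


v, gamma = 2.0, 3.0  # hypothetical value and cost convexity
h_bar = argmax_rent(v, gamma)

# The rent vanishes at both ends of the grid and peaks in the interior.
print(local_rent(v, 0.0, gamma), local_rent(v, v, gamma))
print(h_bar, v * (gamma - 1.0) / gamma)
```

Under these assumptions the first-order condition gives an interior maximizer of v(γ−1)/γ, which is the inverse hazard rate v/α of a Pareto distribution with shape α = γ/(γ−1).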

Inverse hazard rate h_m(v): in a continuous market, (1−F_m(v))/f_m(v); generalized to accommodate atoms and support gaps (equation 12). It simultaneously determines the virtual value ϕ_m(v) = v − h_m(v) (governing allocative distortion) and the scaled mass of higher-value buyers per unit of value-v buyers (governing the extensive margin of rents). The dual role requires both a continuum of qualities and endogenous segmentation.

Majorization constraint h ≺ h*: for all v, the cumulative integral of [h*(t)−h(t)]dF*(t) from v̲ to v is non-negative (equation 18). It is the exact characterization of inverse hazard rate functions achievable by some segmentation of m*, strictly weaker than the pointwise bound h ≤ h* of the binary case because it permits h to exceed h* at some values (dilution) provided it falls sufficiently below h* at higher values (concentration).

Uniform segmentation: a segmentation in which every buyer of value v faces the same inverse hazard rate h_m(v) = h_σ(v) in every segment containing v (equation 22). Theorem 1 establishes that every consumer-optimal segmentation is uniform; this class converts the double integral over segments and values into a single integral against F*, enabling the dimensional reduction of Theorem 1.

Concentration and dilution: the two tools by which segmentation modifies inverse hazard rates. Concentration gathers buyers of a given value into fewer segments, lowering h_m(v) below h*(v). Dilution raises h_m(v) above h*(v) by placing value v in segments where immediately higher values are absent, creating support gaps. Dilution requires prior concentration of adjacent higher values, so the two tools are linked; under log-concave demand and supply, only concentration is used in the optimal segmentation.

Convex segmentation: a segmentation whose constituent segments have nested convex interval supports [v_m, v̄] all sharing the same upper bound v̄, with varying lower bounds v_m. This is the consumer-optimal structure under log-concave demand and supply (Theorem 2). For iso-elastic cost, each segment below the threshold v̂ corresponds to a Pareto distribution with shape parameter α = γ/(γ−1) determined by cost convexity γ.

Zero-segmentation condition: the condition under which no segmentation can improve consumer surplus over the aggregate market. Under log-concave demand and supply with iso-elastic cost c(q) = q^γ/γ, it is η*(v̲) ≤ γ/(1−γ): aggregate demand elasticity at the lowest value must be at least as large in magnitude as one plus the supply elasticity (Corollary 3). When this holds, any redistribution of buyers across segments strictly reduces consumer surplus.
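The verbal restatement of the condition follows from a short calculation, sketched here under two assumptions about notation that the summary does not spell out: γ > 1 (strictly convex cost) and η* denoting the signed, hence negative, elasticity of aggregate demand. The symbols q^s and ε^s for the supply function and its elasticity are introduced only for this illustration.

```latex
c(q) = \frac{q^{\gamma}}{\gamma}
\;\Longrightarrow\; c'(q) = q^{\gamma-1}
\;\Longrightarrow\; q^{s}(p) = p^{\frac{1}{\gamma-1}},
\qquad \varepsilon^{s} = \frac{1}{\gamma-1}

\eta^{*}(\underline{v}) \;\le\; \frac{\gamma}{1-\gamma}
= -\Bigl(1 + \frac{1}{\gamma-1}\Bigr)
\;\Longleftrightarrow\;
\bigl|\eta^{*}(\underline{v})\bigr| \;\ge\; 1 + \varepsilon^{s}
```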

Search Frictions and Product Design in the Municipal Bond Market

Layer 1 — Overview

Research Question

This paper investigates whether intermediaries in the U.S. municipal bond market strategically exploit product design to increase search frictions and, through that channel, capture rents. Specifically, it asks: do underwriters who negotiate bond design with local governments have an incentive to add nonstandard provisions that raise their own competitive advantage in subsequent secondary-market intermediation, even at the expense of issuing governments and their taxpayers?

Setting and Data

The study focuses on tax-exempt general obligation and revenue bonds issued via negotiated sales by local governments (counties, cities, school districts, and other special-purpose governments) from 2010 to 2013, tracking all secondary-market transactions through 2014. The final sample comprises 13,118 bond issues with a total face value of $266.9 billion. Bond attribute data come from Mergent; transaction data come from the Municipal Securities Rulemaking Board (MSRB). Issuer financial health, demographics, and economic conditions are drawn from the Census and American Community Survey; state revolving-door regulations are compiled from the National Conference of State Legislatures database. Structural estimation uses a subsample of 927 bonds concentrated in the five states that enacted revolving-door regulations during the study period and neighboring border counties.

Identification Strategy

A core empirical challenge is that unobserved factors may jointly determine bond complexity and market outcomes. The authors exploit panel variation in state-level revolving-door regulations — laws that restrict former public officials from taking employment at firms regulated by their former agencies for a “cool-off” period — as an instrument for bond complexity. Between 2010 and 2013, three states (Arkansas 2011, Indiana 2010, Maine 2013) enacted new legislation covering state officials, and two states (New Mexico 2011, Virginia 2011) extended existing regulations to cover local officials. A difference-in-differences regression, with county and year-month fixed effects, shows that adopting revolving-door regulations covering local officials reduces bond complexity by 6% on average (coefficient −0.064, p < 0.01). Regulations targeting only state officials, who are not directly involved in bond negotiations, yield smaller and statistically fragile effects. Placebo checks on auctioned bonds, where underwriters cannot influence design, show no effect, and there is no evidence of pre-existing trends in complexity.

Main Findings

Flexibility vs. liquidity trade-off: A 1% increase in the bond complexity index lowers the number of negative credit-watch events (a proxy for default risk) by 0.002, a 3% decrease relative to the mean of 0.074, confirming that nonstandard provisions provide genuine financial flexibility. However, increasing the complexity index from its mean (1.46) to the 75th percentile (1.69) raises the intermediation spread — the cost for an investor to buy and immediately sell a bond — by 17 basis points (a 14% increase over the average of 120 basis points), confirming that complexity raises trading frictions. For context, the average intermediation spread of 120 basis points is large relative to the 30–60 basis point bid-ask spread of corporate bonds in 2010–2013.
Underwriter incentive to complicate: Increasing complexity from the mean to the 75th percentile raises the underwriter’s market share in secondary-market intermediation by 1.4 percentage points, an 11% increase over the average underwriter share of 12.2%. The underwriter’s gross profits from intermediation also increase with complexity.
Structural estimates — search costs: For a median bond, average dealer search costs amount to 10% of monthly gross profits ($2,625 per month). The underwriter’s exclusive initial sales generate a client network that lowers its effective search costs by 21% relative to an average dealer, more than offsetting its initial geographical disadvantage (for 72% of bonds, the underwriter’s baseline search cost exceeds the median dealer’s). Nonstandard provisions increase both the initial search cost parameter (φ₀) and the network-effect parameter (φ₁): a 1% increase in the complexity index increases φ₀ by 3.79% and φ₁ by 1.66%, implying complex bonds raise search costs broadly but amplify the advantage of a large client network — a position the underwriter occupies via exclusive primary-market sales.
Investor demand: Nonstandard provisions do not substantially change the average investor valuation but substantially increase the dispersion: the standard deviation of investor valuations is 0.003 for simple bonds and 0.013 for complex bonds, consistent with complex bonds being niche products that investors “either love or loathe.”
Government cost: The marginal cost of paying debt obligations is convex in complexity, reaching a minimum at an interior level of provisions; the government’s marginal financial cost increases by 42% when a median bond is stripped of all nonstandard provisions, reflecting the value of payment flexibility.
Conflict of interest: The estimated weight that government officials place on underwriter payoffs in the absence of revolving-door regulations (ψ₀) is 0.34, implying the underwriter’s value accounts for 6.7% of the government official’s payoff under the median unregulated issuer. With revolving-door regulations in place, ψ₁ is essentially zero.

Counterfactual Policies (on representative bond: face value $6.45 million, maturity 7.7 years)

Standardization mandate (ban on all nonstandard provisions): The coupon rate falls from 2.81% to 2.16% (−23%), average dealer search costs fall 47%, and investor surplus rises 13.3%. However, the marginal financial cost (c₀) rises by 41% (from 0.615 to 0.871), so the issuer’s total debt payment cost (principal plus interest, weighted by c₀) rises by 35%, from $5.13 million to $6.96 million. The standardization policy harms issuers even while saving about 4.2% of raw principal-and-interest payments ($8,349K to $7,997K), because the loss of flexibility more than offsets the liquidity gain.
Issuer-driven design (issuer sets complexity to minimize its own debt payment cost, then negotiates the coupon): Complexity falls 19% to 1.14, the interest rate falls to 2.37%, total issuer cost falls 1.5%, investor surplus rises 6%, and the underwriter’s secondary-market payoff falls 19.9%.
Underwriter intermediation ban (underwriter excluded from trading after six months): Complexity falls 5.7% to 1.33, the coupon falls to 2.59%, issuer cost falls 1.5%, but investor surplus falls 1.84% and even other dealers are worse off by 3.97%, because the underwriter’s information on primary-market buyers is lost, offsetting the liquidity gains from lower complexity.

In depth

Q1. What are the five nonstandard bond features tracked as proxies for complexity, and how are they combined into a single index?

Following Harris and Piwowar (2006), the paper focuses on five features that are particularly difficult for investors to price: (i) multiple or serial bonds per issue (as opposed to a single bond), (ii) call provisions allowing early redemption, (iii) sinking fund provisions requiring periodic debt retirement, (iv) nonstandard interest payment frequencies (other than semiannual), and (v) variable or floating interest rates. The complexity index is constructed as the simple average of the latter four provisions across bonds within an issue, plus a dummy for whether the issue contains multiple bonds.
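One possible reading of this construction is sketched below: each bond's four provision indicators are summed, the counts are averaged across the bonds in the issue, and a dummy for multiple bonds is added. The averaging convention, the `Bond` class, and the example issue are assumptions made for illustration; the paper's exact formula may differ.

```python
from dataclasses import dataclass


@dataclass
class Bond:
    call_provision: bool    # early redemption allowed
    sinking_fund: bool      # periodic debt retirement required
    nonstandard_freq: bool  # interest paid other than semiannually
    variable_rate: bool     # variable or floating coupon


def complexity_index(issue: list[Bond]) -> float:
    """Average per-bond count of the four provisions, plus a multiple-bond dummy."""
    if not issue:
        raise ValueError("an issue must contain at least one bond")
    counts = [
        b.call_provision + b.sinking_fund + b.nonstandard_freq + b.variable_rate
        for b in issue
    ]
    multiple_bonds = 1 if len(issue) > 1 else 0
    return sum(counts) / len(counts) + multiple_bonds


# Hypothetical serial issue: four bonds, two of them callable.
issue = [
    Bond(False, False, False, False),
    Bond(False, False, False, False),
    Bond(True, False, False, False),
    Bond(True, False, False, False),
]
print(complexity_index(issue))  # 1.5
```

Under this reading a single plain bond scores zero, which matches the standardization counterfactual's complexity of zero.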

Q2. Why do revolving-door regulations that target local officials reduce complexity more than those targeting state officials?

State officials are not directly involved in bond origination negotiations — they can only indirectly influence local governments through budget allocations. Local officials negotiate directly with underwriters and are thus the proximate counterparties whose incentives the regulations alter. Accordingly, revolving-door regulations covering local officials reduce complexity by 6% (coefficient −0.064, p < 0.01 with full controls), whereas regulations targeting only state officials produce a smaller effect (approximately 2%) that loses statistical significance once issuer financial health controls are added.

Q3. How does the paper validate that revolving-door regulations are a valid instrument for bond complexity?

The paper provides three pieces of evidence. First, the regulations have no effect on the credit ratings of bonds issued prior to their enactment, on the annual amount of bond issuance, or on the maturity length and sale method conditional on issuance — confirming the regulations do not alter governments’ risk management or underlying financing needs. Second, the regulations have no effect on complexity for competitively auctioned bonds, where underwriters cannot influence design — a direct placebo test. Third, a pre-trend analysis (Figure A1) finds no differential trend in complexity in states that subsequently adopted regulations.

Q4. What is the mechanism by which underwriters benefit from adding nonstandard provisions, and why does this advantage not diminish over time?

Underwriters purchase and distribute the entire bond issue at origination, giving them an exclusive network of investors who initially purchased the bonds. In the secondary market, knowing who owns a bond allows the underwriter to locate buyers and sellers with lower search effort. For complex bonds, this advantage is amplified: nonstandard provisions make investor education and persuasion more costly, increasing the value of pre-existing client relationships. The network-effect parameter φ₁ — which governs how rapidly search costs fall as a dealer’s cumulative trades grow — itself rises with complexity (by 1.66% per 1% increase in the complexity index), so the underwriter’s head start in client network accumulation translates into a persistently larger cost advantage precisely for the most complex bonds.

Q5. How large is the underwriter’s search cost advantage in equilibrium, and what drives it?

At the equilibrium meeting rate, the underwriter’s effective cost of maintaining a given meeting rate is 21% lower than an average dealer’s. The underwriter starts from a higher initial search cost type (φ₀ of $3,609 vs. $3,216 for the average dealer at λ = 1), since for 72% of bonds it has less local trading experience than the median dealer, so the advantage is driven entirely by its network: its exp(−φ₁ log(b)) cost discount factor averages 0.34, 32% lower than the average dealer’s 0.50. The underwriter meets investors about 20% more frequently than the average dealer (0.23 vs. 0.19 per month) while spending more on search in absolute terms ($3,045 vs. $2,625 per month).
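The discount factor exp(−φ₁ log(b)) is simply b^(−φ₁), so a dealer with a larger b gets a proportionally larger discount when φ₁ is higher. The sketch below checks the 32% gap from the reported averages and then uses hypothetical values of φ₁ and b, reading b as a dealer's cumulative trades (an assumption), to show how a higher φ₁ widens the gap between a small and a large network.

```python
import math


def network_discount(phi1: float, b: float) -> float:
    """Cost discount factor exp(-phi1 * log(b)), which equals b ** (-phi1)."""
    return math.exp(-phi1 * math.log(b))


# Reported average discount factors: underwriter 0.34, average dealer 0.50.
relative_gap = 1.0 - 0.34 / 0.50
print(round(relative_gap, 2))  # 0.32

# Hypothetical inputs: b = 5 vs. b = 50, and two illustrative values of phi1.
ratios = {}
for phi1 in (0.2, 0.3):
    small = network_discount(phi1, 5.0)
    large = network_discount(phi1, 50.0)
    ratios[phi1] = large / small  # cost of the large network relative to the small one
    print(phi1, round(ratios[phi1], 3))
```

The ratio falls as φ₁ rises, which is the sense in which complexity amplifies the value of an existing client network.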

Q6. How does bond complexity affect investor demand — mean or dispersion of valuations?

Structural estimates show that increasing the complexity index by 1% increases the standard deviation of investor valuations (γ₂) by 4.60% but has no statistically significant effect on the mean valuation (coefficient −0.085, standard error 0.561). This pattern is consistent with complex bonds being niche products — they attract a subset of investors with specific preferences for the embedded features (e.g., certain tax or cash-flow attributes), while being unappealing to most investors. The standard deviation of valuations is 0.003 for a low-complexity bond (25th percentile) and 0.013 for a high-complexity bond (75th percentile).

Q7. What does the structural estimate of ψ₀ imply about the degree of collusion between government officials and underwriters?

The estimated collusion parameter without revolving-door regulations (ψ₀ = 0.34) implies that, for the median unregulated issuing government, the underwriter’s value from secondary-market trading accounts for 6.7% of the government official’s objective function. This is a substantial weight: it means officials act partly as agents for the underwriter rather than purely for taxpayers. With revolving-door regulations (ψ₁ ≈ 0), this collusive weight is essentially eliminated, explaining the empirical reduction in complexity found in Table 2.

Q8. What are the effects of a full standardization mandate on each class of market participant, and why does the issuer lose overall despite paying a lower coupon?

Under standardization, the coupon falls 23% (from 2.81% to 2.16%) and the raw principal-plus-interest payment falls about 4.2% (from $8,349K to $7,997K). However, the marginal financial cost c₀ rises 41% (from 0.615 to 0.871), reflecting the loss of payment flexibility previously provided by call provisions and other features; the total issuer cost, c₀A(1 + rT), rises by 35% (from $5.13 million to $6.96 million). Investors gain 13.3% in surplus because they value liquidity and, on average, do not value nonstandard features. The underwriter loses 36.6% of its secondary-market value while other dealers gain 36.1%, as standardization erodes the underwriter’s network advantage.
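The issuer-cost arithmetic can be reproduced from the quoted figures. The sketch below multiplies c₀ by the raw principal-plus-interest payment, treating the quoted dollar amounts as A(1 + rT); the function name is hypothetical and the inputs are the rounded values given in this summary, so the result is only approximate.

```python
def total_issuer_cost(c0: float, raw_payment: float) -> float:
    """Total issuer cost c0 * A * (1 + r * T), with raw_payment = A * (1 + r * T)."""
    return c0 * raw_payment


# Figures quoted above; raw payments are in thousands of dollars.
baseline = total_issuer_cost(0.615, 8349.0)
standardized = total_issuer_cost(0.871, 7997.0)
change = standardized / baseline - 1.0

print(round(baseline), round(standardized), round(100 * change, 1))
```

With these rounded inputs the totals come out at about $5.13 million and $6.97 million, an increase of between 35 and 36 percent: the higher c₀ outweighs the lower raw payment.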

Q9. Why does the issuer-driven design scenario outperform standardization in terms of total issuer cost, even though complexity does not fall to zero?

Under issuer-driven design, the government minimizes its total cost of debt payment c₀A(1 + rT), accounting for both the flexibility value of provisions and their effect on the negotiated coupon. The optimal complexity index is 1.14 — positive, but 19% below the current baseline of 1.41 — because some provisions genuinely lower c₀ by allowing flexible debt service. The cost of search frictions (and hence the liquidity premium embedded in the coupon) falls 32% and the negotiated coupon falls to 2.37%, sufficient to reduce total issuer cost by 1.5%. By contrast, full standardization imposes a complexity of zero, which overshoots: c₀ rises more than the coupon savings compensate, increasing total costs by 35%.

Q10. What are the net welfare effects of the underwriter intermediation ban, and why does investor surplus fall despite lower complexity?

The ban reduces complexity by 5.7%, lowering the coupon to 2.59% and reducing issuer costs by 1.5%. However, the underwriter’s client network — built during exclusive initial sales — is a productive resource that improves match quality in the secondary market; banning the underwriter from trading after six months wastes this information. Average dealer search costs rise 1.2% and the meeting rate falls 1.7%, net of the complexity reduction. Investors face bonds with lower coupons and higher effective search frictions, so their surplus falls 1.84%. Non-underwriter dealers also lose 3.97% because lower coupons reduce the rents extractable from intermediation.

Q11. How is the structural model estimated, and what role do revolving-door regulations play in the estimation?

Estimation proceeds in three steps. In Step 1, bond-specific trading market parameters (investor demand, dealer search costs, meeting rates, bargaining parameters) are recovered separately for each bond by minimizing squared differences between observed and simulated trading prices, quantities, and transaction timing. In Step 2, IV regressions using revolving-door regulations and their interactions with county/state attributes as instruments for endogenous complexity map Step 1 parameters to bond attributes, addressing the endogeneity of complexity in determining search costs and investor demand. In Step 3, GMM moment conditions derived from Nash bargaining first-order conditions for the equilibrium complexity and coupon rate identify government preference parameters (θ_c, ψ₀, ψ₁), using the orthogonality condition that unobserved financing cost shocks are mean-zero conditional on observed attributes, regulations, and bond supply from neighboring counties.
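The logic of Step 2 can be illustrated with a deliberately stripped-down simulation: one binary instrument (a regulation dummy), one endogenous regressor (complexity), and an unobserved shock that moves both complexity and the outcome. Everything here, including the data-generating process, the coefficients, and the single-instrument Wald estimator, is a hypothetical stand-in for the paper's richer IV regressions.

```python
import random


def cov(xs: list[float], ys: list[float]) -> float:
    """Sample covariance of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(0)
n = 50_000
true_effect = 2.0  # hypothetical effect of complexity on a Step 1 parameter

z, x, y = [], [], []
for _ in range(n):
    regulated = 1.0 if rng.random() < 0.5 else 0.0  # instrument: regulation in force
    shock = rng.gauss(0.0, 1.0)                     # unobserved confounder
    complexity = 1.5 - 0.5 * regulated + 0.5 * shock + rng.gauss(0.0, 0.2)
    outcome = true_effect * complexity + shock + rng.gauss(0.0, 0.5)
    z.append(regulated)
    x.append(complexity)
    y.append(outcome)

ols = cov(x, y) / cov(x, x)  # biased upward: complexity is correlated with the shock
iv = cov(z, y) / cov(z, x)   # single-instrument IV (Wald) estimator
print(round(ols, 2), round(iv, 2))
```

In this simulated setting the OLS slope overstates the effect because the confounder raises both variables, while the IV ratio recovers a value close to the true coefficient.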

Q12. Does the underwriting market show signs of concentration that might amplify the conflict-of-interest problem?

Yes. The mean state-level Herfindahl-Hirschman Index (HHI) for underwriting is 0.12, with the top three firms covering 45% of the market on average. For smaller deals (under $10 million), concentration is markedly higher: mean HHI of 0.24 and top three firms covering 64% of the market. Repeat relationships are common — 41% of bonds issued in 2011–2017 were underwritten by a firm that had underwritten a prior bond for the same issuer within five years — reflecting both informational advantages of local presence and potentially entrenched relationships that may increase government officials’ susceptibility to underwriter influence.

Key Concepts

Complexity index (nonstandard provisions): A bond-level measure computed as the simple average, across bonds within an issue, of four nonstandard features — call provisions, sinking fund provisions, nonstandard interest payment frequency, and variable/floating interest rates — plus a dummy for whether the issue contains multiple bonds. Used as the primary measure of bond complexity in all regressions and the structural model.

Revolving-door regulation: A state-level law restricting former public officials or employees from engaging in lobbying or taking employment at regulated firms for a specified “cool-off” period (typically one to two years) after leaving office. The paper uses the presence and scope of such regulations (whether they cover state officials, local officials, or both) as a source of exogenous variation in government officials’ incentives to align with underwriter interests.

Intermediation spread: The logarithm of the average dealer-to-investor sale price minus the logarithm of the average dealer-from-investor purchase price for a given bond. Used as the empirical measure of trading frictions; the sample average is 120 basis points.
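A minimal sketch of this definition, using hypothetical prices per $100 of face value; the function name and the conversion to basis points are illustrative choices.

```python
import math


def intermediation_spread_bps(sale_prices: list[float], purchase_prices: list[float]) -> float:
    """Log average dealer sale price minus log average dealer purchase price, in bps."""
    avg_sale = sum(sale_prices) / len(sale_prices)
    avg_purchase = sum(purchase_prices) / len(purchase_prices)
    return 10_000 * (math.log(avg_sale) - math.log(avg_purchase))


# Hypothetical trades: dealers sell to investors near 101.2 and buy near 100.0.
spread = intermediation_spread_bps([101.3, 101.1], [100.1, 99.9])
print(round(spread, 1))
```

These made-up prices give a spread of roughly 119 basis points, close to the sample average of 120.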

Network effect in search (φ₁): The parameter governing how a dealer’s cumulative prior trades with investors in a given bond reduce its cost of meeting new investors for that bond. A higher φ₁ means a larger client network translates into steeper cost savings. The paper estimates that φ₁ itself increases with bond complexity, so complex bonds amplify the advantage of dealers (especially the underwriter) who accumulate large client networks.

Marginal cost of debt payment (c₀): A bond- and issuer-specific parameter capturing the effective cost to the government of repaying each dollar of principal and interest, net of the flexibility benefits provided by nonstandard provisions. Normalized to one for a bond with zero nonstandard provisions at average issuer characteristics; estimated to be convex in complexity with an interior minimum, implying some nonstandard provisions are beneficial from the government’s perspective.

Collusion weight (ψ): The weight a government official places on the underwriter’s secondary-market value from trading when negotiating bond design. Estimated at ψ₀ = 0.34 in the absence of revolving-door regulations (implying the underwriter’s interest accounts for 6.7% of the official’s objective) and at ψ₁ ≈ 0 when such regulations are present.

Underwriter dual role: The institutional arrangement in which the same investment bank (i) negotiates and purchases the entire bond from the issuing government at origination, and (ii) subsequently acts as a dealer in the bond’s secondary market. This dual role creates an incentive to design complex bonds that strengthen the underwriter’s competitive advantage in secondary intermediation via network effects in search.

Issuer-driven design: A counterfactual policy scenario in which the government sets the complexity level to minimize its total cost of debt payment — accounting for both the flexibility value of provisions and the anticipated effect on the negotiated coupon rate — before bargaining with the underwriter only over the coupon. This policy allows some nonstandard provisions (complexity index 1.14 vs. baseline 1.41) and reduces total issuer cost by 1.5% relative to the baseline.

Soft landing and inflation scares

Layer 1 — Overview

Research Question

Why did the 2021–2023 US inflation surge end in a soft landing — disinflation without a major recession — while the Volcker disinflation of 1979–1987 required substantial output losses? And was the timing and strength of the Federal Reserve’s reaction to the inflation surge decisive in achieving this outcome?

Methodology and Model

The paper develops and estimates a micro-founded Heterogeneous-Expectation New Keynesian (HENK) model in which agents hold idiosyncratic, dispersed beliefs about the long-run (steady-state) level of inflation. The key departure from full-information rational expectations (FIRE) is that information about the long-run value of inflation is dispersed and sticky: agents update their beliefs through pairwise social learning (SL), adopting the forecasting model of the agent whose belief produced lower recent inflation forecast errors. This tournament process — inspired by genetic algorithms — generates a time-varying cross-sectional distribution of subjective inflation beliefs.

The model admits a closed-form solution that retains the entire time-varying distribution of beliefs and can be estimated with standard full-information Bayesian methods using the inversion filter (Cuba-Borda et al. 2019). The FIRE benchmark is nested as the special case in which the average belief deviation from the target is zero at all times.

Estimation uses four US macroeconomic observables (output gap, CPI inflation, one-quarter-ahead average SPF inflation expectation, and the proxy funds rate of Choi et al. 2022 that captures both conventional and unconventional monetary policy) over 1985Q1–2023Q4. A formal model comparison rejects the RE null hypothesis (p < 0.0001) in favor of the HENK specification.

Main Findings With Quantitative Magnitudes

Inflation scares are endogenous: In the model, inflation scares arise whenever repeated above-target inflation outcomes validate and diffuse above-target beliefs through social interactions. Under the historical scenario, the share of agents holding long-run inflation beliefs between 1 and 3 percent (annualized) falls to 40 percent in mid-2022 before recovering above 90 percent by end-2023, indicating a partial but not complete unanchoring of expectations.
Timing dominates strength: Counterfactual simulations show that the timing — not the strength — of the Fed’s reaction to the inflation surge is the key determinant of inflation expectations management and subsequent macroeconomic outcomes. Varying the Taylor-rule inflation coefficient by +/-10 percent (moving from 1.64 to 2.00) produces negligible differences in inflation and output gap dynamics, with welfare ratios of 1.052 and 0.981 relative to benchmark respectively under the ad-hoc loss function. By contrast, varying the timing via the interest-rate smoothing parameter by +/-10 percent produces much larger divergences.
The Fed fell behind the curve: Under a scenario in which the Fed had strictly followed its estimated Taylor rule (removing the negative monetary policy shocks observed from mid-2020 to mid-2022), inflation would have peaked approximately 3 percentage points lower on a yearly basis. Inflation expectations would have remained lower for almost a year longer, and the subsequent rise in expectations would have been more gradual and lower-peaking. Crucially, the output gap in this preemptive-tightening scenario would have been only briefly negative (in 2022Q2) and not deep enough to trigger a recession.
Further delays would have been highly costly: A delay of the tightening by one, two, four, or eight quarters would have produced successively worse outcomes. A two-year delay generates runaway inflation and 100 percent loss of target credibility (complete unanchoring). A delay of approximately three quarters would have resulted in a sizable, self-reinforcing entrenchment of above-target inflation expectations. The welfare cost of an eight-quarter delay is 5.76 times the benchmark loss under the ad-hoc measure (1.167 under the microfounded measure).
Early rate cuts would have reignited inflation: A counterfactual 100-basis-point cut as early as 2022Q3 would have pushed annual inflation approximately 2 percentage points above the historical scenario through end-2023, with inflation expectations rebounding by about 1 percentage point (annualized) immediately after the cut. Under no early-cut scenario would inflation or expectations have converged back to target by end-2023.
Expectation heterogeneity amplifies shocks: Greater initial dispersion in beliefs amplifies and prolongs the impact of all shocks (demand, supply, monetary policy, expectation). After a one-standard-deviation cost-push shock, higher initial belief dispersion produces larger and more persistent deviations in inflation, output, and interest rates. The model-implied interquartile range of beliefs is correlated 0.538 with the SPF interquartile range and the cross-sectional standard deviation is correlated 0.483 (both p < 0.001).
Historical decomposition: Over the 2010s, negative expectation shocks account for a substantial fraction of the persistent below-target inflation (“missing inflation”). From approximately mid-2022 onward, positive expectation shocks account for most of the variance of inflation in the model. The recent disinflation is attributed to a combination of: easing supply pressures, normalization of monetary policy, and re-anchoring of inflation expectations.

Scope Conditions

Results are conditional on the estimated HENK model applied to US data, 1985Q1–2023Q4, using a stylized three-equation NK backbone (no labor market dynamics, no financial sector, no capital). The proxy funds rate is more volatile than the federal funds rate, which affects the welfare comparison for large preemptive tightening scenarios. Counterfactual scenarios are implemented through unexpected monetary policy shocks; anticipated shocks would only strengthen the inflationary effects of delays.

In depth

Q1. What is the core mechanism by which an inflation scare can develop in the HENK model?

A: When inflation repeatedly exceeds the target — whether due to shocks or delayed policy — agents whose beliefs are already above-target incur lower forecast errors than those anchored at the target. During pairwise social interactions (the tournament step of social learning), above-target beliefs spread through the population because they are selected as the “better” forecasting model. The resulting upward shift in the average belief feeds higher inflation through the New Keynesian Phillips Curve, which validates above-target beliefs further, creating a self-reinforcing loop. This mechanism differs from rational-expectations models, where beliefs mean-revert automatically.
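The tournament step can be sketched as a toy simulation. The version below holds inflation fixed at a hypothetical 4 percent, so it omits the Phillips-curve feedback and the news shocks of the full model, and it lets the discounted forecast-error record travel with the adopted belief. The population size, initial belief range, and number of quarters are all illustrative assumptions.

```python
import random


def tournament_step(beliefs, errors, rng):
    """Match agents in random pairs; both adopt the belief with the lower error."""
    order = list(range(len(beliefs)))
    rng.shuffle(order)
    new_beliefs, new_errors = beliefs[:], errors[:]
    for i, j in zip(order[0::2], order[1::2]):
        winner = i if errors[i] <= errors[j] else j
        new_beliefs[i] = new_beliefs[j] = beliefs[winner]
        new_errors[i] = new_errors[j] = errors[winner]  # fitness travels with the belief
    return new_beliefs, new_errors


rng = random.Random(1)
decay = 0.775  # discount on past forecast errors (posterior mean reported in this summary)
beliefs = [rng.uniform(1.0, 3.0) for _ in range(200)]  # long-run inflation beliefs, percent
errors = [0.0] * len(beliefs)                          # discounted squared forecast errors
start_mean = sum(beliefs) / len(beliefs)

for _ in range(8):   # eight quarters of above-target inflation
    inflation = 4.0  # exogenous here; in the model it responds to average beliefs
    errors = [decay * e + (inflation - b) ** 2 for e, b in zip(errors, beliefs)]
    beliefs, errors = tournament_step(beliefs, errors, rng)

end_mean = sum(beliefs) / len(beliefs)
print(round(start_mean, 2), round(end_mean, 2))
```

Because every pairing keeps the belief closer to realized inflation, the average belief drifts upward even though no agent ever holds a belief above the initial maximum.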

Q2. Which assumptions deliver the model’s closed-form solution?

A: Two assumptions deliver the closed-form solution. First, beliefs are private and dispersed (Assumption 1): agents observe only the belief of their matched mate, not the population distribution. Second, a quasi-rational-expectations (quasi-RE) observer treats aggregate beliefs as a random walk in expectations (Assumption 2: a martingale). Under these conditions, the aggregate subjective inflation expectation equals the average subjective belief about steady-state inflation plus the rational-expectations forecast. This augmented minimum-state-variable (MSV) solution can be estimated with full-information methods (the inversion filter) via standard Dynare tooling.

Q3. What data are used and how are observables mapped to model variables?

A: The estimation uses four quarterly US observables from 1985Q1–2023Q4: the output gap (real GDP from FRED, HP-filtered with a one-sided adjusted filter); the CPI inflation rate (CPIAUCSL, FRED); one-quarter-ahead average CPI inflation expectation from the Survey of Professional Forecasters (CPI3); and the proxy funds rate of Choi et al. (2022), which captures both QE and QT so that unconventional monetary policy is reflected in the instrument. Inflation and expectations are demeaned by the sample average to express them as deviations from steady state. The discount factor is calibrated at 0.99; all other parameters are estimated via Bayesian methods with Metropolis-Hastings (8 parallel chains x 100,000 iterations, acceptance rate ~30%).

Q4. What do the posterior estimates of the social-learning parameters imply?

A: The posterior mean of the decay parameter in the fitness evaluation (discounting of past forecast errors) is 0.775, implying a half-life of past forecast errors of approximately 3 quarters. The frequency of news shocks has a posterior mean of 0.436, meaning roughly 44 percent of agents receive an inflation news shock every quarter. The standard deviations of the aggregate and idiosyncratic news shocks are very small (posterior means of 0.0004 and 0.0006, respectively) but strictly positive. The 95 percent credible intervals for both exclude zero.
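The half-life figure follows from the decay parameter, assuming a past forecast error receives weight decay^t after t quarters (a geometric-discounting reading of the fitness evaluation). The function name is illustrative.

```python
import math


def half_life(decay: float) -> float:
    """Number of periods after which a weight decay**t falls to one half."""
    return math.log(0.5) / math.log(decay)


print(round(half_life(0.775), 2))
```

This gives roughly 2.7 quarters, which rounds to the three quarters quoted above.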

Q5. How does the HENK model outperform the RE benchmark in fitting the data?

A: Formal model comparison rejects the RE null (p < 0.0001) with equal prior model weights (50/50). On second moments, only the HENK model replicates positive autocorrelation in inflation (0.428 vs. 0.162 for RE, against an empirical interval of [0.239; 0.579]), in inflation expectations (0.824 vs. 0.161, empirical interval [0.839; 0.927]), and in inflation forecast errors (0.122 vs. -0.145). Additionally, the HENK model reproduces the untargeted cross-sectional dispersion of beliefs over the business cycle, including the increase during the GFC and the COVID-19 era and the low dispersion during the Great Moderation — with correlations of 0.538 and 0.483 between model and SPF dispersion measures.

Q6. What does the historical shock decomposition reveal about the recent inflation surge?

A: The decomposition (Section 3.3) shows that in the initial phase of the COVID-19 shock (2020Q2-Q3), negative demand and monetary policy shocks drove inflation down. Adverse cost-push (supply) shocks dominate from early 2021 into 2022. Expectation shocks — the contribution of dispersed beliefs — are negative throughout the 2010s (explaining part of the “missing inflation”) and remain briefly negative at the pandemic’s onset before turning sharply positive and driving most of the variance of inflation in the final two years of the sample (2022-2023). The loose monetary policy stance (negative monetary policy shocks from mid-2020 to mid-2022, visible in the Taylor-rule residuals) also contributes substantially to the inflation dynamics.

Q7. What does the Taylor-rule counterfactual show, and why doesn’t preemptive tightening cause a recession in the model?

A: Removing the monetary policy shocks after 2020Q4, so that the proxy rate follows the estimated Taylor rule, would have reduced the inflation peak by approximately 0.75 percentage points per quarter (equivalent to about 3 percentage points annualized) and kept expectations anchored at a lower level for almost a year longer. The output gap under the Taylor-rule scenario is only briefly negative (2022Q2) and does not constitute a recession. This occurs because the preemptive tightening exploits the sluggishness of subjective expectations stemming from information frictions: by raising rates earlier when beliefs are still anchored (or only weakly above target), the central bank prevents the social-learning mechanism from diffusing above-target beliefs, which in turn softens the stabilization trade-off between inflation and output.

Q8. What is the U-shaped relationship between the size of the preemptive tightening and welfare?

A: Both the ad-hoc and microfounded welfare measures show a U-shaped relationship as the size of the front-loaded tightening in 2021Q1 increases from 100 bps to 400 bps to 800 bps. At 100 bps, the ad-hoc welfare ratio is 0.336 (an improvement over the benchmark, which is normalized to 1.0); at 400 bps it improves further to 0.304; but at 800 bps (front-loading the entire subsequent tightening cycle) the ratio rises to 0.555, reflecting that the output costs of a very large early rate increase become prohibitive amid the series of supply shocks that hit in 2022. The maximum welfare gain in the microfounded criterion occurs at a slightly larger early increase than in the ad-hoc criterion, attributed to the absence of a financial sector and use of the more volatile proxy funds rate.

Q9. Does increasing the hawkishness of the Taylor rule compensate for falling behind the curve?

A: No. Varying the inflation reaction coefficient by +/-10 percent (to 2.00 for “hawk” and 1.64 for “dove”) from the posterior mean of approximately 1.82 produces negligible differences in inflation and output gaps. The hawkish scenario achieves marginally earlier rate increases but does not reduce the inflation gap relative to the historical benchmark. Welfare ratios are 0.960 (hawkish, slight improvement) and 1.057 (dovish, slight deterioration) under the ad-hoc measure, and 0.981 and 1.052 under the microfounded measure. The joint simulations varying both smoothing (timing) and hawkishness (strength) confirm that timing is the dominant factor: the two “earlier reaction” scenarios are clustered together and well-separated from the two “later reaction” scenarios, regardless of the inflation coefficient.

Q10. How does the model handle the role of initial belief dispersion in monetary policy transmission?

A: Impulse response function exercises varying the initial standard deviation of beliefs (as a share of the maximum model-generated standard deviation under the filtered shocks) show that greater initial dispersion uniformly amplifies and prolongs the macroeconomic response to all shock types (demand, cost-push, monetary policy, expectation). The mechanism is: greater dispersion means the population contains more “extreme” (far-from-target) beliefs; a shock that temporarily moves inflation off target temporarily validates extreme beliefs (lower forecast errors), causing them to spread in social interactions and shift the average belief further from target. This raises nominal rates (through the Taylor rule), deepens output losses, and prolongs the return to steady state.

Q11. What are the implications of early interest rate cuts in the counterfactual scenarios?

A: A 100-basis-point cut in any quarter from 2022Q3 through 2023Q2 would have reignited inflation expectations. The 2022Q3 scenario is most severe: expectations rebound approximately 1 percentage point higher (annualized) immediately post-cut, and annual inflation remains on average 2 percentage points above the historical path through end-2023. Across all early-cut scenarios, neither inflation nor inflation expectations would have returned to target by end-2023; instead, inflation would have landed approximately 2 percentage points above the 2 percent target. The welfare ratios for early cuts range from 1.200 (cut in 2022Q3) down to 1.079 (cut in 2023Q2) under the ad-hoc measure, all of them welfare-worsening.

Key Concepts

Inflation scare (Goodfriend 1993, as used in this paper): A situation in which the public’s long-run inflation expectations become unanchored from the central bank’s target, making beliefs about above-target steady-state inflation self-fulfilling via the New Keynesian Phillips Curve. In the HENK model, a scare arises endogenously when above-target inflation outcomes repeatedly validate above-target beliefs, causing them to spread through social interactions. Measured in the paper by the share of idiosyncratic beliefs falling between 1 and 3 percent (annualized); lower share = more severe scare.

Social learning (SL): The belief-updating mechanism in which agents are paired at random each period and compare their inflation forecasting models; the agent whose model produced lower recent forecast errors (measured by the discounted sum of squared forecast errors with half-life approximately 3 quarters) is adopted by both members of the pair. This evolutionary tournament process — analogous to a genetic algorithm — generates a nonlinear, history-dependent distribution of beliefs that can drift persistently away from the target.
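
The pairwise tournament can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the population size, the initial belief distribution, and the constant 4 percent inflation path are all hypothetical, and each agent's forecast is simplified to equal its steady-state belief. The sketch also assumes that the losing agent inherits the winner's fitness record along with its belief.

```python
import random

def update_fitness(sse, beliefs, realized_inflation, decay=0.775):
    """Discounted sum of squared forecast errors, one entry per agent."""
    return [decay * s + (b - realized_inflation) ** 2
            for s, b in zip(sse, beliefs)]

def tournament_step(beliefs, sse, rng):
    """Pair agents at random; both members keep the lower-error belief."""
    order = list(range(len(beliefs)))
    rng.shuffle(order)
    new_beliefs, new_sse = beliefs[:], sse[:]
    for i, j in zip(order[0::2], order[1::2]):
        winner = i if sse[i] <= sse[j] else j
        new_beliefs[i] = new_beliefs[j] = beliefs[winner]
        new_sse[i] = new_sse[j] = sse[winner]  # assumption: fitness is inherited
    return new_beliefs, new_sse

rng = random.Random(0)
initial_beliefs = [rng.gauss(2.0, 1.0) for _ in range(200)]  # annualized percent
beliefs = initial_beliefs[:]
sse = [0.0] * len(beliefs)

for _ in range(8):  # eight quarters of above-target inflation (hypothetical path)
    sse = update_fitness(sse, beliefs, realized_inflation=4.0)
    beliefs, sse = tournament_step(beliefs, sse, rng)

mean_belief = sum(beliefs) / len(beliefs)
# Share of beliefs between 1 and 3 percent, the scare measure described above.
share_anchored = sum(1.0 <= b <= 3.0 for b in beliefs) / len(beliefs)
print(f"mean belief: {mean_belief:.2f}, share in [1, 3]: {share_anchored:.2f}")
```

Because no new beliefs are created here (news shocks are omitted), the tournament only reshuffles existing beliefs: repeated above-target outcomes let above-target beliefs win more pairings, so the average drifts upward.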

Steady-state learning: The restriction that agents’ heterogeneous beliefs concern only the low-frequency (intercept) component of inflation — i.e., their subjective perception of the steady-state inflation rate — while the rest of their inflation forecast (the effects of transitory shocks and lagged variables) coincides with rational expectations. This assumption, combined with internal rationality, permits a closed-form MSV solution of the HENK model.

Internal rationality: The assumption that each agent uses a perceived law of motion that is consistent with the true MSV solution of the HENK economy (including the effect of heterogeneous beliefs on dynamics), even if their intercept differs from the rational-expectations value. Agents internalize how the aggregate deviation of expectations from RE affects inflation, but they disagree about the long-run level.

Quasi-rational-expectations (quasi-RE) observer: An observer (or central bank) who, lacking information about how individual private beliefs are formed and aggregated, treats aggregate beliefs as a martingale — i.e., the expected future aggregate belief equals its current value. This assumption closes the model and permits estimation with full-information (inversion filter) methods, while preserving consistency between subjective beliefs and the law of motion.
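
In illustrative notation (the symbols are assumptions introduced here, not the paper's), let $\bar{a}_t$ denote the cross-sectional average belief about steady-state inflation. The martingale assumption, together with the aggregate expectation it delivers as described in this summary, can then be written as:

```latex
\mathbb{E}_t\,\bar{a}_{t+1} = \bar{a}_t,
\qquad
\tilde{\mathbb{E}}_t\,\pi_{t+1} = \bar{a}_t + \mathbb{E}^{\mathrm{RE}}_t\,\pi_{t+1}
```

Here $\tilde{\mathbb{E}}_t$ stands for the aggregate subjective expectation and $\mathbb{E}^{\mathrm{RE}}_t$ for the rational-expectations forecast.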

Belief dispersion / expectation heterogeneity: The time-varying cross-sectional standard deviation (or interquartile range) of idiosyncratic beliefs in the population. In the model this is an endogenous, history-dependent outcome of the SL process. Greater dispersion amplifies the response of all macroeconomic variables to any shock by providing more “extreme” beliefs that can gain traction in pairwise tournaments when inflation temporarily deviates from target. Measured empirically by the interquartile range and standard deviation of individual SPF forecasts.

Proxy funds rate (Choi et al. 2022): A summary measure of the US monetary policy stance that incorporates both conventional interest rate policy and the effects of unconventional policies (quantitative easing and tightening), used in the paper in place of the federal funds rate to capture the full stance of monetary policy in the estimation and historical decomposition.

Inversion filter (Cuba-Borda et al. 2019): A computationally efficient estimation algorithm that, rather than the Kalman or particle filter, inverts the observation equation analytically to recover the sequence of structural shocks for a given parameter vector. It enables full-information Bayesian estimation of the nonlinear HENK model by separating the linear part of the solution from the nonlinear social-learning residual.
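
The inversion idea can be illustrated on a toy scalar model. This is a sketch under stated assumptions, not the HENK state space or the authors' implementation: the law of motion y_t = rho*y_{t-1} + g(y_{t-1}) + sigma*e_t, the stand-in nonlinear term g, and all parameter values are hypothetical. With one shock per observable, the shock sequence follows directly from the data, and the likelihood comes from a change of variables.

```python
import math

def invert_shocks(y, rho, sigma, nonlinear_term):
    """Back out one shock per period from the observed path y."""
    return [(cur - rho * prev - nonlinear_term(prev)) / sigma
            for prev, cur in zip(y[:-1], y[1:])]

def gaussian_loglik(shocks, sigma):
    """Log-likelihood of the observables; -log(sigma) is the Jacobian term."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * e * e - math.log(sigma)
               for e in shocks)

def simulate(y0, shocks, rho, sigma, nonlinear_term):
    """Generate a path from the toy law of motion, used to check the inversion."""
    path = [y0]
    for e in shocks:
        path.append(rho * path[-1] + nonlinear_term(path[-1]) + sigma * e)
    return path

def g(x):
    """Stand-in nonlinear residual, purely illustrative."""
    return 0.1 * math.tanh(x)

true_shocks = [0.5, -1.0, 0.25, 0.0]
y = simulate(0.0, true_shocks, rho=0.8, sigma=0.2, nonlinear_term=g)
recovered = invert_shocks(y, rho=0.8, sigma=0.2, nonlinear_term=g)
loglik = gaussian_loglik(recovered, sigma=0.2)
```

The recovered shocks match the simulated ones up to floating-point error, so the likelihood can be evaluated for any candidate parameter vector without running a Kalman or particle filter.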

Talent Hoarding in Organizations

This paper provides the first empirical evidence of talent hoarding in organizations: the practice whereby managers deliberately suppress workers’ internal mobility to retain productive team members, thereby serving their own performance-based compensation interests at the expense of firm-wide talent allocation. The research question is whether managers with misaligned incentives hoard talent, how this can be measured, and what consequences it has for worker career outcomes and organizational efficiency.

The study uses personnel records from a large German manufacturing firm with over 200,000 employees worldwide, focused on more than 30,000 white-collar and management employees in Germany, covering over 300,000 employee-by-quarter observations from 2015 to 2018. This is supplemented by a manager survey (62% response rate, over 3,000 responses) and an employee survey (50% response rate, over 15,000 responses), plus the universe of internal job application and hiring data covering over 16,000 job openings and over 200,000 applicants.

The conceptual framework formalizes talent hoarding as a moral hazard problem: managers observe worker productivity and are compensated based on team performance, but are tasked with identifying and developing talent for promotion. When a high-productivity worker leaves, team productivity falls. The framework predicts that hoarding intensity increases with worker productivity, team vulnerability to departures (smaller teams), and manager-level hoarding incentives (performance-related pay, low talent visibility).

The key administrative measure of hoarding is the systematic gap between managers’ private performance ratings (not shared outside the team) and public potential ratings (widely circulated within the firm). Managers who suppress potential ratings relative to what would be predicted given worker performance are interpreted as strategically reducing worker visibility. Managers with a 1 percentage point higher share of performance-related pay are 0.19 percentage points more likely to hoard talent; a one-person increase in team size reduces hoarding probability by 1.3 percentage points; and managers in low-visibility functional areas are 4.0 percentage points more likely to hoard. Survey-based hoarding measures yield directionally identical patterns.

To identify causal effects on workers, the paper exploits quasi-random manager rotations. When a manager learns they will move to a different team — typically two to three quarters before the actual transition — their hoarding incentive ceases. This creates a temporary window of reduced hoarding. During this window, worker application rates increase by 2.3 percentage points, representing a 78% increase over the baseline application rate of 2.9%. An event study confirms flat pre-trends prior to the announcement period, supporting the identifying assumption.

Using manager rotations as an instrument for worker applications, marginal applicants — those induced to apply only by the manager rotation — face a 49.1% likelihood of receiving a new position, compared to an average hiring likelihood of 27.6%. This positive selection implies that many deterred applicants would have been successful and that talent hoarding meaningfully degrades the quality of the internal applicant pool. Gender analysis reveals that women are 22% more likely to rely on manager career guidance and 26% more likely to prioritize preserving a good manager relationship. Marginal female applicants are more positively selected on education, past performance, and hiring probability for higher-level positions. The counterfactual reduction in the gender pay gap from eliminating talent hoarding is estimated at 86%.

Scope conditions: the firm is a large European manufacturer with long average tenures (13 years), an application-based internal labor market, and centralized online job portal. Results apply most directly to white-collar and management employees in Germany. External validity is supported by comparisons to German workforce surveys and by the fact that 83% of top publicly listed German companies and half of 665 global organizations in industry surveys report talent hoarding as a significant organizational friction.

Q: How is talent hoarding formally defined in this paper? A: Talent hoarding is defined as actions taken by managers that lower the likelihood that a worker applies for and receives a promotion or any internal transfer outside the team. In the formal framework, a manager chooses hoarding intensity β ≥ 0, where β > 0 reduces the equilibrium probability that a worker gets promoted. The definition encompasses all forms of managerial action that reduce worker departure probability, including suppressing visibility, restricting access to trainings, explicit discouragement, and threats.

Q: Why do managers have an incentive to hoard talent? A: Managers are compensated based on team performance, so losing a high-productivity worker (whose replacement is a random draw from an outside distribution with expected productivity ᾱ) reduces team performance and thus manager compensation. The framework shows that when a worker’s productivity αi exceeds the expected productivity of an outside hire ᾱ, the manager optimally sets β* > 0. The cost of hoarding (parameterized as φm) is convex and varies across managers, capturing altruism, reputation risk, or detection probability.

Q: What share of managers in the survey self-report talent hoarding? A: 75% of managers reported that they sometimes find themselves in situations where they need to dissuade a team member from exploring opportunities in another department due to immediate team needs or performance goals. Additionally, 45% cite the risk of losing talent as a reason not to invest in employee career development, and 66% cite the need to prioritize short-term performance targets over long-term employee development.

Q: How are misaligned incentives documented in the manager survey? A: 55% of managers agree or strongly agree that talent development entails a conflict of interest because more developed workers are more likely to leave the team. While 96% believe their direct intervention has a large impact on workers’ career development, only 36% perceive that impact to be valued by the firm as much as team performance impact. Similarly, 87% say talent development is a high-impact area for the firm, but only 40% believe a track record in talent development matters for their own compensation and promotion.

Q: How is the administrative measure of talent hoarding constructed? A: The measure is the residual from an OLS regression of a worker’s potential rating (a public signal of promotion readiness, widely circulated within the firm) on their performance rating (a private signal of current task performance, not shared outside the team) and worker characteristics including age, education, gender, and tenure. The manager-level measure is the average of these residuals across all workers and quarters under that manager. Managers in the top tercile (mean deviation above 0.1036) are classified as hoarding-prone.
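
The construction can be sketched as follows. This is a simplified illustration with made-up toy data, not the paper's code: it uses performance as the only regressor (the paper also controls for age, education, gender, and tenure), and it takes the deviation as predicted minus assigned potential so that positive values mean suppression. That sign convention is an assumption, since this summary does not state how the rating scale is coded.

```python
from collections import defaultdict
from statistics import mean, quantiles

def ols_fit(x, y):
    """Slope and intercept of a one-regressor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return slope, y_bar - slope * x_bar

def hoarding_prone(rows):
    """rows: (manager_id, performance, potential), one per worker-quarter."""
    slope, intercept = ols_fit([r[1] for r in rows], [r[2] for r in rows])
    by_manager = defaultdict(list)
    for manager, perf, pot in rows:
        # Assumed sign: predicted minus assigned, so positive = suppressed.
        by_manager[manager].append(intercept + slope * perf - pot)
    averages = {m: mean(v) for m, v in by_manager.items()}
    upper_cut = quantiles(list(averages.values()), n=3)[1]  # top-tercile cutoff
    return {m for m, a in averages.items() if a > upper_cut}, averages

# Toy data: manager C rates potential one point below managers A and B.
rows = [(m, float(p), float(p) - (1.0 if m == "C" else 0.0))
        for m in ("A", "B", "C") for p in (1, 2, 3)]
flagged, averages = hoarding_prone(rows)
print(flagged)
```

In the toy data only manager C lands in the top tercile of average deviations, mirroring the classification rule described above.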

Q: Does the hoarding measure respond to the incentive proxies as predicted by the framework? A: Yes. A 1 percentage point higher share of performance-related compensation is associated with a 0.19 percentage point increase in the probability of being classified as hoarding-prone (p = 0.000), corresponding to a 13 percentage point difference between the 90th and 10th percentiles of the financial incentive distribution. A one-person increase in team size reduces hoarding probability by 1.3 percentage points (p = 0.000), again a 13 percentage point difference across percentiles. Managers in low-visibility functional areas are 4.0 percentage points more likely to hoard (p = 0.002) relative to high-visibility areas.

Q: Is the training-based hoarding measure consistent with the potential-rating measure? A: Yes. A complementary measure based on managers restricting worker access to high-visibility in-person trainings yields nearly identical patterns: a 1 percentage point increase in performance-related pay increases hoarding probability by 0.20 percentage points (p = 0.000); a one-person increase in team size reduces it by 1.4 percentage points (p = 0.000); low-visibility areas increase hoarding by 2.98 percentage points (p = 0.021). The direction and economic magnitudes are highly similar across both administrative measures and the survey-based measures.

Q: How are manager rotations used to identify causal effects on workers? A: When a manager learns they will move to a different position — typically two to three quarters before the rotation — their incentive to hoard workers on their current team ceases. This creates a quasi-random window of reduced talent hoarding for workers on that team. An event study with worker and quarter fixed effects shows flat pre-trends in application rates beyond three quarters before the rotation, consistent with the identifying assumption that managers do not yet know about their rotation in that earlier window. Balance tests confirm workers exposed to rotations are observationally similar on demographics and past performance to non-exposed workers.

Q: How large is the effect of manager rotations on worker applications? A: Manager rotations increase worker application rates by 2.3 percentage points in the quarter of rotation, representing a 78% increase over the baseline application rate of 2.9%. The effect is transitory: application rates return to baseline within one quarter after the new manager settles in. The effect is not driven by managers taking subordinates with them (97% of applications are to positions outside both the current team and the manager’s new team).

Q: Does the rotation effect vary with predicted hoarding intensity as the framework requires? A: Yes. The rotation effect is larger for workers with higher productivity, those whose replacement would be costlier (consistent with the prediction that workers harder to replace face more hoarding), and those working under managers with lower utility costs of hoarding. The paper tests these cross-sectional predictions using continuous interactions between the rotation indicator and standardized proxies for hoarding intensity, and all patterns are consistent with the talent hoarding mechanism rather than alternative explanations.

Q: How successful would the deterred applicants have been? A: Marginal applicants — those induced to apply by the manager rotation who would not otherwise have applied, identified via IV assumptions — face a hiring probability of 49.1%, compared to the average hiring likelihood of 27.6% across all applicants. This large positive selection implies that a substantial share of deterred applicants would have been successful, and that talent hoarding meaningfully degrades the quality and quantity of the firm’s internal applicant pool and the firm’s ability to promote high-productivity workers.
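
With a binary instrument, the hiring probability of marginal applicants can be read off a Wald ratio: the change in the hiring rate divided by the change in the application rate. The sketch below assumes the usual IV conditions (rotations affect hiring only through applications, and no worker is discouraged from applying by a rotation). The function name and the counts are hypothetical toy values, not figures from the paper.

```python
def wald_estimate(n0, applied0, hired0, n1, applied1, hired1):
    """Hiring probability of compliers from a binary-instrument Wald ratio.

    Group 0: worker-quarters without a manager rotation.
    Group 1: worker-quarters with a manager rotation.
    """
    first_stage = applied1 / n1 - applied0 / n0   # effect on applying
    reduced_form = hired1 / n1 - hired0 / n0      # effect on being hired
    if first_stage == 0:
        raise ValueError("instrument does not move applications")
    return reduced_form / first_stage

# Made-up counts for illustration only.
marginal_hire_prob = wald_estimate(n0=1000, applied0=30, hired0=8,
                                   n1=1000, applied1=50, hired1=18)
average_hire_prob = 8 / 30  # hiring rate among applicants without a rotation
print(f"marginal: {marginal_hire_prob:.2f}, average: {average_hire_prob:.2f}")
```

A marginal hiring probability above the average one, as in the toy numbers, is what positive selection of deterred applicants looks like in this framework.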

Q: Does talent hoarding have differential effects by gender? A: Yes. Women are 22% more likely to place high value on preserving a good relationship with their manager and 26% more likely to rely on manager career guidance when making career decisions. Consistent with this, marginal female applicants are more positively selected on educational qualifications, past performance, and hiring probability for higher-level positions than marginal male applicants. When comparing potential earnings outcomes, both men and women would earn more in the absence of talent hoarding, but the larger earnings gains for women imply a counterfactual reduction in the gender pay gap of 86%.

Q: What evidence supports external validity of the findings? A: The firm’s employee demographics closely match those of large manufacturing firms in the German BiBB workforce survey across gender, age, citizenship, and marital status. The firm’s internal labor market design is standard for large German firms, where 83% of top publicly listed companies cite talent hoarding as a key organizational friction. Industry surveys also report that half of 665 global organizations report managers hoarding talent by discouraging worker mobility, and talent hoarding occurs through many of the same behaviors documented in this study.

Q: How does the paper rule out confounding mechanisms for the rotation effect? A: The paper tests and rules out several alternatives: worker-manager specific match effects (the effect does not depend on characteristics of the incoming or outgoing manager); finite project timelines driving a rush to apply; and workers being recruited by managers to their new teams (97% of applications are outside the current team and not to the manager’s new team). Balance tests show workers exposed to rotations are observationally similar to non-exposed workers, and event studies confirm absence of pre-trends in team-level outcomes including absenteeism.

Q: What are the policy implications of the findings? A: The findings suggest firms forgo productivity gains when hoarded workers are not allocated to positions where they would be most productive. Potential organizational responses include monitoring or rewarding managers for promoting talent, reducing performance-related pay tied to team composition, or structuring career development activities in ways that cannot easily be suppressed by individual managers. The paper notes that firms generally do not compensate managers for promoting workers, partly due to practical difficulties of such contracts, and that the misalignment between what managers believe benefits the firm and what is recognized in their own compensation is particularly pronounced for talent development relative to all other managerial responsibilities.

Key Concepts

Talent hoarding: Actions taken by managers that lower the likelihood that a worker applies for and receives a promotion or internal transfer outside the team, driven by managers’ incentive to retain productive workers to protect team performance and manager compensation. Distinct from mere neglect: it is strategic and deliberate.

Potential rating: A public signal of a worker’s future potential for higher-level positions, assigned by the direct supervisor and widely circulated within the firm (e.g., via HR lists of high-potential workers); distinguished from performance ratings by its visibility outside the worker’s current team, making it a lever for strategic manipulation by hoarding managers.

Performance rating: A private, task-specific signal of a worker’s past performance in their current position, not shared with other units in the firm; used as the baseline against which potential ratings are compared in the paper’s administrative hoarding measure.

Visibility suppression (hoarding measure): The manager-level average residual from a regression of workers’ potential ratings on their performance ratings and worker characteristics; a positive average residual indicates the manager systematically assigns lower potential ratings than predicted, suppressing worker visibility outside the team in a manner consistent with strategic talent hoarding.

Manager rotation: An event in which a manager leaves their current team for a different internal position within the firm, temporarily eliminating their hoarding incentive for current team workers and creating the paper’s quasi-experimental source of variation in hoarding exposure.

Marginal applicant: In the IV framework, a worker who applies for an internal position only because their manager is rotating and would not have applied otherwise; estimated via complier analysis (Abadie 2003) and used to characterize the counterfactual quality and hiring probability of workers deterred by talent hoarding.

Utility cost of hoarding (φm): A manager-level parameter capturing the convex private cost to a manager of engaging in talent hoarding; may reflect altruism, detection risk, or reputational consequences; managers with lower φm hoard more intensively, and variation in φm is proxied empirically by performance-related pay, team size, and functional-area talent visibility.

The Effect of High-Tech Clusters on the Productivity of Top Inventors: Comment

This paper is a comment on Moretti (2021b), which studied agglomeration effects for innovation by testing whether the size of technology clusters causes patenting. The original paper (M21) used US patent data from 1971 to 2007 (Zucker and Darby, 2014) and reported a baseline elasticity of patenting with respect to cluster size of 0.0676, along with event study and instrumental variables (IV) evidence supporting a causal interpretation.

Wiebe identifies two major methodological problems that undermine M21’s causal claims.

Problem 1 — Misspecified event study. M21’s event study (Figure 6) was designed to test for selection bias from “rising star” inventors sorting into large clusters. The event is inventors moving across cities exactly once. However, M21’s specification interacts pre-move average cluster size with pre-move event-time indicators and post-move average cluster size with post-move event-time indicators separately — it does not exploit the change in cluster size generated by the move itself. Following the standard “mover” design literature (Finkelstein et al., 2016; Molitor, 2018; Cantoni and Pons, 2022), the correct specification uses the change in average cluster size as the treatment variable, interacted with event-time indicators. Wiebe implements this corrected event study and finds no statistically significant pre-trend and no statistically significant treatment effect post-move. Notably, the baseline elasticity estimated on the mover sample using all observed variation is large and significant at 0.3145 (SE 0.0953), but no effect is detected when variation is restricted to that generated by moving. The null result could also partly reflect attenuation bias from misclassified moves, since the dataset does not distinguish inventors who share the same name.

Problem 2 — Coding error in IV. M21’s Table 5 instruments cluster size using variation in the number of inventors in other cities employed by firms also active in the focal inventor’s city, with the instrument calculated via first-differencing. Due to a coding error, M21 sorts data by firm, field, and year but not by city before first-differencing, so the differencing is taken across cities rather than within cities. Because firm-field-year is not a unique sorting key, Stata’s sort command pseudo-randomly orders observations with tied values, making the results unreproducible across runs. When Wiebe corrects the code to sort by city and compute first-differences within city, the 2SLS estimates become unstable and nonsignificant, with the first-stage F-statistic falling to approximately 7. This means M21 provides no valid IV evidence against confounding from city-field-year shocks such as local subsidies.

Beyond these two major problems, the Appendix documents seven additional issues. The positive effect of cluster size on patent quality (M21 Table 6) reverses when the log(y + 0.00001) transformation is replaced with log(y + 1) or with Poisson regression: the corrected estimate is negative and significant, implying that cluster size reduces citations per patent along the intensive margin and that the overall quality effect is negative. Heterogeneous elasticity estimates (M21 Table 8) contain a coding error; corrected estimates show substantial heterogeneity. The distributed lag model (M21 Figure 5) uses an incorrectly defined lag structure in an unbalanced panel; corrected estimates yield nonsignificant contemporaneous effects. Cluster quality estimates (M21 Table A.8) use a cluster size definition differing from the text, and corrected elasticities are approximately half as large. M21’s claimed extensive margin effect in Table A.7 is logically unsupported since no zeros are observed. The team size robustness check is conceptually flawed because it controls twice for per-coauthor adjustment. A gap-interpolation coding error in Table A.6 biases estimates downward. Broader computational reproducibility failures arise from many-to-many merges with non-unique sort orders. Wiebe explicitly notes that the null IV and event study results are not evidence against agglomeration effects per se.

Q: What is the baseline finding in M21 that Wiebe contests? A: M21 reports a baseline elasticity of patenting with respect to cluster size of 0.0676, estimated from linear regressions with extensive fixed effects including inventor fixed effects. M21 presents an event study and IV strategy as additional evidence supporting a causal interpretation of this elasticity.

Q: What is wrong with M21’s event study specification? A: M21’s event study interacts pre-move average cluster size with pre-move event-time indicators and post-move average cluster size with post-move event-time indicators, but never uses the change in cluster size associated with moving. The standard mover design (Finkelstein et al., 2016; Molitor, 2018) uses the change in average environment as a constant treatment variable interacted with all event-time indicators. Because M21’s specification does not exploit moving-induced variation, it would be identified even if moving induced no change in cluster size.
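
In generic notation (the symbols and the choice of omitted period are illustrative assumptions, not taken from either paper), the standard mover specification described above can be written as:

```latex
\log y_{it} = \alpha_i + \tau_t
  + \sum_{k \neq -1} \theta_k \, \Delta_i \, \mathbf{1}\!\left[t - t_i^{*} = k\right]
  + \varepsilon_{it},
\qquad
\Delta_i = \bar{S}_i^{\,\mathrm{post}} - \bar{S}_i^{\,\mathrm{pre}}
```

Here $y_{it}$ is inventor $i$'s patenting in year $t$, $t_i^{*}$ is the move year, $\bar{S}_i^{\,\mathrm{pre}}$ and $\bar{S}_i^{\,\mathrm{post}}$ are average log cluster sizes before and after the move, and $\alpha_i$ and $\tau_t$ are inventor and year fixed effects. The coefficients $\theta_k$ for $k < 0$ test for pre-trends, and those for $k \geq 0$ trace the treatment effect. If the move changes nothing ($\Delta_i = 0$), the interaction terms vanish, which is exactly the property M21's specification lacks.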

Q: What does Wiebe’s corrected event study find? A: Wiebe’s corrected mover event study shows no statistically significant pre-trend (consistent with no systematic sorting of rising-star inventors into large clusters) and no statistically significant post-move treatment effect. In contrast, the baseline fixed-effects elasticity on the mover sample using all observed variation is 0.3145 (SE 0.0953) — large and significant — indicating the null result is specific to the moving-generated variation.

Q: What alternative explanation does Wiebe offer for the null event study result? A: The null result could be partly explained by attenuation bias from misclassified moves. M21’s code creates inventor identifiers based on names, but the COMETS dataset does not distinguish inventors who share the same name, so an apparent cross-city move may simply be two different inventors with the same name living in different cities.

Q: What is the coding error in M21’s IV strategy? A: M21 constructs the instrument by first-differencing a variable measuring inventors in other cities working for firms also active in the focal city. The code sorts by firm, field, and year before differencing, but omits city from the sort key, so first-differencing is computed across cities rather than within cities, generating an instrument that does not match the definition in the text.

Q: Why does the coding error also cause non-reproducibility? A: Firm-field-year is not a unique sorting key because multiple cities can share the same firm-field-year values. Stata’s sort command pseudo-randomly orders observations with tied values, so each run produces a different city ordering within tied groups and therefore a different instrument and different estimates.
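The sort-key problem can be illustrated with a minimal Python sketch. This is not M21’s or Wiebe’s code (both use Stata); the function name, field names, and toy data are hypothetical. It first-differences a variable within groups, once with city in the group key and once without.

```python
def first_difference(rows, group_keys, time_key, value_key):
    """First-difference value_key over time within groups defined by group_keys."""
    # Sort on the group key plus time so consecutive rows are compared in order.
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in group_keys) + (r[time_key],))
    out, prev = [], None
    for r in ordered:
        same_group = prev is not None and all(r[k] == prev[k] for k in group_keys)
        diff = r[value_key] - prev[value_key] if same_group else None
        out.append({**r, "diff": diff})
        prev = r
    return out

# Hypothetical toy data: one firm-field pair active in two cities.
rows = [
    {"firm": "A", "field": "bio", "city": "X", "year": 2000, "n": 10},
    {"firm": "A", "field": "bio", "city": "X", "year": 2001, "n": 12},
    {"firm": "A", "field": "bio", "city": "Y", "year": 2000, "n": 100},
    {"firm": "A", "field": "bio", "city": "Y", "year": 2001, "n": 103},
]

# City in the key: differences are taken within each city.
within_city = [r["diff"] for r in first_difference(rows, ["firm", "field", "city"], "year", "n")]

# City omitted: rows from different cities are differenced against each other.
across_city = [r["diff"] for r in first_difference(rows, ["firm", "field"], "year", "n")]

print(within_city)  # [None, 2, None, 3]
print(across_city)  # [None, 90, -88, 91]
```

Python’s `sorted` is stable, so the flawed version here is at least deterministic. Under a sort that breaks ties arbitrarily, as described above for Stata, the cross-city differences would also change from run to run.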

Q: What do the corrected IV results show? A: After correcting the sort order to include city and computing first-differences within city, the 2SLS estimates are unstable and nonsignificant. The first-stage F-statistic falls to approximately 7, indicating a weak instrument. This does not constitute evidence against agglomeration effects, but means M21’s IV strategy provides no valid evidence against confounding from city-field-level shocks such as local subsidies.

Q: What happens to the patent quality results when the log transformation is corrected? A: M21 uses log(citations + 0.00001), which assigns very large weight to the extensive margin. When Wiebe uses log(citations + 1) or Poisson regression instead, the estimated effect of cluster size on patent quality is negative and statistically significant, reversing M21’s finding. The corrected result implies that while cluster size may raise the probability of producing any cited patent, it reduces citations per patent for inventors who do produce cited patents, and the overall effect is negative.
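A small numeric illustration, not taken from the paper, shows why the added constant matters. It compares the change in the transformed outcome when citations go from 0 to 1 (an extensive-margin step) with the change from 1 to 100 (an intensive-margin step). The function name is hypothetical.

```python
import math

def log_shift(y, c):
    """Return log(y + c), the shifted-log transform of a count y."""
    return math.log(y + c)

gaps = {}
for c in (0.00001, 1.0):
    zero_to_one = log_shift(1, c) - log_shift(0, c)        # extensive-margin step
    one_to_hundred = log_shift(100, c) - log_shift(1, c)   # intensive-margin step
    gaps[c] = (zero_to_one, one_to_hundred)
    print(f"c={c}: 0->1 adds {zero_to_one:.2f}, 1->100 adds {one_to_hundred:.2f}")
```

With c = 0.00001, moving from zero to one citation shifts the outcome by about 11.5 log points, more than twice the shift from 1 to 100 citations. With c = 1 the ordering reverses. This is the sense in which the small constant puts heavy weight on the extensive margin.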

Q: What are the corrected aggregate agglomeration loss estimates? A: Using the corrected constant elasticity, the estimated output reduction from equalizing cluster sizes is -9.15% (slightly smaller than M21). Using corrected heterogeneous elasticities based on within-field-year size quartiles, the output loss is -23.75% (about twice as large). Using elasticities based on global size quartiles, the loss is -35.11%.

Q: What is wrong with M21’s distributed lag model (Figure 5)? A: M21’s code defines lags and leads using sequential observations in the panel rather than calendar years. Because the inventor-year panel is unbalanced, a coded “one-year lag” can refer to any number of years prior. When Wiebe restricts to inventors with 11 consecutive years and correctly defines year-based lags, confidence intervals widen substantially and the contemporaneous effect estimate becomes nonsignificant.
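The difference between an observation-based lag and a calendar-year lag can be sketched as follows. This is an illustrative Python example with a hypothetical inventor and made-up values, not the original Stata code.

```python
def sequential_lag(panel, unit, year):
    """Value from the unit's previous observation, however many years back it is."""
    years = sorted(y for (u, y) in panel if u == unit)
    idx = years.index(year)
    return panel[(unit, years[idx - 1])] if idx > 0 else None

def calendar_lag(panel, unit, year):
    """Value from the previous calendar year, or None if that year is unobserved."""
    return panel.get((unit, year - 1))

# Hypothetical unbalanced panel: inventor observed in 1990, 1991, and 1997 only.
panel = {("inv1", 1990): 5.0, ("inv1", 1991): 6.0, ("inv1", 1997): 9.0}

print(sequential_lag(panel, "inv1", 1997))  # 6.0, taken from 1991 (six years earlier)
print(calendar_lag(panel, "inv1", 1997))    # None, because 1996 is not observed
```

The two definitions agree only where the panel has no gaps, which is why restricting to inventors with consecutive years changes the estimates.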

Q: What is the conceptual flaw in M21’s team-size robustness check? A: M21’s Table A.8 controls for the number of coauthors on a patent, but the dependent variable is already measured as patents per coauthor. Controlling for team size after already dividing by team size effectively controls for the same variable twice.

Q: What are the broader computational reproducibility problems in M21? A: The cleaning code uses many-to-many merges with non-unique sort orders, generating slightly different datasets on each run. For example, when merging inventors with patent assignees, patent identifiers are not unique because multiple firms can be assigned to a single patent. Removing name suffixes also causes distinct inventors (e.g., Paul H. Hamisch Jr. and Sr.) to be assigned the same identifier. Additionally, M21 runs reghdfe with the keepsingletons option, which retains singleton groups; the package explicitly warns against this because it biases standard errors.
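One generic safeguard against this kind of problem is to verify that a merge key is unique before merging. The sketch below is a hypothetical Python illustration of such a check, not a step described in either paper; the field names and data are invented.

```python
from collections import Counter

def duplicated_keys(rows, key_fields):
    """Return the key tuples that appear more than once in rows."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical assignee table: patent P1 is assigned to two firms.
assignees = [
    {"patent_id": "P1", "firm": "FirmA"},
    {"patent_id": "P1", "firm": "FirmB"},
    {"patent_id": "P2", "firm": "FirmA"},
]

print(duplicated_keys(assignees, ["patent_id"]))          # [('P1',)]
print(duplicated_keys(assignees, ["patent_id", "firm"]))  # []
```

A non-empty result for the intended key signals that the merge is not one-to-one or many-to-one on that side and that row order may affect the outcome.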

Agglomeration elasticity: The elasticity of an inventor’s patent output with respect to the size of the technology cluster (city-field-year cell) in which they work; reported as 0.0676 in M21’s baseline and 0.3145 on the mover sample with all observed variation.

Mover event study design: An event study specification in which the treatment variable is the change in an individual’s average environment (here, cluster size) before and after a geographic move, interacted with event-time indicators — the standard design used in Finkelstein et al. (2016) and Molitor (2018), which M21’s specification does not follow.

Cluster size: The number of inventors (or cluster density) active in the same city-field-year cell as the focal inventor, used as the key independent variable in M21’s regressions.

First-stage F-statistic: A measure of instrument strength in 2SLS IV estimation; the corrected instrument yields F ≈ 7 (indicating weakness), whereas M21’s incorrectly constructed instrument produced a stronger first stage by exploiting spurious cross-city variation.

Extensive vs. intensive margin (patent quality): The extensive margin captures whether an inventor produces any cited patent; the intensive margin captures citations per patent conditional on having any. M21’s log(y + 0.00001) transformation overweights the extensive margin, and the corrected intensive-margin effect of cluster size on quality is negative and significant.

Computational reproducibility: The property that running code on the same data produces identical results across runs. M21’s code fails this standard due to non-unique sort orders in merges and first-differencing steps, causing the IV instrument to differ across runs.

Rising star sorting: The hypothesized selection mechanism whereby inventors with increasing patent trajectories are preferentially hired into large clusters, which would bias OLS agglomeration elasticity estimates upward; M21’s event study was designed to test for this but is incorrectly specified and does not use moving-induced variation.

The Effect of Omitted Variables on the Sign of Regression Coefficients

Masten and Poirier demonstrate a previously unrecognized asymmetry in the coefficient stability literature: depending on how omitted variable bias is measured, it can be substantially easier for omitted variables to flip a regression coefficient’s sign than to drive it to zero. The paper focuses specifically on Oster (2019b), a widely used robustness framework with approximately 5,500 Google Scholar citations as of December 2025, and shows that Oster’s sensitivity parameter δ — commonly interpreted as the ratio of selection on unobservables to selection on observables — exhibits a structural problem when used to assess sign robustness.

The core theoretical result (Theorem 2) is that, in Oster’s sensitivity analysis, the sign change breakdown point is bounded above by 1 for any value of R²_long. Since researchers typically treat |δ| = 1 as the cutoff for a robust result, this implies that no empirical result is robust to sign changes under Oster’s framework, even when the explain away breakdown point is far larger than 1. The mechanism is a vertical asymptote in the identified set for βlong that occurs precisely at δ = 1, arising from near multicollinearity between the treatment X and the covariates. At this asymptote, the bias-adjusted estimand becomes discontinuous: βlong can jump from a positive to a negative value as δ crosses 1, even when δ is changed by a negligible amount.

The paper illustrates this with the bias-adjusted estimand formula. Under Oster’s Proposition 1 (which requires δ = 1 plus an auxiliary proportionality assumption), the point estimate for the social capital application is 0.532. But if δ = 1 without the auxiliary assumption, the identified set becomes {−0.0855, 1.8947}. For δ = 0.99, the identified set includes {−18.66, −0.0868, 1.736}. The baseline OLS estimate is 0.17, and the explain away breakdown point (correct) is −32.0, while the sign change breakdown point is only 0.586 — well below the conventional robustness threshold of 1.

The authors propose a modified robustness measure that adds Assumption A5: an explicit bound M on the magnitude of omitted variable bias (|βlong − βmed| ≤ M). Under this restriction, the sign change breakdown point can exceed 1, making robust sign conclusions possible. The choice of M requires substantive justification by the researcher.

Two meta-analyses covering 58 empirical papers document the practical extent of the problem. For papers published in top-five journals from 2019–2021 that cite Oster (2019), the median explain away breakdown point is 2.65, while the median sign change breakdown point (with M = 10|β̂med|) is 1.15 and without the M restriction is 0.96. At the 90th percentile, the explain away point is 13.22, while the sign change point (M = 10|β̂med|) is only 1.66. Across both meta-analytic samples, for more than 50% of regressions the sign of βlong must be assumed a priori in order to interpret the explain away breakdown point as evidence of sign robustness.

Scope conditions: The results apply specifically to Oster’s linear regression coefficient stability framework under the assumption of exogenous controls (cov(W1, W2) = 0, Assumption A4). The authors note this exogeneity assumption is strong in many applications. The paper does not claim the results extend to other sensitivity analysis frameworks (e.g., Cinelli and Hazlett 2020). The methods are implemented in the companion Stata module regsensitivity.

Q: What is the central finding of the paper?

A: The sign change breakdown point for Oster’s δ is bounded above by 1 (Theorem 2), regardless of how large the explain away breakdown point is. Since |δ| = 1 is the conventional robustness threshold, this implies that, under Oster’s framework, no result is ever robust to a sign change. The explain away breakdown point can simultaneously be very large — e.g., −32.0 in the social capital application — while the sign change breakdown point is only 0.586.

Q: What are the two kinds of breakdown points the paper distinguishes?

A: The explain away breakdown point answers: what is the smallest |δ| required for the data to be consistent with a zero causal effect? The sign change breakdown point answers: what is the smallest |δ| required for the data to be consistent with a causal effect of opposite sign? These two quantities are often equal but are not generally equivalent, and the sign change breakdown point can be strictly smaller than the explain away breakdown point.

Q: What is the mechanism behind the sign change breakdown point being bounded above by 1?

A: The identified set for βlong has a vertical asymptote precisely at δ = 1, arising because the sensitivity analysis allows treatment X and the covariates (W1, W2) to approach near multicollinearity. Near this asymptote, omitted variable bias can be arbitrarily large while δ remains close to 1. This discontinuity allows the bias-adjusted estimand to jump across zero — changing sign — even as δ is changed by an infinitesimal amount near 1.

Q: How sensitive is Oster’s bias-adjusted point estimator near δ = 1?

A: Extremely sensitive. In the social capital application, Oster’s Proposition 1 formula (which assumes δ = 1 with the auxiliary proportionality condition) yields an estimate of 0.532. But without the auxiliary assumption, at δ = 1 the identified set is {−0.0855, 1.8947}; at δ = 0.99 it includes {−18.66, −0.0868, 1.736}; at δ = 1.01 it includes {−0.0843, 2.133, 15.64}. These are not minor perturbations: the estimand is discontinuous in δ at exactly the value at which Oster’s formula evaluates it.

Q: What modification do the authors propose to recover sign robustness?

A: They propose adding Assumption A5, which bounds the magnitude of omitted variable bias: |βlong − βmed| ≤ M for a researcher-specified M ≥ 0. Under this restriction, the identified set BI(δ, R²_long, M) is intersected with [βmed − M, βmed + M], and it becomes possible for the sign change breakdown point to exceed 1. The practical difficulty is that M must be chosen with substantive justification, and the authors show via meta-analysis that the conventional choice M = |βmed| (equivalent to assuming the sign of βlong is already known) applies to more than 50% of regressions in their sample.
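The effect of the bound M can be sketched in a few lines of Python. This is only an illustration of the intersection step described above, not the regsensitivity implementation; the function names are hypothetical and the candidate values and βmed are made-up numbers, not estimates from the paper.

```python
def restrict_identified_set(candidates, beta_med, M):
    """Keep candidate beta_long values with |beta_long - beta_med| <= M (Assumption A5)."""
    lo, hi = beta_med - M, beta_med + M
    return [b for b in candidates if lo <= b <= hi]

def allows_sign_change(candidates, beta_med):
    """True if any candidate has the opposite sign from beta_med."""
    return any(b * beta_med < 0 for b in candidates)

# Hypothetical values: three candidate roots at some delta, and a positive beta_med.
candidates = [-0.5, 0.05, 0.3]
beta_med = 0.2

tight = restrict_identified_set(candidates, beta_med, M=abs(beta_med))       # interval [0, 0.4]
loose = restrict_identified_set(candidates, beta_med, M=10 * abs(beta_med))  # interval [-1.8, 2.2]

print(tight, allows_sign_change(tight, beta_med))  # [0.05, 0.3] False
print(loose, allows_sign_change(loose, beta_med))  # [-0.5, 0.05, 0.3] True
```

With M = |βmed| the interval is [0, 2βmed], so a sign change is excluded by construction. A looser bound such as M = 10|βmed| leaves the question to the data.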

Q: What do the meta-analyses show about the gap between explain away and sign change breakdown points in practice?

A: For 34 primary regressions from top-five journal papers (2019–2021) with R²_long = 1, the median explain away breakdown point is 2.65 while the median sign change breakdown point (M = 10|β̂med|) is 1.15 and without the M restriction is 0.96. At the 90th percentile, the explain away point is 13.22 versus a sign change point (M = 10|β̂med|) of only 1.66. The second meta-analysis (141 regressions from 55 papers, 2008–2013) produces qualitatively similar results.

Q: Why does the paper flag the implicit sign assumption embedded in many applications of Oster’s method?

A: Using the explain away breakdown point as evidence of sign robustness implicitly requires that M = |βmed|, which is equivalent to constraining βlong ∈ [0, 2βmed] — that is, assuming the sign of βlong is the same as the sign of βmed. The paper shows (Table 4) that across both meta-analytic samples, more than 50% of regressions make this implicit sign assumption in order to interpret the explain away breakdown point as informative about sign robustness.

Q: What is δ, and what are its interpretive limitations?

A: δ is the ratio of (cov(X, γ′2,long W2)/var(γ′2,long W2)) to (cov(X, γ′1,long W1)/var(γ′1,long W1)), measuring the relative magnitude of selection on unobservables versus observables. As Cinelli and Hazlett (2020) show, it is a double ratio: the ratio of the treatment-unobservable association to the treatment-observable association, divided by the ratio of their outcome effects. This double-ratio structure leads to counter-intuitive behavior: a single omitted variable that is only modestly related to treatment can produce δ values far from 1 if the observable control is also only weakly related to treatment, even if the omitted variable is not strongly confounding in an absolute sense.
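Typeset, the ratio stated in words above reads as follows (a direct transcription of the expression in this answer, with $\gamma_{1,\text{long}}$ and $\gamma_{2,\text{long}}$ the long-regression coefficients on $W_1$ and $W_2$):

```latex
\[
\delta \;=\;
\frac{\operatorname{cov}\!\left(X,\ \gamma_{2,\text{long}}' W_2\right) \big/ \operatorname{var}\!\left(\gamma_{2,\text{long}}' W_2\right)}
     {\operatorname{cov}\!\left(X,\ \gamma_{1,\text{long}}' W_1\right) \big/ \operatorname{var}\!\left(\gamma_{1,\text{long}}' W_1\right)}
\]
```

The numerator and denominator are each a regression-style slope on an index, so δ = 1 corresponds to equal selection on the unobserved and observed indices.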

Q: What assumption is required for the entire sensitivity analysis framework, and how restrictive is it?

A: Assumption A4 requires that all observed covariates W1 are uncorrelated with all unobserved covariates W2 (exogenous controls). The authors note this is a strong assumption in many empirical settings. A companion paper (Diegert, Masten, and Poirier 2025a) addresses the case where controls are endogenous.

Q: What do the authors recommend as best practice?

A: They recommend two practices: (1) plotting the full estimated identified set for the coefficient of interest across a range of assumptions about omitted variables, rather than relying on a single bias-adjusted point estimate; and (2) reporting sign change breakdown points as robustness summary statistics in addition to (or instead of) explain away breakdown points. Both are implemented in the companion Stata module regsensitivity.

Explain Away Breakdown Point: The smallest value of the sensitivity parameter |δ| required for the data to be consistent with a zero causal effect (βlong = 0). This is the quantity computed by Oster’s Proposition 2 and commonly reported as “Oster’s delta.”

Sign Change Breakdown Point: The smallest value of |δ| required for the data to be consistent with a causal effect of opposite sign from the baseline estimate. The paper proves this is bounded above by 1 in Oster’s framework, regardless of the magnitude of the explain away breakdown point.

Oster’s δ: The ratio of the regression of treatment X on the omitted variable index (γ′2,long W2) to the regression of X on the observed covariate index (γ′1,long W1), measuring relative selection on unobservables versus observables. Interpreted as a double ratio: (treatment-unobservable association / treatment-observable association) ÷ (outcome effect of unobservable index / outcome effect of observable index).

Identified Set BI(δ, R²_long): The set of values of βlong consistent with the observed data and a given value of δ and R²_long. Characterized as roots of a cubic polynomial. Has a vertical asymptote at δ = 1, meaning the set can include arbitrarily large or small values of βlong as δ approaches 1.

Bias Magnitude Restriction (Assumption A5): A bound M ≥ 0 on the magnitude of omitted variable bias: |βlong − βmed| ≤ M. Adding this assumption intersects the identified set with [βmed − M, βmed + M], allowing the sign change breakdown point to potentially exceed 1 and making sign robustness conclusions possible.

Coefficient Stability Analysis: A class of empirical methods that assess omitted variable bias by comparing regression coefficients across specifications that include different sets of covariates. The intuition is that if adding observed controls substantially raises R² but barely moves the coefficient, further omitted variable bias is likely small. Formalized by Altonji, Elder, and Taber (2005) and extended by Oster (2019b).

Near Multicollinearity (in this context): The situation in which treatment X and the combined covariate vector (W1, W2) are nearly collinear. In Oster’s framework, this arises precisely at δ = 1 and produces the vertical asymptote in the identified set, making the bias-adjusted estimand discontinuous and potentially unbounded near this value.

The Effects of Medical Debt Relief: Evidence from Two Randomized Experiments

Layer 1: Overview

Research Question

This paper asks whether relieving downstream medical debt — debt that has been sold to third-party debt collectors — causes improvements in financial outcomes, mental and physical health, and healthcare utilization for recipients. The question is motivated by a large correlational literature documenting strong associations between medical debt and adverse outcomes, and by the rapid expansion of government and private debt relief programs that, as of mid-2024, had committed or planned over $14.6 billion in relief.

Data and Design

The authors partnered with RIP Medical Debt (a non-profit that purchases and forgives medical debt for government and private donors) to conduct two randomized controlled trials between March 2018 and October 2020. In total the experiments relieved medical debt with a face value of $169 million for 83,401 people.

Hospital debt experiment: RIP purchased a random subset of debt from a large for-profit hospital system at the juncture when the hospital would normally sell accounts to a debt collector (approximately one year after the medical service). The purchase price was 5.5 cents per dollar of face value. The treatment group consisted of 14,377 people who received $19 million in face-value relief (average of $1,321 per person). The 61,496-person control group had their debt pursued by the collector under normal protocol.
Collector debt experiment: RIP purchased a random subset of older debt already under collection on the secondary market for several years, at a price of less than one cent per dollar. The treatment group consisted of 69,024 people who received $150 million in face-value relief (average of $2,167 per person). The 68,014-person control group retained their debt.
Credit reporting sub-experiment: Partway into the collector debt experiment, the debt collector ceased reporting medical debt to the credit bureaus, reflecting an industry-wide trend. The authors isolate 2,761 accounts (6.8% of wave 1) that were reported prior to treatment assignment to estimate the effects of debt relief when accounts would have been counterfactually reported, compared to the subsequent no-reporting environment.

Outcomes are tracked using quarterly depersonalized credit bureau data from TransUnion (spanning at least four quarters before to four quarters after treatment), collections account data on future bill accrual, and a multimodal survey of 2,888 hospital debt experiment respondents measuring mental and physical health, healthcare utilization, and financial wellness. The primary credit-bureau outcome is the number of accounts past due; the primary survey outcome is the share with at least moderate depression (PHQ-8).

Main Findings

Credit market outcomes (main experiments): In both the hospital and collector debt experiments, where there is no counterfactual credit bureau reporting, debt relief has no average effect on financial distress, credit access, or credit utilization. The effect on the number of accounts past due is -0.01 (statistically insignificant; the 95% CI rules out reductions larger than 0.04, relative to a control mean of 1.20). Effects on credit card balances (95% CI: -$42 to $47 relative to a mean of $1,481) and auto loan balances (95% CI: -$235 to $148 relative to a mean of $8,020) are similarly precise nulls. These null effects hold for the hospital debt sample (younger debt, 1.3 years old on average) and the collector debt sample (older debt, 7.0 years old on average), and across all preregistered subgroups.
Credit reporting sub-experiment: When control group accounts are counterfactually reported, debt relief immediately raises credit scores by an economically small average of 3.4 points (p-value 0.021), with a larger 13.8-point increase (p-value 0.008) for persons with no other debt in collections. Credit limits grow gradually, reaching $340 (15.3% of the post-reporting control mean of $2,231; p-value 0.010) after the no-reporting period begins, with larger effects for those with no other debt in collections. Once control group reporting ceases, both the credit score and credit limit effects converge to zero for those with other debts in collections. No effects on borrowing or financial distress measures are detected in this sub-experiment.
Collections account outcomes (bill repayment): Debt relief causes a statistically significant 1.1 percentage-point increase in the probability of having another unpaid bill sent to collections (6.6% of the control mean of 16.2%; p-value < 0.05) and a $15 increase in the dollar amount of future medical debt sent to collections (7.2% of the control mean of $208). The increase is almost entirely attributable to pre-relief medical services, indicating reduced repayment of existing bills rather than greater healthcare utilization.
Survey outcomes: There are no detectable average effects on depression (primary outcome), anxiety, stress, subjective well-being, or general health. Debt relief raises the share with at least moderate depression by a statistically insignificant 3.2 percentage points (p-value 0.097; control mean 45.0%); a 95% CI rules out a reduction of more than 0.6 percentage points, well below the 7.0 percentage-point improvement predicted by the median expert respondent. There are similarly null effects on healthcare utilization and financial wellness as measured in the survey.
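As a rough consistency check on the depression estimate, the reported point estimate and confidence bound can be converted to an implied standard error and p-value. The sketch below assumes a symmetric normal-approximation 95% interval, which the summary does not confirm; the paper’s actual inference procedure may differ. The function name is hypothetical.

```python
import math

def two_sided_p_from_ci_bound(estimate, ci_lower, z_crit=1.96):
    """Back out the SE from a symmetric CI lower bound, then a two-sided normal p-value."""
    se = (estimate - ci_lower) / z_crit
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, p

# Reported values: +3.2 percentage points; CI rules out reductions beyond 0.6 points.
se, p = two_sided_p_from_ci_bound(estimate=3.2, ci_lower=-0.6)
print(f"implied SE = {se:.2f} pp, implied p-value = {p:.3f}")
```

Under these assumptions the implied p-value lands close to the reported 0.097, so the point estimate, interval, and p-value are mutually consistent.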

Scope Conditions

The study focuses specifically on downstream medical debt in collections — debt that has already been through the hospital billing cycle and sold to third-party collectors. Results do not necessarily apply to upstream debt relief (e.g., financial assistance programs applied closer to the time of the medical event), nor to populations with different baseline financial profiles. The credit reporting results are most relevant to the prior regime of widespread reporting; under the current environment in which most medical debt has been removed from credit reports, the credit-access channel is largely foreclosed.

In depth

Q1. Why did the authors focus specifically on downstream medical debt in collections, and how does this define the scope of their study?

The authors focus on downstream medical debt because this is the target of essentially all large-scale government and private relief programs working with RIP Medical Debt, and because it is the category of debt that is most comprehensively observable. Downstream medical debt is defined as bills that have been or are about to be sold by the healthcare provider to a third-party debt collector. This focus excludes upstream unpaid bills still held by the hospital, bills being paid over time, and medical expenses charged to credit cards. The distinction matters because prior literature on hospital financial assistance programs finds substantial benefits from upstream interventions that relieve debt closer to the precipitating medical event; the authors’ null results are explicitly scoped to the downstream, post-collection stage.

Q2. Why did the purchase price of medical debt (5.5 cents per dollar for hospital debt, less than 1 cent per dollar for collector debt) suggest caution about expected financial impacts ex ante?

The authors argue that in a competitive market, the purchase price of medical debt reflects expected recoveries net of collection costs. A price of 5.5 cents per dollar implies that the amount collectors expect to recover from patients, net of those costs, is very low. Even if all of the expected recovery is passed through to the patient as a financial benefit, the direct liquidity gain from debt forgiveness is a small fraction of the debt’s face value. For the collector debt experiment, where the purchase price is less than 1 cent per dollar, the expected direct financial benefit to recipients is even smaller. The authors note that survey respondents expected to pay 54% of their outstanding medical debt and thought it fair to pay 37%, suggesting that perceived (rather than actual) payment obligations may be what connects medical debt to financial behavior.
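The scale involved can be made concrete with back-of-the-envelope arithmetic using the per-person averages and prices quoted in this summary. This calculation is illustrative and not reported in the paper; the function name is hypothetical, and the collector figure is only an upper bound because the price is stated as less than one cent.

```python
def purchase_cost(face_value, price_per_dollar):
    """Cost of buying debt of a given face value at a price per dollar of face value."""
    return face_value * price_per_dollar

# Hospital debt: average relief of $1,321 per person at 5.5 cents per dollar.
hospital_per_person = purchase_cost(1321, 0.055)

# Collector debt: average relief of $2,167 per person at under 1 cent per dollar.
collector_upper_bound = purchase_cost(2167, 0.01)

print(f"hospital: about ${hospital_per_person:.2f} per person")
print(f"collector: under ${collector_upper_bound:.2f} per person")
```

On these figures the market value of the relieved debt is roughly $73 per person in the hospital experiment and below $22 in the collector experiment, against face values in the thousands.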

Q3. How was random assignment implemented in the hospital debt experiment, and what design features ensure the validity of the experiment?

Within each of 18 waves between August 2018 and October 2020, RIP received a portfolio of unpaid bills from the hospital system. Bills were grouped at the person level, and persons were stratified by the amount of debt, state of residence, insurance status, and a collections score predicting repayment likelihood. Within strata, persons were randomly assigned to treatment or control, with approximately 20% treated per wave (varying with donor funding). The hospital was unaware of the intervention, eliminating scope for selection of particularly uncollectible accounts. Treatment notification occurred via two letters sent approximately three and six weeks post-purchase. Balance tests confirm successful randomization: all p-values on baseline characteristics are above 0.05, and F-tests fail to reject joint balance.
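Stratified random assignment of this kind can be sketched generically as below. This is not the authors’ procedure or code: the function, field names, toy population, and fixed 20% share are assumptions for illustration, and only two stratifying variables are used to keep the example short.

```python
import random
from collections import defaultdict

def stratified_assign(persons, strata_fields, treat_share=0.2, seed=0):
    """Randomly assign roughly treat_share of persons to treatment within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in persons:
        strata[tuple(p[f] for f in strata_fields)].append(p["id"])
    assignment = {}
    for ids in strata.values():
        rng.shuffle(ids)  # random order within the stratum
        n_treat = round(treat_share * len(ids))
        for j, pid in enumerate(ids):
            assignment[pid] = "treatment" if j < n_treat else "control"
    return assignment

# Hypothetical population: 300 persons in 6 strata of 50 (2 states x 3 debt bins).
persons = [
    {"id": i, "state": "TX" if i % 2 else "FL", "debt_bin": i % 3}
    for i in range(300)
]
assignment = stratified_assign(persons, ["state", "debt_bin"])
n_treated = sum(1 for arm in assignment.values() if arm == "treatment")
print(n_treated)  # 60
```

Randomizing within strata guarantees that the treatment share is the same in every stratum, which balances the stratifying variables by construction.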

Q4. What was the credit reporting sub-experiment and how was it identified?

The debt collector in the collector debt experiment historically reported medical debt to the credit bureaus but largely ceased doing so before the first intervention wave (March 2018), reflecting broader industry concerns about CFPB enforcement and data integrity risk. However, a subset of accounts — 2,761 accounts (6.8% of wave 1, with virtually identical match rates across treatment and control) — were still being reported until 2019 Q1 (three quarters after wave 1 and one quarter after wave 2). This created a natural sub-experiment: for this subset, treatment group accounts were removed from credit reports immediately upon debt relief, while control group accounts continued to be reported for three more quarters before also being removed. The authors identify reported accounts by matching dollar amounts in collections account data to credit bureau tradeline data in the four quarters prior to intervention, and use this variation to estimate effects separately for the “reporting” and “no-reporting” periods.

Q5. What are the exact estimated effects on credit scores and credit limits in the credit reporting sub-experiment?

During the three quarters when control group accounts are still reported to credit bureaus, debt relief raises credit scores by an average of 3.4 points (p-value 0.021) for the full reporting subsample. The effect is concentrated among those with no other debt in collections: 13.8 points (p-value 0.008) versus 1.2 points (p-value 0.440) for those with other debt in collections. Credit limits increase gradually, reaching $340 (15.3% of the post-reporting control mean of $2,231; p-value 0.010) in the four quarters after control group reporting ceases. Among persons with no other debt in collections, this credit limit effect grows to $922 (23% of the control mean; p-value 0.070). Once control group reporting stops, both the credit score effect and the credit limit growth converge to zero for persons with other debts in collections. The event study coefficients show the credit limit effect growing approximately linearly over five quarters post-intervention before leveling out.

Q6. How does the paper rule out the possibility that medical debt relief increases healthcare utilization, thereby causing more future medical bills?

The collections account analysis separates future debt accrual into debt associated with pre-relief medical services (which can only result from reduced repayment of existing bills) and post-relief medical services (which could reflect either increased utilization or changed repayment of new bills). Panel B of Table VI shows that virtually all of the increased debt sent to collections — a $15 increase and 1.1 percentage-point increase in the probability of any future collection — is attributable to pre-relief services. Panel C shows statistically insignificant increases in future debt from post-relief services. The authors therefore attribute the effect to reduced payment of existing bills and conclude they “cannot rule in or rule out effects on healthcare utilization” for the post-relief services channel, but the dominant mechanism is behavioral change in repayment of already-incurred debt.

Q7. What are the three mechanisms proposed to explain the reduction in repayment of existing medical bills, and which mechanism is rejected?

The authors offer three candidate mechanisms for the 6.6% relative increase in the probability of future bill collections: (i) an expectations mechanism, in which beneficiaries reduce payments because they anticipate future debt relief from similar charitable programs; (ii) a targeting mechanism, drawing on Dobkin et al. (2018), in which patients tolerate a certain level of indebtedness — relieving some debt creates “room” in their debt budget, so they reduce payment of remaining bills to return to that target level; and (iii) a confusion mechanism, in which recipients mistakenly believe the relief applied to non-forgiven bills (the notification letter explicitly stated “the forgiveness is for this outstanding bill only” but patients may not have internalized this). The income effect or “flypaper” mechanism — the idea that financial relief of existing debt frees up mental-account resources for paying medical bills, thereby increasing repayment — is explicitly rejected by the data, as the effect goes in the direction of less repayment, not more.

Q8. What did the expert survey predict, and how did those predictions compare to the experimental estimates?

An expert survey conducted between April and May 2022 — after the interventions were completed but before results were released — asked academics, non-profit staff, hospital revenue-cycle practitioners, and policymakers to predict the impact of the hospital debt experiment. The median expert predicted a 7.0 percentage-point reduction in depression (8.0 points when weighted by confidence), a 10.2 percentage-point reduction in borrowing (13.7 points when confidence-weighted), and meaningful improvements in healthcare access. In total, 75.6% of respondents predicted medical debt relief is at least a moderately valuable use of charity resources, and 51.1% thought it very or extremely valuable. The authors estimate a statistically insignificant 3.2 percentage-point increase in depression (not a decrease), and a 95% confidence interval that rules out a reduction in depression of more than 0.6 percentage points — far below the 7.0 percentage-point expert prediction.

Q9. What survey methodology was used, and what response rate was achieved?

The survey, administered by NORC at the University of Chicago, targeted a random subset of 14,922 hospital debt experiment participants who entered the study after September 2019 (waves 6-18) and owed at least $500. The protocol spanned 13 weeks and included five postal mailings (including a $2 upfront incentive and a $5 incentive with the paper survey), twice-weekly email reminders, certified mail delivery of the full survey instrument, and telephone interviews by a US-based call center. Respondents received a $50 completion incentive. The protocol achieved a 19.4% response rate, with 68% responding via web, 10% via telephone, and 23% via mail. The survey was titled “Health and Financial Wellness Study” and made no reference to RIP Medical Debt to avoid priming respondents. Respondents were surveyed on average 13 months after treatment assignment (interquartile range 10 to 17 months).

Q10. What heterogeneity in survey outcomes was detected, and how do the authors interpret the anomalous depression finding for high-debt recipients?

Across all four preregistered heterogeneity dimensions (medical debt amount, age of debt, age of person, amount of other debt in collections), null effects on survey outcomes were found in 15 of 16 subgroups. The exception is persons in the fourth quartile of medical debt eligible for relief, for whom debt relief caused a statistically significant 12.4 percentage-point increase in depression (p-value 0.002) relative to a control mean of 45.9%, with similar patterns for anxiety, stress, subjective well-being, and general health. The authors consider that this may be a statistical fluke, given the null results in the other 15 subgroups. They also note potential parallels with findings from unconditional cash transfer experiments, where the receipt of transfers raised the salience of financial deprivation without addressing its underlying causes. A charity-stigma mechanism (recipients did not request the assistance) is also considered. The authors caution against giving this result undue weight in the overall assessment.

Q11. How does the paper position downstream debt relief relative to upstream interventions, and what does prior evidence suggest about upstream alternatives?

The authors emphasize that their null results concern downstream relief and should not be extrapolated to upstream medical debt relief. Adams et al. (2022), studying a hospital financial assistance program at Kaiser Permanente that bundled debt relief with reductions in cost-sharing close to the time of the medical event, found substantial increases in high-value healthcare utilization. The Oregon Health Insurance Experiment (Baicker et al. 2013) found that Medicaid reduced depression by 9 percentage points among low-income uninsured adults. The authors suggest several reasons why downstream relief may fail: the intervention occurs too late after the precipitating event (approximately 15 months after the medical service in the hospital debt experiment, and about 7 years in the collector debt experiment), patients may have habituated to the stress of debt collections, the relief amount may be too small relative to overall financial distress, and the direct financial benefit is inherently limited by the low market price of collections-stage debt.

Q12. How do the authors address concerns about differential survey response and external validity?

Treated persons were 1.3 percentage points more likely to respond to the survey, a difference that is not statistically significant at the 5% level (p-value 0.056). The authors address this in two ways. First, they estimate specifications that (i) add rich observable controls and (ii) use speed of survey response as a proxy for unobserved response propensity; neither exercise changes the estimates meaningfully. Second, to probe external validity, they test for heterogeneous effects by predicted response propensity (from a logistic regression of a response indicator on baseline characteristics) and by speed of response; neither yields evidence of differential effects for non-respondents. They also compare credit bureau treatment effects for the full hospital debt sample, the survey outreach sample, and the survey respondent sample and find similar estimates across all three groups.

Key Concepts

Downstream medical debt: Medical bills that have already been sent to third-party debt collectors by the healthcare provider after the initial billing cycle, as distinguished from upstream unpaid bills still held by the hospital at or near the time of the medical event. The paper studies debt at this late stage specifically because it is the target of most large-scale relief programs.

Credit reporting sub-experiment: An embedded quasi-experiment within the collector debt RCT, exploiting the fact that a subset of accounts (6.8% of wave 1) were still being reported to credit bureaus at the time of intervention while the debt collector had already ceased reporting for the remaining accounts. This allows separate estimation of debt relief effects with and without counterfactual credit bureau reporting, using the period until 2019 Q1 (when the collector stopped reporting entirely) as the “reporting” window.

Downstream bill repayment effect: The paper’s finding that debt relief increases the probability of a subsequent unpaid medical bill being sent to collections. The paper attributes this primarily to reduced repayment of existing pre-relief medical bills rather than to increased healthcare utilization, consistent with an expectations, targeting, or confusion mechanism — and inconsistent with an income or flypaper effect that would increase repayment.

Targeting a level of indebtedness: A behavioral model (drawn from Dobkin et al. [2018]) in which patients implicitly target a certain level of indebtedness. Under this model, relieving some debt creates headroom in the patient’s implicit debt budget, leading to reduced repayment of remaining bills to restore the targeted level of total indebtedness.

Expert survey (pre-results): A structured elicitation of predicted treatment effects conducted between April and May 2022 — after the interventions were completed but before results were released — from academics, non-profit practitioners, hospital revenue-cycle managers, and policymakers. Used as a benchmark to quantify how far the causal estimates fall below prevailing beliefs, and to document that the null results were ex ante surprising to informed observers.

PHQ-8 (Patient Health Questionnaire-8): An eight-item validated clinical screen for depression, used as the paper’s primary preregistered survey outcome. An indicator for “at least moderate depression” on the PHQ-8 is the main mental health measure against which the debt relief treatment effect is estimated.

Multimodal survey: A survey protocol combining five postal mailings, twice-weekly email reminders, certified mail delivery of a paper survey instrument, and US-based call center telephone interviews, designed to maximize response rates in a hard-to-reach low-income population with medical debt in collections.
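
The PHQ-8 indicator described among the concepts above can be sketched in a few lines. The summary does not state the scoring rule, so the code assumes the conventional PHQ-8 scoring (eight items, each scored 0 to 3, summed to a total between 0 and 24) and the commonly used cutoff of 10 for "at least moderate depression". The function names and the example responses are hypothetical, and the paper's exact coding may differ.

```python
MODERATE_CUTOFF = 10  # conventional threshold; an assumption, not stated in the summary


def phq8_score(item_responses):
    """Sum the eight PHQ-8 items, each scored 0 (not at all) to 3 (nearly every day)."""
    if len(item_responses) != 8:
        raise ValueError("PHQ-8 requires exactly eight item responses")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_responses)


def at_least_moderate_depression(item_responses):
    """Binary outcome: True when the total score reaches the assumed cutoff."""
    return phq8_score(item_responses) >= MODERATE_CUTOFF


# Hypothetical respondents.
respondent_a = [1, 1, 0, 2, 1, 0, 1, 0]  # total 6
respondent_b = [2, 2, 1, 2, 1, 1, 2, 1]  # total 12

print(phq8_score(respondent_a), at_least_moderate_depression(respondent_a))
print(phq8_score(respondent_b), at_least_moderate_depression(respondent_b))
```

A treatment effect on this outcome is then a difference in the share of respondents for whom the indicator is true, which is why the summary reports it in percentage points.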

The Environmental Bias of Corporate Income Taxation

This paper documents and quantifies an “environmental bias” embedded in the U.S. corporate income tax code: CO2-intensive (“dirty”) firms systematically face lower effective tax rates than clean firms, constituting an implicit subsidy on pollution. The authors — Iovino, Martin, and Sauvagnat — establish this cross-sectional fact, trace it to a specific mechanism, provide causal evidence using the 2017 Tax Cuts and Jobs Act (TCJA), and quantify aggregate emissions implications using a calibrated multi-sector general-equilibrium model.

Data and sample. The empirical analysis combines firm-level CO2 emissions from Trucost (scope 1 greenhouse gases) with financial data from Compustat North America for U.S. publicly listed firms, 2003–2021, yielding 11,223 firm-year observations with positive pretax and gross capital income. Effective tax rates are measured as income taxes paid divided by gross capital income (sales minus COGS minus SGA expenses, adding back R&D).

Cross-sectional finding. A one-standard-deviation increase in CO2 intensity is associated with a decrease in the effective tax rate equal to approximately 9% of its standard deviation (coefficient −0.021 to −0.022, significant at 1%). The negative relationship is entirely explained by the lower taxable fraction of gross capital income for dirty firms — that is, by larger interest expense deductions — rather than by differences in the statutory tax rate applied to pretax income.

Mechanism. The chain of causation runs: CO2-intensive production requires tangible capital (primarily machinery and equipment) → tangible capital serves as collateral → higher collateral supports higher debt → higher debt generates larger interest deductions (the “tax shield of debt”) → lower effective tax rates. Once PPE-to-capital-income is controlled for, the coefficient on CO2 intensity in leverage, pretax income, and tax regressions becomes small and statistically insignificant. The relationship holds both across and within industries, including within the energy sector, though the dominant variation is cross-industry.
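
The last links of this chain can be illustrated with a small numerical sketch. The two firms and all dollar figures are hypothetical, and the calculation simplifies the tax code to a single statutory rate applied to gross capital income net of interest expense. It is meant only to show how a larger interest deduction lowers the paper's effective tax rate measure even when the statutory rate is identical.

```python
def gross_capital_income(sales, cogs, sga, rnd):
    """Gross capital income as defined in the summary: sales - COGS - SGA + R&D."""
    return sales - cogs - sga + rnd


def effective_tax_rate(taxes_paid, gci):
    """Taxes paid divided by gross capital income."""
    return taxes_paid / gci


STATUTORY_RATE = 0.35  # pre-TCJA federal rate

# Hypothetical firms with identical operating figures (USD millions).
gci = gross_capital_income(sales=1000.0, cogs=600.0, sga=250.0, rnd=50.0)

# Simplifying assumption: taxable income = gross capital income - interest expense.
interest_expense = {"clean": 20.0, "dirty": 80.0}

etr = {}
for name, interest in interest_expense.items():
    taxes_paid = STATUTORY_RATE * (gci - interest)
    etr[name] = effective_tax_rate(taxes_paid, gci)

for name, rate in etr.items():
    print(f"{name}: effective tax rate = {rate:.3f}")
```

In this stylized case both firms face the same 35% statutory rate, yet the more indebted firm remits a smaller share of its gross capital income, which is the pattern the cross-sectional regressions pick up.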

Causal evidence: TCJA 2017. The paper exploits the federal corporate tax rate cut from 35% to 21% (effective January 2018) in a difference-in-differences design, comparing firms in the top quartile of 2017 CO2 intensity (“dirty”) to cleaner firms. Dirty firms experienced a relative increase in their federal effective tax rate of 2.4 percentage points post-reform. Correspondingly, dirty firms’ total assets grew approximately 11% less than those of clean firms post-reform. This implies a semi-elasticity of total assets with respect to the effective tax rate of approximately −4.8; that is, a one-percentage-point increase in the effective tax rate is associated with roughly 4.8% lower total assets. Parallel pre-trends are confirmed visually and via Rambachan-Roth (2023) sensitivity analysis; a placebo using non-federal taxes shows no differential effect. Results survive controls for other TCJA provisions (interest deductibility limits, international tax changes, net operating loss restrictions), exposure to import tariffs and carbon taxes, leave-one-industry-out specifications, and a triple-difference using foreign firms.
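
The logic of a "relative increase" in a difference-in-differences design can be sketched with a two-by-two comparison of group means. The four means below are invented and chosen only so that the result matches the reported 2.4 percentage points; the paper's estimates come from regressions, not from this simple calculation.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean federal effective tax rates, in percentage points.
means = {
    ("dirty", "pre"): 10.0,
    ("dirty", "post"): 8.4,
    ("clean", "pre"): 14.0,
    ("clean", "post"): 10.0,
}

did = diff_in_diff(
    treated_pre=means[("dirty", "pre")],
    treated_post=means[("dirty", "post")],
    control_pre=means[("clean", "pre")],
    control_post=means[("clean", "post")],
)

print(f"Difference-in-differences estimate: {did:.1f} percentage points")
```

In this illustration both groups' effective rates fall after the reform, but the dirty group's rate falls by less, so its rate rises relative to the clean group.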

General-equilibrium model and counterfactuals. A 375-sector model with input-output networks (both intermediate and investment networks), financial frictions linking equipment to debt capacity, and endogenous CO2 emissions through fossil fuel usage is calibrated to 2017 BEA and Compustat data. In the Cobb-Douglas benchmark, the 2017 tax cut raises output by 5.9% and emissions by only 4.5% — a less-than-proportional emissions response because clean sectors expand relatively more. A counterfactual eliminating the tax shield of debt while simultaneously cutting the tax rate from 35% to 30% (to hold GDP constant) reduces aggregate emissions by 1.3% with output declining only 0.1%. When equipment and fuel are treated as complements (elasticity of substitution below 1), the emissions reduction under the same policy rises to over 3.7%, implying an absolute reduction of 80–240 million metric tons of CO2 from 2017’s total of 6,457 million metric tons. Monetized at the social cost of carbon, this ranges from USD 8–24 billion (conservative, ~USD 100/ton) to USD 112–336 billion (USD 1,400/ton per Bilal and Kanzig 2024).
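
The absolute and monetized figures above follow from simple arithmetic on numbers quoted in this summary, as the sketch below shows. The function names are hypothetical; the percentages, the 2017 emissions total, and the two social-cost-of-carbon values are taken from the text.

```python
TOTAL_2017_MT = 6457.0  # 2017 U.S. emissions, million metric tons (from the summary)


def absolute_reduction_mt(pct_reduction, total_mt=TOTAL_2017_MT):
    """Convert a percentage reduction into million metric tons of CO2."""
    return total_mt * pct_reduction / 100.0


def monetized_value_bn(reduction_mt, usd_per_ton):
    """Value of a reduction in USD billions.

    Million tons times USD per ton gives USD millions; divide by 1,000.
    """
    return reduction_mt * usd_per_ton / 1000.0


low = absolute_reduction_mt(1.3)   # Cobb-Douglas benchmark
high = absolute_reduction_mt(3.7)  # strong complementarity case

print(f"Reduction: {low:.0f} to {high:.0f} million metric tons")
for scc in (100, 1400):  # USD per ton, the two values quoted in the summary
    low_bn = monetized_value_bn(80, scc)    # rounded range used in the summary
    high_bn = monetized_value_bn(240, scc)
    print(f"SCC {scc} USD/ton: {low_bn:.0f} to {high_bn:.0f} USD billion")
```

The unrounded reductions are about 84 and 239 million metric tons, which the summary rounds to 80 and 240 before monetizing.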

Q: What is the central empirical finding of the paper? A: CO2-intensive firms in the U.S. face systematically lower effective corporate income tax rates than clean firms. A one-standard-deviation increase in CO2 intensity is associated with a roughly 9% of a standard deviation decrease in the ratio of taxes paid to gross capital income. This negative relationship is robust to alternative emissions measures (EPA data, scope 2 and 3 emissions), alternative tax scalings (taxes over sales or assets), log CO2 emissions, and leave-one-industry-out specifications.

Q: What is the mechanism linking CO2 intensity to lower effective tax rates? A: Dirty firms rely on tangible capital, specifically machinery and equipment, to produce. Tangible capital is pledgeable as collateral, enabling higher debt. Higher debt generates larger interest expense deductions under the tax code (the “debt tax shield”), which reduces taxable income relative to gross capital income. Once PPE-to-capital-income is included as a control, the coefficients on CO2 intensity in regressions of leverage, pretax income, and taxes paid all become small and statistically insignificant, confirming that PPE fully mediates the relationship.

Q: Which component of tangible capital drives the result? A: Machinery and equipment, not buildings, leases, land, natural resources, or construction in progress, explains virtually the entire positive relationship between PPE and CO2 intensity. This finding is based on the Compustat breakdown of PPE components available for roughly 70% of sample firms.

Q: Does the mechanism operate within industries or only across them? A: Both. Decomposing firm CO2 intensity into an implied industry component (sales-weighted from pure-play firms) and a firm residual, both components are significantly associated with higher tangible capital, leverage, lower taxable fraction of capital income, and lower taxes paid at the 1% level. However, the largest share of the total effect stems from cross-industry variation. Within the energy sector specifically, firms with greater fossil fuel production capacity (from EPA/EIA data) also have more tangible capital, higher debt, and lower effective tax rates.

Q: How does the 2017 TCJA cut affect clean versus dirty firms differently? A: Because dirty firms already shield a large fraction of their capital income from taxation via interest deductions, a uniform cut in the statutory rate benefits them less in proportional terms. The difference-in-differences estimates show that dirty firms (top quartile of 2017 CO2 intensity) experienced a relative increase in their federal effective tax rate of 2.4 percentage points post-reform compared to clean firms, and their total assets grew approximately 11% less than clean firms post-reform. The semi-elasticity of firm assets to a one-percentage-point increase in effective tax rate is approximately −4.8.

Q: How is the parallel trends assumption supported? A: Event-study graphs show no pre-2018 divergence in federal effective tax rates or asset growth between dirty and clean firms. A placebo test using non-federal income taxes (which should be unaffected by the federal statutory rate change) shows no differential post-reform effect. The Rambachan-Roth (2023) sensitivity analysis confirms that the null of no differential effect can be rejected at the 1% level allowing for pre-trend deviations up to M = 0.5, and at the 10% level up to M = 1.

Q: What robustness checks address other provisions of the TCJA and concurrent shocks? A: The authors exclude or control for firms affected by the TCJA’s interest deductibility limitation, multinational firms (more than 20% foreign sales), firms with large loss carryforwards, and manufacturing firms — results are unchanged. They also control for firm-level exposure to import tariff changes and carbon taxes (using the World Carbon Pricing Database), with coefficients of interest remaining virtually unchanged. Leave-one-industry-out specifications and a triple-difference using foreign firms (comparing U.S. dirty vs. clean firms pre/post-2018, against foreign equivalents in countries with stable tax rates) yield a semi-elasticity of −5.8, if anything larger than the baseline.

Q: What does the general-equilibrium model add that the difference-in-differences cannot? A: The DiD design identifies relative effects of the tax cut on dirty versus clean firms but cannot recover the absolute effect on aggregate output and emissions. The GE model, calibrated to 2017 data and validated against the untargeted DiD estimates, quantifies aggregate impacts: the 2017 tax cut raises steady-state output by 5.9% while emissions rise by only 4.5% — a less-than-proportional increase due to compositional reallocation toward clean sectors.

Q: What does the counterfactual removing the debt tax shield find? A: Eliminating the tax shield of debt while simultaneously lowering the corporate tax rate from 35% to 30% (to keep GDP constant) reduces aggregate emissions by 1.3% (Cobb-Douglas benchmark) while total output falls only 0.1% and GDP remains constant by design. The emissions reduction arises because clean sectors, which rely more on less-pledgeable capital, are made relatively cheaper once the tax advantage of debt is removed, redirecting demand away from CO2-intensive sectors.

Q: How does the complementarity assumption between equipment and fuel affect the results? A: When equipment and fuel are modeled as complements (elasticity of substitution below 1) rather than with the unit elasticity of the Cobb-Douglas benchmark, both policy counterfactuals yield larger emissions effects. For the tax shield removal policy, the predicted emissions reduction rises from 1.3% to over 3.7% as complementarity strengthens. This is because policies that raise the cost of equipment also induce firms to cut fuel consumption, amplifying the direct compositional effect.

Q: What is the quantified absolute emissions impact of removing the tax shield? A: Given 2017 U.S. total emissions of 6,457 million metric tons, the model predicts an absolute reduction of 80–240 million metric tons of CO2, depending on the assumed complementarity between equipment and fuel. Monetized at conservative estimates (~USD 100/ton), the policy saves USD 8–24 billion; at USD 1,400/ton (Bilal and Kanzig 2024), the value rises to USD 112–336 billion. The authors note that the physical quantity measure is more reliable than the monetized figure given uncertainty in the social cost of carbon.

Q: How does this paper relate to the ECB bond purchasing literature? A: Papoutsi, Piazzesi, and Schneider (2022) document that the ECB’s market-neutral bond purchases implicitly favor dirty firms because those firms issue more bonds due to higher tangible capital holdings. This paper identifies the same underlying mechanism (tangible capital → debt capacity) but on the tax side, showing that the corporate income tax code independently provides an implicit subsidy to dirty firms through the debt tax shield.

Q: What is the policy implication for the debt tax shield specifically? A: The debt tax shield — the deductibility of interest payments but not dividends — has no clear economic rationale (both are returns to capital) and, per several policy proposals (CBO 1997, IMF 2016), is a candidate for elimination. This paper adds a new dimension: the tax shield indirectly subsidizes CO2 emissions by differentially benefiting capital-intensive, CO2-intensive sectors. A revenue-neutral reform eliminating the shield can reduce emissions without sacrificing GDP.

Key Concepts

Effective tax rate (paper’s definition): The ratio of corporate income taxes paid to gross capital income, where gross capital income equals sales minus cost of goods sold minus SGA expenses plus R&D spending. This differs from the tax-to-pretax-income ratio because it captures how much of total capital earnings — before any deductions — is remitted as tax.

Debt tax shield (tax advantage of debt): The reduction in corporate tax liability arising from the deductibility of interest payments on corporate debt. Because dividends are not deductible, debt-financed capital faces a lower after-tax cost than equity-financed capital. The shield’s value is estimated at approximately 10% of firm value in prior literature.

CO2 intensity: Metric tons of CO2 equivalent per USD 1,000 of output (tCO2/k$). The sample average is 0.1 tCO2/k$, with a heavily right-skewed distribution (median 0.02, 99th percentile 1.5).

Environmental bias of corporate taxation: The paper’s central concept — the systematic difference in effective tax rates between dirty and clean firms that arises not from explicit environmental policy but from the interaction of the debt tax shield with the capital structure of CO2-intensive industries. This constitutes an implicit subsidy on pollution embedded in the corporate income tax.

Asset pledgeability (psi): The fraction of a firm’s assets recoverable by creditors in the event of default. In the model, equipment has higher pledgeability than other capital (estimated base pledgeability a_psi = 0.35, plus an additional b_psi = 0.23 for equipment). Higher pledgeability allows firms to sustain more debt and thus benefit more from the tax shield.

User cost of capital: The total cost to a firm of using one unit of capital, combining depreciation, tax allowances from accelerated depreciation, and the financing cost advantage of debt over equity. The model formalizes how both the equity-financed component and the debt advantage component respond to tax rate changes, with the debt advantage term being larger for firms with more pledgeable (tangible) capital.

Investment network: An input-output structure capturing which sectors’ outputs are used to produce each type of capital good. The paper extends vom Lehn and Winberry (2021) by constructing separate equipment and non-equipment investment networks across 375 non-fuel BEA sectors, enabling emissions accounting that includes capital production alongside direct production inputs.

The Long-Run Impacts of Public Industrial Investment on Local Development and Economic Mobility: Evidence from World War II

Overview

Research Question. Does government-led construction of large manufacturing plants in previously under-industrialized regions generate long-run improvements in regional economic development and in the lifetime earnings of the incumbent residents who were already living there at the outset? And, if so, through what mechanism — developmental improvements during childhood or expanded adult labor market opportunities?

Setting and Identification. The paper exploits the United States industrial mobilization for World War II, specifically the construction of 90 large, government-financed, newly-built manufacturing plants (each costing $10 million or more in contemporary dollars, approximately $150 million in 2020 dollars) in dispersed locations outside the major prewar manufacturing hubs. Strategic and security considerations — not economic optimization — drove the military to insist these plants be sited away from congested industrial centers. Because private firms were unwilling to finance construction in isolated locations with uncertain postwar value, the government built them directly as government-owned, contractor-operated (GOCO) facilities through the Defense Plant Corporation. Site selection within the set of sufficiently populated regions was governed by idiosyncratic, short-run factors — the immediate availability of suitable parcels, informal connections to procurement officers, and expedience — rather than systematic economic characteristics of the receiving counties. The paper documents no systematic association between publicly-funded wartime plant construction and prewar county-level economic or demographic characteristics conditional on population size, and finds parallel prewar trends and balanced outcome levels across treatment and comparison counties in all decades leading up to WWII. A placebo test using 1910-to-1940 intergenerational mobility in matched Census records confirms no differential prewar upward mobility in treatment counties.

The comparison group consists of 1,400 counties outside the 100 largest prewar manufacturing counties that did not receive large public plants. Treatment assignment for individuals is based on birth county, not adult county of residence, enabling the paper to track outcomes regardless of where individuals ultimately live.

Data. The analysis draws on the 1945 War Production Board data book for plant-level investment; county-level panels from Decennial and Economic Censuses spanning 1900–2000; the SSA NUMIDENT file (birth county and date); IRS Form 1040 individual income tax returns in 1969, 1974, 1979, and 1984 (covering wage earnings and adjusted gross income); the full-count 1940 Census (parent earnings, demographics); the 2000 Census long form (educational attainment); and W-2 earnings histories from the SSA Detailed Earnings Record matched to a CPS-linked subsample, with employer information linked to the Business Register.

Regional Effects. By 1970, counties receiving large public wartime plants had approximately 30 percent higher manufacturing employment, 20 percent larger populations, and 7–8 percent higher median family income than comparison counties. Manufacturing employment as a share of total employment rose and remained elevated through the 1970s before converging toward parity with the comparison group by 1990. Treated counties were permanently larger — with population stabilizing at a new, persistently higher equilibrium roughly 20 percent above comparison counties by end of century — even after the manufacturing employment share converged, consistent with path dependence and multiple equilibria. Average production worker pay in manufacturing rose by approximately 10 percent, closely tracking value-added per worker, while average retail wages rose by only one-third as much and were not statistically significant in most years. In the 40 years after the war, treated counties saw median family earnings increase by 5–10 percent, concentrated in higher average wages and employment shares in manufacturing and semi-skilled blue-collar occupations, with limited effects on non-manufacturing, white-collar occupations, or female individual income.

Individual Earnings Effects. Men born in treatment counties in the 18 years before the war (birth cohorts 1922–1940) earned approximately $1,200–$1,300 more per year (2020 dollars) in average wage earnings reported on 1040 returns in 1969, 1974, 1979, and 1984 — an increase of 2.5–3 percent and roughly a one-percentile rise in the national earnings distribution. Effects were largest for children of parents at the bottom of the 1939 earnings distribution: children of the lowest-income parents saw adult wage earnings rise by approximately $1,800–$2,000 per year (3–4 percent), with effects declining linearly by parent rank and effectively vanishing for children of the highest-earning parents. Black men experienced larger average earnings effects (4–6 percent, or $1,500–$2,500 in 2020 dollars) than White men (2–3 percent, or $1,000–$1,500), with the racial earnings gap estimated to have narrowed by about 2 percent in the treatment group. When examining Form 1040 returns (tax-unit level), effects are comparable for men and women, but W-2 individual earnings data from the SSA-CPS subsample show no positive effect on women’s own earnings — the 1040 effects for women are entirely driven by their husbands’ higher earnings.

Mechanism. The balance of evidence points to access to higher-wage jobs in adulthood as the primary channel, rather than developmental human capital improvements accumulated during childhood. War plants modestly increased male educational attainment — children from the lowest-earning families completed approximately one-quarter of a year more schooling and were 3 percentage points more likely to graduate high school — but education effects are too small to account for the full earnings increase. Critically, there is no gradient in earnings effects by birth cohort: children who were younger at the start of the war and therefore had longer childhood exposure to improved regions did not benefit more, contradicting a childhood exposure-effect mechanism as in Chetty and Hendren (2018b). Adult earnings effects are entirely accounted for by adult location: conditioning on 1979 county of residence eliminates the treatment effect. Stayers in treatment counties show large earnings differences relative to stayers in comparison counties, while movers show none. Men born in treatment counties are also directly documented to have worked in industries with higher wage premiums as adults, with coarse industry classification alone accounting for approximately one-third of the estimated log wage increase.

Policy Scope Conditions. The paper argues these effects are specific to the WWII postwar institutional context — high global demand for U.S. manufactured goods, limited international competition, labor-intensive production techniques, and strong union bargaining power — conditions that no longer hold. Reexamination of “million-dollar plant” openings in the 1980s and 1990s shows manufacturing employment expanded but average manufacturing wages did not increase, suggesting contemporary plant openings do not generate the same high-wage opportunities. The association between manufacturing employment density and upward mobility visible in 1950 has entirely vanished by the end of the twentieth century.

In depth

Q1. What exactly defines the treatment group, and why were these plants built by the government rather than private firms?

A: The treatment group consists of 90 counties outside the 100 largest prewar manufacturing regions that received at least one new, fully publicly-financed manufacturing plant costing $10 million or more (approximately $150 million in 2020 dollars) under the WWII industrial mobilization. Private firms refused to finance construction in dispersed, isolated locations with highly uncertain postwar value; the Air Force historians recorded that “industrialists’ reluctance to invest in dispersed plant facilities was at odds with the government’s hope that private capital could finance new inland construction.” The government built and owned these facilities as GOCO plants, operated by private firms under contract. The 353 plants meeting the cost threshold (including both large and smaller public plants) account for 70 percent of all spending on new plants during the war.

Q2. How do the authors establish that plant siting was quasi-random conditional on population size?

A: Identification rests on three forms of evidence. First, historical documents show procurement decisions were driven by idiosyncratic factors — availability of a suitable parcel, informal connections to procurement officers, short-run expedience — rather than systematic economic characteristics. Members of Congress had little ability to influence siting, and Rhode et al. (2018) find little evidence that federal politics drove the geographic distribution of wartime spending. Second, balance tests (estimating prewar county characteristics as outcomes in Equation 1) show no significant differences between treatment and comparison counties in earnings levels, demographics, manufacturing development, or industrial composition after conditioning on 1940 population, with a joint p-value of 0.30 (0.36 when also conditioning on geography and infrastructure). Third, a placebo test using children in the 1910 Census matched to the 1940 Census finds no differential economic outcomes or upward mobility rates in counties that would eventually receive treatment plants, conditional on basic region size.

Q3. What are the county-level effects on the structure of the labor market in the medium run?

A: By the 1960s–1970s, treated counties had higher predicted union coverage rates and a greater share of men in semi-skilled production occupations, driven primarily by movement away from farm work and supplemented by higher male labor force participation. Average wages in craftsperson and operator occupations rose by 8 percent in treated counties — more than double the increase in wages for high-skill professional and managerial occupations. Treated counties had 8 percent higher median male individual incomes by 1979. Effects on female median individual income were minimal, and there were no effects on female labor force participation rates.

Q4. What is the estimated magnitude of the individual earnings effects, and how do they vary by parent income?

A: Men born in treatment counties averaged $1,200–$1,300 more per year in real wage earnings (2020 dollars) on 1040 tax returns across the four observation years 1969, 1974, 1979, and 1984, a 2.5–3 percent increase equivalent to roughly one percentile in the national earnings distribution. Heterogeneity by parent rank is pronounced and monotone: children of parents at the very bottom of the 1939 earnings distribution gained approximately $2,000 per year (about 4 percent), while children of the highest-earning parents experienced no significant effect. When county weighting is equalized to eliminate the differential representation of rural (lower-income) counties, effects are roughly constant across the bottom six deciles of the parent earnings distribution and then drop steeply at the top, showing that the earnings gradient was not simply an artifact of plant openings in poorer, smaller counties.

Q5. How did effects differ by race?

A: Wartime plant construction increased annual adult earnings of Black men by 4–6 percent ($1,500–$2,500 in 2020 dollars) and of White men by 2–3 percent ($1,000–$1,500 in 2020 dollars). The racial earnings gap in the treatment group is estimated to have narrowed by about 2 percent. However, the pattern of heterogeneity by parent income differs by race: for White men, effects are largest for children of below-median parents and effectively zero for children of above-median parents. For Black men, the largest effects — 7–10 percent ($4,000–$5,000 in 2020 dollars) — accrue to children of parents with earnings above the pooled-race national median, while effects for lower-income Black families range from 3–6.5 percent, suggesting that Black workers from higher-income backgrounds particularly benefited from wartime anti-discrimination policies and the opening of previously restricted manufacturing occupations.

Q6. Why do the 1040 returns show comparable effects for men and women, while W-2 data show no effect on women’s individual earnings?

A: Form 1040 returns are filed at the tax-unit level — for married couples, they report the combined wages of both spouses. Because more than 80 percent of women in the sample are married, an increase in a husband’s earnings raises the joint 1040 figure for both spouses. The SSA-CPS subsample with individual W-2 records shows that the entire effect on men’s Form 1040 wages directly reflects increases in their own W-2 earnings, while women’s own W-2 earnings show no positive treatment effect. This finding is consistent with county-level evidence of no impact on female individual income or female labor force participation, and with Rose (2018) finding that women were almost universally excluded from manufacturing jobs after the war’s conclusion despite high wartime female manufacturing employment.

Q7. What evidence tests the developmental-effects mechanism?

A: Three tests argue against childhood developmental effects as the primary driver. First, educational attainment effects — while statistically significant for children of the lowest-income parents (approximately one-quarter of a year more schooling, 3 percentage points more likely to graduate high school) — are too small to account for the earnings increase: a Mincer-equation calculation shows that the education effects can explain less than one-half of the estimated effect on 1979 wages. Second, there is no gradient in earnings effects by birth cohort — children younger at the war’s start, who had longer post-treatment childhood exposure, did not benefit more, in direct contrast to the Chetty-Hendren childhood-exposure framework. Third, postwar in-migrants into treatment counties were not drawn from better-educated or higher-income families and did not themselves have more education than in-migrants into comparison regions, ruling out peer effects from selective in-migration.

Q8. What evidence directly implicates adult labor market access as the operative mechanism?

A: Four pieces of evidence point to contemporaneous adult labor market access. First, individuals born in treatment counties lived as adults in counties with 3–4 percent higher median male earnings and higher wages in semi-skilled blue-collar occupations but not in highly-skilled professional occupations — a pattern quantitatively consistent with the individual earnings effects. Second, the entire earnings effect is concentrated among those who remain in their birth counties: stayers in treatment counties show earnings differences of similar magnitude to county-level manufacturing wage effects, while movers show no difference compared to movers from comparison counties. Third, conditioning on 1979 county of residence eliminates the earnings effect entirely (1979 location fixed effects specification). Fourth, using W-2 data matched to the Business Register in the SSA-CPS sample, men born in treatment counties are directly shown to work in industries with higher wage premiums, with coarse industry classification alone accounting for approximately one-third of the log wage increase.

Q9. Is the persistence of regional effects driven by continued Cold War military spending at the plants?

A: No. The paper separates ordnance and ammunition plants — which predominantly became GOCO facilities or Air Force Bases after WWII and received disproportionately more Vietnam War-era defense spending — from general manufacturing plants, which overwhelmingly transitioned to privatized civilian production. Both types of plants show similarly persistent effects on manufacturing employment and comparable impacts on the long-run earnings of local children. Moreover, general manufacturing plants — which did not generate increased postwar military spending — had large permanent effects on overall population growth, while ordnance plants had smaller population effects. The persistence therefore does not appear to reflect continued federal expenditure.

Q10. What mechanism explains the permanent population effect even after manufacturing employment shares converge?

A: The authors interpret the permanent population differential — treated counties remain roughly 20 percent larger than comparison counties even at the end of the 20th century, after manufacturing employment shares converge — as evidence of path dependence and multiple equilibria. Once a region reaches a new, larger equilibrium, self-sustaining forces (expanded non-tradable employment, public infrastructure investment) maintain it. Treatment counties are more likely to have been connected to the interstate highway system in subsequent decades and show positive effects on local government capital outlays for utilities. The medium-term persistence is attributed partly to the sunk costs of site establishment (surveying, local approvals, infrastructure connections), which make reinvestment at existing sites more attractive than greenfield construction elsewhere.

Q11. Do smaller plant openings generate comparable effects?

A: No. Counties receiving smaller publicly-financed plants costing between $1 and $10 million show no detectable effects on manufacturing employment, population, median family income, or individual adult earnings comparable to those from the large plants. The authors cannot rule out the presence of small effects, but the null results for smaller plants — combined with evidence that the largest effects are in counties with the highest investment intensity per 1940 resident — are consistent with threshold effects (“big push”) in regional development, though the wide confidence intervals do not allow the authors to conclusively distinguish threshold effects from a linear-in-investment model.

Q12. What do modern “million-dollar plant” openings reveal about the contemporary relevance of these findings?

A: Reexamining plant openings from Greenstone et al. (2010) using an event-study design, the authors find that 1980s–1990s million-dollar plant openings expanded manufacturing employment (consistent with Greenstone et al.) but had no impact on average manufacturing wages — in sharp contrast to the WWII findings. Slattery and Zidar (2020) similarly find no impacts on county-level incomes for plant openings since 2000. The correlation between manufacturing employment density and upward mobility rates visible in 1950 had entirely vanished by the end of the 20th century. The authors attribute the divergent results to the changed institutional environment: contemporary production is highly automated, relies on interchangeable labor from staffing agencies, faces intense international competition, and is conducted under much weaker collective bargaining institutions.

Q13. What is the paper’s assessment of aggregate welfare implications?

A: The paper is explicit that its local estimates do not allow clean conclusions about aggregate effects. Publicly-financed plant construction in peripheral locations may have crowded out private investment that would otherwise have occurred in major manufacturing hubs. If so, the documented regional gains represent geographic reallocation of manufacturing activity rather than a net increase in the aggregate plant stock. Aggregate gains from reallocation would require that the benefits in the selected dispersed locations exceeded what would have occurred in the counterfactual locations — a plausible conjecture given the paper’s evidence that effects are larger in counties with lower prewar manufacturing employment shares and lower initial market access, but one the authors cannot demonstrate decisively.

Key Concepts

Government-Owned, Contractor-Operated (GOCO) Plants: Manufacturing facilities financed and owned by a U.S. government agency (typically the Defense Plant Corporation) during WWII but built and operated by private firms under cost-plus contracts. GOCO status meant the government bore full construction risk and that post-war disposition (sale to private buyers at a fraction of construction cost, or continued GOCO operation for ordnance production) was determined by public agencies, not by the constructing firm’s investment calculus.

Place-Based Predistribution: The paper’s term for the mechanism by which wartime plant construction raised the incomes of existing residents — not through ex-post redistribution of income via taxes and transfers, but by expanding the set of high-wage employment opportunities available to incumbent workers in the region, thereby changing the pre-tax, pre-transfer wage structure facing those workers.

Adult Labor Market Access (vs. Childhood Developmental Exposure): A distinction the paper draws in explaining why children born in treated counties had higher adult earnings. The “developmental exposure” mechanism (as in Chetty and Hendren 2018b) implies benefits scale with the amount of time spent in an improved childhood environment. The “adult labor market access” mechanism means children benefit irrespective of years of childhood exposure because they can access improved local labor market conditions when they reach working age as adults — what the paper operationalizes through the finding that earnings effects are entirely accounted for by 1979 county of residence and are concentrated among individuals who remain in their birth counties.

Upward Mobility (Absolute and Relative): Following Chetty et al. (2014), the paper uses both concepts: absolute upward mobility means children from low-income backgrounds have higher lifetime earnings than comparable children in counterfactual regions; relative upward mobility means their outcomes converge toward those of children from affluent backgrounds. The paper documents both: large earnings effects for the lowest parent-income deciles, declining linearly to zero for the top deciles.

Conditional Independence (Plant Siting as Quasi-Random): The paper’s identification assumption — that among counties with observably similar population sizes and basic geographic/infrastructure characteristics, the specific choice of plant siting locations was driven by idiosyncratic, short-run factors uncorrelated with potential postwar outcomes. This is a level-balance assumption (not merely a parallel-trends assumption), required because individual outcomes are only observed in the post-period.

Industry Wage Premium: The paper uses Krueger and Summers (1988) estimates of inter-industry wage differentials (the portion of a sector’s average wage unexplained by worker characteristics) to classify adult employers of treated individuals. Finding that men born in treatment counties work at employers in higher-premium industries — with industry category alone explaining approximately one-third of the log wage increase — provides direct evidence of the adult labor market access mechanism operating through industry sorting.

Path Dependence / Multiple Equilibria in Regional Development: The paper documents that treated counties remain permanently larger in population than comparison counties even after manufacturing employment shares converge and the original plants begin to close. This self-sustaining population differential, inconsistent with a unique spatial equilibrium, is interpreted as evidence that the temporary wartime shock shifted treated regions into a permanently higher equilibrium, sustained by subsequent infrastructure investment and non-tradable sector expansion proportional to the larger population base.

The Productivity of Professions: Evidence from the Emergency Department

This paper studies the productivity of nurse practitioners (NPs) versus physicians performing overlapping tasks in Veterans Health Administration (VHA) emergency departments (EDs), exploiting a quasi-experiment created by the VHA’s December 2016 grant of full practice authority to NPs. The identification strategy instruments patient assignment to NPs versus physicians using quasi-random variation in the number of NPs on duty on a given ED-day, conditional on ED-by-time-category fixed effects. The sample covers 1.1 million ED visits across 44 VHA EDs from January 2017 to January 2020, seen by 1,348 physicians and 156 NPs. The instrument is validated by demonstrating balance in patient observable characteristics across values of the instrument, stability of IV estimates across 256 combinations of patient covariate controls, and absence of spillover effects from NP presence onto physician performance.

On average in the ED setting, NPs increase patient length of stay by 11 percent (approximately 18 additional minutes) and raise the cost of the ED visit by 7 percent (approximately $66 per visit). NPs raise the 30-day preventable hospitalization rate by 0.25 percentage points, a 20 percent increase relative to the mean. No statistically significant effect on 30-day mortality is detected (95 percent confidence interval: -0.34 to 0.11 percentage points). OLS estimates carry the opposite sign because NPs are assigned healthier patients in observational data; the IV design corrects for this selection.

The average NP-physician performance gap varies systematically by case complexity and severity. For the highest-complexity quartile of cases (by Elixhauser comorbidities), NPs increase ED costs by 12 percent and length of stay by 28 percent. For cases at or above the 95th percentile of severity (based on 30-day mortality by diagnosis), NPs increase ED costs by 25 percent, length of stay by 99 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points — suggesting that NPs’ higher care intensity partially offsets worse intrinsic skill for the most severe cases. For lower-complexity cases, the cost and length-of-stay gaps are smaller, but NPs still significantly raise preventable hospitalizations.

NPs exhibit clinical decision-making patterns consistent with lower diagnostic skill: they are more likely to order consults (2.6 percentage points, or 11 percent of the mean), CT scans (1.2 percentage points, or 8.3 percent), and X-rays (2.0 percentage points, or 6.9 percent). NPs lower opioid prescriptions by 1.8 percentage points (20 percent of the mean) and raise antibiotic prescriptions by 4.0 percentage points (6.3 percent of the mean), consistent with threshold adjustment under lower diagnostic skill with asymmetric error costs. Downstream, patients treated by NPs incur similar opioid use disorder rates despite lower opioid prescribing, and higher infection-related return visit rates despite higher antibiotic prescribing.

Counterfactual analysis finds that allocating one quarter of ED patients to NPs increases the VHA’s net spending by $129 million per year after accounting for NPs’ lower wages (approximately half of physicians’). However, deploying NPs exclusively to the least-complex quarter of cases reduces the net spending increase to approximately one-fifth of this amount.

A distributional analysis deconvolving provider-specific IV estimates reveals that within-profession productivity variation substantially exceeds the average between-profession gap. The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of pairs. Within professions, individual provider productivity shows essentially no relationship with wages or case complexity assigned, whereas between professions, case assignment and wages are strongly sorted by professional class.

Q: What is the core research question? A: The paper asks whether NPs and physicians, who perform overlapping tasks in the ED but differ sharply in training, selectivity, and pay, differ in productivity, and how that average between-profession difference compares to productivity variation within each profession. It also asks what mechanisms drive any observed gap and how case assignment responds to provider skill differences.

Q: What is the identification strategy and why is it credible? A: The authors instrument patient assignment to NPs with the number of NPs on duty on the ED-day, conditional on ED-by-year, ED-by-month, ED-by-day-of-week, and ED-by-hour fixed effects. Credibility rests on: provider schedules being set months in advance, decoupling NP availability from arriving patient characteristics; patient characteristics being well balanced across values of the instrument conditional on fixed effects; IV estimates being stable across all 256 covariate-control combinations; and on-duty physician and NP characteristics also being balanced across the instrument.
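A simulation can make the identification argument concrete. The sketch below is illustrative only: the data are simulated, every coefficient is an assumption, the ED-by-time fixed effects are omitted, and the single-instrument ratio estimator stands in for the paper's full IV specification. It shows how selection of healthier patients to NPs can flip the sign of OLS while an instrument that is unrelated to patient health recovers the assumed effect.

```python
import random

def cov(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

TRUE_EFFECT = 0.07  # assumed effect of NP assignment on log cost

rng = random.Random(0)
z, d, y = [], [], []
for _ in range(200_000):  # simulated visits
    nps_on_duty = rng.choice([0, 1, 2])   # instrument, set by the schedule
    sickness = rng.gauss(0, 1)            # unobserved patient severity
    taste = rng.gauss(0, 1)               # other assignment noise
    if nps_on_duty == 0:
        np_assigned = 0                   # no NPs on duty: no always-takers
    else:
        # Healthier patients are more likely to be routed to an NP.
        index = -0.8 + 0.5 * nps_on_duty - 0.8 * sickness + taste
        np_assigned = 1 if index > 0 else 0
    log_cost = TRUE_EFFECT * np_assigned + 0.5 * sickness + 0.3 * rng.gauss(0, 1)
    z.append(nps_on_duty)
    d.append(np_assigned)
    y.append(log_cost)

# OLS slope of cost on NP assignment: contaminated by patient selection.
ols_estimate = cov(d, y) / cov(d, d)

# IV (ratio) estimator: reduced form divided by first stage.
iv_estimate = cov(z, y) / cov(z, d)

print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  assumed truth: {TRUE_EFFECT}")
```

Under these assumptions OLS comes out negative and the IV estimate lands near the assumed positive effect, mirroring the sign reversal described in the summary.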

Q: What are the main average effects of NPs on resource use? A: IV estimates show NPs increase patient length of stay by 11 percent (approximately 18 minutes) and ED cost by 7 percent (approximately $66 per visit). There is no significant average effect on inpatient admissions in the overall sample, though NPs significantly raise admissions for high-severity cases.

Q: What is the effect of NPs on patient health outcomes? A: NPs raise 30-day preventable hospitalizations by 0.25 percentage points, a 20 percent increase relative to the mean. The 95 percent confidence interval for 30-day mortality is -0.34 to 0.11 percentage points, implying no statistically significant mortality effect in the overall sample.

Q: Why do OLS and IV estimates have opposite signs? A: In observational data, NPs treat healthier patients than physicians: NP patients are younger (60.7 versus 62.5 years), have fewer Elixhauser comorbidities (3.2 versus 3.7), and have fewer prior inpatient stays (0.4 versus 0.7). This selection causes OLS estimates of NP effects to be negative. The IV corrects for this by exploiting quasi-random variation in NP availability; IV estimates are stable across all combinations of patient controls, consistent with the instrument being orthogonal to unobservable patient health.

Q: How does the NP-physician performance gap vary with case complexity and severity? A: For the highest-complexity quartile, NPs increase length of stay by 28 percent and ED costs by 12 percent without a significant preventable hospitalization effect. For cases at or above the 95th severity percentile, NPs increase length of stay by 99 percent, ED costs by 25 percent, and admissions by 26 percentage points (42 percent relative to the mean), while reducing 30-day preventable hospitalization by 3 percentage points. For lower-complexity quartiles, NPs show smaller cost and length-of-stay effects but significantly raise preventable hospitalizations, suggesting the higher care intensity at high severity compensates for lower skill.

Q: What does the heterogeneity by severity imply for optimal case assignment? A: The pattern is consistent with skill-task matching: NPs have a comparative and absolute disadvantage in complex cases, so optimal assignment directs less complex cases to NPs and fewer patients to NPs when physicians are more available. Empirically, NPs are indeed assigned healthier patients from the available pool, and are assigned a modestly smaller share when the ED is less busy.

Q: What mechanisms explain the average NP-physician gap? A: Three mechanisms are examined. First, experience: a one-standard-deviation increase in specific experience is associated with a 5.8 percent decline in the NP-physician length-of-stay gap, and general experience with a 10 percent decline; however, experience does not significantly narrow the preventable hospitalization gap. Second, information acquisition: NPs order more consults, CT scans, and X-rays, consistent with compensating for lower diagnostic skill. Third, prescription thresholds: NPs reduce opioid prescribing by 20 percent and raise antibiotic prescribing by 6.3 percent, consistent with threshold adjustment under asymmetric error costs, but downstream outcomes are not improved correspondingly.

Q: What do prescription patterns and downstream outcomes reveal about NP diagnostic skill? A: NPs prescribe fewer opioids, yet patients treated by NPs have similar downstream opioid use disorder rates; NPs prescribe more antibiotics, yet patients treated by NPs have higher rates of return visits with infections. This pattern is consistent with NPs exhibiting higher rates of both false positives and false negatives, suggesting genuinely lower diagnostic skill rather than threshold differences alone.

Q: What do counterfactual cost calculations show? A: Allocating one quarter of ED patients to NPs raises the VHA’s non-wage spending by $197 million per year; after accounting for NP wages being half of physician wages (approximately $120,000 versus $240,000 per year), the net cost increase is still $129 million per year. Restricting NP deployment to the least-complex quarter of cases reduces the net spending increase to approximately one-fifth of this amount, illustrating that targeted case assignment substantially improves NP cost-effectiveness.
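The arithmetic behind these figures can be laid out in a few lines. The wage offset below is inferred by subtraction from the two reported totals and is not a number stated in the summary; the targeted figure simply applies the "approximately one-fifth" ratio, so it should be read as a rough magnitude.

```python
# Reported figures (dollars per year, VHA-wide).
non_wage_increase = 197_000_000   # extra non-wage spending with 1/4 of patients on NPs
net_increase = 129_000_000        # after accounting for lower NP wages

# Implied wage savings (derived here, not reported in the summary).
implied_wage_savings = non_wage_increase - net_increase

# Targeted deployment: net increase falls to roughly one-fifth.
targeted_net_increase = net_increase / 5

print(f"implied wage savings: ${implied_wage_savings:,}")
print(f"approximate net increase under targeted deployment: ${targeted_net_increase:,.0f}")
```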

Q: How large is within-profession productivity variation relative to between-profession differences? A: The interquartile range in annual spending attributable to provider productivity within each profession is approximately $900,000, roughly three times the mean annual spending difference between the average NP and the average physician. A randomly chosen NP outperforms a randomly chosen physician in up to 38 percent of random pairs. The authors conclude that, despite stark differences in training and selection between professions, within-profession variation dominates.
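A back-of-the-envelope check shows how the interquartile range and the between-profession gap relate to the pairwise comparison. The sketch assumes both professions' productivity distributions are normal with the same spread and that the mean gap is exactly one-third of the interquartile range; the paper instead works with deconvolved distributions, so this is only a plausibility check.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

iqr = 900_000.0   # reported within-profession IQR in annual spending
gap = iqr / 3     # assumed: IQR is "roughly three times" the mean gap

# For a normal distribution, IQR = 2 * z_0.75 * sigma.
sigma = iqr / (2 * std_normal.inv_cdf(0.75))

# The difference between two independent draws has sd sigma * sqrt(2).
# The NP "outperforms" when her spending is below the physician's,
# even though NPs are higher by `gap` on average.
p_np_better = std_normal.cdf(-gap / (sigma * math.sqrt(2)))

print(f"sigma: {sigma:,.0f}  P(random NP beats random physician): {p_np_better:.3f}")
```

Under these assumptions the probability comes out a little below 0.38, in line with the "up to 38 percent" reported.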

Q: Is individual provider productivity reflected in wages or case assignment within professions? A: Within each profession, provider productivity shows essentially no relationship with wages or with the complexity of assigned cases. This contrasts sharply with between-profession patterns, where professional class strongly predicts both wages (NPs earn approximately $120,000 per year versus $240,000 for physicians) and assigned case complexity. The authors interpret this as evidence of informational and organizational frictions in recognizing individual productivity within professional classes, and note that professional class is a far stronger predictor of pay and case assignment than is individual productivity.

Q: How do complier characteristics relate to the broader patient population? A: Compliers — cases whose provider type is determined by the instrument — are healthier than the average case: younger, with fewer comorbidities, fewer prior inpatient stays, and lower predicted mortality. Never-takers are riskier than the average case. There are no always-takers since patients cannot be assigned to NPs on days when no NPs are on duty.

Q: How does this paper relate to the literature on NP scope-of-practice laws? A: The scope-of-practice literature estimates general-equilibrium effects of allowing NPs greater autonomy, including labor reallocation between professions. This paper instead estimates the partial-equilibrium causal effect of assigning a patient to an NP versus a physician, holding the broader labor market fixed. The two literatures are complementary: the heterogeneity findings here suggest that scope-of-practice expansions may be more beneficial in lower-complexity primary care settings where the NP-physician performance gap is smaller.

Q: What are the policy implications of the findings? A: Three implications are highlighted. First, the efficiency of using NPs depends critically on case assignment: deploying NPs on the least-complex cases reduces the net cost increase to approximately one-fifth of that under indiscriminate deployment. Second, the substantial overlap between NP and physician productivity distributions provides support for NP use in less complex settings even within the ED context. Third, within-profession productivity variation far exceeding between-profession differences suggests that individual-level productivity assessment, rather than professional class, may be a more accurate guide to case assignment and compensation.

Key Concepts

Quasi-experimental variation in NP availability: The identification strategy exploits day-to-day variation in the number of NPs scheduled to work in a given VHA ED, conditional on ED-by-time-category fixed effects, as an instrument for whether a patient is assigned to an NP versus a physician. Schedules are set months in advance, rendering the NP count orthogonal to arriving patient characteristics conditional on those fixed effects.

30-day preventable hospitalization: A standardized quality-of-care outcome defined by the Agency for Healthcare Research and Quality, measuring hospitalizations occurring within 30 days of ED discharge that are classified as preventable given adequate prior outpatient management. Used by the paper as the primary downstream health outcome beyond the ED visit itself.

Elixhauser comorbidities: A set of 31 binary indicators for chronic conditions (e.g., cancer, diabetes) based on medical histories in the prior 365 days, used in this paper to measure and stratify case complexity into quartiles for heterogeneity analysis.

Productivity distributions within professions: Provider-specific productivity estimates derived from a just-identified IV model that instruments assignment to individual providers by indicators for on-duty providers, then deconvolved into underlying distributions using the Efron (2016) and Kline-Rose-Walters (2022) method. These distributions characterize the spread of productivity within each professional class, separate from measurement error.

Prescription threshold adjustment: The mechanism, formalized in Chan, Gentzkow, and Yu (2022), by which providers with lower diagnostic skill optimally adjust treatment thresholds in response to asymmetric costs of false-positive versus false-negative errors. In this paper’s application, NPs lower the opioid prescription rate (where false positives carry higher costs: addiction and overdose) and raise the antibiotic prescription rate (where false negatives carry higher costs: untreated infection), but downstream outcomes do not improve correspondingly.

Skill-task matching: The organizational economics principle (Acemoglu and Autor 2011) that efficiency requires assigning more complex tasks to higher-skilled workers. The paper documents that between professions, case assignment broadly follows this principle (NPs receive less complex patients on average), but within professions, essentially no matching between individual provider productivity and case complexity is observed.

Full practice authority (VHA, December 2016): The VHA policy that allowed NPs to treat patients independently without physician supervision at VHA facilities, superseding state-level restrictions. This policy change defines the start of the paper’s sample period and establishes the institutional context in which the quasi-experiment occurs, as it removed the requirement for physician oversight that previously constrained NP independence.

When Did Growth Begin? New Estimates of Productivity Growth in England from 1250 to 1870

Overview

Research Question. When did sustained productivity growth begin in England? This paper constructs new estimates of the evolution of productivity in England from 1250 to 1870, with the goal of both dating the onset of growth and using that dating to discriminate between competing theories of why growth began.

Methodological Innovation. The core challenge is that real wages over this period were heavily distorted by Malthusian population dynamics. Plague-induced population collapses (most dramatically the Black Death of 1348, which killed roughly 25% of England’s population) drove enormous swings in real wages that reflect movements along a stable labor demand curve, not changes in productivity. A naive regression of wages on labor supply is therefore inconsistent, because in a Malthusian world productivity growth induces population growth, making labor supply endogenous to productivity.

The authors address this by writing down and structurally estimating a full Malthusian model of the economy. Output is produced with fixed land and variable labor (and, in an extended model, capital) via a Cobb-Douglas production function. The labor demand curve equates the real wage to the marginal product of labor. Population growth is increasing in real per-capita income (the Malthus law of motion), capturing both preventive and positive checks. Productivity follows a random walk with drift, and the paper allows for two structural breaks in the average drift rate mu. Exogenous population shocks, modeled as infrequent, sizable plague draws from a beta distribution plus a Gaussian noise term, provide identification: plague shocks and productivity shocks generate observationally distinct dynamics – plague shocks cause an immediate population drop that gradually reverts, while productivity shocks cause an immediate wage jump followed by a slow population rise to a new steady state. The model is estimated via Bayesian Hamiltonian Monte Carlo (Stan), and structural break dates for mu are chosen by maximizing the Bayes factor (marginal likelihood) over the observed data on real wages, population, and days worked per worker.
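The contrast between the two kinds of shocks can be seen in a stripped-down, deterministic version of the model. This is a sketch under assumptions, not the estimated model: it uses a log-linear wage equation and population law, ignores capital, days worked, and random noise, normalizes the steady state to zero, and borrows alpha = 0.53 and gamma = 0.09 from the estimates reported in the findings.

```python
ALPHA = 0.53  # slope of the labor demand curve (reported estimate)
GAMMA = 0.09  # elasticity of population growth to real income (reported estimate)

def simulate(decades, plague_drop=0.0, productivity_jump=0.0):
    """Return a list of (decade, log population, log real wage).

    Assumed log-linear model, in deviations from the initial steady state:
        wage_t       = productivity_t - ALPHA * population_t
        population_{t+1} = population_t + GAMMA * wage_t
    Both shocks, if any, hit once at decade 1.
    """
    population, productivity = 0.0, 0.0
    path = []
    for t in range(decades):
        if t == 1:
            population -= plague_drop          # one-off population loss
            productivity += productivity_jump  # permanent productivity gain
        wage = productivity - ALPHA * population
        path.append((t, population, wage))
        population += GAMMA * wage             # Malthusian law of motion
    return path

plague = simulate(60, plague_drop=0.3)
boom = simulate(60, productivity_jump=0.1)

for label, path in (("plague", plague), ("productivity", boom)):
    for t in (0, 1, 10, 59):
        _, n, w = path[t]
        print(f"{label:12s} decade {t:2d}: log pop {n:+.3f}, log wage {w:+.3f}")
```

In the plague run population falls at once and then creeps back while the wage premium fades; in the productivity run the wage jumps first and population rises slowly toward a higher level. That difference in timing is what lets the data tell the two shocks apart.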

Key Data. Real wages are from Clark’s (2010) series for unskilled building workers. Post-1540 population is from Wrigley et al. (1997); pre-1540 population trends are from Clark (2007b), based on manorial records. Days worked per worker are from Humphries and Weisdorf (2019). All series are used as decadal averages.

Main Findings.

Onset of growth: 1600. Productivity growth was zero before 1600. The Bayes factor strongly favors a first structural break in mu at 1600; break dates before 1590 and after 1640 are clearly rejected.
Two-phase post-1600 growth. Between 1600 and 1810, average productivity growth was 4% per decade (posterior mean; 95% credible interval approximately 2%-6%). After 1810, productivity growth accelerated sharply to 18% per decade (95% CI approximately 12%-23%). The second break date is estimated at 1810 (the only alternative not clearly rejected is 1800).
Magnitude of productivity change. By the authors’ estimates, productivity in England was approximately 540% higher in 1850 than in 1500. This contrasts sharply with Clark’s (2010) dual-approach TFP series, which implies essentially no change over this period. The authors attribute the discrepancy to mismeasurement in Clark’s land rent series.
Productivity growth preceded the Glorious Revolution. Productivity growth began around 1600, before the English Civil War (1642-1651), and productivity had risen by an estimated 48% between 1600 and 1680, ahead of the Glorious Revolution of 1688. This supports the view that economic change contributed to causing the bourgeois institutional reforms of the 17th century, consistent with the Marxist tradition (Hill, 1940, 1961), rather than the view that institutional change preceded and caused growth.
Weakness of Malthusian population force. The elasticity of population growth with respect to real income (gamma) is estimated at 0.09. Combined with a slope of the labor demand curve (alpha) of 0.53, this implies a half-life of plague-induced population dynamics of approximately 150 years. A doubling of real per-capita income stimulated population growth by only 6 percentage points per decade – indicating Malthusian forces were sufficiently weak to be overwhelmed by post-1800 productivity growth. The model implies that the post-1810 productivity growth rate would have produced a 28-fold long-run increase in steady-state real wages even without the Demographic Transition.
Capital extension. When capital is explicitly incorporated, using rates of return on agricultural land and rent charges to infer the capital stock, results are broadly similar: productivity growth from 1600-1810 is 3% per decade and post-1810 is 14% per decade. Capital’s production function exponent is estimated at 0.18, confirming that capital accumulation explains only a modest share of growth.

Scope Conditions. All estimates are for England specifically. The model assumes competitive factor markets, a Cobb-Douglas (or CES) production function, and a log-linear Malthusian population law of motion. Results are robust to alternative wage series (farm laborers, craftsmen, Allen’s series), alternative population sources (Broadberry et al., 2015), a constant-days-worked assumption, and alternative prior distributions.

In depth

Q1. Why can’t standard OLS regression of wages on labor supply recover productivity in this setting?

In a Malthusian world, productivity growth causes population growth, which in turn raises labor supply. Labor supply and productivity are therefore positively correlated, so an OLS regression of wages on labor supply, which treats productivity as an error term, yields a biased estimate of the labor demand slope. The authors demonstrate this concretely: from 1300 to 1450 (the plague era), wages and labor supply moved in opposite directions along a stable labor demand curve, while after 1630 the data points begin shifting off that curve. OLS would confound this pattern with a change in the slope rather than attribute it to shifts in the intercept.
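The bias can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes a stripped-down log-linear model (labor demand w = -alpha*l + a, population growth equal to gamma times the log wage, productivity as a random walk with drift) and ignores days worked and plague shocks. All parameter values and function names are illustrative assumptions.

```python
import random

ALPHA = 0.5   # true (absolute) slope of labor demand; illustrative value
GAMMA = 0.1   # response of population growth to the log wage; illustrative
MU = 0.05     # productivity drift per period; illustrative
SIGMA = 0.02  # std. dev. of productivity innovations; illustrative
T = 60        # number of periods (think of them as decades)


def simulate(seed=0):
    """Simulate log wages and log labor when productivity trends upward."""
    rng = random.Random(seed)
    a, l = 0.0, 0.0
    wages, labor = [], []
    for _ in range(T):
        w = a - ALPHA * l                # labor demand: w = -alpha*l + a
        wages.append(w)
        labor.append(l)
        l += GAMMA * w                   # Malthus: higher wages raise population
        a += MU + rng.gauss(0.0, SIGMA)  # random walk with drift
    return wages, labor


def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


wages, labor = simulate()
slope = ols_slope(wages, labor)
print(f"true slope: {-ALPHA:.2f}, OLS slope: {slope:.2f}")
```

Because productivity and labor supply trend up together in this sketch, the fitted slope lies well above the true value of -alpha, which is the sense in which OLS mistakes shifts of the curve for its shape.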

Q2. How do the authors distinguish empirically between a plague shock and a productivity shock?

The two shocks generate fundamentally different dynamics. A plague shock causes an immediate, large drop in population and a corresponding spike in wages; over time, high wages induce population growth and both wages and population gradually return to their pre-plague levels. A permanent productivity shock, by contrast, causes an immediate rise in wages with no contemporaneous population change; population then slowly rises and wages partially revert until a new, higher steady-state population is reached. The model exploits these different impulse-response signatures in the joint data on wages and population to identify the two shocks separately.
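The two signatures can be reproduced in a few lines. The sketch below assumes the same kind of simplified log-linear system (log deviations from the old steady state, no days-worked margin, no noise); the shock sizes and the function name are hypothetical, while alpha and gamma are the point estimates quoted in this summary.

```python
ALPHA = 0.53  # slope of labor demand (estimate quoted in the summary)
GAMMA = 0.09  # elasticity of population growth to income (quoted estimate)


def impulse_response(pop_shock, prod_shock, periods=30):
    """Return (wage, population) paths in log deviations from the old steady state."""
    n = pop_shock          # log population deviation on impact
    a = prod_shock         # permanent shift in log productivity
    wages, pops = [], []
    for _ in range(periods):
        w = a - ALPHA * n  # labor demand pins down the wage
        wages.append(w)
        pops.append(n)
        n += GAMMA * w     # population responds to the wage with a lag
    return wages, pops


# Hypothetical shock sizes: a 30 log-point plague and a 10 log-point productivity gain.
plague_w, plague_n = impulse_response(pop_shock=-0.30, prod_shock=0.0)
tech_w, tech_n = impulse_response(pop_shock=0.0, prod_shock=0.10)

for t in (0, 10, 29):
    print(t, round(plague_w[t], 3), round(plague_n[t], 3),
          round(tech_w[t], 3), round(tech_n[t], 3))
```

In this stripped-down version the plague path starts with low population and high wages and both drift back toward zero, while the productivity path starts with a wage jump at unchanged population and then shows rising population and a slowly eroding wage gain. Over the 30 periods simulated the wage reversion is only partial.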

Q3. What is the Bayes factor evidence for the 1600 break date?

Figure 8 in the paper shows the Bayes factor for models with different first break dates (all holding the second break at 1810). The Bayes factor rises sharply from 1580 to 1600 and falls more gradually from 1600 to 1650. Break dates before 1590 and after 1640 are clearly rejected using the standard rule of thumb that a Bayes factor of 10 constitutes strong evidence. The 1600-1810 pair of break dates yields the highest marginal likelihood of any combination considered.
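To make the selection rule concrete, the following sketch converts log marginal likelihoods into Bayes factors against the best-fitting break date and flags candidates that fail the factor-of-10 rule. The log marginal likelihood values are invented for illustration and are not taken from the paper; the function name is likewise an assumption.

```python
import math

# HYPOTHETICAL log marginal likelihoods by candidate first-break date.
# These numbers are made up for illustration and are not from the paper.
log_marginal_likelihood = {
    1580: -412.9,
    1590: -408.9,
    1600: -407.1,
    1620: -408.0,
    1650: -410.8,
}

STRONG_EVIDENCE = 10.0  # rule-of-thumb threshold quoted in the text


def bayes_factors_vs_best(log_ml):
    """Bayes factor of the best model against each candidate, computed from logs."""
    best = max(log_ml, key=log_ml.get)
    factors = {date: math.exp(log_ml[best] - value) for date, value in log_ml.items()}
    return best, factors


best_date, factors = bayes_factors_vs_best(log_marginal_likelihood)
rejected = sorted(date for date, bf in factors.items() if bf >= STRONG_EVIDENCE)
print("best break date:", best_date)
print("clearly rejected:", rejected)
```

Working with differences of log marginal likelihoods avoids exponentiating very small numbers directly, which is why the ratio is formed in logs before calling `math.exp`.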

Q4. How does the paper’s productivity estimate compare to Clark’s (2010) dual-approach TFP series?

Clark’s series implies productivity in England was essentially unchanged between the 15th and mid-19th centuries – a result the paper argues is implausible and inconsistent with Allen’s (2005) agricultural TFP estimates (which show a 162% increase in agricultural TFP between 1500 and 1850). The authors’ baseline estimate implies productivity was approximately 540% higher in 1850 than in 1500. The authors conjecture that a key driver of the difference is mismeasurement in Clark’s land rent series, which appears essentially flat from 1250 to 1600 despite enormous plague-induced swings in the land-labor ratio over this period.

Q5. What does the Malthusian model imply about “Engel’s Pause” – the apparent stagnation of real wages during early industrialization?

Between 1730 and 1800, real wages fell slightly despite what the model estimates to be substantial productivity growth. The conventional explanation is that the gains from early industrialization accrued to capitalists rather than workers. The authors offer an alternative Malthusian explanation: England’s population grew rapidly over this period, and in the Malthusian model this population growth depressed wages relative to productivity. The authors do not reject the distributional explanation but show that Malthusian forces alone are sufficient to explain the wage-productivity divergence.

Q6. How quantitatively important are days worked (the Industrious Revolution) for the productivity estimates?

The authors find that their productivity estimates are largely insensitive to whether the Humphries-Weisdorf (2019) days-worked series or a constant-days assumption is used. The qualitative pattern – zero growth before 1600, modest growth 1600-1810, rapid acceleration post-1810 – and the quantitative magnitudes remain similar. What does change is the estimated slope of the labor demand curve alpha: assuming constant days makes the labor demand curve steeper. This robustness is reassuring given that the Industrious Revolution is a contested empirical phenomenon.

Q7. What does the model imply about the speed of Malthusian population dynamics, and how does this compare to prior estimates?

The estimated elasticity of population growth with respect to real income, gamma = 0.09, combined with alpha = 0.53, implies a half-life of population dynamics of approximately 150 years. This lies between prior structural estimates: Lee and Anderson (2002) find a half-life of 107 years, and Crafts and Mills (2009) find 431 years. All three estimates agree that Malthusian dynamics in England were slow relative to the conceptual ideal of rapid subsistence convergence.
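As a rough check on the arithmetic, the half-life formula given under Key Concepts, log(0.5)/log(1 - alpha*gamma), can be evaluated at the quoted point estimates. This is only a sketch: it assumes one model period is a decade (the data are decadal averages), and the function name is hypothetical.

```python
import math


def half_life_decades(alpha, gamma):
    """Periods until a deviation halves when it decays at rate (1 - alpha*gamma)."""
    return math.log(0.5) / math.log(1.0 - alpha * gamma)


hl = half_life_decades(alpha=0.53, gamma=0.09)
print(f"half-life: {hl:.1f} decades, about {10 * hl:.0f} years")
```

Plugging in the point estimates gives a little over 14 decades, in line with the reported figure of approximately 150 years. An exact match should not be expected, since the half-life evaluated at point estimates need not equal a posterior summary of the half-life itself.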

Q8. Can the model explain the post-1750 population explosion without invoking the Demographic Transition?

Yes. The authors simulate predicted population paths from 1740 to 1860 taking real wages and days worked as given and using their estimated gamma and alpha. Despite the weak Malthusian population force, the model can explain the vast majority of the observed population growth from 6 million in 1740 to nearly 20 million in 1860 (10.4% per decade). The key mechanism is that days worked increased substantially over this period, raising per-capita income well above what real wages alone would suggest.

Q9. How does incorporating capital change the productivity estimates?

In the capital-augmented model, the capital stock is inferred from rates of return on agricultural land and rent charges (Clark 2002, 2010). The capital exponent beta is estimated at 0.18, indicating a modest role for capital in pre-industrial England. Average productivity growth from 1600-1810 falls from 4% to 3% per decade, and post-1810 growth falls from 18% to 14% per decade. The authors conclude that the vast majority of growth from 1600 to 1870 cannot be attributed to capital accumulation. From 1600 to 1860, the estimated capital stock grew by a factor of five (8% per decade).

Q10. What theories of the onset of growth are consistent vs. inconsistent with the authors’ timing evidence?

Inconsistent: The North-Weingast (1989) view that the Glorious Revolution of 1688 was the key institutional trigger, since productivity had already risen 48% between 1600 and 1680. Also inconsistent: gradual-growth theories (Kremer 1993, Galor-Weil 2000) in which there is no discrete acceleration. Consistent: Marxist accounts (Hill 1940, 1961) that economic change drove 17th-century institutional change; Acemoglu-Johnson-Robinson (2005) accounts linking Atlantic trade enrichment to the demand for secure property rights (timing broadly consistent, though growth rates do not visibly accelerate after the Civil War or Glorious Revolution); cultural-change accounts (Mokyr, McCloskey) tracing the onset of growth to the spread of literacy and scientific rationalism around 1600; Allen’s (2009a) directed-technical-change theory linking 17th-century wage growth to the later profitability of labor-saving innovation in the Industrial Revolution.

Q11. What does the model imply about the long-run real wage consequences of post-1810 productivity growth, even counterfactually assuming Malthusian forces persisted?

The steady-state real wage in the Malthusian model is w-bar = mu/(alpha*gamma) minus subsistence-related terms. For mu = 0.18 (the post-1810 estimate of 18% per decade), this formula implies a long-run real wage 28 times higher than the steady state under zero productivity growth. In other words, even if the Demographic Transition had not occurred and birth and death rates had remained sensitive to income, post-1810 productivity growth was fast enough relative to the weak Malthusian force to generate substantial sustained rises in living standards.
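One way to see where this formula comes from is the following sketch. It assumes a population law of the form l_{t+1} - l_t = omega + gamma*w_t, where omega is a hypothetical intercept standing in for the subsistence-related terms, and it ignores days worked and shocks; the paper's exact specification may differ.

```latex
\begin{align*}
w_t &= \phi - \alpha\, l_t + a_t, \qquad a_{t+1} - a_t = \mu
  && \text{(labor demand, productivity drift)} \\
l_{t+1} - l_t &= \omega + \gamma\, w_t
  && \text{(assumed population law)} \\
w_{t+1} - w_t &= \mu - \alpha\,(\omega + \gamma\, w_t) = 0
  \;\Longrightarrow\;
  \bar{w} = \frac{\mu}{\alpha\gamma} - \frac{\omega}{\gamma}.
\end{align*}
```

Under these assumptions the steady-state log wage with drift mu exceeds the zero-growth steady state by mu/(alpha*gamma), so the ratio of wage levels is exp(mu/(alpha*gamma)): a weaker Malthusian force (smaller alpha*gamma) translates a given growth rate into a larger long-run wage gain.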

Key Concepts

Labor demand curve (in the paper’s sense). The equilibrium relationship between real wages and labor supply derived from competitive profit maximization by landowners facing a fixed land endowment: w_t = phi - alpha*l_t + a_t. Productivity is identified from shifts in this curve over time. Under a CES production function the slope alpha is not simply the land share; it equals (one minus the labor share) divided by the elasticity of substitution between labor and land.
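For the Cobb-Douglas case the log-linear form follows directly from the wage equaling the marginal product of labor. The notation below (Z for the fixed land endowment, A_t for the productivity level, upper-case W_t and L_t for levels) is an assumption made for this sketch rather than the paper's own.

```latex
\begin{align*}
Y_t &= A_t\, Z^{\alpha} L_t^{1-\alpha}, \\
W_t &= \frac{\partial Y_t}{\partial L_t}
     = (1-\alpha)\, A_t \left(\frac{Z}{L_t}\right)^{\alpha}, \\
\log W_t &= \underbrace{\log(1-\alpha) + \alpha \log Z}_{\phi}
  \;-\; \alpha \log L_t \;+\; \underbrace{\log A_t}_{a_t}.
\end{align*}
```

With Cobb-Douglas technology the elasticity of substitution is one and the land share is one minus the labor share, so the slope reduces to the land share alpha, a special case of the CES expression above.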

Malthusian population force. The feedback mechanism by which higher real wages induce faster population growth, expanding labor supply and pushing wages back toward a steady state. Its speed is governed jointly by gamma (elasticity of population growth with respect to income) and alpha (slope of the labor demand curve); the half-life of wage/population dynamics after a shock equals log(0.5)/log(1 - alpha*gamma). In the paper’s estimates, this force was sufficiently weak (half-life approximately 150 years) that post-1800 productivity growth overwhelmed it.

Plague shock (xi_1t). An infrequent, large, exogenous negative population shock modeled as a draw from a beta distribution occurring with probability pi. Plagues are the primary source of identifying variation for the pre-1600 period: they generate movements along a stable labor demand curve and allow the slope alpha and the (lack of) productivity trend to be separately identified from labor demand shifts.

Structural break in average productivity growth (mu). The parameter mu is the drift in the random-walk model for the permanent component of productivity; a structural break is a discrete change in its value. The paper allows two breaks in mu, with break dates chosen to maximize the marginal likelihood (compared across candidate dates using Bayes factors). The best-fitting breaks are at 1600 (zero to 4% per decade) and 1810 (4% to 18% per decade).

Permanent vs. transitory productivity component. Productivity is decomposed into a permanent component a-tilde_t (a random walk with drift, with innovation standard deviation sigma_epsilon1) and a transitory component epsilon_2t (iid noise with standard deviation sigma_epsilon2). The paper reports and interprets the permanent component as the meaningful measure of underlying technological change; transitory shocks are treated as measurement error and short-run fluctuations.

Industrious Revolution. The hypothesized long-run increase in days worked per worker in England, associated with de Vries (1994, 2008). The paper uses Humphries-Weisdorf (2019) estimates showing a sharp drop after the Black Death followed by a sustained rise from 1350 onward. A key robustness result is that the paper’s productivity estimates are insensitive to whether this Industrious Revolution is assumed to have occurred.

Bayes factor (model selection). The ratio of marginal likelihoods p(y|M_t)/p(y|M_t’) for two competing models, used here to select structural break dates for mu. A factor of 10 is treated as strong evidence. The bridge sampling method of Gronau, Singmann, and Wagenmakers (2020) is used to compute marginal likelihoods.

Diversion Risk, Markups, and the Financing Cost Advantage of Trade Credit

In depth

Q1. Why does a higher markup make trade credit more attractive?

Q2. What is the unique empirical prediction that distinguishes the financing cost channel?

Q3. What do the Chilean export data show?

Q4. How does the paper handle endogeneity of markups?

Key concepts

Optimal Taxation of Inflation

In depth

Q1. What is TIP and what externality does it correct?

Q2. What is the complete-specialization result?

Q3. Does TIP exacerbate relative price distortions?

Q4. How large are the stabilization gains from TIP?

Q5. What equivalent instruments does the paper consider?

Key concepts

A Welfare Analysis of Policies Impacting Climate Change

Additionality and Asymmetric Information in Environmental Markets: Evidence from Conservation Auctions

Balance-Sheet Policy and the Term Premium: High-Frequency Evidence

In depth

Q1. Does balance-sheet policy move long rates through the term premium or through expected short rates?

Q2. How is the effect identified, and why high-frequency?

Q3. What does this imply for the pace of balance-sheet runoff?

Key concepts

Bottom-Up Markup Fluctuations

Layer 1 — Overview

In depth

Q1. What is the central mechanism by which granular firm-level shocks generate markup cyclicality?

Q2. Why does the sign of markup cyclicality differ depending on the level of aggregation?

Q3. What is the within-between decomposition of sectoral markup changes and what does it imply quantitatively?

Q4. How do variable markups affect granular aggregate output volatility relative to a model with constant markups?

Q5. What does the model predict for firm-level markup cyclicality, and how heterogeneous is this across firm size?

Q6. How does the paper calibrate the key demand elasticities, and what are the resulting pass-through implications?

Q7. Why do aggregate productivity shocks not affect markups in the model, and what are the implications for aggregate markup cyclicality?

Q8. How does the paper address the potential measurement-error bias in the negative correlation between markups and marginal costs?

Q9. Is the 50-50 within-between decomposition of sectoral markup changes robust to the choice of competition mode?

Q10. What do model simulations imply for the magnitude and cyclicality of aggregate markups versus the data, and what is the role of variable versus constant markups?

Q11. How does the paper reconcile its findings with prior literature on markup cyclicality (Bils et al. 2018 vs. Nekarda and Ramey 2013)?

Q12. What are the data limitations and how do they affect the interpretation of results?

Key Concepts

Central bank communication by ??? The economics of monetary policy leaks

Layer 1 — Overview

In depth

Q1. How is a “leak” defined in this paper, and how are Eurosystem leaks identified empirically?

Q2. What are the broad trends in the number and topic composition of Eurosystem leaks over 2002–2021?

Q3. How does the timing of leaks within the policy meeting cycle change across sub-periods?

Q4. What regression evidence supports the view that leaks are not random accidents?

Q5. Does the number of pre-meeting leaks predict policy changes?

Q6. How large are the financial market reactions to leaks relative to placebo events and to attributable statements?

Q7. Do the market effects of leaks differ by topic?

Q8. Do leaks move market expectations in the direction of the subsequent policy outcome?

Q9. Do leaks counteract or reinforce prevailing trends in market expectations?

Q10. Do post-announcement leaks dampen the transmission of monetary policy surprises to longer-term rates?

Q11. Does more intense pre-leak attributable communication reduce the market impact of subsequent leaks?

Q12. Does the market impact evidence support the “plant” hypothesis?

Q13. Why do markets react to leaks even though leaks are generally uninformative about policy outcomes?

Q14. What are the implications for the measurement of monetary policy shocks from high-frequency identification?

Q15. What are the implications for the design of central bank quiet periods?

Key Concepts

Community Engagement and Public Safety: Evidence from Crime Enforcement Targeting Immigrants

Consumer Credit and the Incidence of Tariffs: Evidence from the Auto Industry

Overview

In depth

Q1. What is a captive finance subsidiary, and why does it create a novel channel for tariff pass-through?

Q2. Why is the auto loan market a particularly suitable setting for studying this question?

Q3. What is the Regulation AB II data, and how representative is it?

Q4. What is the baseline empirical specification and what identifying variation does it use?

Q5. What are the main coefficient estimates on interest rates, and how do they evolve dynamically?

Q6. How do the authors validate that noncaptive lenders constitute a valid counterfactual?

Q7. Did captive lenders adjust any non-price loan terms in response to the tariffs?

Q8. How do the authors rule out that the increase in captive interest rates reflects a change in borrower composition rather than intensive-margin pass-through?

Q9. How do the authors rule out alternative explanations including demand surges, borrowing cost increases, securitization changes, and dealer markup changes?

Q10. How do the authors measure vehicle price pass-through, and what data do they use?

Q11. How is the overall pass-through rate decomposed between the interest rate and vehicle price channels?

Q12. How large is the estimated aggregate impact of the tariffs on consumer financing costs?

Q13. Which borrowers bore a disproportionate share of the interest rate pass-through, and by how much?

Q14. How does credit market competition affect tariff pass-through via interest rates?

Q15. Why do captive lenders spread interest rate increases broadly across vehicle types rather than targeting directly tariff-exposed new vehicle models?

Key Concepts

Consumer durables and monetary policy according to HANK