Online-First | Macro Paper Warehouse

A Macro Study of the Unequal Effects of Climate Change

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper develops a macro heterogeneous-agent model to quantify the distributional welfare impacts of higher temperatures from climate change across income groups in the United States. The motivation is that existing macro climate-economy models either abstract from heterogeneity entirely or focus on spatial heterogeneity across regions rather than income heterogeneity within regions. The paper fills this gap by modeling how the welfare consequences of temperature change depend on both the region a household lives in and its position in the income distribution.

The model is calibrated to the US using five data sources: NIPA accounts from the BEA (averaged 1997–2020), the 2015 Residential Energy Consumption Survey (RECS), PRISM climate data (1950–2022), a proprietary product-level data set of over 1,000 heaters, air conditioners, and heat pumps scraped from ecomfort.com in fall 2023, and county-level climate projections for the year 2100 under RCP 8.5 from Rasmussen et al. (2016). The US is divided into five regions (cold, cool, mild, warm, and hot) of approximately equal population based on average county temperature. The quantitative exercise compares two stationary equilibria: a contemporary equilibrium using the current temperature distribution and a climate-change equilibrium using the projected 2100 distribution under RCP 8.5 (a no-large-scale-climate-policy scenario). Welfare is measured using the consumption-housing equivalent variation (CHEV), defined as the percent increase in consumption and housing a household would require in every period in the contemporary equilibrium to be indifferent between the two equilibria.

Households adapt to temperature through two channels: an intensive margin (adjusting energy use for heating and cooling given existing equipment) and an extensive margin (deciding whether to purchase a heater, air conditioner, or heat pump, each carrying a fixed cost). The production functions for heating and cooling are estimated by OLS on the product-level data set, yielding equipment exponents of 0.35 (air conditioners), 0.28 (heaters), and 0.27 (heat pumps), and energy exponents of 0.77, 0.86, and 0.85, respectively, with R-squared values of 0.97, 0.79, and 1.00. A key analytical insight from a stylized model is that the outdoor temperature acts as a “transfer from nature” to households — warmer days in cold weather and cooler days in hot weather reduce the energy households must purchase, augmenting real income. Because this transfer is a larger share of income for lower-income households, its changes are distributionally regressive when the transfer falls (hotter regions warming further) and progressive when it rises (colder regions warming).
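
As a rough illustration of how exponents of this kind can be recovered, the sketch below fits a log-linear regression by OLS on synthetic data. It assumes a Cobb-Douglas form, output = A · equipment^a · energy^b, which is an assumption made here for illustration; the summary reports only the estimated exponents, not the exact specification. The data, the scale factor, and the helper name `fit_exponents` are all hypothetical.

```python
import numpy as np

def fit_exponents(equipment, energy, output):
    """OLS of log(output) on a constant, log(equipment), and log(energy)."""
    X = np.column_stack([np.ones_like(equipment), np.log(equipment), np.log(energy)])
    y = np.log(output)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.var() / y.var()
    return {"log_A": coef[0], "equipment_exp": coef[1], "energy_exp": coef[2], "r2": r2}

# Synthetic "products": the true exponents are set to the air-conditioner
# values quoted in the text (0.35 and 0.77); everything else is made up.
rng = np.random.default_rng(0)
n = 1000
equipment = rng.uniform(1.0, 10.0, n)
energy = rng.uniform(1.0, 10.0, n)
noise = rng.normal(0.0, 0.05, n)
output = 2.0 * equipment**0.35 * energy**0.77 * np.exp(noise)

est = fit_exponents(equipment, energy, output)
print(est)
```

On this synthetic sample the estimated exponents land close to the values used to generate the data, which is all the sketch is meant to show.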

The main quantitative findings are as follows. Among middle- and high-income households, climate change generates progressive welfare gains in colder regions — ranging from +0.71 percent of consumption-and-housing for households in the third income decile in the cool region to near-zero for the highest income households — and regressive welfare losses in hotter regions, ranging from −1.85 percent for third-decile households in the warm region to near-zero for high-income households. These patterns are driven by the intensive margin (changes in transfers from nature). For low-income households, the pattern reverses: low-income households in colder regions suffer welfare losses (the dominant effect is that climate change forces them to purchase their first air conditioner), while some low-income households in hotter regions experience welfare gains (they can forgo purchasing a heater). Climate change raises the Gini coefficient on lifetime welfare by 1.02, 1.01, and 0.50 percent in the cold, cool, and mild regions, and reduces it by 0.09 and 0.21 percent in the warm and hot regions. Aggregate welfare effects from the heterogeneous-agent model substantially exceed what a representative-agent model would imply: for example, in the mild region, climate change reduces aggregate welfare by 0.65 percent in the baseline but only 0.17 percent in the representative-agent version.

Policy experiments reveal: (1) Fully offsetting the welfare costs of climate change for the lowest-income households would require government spending on energy assistance to more than double (a factor of 2.2 increase), with the largest increases concentrated in colder regions. (2) A universal heat-pump mandate eliminates the extensive-margin channel, producing monotonically progressive welfare gains in colder regions and monotonically regressive welfare losses in hotter regions across all income deciles. (3) Heat-pump cost parity with heaters substantially increases adoption and moderates welfare costs, but low-income households in the hot region see limited improvement because they still prefer air conditioners. (4) Accounting for temperature effects on the labor productivity of outdoor workers (roughly 8 percent of the workforce, concentrated at lower incomes) amplifies welfare costs in hotter regions and moderates them in colder regions, with magnitudes tied to the share of workers affected.

In depth

Q1. What is the identification strategy, and what are the main threats to it?

The paper is a calibrated structural model rather than an empirical identification exercise. Identification in the sense of parameter estimation comes from two sources: (1) OLS estimation of heating and cooling production functions on cross-sectional product-level data, where manufacturers measure capacity and efficiency under standardized conditions, limiting TFP endogeneity concerns that plague aggregate production function estimation; and (2) internal calibration of remaining parameters to match a set of moments from RECS 2015 and NIPA. Threats to the structural analysis include the assumption that households treat housing and equipment as flow (rental) choices rather than durable stocks, abstracting from switching costs and adjustment costs over the transition — the paper explicitly notes this limits the analysis to long-run stationary equilibria. The small-open-economy assumption for capital removes domestic capital-market clearing as a constraint. The calibration uses 2015 RECS (not 2020) to avoid COVID-19 distortions to cooling budget shares. The paper abstracts from amenity values of outdoor temperature, mortality from temperature exposure (approximately 0.04 percent of US deaths from 1999–2020), and spatial migration responses.

Q2. What are the two core mechanisms and how are they distinguished?

The two mechanisms are the intensive margin (how much energy to use given existing equipment) and the extensive margin (whether to purchase heating or cooling equipment at all). The paper distinguishes them analytically using the simple model, which isolates the intensive margin by assuming all households have equipment. The intuition from the simple model — outdoor temperature as a transfer from nature — explains why welfare effects are progressive in regions where climate change makes temperatures more moderate (transfers rise) and regressive where temperatures become more extreme (transfers fall). The extensive margin is then added in the quantitative model through fixed costs of heater, air conditioner, and heat pump equipment. The paper shows that climate change affects specialization favorability (the degree to which a temperature distribution favors concentrating on only heating or only cooling equipment), and that this extensive-margin channel is most important for lower-income households who are near a corner solution of specializing in only one type of equipment. The heat-pump-mandate counterfactual is used to isolate the intensive-margin channel: when all households use heat pumps in both equilibria, the extensive-margin decision is unchanged by climate change, and all welfare effects are driven purely by transfers from nature.

Q3. What heterogeneity is documented across income groups and regions?

Welfare effects vary dramatically in both sign and magnitude. Among middle- and high-income households, climate change generates progressive welfare gains in colder regions (e.g., +0.71 percent CHEV for third-decile households in the cool region, falling toward zero at the top) and regressive welfare losses in hotter regions (e.g., −1.85 percent CHEV for third-decile households in the warm region, again near-zero at the top). For low-income households, the pattern reverses: they experience welfare losses in colder regions (they are forced to buy their first air conditioner) and welfare gains or smaller losses in hotter regions (they can forgo purchasing a heater). Figure 2 in the paper shows these crossing patterns by income decile for all five regions simultaneously. The Gini coefficient changes by +1.02% (cold), +1.01% (cool), +0.50% (mild), −0.09% (warm), and −0.21% (hot). Migration incentives also differ: high-income households gain incentives to move to cooler regions (driven by transfers from nature), while low-income households gain incentives to move to warmer regions (driven by specialization changes).

Q4. What is the ‘transfers from nature’ concept and why does it produce differential welfare effects?

The paper formalizes the idea that outdoor temperature provides free heating or cooling that substitutes for costly purchased energy. On a cold day with outdoor temperature ζ, nature provides ζ degrees of heating for free, effectively augmenting household income by p_eh * ζ (the value of that heating at market prices). This transfer is identical in absolute terms for all households regardless of income, but it is a larger fraction of income for low-income households, so its loss or gain has greater proportional welfare impact on them. This parallels the progressivity of lump-sum transfers in public finance: losing a dollar matters more when income is lower. Consequently, when climate change moves a region to more moderate temperatures (colder regions), the resulting increase in transfers from nature is progressive — lower-income households gain proportionally more. When climate change moves a region to more extreme temperatures (hotter regions), the decrease in transfers is regressive — lower-income households lose proportionally more. The amenity value of outdoor temperature (distinct from the heating/cooling transfer) is abstracted from in the quantitative model on the grounds that, per the simple model, it does not affect the cross-income distribution of welfare changes if preferences over amenities are uncorrelated with income.
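
The proportionality argument can be made concrete with a few lines of arithmetic. The sketch below computes the value of the transfer, p_eh * ζ, as a share of income for two households. All numbers, and the helper name `transfer_share`, are hypothetical and chosen only to illustrate the mechanism; they are not taken from the paper.

```python
def transfer_share(price_per_degree, degrees_from_nature, income):
    """Value of free heating (p_eh * zeta) as a share of household income."""
    return price_per_degree * degrees_from_nature / income

# Hypothetical numbers, for illustration only.
p_eh = 2.0      # price of one degree of heating (hypothetical units)
zeta = 100.0    # degrees of heating supplied by nature
low_income, high_income = 20_000.0, 200_000.0

low_share = transfer_share(p_eh, zeta, low_income)
high_share = transfer_share(p_eh, zeta, high_income)
print(low_share, high_share)
```

The transfer is worth the same amount to both households, but it is ten times larger relative to income for the lower-income household, so any change in ζ moves that household's real income proportionally more.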

Q5. How does the extensive margin generate the reversal of welfare effects for low-income households?

The extensive margin works through what the paper calls ‘specialization favorability.’ When a temperature distribution is dominated by cold days, households can optimally purchase only heater equipment, avoiding the additional fixed cost of an air conditioner; the reverse holds in hot climates. Climate change reduces the specialization favorability index in colder regions by adding more hot days, and increases it in hotter regions by reducing cold days. The welfare impact of moving between a corner solution (one type of equipment) and an interior solution (two types of equipment, or a heat pump) tends to be larger than moving between two interior solutions. In the cold region, climate change causes the majority of households in the bottom three income deciles to transition from not having air conditioning to having it (Figure 5, left panel). The fixed cost of buying an air conditioner for the first time exceeds the intensive-margin gains from more moderate temperatures, producing net welfare losses. In the hot region, many second-through-fourth decile households move from having heat in the contemporary equilibrium to not having heat in the climate-change equilibrium (Figure 5, right panel), saving the fixed cost and producing net welfare gains despite more extreme temperatures.

Q6. How is the model calibrated and what is the quality of fit?

Externally calibrated parameters include: capital income share α = 0.26 (Kiyotaki et al., 2011), depreciation rate δ = 0.066, interest rate r* = 0.04, CRRA coefficient σ = 2, bliss point temperature ζ* = 18°C, labor productivity process (ρ = 0.97, σ²_ε = 0.02, σ²_ξ = 0.66 from Kaplan, 2012), and production function exponents estimated from the ecomfort.com data. Internally calibrated parameters are jointly chosen to match: wealth-to-output ratio (3.0), housing-to-non-housing capital ratio (0.88), average heating budget share for non-heat-pump households (0.014), average cooling budget share (0.0055), energy budget share for heat-pump households (0.014), fractions of households with heating (0.95), cooling (0.86), and heat pumps (0.09), the ratio of energy budget shares between the fifth and first income quintile (0.12), the ratio of energy expenditures between high and low income (1.72), and energy assistance as a fraction of energy expenditures (0.83). Table 3 shows the model matches all targeted moments closely. External validation (untargeted moments) shows the model also replicates the associations between heating/cooling degree days and budget shares, equipment ownership, and indoor temperature choices, with similar signs and magnitudes to RECS 2015 data. One limitation is that the model overstates heat pump adoption (17% in model vs. 9% in 2015 RECS, though 14% in 2020 RECS), because it treats modern cold-weather-capable heat pumps as the default.

Q7. What do the policy counterfactuals show?

Four policy experiments are analyzed. First, scaling energy assistance proportionally to energy needs under climate change reduces assistance by 24% in cold and 20% in cool regions (where transfers from nature increase) and raises it by 9%, 36%, and 79% in mild, warm, and hot regions. Government spending increases by 25%, but the program remains smaller than 0.02% of output. This scaling partially offsets but does not eliminate the distributional distortions. Fully eliminating welfare costs for the lowest-income households would require multiplying energy assistance spending by a factor of 2.2. Second, a universal heat-pump mandate (analogous to natural gas bans like those in New York and Washington DC, or California’s post-2030 ban on natural gas furnaces) eliminates all extensive-margin effects because all households hold heat pumps in both equilibria. Under this mandate, climate change produces monotonically progressive welfare gains across all income groups in colder regions and monotonically regressive welfare costs in hotter regions. Third, heat-pump cost parity with heaters drives near-universal heat pump adoption and broadly moderates welfare costs relative to baseline, but the lowest-income households in the hot region see limited improvement because they still prefer air conditioners over heat pumps even at cost parity (air conditioners are cheaper and heat pumps’ heating advantage is less valuable in an already-hot, increasingly-hotter climate). Fourth, the labor productivity extension (using the Richardson construction cost database adjustment factor of 1% per degree outside 40°F–85°F) implies that climate change raises low-income productivity by 2% in cold and 0.9% in cool regions and reduces it by 0.1%, 1.1%, and 2.2% in mild, warm, and hot regions. These labor-productivity changes modestly moderate welfare costs in colder regions and amplify them in hotter regions for low-income households.
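
One literal reading of the 1%-per-degree adjustment in the fourth experiment is sketched below. The linear form, the floor at zero, and the function name `outdoor_productivity_factor` are assumptions made here; the summary does not describe how the paper aggregates the adjustment over a region's temperature distribution.

```python
def outdoor_productivity_factor(temp_f, low_f=40.0, high_f=85.0, loss_per_degree=0.01):
    """Productivity relative to the 40-85 °F comfort band (1.0 means no loss)."""
    degrees_outside = max(low_f - temp_f, temp_f - high_f, 0.0)
    # Floor at zero so extreme temperatures cannot imply negative productivity.
    return max(1.0 - loss_per_degree * degrees_outside, 0.0)

print(outdoor_productivity_factor(60.0))  # inside the band
print(outdoor_productivity_factor(95.0))  # 10 degrees above the band
print(outdoor_productivity_factor(30.0))  # 10 degrees below the band
```

Under this reading, warming helps on days that start below 40°F and hurts on days that end above 85°F, which matches the direction of the regional productivity changes reported above.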

Q8. Why does income heterogeneity matter for aggregate welfare calculations?

The paper demonstrates that a representative-agent model substantially underestimates the aggregate welfare cost of climate change in all regions except the hot region. In the cold region, the aggregate CHEV is −1.03% in the baseline but the average (seventh-decile) household experiences small positive welfare effects (+0.19%), and the representative-agent model yields −0.00%. In the mild region, the aggregate is −0.65% but the representative-agent model gives −0.17%. The discrepancy arises because the welfare distribution is skewed: large losses for low-income households in colder regions are not offset by the much smaller welfare changes of high-income households, so the average is dominated by the tails. In the hot region the direction reverses: the baseline aggregate benefit (+0.24%) is driven by large gains at the bottom that the representative-agent model (−0.43%) misses entirely. This finding parallels the broader macroeconomics literature showing that income heterogeneity matters for the aggregate welfare costs of business cycles and inflation and for asset pricing.

Q9. How does this paper relate to and differ from prior work?

The paper sits at the intersection of two literatures. The macro climate-economy literature (Acemoglu et al., 2012; Golosov et al., 2014; Barrage, 2020) typically uses representative-agent models that abstract from heterogeneity. The spatial heterogeneity literature (Cruz and Rossi-Hansberg, 2024; Bilal and Rossi-Hansberg, 2023; Rudik et al., 2022) studies how welfare consequences vary across regions based on their income levels and exposures but not within-region income differences. The within-region inequality literature (Dennig et al., 2015; Kornek et al., 2021; Belfori and Macera, 2022; Douenne et al., 2023) adds heterogeneous fixed income types to integrated assessment models, but does not model endogenous income and wealth distributions. Blanz (2023) is the closest precursor: it uses a standard incomplete-markets model to study food-price effects of climate change in developing countries, but does not model the temperature-equipment-energy production technology. The empirical literature (Hsiang et al., 2017; Park et al., 2018; Doremus et al., 2022) estimates reduced-form relationships between temperature and energy spending by income group, but cannot decompose intensive vs. extensive margin mechanisms or conduct structural policy counterfactuals. The key novel contributions are: (1) endogenous income and wealth heterogeneity within the Bewley-Huggett-Aiyagari tradition, (2) explicit modeling of both margins of temperature adaptation with estimated production functions, and (3) the ability to separately identify the roles of transfers from nature and specialization favorability.

Q10. What robustness checks are conducted?

The paper reports several robustness checks. First, the main calibration uses the housing exponent γ = 0.1, but Appendix Figure D.1 shows results with γ = 0.4 (the upper bound implied by the RECS regression of energy on square footage, before controlling for quality), finding broadly similar qualitative results. Second, the 2015 RECS is used instead of the 2020 RECS due to COVID-19 distortions to cooling budget shares; the paper notes heating budget shares are similar between the two surveys while cooling shares are materially higher in 2020. Third, external validation of the model on untargeted moments (associations between HDD/CDD and heating/cooling budget shares, equipment ownership, and indoor temperatures) confirms the model’s predictive validity. Fourth, the welfare results are computed for both the main five-region model and a representative-agent version, documenting the magnitude of the aggregation bias. Fifth, the labor productivity extension bounds the relevant population (bottom 3% vs. bottom 16% of workers) to bracket the Occupational Requirements Survey estimate of 8% of workers constantly or frequently exposed outdoors.

Q11. What are the scope conditions and limitations of the main results?

Several important scope conditions apply. The analysis focuses exclusively on the direct effects of higher temperatures in the US; it does not cover other forms of climate damage (sea level rise, storm frequency, drought, wildfire) or effects in other countries. The model is solved for stationary equilibria, so it cannot speak to transition dynamics or the welfare costs of adjustment during the period when households are switching equipment. Housing and equipment are modeled as flow (rental) choices, abstracting from switching costs, adjustment frictions, and the interaction between homeownership and equipment decisions. The model abstracts from the amenity value of outdoor temperature (e.g., preference for pleasant weather), temperature-related mortality (about 0.04% of US deaths, 1999–2020, heavily concentrated among the unhoused population outside the model), and behavioral adaptation beyond energy and equipment choices (migration is analyzed only as a partial equilibrium incentive calculation, not as an equilibrium outcome). The capital market operates as a small open economy, so general equilibrium effects on interest rates are absent. Labor productivity effects of temperature are only explored for low-income workers in the outdoor sector, not for higher-income or indoor workers.

Q12. What are the migration findings and their caveats?

The paper shows that climate change increases incentives for high-income households to migrate to cooler regions (driven by the transfers-from-nature channel — cooler regions offer larger increases in transfers) and increases incentives for low-income households to migrate to warmer regions (driven by the specialization channel — warmer regions allow forgoing heater equipment). The magnitude of the change in migratory pressure for high-income households is much smaller (order of magnitude roughly 0.15 on the paper’s scale) than for low-income households (order of magnitude roughly 3 on the same scale). The authors explicitly caveat that this is a partial equilibrium exercise: the model abstracts from the amenity value of temperature (which would reduce pressure to move to warmer regions by reducing the attractiveness of hot destinations) and from other dimensions of climate change (storm risk, fire risk) that would affect migration incentives independently.

Key Concepts

Transfers from nature: In this paper’s framework, outdoor temperature acts as a subsidy equivalent to income: on a cold day, nature provides degrees of heating for free, augmenting household real income by the value of that heating energy; on a hot day, it provides degrees of cooling. The transfer is the same in absolute terms for all households but represents a larger fraction of income for lower-income households, making changes in temperature distributionally progressive (when transfers rise) or regressive (when transfers fall).

Extensive margin of temperature adaptation: The binary decision of whether to purchase temperature-control equipment (a heater, air conditioner, or heat pump), each carrying a fixed cost. Households at the extensive margin may optimally forgo one type of equipment entirely (complete specialization), and climate change can force them to acquire equipment they previously lacked or allow them to drop equipment they previously held.

Intensive margin of temperature adaptation: The continuous decision of how much energy to purchase to operate existing heating and cooling equipment in order to achieve a desired indoor temperature, conditional on having that equipment. Changes in the outdoor temperature distribution affect energy expenditures along this margin for all households that already own equipment.

Specialization favorability index: A region-level index S_n ∈ [0,1] defined as the absolute difference between total degrees of heating need and total degrees of cooling need, divided by their sum. Higher values indicate that the temperature distribution is more dominated by either heating or cooling demand, making it more efficient for households to specialize in a single type of temperature-control equipment rather than purchasing both. Climate change reduces specialization favorability in colder regions and increases it in hotter regions.
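
A minimal sketch of the index follows. It assumes that heating and cooling need are measured in degrees below and above the 18°C bliss point on each day, which is one natural reading of the definition rather than the paper's confirmed construction. The function name `specialization_favorability` and the four-day temperature samples are hypothetical.

```python
def specialization_favorability(daily_temps_c, bliss_point_c=18.0):
    """|heating need - cooling need| / (heating need + cooling need), in [0, 1]."""
    heating = sum(max(bliss_point_c - t, 0.0) for t in daily_temps_c)
    cooling = sum(max(t - bliss_point_c, 0.0) for t in daily_temps_c)
    total = heating + cooling
    if total == 0.0:
        return 0.0  # convention chosen here for a climate always at the bliss point
    return abs(heating - cooling) / total

# Hypothetical temperature samples (°C), each shifted up by 4 degrees to mimic warming.
cold_now = [0.0, 5.0, 10.0, 20.0]
hot_now = [16.0, 25.0, 30.0, 35.0]
cold_future = [t + 4.0 for t in cold_now]
hot_future = [t + 4.0 for t in hot_now]

for name, temps in [("cold now", cold_now), ("cold future", cold_future),
                    ("hot now", hot_now), ("hot future", hot_future)]:
    print(name, round(specialization_favorability(temps), 3))
```

In this toy example uniform warming lowers the index for the cold sample (cooling need appears alongside heating need) and raises it for the hot sample (the last heating days disappear), the same directional pattern described in the definition.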

Consumption-housing equivalent variation (CHEV): The paper’s welfare metric: the percentage by which a household’s consumption and housing would need to increase in every period of the contemporary equilibrium for the household to be indifferent between remaining in the contemporary equilibrium and living in the climate-change equilibrium. Negative CHEV values indicate welfare losses from climate change.
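
For intuition, the sketch below computes a CHEV from two lifetime values under a simplifying assumption that is not confirmed by the summary: that lifetime utility is CRRA with coefficient σ over a bundle that is homogeneous of degree one in consumption and housing, so scaling both by (1 + g) in every period scales the value by (1 + g)^(1 − σ). The function name `chev_percent` and the example values are hypothetical.

```python
def chev_percent(v_contemporary, v_climate, sigma=2.0):
    """Percent scaling of consumption and housing that equates the two lifetime values.

    Assumes V scales as (1 + g) ** (1 - sigma) when consumption and housing are
    scaled by (1 + g); both values must share the same sign.
    """
    if sigma == 1.0:
        raise ValueError("the closed form below does not apply to log utility")
    ratio = v_climate / v_contemporary
    return 100.0 * (ratio ** (1.0 / (1.0 - sigma)) - 1.0)

# Hypothetical lifetime values (negative because sigma = 2 > 1).
print(chev_percent(-10.0, -10.1))  # climate-change equilibrium is worse: negative CHEV
print(chev_percent(-10.0, -9.9))   # climate-change equilibrium is better: positive CHEV
```

The sign convention matches the definition above: a household that is worse off in the climate-change equilibrium has a negative CHEV.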

Temperature damage function D(T): A function mapping the deviation of indoor temperature from the bliss point to the fraction of full utility the household receives from housing services. D equals 1 when indoor temperature equals the bliss point (18°C in calibration) and falls below 1 as indoor temperature deviates in either direction, with the rate of decline governed by parameter χ. This function creates the motive to use energy for heating and cooling.

RCP 8.5: As used in this paper, a climate scenario from the CMIP archive representing emissions in the absence of large-scale climate policy, used to construct the 2100 temperature distribution in the climate-change equilibrium. County-level projections come from Rasmussen et al. (2016), probability-weighted across climate models.

Are Targeted Matching Schemes Effective in Stimulating Retirement Savings?

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Governments in more than ten countries, including Australia, the United States, Germany, and New Zealand, have introduced matching schemes to encourage low- and middle-income earners to contribute voluntarily to private pensions, motivated by the concern that progressive tax systems give these groups weaker incentives to save for retirement than high-income earners. Whether such schemes actually raise retirement savings is theoretically ambiguous: by reducing the cost of contributing they produce a substitution effect favoring more contributions, but the government payment also raises anticipated retirement income, reducing the desire to save further (a retirement income effect). The sign of the net effect depends on the distribution of contributions that would have occurred in the scheme’s absence, and it is especially unclear for those who would already have contributed above the matching ceiling.

This paper tests the full set of theoretical predictions from a two-period intertemporal savings model using Australia’s Superannuation Co-contribution Scheme as a clean natural experiment. The scheme matches personal after-tax superannuation contributions up to $1,000 per year at a single, flat matching rate that varied over time — 100% in 2003-04 and 2009-10 to 2011-12, 150% in 2004-05 to 2008-09, and 50% from 2012-13 onward — and eligibility is phased out smoothly with income (no sharp income discontinuity, unlike the US Saver’s Credit), removing incentives for income manipulation. The maximum co-contribution payment was accordingly $1,000, $1,500, or $500 depending on the period. Estimation uses the ATO Longitudinal Information Files (ALife), a 10% random sample of all registered Australian tax filers linked longitudinally since 1990-91, covering 1,416,622 individual-year observations from 1999-2000 to 2016-17. The authors employ a first-differenced estimator exploiting within-individual variation in eligibility and match rates across years, conditioning on income, income squared, demographic controls, and year fixed effects.
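
The payment rule described above can be sketched as a small function. The linear taper of the eligible maximum between two income thresholds is an assumption made here (the summary says only that eligibility is phased out smoothly), and the threshold values in the example are hypothetical placeholders, not the scheme's actual thresholds.

```python
def co_contribution(personal_contribution, income, match_rate,
                    lower_threshold, upper_threshold, max_eligible=1000.0):
    """Government match on after-tax contributions, with a tapered eligible maximum."""
    if income <= lower_threshold:
        eligible_cap = max_eligible            # fully eligible
    elif income >= upper_threshold:
        eligible_cap = 0.0                     # not eligible
    else:
        # Assumed linear phase-out of the eligible maximum in the tapered zone.
        share_left = (upper_threshold - income) / (upper_threshold - lower_threshold)
        eligible_cap = max_eligible * share_left
    return match_rate * min(max(personal_contribution, 0.0), eligible_cap)

# Hypothetical thresholds, for illustration only.
LOWER, UPPER = 30_000.0, 60_000.0
print(co_contribution(1_000.0, 25_000.0, 1.5, LOWER, UPPER))  # 150% match, full eligibility
print(co_contribution(2_000.0, 25_000.0, 0.5, LOWER, UPPER))  # contributions above $1,000 earn no extra match
print(co_contribution(1_000.0, 45_000.0, 1.0, LOWER, UPPER))  # halfway through the taper
print(co_contribution(1_000.0, 70_000.0, 1.0, LOWER, UPPER))  # above the upper threshold
```

With full eligibility the payment tops out at the match rate times $1,000, which reproduces the $500, $1,000, and $1,500 maxima for the three match-rate periods.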

On the extensive margin, eligibility is associated with statistically significant but small increases in the probability of making any voluntary after-tax contribution: 0.6 percentage points at the 50% match rate, 0.9 percentage points at 100%, and 2.7 percentage points at 150%. Bunching at the salient $1,000 eligible maximum rises monotonically with the match rate: 0.23, 0.84, and 1.4 percentage points, respectively. Below $1,000, the probability of contributing in that range increases by 1.2, 1.6, and 2.7 percentage points — consistent with the substitution effect drawing in non-contributors and low contributors. Above $3,000, however, the probability of contributing falls significantly at all match rates: -0.66 pp (50%), -0.91 pp (100%), and -0.98 pp (150%), consistent with a retirement income windfall effect inducing high contributors to reduce their contributions toward the kink at $1,000.

These opposing forces mean that average personal after-tax contributions (intensive margin) fall under all match-rate regimes: by $24.0 (50%), $24.6 (100%), and $6.49 (150%) per person-year, all significant. The attenuation of the fall at the 150% rate is consistent with substitution effects beginning to overshoot the eligible maximum and partially offsetting the income effect. When the government co-contribution payment itself is included, the combined personal-plus-government contribution rises ($40 at 100%, $126 at 150%), but these gains are partly offset by crowding out of voluntary concessional (salary sacrifice, pre-tax) contributions: eligibility is associated with 1.1 percentage point and 0.8 percentage point reductions in the proportion making voluntary concessional contributions at the 50% and 100% match rates respectively.

Symmetry tests show no evidence of persistent habit formation: increases and decreases in treatment intensity produce contribution changes of roughly equal and opposite magnitudes on the extensive margin (gains +1.3 pp, losses -1.4 pp), ruling out the hypothesis that temporary eligibility establishes lasting savings behavior.

Heterogeneity analysis reveals that the small average response reflects constrained liquidity. The response is largest for partnered females (+2.7 pp on the extensive margin), who have more discretionary income as secondary earners, and for those in the top permanent-income quintile (+3.6 pp), compared with the bottom quintile (+0.4 pp) and second quintile (+0.7 pp). Responses increase with age and with lagged superannuation balance, with those holding balances above $100,000 responding at around 2.5 pp versus only 0.6 pp for those with balances below $25,000. There is no evidence that information is the binding constraint: filers who use a tax consultant respond no more than those who self-file, and survey data document approximately 80% scheme awareness among superannuants.

The paper’s central policy conclusion is that even a simple, transparent, and generous co-contribution scheme fails to meaningfully raise contributions of those it targets. The negative intensive margin arises because the scheme acts as a windfall for existing high contributors rather than newly inducing saving. These findings raise doubts about analogous reforms under discussion for the US Saver’s Credit.

In depth

Q1. What is the identification strategy and what are the key threats to it?

The primary estimator is a first-differenced OLS regression exploiting within-individual, year-on-year changes in co-contribution eligibility and match rates. Because the income thresholds shift over time and individuals’ income fluctuates, the same person can move in and out of eligibility or across match-rate regimes, providing 16 distinct combinations of year-on-year changes in treatment status that identify the three match-rate coefficients. The key identification assumption is that first-differenced treatment indicators are contemporaneously uncorrelated with first-differenced idiosyncratic shocks. The main threat is income endogeneity — treatment is inversely related to income, and unobserved preferences to save may correlate with income. The authors address this by differencing out individual fixed effects and including income and income-squared as controls. They also test whether income manipulation around thresholds is occurring (it is not, unlike the US Saver’s Credit): frequency distributions of income show no bunching at the eligibility thresholds. The only income bunching observed is at the top of the lowest tax bracket (~$37,000), unrelated to scheme thresholds. As a robustness check, the authors also estimate individual fixed-effects models; results are broadly consistent, except for a theoretically inconsistent anomaly on the extensive margin for the 50% rate in the fixed-effects version, which the authors attribute to that model’s stricter exogeneity assumption being more likely violated in a life-cycle context.
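
The logic of first-differencing can be illustrated on a synthetic panel. In the sketch below, an unobserved individual taste for saving is negatively correlated with eligibility, so pooled OLS is biased, while differencing consecutive years removes the fixed effect. The data-generating process, parameter values, and helper name `ols_slope` are hypothetical, and the sketch omits the income, demographic, and year controls that the paper includes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_years, beta = 2000, 6, 0.5   # beta: true effect of eligibility

alpha = rng.normal(0.0, 1.0, n_people)   # unobserved individual taste for saving
# Eligibility is less likely for high-alpha individuals, which biases pooled OLS.
latent = rng.normal(0.0, 1.0, (n_people, n_years)) - alpha[:, None]
eligible = (latent > 0).astype(float)
y = alpha[:, None] + beta * eligible + rng.normal(0.0, 0.5, (n_people, n_years))

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    x, y = x.ravel(), y.ravel()
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

pooled = ols_slope(eligible, y)
# Year-on-year differences within each person: alpha drops out.
first_diff = ols_slope(np.diff(eligible, axis=1), np.diff(y, axis=1))
print(pooled, first_diff)
```

Identification in the differenced regression comes only from people whose eligibility changes between consecutive years, which mirrors the role of the year-on-year changes in treatment status described above.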

Q2. How does the paper decompose income and substitution effects, and what is the empirical test for each?

The paper uses a two-period intertemporal model to show that the scheme creates a kinked budget constraint at the maximum eligible contribution (pmax). Those who would have contributed below pmax in the absence of the scheme face a lower cost of saving (substitution effect) and may increase contributions up to pmax. Those who would have contributed above pmax receive the co-contribution as a pure retirement income windfall, face no substitution incentive (the matching rate applies only below pmax), and respond only via a negative income effect by reducing contributions toward pmax. The empirical decomposition tests these predictions by estimating contribution probabilities in three ranges: contributions up to $1,000 (captures substitution effect), contributions between $1,001 and $3,000 (theoretically ambiguous — outflow from above $3,000 may offset inflow to $1,000), and contributions above $3,000 (captures negative income effect, as this range sits entirely above pmax). In Figure 5, the paper plots cumulative distribution function effects for each match rate across $100 increments from $0 to $10,000, showing negative effects on the CDF below $1,000 (substitution draws people above zero) and positive effects at and above $1,000 (income effect shifts mass below the maximum). The sign pattern is consistent with theory across all three match rates, and is more pronounced at higher match rates.
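The kink described above can be sketched in a few lines. This is a simplified illustration for a fully eligible person with pmax = $1,000; it ignores the taper that lowers pmax for part-eligible incomes, and the function names are hypothetical.

```python
def co_contribution(contribution, match_rate, pmax=1000.0):
    """Government match: paid only on the part of the contribution up to pmax."""
    return match_rate * min(max(contribution, 0.0), pmax)

def marginal_match(contribution, match_rate, pmax=1000.0):
    """Extra match earned on the next dollar contributed (the kink is at pmax)."""
    return match_rate if contribution < pmax else 0.0

# Walk along the budget constraint at a 150% match rate.
for c in (0, 500, 1000, 3000):
    print(c, co_contribution(c, 1.5), marginal_match(c, 1.5))
```

Below pmax each extra dollar attracts the match, which is the substitution incentive. At or above pmax the marginal match is zero and the payment is a fixed lump sum, which is why only an income effect remains for high contributors.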

Q3. What does the paper find about bunching at the $1,000 maximum eligible contribution?

Eligibility is associated with significantly increased probability of contributing exactly $1,000, rising with the match rate: 0.23 pp at 50%, 0.84 pp at 100%, and 1.4 pp at 150%. The alternative specification distinguishing full eligibility (income below lower threshold, pmax = $1,000) from part eligibility (income in the tapered zone, pmax < $1,000) shows that part-eligible individuals also bunch significantly at $1,000 despite being entitled to match payments only for contributions below $1,000. This highlights the salience of the nominal maximum — people in the tapered zone treat $1,000 as the focal contribution amount rather than computing their individual optimal eligible contribution. The ATO online calculator does not report the maximum eligible contribution for part-eligible individuals, which likely reinforces this behavioral pattern.

Q4. What are the crowding-out effects on unmatched (concessional) contributions?

The co-contribution scheme is associated with reductions in the use of voluntary concessional contributions (salary-sacrifice contributions, which are pre-tax and thus ineligible for matching). Using data from 2009-10 to 2016-17 (when salary sacrifice can be separated from compulsory employer contributions), the authors find that eligibility reduces the proportion of people making voluntary concessional contributions by 1.1 pp at the 50% match rate and 0.8 pp at the 100% match rate (both statistically significant). The data do not allow estimation at the 150% match rate because salary sacrifice cannot be separately identified before 2009-10. This crowding out compounds the scheme’s limited impact on total retirement savings: the net addition to retirement income from voluntary contributions is even smaller than the after-tax contribution estimates suggest. The authors attribute the result to the income windfall effect: for those who already made after-tax contributions in the absence of the scheme, the matching payment reduces their need for additional voluntary pre-tax saving.

Q5. Is there evidence of asymmetry in scheme effects — do people who gain eligibility respond differently from those who lose it?

The symmetry test in Equation (6) separates increases in treatment intensity (becoming eligible or moving to a higher match rate) from decreases (losing eligibility or moving to a lower rate). On the extensive margin, the effects are approximately symmetric: gaining intensity raises the contribution rate by 1.3 pp on average, while losing intensity reduces it by 1.4 pp. This rules out the ‘early targeting’ hypothesis that short-term scheme exposure establishes lasting contribution habits that persist after eligibility ends. There is, however, some distributional asymmetry: bunching at $1,000 and the negative income effect above $3,000 are weaker in response to decreases in treatment intensity than to increases, suggesting some stickiness. People whose treatment falls may sustain slightly higher contributions for a period because prior co-contributions made them feel wealthier. Likewise, on the intensive margin, the reduction in average contributions is significant when treatment increases and statistically indistinguishable from zero when treatment decreases. The overall conclusion is that there is no meaningful asymmetry of the kind that would justify life-cycle ‘seeding’ arguments for granting eligibility at young ages and phasing it out later.
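One way to picture the symmetry test is to split each year-on-year change in match rate into a non-negative "increase" part and a non-negative "decrease" part, so that each can carry its own coefficient. The sketch below is an assumption about the general approach, not a reproduction of Equation (6), whose exact form is not given here; `split_change` is a hypothetical helper, and a rate of 0.0 stands for ineligibility.

```python
def split_change(prev_rate, curr_rate):
    """Split a year-on-year change in match rate into (increase, decrease) parts.

    A rate of 0.0 stands for "not eligible". Both parts are non-negative.
    """
    delta = curr_rate - prev_rate
    return max(delta, 0.0), max(-delta, 0.0)

# Reported average extensive-margin responses, in percentage points.
gain_effect_pp = 1.3    # contribution rate rises when intensity increases
loss_effect_pp = -1.4   # contribution rate falls when intensity decreases

# Under exact symmetry the two effects would sum to zero.
asymmetry_pp = gain_effect_pp + loss_effect_pp

print(split_change(0.0, 1.5))   # became eligible at the 150% rate
print(split_change(1.0, 0.5))   # match rate cut from 100% to 50%
print(round(asymmetry_pp, 1))
```

The gap between the two reported responses is about a tenth of a percentage point, which is the sense in which the extensive-margin effects are approximately symmetric.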

Q6. What heterogeneity in responses is documented, and what does it imply about who benefits?

Responses are largest among groups with greater discretionary income relative to their current consumption needs. Partnered females respond at 2.7 pp on the extensive margin (versus 1.2 pp for partnered males, 1.1 pp for single females, and 0.6 pp for single males). The interpretation is that partnered females are more likely to be secondary earners whose income is discretionary, reducing the liquidity cost of forgoing current consumption. The extensive margin response increases monotonically with permanent income quintile: 0.4 pp (bottom), 0.7 pp (2nd), 1.3 pp (3rd), 1.8 pp (4th), and 3.6 pp (top). Those in the top quintile are eligible only when their transitory income is temporarily low, and they appear to have both the liquid assets and the foresight to exploit the scheme. Responses increase with age, consistent with older workers facing lower liquidity constraints and having stronger retirement income motives. Lagged superannuation balance matters: those with balances above $100,000 respond at ~2.5 pp versus ~0.6 pp for those with balances below $25,000, so the scheme does not help low-balance individuals catch up. Importantly, there is no evidence that scheme uptake is constrained by information: tax-agent filers and self-filers respond at similar rates (~1.3 pp vs ~1.9 pp), and external surveys show roughly 80% public awareness. This rules out information provision as a policy lever likely to substantially raise the scheme’s impact.

Q7. How does this study relate to and differ from prior evaluations of the US Saver’s Credit and German Riester schemes?

Prior work on the Saver’s Credit (Duflo et al. 2007, Ramnath 2013, Heim and Lurie 2014) found small or null effects, attributed mainly to the scheme’s complexity — non-refundable tax credit with match rates of 11%, 25%, or 100% depending on income thresholds that create sharp discontinuities and strong income manipulation incentives. The Riester scheme (Corneo et al. 2009, 2010) showed zero effects on total savings, attributed to its complex co-contribution formula where the effective match rate depends on income and number of children, making the true incentive opaque. This paper’s contribution is to evaluate a scheme explicitly designed to avoid those complexities: a single flat match rate, co-contribution paid directly to the pension account, eligibility smoothly phased out with no discontinuities, and near-universal institutional coverage through mandatory superannuation. This design is analogous to the Duflo et al. (2006) H&R Block field experiment (which found 5–11 pp increases in contribution rates for 20–50% match rates), and the paper can be read as asking whether those larger field-experiment effects generalize to a national, ongoing program at comparable design simplicity. The answer is no: the national scheme produces responses an order of magnitude smaller than the field experiment. The paper attributes this partly to the field experiment’s ‘one-time-only’ nature (creating urgency), potential interaction with Saver’s Credit tax refunds, and selection of H&R Block clients. The Australian study also goes beyond prior work by estimating distributional effects (contribution ranges), crowding out of unmatched contributions, and symmetry tests — none of which were examined in the prior national scheme evaluations.

Q8. What are the paper’s policy implications and their scope conditions?

The primary implication is that co-contribution matching schemes, even when simple, generous, and widely known, are likely to produce small effects on retirement savings of low- and middle-income earners. The mechanism is that many in the eligible population already contributed more than the scheme maximum and treat the matching payment as a windfall, reducing personal contributions. The scheme is particularly ineffective for the lowest permanent-income earners, who face binding liquidity constraints and respond least even when they are aware of the scheme. This is directly relevant to proposed US reforms of the Saver’s Credit (the Retirement Security and Savings Act considered by Congress at time of writing) that would convert it to a direct co-contribution more like Australia’s scheme: the paper’s results suggest such simplification may not yield large savings increases. A scope condition concerns institutional context: Australia has near-universal mandatory superannuation with employer contributions at 9.5% of earnings, which may reduce the marginal value of voluntary contributions. The authors acknowledge that responses might be higher in countries without mandatory employer coverage, though the finding that lower-balance individuals respond least weakens this qualification. A second scope condition is that the scheme excludes compulsory employer contributions from the matching base, so the results speak specifically to voluntary behavior. As a topic for future research, the authors identify whether tightening access to public pensions (raising the pension access age) would increase voluntary contributions among low-income earners who currently rely on public pensions as their retirement backstop.

Q9. What robustness checks are conducted?

The authors report four main robustness exercises. First, they estimate an individual fixed-effects model alongside the first-differenced model; results are broadly consistent, with the noted exception of a theoretically inconsistent anomaly at the 50% match rate for the extensive margin in the fixed-effects version, attributed to violation of the strict exogeneity assumption. This validates the first-differenced approach as the preferred specification. Second, they extend the base model to distinguish full eligibility (income at or below the lower threshold, pmax = $1,000) from part eligibility (income in the tapered zone, pmax < $1,000), confirming that even partial eligibility generates bunching at the salient $1,000 level. Third, they examine distributional predictions by estimating the model for 100 incremental contribution thresholds from $0 to $10,000 (Figure 5), verifying that the CDF-effect pattern is consistent with the theoretical predictions across all three match rates. Fourth, information access is tested by interacting scheme response with whether a tax agent was used to lodge the return; the absence of any significant difference between tax-agent filers and self-filers, combined with documented high public awareness, eliminates information deficiency as an explanation for the small response.

Key Concepts

Co-contribution matching scheme: A government program that pays a specified fraction (the matching rate) of the individual’s voluntary personal pension contributions up to a maximum eligible contribution ceiling, credited directly to the individual’s retirement account — as distinct from a tax credit that may not reach the account.

Retirement income effect (windfall effect): The tendency of matching payments to reduce voluntary personal contributions among those who would have contributed above the scheme maximum in the scheme’s absence: because the government contribution supplements their retirement income regardless of their own effort, they rationally reduce personal saving to the eligible maximum.

Substitution effect (in this scheme): The scheme’s reduction in the effective cost of contributing by raising the return to each dollar contributed, inducing those who previously contributed below the eligible maximum to increase contributions toward that maximum.

Bunching at the eligible maximum: Mass concentration of contributions at exactly $1,000 (the scheme’s nominal maximum eligible contribution), drawing both from below (via the substitution effect) and from above (via the income/windfall effect), and reinforced by the salience of the round-number maximum even for part-eligible individuals whose true eligible maximum is below $1,000.

Permanent income (in this context): The predicted value of long-run log total personal income estimated from a Mincer-style regression including individual fixed effects, used to distinguish individuals who are structurally low-income (and face genuine liquidity constraints) from those whose transitory income is temporarily low and who are high-permanent-income individuals exploiting the scheme.

Crowding out of concessional contributions: The reduction in voluntary pre-tax (salary sacrifice) superannuation contributions associated with scheme eligibility, reflecting the income windfall from the matching payment reducing the need for supplementary retirement saving through the pre-tax channel.

Symmetry of scheme effects: The property that the contribution response to gaining eligibility (or a higher match rate) is equal in magnitude and opposite in sign to the response to losing eligibility (or a lower match rate); symmetry implies no lasting habit formation from scheme exposure and rules out ‘early targeting’ strategies aimed at establishing lifetime saving patterns.

Balancing Work and Care: How Workplace Factors Can Mitigate the Gendered Impacts of Caregiving

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper examines how workplace environments shape the economic consequences that fall on mothers — but not fathers — when a child is diagnosed with cancer. The motivation is a gap in the caregiving-and-labor-markets literature: while the earnings penalties from childbirth are well-documented, less is known about caregiving shocks that arrive later in childhood, or about whether and how the firm, occupation, or industry a parent works in moderates those penalties.

The empirical setting is Australia. The authors use the ABS Person Level Integrated Data Asset (PLIDA), a longitudinal administrative database linking tax records (ATO, 2005–2022), Medicare health records, and 2011 Census occupation and hours data. A distinctive feature is matched employer-employee identifiers, enabling construction of workplace characteristics at the firm, occupation, and industry levels. The sample comprises 3,258 families in which a child (age 4–18, average age 12.98) began chemotherapy between 2012 and 2023 and both parents were employed two years before treatment. Pre-diagnosis average earnings are $37,639 for mothers and $79,702 for fathers (CPI-adjusted to 2012).

The identification strategy is a dynamic difference-in-differences (DiD) model following Fadlon and Nielsen (2019, 2021). The treatment group consists of parents whose children started chemotherapy between 2012 and 2017; the control group consists of parents whose children will receive the same diagnosis later, between 2018 and 2023, with placebo treatment assigned six years before actual treatment. Individual fixed effects absorb time-invariant heterogeneity; year fixed effects absorb common trends. Childhood cancer — specifically chemotherapy-requiring cancer — is treated as a largely random shock with no pre-trend in earnings or employment between treated and control families before diagnosis.

Main findings on the average effects: Maternal earnings fall by $5,608 in the year chemotherapy begins (14.9% of baseline earnings). The earnings decline persists for at least three years even as measured caregiving intensity (child healthcare service use) returns to baseline by year 3, leaving earnings approximately 9.7% below baseline in year 3 (−$3,645). The primary mechanism is a reduction in hours worked rather than outright job exit: employment falls by 4.9 percentage points in year 0, peaking at a decline of 5.6 percentage points two years post-treatment, a modest reduction relative to the earnings loss. Job-to-job transitions are not significantly elevated. Mental health service use (therapy, antidepressants, anxiolytics) shows no significant change for either parent, ruling out a mental health channel and reinforcing that caregiver time demands drive the result. Fathers experience no statistically significant change in earnings, employment, or job transitions across all specifications.

Subgroup heterogeneity: The earnings penalty is substantially larger for mothers of younger children (under 12): −$9,443 in year 0, equivalent to 25.8% of that subgroup’s baseline earnings. For children with above-median healthcare utilization, the year-0 penalty is −$7,826 (21.6%).

Workplace moderation — three dimensions are examined at the firm, occupation, and industry levels:

(1) Gender pay gap: Mothers in occupations with below-average gender pay gaps face lower earnings losses ($5,782 vs $8,409; 16.5% vs 18.1%). The effect is significant at the occupation level but not at the firm or industry level.

(2) Work hour intensity: Mothers in firms with below-median weekly hours face a year-0 earnings loss of $3,240 (9.9%) versus $7,159 (15.6%) in high-hours firms, a difference of $3,919 that is significant at the firm level. A parallel gap holds at the occupation level. When both firm and occupation are low-hours, the loss is $2,519; when both are high-hours, it reaches $9,357, nearly four times as large.

(3) Female representation in the top 20% of earners: Mothers at firms where women are the majority of top-20%-earners suffer a penalty of $3,856 (8.3%) versus $7,799 (23.4%) elsewhere — a $3,943 mitigation at the firm level. At the occupation level the corresponding figures are $4,240 (9.2%) versus $8,356 (25.0%). Female representation in middle or bottom earnings tiers carries no significant moderating effect.
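The dollar gaps quoted in items (2) and (3) follow directly from the reported losses. The short check below recomputes them from the figures in this overview; the dictionary keys are descriptive labels chosen for this sketch, not variable names from the paper.

```python
# Year-0 maternal earnings losses in dollars: (more supportive, less supportive).
reported = {
    "firm work hours (low vs high)": (3240, 7159),
    "firm top-20% female majority (yes vs no)": (3856, 7799),
    "occupation top-20% female majority (yes vs no)": (4240, 8356),
}
gaps = {name: high - low for name, (low, high) in reported.items()}

# Combined case: firm and occupation both low-hours vs both high-hours.
both_low_hours, both_high_hours = 2519, 9357
ratio = both_high_hours / both_low_hours

for name, gap in gaps.items():
    print(f"{name}: gap = ${gap:,}")
print(f"both high-hours vs both low-hours: {ratio:.1f}x")
```

The recomputed gaps match the mitigation amounts quoted in the text, and the combined high-hours loss is a little under four times the combined low-hours loss.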

In the combined specification (all firm- and occupation-level variables simultaneously), female representation in the top 20% and work hour intensity remain jointly significant; the gender pay gap loses significance, consistent with these variables being correlated. In the polar comparison between fully supportive jobs (low hours, high female senior representation, low occupation gender pay gap) and fully unsupportive jobs (the opposite on every criterion), the difference is stark: mothers in supportive jobs lose $6,280 in year 0, and their losses are statistically insignificant from year 1 onward, while mothers in unsupportive jobs lose $10,416 in year 0, a loss that widens to $13,882 in year 3 before partially recovering in year 4.

Policy implications (with scope conditions): The results support policies that reduce greedy-work norms and increase female representation in senior roles as instruments for attenuating the gendered economic cost of caregiving shocks. The study does not isolate specific workplace policies (e.g., formal paid leave) but identifies observable correlates of supportive environments. Effects are identified among working parents of children requiring chemotherapy; they do not generalize to cancer not requiring chemotherapy or other types of caregiving shocks without further evidence. Notably, fathers’ outcomes are unresponsive to workplace factors, suggesting that social norms or intra-household bargaining — not workplace barriers per se — are the primary constraints on paternal caregiving adjustment.

In depth

Q1. What is the identification strategy and what are the main threats to it?

The authors use a later-treated dynamic DiD, comparing parents whose children began chemotherapy 2012–2017 (treated) to parents whose children will begin the same treatment 2018–2023 (control), with the control group’s placebo treatment assigned six years before their actual treatment. Individual fixed effects absorb time-invariant heterogeneity; year fixed effects absorb macro shocks. The parallel trends assumption is validated by showing: (1) no statistically significant differences in pre-cancer demographic, socioeconomic, or workplace characteristics between treated and control groups (Figure 1); and (2) no pre-trend in earnings or employment in years -4 and -3 relative to baseline (Table A3, estimates small and insignificant). The main threats acknowledged are (a) non-random selection into workplace types — mothers who anticipate greater caregiving loads may sort into more family-friendly jobs — and (b) differences in baseline wage levels across job types. On (a), the authors argue the direction of selection bias goes the wrong way: if selection were driving results, mothers in supportive workplaces (who selected there due to caregiving preferences) would have weaker labor market attachment and larger post-shock earnings declines; instead the opposite is found. On (b), the authors show that absolute dollar declines in less-supportive workplaces also correspond to larger percentage declines relative to baseline, so the pattern is not an artifact of higher baseline wages in high-hour jobs (though Appendix Table A2 confirms mothers in high-hour and high-senior-female firms do have higher baseline earnings of around $46,000–$50,000 vs $32,000–$33,000).
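The group assignment and placebo timing in this design can be sketched as follows. The cutoff years and the six-year placebo lead come from the description above; the function names are hypothetical, and the sketch covers only the event-time bookkeeping, not the regression itself.

```python
PLACEBO_LEAD = 6  # years by which the control group's placebo precedes actual treatment

def assign_group(chemo_year):
    """Treated if chemotherapy began 2012-2017, control if 2018-2023."""
    if 2012 <= chemo_year <= 2017:
        return "treated"
    if 2018 <= chemo_year <= 2023:
        return "control"
    raise ValueError("chemotherapy year outside the 2012-2023 sample window")

def reference_year(chemo_year):
    """Year that counts as event time 0: actual start for treated, placebo for control."""
    if assign_group(chemo_year) == "treated":
        return chemo_year
    return chemo_year - PLACEBO_LEAD

def event_time(calendar_year, chemo_year):
    """Calendar year expressed relative to the (actual or placebo) reference year."""
    return calendar_year - reference_year(chemo_year)

print(assign_group(2014), event_time(2016, 2014))  # treated family
print(assign_group(2020), event_time(2016, 2020))  # control family, placebo year 2014
```

With a six-year lead, control families' placebo years span 2012 to 2017, the same calendar range as actual treatment in the treated group, and a control child has not yet begun chemotherapy at any event time up to year 4.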

Q2. How is the caregiving shock defined and what does this imply for external validity?

The shock is defined as initiation of chemotherapy by the child, identified from Medicare prescription records using ATC codes beginning with L01 (excluding methotrexate L01BA01) and adding immunomodulators with chemotherapy-like effects. Chemotherapy initiation is treated as a reliable, time-consistent marker because it typically follows immediately from diagnosis of cancers such as acute lymphoid leukemia, astrocytoma, and neuroblastoma. The authors note explicitly that estimates do not represent the effects of childhood cancer not requiring chemotherapy (e.g., early-stage cancers treated with surgery, radiation, or immunotherapy alone). This restriction to chemotherapy-requiring cancers likely selects a sample with above-average caregiving intensity.
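A minimal sketch of the code-based shock definition is given below. The L01 prefix rule and the methotrexate exclusion come from the description above; the list of added immunomodulators is not given here, so `extra_codes` is a hypothetical hook left empty by default, and the sample records are made up.

```python
EXCLUDED = {"L01BA01"}  # methotrexate, excluded per the description above

def is_chemo_code(atc_code, extra_codes=frozenset()):
    """True if an ATC code counts as chemotherapy under the stated rule.

    extra_codes is a hypothetical hook for the immunomodulators the authors
    add; they are not listed in this summary, so it is empty by default.
    """
    code = atc_code.strip().upper()
    if code in EXCLUDED:
        return False
    return code.startswith("L01") or code in extra_codes

def first_chemo_year(prescriptions, extra_codes=frozenset()):
    """Earliest year with a qualifying prescription, or None.

    prescriptions: iterable of (year, atc_code) pairs for one child.
    """
    years = [year for year, code in prescriptions if is_chemo_code(code, extra_codes)]
    return min(years) if years else None

# Made-up records: the 2013 methotrexate script and the non-L01 code are ignored.
records = [(2013, "L01BA01"), (2015, "L01XA01"), (2014, "N02BE01"), (2016, "L01AA01")]
print(first_chemo_year(records))
```

Taking the earliest qualifying prescription as the event date mirrors the idea that chemotherapy initiation is the time-consistent marker of the shock.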

Q3. What is the main mechanism through which the earnings decline operates?

The primary mechanism is a reduction in hours worked rather than outright job exit. The employment decline (approximately 4.5–5.0 percentage points in years 0–2 per Table A3) is modest relative to the earnings loss of $5,608. A back-of-envelope calculation in footnote 6 shows that if 5% of mothers left the labor market at average earnings, the implied earnings drop would be only $1,882, far below the observed $5,608. Job-to-job transitions (probability of switching employer) are not significantly elevated. Mental health service use (psychological therapy, antidepressant/anxiolytic/antipsychotic prescriptions) shows no significant change for either parent (Appendix Figure A4), ruling out mental health deterioration as a channel. The persistence of earnings losses beyond the period of peak healthcare service use (which returns to baseline by year 3, per Appendix Figure A2) is consistent with stalled career trajectories (forgone promotions or skill development) or with continued but less-measured caregiving demands.
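The back-of-envelope comparison can be reproduced from the figures quoted in this summary. The sketch assumes, as the footnote does, that exiting mothers had average earnings; the variable names are illustrative.

```python
baseline_earnings = 37_639   # mothers' pre-diagnosis average earnings
observed_drop = 5_608        # year-0 decline in average maternal earnings
exit_share = 0.05            # share of mothers assumed to leave work entirely

# If exits were the only channel, average earnings would fall by the exiting
# share times average earnings.
implied_drop = exit_share * baseline_earnings
explained_share = implied_drop / observed_drop

print(f"implied drop from exits alone: ${implied_drop:,.0f}")
print(f"share of observed drop explained: {explained_share:.0%}")
print(f"observed drop as share of baseline: {observed_drop / baseline_earnings:.1%}")
```

On these numbers, exits alone would account for roughly a third of the observed decline, which is why the remainder is attributed to reduced hours among mothers who stay employed.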

Q4. At which organizational level (firm, occupation, or industry) do workplace moderators operate most strongly?

Firm and occupation levels are the dominant levels; industry-level measures are consistently insignificant for all three moderating variables. The authors interpret this as follows: industry-level measures are too broad to capture the specific work arrangements and norms that affect caregiving balance. At the occupation level, structural characteristics — profession-wide agreements, flexibility of task-based roles, part-time feasibility — directly govern how feasible it is to reduce hours without exiting employment. At the firm level, immediate workplace culture and specific HR policies apply. The relative contribution of firm vs occupation varies by the moderator: work hour intensity effects are significant at both firm and occupation levels, female senior representation is significant at both, while the gender pay gap effect is significant only at the occupation level.

Q5. Why does female representation in senior roles (top 20% of earners) mitigate the earnings penalty while middle and bottom tier representation does not?

The authors argue that women in the top-20% of earners — effectively leadership positions — are better positioned to advocate for and implement caregiving-supportive policies (paid leave, flexible scheduling). Representation in lower tiers may be indicative of a caregiving-friendly workforce composition but lacks the organizational power to shape policies. This is supported empirically: the moderating interaction is significant and economically large for top-20% female representation at both the firm (mitigating the penalty by $3,943) and occupation levels (mitigating by $4,116), while interactions for the middle 50–80% and bottom 50% earnings tiers are not statistically significant in most specifications.

Q6. Why does the occupational gender pay gap matter for the earnings penalty but not the firm-level or industry-level gap?

The authors offer two explanations. First, occupations define the day-to-day nature of work — task structure, required hours, flexibility — in ways that make caregiving more or less compatible. Occupations that accommodate part-time and flexible scheduling tend to attract more women and develop norms that support caregiving, which in turn narrows occupational gender pay gaps. At the firm level, the same firm often contains diverse occupations with heterogeneous norms, so firm-level gender pay gap is a noisier signal. At the industry level, the measure is too aggregated. Second, narrow occupational gender pay gaps may reflect the collective bargaining power of women in female-dominated occupations (e.g., nursing), which translates into formal caregiving protections. A firm or industry may exhibit a wide gender pay gap due to male dominance in senior or high-earning roles even when specific female-dominated occupations within that firm/industry have caregiving-friendly norms. However, in the combined specification including all workplace factors simultaneously, the gender pay gap variable loses statistical significance, suggesting its initial effect was partly mediated by correlated factors (hours intensity and female senior representation).

Q7. How does the combined ‘supportive vs unsupportive’ comparison work and what does it show?

Supportive jobs are defined as those satisfying all three criteria: low work hour intensity at both firm and occupation levels, high female representation in the top 20% of earners at both firm and occupation levels, and low gender pay gap at the occupation level (N = 2,708 mother-years). Unsupportive jobs are the opposite on all criteria (N = 2,339). Event study estimates (Table A9, Figure 3) show stark divergence. In supportive jobs, the year-0 penalty is −$6,280, and earnings recover quickly to statistically insignificant levels in years 1–4. In unsupportive jobs, the year-0 penalty is −$10,416 and widens to −$10,658 in year 2 and −$13,882 in year 3, before partially recovering in year 4. Pre-treatment estimates are not significantly different from zero in either subsample, supporting parallel trends within each group.

Q8. What heterogeneity is documented by child and family characteristics?

Appendix Figure A3 presents two subgroup analyses. Mothers of children under age 12 at diagnosis experience a year-0 earnings loss of −$9,443 (25.8% of baseline earnings of $36,567), substantially larger than the average. Mothers of children with above-median healthcare utilization (measured by number of medical appointments in the year following treatment initiation) experience a year-0 loss of −$7,826 (21.6% of baseline earnings of $36,278). These patterns are consistent with the interpretation that caregiving intensity — driven by child age and treatment severity — scales the maternal earnings penalty.

Q9. What robustness checks are conducted?

The paper’s main robustness arguments are: (1) pre-trend validation (Figures 1 and 2, Table A3) confirming no anticipatory effects and balanced pre-characteristics; (2) the selection-direction argument for workplace heterogeneity, under which selection would predict larger penalties in supportive workplaces, whereas the opposite is found; (3) evidence that absolute earnings declines in less-supportive workplaces also represent larger proportional declines relative to baseline, ruling out a level-effect interpretation; (4) the mental health non-result (Appendix Figure A4) confirming earnings effects are not confounded by parental mental health deterioration; (5) a separate combined specification (Table A8) testing all workplace moderators simultaneously to address multicollinearity. The paper does not report explicit placebo tests using alternative shocks or falsification samples, nor does it report results restricted to narrow geographic areas or specific cancer types.

Q10. How does this paper relate to prior literature on caregiving shocks?

The paper builds most directly on three prior studies using Nordic or European administrative data: Eriksen et al. (2021, Journal of Health Economics) on childhood health shocks and parental labor supply; Breivik and Costa-Ramon (2024, Review of Economics and Statistics) on children’s health shocks and parental earnings and mental health; and Vaalavuo et al. (2023, Demography) on gender inequality from child health shocks on parental trajectories. All three find significant maternal earnings or employment losses and no or small paternal effects. The present paper’s contribution relative to these is the explicit examination of how firm-, occupation-, and industry-level workplace characteristics moderate the maternal penalty — a dimension the prior literature has not addressed. It also connects to Fadlon and Nielsen (2019, 2021) on the methodology and to the broader child-penalty literature reviewed by Cortes and Pan (2023, Journal of Economic Literature). On workplace mechanisms it connects to Goldin (2014) on ‘greedy jobs’ and Goldin and Katz (2016) on pharmacy as a family-friendly profession.

Q11. What are the policy implications and their scope conditions?

The findings suggest that maternal earnings losses from caregiving shocks can be substantially mitigated by workplace environments characterized by lower work hour intensity and higher female representation in senior earnings tiers. This points to policies promoting: (1) weaker greedy-work norms, by discouraging long-hours cultures and enabling part-time flexibility without disproportionate wage penalties; (2) greater female representation in leadership and high-earning positions, which appears to create cultural and policy environments more accommodating of caregiving. Scope conditions: the results apply to working mothers (and fathers) of children requiring chemotherapy in Australia, where Medicare provides universal healthcare coverage and social insurance is already in place. The paper explicitly does not identify specific causal mechanisms (e.g., it cannot isolate the effect of formal paid leave from culture). On fathers, the implication is that workplace factors alone are unlikely to induce fathers to increase caregiving, pointing instead to the need to shift social norms around paternal caregiving and intra-household bargaining.

Q12. How do the Australian institutional context and data compare to European studies?

Australia’s PLIDA dataset is exceptional in combining population-level coverage, employer-employee identifiers (enabling firm-level workplace measures), and Medicare healthcare records (enabling both shock identification via chemotherapy and caregiving-intensity proxying via healthcare utilization). The employer identifiers are critical for this paper’s contribution, since most comparable European studies cannot construct firm-level workplace characteristics. The Australian context differs from Nordic studies in terms of family policy generosity (less universal paid parental leave), but Medicare provides universal healthcare access. Pre-diagnosis earnings ($37,639 for mothers vs $79,702 for fathers) indicate a large pre-existing earnings gap, consistent with a predominantly male-breadwinner household structure in the sample.

Q13. Do fathers’ outcomes respond to any workplace factor?

In almost all specifications, fathers’ earnings, employment, and job changes show no statistically significant effects of the caregiving shock and no significant interactions with workplace characteristics (Appendix Tables A4 and A6). One exception: in Table A4, the interaction between the cancer shock and working at a firm with above-median work hours is negative and significant at the 5% level for fathers, suggesting that fathers who work in high-hours firms do experience some earnings reduction — consistent with them reducing hours in an environment that penalizes deviations from long hours. However, the authors note the effect is substantially smaller relative to baseline earnings than the corresponding maternal effect. The broader pattern implies that workplace flexibility does not appear to be the binding constraint preventing fathers from taking on more caregiving; social norms and intra-household bargaining are posited as more important.

Q14. What are the data limitations and caveats?

First, work hours at the firm and occupation levels are constructed from the 2011 Census, which is a single cross-section; work hour norms may have shifted between 2011 and the 2012–2023 sample period. Occupation and industry codes also come from the 2011 Census, so parents who changed occupation between 2011 and their baseline year may be misclassified. Second, employment status is inferred from positive ATO earnings in a financial year, a coarser measure than actual employment spells. Third, the sample is restricted to firms with at least 10 employees, which excludes small-firm workers. Fourth, the analysis uses dollar earnings levels, not log earnings, which means baseline wage differences across workplace types can affect the interpretation of absolute dollar results (though the authors show percentage effects are also larger in less-supportive workplaces). Fifth, the study identifies workplace correlates of smaller penalties but does not isolate the causal effect of any specific policy. Sixth, the paper covers only cancer requiring chemotherapy — typically more intensive cancers — so results may overstate average caregiving-shock effects.

Key Concepts

Caregiving shock: In this paper, a sudden, largely unanticipated increase in caregiving demands on parents triggered by a child’s initiation of chemotherapy. Distinguished from the chronic caregiving burden of childbirth; specifically refers to health events that arrive later in childhood and impose large, time-intensive care requirements.

Later-treated dynamic DiD: The paper’s identification design, following Fadlon and Nielsen (2019, 2021), in which the control group consists of parents who will receive the same treatment (child’s cancer diagnosis) at a later date. The control group’s placebo treatment year is set six years before their actual treatment, enabling estimation of time-path effects relative to diagnosis while accounting for pre-existing differences via individual fixed effects.
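
As a rough illustration of the later-treated design described above, the sketch below assigns event time to treated and control parents. The six-year placebo offset comes from the text; the function name, argument names, and example years are hypothetical, and the paper's actual sample construction may differ.

```python
def event_time(calendar_year, diagnosis_year, is_control, placebo_offset=6):
    """Years relative to the (actual or placebo) treatment year.

    Treated parents are centered on the child's actual diagnosis year.
    Later-treated controls are centered on a placebo year set
    `placebo_offset` years before their own actual diagnosis.
    """
    if is_control:
        reference_year = diagnosis_year - placebo_offset
    else:
        reference_year = diagnosis_year
    return calendar_year - reference_year


# Hypothetical example: a treated parent whose child is diagnosed in 2015 and
# a control parent whose child is diagnosed in 2021 share event time 0 in 2015.
treated_t = event_time(2015, diagnosis_year=2015, is_control=False)
control_t = event_time(2015, diagnosis_year=2021, is_control=True)
```

Aligning both groups on a common event clock is what lets the design compare earnings paths at the same distance from (actual or placebo) diagnosis.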

Work hour intensity: Median weekly hours worked by employees at a given firm or in a given occupation (from the 2011 Census), used as a proxy for ‘greedy job’ characteristics — workplaces that reward continuous long-hours presence and penalize deviations. High work hour intensity captures both above-full-time norms and the likely presence of evening and weekend work requirements.

Female representation in the top 20% of earners: A binary indicator equal to one when women are the majority (above 50%) of workers in the top quintile of earnings at a given firm or occupation. The paper distinguishes this from female representation in middle and lower earnings tiers to isolate the effect of women’s presence in positions with organizational power to influence workplace policies.

Supportive job: As defined operationally in this paper: a job in which the worker’s firm and occupation both have below-median work hour intensity, both have majority female representation in the top 20% of earners, and the occupation has a below-average gender pay gap. Mothers in supportive jobs suffer smaller and shorter-lived earnings penalties following a caregiving shock.
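
The operational definition above can be read as a conjunction of five conditions. The sketch below encodes it that way; the class, field names, and example values are hypothetical, and the "above 50%" threshold is taken from the definition of female representation in the top 20% of earners given earlier.

```python
from dataclasses import dataclass


@dataclass
class Workplace:
    # All field names are illustrative, not the paper's variable names.
    firm_hours_below_median: bool
    occ_hours_below_median: bool
    firm_top20_female_share: float   # share of women among top-quintile earners
    occ_top20_female_share: float
    occ_pay_gap_below_average: bool


def majority_female_top20(share):
    """Binary indicator: women are more than 50% of top-quintile earners."""
    return share > 0.5


def is_supportive(w):
    """True only when every condition in the definition holds at once."""
    return (
        w.firm_hours_below_median
        and w.occ_hours_below_median
        and majority_female_top20(w.firm_top20_female_share)
        and majority_female_top20(w.occ_top20_female_share)
        and w.occ_pay_gap_below_average
    )


# Hypothetical example workplaces.
supportive = Workplace(True, True, 0.62, 0.55, True)
not_supportive = Workplace(True, True, 0.62, 0.45, True)
```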

Greedy occupation: Borrowed from Goldin (2014), and used in this paper to describe occupations that disproportionately reward workers who supply long, often inflexible, hours. In the paper’s empirical framework, these are occupations with above-median work hour intensity, which are shown to amplify maternal earnings losses after a caregiving shock.

Caregiving intensity: The time-varying burden of care associated with a child’s illness, proxied in this paper by the volume of child healthcare service utilization (Medicare items: GP visits, specialist consultations, diagnostic imaging, prescriptions). Caregiving intensity peaks at year 0 (treatment initiation), declines significantly by year 2, and returns to baseline by year 3 — yet maternal earnings penalties persist beyond this return to baseline.

Business Cycle during Structural Change: Arthur Lewis' Theory from a Neoclassical Perspective

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks why the nature of business cycles changes systematically as economies develop and shed their large agricultural sectors. The motivation is both empirical and theoretical. Empirically, countries with large declining agricultural sectors—most prominently China—exhibit business cycle patterns that depart sharply from the textbook procyclical-employment pattern seen in mature economies: aggregate employment is acyclical with respect to GDP, nonagricultural employment is strongly procyclical, agricultural employment is countercyclical, and the labor productivity gap between nonagriculture and agriculture narrows during booms. These cross-country regularities hold in a sample of 63–66 countries using ILO sectoral employment data over 1970–2015, with the correlation between aggregate employment and GDP declining monotonically as the agricultural employment share rises. The cross-country correlation between the agricultural employment share and log GDP per capita is −0.84. For China specifically over 1978–2012, the correlation between HP-filtered agricultural employment and GDP is −0.69, while the correlation for nonagricultural employment with GDP is 0.73. Agricultural employment fell from about 62.4% of total Chinese employment in 1985 to 33.6% in 2012.

The authors construct a unified neoclassical model of growth, structural change, and business cycles. The economy produces a CES aggregate of agricultural and nonagricultural output (elasticity of substitution epsilon), with agriculture itself being a CES aggregate of modern and traditional sub-sectors (elasticity omega). Modern agriculture uses capital and labor (Cobb-Douglas), whereas traditional agriculture uses only labor. This nested structure means the effective elasticity of substitution between capital and labor in agriculture is variable and declines as the traditional sector shrinks—formalizing the Lewisian surplus-labor mechanism within a neoclassical framework. A time-invariant tax wedge tau on nonagricultural wages captures rural-urban earnings gaps and keeps agriculture inefficiently large.
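
The nested structure can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the paper's exact specification: the standard CES functional form, the Cobb-Douglas form for nonagriculture, the linear-in-labor traditional technology, and every parameter value other than epsilon = 3.6 (the estimate reported for China in this summary) are assumptions made for the example.

```python
def ces(x1, x2, weight, elasticity):
    """Standard two-input CES aggregate; requires elasticity > 0 and != 1."""
    rho = (elasticity - 1.0) / elasticity
    return (weight * x1 ** rho + (1.0 - weight) * x2 ** rho) ** (1.0 / rho)


def final_output(k_m, h_m, k_am, h_am, h_s,
                 epsilon=3.6,            # agriculture vs. nonagriculture
                 omega=5.0,              # modern vs. traditional (hypothetical)
                 alpha=0.5, beta=0.3,    # hypothetical capital shares
                 gamma=0.4, zeta=0.5,    # hypothetical CES weights
                 z_m=1.0, z_am=1.0, z_s=1.0):
    y_m = z_m * k_m ** alpha * h_m ** (1.0 - alpha)     # nonagriculture (assumed Cobb-Douglas)
    y_am = z_am * k_am ** beta * h_am ** (1.0 - beta)   # modern agriculture
    y_s = z_s * h_s                                     # traditional agriculture: labor only
    y_g = ces(y_am, y_s, zeta, omega)                   # agricultural composite
    return ces(y_g, y_m, gamma, epsilon)                # final good


base = final_output(2.0, 1.0, 1.0, 1.0, 1.0)
doubled = final_output(4.0, 2.0, 2.0, 2.0, 2.0)   # all inputs doubled
```

Because every layer has constant returns to scale, doubling all inputs doubles output in this sketch.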

The deterministic model is estimated using Simulated Method of Moments on Chinese data from 1985 to 2012, targeting seven moment sequences: employment share in agriculture, capital share in agriculture, agricultural output-to-GDP ratio, agricultural expenditure share, aggregate GDP growth, the aggregate capital-output ratio path, and the change in the productivity gap. Key findings from estimation: the elasticity of substitution between agricultural and nonagricultural goods epsilon is estimated at 3.6 (significantly greater than 1 at the 1% level), and the elasticity between modern and traditional agriculture omega is also very large. The estimated subsistence level in a Stone-Geary extension is small (11% of agricultural production in 1985), so nonhomothetic preferences play only a minor quantitative role. Nonagricultural TFP growth gM is estimated at 6.5% per year; modern-agricultural TFP growth gAM at 6.1% per year; traditional-sector TFP growth gS at 0.9% per year. The estimated labor wedge tau implies persistent misallocation.

Stochastic TFP shocks (VAR(1) for each of the three sectors) are then estimated from observed data by exploiting the model’s equilibrium conditions. The persistence parameters are 0.63 (nonagriculture), 0.90 (modern agriculture), and 0.42 (traditional agriculture). The model, simulated 1,000 times starting in 1980, reproduces the salient Chinese business cycle features: the standard deviation of GDP is 1.7% (matching the data), agricultural employment is countercyclical (model correlation with GDP: −0.25; data: −0.23), nonagricultural employment is strongly procyclical (model: 0.99; data: 0.73), and aggregate employment has a low correlation with GDP (model: 0.42; data: 0.10). A variance decomposition shows nonagricultural TFP shocks account for approximately 95% of GDP fluctuations.
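
To give a feel for what the persistence parameters mean, the sketch below converts each into a half-life. It assumes each shock behaves like a univariate AR(1) at annual frequency and ignores any cross-sector terms in the estimated process, so it is only a back-of-the-envelope reading of the reported numbers.

```python
import math


def half_life(rho):
    """Periods until an AR(1) deviation decays to half its initial size."""
    return math.log(0.5) / math.log(rho)


# Persistence values as reported in the text.
persistence = {
    "nonagriculture": 0.63,
    "modern agriculture": 0.90,
    "traditional agriculture": 0.42,
}
half_lives = {sector: half_life(rho) for sector, rho in persistence.items()}
```

Under these assumptions the half-lives are roughly 1.5, 6.6, and 0.8 years, so modern-agricultural shocks are by far the most long-lived of the three.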

The key mechanism is that a large traditional sector provides an elastic labor supply to nonagriculture at low marginal cost (a neoclassical Lewisian buffer). Positive TFP shocks to nonagriculture draw labor out of traditional agriculture, raising average capital intensity and labor productivity in agriculture—hence the countercyclical productivity gap. As structural change progresses and the traditional sector shrinks, this labor buffer disappears, the effective labor supply elasticity declines, and business cycle properties converge toward those of a standard neoclassical (Hansen-Prescott) economy. Out-of-sample simulations confirm this convergence: the correlation between total employment and GDP rises from around 40% to near 100% as the agricultural employment share falls below 10%. The paper also shows that positive TFP shocks in agriculture slow structural change, consistent with empirical evidence from the Green Revolution (Foster and Rosenzweig 2004; Bustos et al. 2016; Moscona 2018; Jayachandran 2006).

Elasticity estimates using CES production functions for the US, Japan, and China from consumption value-added data yield epsilon of 2.49, 1.58, and 1.70 respectively, all significantly above unity at the 1% level—supporting the labor-pull interpretation of structural change. The authors find that imposing the symmetry restriction (epsilon = epsilon_ms) used by Herrendorf et al. (2013) replicates their near-zero estimate for the US, but relaxing that restriction reveals the agriculture-nonagriculture elasticity to be large while the manufacturing-services elasticity is near zero.

In depth

Q1. What are the four key business cycle stylized facts documented for countries with large agricultural sectors?

The paper documents four regularities that hold across 63–66 countries (ILO data, 1970–2015): (1) aggregate employment is less correlated with GDP and less volatile than in mature economies; (2) agricultural employment is countercyclical; (3) the labor productivity gap (nonagriculture/agriculture) is negatively correlated with nonagricultural employment; (4) consumption is highly volatile relative to GDP. All four are quantitatively documented for China and compared with the US.

Q2. What is the core theoretical mechanism distinguishing this paper from earlier structural-change models?

The paper adds an internal split of the agricultural sector into modern (capital-using Cobb-Douglas) and traditional (labor-only) sub-sectors that are imperfect substitutes. This nested structure generates a variable effective elasticity of labor supply to nonagriculture: when the traditional sector is large, labor can be released to industry at near-constant marginal cost (a continuous Lewisian surplus), dampening wage and price fluctuations and decoupling aggregate employment from GDP. As the traditional sector shrinks through capital accumulation and differential TFP growth, the effective labor-supply elasticity falls, progressively transforming the economy into a standard neoclassical one.

Q3. How does the paper handle the lack of a steady state for the business cycle analysis?

Because structural change is ongoing in China, approximating the model around a balanced growth path is infeasible. The authors instead solve the model recursively over 250 periods back from an assumed one-sector asymptotic balanced growth path (ABGP), using a 27-state Tauchen Markov chain for the three TFP shocks and piecewise linear decision rules on a 75-point grid for each of the two continuous state variables (kappa and kappa-tilde). They simulate 1,000 economies and compute rolling 28-year window statistics, which are then compared to the data.
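
A 27-state chain is what one would get from three grid points per shock (3 × 3 × 3 = 27), although the summary does not state the grid size per shock. The sketch below applies Tauchen's method under that assumption. It further assumes independent shocks and hypothetical innovation standard deviations; the paper's benchmark appears to allow cross-sector correlation, which this simplified version does not capture.

```python
import math
from itertools import product


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tauchen(rho, sigma, n=3, m=2.0):
    """Discretize z' = rho * z + e, e ~ N(0, sigma^2), on an n-point grid
    spanning +/- m unconditional standard deviations."""
    std_z = sigma / math.sqrt(1.0 - rho ** 2)
    step = 2.0 * m * std_z / (n - 1)
    grid = [-m * std_z + i * step for i in range(n)]
    trans = []
    for z in grid:
        row = []
        for j, z_next in enumerate(grid):
            upper = norm_cdf((z_next - rho * z + step / 2.0) / sigma)
            lower = norm_cdf((z_next - rho * z - step / 2.0) / sigma)
            if j == 0:
                row.append(upper)           # mass below the first midpoint
            elif j == n - 1:
                row.append(1.0 - lower)     # mass above the last midpoint
            else:
                row.append(upper - lower)
        trans.append(row)
    return grid, trans


# Persistence values from the text; sigma = 0.02 is a placeholder.
chains = [tauchen(0.63, 0.02), tauchen(0.90, 0.02), tauchen(0.42, 0.02)]
joint_states = list(product(range(3), repeat=3))   # 27 joint states


def joint_prob(s, s_next):
    """Joint transition probability, assuming independent shocks."""
    return math.prod(chains[i][1][s[i]][s_next[i]] for i in range(3))


row_from_first = [joint_prob(joint_states[0], s) for s in joint_states]
```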

Q4. What is the identification strategy for the elasticity of substitution epsilon, and what are the main threats?

The primary strategy is Simulated Method of Moments on 143 moment conditions from Chinese data 1985–2012 (28 annual observations each for five moment series plus two level/change moments). A second strategy uses IFGNLS estimation of a Stone-Geary demand system for three countries (US, Japan, China) using both consumption value-added (Herrendorf et al. method) and production value-added (GGDC data). The main threats acknowledged: (a) endogeneity—both sides of the demand equations are driven by unobserved productivity and preference shocks with opposite sign implications (addressed by turning to exogenous Green Revolution shocks); (b) measurement error; (c) the symmetry restriction in prior work; (d) the model is closed-economy and abstracts from demand shocks.

Q5. What role do agricultural TFP shocks versus nonagricultural TFP shocks play in GDP fluctuations?

A variance decomposition shows nonagricultural TFP shocks (ZM) account for approximately 95% of GDP fluctuations in the benchmark economy over 1985–2012. The logic is that positive TFP shocks to ZM reduce misallocation by drawing labor from the (inefficiently large) agricultural sector to nonagriculture, amplifying the GDP response. In contrast, positive TFP shocks to agriculture partially offset the direct productivity gain by worsening misallocation (labor stays in agriculture), so GDP barely responds. In the low-elasticity (epsilon = 0.5) alternative model, agricultural TFP shocks account for about half of GDP fluctuations—one reason the authors reject this alternative.

Q6. How does the model’s prediction for business cycle evolution as structural change progresses compare to cross-country evidence?

Using rolling 28-year windows of simulated data from 1985 to 2185, the paper documents four monotone transitions as the agricultural employment share falls: (a) the correlation between agricultural employment and the productivity gap falls toward zero; (b) the correlation between agricultural and nonagricultural employment rises from large and negative (around −0.75 for China’s current employment share of 40–50%) toward zero; (c) the correlation between total employment and GDP rises from about 40% to nearly 100%; (d) the volatility of employment relative to GDP rises toward the level of mature economies. All four patterns match the cross-country empirical patterns documented in Figure 5.
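
The rolling-window statistics can be sketched as below. The 28-period window follows the text; the helper names and the toy series are hypothetical, and the sketch skips the HP-filtering step that the paper applies to the data before computing correlations.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rolling_corr(x, y, window=28):
    """Correlation of x and y over each consecutive window of given length."""
    return [
        pearson(x[s:s + window], y[s:s + window])
        for s in range(len(x) - window + 1)
    ]


# Toy 40-period series: "gdp" is an exact linear function of "employment",
# so every window should show a correlation of one.
employment = [math.sin(0.5 * t) for t in range(40)]
gdp = [2.0 * v + 1.0 for v in employment]
corrs = rolling_corr(employment, gdp)
```

In the paper's exercise, each window's statistic would be plotted against the agricultural employment share prevailing in that window.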

Q7. What does the labor-push versus labor-pull debate imply for the estimated elasticity, and how is it resolved?

With epsilon > 1 (gross substitutes), nonagricultural TFP growth attracts labor from agriculture (labor pull), whereas agricultural TFP growth keeps workers on farms and slows structural change. With epsilon < 1 (complements), agricultural TFP growth would instead push workers into industry. The structural estimate epsilon = 3.6 > 1 strongly favors the labor-pull interpretation. This is confirmed by the Green Revolution evidence: Foster and Rosenzweig (2004), Moscona (2018), Bustos et al. (2016), and Jayachandran (2006) all find that positive agricultural TFP shocks slow industrialization and expand agricultural employment—consistent with epsilon > 1 and inconsistent with epsilon < 1.
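
The role of epsilon can be illustrated with the textbook CES demand result that relative expenditure on two goods is proportional to the relative price raised to the power (1 − epsilon). The sketch below is a partial-equilibrium illustration only: it assumes that higher nonagricultural TFP lowers the relative price of nonagricultural goods, and the CES weight is hypothetical.

```python
def expenditure_ratio(rel_price, epsilon, gamma=0.5):
    """Spending on nonagriculture relative to agriculture under CES demand.

    rel_price is P_M / P_G; gamma is the CES weight on agriculture
    (hypothetical value). Formula: ((1 - gamma) / gamma)**epsilon
    * rel_price**(1 - epsilon).
    """
    return ((1.0 - gamma) / gamma) ** epsilon * rel_price ** (1.0 - epsilon)


# A 10% fall in the relative price of nonagricultural goods.
base_sub = expenditure_ratio(1.0, epsilon=3.6)
after_sub = expenditure_ratio(0.9, epsilon=3.6)    # gross substitutes
base_comp = expenditure_ratio(1.0, epsilon=0.5)
after_comp = expenditure_ratio(0.9, epsilon=0.5)   # complements
```

With epsilon = 3.6, spending shifts toward the sector whose price fell, which is the labor-pull case. With epsilon = 0.5, spending shifts away from it, so resources would flow toward the sector with slower productivity growth.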

Q8. What robustness checks are run on the business cycle model?

Four robustness exercises: (1) Low elasticity epsilon = 0.5 with a large food subsistence level: this version fails to generate the observed countercyclicality of the productivity gap and implies an empirically incorrect response to agricultural TFP shocks. (2) Sectoral capital adjustment costs (quadratic, kappa = 2.5): these improve the cyclical behavior of aggregate employment and consumption but make investment too smooth. (3) Raising the persistence of traditional-sector TFP shocks to match that of modern agriculture (phi_S = phi_AM = 0.90): this reduces aggregate labor volatility and makes the relative volatility of employment monotonically increasing with development. (4) Orthogonal shocks (zero cross-sector correlation): results are negligibly different from the benchmark. Exercises (2) through (4) leave the qualitative conclusions intact, while exercise (1) shows that the low-elasticity alternative is at odds with the data.

Q9. How is the productivity gap between nonagriculture and agriculture generated by the model, and does it match the data?

In the model, the productivity gap (nonagricultural output per worker divided by agricultural output per worker) declines with development because the traditional, labor-intensive sector shrinks, raising average labor productivity in agriculture. This is both a long-run trend prediction and a business-cycle prediction: positive TFP shocks to nonagriculture draw workers from the traditional sector, raising agricultural capital intensity and productivity, thereby reducing the gap. The model successfully captures the falling trend in the productivity gap for China. The correlation between the HP-filtered productivity gap and nonagricultural employment in the model is −0.74, close to the empirical value of −0.54 for China. The model predicts lower volatility of the productivity gap than observed in the data.

Q10. What is the estimated role of nonhomothetic preferences?

The authors extend the baseline homothetic CES model to allow Stone-Geary preferences (agricultural good as a necessity). The estimated subsistence level c-bar corresponds to only 11% of agricultural production in 1985, making the income effect through nonhomotheticity quantitatively small. The estimated epsilon falls only marginally when Stone-Geary preferences are introduced. The remaining structural parameters are virtually unchanged. The authors interpret this as evidence that, at the macroeconomic level, technological factors (TFP growth differences and capital accumulation) rather than nonhomothetic preferences are the primary drivers of structural change in China—a finding consistent with Alvarez-Cuadrado and Poschke (2011).

Q11. How does this paper relate to Acemoglu and Guerrieri (2008) and Herrendorf et al. (2013)?

The model builds on Acemoglu and Guerrieri (2008) in having capital deepening and differential TFP growth drive reallocation from agriculture to nonagriculture, but adds the traditional sector (absent in Acemoglu-Guerrieri), which generates the Lewisian surplus-labor mechanism and the declining productivity gap. With respect to Herrendorf et al. (2013): their three-sector CES model imposes a common elasticity across agriculture, manufacturing, and services, yielding a near-Leontief (epsilon near zero) estimate for the US. The authors show this estimate is an artifact of the symmetry restriction: when that restriction is relaxed, the agriculture-nonagriculture elasticity is large (2.32–2.49 for the US) while the manufacturing-services elasticity is near zero. The asymmetric three-sector estimates for the US (2.49), Japan (1.58), and China (1.70) are all above unity at the 1% significance level.

Q12. What are the main limitations and open questions?

The paper explicitly identifies several limitations: (1) the business cycle analysis is restricted to productivity (TFP) shocks only and does not include demand shocks; (2) the model is closed-economy and ignores trade; (3) the distinction between traditional and modern agriculture is not directly observed in the data—the traditional sector’s TFP process is estimated indirectly, introducing potential measurement error that may exaggerate the volatility and understate the persistence of traditional-sector shocks; (4) the prediction that agricultural value added is positively correlated with nonagricultural labor (and negatively with agricultural labor) is inconsistent with Chinese data, a failure the paper acknowledges. Future work is flagged on demand shocks and open-economy extensions.

Q13. What cross-country empirical evidence beyond China is presented?

Using ILO sectoral employment data for 63–66 countries over 1970–2015 (requiring at least 15 consecutive years of observations), the authors document: the correlation between agricultural and nonagricultural HP-filtered employment shifts from positive for countries with small agricultural sectors to strongly negative for countries with large sectors; the correlation between total employment and GDP declines monotonically with the agricultural employment share; the productivity gap is negatively correlated with nonagricultural employment in countries with large agricultural sectors (correlation of −0.54 for China) but near zero in mature economies; consumption volatility relative to GDP declines with development. The US historical time series (1929–2015) shows that before 1960 NBER recessions were associated with reversals in structural change—mirroring today’s China—while this pattern ceased after 1960.

Key Concepts

Traditional agriculture (subsistence sector): A sub-sector of the agricultural sector that uses only labor (no capital) and produces an imperfect substitute for modern agricultural output. Its presence generates a reserve pool of labor that can move to nonagriculture at low marginal cost, creating the Lewisian surplus-labor property within a neoclassical framework. As the economy develops, this sector is crowded out by capital-intensive modern agriculture.

Modern agriculture: A Cobb-Douglas sub-sector within agriculture that uses both capital and labor. Its expansion, which crowds out the traditional sector, constitutes the modernization of agriculture. As workers leave the traditional sector, average capital intensity and labor productivity in agriculture rise, generating the countercyclical productivity-gap pattern observed in developing economies.

Asymptotic Balanced Growth Path (ABGP): The long-run equilibrium toward which the model economy converges, characterized by a small, fully modernized agricultural sector (the traditional sector has vanished), constant growth rates of sectoral capitals, and standard neoclassical business cycle properties. The paper establishes conditions under which the ABGP is asymptotically stable.

Labor wedge (tau): An exogenous, time-invariant tax on nonagricultural wages that prevents equalization of marginal products of labor across sectors, standing in for a variety of frictions (migration barriers, rural overpopulation, institutional barriers) that keep agriculture inefficiently large. Its presence means that positive TFP shocks to nonagriculture both raise productivity directly and reduce misallocation by drawing workers out of the oversized agricultural sector.

Elasticity of substitution between agriculture and nonagriculture (epsilon): The elasticity governing substitution between agricultural and nonagricultural goods in aggregate CES production. When epsilon > 1 (gross substitutes, as estimated: epsilon = 3.6 for China), positive TFP shocks to nonagriculture pull labor from agriculture (labor-pull structural change), while positive shocks to agriculture slow structural change—consistent with Green Revolution evidence. When epsilon < 1 (complements), the opposite holds, implying counterfactual predictions.

Productivity gap: The ratio of average labor productivity in nonagriculture to average labor productivity in agriculture. In the model and the data this gap declines over the course of development (because agriculture modernizes and raises its average productivity) and also narrows during booms in countries undergoing structural change (because booms draw workers from low-productivity traditional agriculture). The model relates the gap formally to the ratio of labor income shares: APL_M/APL_G = (1−tau) × (LIS_M/LIS_G)^(−1), where M denotes nonagriculture and G agriculture.

Sullying effect of recessions on agriculture: The paper’s terminology for the pattern—documented empirically for China by Zhang et al. (2001)—whereby recessions induce workers to return to or remain in the agricultural sector, reversing structural change and lowering average agricultural productivity. This is the cyclical analog of the Lewisian adjustment: in downturns, the labor buffer of traditional agriculture absorbs displaced workers, cushioning aggregate employment but impairing agricultural productivity.

Carbon Pricing and Inequality: A Normative Perspective

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper quantifies the sources and distributional consequences of unexpected carbon price changes for European households using a money-metric welfare framework. The motivation is stark: while carbon taxes enjoy broad support among economists, they face persistent public opposition — exemplified by Australia’s 2014 repeal, France’s 2018 Yellow Vest protests, and the 2025 rollback of Canada’s consumer carbon tax. The authors ask whether average welfare losses are unusually large, and whether the burden falls disproportionately on vulnerable groups, both questions with direct implications for understanding and reducing political resistance.

The empirical approach rests on the “feasible set approach” of Del Canto et al. (2025), which applies the Envelope Theorem to show that the first-order welfare impact of a shock on any household is fully summarized by how the shock changes the discounted present value of their future budget sets — through consumption-basket prices, labor income, financial wealth (asset prices and dividends), and government transfers. This money-metric welfare change is preference-free up to first order: behavioral responses drop out, and the measure is independent of specific utility-function assumptions. The framework is appropriate for policy shocks (supply-side) but not for preference shocks.

The geographic focus is euro-area countries (excluding the Netherlands and Austria due to data gaps) over 1999–2019. The identification strategy follows Känzig (2023): high-frequency shifts in EU ETS carbon futures prices around regulatory events affecting allowance supply are used as instruments in an external-instruments VAR to isolate plausibly exogenous carbon policy shocks. These shocks are then projected onto a wide array of household-level outcomes using local projections (Jordà 2005). The normalization throughout is a 1% increase in the HICP energy component on impact, which corresponds to roughly a 2.5-euro (or about 20%) increase in EU ETS carbon prices. Cross-sectional household budget data come from three Eurostat/ECB surveys: the Household Budget Survey (HBS, 2015 wave) for consumption baskets, EU-SILC (from 2004) for labor and transfer income by demographic group, and the Household Finance and Consumption Survey (HFCS) for household portfolio positions. Demographics are grouped by four age brackets (25–34, 35–49, 50–64, 65+), two education levels (college vs. non-college), three income brackets (bottom quartile = low, middle 50% = mid, top quartile = high), and four geographic regions (Southern, Western, Northern, Eastern Europe).

The main quantitative findings are as follows. First, aggregate welfare losses are large: a 1% carbon-policy-induced energy price increase causes an average welfare loss of approximately 250 euros, corresponding to about 0.5% of a household’s three-year consumption (68% confidence band: 0.06% to 0.94%). Second, decomposing by channel, the direct consumption-price effect accounts for 0.19% of three-year consumption (68% CI: 0.02% to 0.35%); the labor income channel for 0.43% (68% CI: –0.08% to 0.93%); the portfolio channel for –0.04% (a welfare gain; 68% CI: –0.10% to 0.01%); and the transfer income channel for –0.07% (a welfare gain; 68% CI: –0.15% to 0.02%). Labor income is thus the dominant driver — both in aggregate and in the distributional patterns.
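
As a quick consistency check on the figures above, the sketch below adds up the four channel point estimates and backs out the three-year consumption level implied by the rounded headline numbers. It uses point estimates only; the confidence bands do not add up across channels, and the implied consumption figure is an inference from rounded values, not a number reported in the text.

```python
# Welfare loss by channel, in percent of three-year consumption
# (positive = loss, negative = gain), as quoted in the text.
channels = {
    "consumption prices": 0.19,
    "labor income": 0.43,
    "portfolio": -0.04,
    "transfers": -0.07,
}

total_loss_pct = sum(channels.values())                    # about 0.51
labor_share = channels["labor income"] / total_loss_pct    # about 0.84

# Headline figures: roughly 250 euros, roughly 0.5% of three-year consumption.
implied_three_year_consumption = 250.0 / (0.5 / 100.0)     # about 50,000 euros
```

The channels sum to about 0.51%, in line with the reported average of about 0.5%, and the labor income channel alone is over four-fifths of the net total.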

Third, distributional heterogeneity is pervasive and statistically significant (joint F-tests reject uniformity with p-value = 0.00 across all demographic groupings). Non-college-educated households bear welfare losses of roughly 0.6% of three-year consumption, versus roughly 0.3% for college graduates — a gap concentrated in the labor income channel, not the consumption channel (which is broadly similar across groups at around 0.2%). By income, the pattern is U-shaped: young, low-income households suffer the largest losses, exceeding 1% of three-year consumption, while middle-income and older households are the most insulated; high-income households also experience significant losses (around the 0.5% average), driven by their own labor income exposure. Households aged 65 and over suffer welfare losses of only around 0.15%, largely because they are retired from the labor market.

Fourth, regional heterogeneity is stark. Southern Europe bears the highest burden, with welfare losses of 0.5% to 0.8% for working-age households; Eastern Europe also faces substantial losses; Western Europe stands at around 0.2% to 0.3%; Northern Europe is the most insulated, with losses below 0.2% and not statistically significant. The labor income channel is the primary driver of these regional differences, consistent with more rigid labor markets in Southern and Eastern Europe (stronger employment protection, less flexible wage-setting). Northern Europe is protected partly by its high share of renewable energy, which mutes the carbon-price pass-through. Eastern Europe benefited from disproportionate free ETS allowance allocations over the sample period, dampening direct price impacts.

These results collectively suggest that public opposition to carbon taxes may stem from legitimate distributional concerns rather than mere ideological resistance or ignorance. The authors conclude with three policy implications: (1) compensation schemes focused only on consumption prices will be insufficient because the dominant channel is labor income; (2) expansionary (green) monetary policy could ease the income burden, though at some inflationary cost; and (3) redistribution should run from older to younger households, since working-age groups bear the disproportionate burden while retirees are largely insulated.

In depth

Q1. What is the identification strategy for the carbon policy shock, and what are the main threats to identification?

The instrument is the high-frequency shift in EU ETS carbon futures prices around regulatory events affecting allowance supply (following Känzig 2023). The logic is that economic conditions are already priced in prior to the regulatory news, so futures-price movements in a tight window around those events reflect only policy surprises. This instrument is then used in an external-instruments VAR to identify a monthly structural carbon policy shock series (1999–2019). The local projections use 6 lags for monthly outcomes and 2 lags for quarterly outcomes, plus a linear trend and a dummy for the euro sovereign debt crisis (July 2011–March 2012). The main identification threats are: (a) if economic conditions are not fully priced into carbon futures before the regulatory events, the instrument could be correlated with macroeconomic conditions; (b) the framework assumes no preference shocks, which rules out COVID-style demand shifts; (c) the small-noise approximation underlying the feasible-set approach is less suitable for large aggregate shocks.
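
The local-projection step can be sketched in its simplest form: for each horizon h, regress the outcome h periods ahead on the current shock. The version below is a stripped-down illustration on simulated data. It omits the lagged controls, linear trend, and crisis dummy described above, and the data-generating process, function names, and "true" responses are all made up for the example.

```python
import random


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def local_projection(outcome, shock, horizons):
    """Estimated response of outcome[t + h] to shock[t], one regression per h."""
    responses = []
    for h in horizons:
        x = shock[:len(shock) - h]
        y = outcome[h:]
        responses.append(ols_slope(x, y))
    return responses


# Simulated example: the outcome responds to the shock for three periods.
random.seed(0)
T = 5000
shock = [random.gauss(0.0, 1.0) for _ in range(T)]
true_response = [1.0, 0.6, 0.3]
outcome = [
    sum(b * shock[t - k] for k, b in enumerate(true_response) if t - k >= 0)
    for t in range(T)
]
estimated = local_projection(outcome, shock, horizons=range(4))
```

With a long sample of uncorrelated shocks, the estimated slopes should sit close to the true responses at horizons 0 to 2 and close to zero at horizon 3.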

Q2. Why does the feasible-set approach not require specific preference assumptions, and what are its limitations?

By the Envelope Theorem applied to household optimization, first-order welfare effects depend only on how the policy changes the prices and quantities in the household’s budget constraint — not on how preferences are shaped. Behavioral responses drop out at first order. The welfare metric is money-metric: the willingness-to-pay to avoid the shock, expressed in euros (income units). Limitations: (1) It is a small-noise approximation around a zero-risk limit; large aggregate shocks are not well-handled. (2) It is valid for shocks from the production or policy side but not for preference shocks (e.g., discount rate changes). (3) Accounting properly for idiosyncratic risk requires covariance weights (Theta terms in Proposition 1 of the appendix); Del Canto et al. (2025) estimate these at –0.1 to –0.4, implying somewhat attenuated welfare levels but no meaningful change to the distributional comparisons. (4) Carbon emissions-reduction benefits are excluded from the welfare calculation by design, since the paper focuses on the pecuniary costs side only.
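To make the first-order logic concrete, the sketch below computes a money-metric welfare change from budget-constraint terms alone. It is a simplified illustration under stated assumptions: the function `first_order_welfare`, the three-period horizon, the two-good basket, and all numbers are hypothetical, and the covariance (Theta) weights for idiosyncratic risk are ignored. No utility parameters enter the calculation, which is the point of the approach.

```python
import numpy as np

def first_order_welfare(d_labor, d_transfer, d_portfolio,
                        dlog_prices, expenditure, discount=1.0):
    """Welfare change as a percent of discounted consumption over the horizon.

    d_labor, d_transfer, d_portfolio: per-period changes in euros, shape (T,)
    dlog_prices, expenditure: per-period, per-good arrays, shape (T, K)
    """
    d_labor = np.asarray(d_labor, dtype=float)
    d_transfer = np.asarray(d_transfer, dtype=float)
    d_portfolio = np.asarray(d_portfolio, dtype=float)
    dlog_prices = np.asarray(dlog_prices, dtype=float)
    expenditure = np.asarray(expenditure, dtype=float)

    weights = discount ** np.arange(len(d_labor))
    # Extra cost of buying the original basket at the new prices.
    price_cost = (dlog_prices * expenditure).sum(axis=1)
    net_change = d_labor + d_transfer + d_portfolio - price_cost
    total_consumption = expenditure.sum(axis=1)
    return 100.0 * (weights @ net_change) / (weights @ total_consumption)

# Hypothetical household: spends 10 on energy and 90 on other goods each period.
expenditure = [[10.0, 90.0]] * 3
dlog_prices = [[0.01, 0.001], [0.01, 0.001], [0.005, 0.001]]
welfare = first_order_welfare(
    d_labor=[-0.2, -0.5, -0.6],
    d_transfer=[0.05, 0.10, 0.05],
    d_portfolio=[0.0, 0.05, 0.05],
    dlog_prices=dlog_prices,
    expenditure=expenditure,
)
print(round(welfare, 3))  # negative values are welfare losses
```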

Q3. What is the mechanism behind the labor income channel, and how does it vary across demographic groups and regions?

Carbon price increases raise production costs for energy-intensive sectors, reduce output and employment, and depress aggregate wages — a general equilibrium effect that transmits to household labor income over multiple quarters. The average labor income response peaks at around 1% below trend. For non-college-educated households the peak fall exceeds 1%, while for college graduates the response is more muted. By income group, low-income households face the sharpest falls — around 2–4% over the three-year horizon — whereas middle-income households fall by approximately 0.5–1% and high-income households by about 1%. These effects are larger than those estimated by Del Canto et al. (2025) for oil price shocks on US households (approximately 0.3% welfare loss from labor income after a 10% oil price increase), which the authors attribute to more rigid European labor markets: strong employment protection limits wage cuts but discourages hiring and prolongs unemployment spells, amplifying extensive-margin adjustments. In Southern and Eastern Europe, rigidities are most pronounced, generating the largest regional labor-income responses. Northern and Western Europe show more muted responses.

Q4. What is the role of the portfolio channel, and who gains or loses through it?

Stock prices fall by a peak of about 5% and dividends decline by about 3% after a carbon policy shock. Bond prices initially decline then partially recover. House prices decline substantially but with a lag. The welfare effect of asset price changes depends on whether a household is a net buyer or net seller of the asset. Younger households in the accumulation phase gain from falling asset prices (they can buy cheaply); older households planning to dis-save lose. The portfolio channel is quantitatively modest: average welfare gain of about 0.04%, most pronounced for younger college-educated households. The channel is not large enough to offset labor income or consumption-price losses for any group.

Q5. What is the role of the transfer income channel, and which groups benefit most?

Transfer income — which the paper splits into inflation-indexed pension income and other government transfers (unemployment, sickness, disability, education benefits) — generates a welfare gain of about 0.07% on average. Pensions are indexed to inflation and rise as carbon pricing lifts headline prices; this benefit accrues primarily to older households (aged 65+), who have large pension income. Other transfers show an increase post-shock but the responses are generally not statistically significant at conventional levels. High-income households show a negative transfer response. Northern and Southern Europe benefit more from the transfer channel, consistent with more generous welfare programs; Eastern Europe shows little or negative transfer response, consistent with weaker automatic stabilizers.

Q6. What is the U-shaped pattern of welfare losses by income, and what explains it?

The paper finds that low-income and young households suffer the largest losses (exceeding 1% of three-year consumption), middle-income and older households are most insulated, and high-income households also face significant losses (broadly around the 0.5% average). The U-shape arises from the labor income channel: low-income households are concentrated in sectors and employment types most exposed to carbon pricing contractions; high-income households also have substantial labor income (in absolute terms) that contracts; middle-income households appear more buffered, possibly due to sector composition or greater employment stability. The consumption channel contributes approximately uniformly across income groups (around 0.2%), so does not generate the U-shape.

Q7. How does this paper differ methodologically from prior distributional studies of carbon taxes?

Prior work such as Andersson and Atkinson (2020) and Beznoska et al. (2012) focused on direct consumption-price incidence, following Poterba (1989) and using static input-output methods or cross-sectional spending data to estimate first-round price effects. The present paper differs in three ways: (1) it instruments for unexpected carbon price shocks, isolating exogenous variation; (2) it incorporates indirect channels — labor income, asset prices, and transfers — in addition to direct consumption prices; (3) it estimates dynamic IRFs directly, capturing the persistence of effects over a three-year horizon. The key novel finding is that indirect labor income effects are the dominant driver of both the level and the distribution of welfare losses, and that neglecting these indirect channels substantially understates both the size and the regressiveness of carbon pricing.

Q8. Why are regional differences in welfare loss so large, and what drives Northern Europe’s relative insulation?

Regional differences are driven primarily by differential pass-through from carbon prices to consumer prices and by differential labor market rigidity. Northern Europe sources a large share of energy from renewables, so a carbon price increase has a smaller pass-through to domestic energy costs. Eastern Europe was allocated disproportionate free ETS allowances over the 1999–2019 sample period, also dampening direct price impacts — consistent with Känzig and Konradt (2024). Southern and Eastern Europe have more rigid labor markets (stronger employment protection, less flexible wage-setting), amplifying the labor-income contraction. Northern and Western Europe have more flexible labor markets. Additionally, Northern and Southern Europe have more generous welfare programs that partially cushion losses via the transfer channel; Eastern Europe lacks this buffer.

Q9. What data sources does the paper combine, and what are the key sample restrictions?

The paper combines three Eurostat/ECB household surveys: (1) the Household Budget Survey (HBS), 2015 wave, for consumption basket shares by COICOP categories for demographic groups; (2) EU-SILC (2004 onward for some countries, 2005 for most) for annual labor income and transfer income time series by group, converted to quarterly frequency via Chow-Lin interpolation; (3) HFCS (conducted every 4 years by the ECB) for household portfolio positions. Time-series macro data on HICP components, house prices, bond prices, stock prices, and dividends come from Eurostat and ECB/Bloomberg. The sample covers euro-area countries (excluding Netherlands and Austria for data reasons) over 1999–2019. Households are restricted to ages 25–75; top and bottom 1% by net worth are excluded from portfolio statistics. The base year for all life-cycle variables is 2015.

Q10. What are the policy implications and their scope conditions?

Three main implications are drawn: (1) Public resistance to carbon taxes is not merely ideological — the estimated welfare losses are sizable (about 0.5% of three-year consumption for a 1% energy-price increase), so opposition reflects genuine economic concerns. (2) Standard compensation via energy-bill rebates or consumption-basket adjustments is insufficient because the dominant channel is labor income (0.43% vs. 0.19% for consumption). Compensation schemes should include labor-market policies; the authors also suggest expansionary (green) monetary policy as a tool to ease the income burden, though at some inflationary cost. (3) The intergenerational dimension is important: working-age households (especially young, less-educated, lower-income ones) bear the brunt while retirees are largely shielded. Redistribution should run from old to young, not just from rich to poor. Scope conditions: the estimates are derived from the EU ETS context (European carbon market, euro area, 1999–2019), rely on a small-shock linear approximation, and focus on short-to-medium-run impacts (three-year horizon). The benefits of reduced carbon emissions are excluded from the welfare calculation.
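The channel figures quoted in this summary can be added up as a quick consistency check. The sketch assumes the four averages are additive and uses the rounded values reported above, so the total is only approximate.

```python
# Average welfare effects by channel, in percent of three-year consumption
# (losses negative, gains positive), as quoted in this summary.
channels = {
    "labor income": -0.43,
    "consumption prices": -0.19,
    "portfolio": 0.04,
    "transfers": 0.07,
}

total = sum(channels.values())
labor_to_consumption = channels["labor income"] / channels["consumption prices"]

print(f"net welfare effect: {total:.2f}%")
print(f"labor loss relative to consumption loss: {labor_to_consumption:.2f}x")
```

The net figure of about -0.51% matches the headline loss of roughly 0.5%, and the labor-income loss is a little over twice the consumption-price loss.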

Q11. How does the paper handle inference given the short time series and estimation uncertainty?

The sample runs from 1999 to 2019, which is relatively short for the IRF exercises. The paper reports 68% and 90% confidence bands for the impulse responses (rather than the conventional 95%), using the lag-augmentation approach of Montiel Olea and Plagborg-Møller (2021) to account for serial correlation. For the money-metric welfare calculations, inference uses a parametric bootstrap that draws from the estimated distribution of IRFs (assuming block-wise uncorrelatedness across variables, justified by low cross-residual correlations averaging 0.16), and only 68% bands are reported given the short sample and estimation uncertainty. Cross-sectional group shares are treated as given. The authors explicitly acknowledge considerable uncertainty: the 68% confidence band on the aggregate welfare loss spans 0.06% to 0.94%. They conduct joint F-tests for homogeneity of welfare effects across demographic groups; in all cases the null is rejected, with reported p-values of 0.00.
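A parametric bootstrap of this kind can be sketched as follows. This is a stylized version, not the paper's procedure: it draws channel-level totals rather than full IRF paths, treats the channels as independent normals (mirroring the block-wise uncorrelatedness assumption), and uses hypothetical standard errors. The function name `bootstrap_band` is also an assumption.

```python
import numpy as np

def bootstrap_band(point, se, n_draws=10_000, level=0.68, seed=0):
    """Percentile band for the sum of independent normal components."""
    rng = np.random.default_rng(seed)
    point = np.asarray(point, dtype=float)
    se = np.asarray(se, dtype=float)
    draws = rng.normal(loc=point, scale=se, size=(n_draws, len(point)))
    totals = draws.sum(axis=1)
    lower, upper = np.percentile(totals, [50 * (1 - level), 50 * (1 + level)])
    return totals.mean(), lower, upper

# Point estimates from this summary; standard errors are hypothetical.
point = [-0.43, -0.19, 0.04, 0.07]   # labor, consumption, portfolio, transfers
se = [0.40, 0.10, 0.05, 0.05]
mean, lower, upper = bootstrap_band(point, se)
print(f"mean {mean:.2f}%, 68% band [{lower:.2f}%, {upper:.2f}%]")
```

Because the labor-income component carries most of the assumed sampling error, it also determines most of the band's width, consistent with the imprecisely estimated labor-income responses described above.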

Q12. What are the heterogeneous labor income IRF magnitudes for different groups, and are they statistically significant?

Average labor income falls by about 1% at the peak (imprecisely estimated). Non-college-educated peak fall exceeds 1%; college-educated peak fall is more muted. By income group: low-income households see falls of roughly 2–4% over three years; high-income households see a fall of about 1%; middle-income households fall by approximately 0.5–1%. These effects are noted to be larger than analogous results for oil shocks in the US (Del Canto et al. 2025), attributed to European labor market rigidity. The responses are described as featuring ‘a considerable degree of persistence but only imprecisely estimated’ at the average level. The welfare calculations based on these IRFs have wide confidence bands, reflecting this imprecision.

Q13. What are the consumer price dynamics following a carbon policy shock?

Energy prices (HICP energy component) rise by 1% on impact and remain elevated for approximately one year before returning toward baseline. Housing and utilities experience a significant, persistent increase, remaining approximately 0.5% above baseline three years after the shock. Transport prices increase by 0.5% on impact but revert within a year. Food prices rise to a lesser extent. Restaurants and hotels, recreation and culture, and clothing also show significant impact-period increases, though most effects become insignificant after 12 months. Two exceptions at 12 months: housing and utilities remain significantly elevated; education and communication prices actually fall, possibly reflecting adverse general-equilibrium wage and employment effects.

Q14. How is the welfare analysis limited to short-to-medium-run effects, and what longer-run effects are left unaddressed?

The welfare calculations are restricted to a three-year horizon because statistical power in the local projections declines beyond that point given the available sample (1999–2019). The paper explicitly notes that the estimates may miss unemployment hazard effects (i.e., transitions into and out of employment), borrowing cost effects induced by carbon taxes, and any long-run structural adjustments (sectoral reallocation, green investment, capital formation). The benefits of reduced carbon emissions — which may be very large in welfare terms but are realized over much longer horizons — are also excluded by design.

Key Concepts

Feasible Set Approach: A welfare-measurement methodology (from Del Canto et al. 2025) that applies the Envelope Theorem to show that the first-order welfare impact of any shock on a household equals the change in the discounted present value of that household’s budget set — encompassing consumption prices, labor income, asset income, and transfers. The measure is preference-free at first order and is expressed in money-metric (income-equivalent) units.

Money-Metric Welfare Loss: In this paper, the number of euros a household would be willing to pay to avoid exposure to the carbon policy shock, computed as a share of total three-year consumption. It is derived from the feasible-set formula and expressed in income units, making it directly interpretable and comparable across demographic groups without requiring preference parameters.

Carbon Policy Shock: An exogenous, unexpected change in carbon prices driven by regulatory events affecting the supply of EU ETS emission allowances, identified via high-frequency shifts in carbon futures prices around those events used as instruments in an external-instruments VAR. Distinguished from demand-driven carbon price fluctuations correlated with the business cycle.

Labor Income Channel: The indirect welfare effect of a carbon price shock that operates through general-equilibrium changes in aggregate wages and employment. It is the dominant welfare channel in the paper (0.43% of three-year consumption on average, versus 0.19% for direct consumption-price effects), and the primary driver of both the aggregate welfare loss and the distributional heterogeneity across education, income, and regional groups.

Consumption Channel (Direct Effect): The welfare impact arising from higher prices for goods in the household’s consumption basket following a carbon price increase. Weighted by the household’s nominal expenditure on each good. Broadly similar across demographic groups (clustering around 0.2% of three-year consumption), so it does not generate the observed distributional heterogeneity — in contrast to the labor income channel.

Portfolio Channel: The welfare effect transmitted through changes in asset prices (equities, bonds, housing) after a carbon shock. The sign depends on whether a household is a net buyer or net seller of the asset: younger households in the accumulation phase gain from falling asset prices; older households in the dis-saving phase lose. Quantitatively small on average (net welfare gain of about 0.04%), most pronounced for younger, college-educated households.

Transfer Channel: The welfare effect operating through government transfer income (unemployment and other social benefits) and inflation-indexed pension payments. Because pensions are indexed to the price level, carbon-induced inflation raises pension income and benefits older households. Other transfer income tends to rise post-shock but the responses are generally imprecisely estimated. On average the channel generates a modest welfare gain (about 0.07% of three-year consumption), primarily for the elderly.

Greenflation: The phenomenon, documented empirically by Bettarelli et al. (2025) and referenced in this paper, whereby carbon-tax shocks contribute to broader consumer price inflation beyond the direct energy-price impact — through pass-through to housing, transport, food, and other categories, and by raising inflation expectations and triggering tighter monetary policy, which in turn depresses bond and house prices.

Diet, Economic Development and Climate Change

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Food production accounts for roughly one-third of global greenhouse gas (GHG) emissions, and richer nations contribute disproportionately through meat-intensive diets and input-intensive farming. This paper asks how much of that disparity will be exported to the developing world as it grows, and which policies can most cost-effectively reduce agricultural emissions during that transition. The answer requires separately identifying two distinct channels—demand-side dietary change and supply-side technological change—and tracing their general equilibrium consequences through global food markets.

The authors build a quantitative multi-country general equilibrium model calibrated to 90 countries (plus a rest-of-world aggregate) and 47 food products for 2010. The demand side features nested non-homothetic CES preferences, which allow income elasticities to differ across food products—the core mechanism of the nutrition transition. The supply side, built on Farrokhi and Pellegrina (2023), operates at a granular grid-cell level covering the Earth’s surface, with producers on each plot choosing both which crop to grow and whether to use a modern, input-intensive (higher-GHG) technology or a traditional, labor-intensive one—the core mechanism of agricultural modernization. GHG emissions are tracked from both production and transportation. Data on calorie intake come from FAO Food Balance Sheets; emissions from Poore and Nemecek (2018) and EDGAR-FOOD; yields from FAO-GAEZ (approximately 1.1 million fields).

A key methodological contribution is an identification result for income elasticities that requires no price data. In open-economy models, trade shares provide a sufficient statistic for consumer prices, so the model’s implicit Marshallian demand equations can be estimated using only expenditure shares and bilateral trade flows—a cleaner identification than prior closed-economy approaches. Structural elasticity estimates are validated against reduced-form regressions that regress product-level log absorption on log GDP per capita interacted with the product’s GHG intensity; the cross-method correlation has a slope of 0.64–0.77 and R² of 0.93–0.95.
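The reduced-form validation regression can be illustrated with a small simulation. The sketch below is an assumption-laden simplification: it uses a single balanced cross-section with country and product fixed effects (the paper uses country-year and product-year effects), simulated data, and a hypothetical true interaction coefficient of 0.2. The function name `interaction_slope` is invented for illustration.

```python
import numpy as np

def interaction_slope(log_q, log_gdp, log_ghg):
    """Two-way fixed-effects slope on log_gdp x log_ghg in a balanced panel.

    log_q has shape (countries, products). Double demeaning removes the
    country and product fixed effects before a one-regressor OLS.
    """
    def demean(m):
        return (m - m.mean(axis=1, keepdims=True)
                  - m.mean(axis=0, keepdims=True) + m.mean())

    x = demean(np.outer(log_gdp, log_ghg))
    y = demean(log_q)
    return (x * y).sum() / (x ** 2).sum()

# Simulated data with the dimensions used in the paper (90 countries, 47 products).
rng = np.random.default_rng(1)
n_countries, n_products = 90, 47
log_gdp = rng.normal(size=n_countries)
log_ghg = rng.normal(size=n_products)
country_fe = rng.normal(size=(n_countries, 1))
product_fe = rng.normal(size=(1, n_products))
true_slope = 0.2  # hypothetical value
log_q = (country_fe + product_fe
         + true_slope * np.outer(log_gdp, log_ghg)
         + 0.1 * rng.normal(size=(n_countries, n_products)))

slope = interaction_slope(log_q, log_gdp, log_ghg)
print(round(slope, 3))  # close to the simulated true_slope
```

A positive slope means absorption of high-GHG products rises faster with income, which is the pattern the structural elasticities are checked against.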

Four empirical patterns motivate the model. First, diet composition alone drives large variation in emissions: if the whole world adopted the US diet (holding total calories fixed), the food share of global GHG emissions would rise from 30% to 42%; adopting the Argentinian diet would raise it to 74%; adopting the Ethiopian diet would lower it to 12%. Second, GHG emissions per capita from food rise strongly with GDP per capita (elasticity 0.39 in the cross-section); about one-third of this is a pure scale effect (more calories) and two-thirds is a compositional shift toward higher-emission foods (elasticity of emissions per calorie with respect to GDP per capita is 0.23–0.28). Third, products with higher GHG emissions per calorie have higher income elasticities; a 1% rise in a product’s GHG intensity is associated with a 0.17–0.21% higher income elasticity, robust to excluding all meat products. Fourth, emissions from fertilizers and energy use as a share of total agricultural emissions rise with GDP per capita (slope 0.82), indicating that agricultural modernization independently amplifies GHG emissions within each crop.
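The scale-versus-composition split in the second pattern follows from an identity: log emissions per capita equals log calories per capita plus log emissions per calorie, so the corresponding elasticities add up. The check below assumes the quoted elasticities come from comparable samples, which this summary does not confirm.

```python
# Elasticities with respect to GDP per capita, as quoted above.
total_elasticity = 0.39            # food GHG emissions per capita
per_calorie_range = (0.23, 0.28)   # food GHG emissions per calorie

composition_shares = []
for per_calorie in per_calorie_range:
    calories = total_elasticity - per_calorie   # implied scale (calorie) elasticity
    share = per_calorie / total_elasticity      # share due to diet composition
    composition_shares.append(share)
    print(f"per-calorie {per_calorie:.2f}: calories {calories:.2f}, "
          f"composition share {share:.0%}")
```

The implied composition share of roughly 59% to 72% brackets the two-thirds figure, leaving about one-third for the pure scale effect.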

Model decompositions reveal that about two-thirds of the cross-sectional correlation between food emissions per capita and GDP per capita is attributable to intrinsic dietary preferences (culture, religion, demographics) rather than to income itself, and about one-half of the correlation for emissions per calorie. This implies that the causal effect of economic growth on emissions is substantially smaller than raw correlations suggest.

Policy counterfactuals (Table 4) are the paper’s centerpiece. A uniform 10% TFP shock across all modern agricultural, non-agricultural, and input producers raises global welfare by 14.9% and increases global agricultural GHG emissions by 5.0% (approximately 0.6 Gt CO₂ from production, 0.004 Gt from transport). Shutting down the nutrition transition channel reduces this emission increase by 28%; shutting down agricultural modernization reduces it by a further 16%; shutting both down reduces it by 42%—so the two mechanisms together account for more than one-third of the growth-induced emission increase. Crucially, ignoring general equilibrium supply responses would overstate the emission impact of economic growth by 100%: higher food demand raises production prices, which dampens both consumption growth and further technology adoption.

For dietary restrictions: a global no-beef mandate would reduce agricultural GHG emissions by 20%, at a global welfare cost of 0.6%, with large concentrated losses in major beef-producing and consuming countries (welfare losses of 3–5% in Argentina and about 4% in Uruguay). A global vegetarian mandate would reduce emissions by 30%, at a welfare cost of 2.8% globally and with greater inequality impacts for developing countries. (Table 4, column 3 shows −20% for no-beef and −30% for vegetarian; the abstract appears to quote a figure of roughly 20%, an apparent inconsistency.) Back-of-the-envelope calculations that ignore general equilibrium overstate the emission reductions from dietary restrictions by roughly one-third.

For food trade policy: raising trade costs enough to cut transportation emissions by 75% reduces total agricultural GHG emissions by 11.9%, but at a global welfare cost of 17.8%—a ratio far worse than dietary policies. The welfare loss is highly unequal: countries in the bottom quartile of the GDP per capita distribution face welfare losses of up to 41% (the abstract states this figure; Table 4 col. 2 shows the Q4/Q1 inequality worsening by 4.9 percentage points in the eat-local scenario). The conclusion is that dietary policies dominate food trade policies on both effectiveness and equity grounds.

Transportation emissions account for only about 5% of agricultural GHG (0.7 Gt CO₂ vs. 16.5 Gt from production), so policies targeting transport emissions alone have limited aggregate impact.

In depth

Q1. What is the core identification strategy for income elasticities, and why is it novel?

Standard non-homothetic CES estimation requires price data because the demand equation depends on price indices. In a closed economy this problem is severe. The authors show that in an open economy, bilateral trade shares provide a sufficient statistic for variety price indices: averaging trade shares across a country’s import partners yields a geometric mean of production prices that can be differenced out using fixed effects. The key estimating equation (40) regresses an adjusted expenditure share on log income per capita, with fixed effects absorbing production-price variation through the set of import partners. No price data is needed. This is exact—not an approximation—unlike the approximate methods in Comin et al. (2021) or Caron and Fally (2022), which either impose additional assumptions about price variation across consumer groups or require proxies for crop-specific trade costs such as gravity variables.

Q2. What are the main threats to identification and how are they addressed?

The key concern is that income is correlated with prices and preference shifters that also affect food expenditure shares. In the reduced-form regressions (equation 1), country-year and product-year fixed effects control for country-specific factors (including regional technology change) and global product-specific factors (including product-specific technological progress). In the structural estimation (equation 40), the model’s functional form is used to control fully for endogeneity arising through prices, since trade shares substitute out unobservable price indices exactly. The close agreement between reduced-form and structural income elasticity estimates (slope 0.64–0.77, R² 0.93–0.95 in cross-validation) is reassuring, because two quite different sets of identifying assumptions yield similar results. One remaining concern is the unobservable preference shifters (a_i,k and ã_i,s), which appear as residuals; identification requires income variation orthogonal to these shifters, and the authors follow precedent in assuming that fixed effects are sufficient. Household-level data from Brazil’s Consumer Expenditure Survey (POF) bolster the reduced-form patterns using within-country income variation.

Q3. How are the nutrition transition and agricultural modernization distinguished empirically and in the model?

These are fundamentally different economic mechanisms. The nutrition transition operates through demand: as incomes rise, consumers shift toward food products that, for reasons of taste or nutrition, happen to have higher GHG emissions per calorie. It is a between-product phenomenon captured by non-homothetic income elasticities. Agricultural modernization operates through supply: as wages rise, producers substitute away from labor-intensive traditional technologies toward input-intensive modern technologies (fertilizers, machinery) that emit more GHG per calorie of output, for any given crop. It is a within-product phenomenon captured by the endogenous technology-choice margin in the agricultural production model. In the counterfactual decompositions, the authors shut down each channel independently: the nutrition transition is shut down by setting all within-sector income elasticity parameters (ε_k) equal; agricultural modernization is shut down by fixing the land share in each technology exogenously. Doing so reveals that the nutrition transition accounts for 28% and modernization for 16% of the emission increase from a 10% TFP shock (jointly 42%), with the remainder attributable to scale effects and general equilibrium price responses.

Q4. What is the role of general equilibrium supply responses and why do they matter so much?

A central finding is that ignoring supply-side equilibrium price responses would overstate the emission impact of economic growth by 100%. The mechanism is straightforward: economic growth raises income and thus food demand, which pushes up production prices (because agricultural supply is upward-sloping due to limited land and heterogeneous productivity across grid cells). Higher prices dampen consumption, which partially offsets the demand-driven emission increase. For dietary restriction policies, back-of-the-envelope calculations that simply remove the GHG attributable to banned food products overstate the emission reduction by roughly one-third, because consumers substitute toward other food products and global agricultural production reorganizes. The model’s general equilibrium structure is therefore essential for obtaining credible policy counterfactuals. A main conclusion of the paper is that existing back-of-the-envelope calculations in the environmental science literature substantially overstate both the emission risks from growth and the emission benefits from dietary policies.

Q5. What heterogeneity is documented across countries and products?

Across countries: diet composition varies enormously. Counterfactual calculations show that if all countries adopted the Argentinian diet (holding total calories fixed), the global food share of total emissions would rise to 74%; adopting the Ethiopian diet would lower it to 12%, compared to the factual 30%. The income elasticity of the agricultural sector as a whole is 0.39, close to Comin et al. (2021)’s 0.37. Rich countries have a higher share of modern technology in production, higher fertilizer and energy use per unit of land, higher food GHG per capita, and higher food GHG per calorie. About two-thirds of the cross-sectional gradient in food GHG per capita is attributable to intrinsic preferences rather than income per se. Religion is documented as one driver: Islamic-majority countries show lower preference for pork; Hindu-majority countries show higher preference for lamb, mutton, and poultry relative to other meats. Across products: GHG emissions per 1,000 kcal range from above 35 kg CO₂ for beef and coffee to below 5 kg CO₂ for wheat and rye. Income elasticity parameters (ε_k) range from lowest for staples (yams, sweet potatoes, millet, sorghum, rice) to highest for luxury fruits and vegetables (berries, asparagus, cucumbers, watermelon). Notably, the income-GHG gradient persists after excluding all meat products: vegetables and fruits have higher GHG per calorie than staples, so the nutrition transition is broader than a simple meat-consumption story.

Q6. How do the diet restriction and food trade policy counterfactuals compare on welfare and effectiveness?

Diet restriction (no-beef): global GHG emissions fall 20%, global welfare falls 0.6%. The welfare effect is highly concentrated: in the no-beef scenario Argentina experiences a welfare loss of 3–5% and Uruguay a loss of approximately 4%, because they are large meat producers and exporters. Inequality between rich (Q4) and poor (Q1) countries worsens by 1.0 percentage point. Diet restriction (vegetarian): global GHG emissions fall 30%, global welfare falls 2.8%. Inequality worsens by 6.0 percentage points, indicating developing countries bear more of the cost because a larger share of their income goes to food, and their income sources (agriculture) are more directly affected. Food trade policy (‘eat local’, raising trade costs to cut transportation emissions by 75%): global GHG emissions fall 11.9%, but global welfare falls 17.8%. On the figures quoted here, that is about 1.5% of welfare per percentage point of emission reduction, roughly 16 times the corresponding cost of the vegetarian mandate (about 0.09%) and roughly 50 times that of the no-beef mandate (0.03%). Inequality also worsens by far more than under the no-beef mandate: the Q4/Q1 ratio worsens by 4.9 percentage points. Countries in the bottom GDP quartile face welfare losses up to 41%. The paper concludes that dietary restrictions are both substantially more effective in reducing GHG emissions and far more equitable in their welfare consequences than food trade policies.
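The cost-effectiveness comparison can be reproduced from the figures quoted in this summary. The sketch uses a simple ratio of global welfare loss to emission reduction, which is an illustrative metric and not necessarily the one the authors use.

```python
# (emission reduction in %, global welfare loss in %) as quoted in this summary.
policies = {
    "no beef": (20.0, 0.6),
    "vegetarian": (30.0, 2.8),
    "eat local": (11.9, 17.8),
}

# Welfare loss per percentage point of emission reduction.
cost_per_point = {name: loss / cut for name, (cut, loss) in policies.items()}

ratio_vs_no_beef = cost_per_point["eat local"] / cost_per_point["no beef"]
ratio_vs_vegetarian = cost_per_point["eat local"] / cost_per_point["vegetarian"]

for name, cost in cost_per_point.items():
    print(f"{name}: {cost:.3f}% welfare per point of emissions cut")
print(f"eat local vs no beef: {ratio_vs_no_beef:.1f}x")
print(f"eat local vs vegetarian: {ratio_vs_vegetarian:.1f}x")
```

On this metric the trade policy is about 50 times as costly as the no-beef mandate and about 16 times as costly as the vegetarian mandate.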

Q7. How large are transportation emissions relative to production emissions, and what does this imply for ‘food miles’ policies?

In the 2010 data, GHG emissions from food transportation account for approximately 5% of total agricultural GHG (0.7 Gt CO₂ out of approximately 17.2 Gt total). Production accounts for 95% (16.5 Gt CO₂). This has two implications. First, in the economic growth counterfactual, transportation emissions increase by 2.2%, but because transportation is only 5% of total, its contribution to total emission growth (0.004 Gt) is negligible. Second, it implies that policies targeting ‘food miles’ or local eating are poorly targeted: even a dramatic 75% reduction in transportation emissions only mechanically eliminates 4.6% of total agricultural GHG, and the actual general equilibrium reduction (11.9%) comes mostly from production effects (agricultural trade restrictions reduce global production and consumption), accompanied by very large welfare costs.

Q8. What robustness checks and validation exercises are conducted?

The paper provides several validation exercises. (1) The reduced-form income elasticity regressions are run both with all crops and excluding all meat products (beef, lamb and mutton, pig meat, poultry), yielding nearly identical coefficients of 0.176 and 0.175 (columns 1 and 2 of Table 1), and with country-year and product-year fixed effects (columns 3–4), showing similar results across specifications. (2) The structural income elasticities are compared to the reduced-form estimates, with a cross-method slope of 0.64–0.77 and R² of 0.93–0.95, reassuring given the two methods make different identifying assumptions. (3) Model fit is checked against six untargeted empirical regularities (Figure 6): declining agricultural employment share, rising input cost share, rising modern technology land share, rising food GHG per capita, rising calories per capita, and rising food GHG per calorie—all with GDP per capita. The model matches the sign and approximate magnitude of each relationship. (4) Household-level estimates using Brazil’s POF survey replicate the cross-country finding that higher-GHG products have higher income elasticities, controlling for fixed effects, food price proxies, and excluding meat. (5) The decomposition of the cross-sectional income-emissions gradient shows that equalizing comparative advantage (column 3) or trade costs (column 4) across countries leaves the gradient approximately unchanged, supporting the focus on preferences and technology.

Q9. How does this paper relate to prior work and where does it depart from it?

The paper sits at the intersection of several literatures. It builds on Farrokhi and Pellegrina (2023) for the granular grid-cell production model with technology choice; on Costinot, Donaldson, and Smith (2016) for the agricultural field structure; and on Comin, Lashkari, and Mestieri (2021) for non-homothetic CES preferences and the identification of income elasticities. Key departures: (a) Relative to Comin et al. (2021), the authors extend identification to nested CES preferences and to an open-economy setting without requiring price data; their method is exact rather than approximate. (b) Relative to the environmental science literature (e.g., Hoolohan et al., 2013; Perignon et al., 2017; Tilman et al., 2011), the paper endogenizes general equilibrium supply responses, which the authors show dramatically attenuate the effect of both income growth and dietary policies on emissions. (c) Relative to prior quantitative spatial models of climate change (e.g., Shapiro 2016 on trade costs and CO₂), this paper focuses on agricultural emissions specifically and introduces nutrition transition and technology choice. (d) The authors claim to be the first to analyze both dietary restrictions and food trade policies on agricultural emissions within quantitative trade models. (e) Relative to Chen et al. (2022), who use a computable general equilibrium model with general equilibrium supply adjustments, this paper includes far more food products (47 vs. their smaller set) and endogenizes technology choice, both of which are quantitatively important for capturing the nutrition transition.

Q10. What is the paper’s mechanism for why vegetable and fruit consumption also raises GHG emissions as income rises, even without meat?

The paper notes in footnote 1 that the positive correlation between income elasticities and GHG emissions per calorie persists even when meat products are excluded from the sample (Table 1, columns 3–4). The reason is that vegetables and fruits—which become more preferred as countries grow richer—emit more GHG per calorie than staple foods such as yams and potatoes. Staples require little processing or refrigeration and are typically produced with traditional, low-input technologies. By contrast, fresh fruits and vegetables (especially high-value items such as berries, asparagus, grapes, and coffee) require more energy-intensive transportation, storage, and sometimes greenhouse production. This means that the nutrition transition generates rising emissions not merely through the beef channel emphasized in much of the public debate, but through a broader shift away from calorie-dense staples toward diverse, lower-calorie-density products that happen to have higher GHG footprints per calorie.

Q11. What does the model imply about the Environmental Kuznets Curve for food emissions?

The paper explicitly tests for and finds no evidence of an Environmental Kuznets Curve (EKC) in food emissions—that is, no inverse-U shape in which emissions per capita eventually decline as countries become very rich, as might be expected if wealthy nations adopt more sustainable diets or stricter environmental regulations. The income-emission relationship is found to be approximately log-linear across all levels of development (footnote 8). This is consistent with the broader empirical literature on the EKC (cited survey by Dinda, 2004). The implication is that there is no automatic ‘greening’ of diets as countries develop; active policy intervention would be needed.

Q12. How is economic development modeled in the policy counterfactuals, and what are the scope conditions?

Economic development is modeled as a uniform 10% increase in TFP for three types of agents: (i) modern agricultural producers, (ii) non-agricultural producers, and (iii) agricultural input producers (fertilizers, machinery, pesticides). Traditional agricultural technology is not subject to productivity growth, following Gollin, Parente, and Rogerson (2007). This creates both income effects (via higher wages) and substitution effects (via changes in relative input prices that favor modern, input-intensive technology). The scope conditions are important: the results apply specifically to a uniform global TFP shock, not to individual-country development. For individual-country TFP shocks, the analytical decomposition (equation 34) shows that general equilibrium income spillovers to foreign countries can attenuate the nutrition transition if foreign incomes fall (e.g., due to terms-of-trade effects). The model does not incorporate dynamics (it is a static model calibrated to 2010), so it cannot directly speak to transition paths or time horizons for emission convergence.

Q13. What are the welfare implications for developing countries under different policies, and why do dietary policies dominate?

Under economic growth (10% TFP shock), global welfare rises 14.9% with a modest increase in Q4/Q1 inequality of 0.4 percentage points, indicating relatively even welfare gains. Under no-beef, global welfare falls 0.6% but inequality worsens by 1.0 pp; under vegetarian, welfare falls 2.8% and inequality worsens by 6.0 pp—developing countries lose more because more of their income is spent on food and the agricultural sector is a larger share of their economy. Under eat-local (food trade restrictions), welfare falls 17.8% and the Q4/Q1 ratio worsens by 4.9 pp, with countries in the bottom GDP quartile facing losses up to 41%. The stark dominance of dietary policies over trade policies reflects two structural features: (a) food trade restrictions reduce the gains from comparative advantage in food production, which are particularly large for food-exporting developing countries; and (b) the welfare cost per unit of GHG reduction is far higher for trade policies because they distort production allocation without addressing the underlying demand-side emissions driver.

Key Concepts

Nutrition Transition: As defined and used in this paper: the demand-side process by which rising income causes consumers to shift their caloric intake away from staple foods (yams, potatoes, rice, millet) toward food products with higher GHG emissions per calorie (meats, fruits, vegetables, coffee). The transition is captured in the model by non-homothetic income elasticity parameters ε_k that are higher for more emissions-intensive products and is operative even after excluding all meat products.

Agricultural Modernization: As defined and used in this paper: the supply-side process by which rising wages induce producers to substitute from traditional, labor-intensive agricultural technology (τ=0, no purchased intermediate inputs) toward modern, input-intensive technology (τ=1, fertilizers, machinery, pesticides), which emits more GHG per calorie of output. This operates within each crop and is captured in the model by endogenous technology choice at the plot level.

Non-Homothetic CES Preferences (Nested): A three-tier preference structure in which the expenditure share of a food product k depends on income through a product-specific parameter ε_k that governs how fast the product’s preference weight grows with utility. Products with higher ε_k have higher income elasticities; the overall income elasticity of the agricultural sector (0.39 in this paper’s calibration) is an expenditure-weighted average of the ε_k values. The nested structure allows the agricultural sector’s income elasticity relative to non-agriculture to be determined separately from the income elasticities of individual food products within agriculture.

Implicit Marshallian Demand: The demand equation derived from non-homothetic CES preferences by substituting out unobservable price indices using a base good, yielding a demand specification that depends on observable expenditure shares and income rather than on prices directly. In this paper’s open-economy extension, trade shares further substitute out unobservable variety price indices, making the estimation equation fully price-data-free.

GHG Emission Intensity (per calorie): In this paper: the parameter φ_k (crop-specific) and φ_τ (technology-specific), where φ_kτ = φ_k × φ_τ is the kg CO₂-equivalent emitted per 1,000 kcal of crop k produced under technology τ. This is the key cross-product heterogeneity that, combined with income elasticity heterogeneity, drives the environmental consequences of the nutrition transition. In the data: ranges from below 5 kg CO₂ per 1,000 kcal for wheat and rye to above 35 kg for beef and coffee.

Grid-Cell Production Model: A representation of the agricultural supply side in which the Earth’s land surface is divided into approximately 1.1 million fields (FAO-GAEZ), each with agro-climatically determined potential yields by crop and technology that are independent of market conditions. Within each field, a continuum of plots is allocated to crops and technologies via Fréchet productivity draws, yielding smooth aggregate supply functions and allowing for realistic specialization patterns and technology gradients across geography.

Back-of-the-Envelope (Demand Mechanism) Benchmark: In this paper: a partial-equilibrium counterfactual calculation that takes observed or baseline food demand quantities and applies a policy’s direct demand changes to them, without allowing supply prices, production, or trade flows to adjust. The paper systematically compares model general equilibrium results against this benchmark (column 9 of Table 4) to quantify how much supply-side adjustments matter, finding that the back-of-the-envelope approach overstates the emission impact of economic growth by a factor of approximately three, and overstates the emission reduction from dietary policies by roughly one-third.

Dispersion Over the Business Cycle: Passthrough, Productivity, and Demand

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Carlsson, Clymo, and Joslin use Swedish manufacturing firm-level microdata for 1998–2013 to separately identify and characterize the cyclical behavior of physical productivity (TFPQ) shocks and demand shocks at the firm level, two forces that are observationally equivalent under the standard CES-demand benchmark. The paper’s central contribution is threefold: it documents new empirical facts about dispersion cyclicality, estimates a non-constant-elasticity (non-CES) demand curve directly from firm-level price and quantity data, and embeds those estimates into a quantitative heterogeneous-firm model to study the aggregate consequences of each type of dispersion shock.

The data combine four Swedish register sources: the Företagens Ekonomi (FEK) survey for bookkeeping variables; the Industrins Varuproduktion (IVP) survey for 8-digit product-level price and quantity data used to construct firm-level price indices; the Konjunkturstatistik för Industrin (KFI) survey for quarterly capacity-utilization data; and additional investment deflators. The unbalanced panel contains 3,181 unique manufacturing firms and 15,044 firm-year observations. TFPQ is measured using a Cobb-Douglas value-added production function with factor utilization adjustment; factor elasticities are estimated via cost shares at the 2-digit sector level, yielding an average labor share of 0.735.

Demand is estimated using the Gopinath-Itskhoki-Rigobon (GIR) flexible demand curve, which nests CES as the limiting case. TFPQ innovations instrument for price in a second-order approximation, following Foster, Haltiwanger, and Syverson (2008). The main-sample estimates yield theta = 2.94 (average elasticity) and eta = 4.27 (super-elasticity), both significant at the 1% level. The second-order price term is statistically significant at the 5% level in all three samples, decisively rejecting CES. These estimates imply that a 5% price increase raises the demand elasticity from 2.94 to 3.74, while a 5% price reduction reduces it to 2.42, creating a “real rigidity” in the sense of Ball and Romer (1990): raising price loses many customers while lowering it gains few.
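The elasticity figures in the paragraph above can be reproduced with a short calculation. The sketch below assumes the GIR functional form q = (1 - eta * log p)^(theta/eta), whose elasticity in absolute value is theta / (1 - eta * log p), and it reads "5%" as a log-price deviation of 0.05 from the firm's average price. The function name is illustrative and not taken from the paper.

```python
def gir_elasticity(log_p, theta, eta):
    """Absolute demand elasticity implied by q = (1 - eta*log p)**(theta/eta).

    log_p is the log price relative to the firm's average price (assumption:
    the average price is normalized to 1, so log_p = 0 at the average).
    """
    return theta / (1.0 - eta * log_p)


theta, eta = 2.94, 4.27  # main-sample estimates quoted in the summary

base = gir_elasticity(0.0, theta, eta)    # elasticity at the average price
up = gir_elasticity(0.05, theta, eta)     # after a 0.05 log-point price increase
down = gir_elasticity(-0.05, theta, eta)  # after a 0.05 log-point price cut

print(round(base, 2), round(up, 2), round(down, 2))  # 2.94 3.74 2.42
```

The asymmetry is the real rigidity: the elasticity rises by about 0.80 after the price increase but falls by only about 0.52 after an equal-sized price cut.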

Incomplete passthrough of TFPQ shocks is a central empirical finding. OLS estimates yield beta_z = -0.124; first-difference estimates yield -0.097. Even in the subsample of firms that adjusted all product-level prices in a given year, TFPQ passthrough remains near -0.10, ruling out Calvo or menu-cost price stickiness as the sole driver. Longer-horizon (two- and three-year) first-difference regressions produce similar estimates, ruling out Rotemberg gradual adjustment as well. The non-CES demand curve alone implies a static-optimal passthrough of theta/(theta + eta) = 3/(3 + 4.3) = 41%, so real rigidity explains most of the incompleteness even before accounting for adjustment costs. Demand shocks pass through to prices at a rate of 0.209-0.235, a non-zero result rationalized in the quantitative model by input adjustment costs.

On cyclicality of dispersion, both TFPQ and demand shock dispersion are countercyclical, but demand dispersion rises by more and is more robust across recession episodes. In 2009 (the Great Recession), the IQR of demand shock growth was 56% above its non-recession average, while the IQR of TFPQ shock growth rose 36%. Sales dispersion rose 58% (IQR) in 2009. A semi-structural variance decomposition shows that demand shocks account for 63% of average sales growth dispersion and approximately 80% of its increase in 2009; TFPQ dispersion contributes only marginally to sales dispersion because the TFPQ variance is shrunk by a factor of roughly 25 on its way to sales growth through the chain of low passthrough and demand elasticity. Demand accounts for about 50% of average price growth dispersion and 40% of its cyclical increase in 2009; TFPQ accounts for about 10% of price dispersion on average.

The quantitative heterogeneous-firm model extends Bloom (2009) and Bloom et al. (2018) to continuous time with both TFPQ and demand shocks, non-CES demand (theta = 3, eta = 4.3 from the estimates), and non-convex input adjustment costs on a composite scale factor covering both capital and labor. The resale loss kappa = 0.3565 is taken from Bloom et al. (2018). The model is calibrated to match IQRs of 0.2 for TFPQ and demand shock log-changes in the low-uncertainty state, consistent with pre-crisis Swedish data. For the high-uncertainty state, the calibration targets the Great Recession peaks: a 30% rise in TFPQ dispersion (sigma_z(2) = 1.38 sigma_z(1)) and a 60% rise in demand dispersion (sigma_epsilon(2) = 1.90 sigma_epsilon(1)), reflecting the empirical finding that demand dispersion increases more.

A simulated transition to the high-uncertainty state causes aggregate output to fall by 3.5%. Decomposing into the Bloom (2009) “volatility effect” (realized shocks drawn from the high-dispersion distribution, firms believe low) and “uncertainty effect” (firms believe high, shocks drawn from low distribution), the paper finds both effects are negative in the non-CES model, in sharp contrast to Bloom (2009) where the volatility effect is positive (the Oi-Hartman-Abel effect). Non-CES demand amplifies the total output decline by approximately 40% relative to the CES model (peak fall 2.5% vs. 1.75%), primarily by reversing the sign of the volatility effect. Increased demand dispersion drives almost all of the first-year output decline and the majority of the uncertainty effect; TFPQ dispersion is the main driver of the negative volatility effect via markup dispersion. The inaction rate among firms jumps from 50% to 95% on impact of the uncertainty shock, then recovers within one year. TFPQ uncertainty induces little wait-and-see behavior because firms optimally adjust inputs by only 23% of the TFPQ shock size (versus 200% under CES), so uncertainty about TFPQ translates mainly into markup uncertainty. Demand uncertainty triggers strong wait-and-see behavior because demand directly maps one-for-one into desired input use.

In depth

Q1. What is the paper’s core identification strategy for separating TFPQ and demand shocks, and what are the main threats?

The authors identify TFPQ from a utilization-adjusted Cobb-Douglas value-added production function, then estimate demand using TFPQ innovations as instruments for price. TFPQ innovations are valid instruments because they shift marginal cost without directly shifting demand, tracing out the demand curve. The utilization adjustment (from the KFI managerial survey) is critical: without it, demand shocks that reduce utilization would appear as negative TFPQ shocks, biasing demand elasticity estimates upward and breaking instrument validity. The paper validates the adjustment by showing that firms reporting ‘insufficient demand’ exhibit 15% lower utilization on average, and 23% lower during the Great Recession. A second threat is quality change in firm-level prices; the authors address this with (a) robustness using the Eslava et al. (2023) CUPI quality-adjusted price index and (b) a single-product-firm subsample. Demand and passthrough results are similar across all three price index approaches. The within-firm focus (demeaning by firm and sector-year fixed effects throughout) mitigates cross-sectional comparability issues but limits misallocation-level analyses analogous to Hsieh and Klenow (2009).

Q2. How is the non-CES demand curve identified, and what exactly does the super-elasticity parameter eta measure?

The GIR demand curve is q = (1 - eta * log p)^(theta/eta). A second-order approximation around the firm’s average price yields log q = -theta * p_hat - (eta * theta / 2) * p_hat^2 + fixed effects + epsilon, where p_hat is the firm’s demeaned log relative price. Regressing real sales on p_hat and p_hat^2, instrumented by demeaned TFPQ and its square, recovers theta = -b1 and eta = 2 * b2 / b1, where b1 and b2 are the coefficients on p_hat and p_hat^2. Because p_hat is demeaned at the firm level, the estimates capture within-firm nonlinearity in the price-sales relationship, not cross-sectional heterogeneity in elasticity levels. The parameter eta is the ‘super-elasticity’: it measures how much the demand elasticity itself changes with the price. When eta > 0, a firm that raises its price faces an increasingly elastic demand curve (loses customers rapidly), and one that lowers its price faces a less elastic curve (gains customers slowly). The estimated eta = 4.27 in the main sample is roughly half the value of 10 studied (but not estimated) in Klenow and Willis (2016) and larger than the approximately 2 used in Berger and Vavra (2019).
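As a numerical check on the approximation, the sketch below compares the exact GIR log demand with its second-order expansion and then inverts the two regression coefficients back into theta and eta. It ignores fixed effects, the demand shock, and the instrumenting step, so it illustrates only the mapping between coefficients and parameters, not the paper's estimator. Function names are illustrative.

```python
import math


def gir_log_q(p_hat, theta, eta):
    """Exact log demand under q = (1 - eta*log p)**(theta/eta), with log p = p_hat."""
    return (theta / eta) * math.log(1.0 - eta * p_hat)


def quadratic_log_q(p_hat, theta, eta):
    """Second-order approximation: b1*p_hat + b2*p_hat**2."""
    b1 = -theta
    b2 = -eta * theta / 2.0
    return b1 * p_hat + b2 * p_hat ** 2


def recover_parameters(b1, b2):
    """Invert the coefficients: theta = -b1, eta = 2*b2/b1."""
    return -b1, 2.0 * b2 / b1


theta, eta = 2.94, 4.27  # main-sample estimates quoted in the summary
b1, b2 = -theta, -eta * theta / 2.0
theta_hat, eta_hat = recover_parameters(b1, b2)

# The approximation is tight for small price deviations and loosens as
# eta * p_hat grows (the exact form requires eta * p_hat < 1).
for p_hat in (-0.05, 0.02, 0.05):
    exact = gir_log_q(p_hat, theta, eta)
    approx = quadratic_log_q(p_hat, theta, eta)
    print(p_hat, round(exact, 4), round(approx, 4))
```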

Q3. How does the paper distinguish the ‘volatility effect’ from the ‘uncertainty effect’ in the quantitative model?

Following Bloom (2009), the paper simulates two counterfactuals. The uncertainty effect holds shocks drawn from the low-dispersion distribution (s=1) but lets firms believe that the high-uncertainty state (s=2) has arrived; this isolates the precautionary wait-and-see channel. The volatility effect draws shocks from the high-dispersion distribution (s=2) but lets firms believe they are in the low-uncertainty state; this isolates the direct effect of realizing more extreme shocks on aggregate output. In the non-CES model, both effects are negative. The uncertainty effect is dominated by demand uncertainty because demand shocks directly affect desired input use one-for-one, so uncertainty about future demand creates strong incentives to pause investment. TFPQ uncertainty induces little wait-and-see behavior because the optimal scale adjustment to a TFPQ shock is only 23% of the shock magnitude (vs. 200% under CES). The volatility effect is dominated by TFPQ dispersion because realized TFPQ shocks generate markup dispersion via incomplete passthrough, creating misallocation. Under CES, the volatility effect from TFPQ is positive (OHA effect: convex output-productivity relationship); non-CES demand makes the output-productivity relationship concave for eta large enough, flipping the sign.

Q4. What mechanism makes TFPQ passthrough so low in both the data and the model?

Two mechanisms operate. First, non-CES demand itself: when eta > 0, raising price increases the demand elasticity, and lowering price decreases it. This means the benefit to revenue from a price cut (following a productivity gain that reduces costs) is muted because the firm gains fewer customers than under CES. The static optimal passthrough is theta/(theta + eta) = 3/(7.3) = 41%. Second, non-convex input adjustment costs further reduce passthrough by making firms reluctant to change their scale in response to TFPQ shocks. In the model, the investment threshold is nearly flat across a wide range of TFPQ values (shown in Figure 6, left panel), reflecting that optimal scale barely responds to productivity. Together these mechanisms reproduce TFPQ passthrough of 20-30% in model-simulated data vs. 10-24% in the actual data, both far below the CES benchmark of 100%. The paper also verifies that low passthrough persists in the subsample of flexible-price firm-years, ruling out sticky prices as the primary driver.
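The static passthrough figure is simple enough to verify directly. The sketch below takes the formula theta / (theta + eta) as stated in this summary (it is not re-derived here) and compares it with the CES case eta = 0 and with the empirical range quoted earlier. The function name is illustrative.

```python
def static_passthrough(theta, eta):
    """Static-optimal TFPQ passthrough as stated in the summary: theta/(theta+eta)."""
    return theta / (theta + eta)


theta, eta = 3.0, 4.3  # rounded estimates used in the quantitative model

non_ces = static_passthrough(theta, eta)  # real rigidity only
ces = static_passthrough(theta, 0.0)      # eta = 0 recovers full passthrough

# Empirical passthrough magnitudes quoted in the summary (first differences, OLS).
empirical_low, empirical_high = 0.097, 0.124

print(f"non-CES: {non_ces:.3f}, CES: {ces:.3f}")
print(f"gap left for adjustment costs: {non_ces - empirical_high:.3f} to {non_ces - empirical_low:.3f}")
```

On these numbers, real rigidity alone takes passthrough from 100% to about 41%, and the remaining distance to the 10-12% seen in the data is what the model assigns to input adjustment costs.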

Q5. Why does demand shock dispersion, rather than TFPQ dispersion, dominate the variance decompositions of sales and price growth?

The contribution of TFPQ dispersion to sales dispersion is (1-theta)^2 * beta_z^2 * Var(z). With beta_z = -0.097 and theta = 2.99, the TFPQ variance is scaled by approximately (1-2.99)^2 * (0.097)^2 = 4 * 0.0094 ≈ 0.04, so only about 4% of TFPQ variance propagates to sales variance. This small multiplier is driven by low TFPQ passthrough to prices (beta_z^2 ≈ 0.01); the price-to-sales elasticity (1 - theta ≈ -2, so roughly 4 when squared) is too modest to offset it. Demand shocks, by contrast, affect sales directly through the demand curve without a price intermediary: the contribution is ((1-theta)*beta_epsilon + 1)^2 * Var(epsilon). With beta_epsilon = 0.209 and theta = 2.99, the multiplier is ((1-2.99)*0.209 + 1)^2 = (1 - 0.416)^2 = 0.34, roughly eight to nine times larger than for TFPQ even though both shocks have similar variance. The cyclical increase is even more skewed toward demand because demand dispersion rises by 56% vs. 36% for TFPQ in 2009.
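The two variance multipliers can be computed side by side. The sketch below uses the formulas and parameter values quoted in the paragraph above and leaves out the price wedge and any covariance between shocks, so it covers only the loading on each shock's own variance. Function names are illustrative.

```python
def tfpq_sales_multiplier(theta, beta_z):
    """Share of TFPQ variance reaching sales variance: (1-theta)^2 * beta_z^2."""
    return (1.0 - theta) ** 2 * beta_z ** 2


def demand_sales_multiplier(theta, beta_eps):
    """Share of demand variance reaching sales variance: ((1-theta)*beta_eps + 1)^2."""
    return ((1.0 - theta) * beta_eps + 1.0) ** 2


theta, beta_z, beta_eps = 2.99, -0.097, 0.209  # values quoted in the summary

m_z = tfpq_sales_multiplier(theta, beta_z)
m_eps = demand_sales_multiplier(theta, beta_eps)

print(round(m_z, 3), round(m_eps, 3))  # 0.037 0.341
print(round(1.0 / m_z))                # shrink factor for TFPQ variance, about 27
```

The shrink factor of about 27 matches the "roughly 25" quoted in the overview once the multiplier is rounded to 0.04.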

Q6. How does the paper relate to TFPR dispersion, and what does it say about using TFPR as a sufficient statistic?

TFPR = p * z. For arbitrary passthrough, TFPR growth = beta_epsilon * delta_epsilon + (beta_z + 1) * delta_z. Because passthrough from both shocks is incomplete, TFPR growth reflects a mixture of both underlying shocks. The paper shows via a variance decomposition of TFPR that TFPQ is the main driver of TFPR growth dispersion—accounting for roughly 60% on average—because low passthrough means prices move little, leaving TFPQ changes to dominate TFPR. However, this finding obscures the importance of demand shocks for aggregate outcomes: demand dispersion is the dominant driver of sales growth dispersion and wait-and-see behavior, yet TFPR growth dispersion mostly reflects TFPQ. A researcher relying on TFPR dispersion to infer uncertainty would correctly detect productivity uncertainty but would miss the more cyclically important demand uncertainty channel.

Q7. How do the Oi-Hartman-Abel (OHA) and wait-and-see mechanisms work differently under non-CES vs. CES demand?

Under CES demand, sales of each firm are s = z^(theta-1) * exp(epsilon), and aggregate output is E[z^(theta-1)] which is convex in z, so a mean-preserving spread in TFPQ raises aggregate output (OHA effect). Under the estimated non-CES parameters (theta=3, eta=4.3), the approximate relationship yields output proportional to z^0.82, which is concave, so a mean-preserving spread in TFPQ reduces aggregate output. The mechanism is that under non-CES demand, TFPQ shocks pass through incompletely to prices and thus create markup dispersion: high-productivity firms have high markups, low-productivity firms have low markups, and the resulting misallocation reduces total output even relative to a social planner who would set p=mc. For wait-and-see: under CES, optimal input adjustment to a TFPQ shock equals (theta-1) times the shock, which is 200% for theta=3; under non-CES with eta=4.3, it is only (theta^2/(theta+eta) - 1) * shock = 0.233 * shock = 23%. This means firms adjust scale very little in response to TFPQ uncertainty, dampening the wait-and-see channel for TFPQ. TFPQ uncertainty then causes uncertainty about markups, which is costly but does not trigger large investment adjustments.
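The sign flip of the volatility effect is an application of Jensen's inequality, which a closed-form example makes concrete. The sketch below assumes, purely for illustration, that z is lognormal with E[z] = 1, so that raising sigma is a mean-preserving spread; the paper's model does not rely on this distributional assumption. The exponent 0.82 is taken as given from the summary, and the input-response formulas are those stated in the paragraph above.

```python
import math


def mean_of_power(a, sigma):
    """E[z**a] when log z ~ Normal(-sigma**2/2, sigma**2), which fixes E[z] = 1.

    For this lognormal, E[z**a] = exp(a*(a-1)*sigma**2/2): above 1 and rising
    in sigma when a > 1 (convex), below 1 and falling when 0 < a < 1 (concave).
    """
    return math.exp(a * (a - 1.0) * sigma ** 2 / 2.0)


theta, eta = 3.0, 4.3

ces_exponent = theta - 1.0  # output ~ z**2 under CES
non_ces_exponent = 0.82     # output ~ z**0.82, as quoted in the summary

for sigma in (0.1, 0.2):
    ces_output = mean_of_power(ces_exponent, sigma)
    non_ces_output = mean_of_power(non_ces_exponent, sigma)
    print(sigma, round(ces_output, 4), round(non_ces_output, 4))

# Optimal input response per unit of TFPQ shock, as stated in the summary.
ces_input_response = theta - 1.0                         # 2.0, i.e. 200%
non_ces_input_response = theta ** 2 / (theta + eta) - 1  # about 0.233, i.e. 23%
print(ces_input_response, round(non_ces_input_response, 3))
```

Doubling sigma raises mean output in the convex CES case and lowers it in the concave non-CES case, which is the OHA reversal described above.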

Q8. What role do adjustment costs play, and how robust are the results to the structure of those costs?

Non-convex adjustment costs on a composite firm-scale factor x = k^alpha * l^(1-alpha) create an inaction region: firms neither invest nor disinvest until shocks are sufficiently large. In the low-uncertainty state, the model generates a yearly inaction rate of 25.4% (higher than the roughly 15% seen in pre-crisis Swedish data). When uncertainty rises, the inaction region widens, the inaction rate jumps to 95% on impact, and firms let their scale shrink via depreciation. The baseline calibration uses the resale loss kappa = 0.3565 from Bloom et al. (2018). The paper also calibrates kappa to the Swedish inaction rate (kappa = 0.1165), which delivers qualitatively identical dynamics but a smaller-amplitude recession (1.7pp vs. 3.5pp output fall). The paper also solves a version with adjustment costs only on capital (as in Bachmann and Bayer, 2013): the wait-and-see effect is dampened but the qualitative results hold. Demand uncertainty still dominates TFPQ uncertainty in driving wait-and-see, and non-CES demand still reverses the sign of the OHA effect.

Q9. What is the role of the price wedge and time-varying passthrough?

The passthrough equation residual (price wedge, tau) captures price changes unexplained by TFPQ and demand shocks. It could reflect un-modeled shocks (e.g., financial constraints, as Gilchrist et al. (2017) document for Sweden), markup decisions, or measurement error. The price wedge makes a meaningful contribution to both average sales/price dispersion and to the rise in 2009. Time-varying passthrough is also documented: TFPQ passthrough is countercyclical (more negative in recessions), while demand passthrough is procyclical (falls in recessions when firms receive more extreme idiosyncratic demand shocks). Redoing the variance decomposition with year-by-year passthrough estimates makes demand’s contribution to sales dispersion in 2009 even larger, because firms adjust prices less to demand shocks during the recession, leaving more of the demand shock impact in sales.

Q10. What heterogeneity is documented across industries and firm types?

Sectoral demand elasticity estimates from the pooled 22-sector sample yield an average theta of 3.89 and median of 2.73 for the linear CES model; for the non-linear model, average theta is 3.26 and average eta is 7.42, with substantial positive skew. The median non-linear eta of 5.37 is larger than the pooled estimate of 4.27, indicating the pooled estimate is pulled down by some sectors with smaller deviations from CES. Key empirical results (greater cyclicality of demand dispersion, incomplete TFPQ passthrough) hold within each major sector and across balanced panels, the single-product subsample, and the CUPI price-index sample. Time-varying passthrough is also found to be systematically higher by about 25% in the post-2008 period compared to the pre-2008 period, suggesting a structural shift in how demand shocks transmit to prices, though the paper does not investigate the source of this change.

Q11. What robustness checks are run on the demand and passthrough estimates?

Demand estimation robustness: (1) piece-wise linear specification (elasticity of 2 below average price, 4 above average price, significant at 0.1% level); (2) balanced panel; (3) excluding the Great Recession; (4) using Statistics Sweden firm identifiers instead of authors’ own; (5) CUPI price index; (6) single-product firms; (7) sector-by-sector estimation; (8) including firm and sector-year fixed effects directly in the nonlinear regression (rather than pre-demeaning). All exercises confirm statistically significant eta and broadly similar theta. Passthrough robustness: (1) OLS vs. IV (lagged shocks) vs. first-differences; (2) balanced panel; (3) single-product subsample; (4) two-period lagged instruments (beta_z = -0.294, beta_epsilon = 0.249); (5) flexible-price subsample; (6) longer-horizon (two- and three-year) first differences for TFPQ. Corroboration: TFPQ innovations are positively associated with reported process innovations in Eurostat CIS data (7% greater TFPQ growth for process innovators); negative demand shocks are correlated with managers reporting ‘insufficient demand’ in KFI data (8% lower demand growth).

Q12. How does this paper differ from and relate to Bloom (2009) and Bloom et al. (2018)?

Bloom (2009) and Bloom et al. (2018) model a single composite firm-level shock (implicitly TFPR) in a CES-demand economy, finding that uncertainty shocks reduce output through wait-and-see behavior but generate a positive volatility effect (OHA) that partly offsets the uncertainty effect. The present paper adds two departures: (1) it separates TFPQ and demand shocks and shows they have distinct empirical and aggregate implications; (2) it replaces CES demand with an estimated non-CES demand curve. Departure (2) reverses the OHA effect, amplifying the total output decline by around 40% relative to the CES model. Departure (1) shows that the uncertainty channel operates primarily through demand, while TFPQ operates primarily through the volatility channel. The quantitative model uses the same non-convex adjustment cost structure and calibration approach as Bloom et al. (2018) to ensure comparability. The paper also relates to Bachmann and Bayer (2013) and Mongey and Williams (2017), who find smaller aggregate effects with adjustment costs only on capital; the present paper notes that adjustment costs on both capital and labor are needed for large wait-and-see effects, but qualitative conclusions are unchanged with capital-only costs.

Q13. What are the policy and theoretical implications of the findings?

First, policies aimed at reducing firm-level demand uncertainty (e.g., demand stabilization, aggregate demand management) have larger aggregate output effects than policies addressing productivity uncertainty, because demand uncertainty triggers wait-and-see investment behavior while TFPQ uncertainty is largely absorbed in markups without changing investment much. Second, TFPQ dispersion is still harmful but through misallocation: policies that reduce markup dispersion induced by productivity differentials can raise aggregate output without requiring reduced dispersion per se. Third, the finding that TFPR dispersion is a poor proxy for demand shock dispersion has implications for how researchers use TFPR as a measure of misallocation or uncertainty: it conflates two distinct forces with different aggregate implications. Fourth, the estimated super-elasticity provides a data-disciplined input for calibrating models with real rigidities, directly relevant for the Ball-Romer nominal non-neutrality question—higher real rigidities amplify the output effects of monetary policy shocks. The authors flag this as a natural extension. The scope conditions are: Swedish manufacturing, annual data 1998-2013, partial equilibrium model (aggregate price level exogenous), firms with matching price and utilization data (large-firm bias).

Q14. What additional findings are documented regarding the cyclicality of other firm-level variables?

Beyond TFPQ and demand dispersion, the paper documents that the dispersions of sales growth, price growth, labor, intermediate goods, and capacity utilization are all countercyclical. The IQR of sales growth was 58% above the non-recession average in 2009 and 9% above in 2001; the IQR of price growth was 83% above in 2009 and 5% above in 2001. The one notable exception is investment, which displays procyclical dispersion (less dispersed during the Great Recession). The paper also documents that roughly 30% of firms report insufficient demand at all their plants in the survey data; average capacity utilization is 88% with median 91% and standard deviation of 14.1%; and about 25% of firm-year observations involve utilization at or above 100%.

Key Concepts

Physical total factor productivity (TFPQ): Firm-level quantity productivity: output per unit of inputs, measured from a utilization-adjusted Cobb-Douglas value-added production function. Distinct from revenue TFP (TFPR = p*z) because it abstracts from demand conditions and price-setting. In this paper, TFPQ is estimated within firm over time using the cost-share approach and a capacity-utilization correction from managerial survey data.

Demand shock (epsilon): The idiosyncratic component of a firm’s demand curve that captures its ability to sell more (or fewer) units at a given price in a given year, reflecting changes in customer base size or customers’ willingness to pay. Estimated as the residual from the GIR demand curve after controlling for firm fixed effects, sector-time fixed effects, and the firm’s own price.

Non-CES demand curve / super-elasticity (eta): A demand specification adapted from Gopinath, Itskhoki, and Rigobon (2010) in which the demand elasticity is not constant but rises with the firm’s price. The parameter eta (estimated at 4.27 in the main sample) governs how fast the elasticity rises with the price: when eta > 0, firms gain few customers by cutting price (elasticity falls as price falls) and lose many customers by raising price (elasticity rises as price rises). This is the source of ‘real rigidity’ that makes incomplete TFPQ passthrough optimal.

Incomplete TFPQ passthrough: The empirical finding that firms reduce their prices by far less than one-for-one in response to a productivity gain (estimated beta_z = -0.097 to -0.124, far from the CES benchmark of -1). The paper attributes this primarily to non-CES demand real rigidity (which implies an optimal static passthrough of only 41% given the estimated parameters) and secondarily to adjustment costs.

Oi-Hartman-Abel (OHA) effect: The positive ‘volatility effect’ in standard CES-demand uncertainty models: because output is a convex function of TFPQ under CES, a mean-preserving spread in productivity raises aggregate output (lucky firms expand more than unlucky firms contract). The paper overturns this result by showing that with non-CES demand (eta sufficiently large), the output-productivity relationship becomes concave, so TFPQ dispersion reduces aggregate output via markup misallocation.
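
The convexity argument is an application of Jensen's inequality, and a small numerical sketch may make it concrete. The two response functions below (a square and a square root) are purely illustrative stand-ins for a convex and a concave output-productivity relationship; they are not the paper's estimated functional forms.

```python
from statistics import mean

def convex_output(z: float) -> float:
    """Stand-in for a convex output response to productivity (CES-like case)."""
    return z ** 2

def concave_output(z: float) -> float:
    """Stand-in for a concave output response (strong real rigidity case)."""
    return z ** 0.5

def mean_output(productivities, output_fn) -> float:
    """Average output across firms for a given productivity distribution."""
    return mean(output_fn(z) for z in productivities)

no_dispersion = [1.0, 1.0]   # every firm has productivity 1
spread = [0.5, 1.5]          # same mean productivity, more dispersion

for name, fn in [("convex", convex_output), ("concave", concave_output)]:
    before = mean_output(no_dispersion, fn)
    after = mean_output(spread, fn)
    print(f"{name}: mean output {before:.3f} -> {after:.3f}")
```

Under the convex stand-in the mean-preserving spread raises average output, and under the concave stand-in it lowers it, which is the sign flip described above.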

Wait-and-see channel: The mechanism by which uncertainty about future shocks causes firms with non-convex input adjustment costs to pause investment: firms prefer to remain inactive and let inputs depreciate rather than invest or disinvest, at the risk of having to pay an irreversibility cost if the shock turns out to have been in the opposite direction. In this paper, this channel is driven primarily by demand uncertainty because demand shocks determine how many units a firm can sell and hence its desired input level; TFPQ uncertainty does not trigger strong wait-and-see behavior because the optimal scale response to TFPQ shocks is small under non-CES demand.

Markup dispersion / misallocation: Dispersion across firms in the ratio of price to marginal cost, arising in this paper from incomplete TFPQ passthrough: firms with high productivity set high markups rather than passing through productivity gains as price cuts. The resulting wedge between prices and marginal costs means that resources are misallocated (too little output at high-productivity firms relative to the social optimum), reducing aggregate output. This is the channel through which TFPQ dispersion harms the aggregate economy in the model.

Price wedge (tau): The residual from the passthrough regression: the component of firm price changes unexplained by the estimated TFPQ and demand shocks. Interpreted as capturing un-modeled shocks (financial constraints, markup adjustments) and potentially measurement error. The price wedge makes a meaningful contribution to both average sales/price dispersion and to the Great Recession increase in dispersion.

Distributional Consequences of Becoming Climate-Neutral

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper investigates how the EU’s Fit-for-55 climate package will affect aggregate output and distribute its costs across the income distribution. The question matters because energy is a necessity good — poorer households devote a larger share of spending to energy — so policies that raise energy prices are regressive in their first-order incidence. Despite a large literature on the aggregate macroeconomics of the green transition, distributional consequences have received limited attention.

The authors build a parsimonious dynamic general-equilibrium model with two infinitely-lived households (rich and poor), a standard output-producing firm that treats energy as a complementary CES input alongside the capital-labor aggregate, and an energy-producing sector that combines a carbon-intensive brown technology with a carbon-free green technology as imperfect substitutes (CES with elasticity of substitution calibrated to 3 following Papageorgiou et al. 2017). The novel feature is Price Independent Generalized Linearity (PIGL) non-homothetic preferences following Boppart (2014), which generate nonlinear Engel curves: the poor agent’s energy expenditure share exceeds the rich agent’s, matching Eurostat Household Finance and Consumption Survey data (2015) showing the bottom income quintile has more than twice the energy expenditure share of the top quintile. The model targets an 18% energy expenditure share for the poor agent and 7.5% for the rich agent. The rich agent holds all financial wealth; the poor agent lives on labor income alone. The government taxes the brown technology and recycles revenue as a green-technology subsidy under a balanced budget, representing the ETS. Agents have perfect foresight. The paper simulates perfect-foresight transitions from an initial steady state to a new climate-neutral steady state, with the transition path endogenously determining the new steady state — a nonstandard feature arising from non-homothetic preferences.

In the baseline scenario (linear tax ramp over 25 years), achieving an 85% reduction in brown energy use requires a 168% tax on the brown technology. This drives the price of energy services up by 49%, GDP down by 9.3% in the new steady state, energy as a production input down by 10.9%, and capital input down by 9.3%, while the real wage falls by roughly 7% and the real interest rate is nearly unchanged (dropping by only 0.02 percentage points transiently). The welfare cost measured in expenditure-equivalent terms is a 10.8% loss for the rich agent and a 16.2% loss for the poor agent — the poor agent suffers approximately 50% more. To finance consumption during the transition the poor agent accumulates debt equal to 38.8% of annual income.

Results are highly sensitive to the brown-green substitution elasticity: raising it from 3 to 5 roughly halves the required tax (to 78.6%) and halves GDP losses (to 4.7%); lowering it to 2 roughly doubles the tax (to 354%) and GDP losses (to 17.7%). Non-homothetic preferences matter quantitatively: switching to homothetic preferences (while preserving different expenditure shares) shrinks aggregate GDP losses by 26% and eliminates nearly all distributional disparity, confirming that the non-homotheticity — not merely different expenditure levels — is the operative distributional mechanism. If the Fit-for-55 energy efficiency improvement target of 1.49% per year is simultaneously achieved, the required tax falls to 136%, the price of energy actually declines by 5.5%, and GDP rises by 1.1% in the new steady state, with the poor agent benefiting slightly more and accumulating assets (4% of annual income) rather than debt.

In depth

Q1. What is the core modeling and calibration strategy, and what are the main threats?

The paper is a quantitative theory exercise with no econometric identification. Calibration targets HFCS Eurostat data (2015) for energy expenditure shares by income quintile, the Papageorgiou et al. (2017) estimate of the brown-green substitution elasticity (ρE = 3), and stylized facts on wealth and income distribution from Krueger, Mitman, and Perri (2016). The main threat is parameter uncertainty around ρE, which the paper acknowledges is poorly identified empirically and to which the results are highly sensitive: the required tax and the GDP losses fall sharply as ρE rises. The sensitivity analysis explores ρE ∈ {2, 3, 5}, a range the paper concedes is narrow relative to the literature’s full dispersion.

Q2. What are the main mechanisms generating the distributional gap between rich and poor?

Three reinforcing channels: (1) Non-homothetic preferences give the poor agent a higher energy expenditure share (18% vs. 7.5%), so the 49% energy price increase hits the poor’s budget much harder as a share of income. (2) The poor agent cannot buffer the shock through wealth drawdowns (holding zero net assets initially), forcing it to accumulate debt of 38.8% of annual income. (3) Non-homothetic preferences alter the labor supply response: as expenditures fall, the poor agent’s labor supply declines less than the rich agent’s (the rich agent decreases labor supply by 0.2 percentage points more), reflecting that leisure is a luxury good in this preference system. The rich agent’s consumption of the consumption good drops sharply in the new steady state, even though the rich agent front-loads consumption at the announcement, when it immediately jumps 2% higher.

Q3. How are non-homothetic preferences distinguished empirically and in the model from simply having different expenditure shares?

Section 4.4 runs a counterfactual with homothetic preferences (ε = 0) but preserves identical initial expenditure shares for each agent (7.5% and 18%) by making ν agent-specific. Under homotheticity the expenditure shares do not vary with income as the transition unfolds. The comparison shows that GDP losses shrink by 26% (from 9.3% to 6.9%) and the distributional gap nearly vanishes — both agents experience almost identical welfare losses. This decomposition isolates the effect of non-homotheticity itself: it is the income-dependent adjustment of expenditure shares during the transition, not merely the different initial levels, that drives both larger aggregate losses and the distributional disparity.

Q4. What heterogeneity is documented and along what dimensions?

Heterogeneity is modeled along two dimensions: initial wealth (rich holds all assets; poor holds zero) and energy expenditure shares (18% for poor, 7.5% for rich) arising from non-homothetic preferences. The model produces no within-group heterogeneity by construction (two-agent framework). The paper documents the time paths of consumption, expenditures, expenditure equivalents, energy expenditure shares, and wealth shares for each agent separately along the transition, showing that both agents cut energy consumption by roughly 15% while the poor agent cuts consumption-good spending by substantially more than the rich agent.

Q5. What alternative transition timing paths are explored and what do they imply?

Three alternatives supplement the linear baseline: tax introduction 1 year, 12.5 years, and 25 years after the announcement. Key findings: (a) the required final tax rate is nearly insensitive to timing: the 25-year-delayed scenario requires 172% vs. 168% in the baseline; (b) because climate damages are excluded, delaying implementation is always welfare-superior, with the poor agent gaining close to 3.5 percentage points in expenditure-equivalent welfare from delaying to 25 years vs. implementing after 1 year; (c) gradual vs. immediate introduction yields similar welfare outcomes in the benchmark without adjustment costs, but with investment adjustment costs (χ = 10) a sudden implementation causes a brief sharp drop in the real interest rate without large quantity effects.

Q6. How does the GDP measure differ from aggregate output in the model?

GDP is defined to exclude the share of final output used as input into energy production. Aggregate output Y falls 7.3% in the new steady state, but GDP falls 9.3%. The gap (approximately 2 percentage points) reflects the increased resource cost of energy production under the green transition: because the brown and green technologies are imperfect substitutes, satisfying the emission reduction target requires devoting a larger share of final output to producing energy services, a real resource drain captured in the GDP definition but excluded from raw output Y.

Q7. What does the energy efficiency scenario imply, and what is its key caveat?

If energy efficiency improves at 1.49% per year over 25 years (a 45% cumulative gain in energy-producing-firm total factor productivity), the required tax falls to 136.3%, the price of energy declines by 5.5% (rather than rising 49%), and GDP rises 1.1% rather than falling 9.3%. The poor agent benefits more from the efficiency gains and accumulates assets worth 4% of annual income rather than debt. The critical caveat is that the efficiency improvement is modeled as purely exogenous and costless. The paper explicitly acknowledges that achieving these efficiency gains may require investment that is not modeled, so the results should be interpreted as an upper bound on the offsetting potential.
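
The cumulative figure follows from compounding the annual rate. A minimal check of that arithmetic, assuming the 1.49% gain compounds once per year over 25 years (the function name is illustrative and not from the paper):

```python
def cumulative_gain(annual_rate: float, years: int) -> float:
    """Cumulative proportional gain from compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years - 1.0

gain = cumulative_gain(0.0149, 25)
print(f"Cumulative efficiency gain after 25 years: {gain:.1%}")
```

The result lands close to the 45% cumulative gain quoted above.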

Q8. How does this paper relate to the closest existing literature?

Ascari et al. (2025) is the closest related paper (developed independently). Differences: (i) Ascari et al. use a Bewley-type incomplete-markets model generating heterogeneity through random discount factors, whereas this paper uses a two-agent complete-markets construct with exogenously fixed initial wealth; (ii) this paper allows endogenous labor supply, which increases short-run flexibility; (iii) this paper does not consider transfer schemes that would offset the distributional consequences. Results are described as broadly consistent. Fried, Novan, and Peterman (2018) and Boehl and Budianto (2024) use OLG models and find inequality implications but focus on inter-generational rather than intra-generational distributional effects.

Q9. What are the policy implications and their scope conditions?

The core implications are: (1) the Fit-for-55 emission tax alone is regressive — the poor bear a welfare loss 50% larger than the rich and end up with 38.8% of annual income in additional debt; (2) delaying tax implementation (with early announcement) is welfare-improving in the absence of climate damage modeling — the welfare difference is nearly 3.5 percentage points for the poor between fastest and latest implementation; (3) if energy efficiency targets are met exogenously, the transition is nearly costless and distributional concerns vanish; (4) the regressive result is conditional on the government recycling tax revenues to green-technology subsidies rather than to household transfers. All these implications are conditional on European economies where climate damages are plausibly small and the model abstracts from open-economy dynamics, endogenous technology, and within-income-group heterogeneity.

Q10. What robustness checks are reported?

Five robustness exercises are reported: (1) investment adjustment costs raised from χ = 0 to χ = 10 — minimal effect on welfare or quantities in the smooth baseline, though sudden tax introduction produces a brief interest-rate plunge; (2) homothetic preferences counterfactual while maintaining initial expenditure shares (Section 4.4); (3) elasticity of substitution between brown and green technology at ρE = 2 and ρE = 5 (Section 4.3, Table 2); (4) alternative transition timing (1 year, 12.5 years, 25 years post-announcement; Section 4.2); (5) simultaneous energy efficiency improvement of 1.49% per year (Section 4.5). A New Keynesian extension with Rotemberg price adjustment costs and a Taylor rule (Appendix B) is also provided for robustness on inflation dynamics.

Q11. What are the main caveats or limitations acknowledged by the authors?

Climate damages are excluded, so the paper understates the case for early action and cannot provide a full welfare comparison between acting early and acting late. Energy efficiency improvement is modeled as exogenous and costless, overstating the net gain from that channel. The two-agent framework abstracts from within-group heterogeneity and overlapping generations. Open-economy dynamics are not modeled; the brown-technology structure serves as a reduced form for energy imports but does not capture international price feedback. The elasticity of substitution between brown and green technology is uncertain, and the results are highly sensitive to this parameter. The model has no endogenous innovation or directed technical change, limiting applicability to long-run transition analysis.

Key Concepts

Non-homothetic PIGL preferences: Preferences of the Price Independent Generalized Linearity class (Boppart 2014) where energy expenditure shares depend on income level, making energy a necessity good (share declining in income) and consumption goods a luxury. Parameter ε ∈ (0,1) controls non-homotheticity; ε = 0 recovers homothetic preferences. The paper calibrates γ = 0.639 from CEX data, implying an elasticity of substitution between consumption and energy goods of approximately 0.4.
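
To illustrate how ε makes the energy share fall with income, the sketch below assumes the standard Boppart (2014) PIGL share formula, share = ν * (p_E/p_C)^γ * (e/p_C)^(-ε), with energy as the necessity. The expenditure levels (1 and 3) are hypothetical, and the ε and ν backed out from them are illustrative only; the paper's calibrated values are not reported in this summary.

```python
import math

def energy_share(expenditure, nu, epsilon, gamma=0.639,
                 price_energy=1.0, price_goods=1.0):
    """Energy expenditure share under an assumed Boppart-style PIGL form."""
    relative_price = price_energy / price_goods
    real_expenditure = expenditure / price_goods
    return nu * relative_price ** gamma * real_expenditure ** (-epsilon)

# Hypothetical expenditure levels for the two agents (not from the paper).
poor_expenditure, rich_expenditure = 1.0, 3.0

# Back out epsilon and nu so the shares hit the 18% and 7.5% targets.
epsilon = math.log(0.18 / 0.075) / math.log(rich_expenditure / poor_expenditure)
nu = 0.18 * poor_expenditure ** epsilon

print(f"implied epsilon = {epsilon:.3f}")
print(f"poor share = {energy_share(poor_expenditure, nu, epsilon):.3f}")
print(f"rich share = {energy_share(rich_expenditure, nu, epsilon):.3f}")
```

Setting epsilon to 0 in the same function makes the share independent of expenditure, which is why a homothetic counterfactual needs an agent-specific ν to keep the two initial shares apart.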

Brown vs. green technology: Two imperfectly substitutable technologies for producing energy services within the model’s energy sector. The brown technology converts units of final output into energy services using a carbon-intensive (emission-producing) process; the green technology is emission-free. They enter a CES aggregator for energy production with elasticity ρE calibrated to 3. Imperfect substitutability means the green transition raises the cost of energy services even with subsidies to green technology.
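
A partial-equilibrium sketch can show why the substitution elasticity matters so much. It assumes a textbook CES aggregator E = [ω B^((ρ-1)/ρ) + (1-ω) G^((ρ-1)/ρ)]^(ρ/(ρ-1)) with a hypothetical weight ω = 0.5, and it ignores the green subsidy and general-equilibrium feedback, so it does not reproduce the paper's numbers.

```python
def ces_energy(brown, green, rho, weight_brown=0.5):
    """Energy services from brown and green inputs under a CES aggregator."""
    power = (rho - 1.0) / rho
    inner = weight_brown * brown ** power + (1.0 - weight_brown) * green ** power
    return inner ** (1.0 / power)

def brown_green_ratio(price_brown, price_green, rho, weight_brown=0.5):
    """Cost-minimizing brown/green input ratio implied by the CES first-order condition."""
    weight_ratio = weight_brown / (1.0 - weight_brown)
    return (weight_ratio * price_green / price_brown) ** rho

# Double the relative price of the brown input and compare elasticities.
for rho in (2.0, 3.0, 5.0):
    ratio = brown_green_ratio(price_brown=2.0, price_green=1.0, rho=rho)
    print(f"rho = {rho:.0f}: brown/green ratio falls from 1 to {ratio:.4f}")
```

The same relative price change displaces far more brown input when ρE is high, so a smaller tax is enough to hit a given emission target.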

Expenditure equivalent loss: The welfare metric used in the paper: the percentage change in expenditures in the initial steady state (without any tax) that would make an agent indifferent between remaining in the initial steady state and living through the actual transition path. Defined implicitly by equating flow utility at scaled initial expenditures to flow utility along the transition. Baseline results: -10.8% for the rich agent and -16.2% for the poor agent.

Tax on the brown technology: The policy instrument modeled as capturing the essence of EU ETS and national carbon schemes. It raises the unit cost of the emission-intensive energy input; revenue is recycled as a subsidy to the green technology within a balanced government budget rather than distributed to households. A 168% tax achieves the 85% emission reduction target in the baseline, implying fossil fuel prices nearly triple.

Endogenous final steady state: The model’s new steady state after the green transition is not predetermined; it depends on the wealth distribution that emerges endogenously during the transition. Because markets are complete and preferences are non-homothetic, different transition paths generate different terminal wealth distributions and therefore different aggregate outcomes in the new steady state. This prevents backward solution and requires a fully nonlinear transition path solver.

Energy expenditure share by income quintile: The empirical regularity, documented from Eurostat HFCS data (2015), that the bottom income quintile devotes more than twice the fraction of disposable income to energy (electricity, gas, fuels for personal transport) as the top quintile. This fact calibrates the non-homotheticity of preferences (targeting 18% for the poor agent and 7.5% for the rich agent) and motivates the paper’s focus on distributional consequences.

Elasticity of substitution between brown and green technology (ρE): The key production-side parameter governing how easily the energy sector can switch from fossil-fuel to clean inputs. Calibrated to ρE = 3 from Papageorgiou et al. (2017). Results are highly sensitive to this parameter and move inversely with it: ρE = 5 roughly halves and ρE = 2 roughly doubles the required tax, GDP losses, and welfare costs. The paper identifies this as the dominant source of quantitative uncertainty.

Forecasting with Feedback

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper develops a strategic model of point forecast production in environments where the forecast itself influences the outcome being predicted — what the authors call “forecasting with feedback.” The canonical example is Federal Reserve staff (Greenbook) inflation forecasts: these forecasts guide FOMC interest rate decisions, and those rate decisions in turn affect realized inflation. The central theoretical claim, proved formally, is that even a forecaster with purely quadratic (mean-squared-error) loss will optimally produce biased forecasts in such environments, provided there is some uncertainty about how strongly the decision maker (DM) will react to the forecast. This finding offers a third interpretation of observed forecast biases — beyond the two dominant explanations in the prior literature, namely forecaster irrationality and asymmetric loss functions.

The model has three components. First, an outcome equation: y_{t+1} = theta_t + a_t + epsilon_{t+1}, where theta_t is a private signal (the state of the economy) observed only by the forecaster, a_t is the DM’s action, and epsilon_{t+1} is unforecastable noise. Second, a DM reaction function: a_t = x_t * [y_T - E(theta_t | f_t)], analogous to a Taylor rule, where y_T is a known target, and x_t is a strength-of-reaction multiplier drawn from a distribution with mean mu and variance tau^2; x_t is the DM’s private information. Third, the forecaster minimizes expected squared error, anticipating the DM’s endogenous response. The model is linear and closed-form solutions are derived.

The key mechanism is a bias-variance tradeoff. Because the DM’s action responds to the forecast, the variance of the realized outcome itself becomes a function of the forecast. When the DM’s reaction strength x_t is uncertain (tau^2 > 0), this variance-of-outcome term is not trivially minimized by an unbiased forecast. The forecaster reduces outcome volatility by attenuating the sensitivity of the forecast to the state — shrinking the forecast slope toward zero relative to what an unbiased forecast would require — at the cost of introducing systematic bias. When tau^2 = 0 (no uncertainty about the DM’s reaction), the forecaster can perfectly anticipate and correct for the DM’s response, and the optimal forecast is unbiased. Feedback alone, without uncertainty, does not produce bias.

The paper derives equilibrium forecasts in a Perfect Bayesian Equilibrium where the DM holds correct (rational) beliefs about the forecasting rule. Key analytical results include: (i) the equilibrium exists when tau^2 <= 1/4; (ii) the equilibrium conditional bias equals [(1 - sqrt(1 - 4*tau^2))/2] * (theta_t - y_T), which changes sign depending on whether the state is above or below the target — the forecaster gravitates toward the target; (iii) the Mincer-Zarnowitz (MZ) regression slope (the slope from regressing realized outcomes on forecasts) can be large and positive, close to zero, or even negative, depending on mu and tau^2; (iv) when mu = 1 (the DM on average fully closes the gap to the target), the equilibrium MZ slope is exactly zero for any tau^2 value.

The paper motivates these results with two documented empirical patterns in Greenbook 4-quarter-ahead inflation forecasts from 1980q1 to 2019q4. First, using 40-quarter rolling windows, bias in Greenbook forecasts is persistent but sign-changing over time — a pattern consistent with the model’s prediction that the sign of bias tracks whether the state theta_t is above or below the inflation target y_T. Second, the MZ slope (from 40-quarter rolling-window regressions) hovers near unity in the mid-1980s through early 1990s, returns to unity by the late 1990s, then drops sharply to significantly negative territory by the mid-2000s, before becoming indistinguishable from zero in the final portion of the sample — a pattern consistent with the model’s prediction that the MZ slope shifts radically with changes in mu and tau^2. Both facts are computed using the last revision of the GDP deflator.

The policy and methodological implications are significant. Standard forecast rationality tests (Mincer-Zarnowitz regressions, bias tests) are designed to detect irrationality or asymmetric loss, but in feedback environments these same test statistics can indicate “failure” even when the forecaster is fully rational under quadratic loss. Studies conducting rationality tests or estimating loss functions must either explicitly assume away feedback (and justify that assumption) or account for the feedback mechanism.

In depth

Q1. What is the identification strategy, and what are the main threats to identification?

The paper is primarily theoretical: it derives closed-form equilibrium forecasting rules and forecast statistics from first principles within a stylized game-theoretic model. There is no econometric identification exercise. The Greenbook evidence is descriptive and motivational — rolling-window bias estimates and MZ slope estimates are presented as stylized facts consistent with the theory, not as causal identification. The main caveat the authors themselves make is that the model is not claimed to be an exclusive or exhaustive explanation of the documented GB forecast patterns. Inflation forecasting is complex, and many other factors (learning, structural breaks, regime changes in monetary policy, data revisions) could contribute to the observed patterns. The authors explicitly disclaim any claim to exclusivity.

Q2. What is the core mathematical mechanism, and how does uncertainty play a necessary role?

The forecaster’s MSE decomposes into a conditional variance term and a squared-bias term: MSE = Var[a*(f_t) | theta_t] + bias^2(f_t | theta_t) + sigma^2. The critical insight is that when x_t (the reaction-strength multiplier) is uncertain, the variance of the DM’s action, and hence of the outcome, depends on the level of the forecast itself. Specifically, Var[a*(f_t) | theta_t] = tau^2 * (y_T - f_t/c + b/c)^2, where b and c are the intercept and slope of the linear forecasting rule f_t = b + c*theta_t that the DM conjectures. So choosing a larger or smaller forecast changes not just the bias term but also the variance term. The optimal resolution of this tradeoff requires an attenuated (biased) forecast slope. When tau^2 = 0 (no uncertainty), the variance term vanishes entirely and the forecaster can correct for feedback in full by solving a fixed-point problem, producing an unbiased forecast. The paper explicitly proves (taking limits as tau^2 goes to 0 in the bias and MZ slope formulas) that the bias returns to zero and the MZ slope returns to one, confirming that uncertainty is a necessary condition for bias.
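
The tradeoff can be worked through from the stated outcome equation and reaction function. The derivation below is a reconstruction under those equations, not the paper's own proof; the shorthand g (the state the DM infers from the forecast) and s = mu + c are introduced here for convenience.

```latex
\begin{aligned}
g &\equiv \frac{f_t - b}{c}, \qquad
\mathrm{MSE}(f_t) = \tau^2 (y_T - g)^2
  + \bigl[\theta_t + \mu (y_T - g) - f_t\bigr]^2 + \sigma^2, \\
0 &= \frac{\partial\,\mathrm{MSE}}{\partial f_t}
   = -\frac{2\tau^2}{c}\,(y_T - g)
     - 2\Bigl(\frac{\mu}{c} + 1\Bigr)\bigl[\theta_t + \mu (y_T - g) - f_t\bigr], \\
\text{bias} &= \theta_t + \mu (y_T - \theta_t) - f_t
   = \frac{\tau^2}{\mu + c}\,(\theta_t - y_T)
   \qquad \text{(imposing } g = \theta_t \text{ in equilibrium)}, \\
\text{bias} &= (1 - \mu - c)\,\theta_t + \mu\, y_T - b
   \;\Longrightarrow\; (1 - s)\, s = \tau^2, \qquad s \equiv \mu + c, \\
s &= \frac{1 \pm \sqrt{1 - 4\tau^2}}{2}, \qquad
1 - s = \frac{1 \mp \sqrt{1 - 4\tau^2}}{2}.
\end{aligned}
```

Under this reconstruction the roots are real only when tau^2 <= 1/4, and at tau^2 = 0 the root s = 1 gives a zero bias coefficient, which matches the statement that feedback without uncertainty produces no bias.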

Q3. What is the equilibrium concept and what are its properties?

The equilibrium is a linear Perfect Bayesian Equilibrium (PBE). The DM conjectures that the forecast is a linear function f_t = b + c*theta_t, uses that conjecture to form expectations E(theta_t | f_t) = (f_t - b)/c, and chooses her action optimally. Equilibrium requires that the DM’s conjectured intercept and slope (b, c) coincide with those actually used by the forecaster. The paper shows (Corollary 1) that such a linear PBE exists when tau^2 <= 1/4, and that the equilibrium is fully revealing — the DM can learn the true state theta_t from the forecast because the forecast is a one-to-one function of the state. Two linear equilibria exist: the paper focuses on the Pareto-preferred one (lower forecaster loss, lower absolute bias), which is also the one whose limit as tau^2 approaches 0 corresponds to the natural optimal forecast.

Q4. What sign and magnitude patterns does the equilibrium bias exhibit?

From Corollary 2(a), the conditional equilibrium bias is: E(y_{t+1} - f_t^dagger | theta_t) = [(1 - sqrt(1 - 4*tau^2)) / 2] * (theta_t - y_T). The multiplier (1 - sqrt(1 - 4*tau^2))/2 is always positive (for tau^2 in (0, 1/4]), so the sign of the bias is determined entirely by the sign of (theta_t - y_T). When theta_t > y_T (state above target), bias is positive: the forecaster underpredicts, shrinking the forecast toward the target. When theta_t < y_T, bias is negative: the forecaster overpredicts, again gravitating toward the target. This sign-change mechanism, driven by changing economic conditions relative to a fixed target, is cited as consistent with the persistent but sign-changing bias observed in Greenbook inflation forecasts from 1980 to 2019.
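
The bias formula is simple enough to evaluate directly. The sketch below only transcribes the expression stated above; the function names and the example values (a target of 2 with states of 3 and 1) are illustrative.

```python
import math

def bias_multiplier(tau_sq: float) -> float:
    """Multiplier (1 - sqrt(1 - 4*tau^2)) / 2, defined for 0 <= tau^2 <= 1/4."""
    if not 0.0 <= tau_sq <= 0.25:
        raise ValueError("tau_sq must lie in [0, 1/4] for the equilibrium to exist")
    return (1.0 - math.sqrt(1.0 - 4.0 * tau_sq)) / 2.0

def conditional_bias(theta: float, target: float, tau_sq: float) -> float:
    """Expected forecast error E(y - f | theta) in equilibrium."""
    return bias_multiplier(tau_sq) * (theta - target)

for tau_sq in (0.0, 0.05, 0.10, 0.25):
    above = conditional_bias(theta=3.0, target=2.0, tau_sq=tau_sq)
    below = conditional_bias(theta=1.0, target=2.0, tau_sq=tau_sq)
    print(f"tau^2 = {tau_sq:.2f}: bias above target {above:+.4f}, below target {below:+.4f}")
```

The multiplier is zero at tau^2 = 0 and reaches one half at the existence bound tau^2 = 1/4, with the sign of the bias flipping as the state crosses the target.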

Q5. What does the model predict about the Mincer-Zarnowitz slope, and how variable can it be?

From Corollary 2(b), the MZ slope in equilibrium is a highly nonlinear function of mu and tau^2. Figure 3 in the paper (discussed in the text) shows that the slope can be large and positive, positive but close to zero, negative, or even very steeply negative, for different combinations of mu and tau^2. A key special case: when mu = 1 (DM fully closes the gap to target on average), E(y_{t+1} | f_t^dagger) = y_T for all values of the forecast, giving an MZ slope of exactly zero and intercept equal to y_T. The authors note that when mu is close to 1 and tau^2 is small, even small deviations of mu from unity can produce large positive or negative MZ slopes. The model can thus account for the dramatic shift in the GB MZ slope documented in the paper — from around unity in the 1980s-1990s, to significantly negative territory in the mid-2000s, to approximately zero thereafter.
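
A small Monte Carlo sketch can illustrate how the MZ slope moves with mu. Everything beyond the outcome equation and reaction function is an assumption: the intercept and slope of the forecast rule are derived here from those equations (taking the root consistent with the bias formula) rather than quoted from the paper, and the Gaussian draws for theta_t, x_t and the noise are hypothetical choices. Under this derived rule the population slope works out to (1 - mu)/c.

```python
import math
import random

def equilibrium_rule(mu, tau_sq, target):
    """Intercept b and slope c of f = b + c*theta (derived here, not quoted)."""
    s = (1.0 + math.sqrt(1.0 - 4.0 * tau_sq)) / 2.0
    c = s - mu                      # undefined for the regression if c == 0
    b = target * (mu + 1.0 - s)
    return b, c

def simulated_mz_slope(mu, tau_sq, target=2.0, sigma=0.5, n=20000, seed=0):
    """OLS slope from regressing simulated outcomes on equilibrium forecasts."""
    rng = random.Random(seed)
    b, c = equilibrium_rule(mu, tau_sq, target)
    forecasts, outcomes = [], []
    for _ in range(n):
        theta = rng.gauss(target, 1.0)           # private state (assumed Gaussian)
        x = rng.gauss(mu, math.sqrt(tau_sq))     # DM reaction strength (assumed Gaussian)
        f = b + c * theta
        inferred_theta = (f - b) / c             # fully revealing: recovers theta
        a = x * (target - inferred_theta)        # DM reaction function
        y = theta + a + rng.gauss(0.0, sigma)    # outcome equation
        forecasts.append(f)
        outcomes.append(y)
    f_bar = sum(forecasts) / n
    y_bar = sum(outcomes) / n
    cov = sum((f - f_bar) * (y - y_bar) for f, y in zip(forecasts, outcomes))
    var = sum((f - f_bar) ** 2 for f in forecasts)
    return cov / var

for mu in (0.1, 0.5, 1.0):
    print(f"mu = {mu:.1f}: simulated MZ slope = {simulated_mz_slope(mu, 0.1):.3f}")
```

With mu = 1 the simulated slope should sit near zero, in line with the special case described above, while smaller mu gives clearly positive slopes in this parameterization.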

Q6. What is the relationship between the DM’s reaction function and the Taylor rule, and how is it microfounded?

The DM’s reaction function is a_t* = x_t * [y_T - E(theta_t | f_t)], directly analogous in spirit to a Taylor rule (Taylor, 1993). Online Appendix A provides a formal microfoundation: if the DM minimizes a quadratic loss in (y_{t+1} - y_T)^2 plus a quadratic adjustment cost w_t * a_t^2 — where w_t is a private, randomly drawn adjustment cost parameter — then the optimal action is precisely a_t* = x_t * [y_T - E(theta_t | f_t)] with x_t = 1/(1 + w_t). This microfoundation connects the model to the literature on central bank optimal control and provides a rational justification for the reaction function structure used throughout the paper.
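
The closed form can be checked numerically. The sketch below assumes the DM's expected loss reduces to (E(theta_t | f_t) + a_t - y_T)^2 + w_t * a_t^2 once constant variance terms are dropped; the example values are illustrative.

```python
def dm_loss(action, expected_state, target, adjustment_cost):
    """Expected quadratic loss, omitting variance terms that do not depend on the action."""
    return (expected_state + action - target) ** 2 + adjustment_cost * action ** 2

def optimal_action(expected_state, target, adjustment_cost):
    """Closed form a* = x * (y_T - E(theta | f)) with x = 1 / (1 + w)."""
    reaction_strength = 1.0 / (1.0 + adjustment_cost)
    return reaction_strength * (target - expected_state)

expected_state, target, adjustment_cost = 3.0, 2.0, 1.0
best = optimal_action(expected_state, target, adjustment_cost)

# Small perturbations around the closed form should only raise the loss.
for step in (-0.01, 0.0, 0.01):
    loss = dm_loss(best + step, expected_state, target, adjustment_cost)
    print(f"action = {best + step:+.2f}: loss = {loss:.6f}")
```

A larger adjustment cost w_t lowers x_t toward zero, so randomness in w_t is one way to generate the uncertain reaction strength that the model needs.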

Q7. How does this paper relate to and differ from the Crawford-Sobel (1982) cheap talk model?

The paper borrows the sender-receiver communication game structure from Crawford and Sobel (1982), with the forecaster as sender and the DM as receiver. However, it departs in two important ways. First, in Crawford-Sobel, the sender’s payoff depends only on the state and the action, not directly on the message (the forecast). In this paper, the forecast enters the forecaster’s loss function directly through the outcome equation (y = theta + a + epsilon, and the forecast determines a which determines y which enters the loss), making it a model of ‘costly talk’ in the sense of Kartik, Ottaviani, and Squintani (2007). Second, in standard communication games the realized outcome is exogenous — the DM’s action affects only her own payoff but not the variable being forecast. Here, the DM’s action causally determines the realized outcome that the forecaster was trying to predict. This feedback causality is absent in the standard setup and is the source of the paper’s novel results.

Q8. How does this paper relate to Bernanke and Woodford (1997)?

Bernanke and Woodford (1997) also study professional inflation forecasts and monetary policy in a rational expectations equilibrium framework, and raise the question of whether an informative equilibrium exists, concluding it may not. This paper differs in three respects: it assumes the forecaster has private information (state theta_t) that the DM cannot directly observe; it works in an environment with uncertainty about the DM’s reaction (x_t is random); and rather than focusing on equilibrium existence, it derives the statistical properties of equilibrium forecasts (the bias formula and the MZ regression coefficients), which Bernanke and Woodford do not. The authors describe their work as providing ‘the first formal treatment of the statistical properties of forecasts’ in feedback environments.

Q9. What heterogeneity and parameter sensitivity is documented?

The paper documents sensitivity of forecast properties to mu (mean policy reaction strength) and tau^2 (variance of policy reaction strength). The DM’s average aggressiveness mu affects both the sign and magnitude of the MZ slope: for cautious DMs (mu near 0.1), the equilibrium MZ slope is relatively close to unity; for aggressive DMs (mu near 1), the slope can flatten toward zero; for moderate but increasing mu (with tau^2 above a threshold of approximately 0.05), the slope flattens monotonically. A higher tau^2 at given mu generally attenuates the slope toward zero, but the relationship is nonlinear. When mu is precisely one, the MZ slope is exactly zero regardless of tau^2. The equilibrium bias magnitude scales with [(1 - sqrt(1 - 4*tau^2))/2], which increases in tau^2. The sign of bias is determined by the direction of (theta_t - y_T). The paper does not present cross-sectional or time-series panel heterogeneity — the parametric sensitivity analysis in Figure 3 constitutes the heterogeneity exercise.
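The two closed-form expressions quoted in this summary, the equilibrium slope c^dagger from the Key Concepts entry and the bias factor above, can be tabulated for a few parameter values. This is a minimal sketch: the function names and the grid of mu and tau^2 values are chosen here for illustration and do not reproduce the paper's Figure 3.

```python
import math


def bias_factor(tau2):
    """Term the bias magnitude scales with: (1 - sqrt(1 - 4 * tau^2)) / 2."""
    if not 0.0 <= tau2 <= 0.25:
        raise ValueError("requires 0 <= tau^2 <= 1/4")
    return (1.0 - math.sqrt(1.0 - 4.0 * tau2)) / 2.0


def c_dagger(mu, tau2):
    """Equilibrium forecast slope: 1/2 - mu + sqrt(1 - 4 * tau^2) / 2."""
    if not 0.0 <= tau2 <= 0.25:
        raise ValueError("requires 0 <= tau^2 <= 1/4")
    return 0.5 - mu + math.sqrt(1.0 - 4.0 * tau2) / 2.0


# Illustrative grid (assumed values, not the paper's calibration).
for mu in (0.1, 0.5, 1.0):
    for tau2 in (0.0, 0.05, 0.25):
        print(f"mu={mu:.2f} tau2={tau2:.2f} "
              f"c_dagger={c_dagger(mu, tau2):+.3f} "
              f"bias_factor={bias_factor(tau2):.3f}")
```

Algebraically, c^dagger equals (1 - mu) minus the bias factor, so a larger tau^2 pulls the slope further below its no-uncertainty value of 1 - mu.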

Q10. What robustness checks are run for the Greenbook empirical patterns?

The authors state (in a footnote) that the documented patterns — persistent but sign-changing bias in 4-quarter-ahead GB inflation forecasts from 1980q1 to 2019q4 — are robust to using the second release of the GDP deflator rather than the last release. The main results use the last release. The choice of 40-quarter (10-year) rolling window is applied uniformly for both the bias plot and the MZ slope plot. No additional robustness checks (alternative window lengths, alternative forecast horizons, formal structural break tests) are explicitly documented in the paper, though the authors cite Rossi and Sekhposyan (2016), who use formal rationality tests and confirm that GB forecast rationality breaks down around 2005 — consistent with the pattern the authors document via the rolling MZ slope.
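The rolling-window bias statistic described here can be sketched as a moving average of forecast errors. The example assumes one observation per quarter over 1980q1 to 2019q4 (160 quarters) and uses made-up errors purely to show the mechanics; it does not use Greenbook data, and the function name is hypothetical.

```python
def rolling_mean(values, window=40):
    """Mean of each consecutive `window`-length slice of `values`."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide the window forward by one observation.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


# Hypothetical forecast errors (outcome minus forecast), one per quarter:
# positive in the first half of the sample, negative in the second half.
errors = [1.0] * 80 + [-1.0] * 80
rolling_bias = rolling_mean(errors, window=40)
print(len(rolling_bias), rolling_bias[0], rolling_bias[-1])
```

A 160-quarter sample yields 160 - 40 + 1 = 121 overlapping windows, and a bias that is persistent within windows can still change sign across them.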

Q11. What does the model say about the forecaster’s inability to commit, and could commitment help?

In the baseline model, the forecaster cannot commit to a fixed forecasting rule ex ante because the state theta_t is not directly observable by the DM. The authors note in Section 3.3 that modeling forecasters with commitment is a straightforward extension, and that commitment can actually increase forecaster welfare in equilibrium. However, this extension is not formally developed in the paper. The intuition is that if the forecaster could credibly commit to a more informative forecast rule, the DM could react more precisely, reducing the variance of outcomes; but without commitment, the strategic equilibrium involves an attenuated (biased) forecast.

Q12. What are the implications for forecast rationality tests and loss function estimation?

The paper’s central methodological warning is that standard forecast rationality tests (MZ regression tests for zero intercept and unit slope; bias tests) and loss function estimation exercises are contaminated in environments with policy feedback. If feedback is present and x_t is uncertain, a fully rational forecaster with quadratic loss will produce forecasts that fail standard rationality tests — showing nonzero bias, non-unit MZ slopes (potentially even negative), and forecast errors correlated with the forecaster’s own information. Researchers conducting such tests must either: (a) explicitly assume no feedback applies (and justify this assumption in their specific application), or (b) carefully model the feedback mechanism and account for it. Studies that interpret GB forecast irrationality (e.g., Rossi and Sekhposyan 2016) or asymmetric loss (e.g., Capistran 2008) as the explanation for observed GB forecast properties may be confounded by the feedback mechanism identified in this paper.

Q13. What are the conditions under which a linear equilibrium does or does not exist?

From Corollary 1 and Remark 3 following it: a linear PBE exists if and only if tau^2 <= 1/4. When tau^2 > 1/4, the forecaster always wants to attenuate the slope more than the DM expects, so no fixed-point equilibrium in linear strategies exists. The paper also notes a sufficient condition for equilibrium existence: if the support of x_t is contained in [0, 1] (the DM never overreacts, x_t <= 1, and never responds in the wrong direction, x_t >= 0), then tau^2 <= 1/4 is automatically satisfied, because a random variable confined to an interval of length one has variance of at most 1/4, and an equilibrium always exists. Two linear equilibria exist when tau^2 <= 1/4, but the paper focuses on the Pareto-preferred one, which has lower forecaster loss, lower absolute bias, and a natural limiting behavior as tau^2 approaches 0.
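The sufficient condition can be illustrated with two-point distributions for x_t. The sketch below is an assumption-laden example: the distributions are invented for illustration, and the existence check simply encodes the inequality tau^2 <= 1/4 quoted above.

```python
def linear_equilibrium_exists(tau2):
    """Existence condition quoted in the text: tau^2 <= 1/4."""
    return tau2 <= 0.25


def discrete_variance(values, probs):
    """Variance of a discrete random variable with the given support and probabilities."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


# Support inside [0, 1]: the equal-weight two-point law on {0, 1}
# attains the largest possible variance, exactly 1/4.
inside = discrete_variance([0.0, 1.0], [0.5, 0.5])

# Support extending beyond [0, 1] (the DM sometimes overreacts): the variance can exceed 1/4.
outside = discrete_variance([0.0, 1.2], [0.5, 0.5])

print(inside, linear_equilibrium_exists(inside))
print(outside, linear_equilibrium_exists(outside))
```

The first case sits exactly on the boundary, which is why support in [0, 1] is sufficient but leaves no slack in the worst case.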

Q14. What scope conditions limit the applicability of the results?

Several scope conditions are made explicit: (1) The outcome equation is linear; nonlinear outcome determination would change quantitative results but the feedback mechanism would persist qualitatively. (2) The model is a single-period (point-in-time) game, not a multi-period learning model — it does not analyze how beliefs about mu and tau^2 evolve over time. (3) The independence assumption between x_t and theta_t is a benchmark; if policy aggressiveness varies with economic conditions, additional effects arise. (4) The focus on linear equilibria rules out non-linear forecasting strategies. (5) The results apply to unconditional forecasts (where the forecaster anticipates the DM’s response); conditional forecasts (conditioned on a pre-specified action) behave differently. (6) The empirical Greenbook evidence is illustrative, not a formal test of the model — the authors explicitly state they do not claim their model provides an exclusive explanation of GB forecast properties.

Key Concepts

Forecasting with feedback: A forecasting environment in which the DM’s action — taken in response to the forecast — causally affects the realized value of the variable being forecast, so that the forecast influences its own target outcome. Distinguished from no-feedback environments (e.g., weather forecasting) where decisions made on the basis of the forecast do not affect the outcome.

Unconditional forecast: A forecast that anticipates and factors in the expected response of the decision maker to the forecast itself, rather than being conditioned on a pre-specified (potentially counterfactual) action. The paper’s model produces unconditional forecasts; conditional forecasts (conditioned on a given policy path) are a distinct and narrower concept.

Bias-variance tradeoff (in feedback forecasting): The tradeoff that arises when the DM’s reaction to the forecast is uncertain: a less informative (attenuated) forecast reduces the variance of the outcome (by inducing a less volatile policy action) but introduces systematic bias. The optimal forecast under quadratic loss resolves this tradeoff by attenuating the forecast slope below what an unbiased forecast would require, producing an optimally biased forecast.

Reaction function (DM’s): The rule by which the decision maker translates a forecast into a policy action: a_t* = x_t * [y_T - E(theta_t | f_t)], analogous to a Taylor rule. The multiplier x_t captures the strength of the policy response and is drawn from a distribution with mean mu and variance tau^2; it is the DM’s private information and a key source of the forecaster’s uncertainty.

Mincer-Zarnowitz (MZ) regression: The linear regression of the realized outcome on the forecast: y_{t+1} = alpha + beta * f_t + error. Under the canonical null of rational forecasting with quadratic loss and no feedback, the intercept alpha should be zero and the slope beta should be one. The paper shows that under optimal forecasting with feedback, alpha and beta can take a wide range of values, including negative beta, even when the forecaster is rational.
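For reference, the no-feedback benchmark can be sketched with a plain OLS fit. The simulated data below are an assumption made for illustration (outcome equals forecast plus independent noise); the sketch does not simulate the paper's feedback equilibrium.

```python
import random


def mz_regression(outcomes, forecasts):
    """OLS of outcomes on a constant and forecasts; returns (alpha, beta)."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_y = sum(outcomes) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(forecasts, outcomes))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_f
    return alpha, beta


# Hypothetical no-feedback world: the forecast is the conditional mean of the outcome.
rng = random.Random(0)
forecasts = [rng.gauss(2.0, 1.0) for _ in range(5000)]
outcomes = [f + rng.gauss(0.0, 0.5) for f in forecasts]

alpha, beta = mz_regression(outcomes, forecasts)
print(f"alpha={alpha:.3f} beta={beta:.3f}")
```

In this benchmark the estimates should land near alpha = 0 and beta = 1 up to sampling noise, which is the null that standard rationality tests examine.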

Equilibrium forecast slope (c-dagger): The slope of the linear forecasting rule in Perfect Bayesian Equilibrium, given by c^dagger = (1/2) - mu + sqrt(1 - 4*tau^2)/2. This slope is less than one and can be negative depending on mu and tau^2, reflecting the attenuation of the forecast toward the policy target that arises from the bias-variance tradeoff under uncertain DM reactions.

Greenbook (GB) inflation forecasts: Inflation forecasts produced by Federal Reserve staff (now called Tealbook forecasts), used as empirical motivation in the paper. The paper documents two stylized facts for 4-quarter-ahead GB forecasts from 1980q1 to 2019q4: (i) persistent but sign-changing bias in rolling 40-quarter windows, and (ii) a dramatic shift in the rolling MZ slope from approximately unity in the 1980s-1990s to significantly negative in the mid-2000s and approximately zero in the final part of the sample.

Policy feedback (as a confound for rationality tests): The paper’s use of this term to describe the mechanism by which the presence of feedback invalidates the standard interpretation of forecast rationality test outcomes: a forecaster who is fully rational (quadratic loss, no private agenda) and operating in a feedback environment will systematically produce forecasts that fail standard MZ-based rationality tests, not because of irrationality or asymmetric loss, but because of the optimal bias-variance tradeoff induced by uncertain policy reactions.

Global Value Chains and Labor Standards: The Race-to-the-Bottom Problem

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Im and McLaren (2025) ask whether globalization induces governments to weaken labor standards for workers, the so-called “race to the bottom” (RTB) hypothesis. The question has high stakes: advocates point to events such as the 1,136-worker Rana Plaza factory collapse in Bangladesh (2013) and to India’s deregulation campaign after 2014 (associated with approximately 6,500 workplace deaths in 2015–2020) as evidence that competition for global capital systematically erodes safety and working conditions. The paper builds a stylized many-country equilibrium model of labor-market integration adapted from the Grossman and Rossi-Hansberg (2008) tasks framework. Output requires a continuum of tasks z in [0,1], performable in any of N countries; labor requirements per task follow a Weibull distribution (shape parameter nu > 0), independently across tasks and countries. Working conditions (kappa_i) enter the cost function multiplicatively: better conditions reduce worker productivity at the relevant margin. Utility is separable in wages and conditions with both components strictly concave, and Assumption 1 (x*xi’(x) and x*mu’(x) strictly decreasing) ensures conditions are normal goods and second-order conditions hold. The unregulated equilibrium task allocation is equivalent to CES cost minimization with elasticity of substitution 1/(1-rho) > 1, rho = nu/(1+nu). Governments set minimum standards non-cooperatively in Nash equilibrium.

The paper’s results fall into two conceptually distinct categories. “Globalization in the large” (autarky vs. open economy): whether standards are market-determined or government-set, integrating two previously autarkic countries raises labor standards in both (Proposition 1). Under autarky, market and government-optimal conditions coincide, because all costs of better standards are borne domestically. Under trade, wages rise (income channel: conditions are a normal good), and governments gain a terms-of-trade incentive: tightening kappa_i makes domestic effective labor scarcer and shifts part of the cost onto foreign consumers, inducing government standards to strictly exceed market standards. Formally, for each country i: autarky level = market level under autarky < market level under integration < government level under integration.

“Globalization at the margin” with symmetric countries (Proposition 2): as more identical countries join (N increasing), both market-set and government-set standards rise monotonically. The terms-of-trade motive does not vanish because each country specializes in an increasingly narrow value-chain slice, retaining market power regardless of N. Government standards exceed market standards for every N >= 2 and grow strictly with N (a race to the top), and are shown to be above the social optimum because each country externalizes part of its improvement costs onto others.

“Globalization at the margin” with a North-South structure (Proposition 3): when Southern host countries (i = 2,…,N) have perfectly correlated productivity draws (close substitutes for one another), the result reverses for N > 2. Integration of two countries initially raises Southern standards via both channels. But as additional similar Southern competitors join, competition depresses Southern wages and erodes both the income-based demand for better conditions and the terms-of-trade motive (unilateral tightening redirects demand to competitors without cost-shifting benefit). Both market and government standards fall monotonically as N rises beyond 2. As N approaches infinity, both converge to autarky levels. Critically, however, for any finite N, Southern standards remain strictly above their autarky levels: the race to the bottom, even when operative, never fully materializes while integration is incomplete. The efficiency implication is counter-intuitive: government-set standards are inefficiently strict under GVCs because each country over-provides standards by externalizing costs onto trading partners.

In depth

Q1. What is the model’s formal structure and how does it generate tractable results?

The model adapts Grossman and Rossi-Hansberg (2008). Output requires a unit measure of tasks; labor requirement for task z in country i is A_i * a^i_z, where A_i = bar_A_i * kappa_i, so working conditions raise unit labor costs. Each a^i_z is drawn Weibull(nu, 1) independently. A result (adapted from Anderson et al. 1987, applied by Artuç and McLaren 2015) is that the cost-minimizing task allocation is equivalent to minimizing cost with a CES aggregate of national effective labor supplies, with elasticity of substitution 1/(1-rho) and rho = nu/(1+nu). This reduces the multi-dimensional problem to a standard CES factor-demand problem, yielding closed-form wage equations and tractable Nash equilibrium characterizations.
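The mapping from the Weibull shape parameter to the CES parameters is simple enough to compute directly. The sketch below uses only the two formulas quoted above; the function name and the sample values of nu are illustrative assumptions.

```python
def ces_parameters(nu):
    """Map the Weibull shape nu > 0 to (rho, elasticity of substitution).

    Uses rho = nu / (1 + nu) and elasticity = 1 / (1 - rho) as quoted in the text.
    """
    if nu <= 0:
        raise ValueError("nu must be strictly positive")
    rho = nu / (1.0 + nu)
    elasticity = 1.0 / (1.0 - rho)
    return rho, elasticity


for nu in (0.5, 1.0, 4.0):  # illustrative values, not a calibration
    rho, elasticity = ces_parameters(nu)
    print(f"nu={nu:.1f} rho={rho:.3f} elasticity={elasticity:.3f}")
```

Substituting rho into 1/(1 - rho) gives an elasticity of 1 + nu, which is why the elasticity exceeds one for every admissible nu.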

Q2. What are the two channels driving ‘globalization in the large’ raising standards above autarky?

Two reinforcing channels. First, the income channel: integration raises real wages (gains from specialization), and since working conditions are a normal good under Assumption 1 (utility sufficiently concave), demand for better conditions rises. Second, the terms-of-trade channel: tightening kappa_i makes domestic effective labor more expensive and scarcer; part of the resulting cost increase is borne by foreign consumers and workers via the unit cost identity rather than solely by domestic workers. This cost-shifting gives governments an incentive to tighten standards beyond what the unregulated market sets. The mechanism is formally analogous to the policy externalities in Bagwell and Staiger (2001) and the terms-of-trade motive in Chau and Kanbur (2006), though the latter has no value chains.

Q3. Why does the terms-of-trade motive for over-regulation persist even as the number of symmetric countries approaches infinity?

As more countries join, each specializes in an increasingly narrow slice of the value chain in which it has comparative advantage. This deepening specialization preserves market power: the wage derivative dw_1/d_kappa_1 converges to a limit proportional to rho*w/kappa (strictly greater than the pure autarky productivity effect -w/kappa) rather than to zero. So even in the limit with infinitely many symmetric countries, each country retains some terms-of-trade gain from tightening its standard, and government standards keep rising above market standards.

Q4. Under what precise conditions does the race-to-the-bottom result hold?

The RTB result (Proposition 3) requires that competing host countries be close substitutes for one another. The paper operationalizes this with the extreme case of perfectly correlated productivity draws across Southern countries (a^i_z = a^2_z for all i >= 2 and all tasks z). Under this structure, as N increases from 2 onward, Southern market and government standards fall monotonically toward autarky levels. The mechanism: competition among near-identical countries means unilateral tightening of kappa_2 redirects Northern demand to competitors without generating a terms-of-trade gain for Country 2, so the wage falls and conditions deteriorate. The RTB thus requires high substitutability among competitors, not just trade openness.

Q5. Does the race to the bottom ever drive standards below autarky levels?

No. Proposition 3 parts (i) and (ii) establish that for any finite N >= 2, both market-set and government-set standards in Southern countries remain strictly above their autarky levels. The race is toward (but never below) the autarky benchmark. Only in the limit as N approaches infinity do standards converge to the autarky level (Proposition 3, part iii). For any realistic finite degree of globalization, even the worst-case RTB scenario leaves standards strictly above autarky.

Q6. What is the efficiency implication of Nash equilibrium government-set standards?

Government-set standards under GVCs are inefficiently strict. Each government maximizes domestic welfare ignoring the cost its tightening imposes on foreign consumers and workers. Because tightening kappa_i raises costs partly borne abroad, each government over-provides standards relative to the global social optimum. This is a race to the top that generates a negative international externality — the mirror image of the usual RTB externality. The implication is that international coordination, if it occurred, would likely reduce Nash equilibrium standards toward the optimum, not raise them further.

Q7. How does the paper’s setting differ from prior theoretical work on the race to the bottom?

Prior RTB models (Chau and Kanbur 2006; Felbermayr et al. 2012; Chen and Dar-Brodeur 2020) model countries competing for export markets — competing to sell goods to a common importer — rather than competing to host tasks in global value chains. The current paper frames globalization as an increase in the number of countries that can supply tasks to a common production process, a qualitatively different competitive margin. Prior work also largely takes the degree of globalization as fixed, while this paper explicitly traces out effects as N changes. The distinction between similar versus different competitors as a determinant of the direction of the RTB is also new. The companion paper Im and McLaren (NBER WP 31363) extends the framework to collective-bargaining rights with an empirical component.

Q8. What heterogeneity is documented and what does it imply?

The paper develops two polar cases of country heterogeneity: (1) symmetric countries with independent productivity draws — produces a race to the top as N rises; (2) North-South structure with correlated (identical) Southern productivity draws — produces a race to the bottom as N rises beyond 2. The contrast is the central result: the direction of the marginal effect of globalization on standards depends on the degree of substitutability among competing host countries. The authors connect this to observed patterns — Korean firms relocating only to East Asian affiliates (similar countries) when domestic minimum wages rose, and Chan and Ross (2003) noting that competition is ‘most vicious not between North and South, but among nations of the South.’

Q9. What are the policy implications and their scope conditions?

The core implication is that trade restrictions justified by RTB concerns lack general theoretical support — globalization relative to autarky always raises standards. However, the model validates a targeted RTB concern: when a country faces competition from many similar low-wage countries (e.g., Mexico competing with China in labor-intensive sectors), standards can erode relative to the peak reached under limited integration. The appropriate response in that case is to integrate with structurally different partners (as Mexico did via NAFTA with the US) rather than restrict trade. Since Nash equilibrium standards already exceed the global optimum, international agreements that ratchet standards up further could be welfare-reducing. The paper explicitly cautions that causation is hard to establish in the Mexico-China-NAFTA example, treating it as suggestive illustration rather than proof.

Q10. What are the main limitations and threats to the conclusions?

The paper is entirely theoretical; no empirical test is conducted for working conditions (the authors cite data scarcity as the reason and instead provide empirical evidence on collective-bargaining rights in a companion paper). Key assumptions include: (a) Weibull, independent task-productivity draws (these ensure tractability but are untested); (b) working conditions always reduce productivity at the margin, which rules out cases where safety improvements also raise output (e.g., Alfaro-Ureña et al. 2021 find no productivity effect of responsible sourcing in Costa Rica, suggesting the trade-off assumption is plausible but not universal); (c) citizen activism, which empirically affects labor standards (Harrison and Scorse 2010; Koenig and Poncet 2019, 2022), is abstracted away; (d) the model has a single final good and no intermediate goods trade beyond the task-allocation interpretation, limiting applicability to multi-sector settings.

Key Concepts

Labor standards (kappa_i): In the paper’s specific sense, the quality of working conditions that (i) raise worker utility holding wages fixed and (ii) increase unit labor costs for employers. Explicitly restricted to improvements that involve a trade-off — e.g., safety provisions, clean bathrooms, break times — excluding complementary improvements that raise both utility and productivity.

Globalization in the large: The paper’s term for the comparison of any open-economy equilibrium (N >= 2 countries integrated) against autarky. Result: labor standards are always strictly higher in the open economy whether market-set or government-set, because income rises and the terms-of-trade motive activates.

Globalization at the margin: The paper’s term for the effect on labor standards of adding one more country to an already-integrated economy (increasing N by 1). This effect is ambiguous: it raises standards when new entrants are dissimilar (symmetric model) and lowers them when new entrants are similar (North-South model).

Terms-of-trade effect (labor-standards channel): The mechanism by which tightening a country’s labor standard (raising kappa_i) reduces domestic effective labor supply, raises the relative price of domestic tasks, and shifts part of the cost of the improvement onto foreign consumers and workers. This creates an incentive for governments to set standards above the market level and above the global social optimum, producing standards that are too strict from an efficiency standpoint.

Normal good (working conditions): The property implied by Assumption 1 (both x*xi’(x) and x*mu’(x) strictly decreasing in x) that workers’ marginal valuation of working conditions relative to wages is higher at higher income levels. This ensures that any source of income gains, including gains from trade, mechanically raises equilibrium demand for better working conditions.

Race to the top: The paper’s characterization of the symmetric-countries equilibrium: as N increases, both market-set and government-set labor standards rise monotonically, because market power persists through value-chain specialization and the terms-of-trade motive remains strong. Government standards also exceed the social optimum, making this over-regulation an externality imposed on trading partners.

Race to the bottom (conditional): The result in the North-South model where additional similar Southern host countries erode Southern labor standards as N rises beyond 2. The race is toward autarky levels but never below them for finite N. The RTB requires high substitutability among competing host countries and does not hold as a general consequence of globalization.

Illuminating the Global South

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Satellite nighttime lights (luminosity) are the dominant remote-sensing proxy for local economic conditions in low-income countries, yet their accuracy at fine spatial scales and over time has remained contested. This paper by Chiovelli, Michalopoulos, Papaioannou, and Regan makes two linked contributions. First, it constructs a standardized, annual, global panel of nighttime lights from 1992 to 2023, integrating the legacy DMSP-OLS satellite series (1992–2013) with the higher-quality VIIRS series (2013 onward) after applying three adjustments to the noisier DMSP data: cross-sensor inter-calibration (following Li et al. 2020), top-coding correction (following Bluhm and Krause 2022, using a truncated Pareto distribution to replace pixels with Digital Number ≥ 55), and blooming correction (following Cao et al. 2019, modeling light spillover as spatial decay and subtracting predicted pseudo-light). VIIRS is then downgraded to DMSP-comparable units using an ensemble machine-learning method (extremely randomized trees trained on the single year of full overlap, 2013), yielding an out-of-sample RMSE of 1.50 versus 3.27 for the Li et al. sigmoid approach and 1.57 for the Nechaev et al. convolutional neural network; the F1 score for the binary lit/unlit classification is 0.72 versus 0.51 and 0.71 for those alternatives, with recall = 0.95 and precision = 0.58 against an actual lit-pixel share of only 8.6 percent globally. At the cross-country level (a sample of 173 countries), the adjusted series retains an elasticity of luminosity to GDP of approximately 0.85 and an R² around 0.9 in cross-section; for Africa specifically the elasticity is 0.7 and R² remains around 0.9. In long-difference panel regressions over 1992–2019, the luminosity-GDP elasticity is approximately 0.24–0.25, broadly consistent with Henderson et al. (2012)’s estimate of 0.30–0.33, while at the five-year panel frequency the elasticity is around 0.15–0.17.
The second contribution is a systematic validation of the new series against multiple local development proxies across four low-income settings. Using 139 georeferenced DHS surveys from 34 African countries (gridcells of ~28km × 28km), the adjusted series yields cross-sectional coefficients of approximately 0.6 standard deviations for schooling, electricity access, and improved sanitation, and approximately 1 standard deviation for the composite wealth index, between lit and unlit gridcells; in within-gridcell panel regressions, the adjusted log-lights coefficient on schooling is approximately double that of the unadjusted series (~0.02 versus ~0.01), and lit/unlit panel coefficients are statistically significant only with the adjusted series — gridcells turning lit see schooling rise by ~0.05 standard deviations (~0.125 schooling years), wealth index rise by ~0.05 SD, and electricity access rise by ~0.05 SD. In Mozambique, using all post-civil-war censuses (1997, 2007, 2017) across 1,126 admin-4 localities, schooling and non-agricultural employment are at least 0.5 standard deviations higher in lit than unlit localities, equivalent to approximately 0.5 years of schooling and 10 percentage points of non-agricultural employment; within-locality changes in lights co-move significantly with schooling changes, with the difference in schooling gain between localities that turn lit versus stay unlit being about half a year even controlling for admin-3 fixed effects. In Indonesia, panel estimates for public goods across more than 60,000 PODES villages show the adjusted series yields a positive and significant coefficient on the composite wealth index while the unadjusted series yields a counterintuitively negative coefficient. In India, across more than 550,000 SHRUG villages and towns, the adjusted series consistently produces stronger cross-sectional and panel associations with non-farm, manufacturing, and services employment. 
A key empirical regularity across all settings is that the adjusted series outperforms the unadjusted one most sharply at finer spatial resolutions and in over-time (panel) comparisons, while at coarse aggregation levels (large administrative units or large grid squares) differences between the two series are minor, as spatial averaging attenuates measurement error in the unadjusted data too. Blooming correction delivers most of the improvement in the African context, where top-coding is rare (fewer than 2% of lit DMSP pixels in Africa approach the 63 DN ceiling). The paper also replicates three canonical studies — Michalopoulos and Papaioannou (2013) on precolonial ethnic institutions, Michalopoulos and Papaioannou (2014) on national institutions and split ethnic homelands, and Hodler and Raschky (2014) on regional favoritism — confirming that qualitative conclusions are robust to the data revision while documenting that the adjusted series sharpens several estimates, particularly those exploiting within-region over-time variation.
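The classification metrics quoted in this overview can be cross-checked with a few lines of arithmetic. The sketch below assumes that the reported precision, recall, and lit-pixel share all refer to the same global pixel population; the implied share of pixels predicted lit is a back-of-the-envelope quantity derived here, not a figure reported in the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


precision, recall, lit_share = 0.58, 0.95, 0.086  # values quoted in the overview

f1 = f1_score(precision, recall)

# Share of all pixels that are correctly predicted lit (true positives).
true_positive_share = recall * lit_share

# Implied share of all pixels that the model labels as lit.
predicted_lit_share = true_positive_share / precision

print(round(f1, 2), round(predicted_lit_share, 3))
```

The F1 value rounds to the reported 0.72. Under the stated assumption, the high recall and moderate precision mean the model flags noticeably more pixels as lit than are actually lit.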

In depth

Q1. What is the identification strategy and what are the main threats to it?

The paper is a measurement and validation study rather than a causal identification exercise. Its core design is correlational: it regresses local development proxies on nighttime luminosity across gridcells and administrative units, conditioning on country-year fixed effects in cross-section and on unit fixed effects in panel regressions. The main threats are (a) reverse causation (luminosity and development are jointly determined), which the authors acknowledge but do not attempt to address, being explicit that the goal is proxy validation, not causal estimation; (b) measurement error in both the luminosity variable and the development outcomes (DHS wealth index, census schooling, PODES public goods), which the paper addresses by comparing adjusted versus unadjusted luminosity series and interpreting attenuation bias reduction as evidence of improved measurement; (c) the non-classical measurement error produced by the binary transformation of luminosity (lit/unlit), an explicit point drawn from econometric theory (Aigner 1973; Meyer and Mittag 2017) that partly motivates the adjusted continuous series; and (d) spatial autocorrelation and systematic geographic patterns in prediction error, which the authors check by regressing prediction errors on latitude and longitude, finding that the ERT-downgraded series reduces the latitude coefficient to 10% of its magnitude in the unadjusted VIIRS specification for log lights and to 35% for the lit indicator.
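The logic in point (b), reading a larger coefficient as evidence of less measurement error, rests on classical attenuation bias. The simulation below is a stylized sketch with invented parameters (a true slope of 0.6 and unit-variance noise added to the regressor); it is not based on the paper's data.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(42)
n = 20000
true_slope = 0.6  # hypothetical effect of "true" luminosity on the outcome

signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
outcome = [true_slope * s + rng.gauss(0.0, 0.5) for s in signal]

# A noisier proxy: the same signal plus classical measurement error.
noisy_proxy = [s + rng.gauss(0.0, 1.0) for s in signal]

slope_clean = ols_slope(signal, outcome)
slope_noisy = ols_slope(noisy_proxy, outcome)
print(f"clean={slope_clean:.3f} noisy={slope_noisy:.3f}")
```

With equal signal and noise variances, the noisy-proxy slope shrinks toward roughly half of the true slope, so a series that recovers a larger coefficient is consistent with reduced measurement error.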

Q2. What are the three DMSP deficiencies corrected and what are the specific methods used?

Cross-sensor inter-calibration: DMSP data come from six satellites; Li et al. (2020) supply a cross-calibrated series using a second-order polynomial fitted on overlapping satellite years, which the paper adopts as its ‘unadjusted’ baseline. Top-coding: DMSP records 6-bit Digital Numbers (DN) 0–63, so radiance above a ceiling is truncated. Pixels with DN ≥ 55 are subject to ‘implicit’ top-coding (averages of potentially top-coded sub-readings). The correction uses the radiance-calibrated (RC) vintage available for seven years, ranks the top-coded pixels by the RC series from the nearest year, then replaces them with ‘structural values’ drawn from a truncated Pareto distribution with parameters α = 1.5, L = 55, H = 2000. Blooming: the DMSP sensor stretches edge pixels, and readings can be spatially displaced by up to 3 km, causing light spillover. Following Cao et al. (2019), pseudo-light pixels (PLPs), defined as lit pixels neighboring at least one dark pixel, are identified. An OLS regression of PLP light on the inverse-squared-distance weighted sum of neighbors’ light within a 7 × 7 window is estimated separately for broad global regions. The predicted blooming contribution is subtracted from each lit pixel, negative residuals are set to zero, and a local 3 × 3 mean smoothing is applied. Globally, the blooming correction raises the share of unlit pixels from 92% to 95% in 1992 and from 88% to 91% in 2012.
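The top-coding step can be sketched as inverse-CDF sampling from a truncated Pareto distribution, followed by a rank-preserving assignment. The parameters α = 1.5, L = 55, H = 2000 come from the text; the rank-matching rule (largest draw to the pixel with the highest RC value) is an assumption about how the ranking is used, and the function names and RC values are hypothetical.

```python
import random


def truncated_pareto_draw(u, alpha=1.5, low=55.0, high=2000.0):
    """Inverse-CDF draw from a Pareto(alpha) truncated to [low, high].

    u is a uniform variate on [0, 1]; u = 0 maps to low and u = 1 to high.
    """
    tail = 1.0 - (low / high) ** alpha
    return low * (1.0 - u * tail) ** (-1.0 / alpha)


def replace_top_coded(rc_values, rng):
    """Assign sorted Pareto draws to pixels in the order of their RC ranking.

    Assumption: the pixel with the k-th smallest RC value receives the
    k-th smallest draw, so the RC ordering is preserved.
    """
    draws = sorted(truncated_pareto_draw(rng.random()) for _ in rc_values)
    order = sorted(range(len(rc_values)), key=lambda i: rc_values[i])
    replaced = [0.0] * len(rc_values)
    for rank, pixel_index in enumerate(order):
        replaced[pixel_index] = draws[rank]
    return replaced


rng = random.Random(7)
rc_values = [300.0, 120.0, 900.0, 60.0]  # hypothetical RC readings of top-coded pixels
structural_values = replace_top_coded(rc_values, rng)
print(structural_values)
```

Because only the ranking of the RC series is used, the replacement values follow the Pareto shape while keeping the brightest pixels brightest.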

Q3. How is VIIRS downgraded and harmonized with DMSP, and what does ‘extremely randomized trees’ mean?

Because VIIRS records 14-bit DN at 15-arc-second resolution with far superior sensor quality, it is not directly comparable to the 8-bit, 30-arc-second DMSP. The authors’ preferred approach downgrades VIIRS to match the DMSP scale. They use an ensemble machine-learning method called ‘extremely randomized trees’ (Geurts et al. 2006), a variant of random forests that, instead of choosing the best splits from the training sample, picks split thresholds randomly, which further reduces variance and improves computational efficiency. Features used to predict DMSP-like values from VIIRS include: pixel statistics (mean, median, min, max of the four VIIRS sub-pixels within each DMSP 30-arc-second cell), statistics of neighboring pixels within windows of 3, 4, 7, 9, 11, 13, 17, and 21 pixel widths, and regional dummies for broad world regions. The model is trained on 2013 (the one full year of DMSP-VIIRS overlap) and its out-of-sample performance is assessed by retraining on 2012 and predicting 2013. Four merged series are produced corresponding to the four versions of DMSP (unadjusted; blooming only; top-coding only; both). The authors’ approach outperforms both the Li et al. (2020) sigmoid-function method (global RMSE of 3.27, versus 1.50 for the authors’ method) and the Nechaev et al. (2021) CNN approach (RMSE 1.57), especially in the low-to-middle luminosity range most relevant for low-income countries.
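The pixel-statistics features can be sketched as follows, assuming each 30-arc-second DMSP cell sits exactly over a 2 × 2 block of 15-arc-second VIIRS sub-pixels. The function name `cell_features` and the toy grid are illustrative, not taken from the paper.

```python
from statistics import mean, median

def cell_features(viirs, row, col):
    """Summary statistics of the 2x2 VIIRS block under DMSP cell (row, col)."""
    block = [viirs[2 * row + dr][2 * col + dc] for dr in (0, 1) for dc in (0, 1)]
    return {
        "mean": mean(block),
        "median": median(block),
        "min": min(block),
        "max": max(block),
    }

# Made-up 4x4 VIIRS grid, i.e. a 2x2 grid of DMSP-sized cells.
viirs = [
    [0, 1, 4, 4],
    [2, 3, 4, 4],
    [0, 0, 10, 0],
    [0, 0, 0, 2],
]
top_left = cell_features(viirs, 0, 0)
bottom_right = cell_features(viirs, 1, 1)
```

The neighborhood features would apply the same statistics over the wider windows listed above.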

Q4. What development proxies are used in validation and across what samples?

Africa (DHS, 34 countries, 139 surveys, ~28km × 28km gridcells): mean years of schooling (respondents aged 15–39), DHS composite household wealth index, share of households with improved sanitation, share with electricity connection. All outcomes are standardized to mean zero, SD one. Mozambique (Census 1997, 2007, 2017, 1,126 admin-4 localities): mean years of schooling (aged 15–39) and non-agricultural employment (aged 15–24 or 19–24). Indonesia (PODES village census waves 1996–2018, 60,000+ villages): binary measures for garbage disposal, toilet use, drinking water access, gas/electricity for cooking, paved roads, and counts of kindergartens, primary, middle, and secondary schools — aggregated into a first principal component (eigenvalue ~3.5, capturing ~1/3 of variance). India (SHRUG dataset, 550,000+ towns and villages, Population Censuses 1991/2001/2011, Economic Censuses 1990/1998/2005/2013): population count, total non-farm employment, manufacturing employment, services employment.

Q5. What heterogeneity is documented?

Spatial resolution: the adjusted series outperforms the unadjusted one most at fine resolutions (2×2 gridcell blocks, ~56km × 56km at the equator); at coarse levels (12×12 blocks, ~336km × 336km), both series yield similar coefficients, as spatial aggregation attenuates noise in the unadjusted series. Urban vs. rural: cross-sectional estimates are similarly significant in urban and rural DHS samples. Panel estimates are statistically significant only with the adjusted series; urban panel coefficients are consistently larger than rural ones, echoing Asher et al. (2021)’s India finding. The adjustment matters more in rural areas than in urban areas in cross-section. Local variation (spatial RDD / fine fixed effects): with unadjusted series, panel wealth-index coefficients are statistically indistinguishable from zero until spatial fixed effects cover areas at least 7×7 gridcells (~200km × 200km at equator); with the adjusted series, coefficients remain significantly positive at all fixed-effect sizes including the finest 2×2 blocks. Top-coding vs. blooming: most of the improvement in Africa derives from blooming correction; top-coding correction has minor impact because fewer than 2% of lit African DMSP pixels approach the DN ceiling. Country-ethnic homelands (large areas, avg. 25,547 km²): adjustments matter little because spatial averaging already reduces noise. Applications replication: the precolonial institutions result (Michalopoulos and Papaioannou 2013) is robust and essentially unchanged because the units are very large. The national-institutions-at-border result (Michalopoulos and Papaioannou 2014) is strengthened in within-ethnicity specifications (coefficient marginally significant at the 10% level with adjusted series vs. p ≈ 0.15 with unadjusted); capital-proximity heterogeneity is sharpened. The regional-favoritism result (Hodler and Raschky 2014) strengthens: the log-lights lagged-leader coefficient rises from 0.038 to 0.058, and the lit-probability coefficient rises from ~3 to ~7 percentage points.
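The block sizes quoted above follow from the ~28 km gridcell side used in the DHS sample. A small sketch of that arithmetic, together with a plain block-averaging helper, is given below; `block_means` is a hypothetical illustration of spatial aggregation, not the authors’ code.

```python
def block_side_km(k, cell_km=28.0):
    """Approximate side length of a k x k block of gridcells at the equator."""
    return k * cell_km

def block_means(grid, k):
    """Average a 2-D grid over non-overlapping k x k blocks (partial edge blocks are dropped)."""
    rows, cols = len(grid) // k, len(grid[0]) // k
    return [
        [
            sum(grid[r * k + i][c * k + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(cols)
        ]
        for r in range(rows)
    ]

sides = {k: block_side_km(k) for k in (2, 7, 12)}
coarse = block_means([[1, 3], [5, 7]], 2)
```

Here a 7×7 block comes out at 196 km per side, which matches the rounded ~200 km figure in the text.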

Q6. What robustness checks and specification variations are run?

The paper compares four luminosity series (unadjusted Li et al.; blooming only; top-coding only; both combined + VIIRS fusion) to isolate each correction’s contribution. It checks the luminosity-GDP nexus at annual, five-year, and long-difference frequencies. It examines seven African countries’ co-evolution of the harmonized series with electrification share (Kenya, DRC, Ghana, Tanzania, Nigeria, Mozambique, and one other) and finds no discontinuity at the 2012/2013 DMSP-VIIRS transition year. Spatial aggregation robustness: coefficients are computed across aggregation blocks ranging from 2×2 to 12×12 gridcells, showing stability in cross-section (~0.18) and mild size dependence in panel (~0.075, slightly rising with coarser units). Local variation robustness: fixed effects of increasing spatial coverage (2×2 to 12×12 cells) are added while the outcome remains at the gridcell level. Results replicated for schooling and electricity access (Appendix Section B.2) beyond the primary wealth-index outcome. Confounding by latitude in the ML model is assessed via regressions of prediction errors on latitude and longitude with and without country fixed effects. Median regressions confirm the OLS elasticity estimates at the cross-country level. The India analysis is replicated for both towns (urban) and villages (rural) separately.

Q7. How does this paper relate to the prior literature?

Henderson et al. (2012): pioneer the use of luminosity as a cross-country GDP proxy and estimate a long-difference elasticity of 0.30–0.33 across 188 countries; this paper estimates 0.24–0.25 over a comparable specification, consistent but slightly lower. Gibson et al. (2021): show that VIIRS is superior to DMSP but find weak GDP-lights correlations outside cities for the early DMSP period in China, Indonesia, and South Africa; this paper addresses the concern by adjusting DMSP and merging it with VIIRS. Asher et al. (2021): validate luminosity as a strong proxy in India and find stronger urban-luminosity links; this paper replicates and extends those findings to Africa, Mozambique, and Indonesia and shows the adjusted series strengthens the Asher et al. patterns. Chen et al. (2024): find strong cross-sectional but weak panel associations; this paper’s adjusted series substantially strengthens panel associations. Bluhm and Krause (2022): provide the top-coding correction method adopted here. Cao et al. (2019): provide the blooming correction method. Nechaev et al. (2021): propose a CNN-based DMSP-VIIRS fusion but apply it to the unadjusted DMSP; this paper outperforms their RMSE slightly (1.50 vs. 1.57) and improves on their F1 score (0.72 vs. 0.71), with greater advantage in low-light regions. Li et al. (2020): propose a sigmoid-based fusion calibrated for high-light pixels; this paper substantially outperforms it (RMSE 1.50 vs. 3.27), particularly in low-luminosity areas. The paper thus synthesizes and extends multiple strands: it unifies the corrections of Bluhm-Krause and Cao et al., pairs them with state-of-the-art ensemble ML fusion, and provides by far the most comprehensive multi-country, multi-context validation of the resulting series.

Q8. What are the policy implications and their scope conditions?

The primary policy implication is methodological: researchers studying development in low-income countries should use the adjusted and harmonized nighttime lights series rather than raw DMSP data, and should be especially careful at fine spatial scales (e.g., spatial regression discontinuity designs, granular village-level analyses) and in panel specifications. The gains from adjustment are largest precisely where applied development research is moving — toward local identification strategies and over-time variation. For practitioners and statistical agencies, the series provides a low-cost annual proxy for local economic conditions in environments with weak administrative data, particularly across sub-Saharan Africa, South Asia, and Southeast Asia. Scope conditions: (a) Correlations are far from perfect — binary lit/unlit classification misses much variation in the many-zeros low-income context. (b) At large aggregate units (admin-1, country-ethnic homelands), the adjustments yield minimal additional improvement since noise averages out. (c) The series does not resolve the fundamental limitation that most of sub-Saharan Africa remains unlit (98.4% of DMSP pixels in Africa in 1992), so it captures variation among already-lit areas better than the development gradient at the zero-light frontier. (d) Future research blending nighttime lights with daytime imagery (traffic, built structures) is flagged as a promising extension, though daytime data are often proprietary.

Q9. What are the main findings from the three replication exercises?

Michalopoulos and Papaioannou (2013), on precolonial ethnic institutions and contemporary development: Replication across 682 country-ethnic homelands confirms that areas with higher precolonial political centralization (as measured by a 0–4 jurisdictional hierarchy index) have significantly higher contemporary luminosity, conditional on country constants and geographic controls. With the adjusted series, the unlit share among homelands rises from 24% to 29% (because blooming correction removes spurious light), but the coefficients on political centralization are still highly significant, somewhat smaller in magnitude, and similar qualitatively. The main conclusion is robust because the units are large and spatial averaging already reduces noise in the raw series. Michalopoulos and Papaioannou (2014), on national institutions and split-border ethnic development: Replication across 38,427 gridcells of 220 systematically partitioned ethnic homelands. Cross-sectional results show a one-point increase in the rule-of-law index (range −2.5 to 2.5) is associated with a ~10 pp higher probability of a gridcell being lit. The within-ethnicity coefficient drops by more than half (~0.025). With the adjusted series, this within-ethnicity coefficient is marginally significant at the 10% level versus a p-value of ~0.15 with unadjusted. Spatial RDD coefficients remain small and insignificant regardless of adjustment. Capital-proximity heterogeneity: the positive association between rule of law and luminosity is significant only for ethnically split groups where both portions are close to their respective capitals, and this finding is more precisely estimated with the adjusted series; the effect is nil far from capitals in both series. Hodler and Raschky (2014), on regional favoritism: Panel replication across 38,427 subnational regions in 126 countries, 1992–2009. The lagged-leader dummy coefficient (log lights specification) rises from 0.038 to 0.058 with the adjusted series. The linear-probability-model lit indicator rises from ~3 to ~7 percentage points. All specifications with the adjusted series are at least two standard errors above zero, matching or exceeding the precision of the original.

Q10. What are the limitations and caveats acknowledged by the authors?

First, the correlations between luminosity and development are ‘far from perfect’: the binary lit/unlit transformation in particular fails to capture the significant continuous variation in assets, education, and public goods across regions that are all formally ‘lit.’ Second, bottom-coding (under-recording of low-light areas) is acknowledged but not corrected; no existing method addresses it, though the authors note that their corrections nonetheless improve elasticities even in rural African regions with very low light. Third, downgrading VIIRS to DMSP by construction sacrifices some of the VIIRS data quality; the long-difference VIIRS elasticity for Africa (0.4) shrinks to 0.35 in the downgraded series. Fourth, daytime satellite imagery and combinations with nighttime lights (Jean et al. 2016; Yeh et al. 2020; Rossi-Hansberg and Zhang 2025) can better capture local wealth but are often proprietary and not replicable in standard economic research. Fifth, the top-coding correction in Africa is minor because very few pixels approach the DN=63 ceiling (0.98–1.7% of lit pixels in 1992–2012), so the main African improvement comes from blooming; other regions with denser urban cores may benefit more from top-coding correction. Sixth, the cross-sensor inter-calibration step is taken ‘off-the-shelf’ from Li et al. (2020) and further investigation of sensor calibration is left to future work.

Key Concepts

Top coding (DMSP): The truncation of Digital Number values at the 8-bit ceiling of 63 in DMSP-OLS data, caused by sensor calibration for cloud detection. Pixels with DN ≥ 55 also suffer ‘implicit’ top coding because they represent averages of multiple potentially top-coded sub-readings. The paper corrects this by replacing top-coded pixels with structural values drawn from a truncated Pareto distribution, using the radiance-calibrated DMSP vintage to rank pixels.

Blooming (spatial spillover of light): A measurement artifact in DMSP data whereby light from bright pixels spills into neighboring dark areas due to the sensor’s imprecise spatial accuracy and possible displacement of up to 3 km. The paper identifies pseudo-light pixels (lit pixels adjacent to at least one dark pixel), models the spillover as an inverse-squared-distance weighted function of neighboring lights, and subtracts the predicted blooming from each lit pixel. This correction raises the global unlit pixel share from 92% to 95% in 1992.

Extremely randomized trees (ERT): An ensemble machine-learning method used to downgrade VIIRS luminosity data to the DMSP scale. Unlike standard random forests that find the best split thresholds within a random feature subset, ERT selects split thresholds randomly, reducing variance and improving computational efficiency. The authors train it on pixel statistics (mean, median, min, max) and neighborhood statistics within windows of varying sizes to predict DMSP-like values for 2014 onward from VIIRS readings.
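The contrast between the two split rules can be shown on a single feature with a minimal sketch. The data are made up, and the full method of Geurts et al. (2006) draws several random (feature, threshold) candidates per node and keeps the best of them, which this toy version leaves out.

```python
import random

def split_sse(xs, ys, threshold):
    """Sum of squared errors of ys after splitting on x <= threshold."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sse(left) + sse(right)

def best_threshold(xs, ys):
    """Random-forest style: scan midpoints between sorted values, keep the lowest-SSE one."""
    pts = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates, key=lambda t: split_sse(xs, ys, t))

def random_threshold(xs, rng):
    """Extra-trees style: draw the cut-point uniformly between the feature's min and max."""
    return rng.uniform(min(xs), max(xs))

xs = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]   # made-up VIIRS-based feature
ys = [0.0, 0.0, 1.0, 1.0, 40.0, 42.0]   # made-up DMSP targets
rng = random.Random(0)
t_best = best_threshold(xs, ys)
t_rand = random_threshold(xs, rng)
```

A single random cut is never better in-sample than the optimized one; the variance reduction comes from averaging many such trees.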

Harmonized (adjusted + fused) luminosity series: The authors’ main output: an annual global panel of nighttime lights from 1992 to 2023 that applies inter-sensor calibration, top-coding correction, and blooming correction to DMSP data (1992–2013), then uses the ERT ensemble model to convert post-2013 VIIRS data into DMSP-comparable units, yielding four variants (unadjusted, blooming only, top-coding only, both corrections) merged into a continuous time series at 30-arc-second (~1 km²) resolution.

Pseudo-light pixels (PLPs): In the blooming correction procedure, PLPs are defined as lit pixels (DN > 0) that have at least one dark neighbor (DN = 0). They are the pixels most likely to contain spurious light from neighboring bright areas. PLP light values are regressed on the inverse-squared-distance weighted sum of surrounding pixels to estimate the blooming decay function.
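A minimal sketch of the two ingredients described here: flagging PLPs and computing the inverse-squared-distance weighted sum of surrounding light. Treating the eight adjacent cells as ‘neighbors’ and clipping the window at the grid edge are assumptions; the function names are illustrative.

```python
def pseudo_light_pixels(grid):
    """Return (row, col) of lit pixels (DN > 0) with at least one dark (DN = 0) adjacent pixel."""
    rows, cols = len(grid), len(grid[0])
    plps = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] <= 0:
                continue
            neighbors = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if any(v == 0 for v in neighbors):
                plps.append((r, c))
    return plps

def idw_neighbor_light(grid, r, c, half_window=3):
    """Inverse-squared-distance weighted sum of light in a (2*half_window+1)^2 window around (r, c)."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for rr in range(max(r - half_window, 0), min(r + half_window + 1, rows)):
        for cc in range(max(c - half_window, 0), min(c + half_window + 1, cols)):
            if (rr, cc) != (r, c):
                total += grid[rr][cc] / ((rr - r) ** 2 + (cc - c) ** 2)
    return total

city = [
    [0, 0, 0, 0, 0],
    [0, 5, 8, 5, 0],
    [0, 8, 60, 8, 0],
    [0, 5, 8, 5, 0],
    [0, 0, 0, 0, 0],
]
plps = pseudo_light_pixels(city)
single = [[0, 0, 0], [0, 0, 4], [0, 0, 0]]
```

With `half_window=3` the window is 7 × 7, as in the text. The regression of PLP light on this weighted sum, estimated by region, is not reproduced here.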

DHS composite wealth index: Used in the validation analysis as a local development proxy: a principal-component aggregation of household characteristics including roof quality and ownership of consumer assets, constructed by the Demographic and Health Surveys program across African countries. The paper standardizes this and other outcomes to mean zero and standard deviation one for cross-outcome coefficient comparisons.

Spatial RDD (regression discontinuity design) using nighttime lights: As applied in Michalopoulos and Papaioannou (2014) and referenced throughout, a design that restricts estimation to gridcells within a narrow band (e.g., 50 km) of a political or administrative border to compare otherwise similar areas on opposite sides, using luminosity as the outcome. The paper notes that such fine-resolution, localized comparisons are exactly the setting where measurement error in the unadjusted DMSP series is most consequential and where the adjusted series yields the largest improvement.

Labour Market Power and the Effects of Fiscal Policy

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper proposes a novel fiscal transmission channel through which government spending expansions reduce employer monopsony power in the labor market, generating larger fiscal multipliers and stronger distributional consequences than standard models predict.

Standard New Keynesian models rely on two transmission channels with contested empirical support: a negative wealth effect on labor supply (which moves workers to supply more hours when taxes rise) and countercyclical price markups (which fall in booms, raising labor demand). The evidence on both is ambiguous. This paper introduces a third channel — countercyclical monopsony power — that operates independently of, and interacts with, the other two.

The theoretical framework is a Two-Agent New Keynesian (TANK) model, extending Cantore and Freund (2021). There are two household types: workers (fraction λ = 0.8), who supply labor and have limited financial market access, and capitalists (fraction 1 − λ = 0.2), who earn profit income. Intermediate-good firms compete monopsonistically in local labor markets, paying wages below the marginal revenue product. The wage markdown is μ = η/(η+1), where η is the wage elasticity of labor supply to the individual firm. Workers value both pay and non-pay job characteristics (firm location, culture, flexibility), with heterogeneous idiosyncratic preferences drawn from a type-1 extreme value distribution. This differentiation, following Card et al. (2018), gives firms wage-setting power because they cannot observe individual preferences.

The key mechanism is that η depends endogenously on workers’ labor earnings (w_t·n^W_t) and their marginal utility of income (u^W_{c,t}): η = θ·u^W_{c,t}·w_t·n^W_t + 1/φ. When government spending rises, it increases both labor income and, because higher current or future taxes reduce lifetime net income, workers’ marginal valuation of income. Both forces unambiguously raise η, flattening the firm-level labor supply curve, reducing the marginal cost of labor for firms seeking to attract workers, and driving wages up toward the marginal revenue product. Employment and output rise; profits fall and are redistributed toward workers.
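A small numerical sketch of this mechanism follows. The parameter values are hypothetical, chosen only so that the baseline elasticity equals 2, and the 1% increases are illustrative rather than model outputs.

```python
def firm_labor_supply_elasticity(theta, marginal_utility, wage, hours, phi):
    """eta = theta * u_c * w * n + 1/phi, as stated in the text."""
    return theta * marginal_utility * wage * hours + 1.0 / phi

def wage_markdown(eta):
    """mu = eta / (eta + 1): the wage as a fraction of the marginal revenue product."""
    return eta / (eta + 1.0)

# Hypothetical baseline: theta = 1.5, u_c = w = n = 1, phi = 2, so eta = 2.
eta_base = firm_labor_supply_elasticity(1.5, 1.0, 1.0, 1.0, 2.0)

# Illustrative spending shock: marginal utility and the wage each rise by 1%.
eta_shock = firm_labor_supply_elasticity(1.5, 1.01, 1.01, 1.0, 2.0)

mu_base = wage_markdown(eta_base)
mu_shock = wage_markdown(eta_shock)
```

Both inputs move η in the same direction, so the markdown moves closer to one and wages move closer to the marginal revenue product.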

In the calibrated baseline (steady-state markdown μ = 2/3, i.e., wages at two-thirds of marginal revenue products, calibrated to Yeh et al. 2022), the impact fiscal multiplier is approximately 0.6 under monopsonistic competition compared to slightly less than 0.4 under perfect competition — a difference attributable entirely to the countercyclical-monopsony channel. The wage markdown rises by approximately 0.3 percentage points on impact following a 1% of GDP government spending shock, roughly twice the response observed when the steady-state markdown is 0.9 rather than 0.67.
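Inverting the markdown formula μ = η/(η+1) gives the firm-level elasticities implied by the two calibrations mentioned above. This is plain algebra on the stated formula, shown as a worked check:

```latex
\mu = \frac{\eta}{\eta + 1}
\quad\Longleftrightarrow\quad
\eta = \frac{\mu}{1 - \mu},
\qquad
\mu = \tfrac{2}{3} \;\Rightarrow\; \eta = 2,
\qquad
\mu = 0.9 \;\Rightarrow\; \eta = 9 .
```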

The amplification from countercyclical monopsony is strongest when the wealth effect on hours worked is near zero — the baseline calibration consistent with Schmitt-Grohé and Uribe (2012) and Galí et al. (2012). As the wealth elasticity of hours increases, the markdown and output response to spending shocks weaken, because a larger hours response implies a smaller consumption response, which reduces the marginal utility channel. The degree of price stickiness has little effect on the markdown response.

The channel is amplified when workers bear more of the fiscal burden — either through profit redistribution to workers (amplification rises from approximately 0.25 in the no-redistribution baseline to approximately 0.4 when half of profit income is redistributed to workers) or through regressive taxation. Progressively redistributing the tax burden toward capitalists weakens the countercyclical-monopsony channel, which runs counter to the standard cyclical-inequality channel (Bilbiie 2020) that predicts larger multipliers with progressive taxation.

The empirical validation uses an expectations-augmented VAR estimated on quarterly U.S. data from 1981Q3 to 2019Q4 (macroeconomic variables) and 2000Q4 to 2019Q4 (monopsony measure). Government spending shocks are identified via recursive ordering (government spending ordered first), controlling for professional forecasters’ spending growth expectations (following Auerbach-Gorodnichenko 2012), the real interest rate using the Wu-Xia shadow policy rate, and the average tax rate. The inverse monopsony measure — the wage elasticity of worker-firm separations — is estimated by extending Langella and Manning (2021) to quarterly frequency using SIPP microdata, controlling for demographics, industry, occupation, human capital, and time effects via complementary log-log regressions month by month. The VAR impulse responses confirm the model’s central prediction: government spending expansions raise the wage elasticity of separations (reducing employer market power), raise labor income, reduce profits, and generate substantial output increases.

In depth

Q1. What is the identification strategy in the empirical VAR and what are the main threats to it?

The paper uses a recursive (Cholesky) identification scheme with government spending ordered first, following Blanchard and Perotti (2002). The identifying assumption is that government spending does not respond to economic conditions within the same quarter due to decision and implementation lags. Anticipation effects are addressed by including a fiscal news variable — professional forecasters’ one-period-ahead spending growth forecast from the Survey of Professional Forecasters — following Auerbach and Gorodnichenko (2012). The innovation in government spending orthogonal to this forecast is taken as the exogenous surprise shock. The real interest rate (Wu-Xia shadow federal funds rate, which captures unconventional monetary policy at the zero lower bound) and the average tax rate are included to control for monetary policy stance and financing mix. A key threat the paper acknowledges concerns the separation elasticity estimates: the monopsony literature recognizes biases from insufficient controls for alternative wage offers, unobserved heterogeneity, and lack of firm-level exogenous wage variation. The authors follow Langella and Manning (2021) in arguing that these biases are roughly constant over time, so changes in the estimated separation elasticity still reflect changes in true monopsony power.
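The recursive scheme can be sketched with a Cholesky factorization of a reduced-form residual covariance matrix. The three-variable ordering and the covariance numbers below are made up for illustration; the paper’s VAR contains more variables.

```python
import math

def cholesky(a):
    """Lower-triangular L with L @ L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

# Hypothetical ordering: [government spending, output, separation elasticity].
sigma = [
    [4.0, 2.0, 1.0],
    [2.0, 5.0, 3.0],
    [1.0, 3.0, 6.0],
]
impact = cholesky(sigma)

# First column: impact responses of all variables to the spending shock.
spending_shock_impact = [row[0] for row in impact]
```

The zeros in the first row of `impact` encode the identifying assumption: spending responds within the quarter only to its own shock.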

Q2. What is the key mechanism through which government spending reduces monopsony power?

Two reinforcing forces simultaneously raise the wage elasticity of labor supply to individual firms (η). First, higher government spending raises labor income, which increases the dollar magnitude of pay differences between firms, making workers more responsive to relative pay. Second, higher current or future taxes reduce workers’ lifetime net income, raising their marginal valuation of income (marginal utility of consumption, u^W_{c,t}). Workers facing a tighter budget place greater relative weight on pay versus non-pay job characteristics, further increasing their responsiveness to firm-level wages. Both effects increase η unambiguously for government spending shocks (unlike productivity shocks, where the two forces can offset each other). Higher η flattens the firm-level labor supply curve, compresses the gap between the marginal cost of labor and the wage, and induces firms to raise wages toward the marginal revenue product. Employment and output rise while profits decline, redistributing income from capitalists to workers.

Q3. How is monopsony modeled, and why does the paper use a discrete choice rather than CES approach?

The paper adopts a discrete workplace choice model following Card et al. (2018), where workers draw idiosyncratic preferences over non-pay job characteristics from a type-1 extreme value distribution each period. Firms cannot observe individual preferences and set a posted wage. Standard logit calculations yield the wage elasticity of firm-level labor supply as η = θ·u^W_{c,t}·w_t·n^W_t + 1/φ, where θ is the inverse importance of non-pay characteristics and 1/φ is the intensive-margin (hours) elasticity. Under CES preferences (used by Berger et al. 2022, Alpanda and Zubairy 2021), the wage markdown is constant in equilibrium, analogous to constant price markups under CES monopolistic competition, which eliminates the time variation in monopsony power that is the paper’s central object of study. The discrete choice framework generates endogenous variation in η through the endogenous terms w_t·n^W_t and u^W_{c,t}. Berger et al. (2022) show that the CES approach is a special case of the discrete choice model under restrictive assumptions about individual hours responses; the paper intentionally avoids those assumptions.
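The extensive-margin term can be recovered from a textbook logit share calculation. The sketch below assumes that a worker’s utility at firm j is linear in earnings with weight θ·u^W_{c,t}, that hours and marginal utility are held fixed, and that each firm is small (s_j → 0); these simplifications are assumptions made for illustration and are not taken from the paper’s derivation.

```latex
v_{ij} = \theta\, u^{W}_{c,t}\, w_{j}\, n^{W}_{t} + \varepsilon_{ij},
\qquad
s_{j} = \frac{\exp\!\left(\theta\, u^{W}_{c,t}\, w_{j}\, n^{W}_{t}\right)}
             {\sum_{k} \exp\!\left(\theta\, u^{W}_{c,t}\, w_{k}\, n^{W}_{t}\right)},
\qquad
\frac{\partial \ln s_{j}}{\partial \ln w_{j}}
 = \theta\, u^{W}_{c,t}\, w_{j}\, n^{W}_{t}\,\bigl(1 - s_{j}\bigr)
 \;\xrightarrow{\; s_{j} \to 0 \;}\;
 \theta\, u^{W}_{c,t}\, w_{j}\, n^{W}_{t} .
```

Adding the intensive-margin hours elasticity 1/φ to this limit gives the expression for η quoted above.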

Q4. What heterogeneity across calibrations is documented regarding the strength of the monopsony channel?

The paper documents several dimensions of heterogeneity: (1) Steady-state markdown: the relationship between the steady-state markdown and the markdown’s response to government spending is hump-shaped (inverted U-shape). At the baseline value of 0.67, the markdown rises by approximately 0.3 percentage points; at a steady-state markdown of 0.9, the response is roughly half as large. Perfect competition (markdown = 1) and maximum monopsony (markdown → 0) both imply no response. (2) Wealth effect on labor supply (χ): as χ increases from zero (baseline, near-GHH preferences) to one (strong wealth effect), the markdown response and the output amplification decline monotonically. With a near-zero wealth effect (baseline), amplification relative to the perfect-competition counterfactual is approximately 0.25 percentage points of steady-state GDP; it diminishes substantially as χ rises. (3) Profit redistribution (φ_d): output amplification rises from approximately 0.25 (no redistribution, baseline) to approximately 0.4 when half of profits are redistributed to workers. (4) Tax progressivity (φ_τ): the channel is stronger under regressive taxation (more of the burden falling on workers) and weaker under progressive taxation, in contrast to the cyclical-inequality channel. (5) Degree of tax financing (φ_g): higher contemporaneous tax financing strengthens the channel because it raises workers’ current marginal valuation of income more directly. (6) Price stickiness (ξ): changing price adjustment costs has little effect on the markdown response and the countercyclical-monopsony amplification.

Q5. How is the separation elasticity measured and linked to the model’s concept of monopsony power?

The separation elasticity γ is the wage elasticity of worker-firm separations: the percentage change in a firm’s separation rate in response to a 1% change in the wage. In the model, γ is shown to be proportional to η − 1/φ (the extensive-margin component of labor supply elasticity to the firm), because firm size and separation rate are linked through a constant elasticity derived from the logit choice structure. Empirically, the paper extends Langella and Manning (2021) to quarterly frequency using SIPP data from 2000Q4 to 2019Q4. Month-by-month complementary log-log regressions of separation dummies on residualized log hourly wages (purged of demographic, industry, occupation, human capital, and time effects) yield time-varying quarterly estimates of γ. A larger γ in absolute value (the raw coefficient is negative because separations fall as wages rise, so a larger magnitude means separations respond more strongly to pay) indicates lower monopsony power. The VAR incorporates this time-varying series as the inverse monopsony measure.
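One reason the complementary log-log link suits this exercise is that the coefficient on the log wage is itself the wage elasticity of the underlying separation hazard. A minimal sketch with made-up coefficients (intercept −3.0, γ = −0.8) checks this numerically; the function names are illustrative.

```python
import math

def separation_probability(wage, intercept, gamma):
    """Complementary log-log model: P(separation) = 1 - exp(-exp(intercept + gamma * ln w))."""
    hazard = math.exp(intercept + gamma * math.log(wage))
    return -math.expm1(-hazard)

def hazard_elasticity(wage, intercept, gamma, rel_step=1e-4):
    """Numerical elasticity of the implied hazard, -ln(1 - P), with respect to the wage."""
    def log_hazard(w):
        p = separation_probability(w, intercept, gamma)
        return math.log(-math.log1p(-p))
    return (log_hazard(wage * (1 + rel_step)) - log_hazard(wage)) / math.log(1 + rel_step)

# Made-up coefficients for illustration only.
INTERCEPT, GAMMA = -3.0, -0.8
p_low_wage = separation_probability(10.0, INTERCEPT, GAMMA)
p_high_wage = separation_probability(20.0, INTERCEPT, GAMMA)
elasticity = hazard_elasticity(15.0, INTERCEPT, GAMMA)
```

The recovered elasticity equals the coefficient on the log wage, so a time series of estimated coefficients can be read directly as a time series of separation elasticities.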

Q6. How does the countercyclical-monopsony channel interact with the wealth effect and price markup channels?

The three channels interact in both complementary and partially offsetting ways. The wealth effect on hours worked (χ > 0) independently shifts the market labor supply curve rightward when taxes rise, increasing employment. However, a larger hours response implies a smaller consumption response, which reduces the increase in workers’ marginal utility of consumption. Since u^W_{c,t} is a key driver of η, a stronger wealth effect on hours dampens the countercyclical-monopsony channel. Similarly, the countercyclical price markup channel (ξ > 0) raises the marginal revenue product of labor when government spending pushes up demand, boosting employment through an independent channel that also raises labor income, which in turn raises η. Yet changing price stickiness has quantitatively little effect on the markdown response in the calibrated model. Income redistribution between agent types mediates the interaction: when capitalists bear most of the tax burden (progressive taxation), workers’ marginal utility of income rises less, weakening the monopsony channel. When workers bear the burden (regressive taxation or profit redistribution), the monopsony channel is strengthened.

Q7. What are the distributional consequences of the countercyclical-monopsony channel?

When government spending rises, the reduction in employer market power forces firms to pay wages closer to the marginal revenue product, increasing labor income and decreasing profits. This redistribution from capitalists (profit recipients) to workers operates through the wage markdown μ rising toward one (i.e., the gap between the wage and the marginal revenue product narrowing). Under monopsonistic competition with endogenous employer market power, this redistribution is stronger than under perfect competition, where only the price markup channel operates. The VAR evidence confirms these distributional predictions: government spending shocks reduce corporate profits (after taxes) and raise labor income in U.S. data. In the model, this redistribution also feeds back into the mechanism: workers facing declining after-tax income (or receiving a portion of declining profits) place greater weight on pay in their workplace choices, further eroding employer market power.

Q8. How does this paper relate to Cantore and Freund (2021) and the TANK literature on fiscal multipliers?

The paper extends the worker-capitalist TANK model of Cantore and Freund (2021), who introduced capitalists that do not participate in the labor market to avoid the criticism (Broer et al. 2019, 2021) that the Bilbiie (2008, 2020) cyclical-inequality channel relies on countercyclical profit income inducing rich households to supply more labor. The Cantore-Freund framework delivers income redistribution between high-MPC workers and low-MPC capitalists without relying on labor supply responses of the rich. This paper adds monopsonistic competition to that framework, introducing a new form of cyclical variation in inequality through time-varying wage markdowns. The interaction with the Bilbiie cyclical-inequality channel is analyzed formally: in particular, tax progressivity has opposing effects under the two channels — progressive taxation amplifies the Bilbiie effect (redistribution to high-MPC workers) but weakens the monopsony channel (capitalists bear more of the tax burden, reducing workers’ marginal valuation of income).

Q9. What robustness is discussed or implied regarding the empirical VAR?

The paper addresses robustness primarily through the following design choices: (1) Use of the Wu-Xia shadow federal funds rate rather than the actual federal funds rate, to capture monetary policy stance during the zero lower bound period; (2) inclusion of the spending growth forecast variable to control for anticipation effects; (3) inclusion of the average tax rate as a control for fiscal financing; (4) detrending all VAR variables as deviations from linear trends. The separation elasticity itself is shown to be robustly procyclical across three detrending methods (linear, linear-quadratic, and HP-filter with λ=1600), with R² values of 49.9%, 43.6%, and 17.1%, respectively, and regression slopes of 1.52, 1.40, and 1.51 in each case. The paper notes that standard biases in separation elasticity estimation (from unobserved heterogeneity, inadequate controls for alternative offers, absence of firm-level exogenous wage variation) are likely roughly constant over time, which validates using changes in the estimated elasticity as changes in true monopsony power, following Langella and Manning (2021, p. 2942). The sample for the monopsony series (2000Q4–2019Q4) is shorter than the macro VAR sample (1981Q3–2019Q4) due to data availability.
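The linear detrending in item (4) amounts to taking residuals from an OLS regression on a constant and a time index. A minimal sketch follows; the sample series is made up, and the linear-quadratic and HP-filter alternatives are not shown.

```python
def linear_detrend(series):
    """Residuals from an OLS fit of the series on a constant and a linear time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]

pure_trend = linear_detrend([1.0, 3.0, 5.0, 7.0])   # exactly linear input
cyclical = linear_detrend([2.0, 4.5, 5.0, 8.5])     # made-up series with deviations
```

A series that is exactly linear leaves zero residuals, and in general the residuals sum to zero and are uncorrelated with the time index.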

Q10. What are the analytical results from the simplified model?

Under flexible prices (no price markup channel), no wealth effect on hours worked (χ = 0), no financial market access for workers (ψ^W → ∞), full tax financing, and no profit redistribution, the paper derives closed-form expressions for output, labor income, and profits following a government spending shock. Output and labor earnings respond positively to spending only when θ is strictly positive and finite (workers value both pay and non-pay characteristics, so η is endogenous). When θ = ∞ (workers care only about pay, which yields perfect competition with constant η) or θ = 0 (workers care only about non-pay characteristics, which again yields constant η), government spending has zero output effect: the parameter Γ equals zero in both limiting cases. For intermediate θ, Γ > 0, and government spending raises output and redistributes income from capitalists to workers. This establishes that the countercyclical-monopsony channel is the sole mechanism at work in the simplified model and that it requires intermediate values of workers’ preference for non-pay characteristics.

Q11. What are the policy implications and their scope conditions?

The paper implies that fiscal multipliers may be larger than standard New Keynesian models predict if labor markets exhibit significant employer monopsony power. The model is calibrated to a steady-state wage markdown of 2/3 (wages at two-thirds of marginal revenue product), consistent with empirical estimates for the U.S. The countercyclical-monopsony channel provides expansionary effects of government spending even in models where the wealth effect on labor supply is negligible and price markups do not decline. The distributional consequences of fiscal expansions are also stronger under monopsony: income shifts from profit recipients (capitalists) to wage earners more substantially. Scope conditions include: the channel is weaker with stronger wealth effects on hours worked; it is stronger when government spending is financed through current taxes rather than deficits (more tax financing raises workers’ marginal valuation of income more sharply); it is stronger under regressive rather than progressive taxation; and it is stronger when profit income is redistributed to workers. Progressivity of taxation affects the monopsony and cyclical-inequality channels in opposing directions, implying that the optimal tax structure from a fiscal multiplier perspective depends on which channel is quantitatively dominant.

Q12. What prior empirical literature on cyclical monopsony power does this paper build on and extend?

The paper builds on three prior empirical findings. First, substantial employer market power in U.S. labor markets (Berger et al. 2022; Langella and Manning 2021; Yeh et al. 2022). Second, unconditional countercyclicality of employer market power — Hirsch et al. (2018) for Germany, Bassier et al. (2022) for Oregon, and Webber (2022) for the U.S. all document that firms hold more monopsony power in slack labor markets. The paper’s own descriptive analysis confirms this procyclicality of the separation elasticity across multiple detrending methods. Third, Langella and Manning (2021) provide the estimation methodology for the separation elasticity using SIPP data. The paper’s extension is twofold: (a) it extends the Langella-Manning estimates to quarterly frequency and expands the sample to 2019Q4; and (b) it examines the conditional cyclicality of employer market power — specifically, how monopsony power responds to identified government spending shocks — which prior literature had not done.

Key Concepts

Countercyclical monopsony channel: The novel fiscal transmission mechanism proposed by the paper: government spending expansions endogenously reduce employer monopsony power by raising both labor income and workers’ marginal valuation of income, which makes workers more responsive to relative pay differences across firms (higher η), compresses wage markdowns, and raises employment and output. The channel is ‘countercyclical’ in that employer market power falls as spending rises.

Wage markdown (µ): The ratio of the wage paid to workers to the marginal revenue product of labor, defined as µ = η/(η+1), bounded between zero and one. A smaller µ implies a larger wedge between pay and marginal product, i.e., greater monopsony power. Perfect competition corresponds to µ = 1. In the baseline calibration µ = 2/3, meaning wages equal two-thirds of the marginal revenue product.
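The markdown formula above can be inverted to recover the firm-level elasticity implied by a given calibration. The following minimal sketch does this for the baseline markdown of 2/3; the function names are illustrative and not taken from the paper.

```python
def markdown_from_elasticity(eta: float) -> float:
    """Wage markdown mu = eta / (eta + 1): the wage as a share of marginal revenue product."""
    if eta < 0:
        raise ValueError("eta must be non-negative")
    return eta / (eta + 1.0)


def elasticity_from_markdown(mu: float) -> float:
    """Invert mu = eta / (eta + 1) to obtain eta = mu / (1 - mu), valid for 0 <= mu < 1."""
    if not 0.0 <= mu < 1.0:
        raise ValueError("mu must lie in [0, 1)")
    return mu / (1.0 - mu)


# Baseline calibration reported in the text: mu = 2/3, which implies eta of about 2.
baseline_eta = elasticity_from_markdown(2.0 / 3.0)
print(round(baseline_eta, 6))

# As eta grows large the markdown approaches 1, the perfect-competition benchmark.
print(markdown_from_elasticity(1e9))
```

An elasticity of about 2 means that a firm raising its wage by 1 percent attracts roughly 2 percent more labor, which is what leaves wages at two-thirds of marginal revenue product.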

Wage elasticity of labor supply to the individual firm (η): The key measure of firms’ monopsony power in the model. Defined as η = θ·u^W_{c,t}·w_t·n^W_t + 1/φ, where 1/φ is the intensive-margin (hours) elasticity. The extensive-margin component θ·u^W_{c,t}·w_t·n^W_t determines how strongly a firm can attract workers from competitors by raising pay. Higher η means less monopsony power (wages closer to marginal revenue product); lower η means greater power.
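To make the mechanics of this definition concrete, the sketch below evaluates the elasticity for a few hypothetical parameter values and maps each result into a markdown via µ = η/(η+1). All numbers are made up for illustration, and `marginal_valuation` stands in for the workers' u term in the definition above; none of this is the paper's calibration.

```python
def firm_labor_supply_elasticity(theta, marginal_valuation, wage, hours, phi):
    """eta = theta * u * w * n + 1/phi, as in the definition above."""
    extensive_margin = theta * marginal_valuation * wage * hours
    intensive_margin = 1.0 / phi
    return extensive_margin + intensive_margin


def wage_markdown(eta):
    """mu = eta / (eta + 1)."""
    return eta / (eta + 1.0)


phi = 2.0  # hypothetical value, so the intensive-margin elasticity is 0.5

# A higher marginal valuation of income raises eta and the markdown ratio.
eta_low = firm_labor_supply_elasticity(1.0, 1.0, 1.0, 1.0, phi)
eta_high = firm_labor_supply_elasticity(1.0, 1.5, 1.0, 1.0, phi)

# With theta = 0 the extensive margin vanishes and eta is constant at 1/phi,
# whatever happens to the marginal valuation of income.
eta_no_pay_motive = firm_labor_supply_elasticity(0.0, 1.5, 1.0, 1.0, phi)

for eta in (eta_low, eta_high, eta_no_pay_motive):
    print(eta, round(wage_markdown(eta), 4))
```

The third case mirrors the θ = 0 limit discussed in Q10: once η no longer moves with workers' marginal valuation of income, a spending shock cannot compress the markdown.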

Separation elasticity (γ): The empirical proxy for inverse monopsony power: the wage elasticity of worker-firm separations, measuring how steeply a firm’s separation rate falls when it pays higher wages. In the model, γ is proportional to the extensive-margin component of η. Estimated from SIPP microdata via month-by-month complementary log-log regressions of separation dummies on residualized log wages, following Langella and Manning (2021).
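The complementary log-log specification mentioned above has a convenient property: the coefficient on the log wage equals the wage elasticity of the underlying separation hazard. The sketch below checks this numerically. The intercept, the value of γ, the sign convention (γ enters with a minus sign, so a positive γ means higher pay lowers separations), and the function names are illustrative assumptions, not the paper's estimates or code.

```python
import math


def separation_probability(intercept, gamma, log_wage):
    """Per-period separation probability under a complementary log-log link."""
    index = intercept - gamma * log_wage
    return 1.0 - math.exp(-math.exp(index))


def implied_hazard(probability):
    """Continuous-time hazard consistent with a per-period separation probability."""
    return -math.log(1.0 - probability)


intercept, gamma = -4.0, 1.5  # hypothetical values
step = 0.01                   # a 1 percent increase in the (residualized) wage

h0 = implied_hazard(separation_probability(intercept, gamma, 0.0))
h1 = implied_hazard(separation_probability(intercept, gamma, step))

# Elasticity of the hazard with respect to the wage; equals -gamma up to rounding.
hazard_elasticity = (math.log(h1) - math.log(h0)) / step
print(round(hazard_elasticity, 6))
```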

New classical (idiosyncrasy) monopsony: The modeling approach used in the paper, following Card et al. (2018), in which monopsony power arises from workers’ heterogeneous preferences over non-pay job characteristics (location, culture, flexibility) rather than from search frictions or geographic isolation. Firms differ in non-pay attributes, and because firms cannot observe individual preferences, they have wage-setting power even with frictionless worker flows between firms.

Cyclical-inequality channel: A fiscal transmission mechanism from the HANK/TANK literature (Bilbiie 2008, 2020): government spending redistributes income from low-MPC capitalists to high-MPC workers, amplifying the fiscal multiplier. The paper shows this channel interacts with the countercyclical-monopsony channel in conflicting ways — progressive taxation strengthens the cyclical-inequality channel but weakens the monopsony channel.

Wealth effect on labor supply (χ): Parameterized via the Jaimovich-Rebelo (2009) utility function, χ governs how strongly a decline in household lifetime income (due to higher taxes) induces workers to supply more hours. The baseline calibration sets χ → 0, consistent with near-GHH preferences and estimates in Schmitt-Grohé and Uribe (2012). A higher χ dampens the countercyclical-monopsony channel by reducing the consumption response and thereby the marginal utility response.

Macroeconomic Effects of Public R&D

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper estimates the dynamic macroeconomic effects of US government R&D investment using a Structural Vector Autoregressive (SVAR) framework, with an extension to a Rational Expectations SVAR (RE-SVAR) that explicitly captures private-sector anticipation of public spending decisions. The central questions are: (1) what is the fiscal multiplier of public R&D spending on GDP and private R&D investment, and how does it compare to other government spending categories; (2) does public R&D crowd in or crowd out private R&D; and (3) how much does the private sector’s anticipation of future public R&D commitments amplify these effects?

The dataset covers 1947Q1–2017Q3 and is drawn from the US Bureau of Economic Analysis, deflated to 2009 prices and expressed in per-capita terms. The five-variable system includes government R&D investment (GI), government residual spending (GG), net taxes (T), private R&D investment (GR), and GDP (Y), all modelled in log-levels to preserve cointegrating relationships. The lag length is set to six quarters (chosen by Hannan-Quinn criterion, consistent with the R&D-to-productivity lag literature). Identification rests on three mild contemporaneous restrictions: (i) government R&D decisions are independent of current-quarter GDP, consistent with their long-term, mission-oriented character; (ii) R&D spending can influence all other government expenditures in the same quarter but not vice versa; (iii) taxes affect government spending contemporaneously but not the reverse. An alternative identification (SVAR model B) reverses the within-quarter tax-spending causality and produces very similar results. The RE-SVAR extends the system by including the expected next-period public R&D shock, identified by assuming perfect foresight of one-quarter-ahead government R&D innovations and an additional restriction that public R&D does not respond to lagged GDP or private R&D.

Main quantitative findings from the leading estimation (RE-SVAR model A, full sample):

GDP fiscal multiplier — anticipated shock: within the quarter of implementation (one quarter after the announcement), one dollar of public R&D spending raises GDP by approximately 52 dollars (pure multiplier at t = 0 is 51.59; see Table 2). The multiplier peaks immediately and then declines to roughly 22–24 dollars over a six-year horizon. Critically, this GDP increase is permanent across all SVAR and RE-SVAR specifications, whereas generic government spending produces only a temporary rise.

GDP fiscal multiplier — unanticipated shock: setting aside the anticipation effect, the impact-period multiplier falls to approximately 13–14 dollars (13 dollars in the scenario with no anticipation), which is still substantially larger than the peak multiplier of roughly 0.73–0.76 dollars for residual government spending (Table 1, SVAR model A).

Expectations channel: at t = 0, before the actual spending increase occurs at t = 1, the news alone raises GDP by 16.48 dollars. The total peak GDP effect (55.75 dollars) is nearly double the counterfactual effect without the anticipation component (31.64 dollars). The coefficient on expected next-period public R&D in the private R&D equation is 0.58 (p-value 0.035), confirming a statistically significant anticipation channel for private R&D.

Crowding-in of private R&D: public R&D crowds in private R&D at all horizons. The public-to-private R&D multiplier peaks at 1.81 in the quarter following the news shock (t = 0), and stabilizes at 0.75 after six years — an elasticity of 0.72, close to Moretti et al.’s (2021) estimate of 0.52 from production-function methods. At t = 0, private R&D rises by 0.52 in response to the announcement alone.

Persistence of public spending: a one-dollar public R&D shock keeps GI above 2 dollars six years later, whereas residual government spending returns to baseline within four years. Cumulative total government spending over six years following a one-dollar R&D shock is 220 dollars, versus only 22 dollars for a generic spending increase.

Output elasticity at longer horizons: the GDP multiplier expressed in elasticity terms is 0.34 one year after the anticipated shock, stabilizing between 0.23 and 0.25 over three to six years. The corresponding elasticity for a private R&D (GR) shock declines from 0.18 to 0.16, broadly consistent with cross-country evidence from Coe and Helpman (1995) and Guellec and van Pottelsberghe (2004).

The paper argues that the large short-run multipliers reflect three mechanisms that can materialize quickly: (1) process-innovation cost reductions; (2) early entry of private co-investors seeking first-mover advantage; (3) embodiment of new knowledge in physical capital. At longer horizons, supply-side productivity gains and knowledge spillovers dominate. The policy conclusion is that public R&D is unusually effective both as a demand-side stimulus and as a long-run growth instrument, provided government credibly announces and maintains multi-year funding commitments that stabilize private-sector expectations.

In depth

Q1. What is the identification strategy, and what are the main threats to it?

The baseline SVAR identification (model A) imposes three contemporaneous exclusion restrictions: government R&D decisions are exogenous to same-quarter GDP and to other fiscal variables (because R&D budgets reflect long-term strategic priorities, not countercyclical reactions); GI can influence GG contemporaneously but not vice versa; and taxes affect spending in the same quarter but not the reverse. A key threat is non-fundamentalness: because public R&D programs are announced well in advance, what appears to the econometrician as a surprise shock is actually largely anticipated by the private sector, biasing the SVAR impulse responses. The paper addresses this by extending the SVAR to a Rational Expectations SVAR (RE-SVAR) that adds the expected next-period GI shock to the information set of private agents, identified by the additional assumption that GI does not respond to lagged GDP or private R&D. A secondary threat is the direction of same-period causality between taxes and spending; an alternative model (SVAR model B) reverses this and finds only minor quantitative differences. The Lucas Critique applies to the counterfactual simulation of an unanticipated shock since the model was estimated under a perfect-foresight assumption.

Q2. How does the RE-SVAR separate the anticipation effect from the effect of the actual spending increase?

The RE-SVAR model includes E[GI_{t+1} | Omega_t] — the expectation of next-period public R&D — as a forward-looking right-hand-side variable in the private R&D and GDP equations. Under the perfect-foresight assumption, this expectation equals the realized next-period structural shock. The IRF for an anticipated GI shock therefore starts at t = 0 when the news arrives and the actual spending rise occurs at t = 1. By comparing (i) the full anticipated IRF (news at t = 0 + realization at t = 1) to (ii) a modified version where the news term is removed from the information set (unanticipated shock), the paper isolates the incremental contribution of expectations. At t = 0 the news alone raises GDP by 16.48 and private R&D by 0.52; the total peak GDP effect with anticipation is 55.75, versus 31.64 without it — a difference of roughly 24 dollars at the one-year horizon.
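The decomposition above is simple arithmetic on the reported figures, reproduced below as a quick check. Only the three numbers quoted in the text are used; the variable names are illustrative.

```python
# Figures quoted in the text (dollars of GDP per dollar of public R&D).
news_effect_at_t0 = 16.48          # GDP response to the announcement alone
peak_with_anticipation = 55.75     # total peak effect, anticipated shock
peak_without_anticipation = 31.64  # counterfactual peak, no anticipation

# Incremental contribution of expectations at the peak.
anticipation_gap = peak_with_anticipation - peak_without_anticipation

# How much larger the anticipated peak is relative to the counterfactual.
amplification = peak_with_anticipation / peak_without_anticipation

print(round(anticipation_gap, 2), round(amplification, 2))
```

The gap of about 24 dollars matches the figure in the text, and the ratio of roughly 1.76 is what the overview describes as "nearly double".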

Q3. What are the main mechanisms proposed to explain the unusually large short-run fiscal multiplier?

Three channels are proposed for the large immediate GDP response. First, process innovation can reduce production costs without long lags from the start of R&D investment. Second, anticipatory entry of private co-investors seeking first-mover advantages intensifies investment at the very beginning of a research program, even before results are commercialized. Third, innovation embodied in new physical capital means R&D expenditure is accompanied by complementary investment in physical equipment, amplifying the aggregate demand stimulus. At longer horizons, supply-side productivity gains from knowledge spillovers across firms and sectors become the dominant channel. The paper also notes that public R&D programs are frequently accompanied by large-scale complementary government procurement (e.g., defense agency procurements), further magnifying the total mobilization of public resources.

Q4. What do the multipliers for residual government spending (GG) look like, and how do they compare to public R&D?

From SVAR model A (Table 1), one dollar of residual government spending raises GDP by 0.73 at t = 0 (also its peak), declining to around 0.45 after six years. The peak private R&D multiplier of GG spending is 0.08 (after six years), rising very slowly from near zero. Compared to the GDP multiplier of public R&D (13.68 at t = 0, peak 16.18), the residual spending multiplier is roughly 20 times smaller. Moreover, the GDP increase from GG spending is temporary, reverting to baseline within four years, while the GDP increase from GI spending is permanent. These contrasts hold across both SVAR models A and B and across the RE-SVAR estimations.
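The "roughly 20 times" comparison can be checked directly from the multipliers quoted above. The sketch uses only those three figures; the variable names are illustrative.

```python
# Pure GDP multipliers quoted in the text (SVAR model A, Table 1).
gg_peak_multiplier = 0.73     # residual government spending, at t = 0
gi_impact_multiplier = 13.68  # public R&D, at t = 0
gi_peak_multiplier = 16.18    # public R&D, peak

ratio_on_impact = gi_impact_multiplier / gg_peak_multiplier
ratio_at_peak = gi_peak_multiplier / gg_peak_multiplier

print(round(ratio_on_impact, 1), round(ratio_at_peak, 1))
```

Both ratios fall between about 19 and 22, so "roughly 20 times" holds whether the comparison is made on impact or at the peak.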

Q5. What evidence is there for the crowding-in of private R&D by public R&D?

The paper finds strong, statistically significant crowding-in across all specifications. In the SVAR model A (Table 1), the multiplier of GI on private R&D (GR) reaches its peak of 0.76 after two quarters and remains at 0.41 after six years. In the RE-SVAR model A (Table 2), the anticipated public R&D shock raises private R&D by 1.81 dollars per dollar of public R&D at t = 0, declining to 0.75 after six years, translating to an elasticity of 0.72. Even in the alternative identification (RE-SVAR model B), the result persists, though the peak private R&D multiplier from anticipated GI spending is lower (0.40 after four quarters). The response of private R&D to both its own shock and to public R&D shocks is permanent across all RE-SVAR estimations, supporting the conclusion that public R&D accelerates the total national innovation effort rather than displacing it.

Q6. What mechanisms explain the crowding-in of private R&D?

The paper identifies five complementary channels: (1) Public funding covers large fixed costs (laboratories, human capital), making private research projects profitable that would not otherwise be undertaken. (2) Public R&D removes credit constraints faced by private innovators. (3) Anticipated technological spillovers signal profitable investment opportunities to private firms. (4) The government funding decision itself conveys a signal about the long-run profitability and viability of a research area. (5) The public-private partnership alleviates asymmetric information and the high riskiness that typically deters private R&D. Additionally, transparency in public procurement and entry requirements into publicly funded programs may signal quality, further encouraging private investment.

Q7. What robustness checks are conducted, and what do they show?

Three robustness checks are applied to both the SVAR and RE-SVAR estimations: (i) alternative identification (SVAR model B / RE-SVAR model B) where the contemporaneous causal direction between taxes and government spending is reversed; (ii) a shorter sample excluding the period from the 2008 financial crisis onward (1947Q1–2007Q4); (iii) a longer lag length of eight quarters. For check (i), results are very similar: the GDP multiplier for GI is slightly smaller at short horizons (10.02 vs 13.68 at t = 0 in the SVAR, and 31.19 vs 51.59 at t = 0 in the anticipated RE-SVAR) but converges to similar long-horizon values. For check (ii), the impact of GI on GDP at t = 0 is 15.5 (vs 13.54), with similar hump shape; GI’s impact on GR is slightly lower. For the RE-SVAR robustness checks, the paper reports that the shape, timing, and order of magnitude remain stable, as does the finding that the anticipated GI multiplier considerably exceeds the unanticipated one. The general conclusion is no qualitative variation and only minor quantitative differences.

Q8. What is the RE-SVAR’s handling of the non-fundamentalness problem and how is it justified specifically for public R&D?

Non-fundamentalness arises when the VAR’s implied information set is smaller than that of private agents — i.e., what the econometrician calls a surprise is actually anticipated by the economy, so estimated structural shocks are combinations of current and future structural innovations and the fundamental VAR representation is not identified. The paper argues this problem is particularly severe for public R&D because: (1) R&D budgets are part of long-term plans with detailed technical reports and high-profile public announcements (as documented with historical episodes in Section 2); (2) established procurement links between government agencies and private firms provide early information flows. The RE-SVAR addresses this by explicitly adding E[GI_{t+1} | Omega_t] to the system (Blanchard-Perotti approach applied to a non-causal VAR) and assuming perfect foresight of next-period GI innovations. External forecast measures are unavailable for government R&D spending, making this the only viable route. Perfect foresight is defended as particularly appropriate given the highly public, plan-driven nature of government R&D decisions.

Q9. How does the paper relate to the closest prior literature?

The closest precursors are Deleidi and Mazzucato (2021) and Antolin-Diaz and Surico (2022). Deleidi and Mazzucato use a recursively identified SVAR where defense R&D spending is ordered first and find a first-quarter GDP multiplier of 24 dollars. This paper differs by: (a) using total government R&D (defense + non-defense) rather than only defense R&D; (b) providing a more general and explicitly motivated identification that goes beyond simple recursive ordering; (c) developing the RE-SVAR extension to capture the anticipation channel, which raises the estimated multiplier substantially above 24 dollars. Antolin-Diaz and Surico (2022) study military spending news with a 125-year VAR (60 lags, Bayesian shrinkage); they find a long-run defense spending GDP multiplier of 2.08 and argue that public R&D specifically drives long-run productivity. The present paper uses a shorter but richer five-variable quarterly system with explicit crowding-in measurement. On the crowding-in question, the paper contrasts with earlier work (Goolsbee 1998, Wallsten 2000) finding crowding-out due to an inelastic supply of scientists, and aligns with more recent evidence (Becker 2015, Moretti et al. 2021) showing crowding-in once a broader set of mechanisms is accounted for.

Q10. What are the policy implications and their scope conditions?

Three core policy implications are identified. First, public R&D is a highly effective instrument for stimulating long-run technological innovation and economic growth: the permanent GDP response and the strong private R&D crowding-in indicate that public investment substantially elevates the country’s aggregate innovation capacity. Second, fiscal multipliers are class-specific: the multiplier for public R&D dramatically exceeds that for generic government spending, implying that the composition of government expenditure matters greatly for both short-run stabilization and long-run growth. The absence of crowding-out and the large short-run multipliers suggest substantial untapped productive capacity due to market failures in R&D. Third, the anticipation channel is quantitatively important: ignoring private-sector foresight understates the true multiplier, and this implies that the credibility and advance communication of government R&D commitments are themselves policy instruments — long-term, publicly announced programs that stabilize expectations can effectively mobilize private co-investment that would not occur under uncertain or ad hoc spending. Scope conditions: results are estimated on US data 1947Q1–2017Q3, a country with large and heterogeneous federal R&D programs; extrapolation to countries with different institutional settings, R&D compositions, or capital market structures requires caution. The model uses a 1.5-year lag structure that may not fully capture very long-run R&D-to-productivity channels estimated at 5–20 years in micro studies.

Q11. What is the ‘pure fiscal multiplier’ and why does the paper use it instead of the standard multiplier?

Standard fiscal multipliers are calculated by dividing the cumulative IRF of GDP to a unit shock in a given spending category by the cumulative IRF of total government spending to the same shock. The problem is that total spending includes other categories that dynamically respond to the initial shock (e.g., GI shocks cause GG to rise significantly via cross-equation dynamics), so the denominator conflates the effect of GI with the effect of induced GG changes, making multipliers across spending categories incomparable. The paper therefore uses ‘pure multipliers’ (following Perotti 2004): the counterfactual total government spending is calculated from a version of the SVAR where the dynamics of GG are switched off (all coefficients in the GG equation are set to zero), so the denominator captures only the direct mechanical effect of the GI shock on aggregate spending without the induced cross-spending effects. This allows clean apples-to-apples comparison of one average dollar spent across different categories.
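The difference between the two denominators can be sketched with made-up impulse responses. In the example below, every path is hypothetical and expressed in dollars; `cumulative_multiplier` is an illustrative helper, and the "pure" denominator is approximated by simply excluding the induced GG response, which is a simplification of the counterfactual SVAR simulation described above.

```python
def cumulative_multiplier(gdp_irf, spending_irf, horizon):
    """Cumulative GDP response divided by cumulative spending response up to `horizon`."""
    return sum(gdp_irf[: horizon + 1]) / sum(spending_irf[: horizon + 1])


# Hypothetical dollar responses to a one-dollar GI shock (not the paper's estimates).
gdp_path = [13.0, 14.0, 15.0, 14.0]
gi_path = [1.0, 0.9, 0.8, 0.7]         # public R&D itself
induced_gg_path = [0.0, 0.5, 0.6, 0.6]  # residual spending pulled up by the GI shock

total_spending_path = [gi + gg for gi, gg in zip(gi_path, induced_gg_path)]

horizon = 3
standard = cumulative_multiplier(gdp_path, total_spending_path, horizon)
pure = cumulative_multiplier(gdp_path, gi_path, horizon)

print(round(standard, 2), round(pure, 2))
```

Because induced GG spending inflates the standard denominator, the standard multiplier understates the GDP effect per dollar of GI alone; the pure version attributes the whole response to the category that was shocked.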

Q12. How do the estimated elasticities compare with prior estimates?

Expressed in elasticity terms, the GDP multiplier from an anticipated GI shock is 0.34 one year after implementation and stabilizes at 0.23–0.25 over three to six years. For private R&D (GR shock), the corresponding elasticity is 0.18 after one year, stabilizing at 0.15–0.16. These are broadly consistent with existing cross-country production function estimates: Coe and Helpman (1995) obtain 0.22 for G7 economies; Guellec and van Pottelsberghe (2004) find 0.13 for private and 0.17 for public R&D spending; Ornaghi (2006) finds 0.24 for Spanish firms including spillovers. The paper notes that Jones and Summers (2020) calculate that the social return to innovation can easily generate a GDP effect of 20 dollars per dollar of R&D once the full set of spillovers is captured at the aggregate level, which is consistent with the dollar multipliers obtained here at longer horizons.

Q13. How does private R&D (GR) compare to public R&D (GI) as a GDP stimulus?

In the leading RE-SVAR model A, a unit shock to private R&D raises GDP by 27.65 at t = 0 and reaches a peak of 39.62 after one year, before stabilizing at around 24 dollars after six years. In the short run this is below the public R&D effect (peak 55.75 at t = 0, declining to about 38 dollars and eventually about 22 after six years). The short-run superiority of public R&D over private R&D is attributed to: (1) breadth of goals, since public programs simultaneously mobilize a wider set of industries; (2) a longer planning horizon, which reduces uncertainty and encourages private co-investment; (3) the expectations channel available to public but not private R&D; (4) entry requirements and transparency signaling research quality; (5) government agencies acting as both funder and user, which accelerates knowledge transfer. However, the superiority of public over private R&D is not confirmed in all specifications of the robustness analysis.

Q14. What historical evidence does the paper marshal to motivate the anticipation mechanism?

Section 2 documents several large defense and non-defense R&D programs in which public announcements substantially pre-dated actual spending: the Sputnik response (DARPA and NASA were created in 1958 following the October 1957 Sputnik launch, and spending projections were published in Business Week months in advance); Nixon’s Strategic Nuclear Doctrine (January–February 1974 announcements of a record defense budget of 92.6 billion dollars, with Congress extending Pentagon research commitments in June 1975); Reagan’s Strategic Defense Initiative (publicly announced on March 23, 1983, with the CBO publishing detailed multi-year cost projections by May 1984); Kennedy’s Moon Mission (announced on May 25, 1961, with the NYT reporting cost projections the following day and estimates revised multiple times through 1969); Nixon’s War on Cancer (a December 1970 Senate report and a May 1971 Nixon speech, followed by the National Cancer Act, passed on December 23, 1971 with a pre-specified multi-year budget); the Human Genome Initiative (DOE announcement in March 1986, Department of Health endorsement in April 1987, and a project that ran 1990–2013); and Obama’s Climate Action Plan (energy transition plans mooted from 2009, and the America COMPETES Acts of 2007, 2010, and 2014). These examples document both the forward-looking nature of R&D budgeting and the detailed public information available to private agents ahead of actual spending.

Key Concepts

Rational Expectations SVAR (RE-SVAR): An extension of the standard SVAR framework that adds a forward-looking expectational variable — specifically the expected next-period public R&D structural shock E[GI_{t+1} | Omega_t] — to the system, allowing the model to capture the influence of private-sector anticipation on current economic outcomes rather than treating all fiscal shocks as surprises.

Non-fundamentalness: A condition arising when the VAR’s implied information set is a strict subset of the actual information set of private agents, causing the reduced-form VAR residuals to be non-invertible linear combinations of current and future structural innovations. For public R&D, this means that what the econometrician identifies as a surprise shock to GI is in fact largely anticipated by the private sector, biasing estimated impulse responses.

Pure fiscal multiplier: A class-specific fiscal multiplier calculated by isolating the GDP response to one dollar spent in a given category of government spending while holding other spending categories constant (switching off their dynamics). Contrasts with the standard multiplier, which conflates the direct effect of the shock with induced changes in other spending categories triggered by dynamic cross-equation correlations.

Mission-oriented spending: Government R&D investment directed at achieving long-term strategic national goals (e.g., space exploration, defense superiority, cancer research, climate transition). Defined by three features that distinguish it from generic government expenditure: (i) long-term policy motivation independent of short-run macroeconomic conditions; (ii) advance public announcements that create private-sector expectations; (iii) potential for permanent productivity-level effects through knowledge spillovers.

Crowding-in: In this paper, the phenomenon whereby an exogenous increase in public R&D investment triggers a statistically significant and persistent increase in private R&D investment — the opposite of the crowding-out (substitution) effect posited when an inelastic supply of scientists and engineers constrains total R&D activity.

Fiscal foresight: The ability of private economic agents to predict future government spending decisions ahead of their actual implementation, arising from legislative lags, public announcements, procurement contracts, and established information channels between policy makers and private co-investors. Fiscal foresight makes standard SVAR fiscal shocks non-fundamental and amplifies the macroeconomic impact of spending by triggering anticipatory private responses before the actual dollar is spent.

Anticipation channel (expectations effect): The component of the macroeconomic response to public R&D spending that is activated at the time of the public announcement rather than at the time of actual spending. In the RE-SVAR model, this channel accounts for the extra GDP boost of approximately 21 dollars at t = 1 and a peak of 24 dollars after one year, relative to the counterfactual scenario of an unanticipated shock.

Manipulation of information in times of crisis: evidence from Covid excess mortality

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Karlinsky and Shayo ask which governments manipulate public information, in which direction, and by how much — questions that are normally intractable because the ground truth is unobservable. The Covid-19 pandemic supplies an unusual opportunity: all countries faced a broadly similar crisis simultaneously, and all-cause mortality — collected by national statistical offices as a routine bureaucratic function independently of Covid — provides a manipulation-resistant benchmark against which officially-reported Covid deaths can be evaluated.

The authors hand-collect all-cause mortality data for 134 countries and territories from national statistical offices, population registries, health ministries, and, in some cases, right-to-information requests facilitated by local journalists. Data span 2015–2021 at weekly, monthly, or annual frequency. Their sample covers 93 percent of countries with at least 75 percent Death Registration Completeness. They compute, for each country, a Misreporting Rate (MRR) defined as estimated Covid deaths minus officially reported Covid deaths, normalised by expected total deaths derived from pre-pandemic trends. Estimated Covid deaths equal excess mortality — itself estimated from a country-specific model with weekly/monthly fixed effects and an annual trend (R² = 0.997 in pre-pandemic prediction) — minus adjustments for excess deaths attributable to conflicts, natural disasters, and other identifiable non-Covid causes. Those adjustments are small: the mean total adjustment across the sample is 0.04 percent of expected deaths.
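The Misreporting Rate defined above can be written as a short function. The sketch below follows the verbal definition in this summary; the function name, argument names, and the example country are hypothetical and are not taken from the authors' code or data.

```python
def misreporting_rate(excess_deaths, non_covid_adjustments,
                      reported_covid_deaths, expected_deaths):
    """MRR = (estimated Covid deaths - reported Covid deaths) / expected total deaths.

    Estimated Covid deaths are excess mortality net of excess deaths attributed
    to conflicts, natural disasters, and other identifiable non-Covid causes.
    """
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    estimated_covid_deaths = excess_deaths - non_covid_adjustments
    return (estimated_covid_deaths - reported_covid_deaths) / expected_deaths


# Hypothetical country: 100,000 expected deaths, 20,000 excess deaths,
# 40 excess deaths from non-Covid causes, 8,000 officially reported Covid deaths.
mrr = misreporting_rate(
    excess_deaths=20_000,
    non_covid_adjustments=40,
    reported_covid_deaths=8_000,
    expected_deaths=100_000,
)
print(round(mrr, 4))
```

A positive value indicates underreporting and a negative value overreporting; normalising by expected deaths makes the measure comparable across countries of different sizes.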

Six main findings emerge. First, between 45 and 55 percent of the 134 countries misreported Covid deaths. Second, the direction of manipulation is overwhelmingly one-sided: of 131 countries with sufficient data to estimate confidence intervals, 59 reported accurately, 62 significantly underreported, and only 10 overreported. The theoretical prediction that governments might exaggerate a crisis (to rally populations, legitimise repressive measures, or attract foreign aid) finds no empirical support. Third, the magnitude of underreporting is large: the sample reported 5.08 million Covid deaths in 2020–2021 while estimated actual Covid deaths were 12.47 million, nearly 2.5 times the official figure; the implied global MRR is 12.8 percent. Among the 62 underreporting countries, the average MRR is 14.5 percent of expected total deaths and the median is 12 percent. Individual-country MRRs range from above 37 percent (Bolivia, Nicaragua) downward, with Russia at 24 percent.

Fourth, state capacity in counting and registering deaths explains some but far from most cross-country variation; the R² of the best capacity-only regression is 0.115. Chile and Russia have virtually identical Death Registration Completeness and Percent of Well-Certified Death Registrations, yet Chile accurately reported while Russia’s MRR is 24 percent. Fifth, the extent of underreporting is strongly associated with constraints on governmental power. In individual regressions conditioning on capacity, each of three institutional constraint measures (Clean Elections, Executive Constraints, and Freedom of the Press) is associated with a 0.4–0.5 standard deviation lower MRR per one standard deviation stronger constraint. In a joint model including all 12 factors from four domains (macroeconomic incentives, culture, audience sophistication, institutions), institutional constraints are the strongest predictor (partial R² ≈ 0.11), followed by audience sophistication (partial R² ≈ 0.04–0.06). Macroeconomic incentives (tourism reliance, unemployment, foreign direct investment) are not jointly significant. Cultural factors (trust, individualism, religiosity) lose significance once other factors are controlled. The full model explains more than 50 percent of MRR variation. Sixth, countries with a communist legacy (defined as having had a communist or socialist regime for at least 10 years, covering 34 countries) show significantly higher misreporting even holding current institutional and cultural conditions constant. Countries that held elections during 2020–2021 also show significantly higher misreporting.
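
The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the implied level of expected deaths is a back-of-envelope derivation that assumes the global MRR is the aggregate gap divided by aggregate expected deaths, and it is not a figure reported in the summary.

```python
reported_m = 5.08     # officially reported Covid deaths, millions (from the text)
estimated_m = 12.47   # estimated actual Covid deaths, millions (from the text)
global_mrr = 0.128    # implied global MRR (from the text)

ratio = estimated_m / reported_m   # "nearly 2.5 times the official figure"
gap_m = estimated_m - reported_m   # unreported Covid deaths, millions

# Assumption: global MRR = aggregate gap / aggregate expected deaths.
implied_expected_m = gap_m / global_mrr

# Classification counts among countries with estimable confidence intervals.
accurate, under, over = 59, 62, 10
share_misreporting = (under + over) / (accurate + under + over)

print(f"ratio = {ratio:.2f}, gap = {gap_m:.2f}m, "
      f"implied expected deaths = {implied_expected_m:.1f}m, "
      f"share misreporting = {share_misreporting:.1%}")
```

The three classification counts sum to 131, and the resulting share of misreporting countries sits at the upper end of the 45 to 55 percent range quoted in the first finding.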

The results are robust to alternative expected-mortality models, alternative MRR normalisations, the inclusion of Bangladesh, China, and Indonesia (treated separately due to data quality concerns), year-by-year (2020 vs. 2021) splits, controls for age structure and GDP per capita, and alternative manipulation measures (underdispersion, Benford’s law deviations). The evidence that the gap cannot be attributed to cross-country differences in false-positive aversion when certifying cause of death is direct: four pre-pandemic measures of a country’s tendency to use unspecified cause-of-death categories are uncorrelated with MRR and individually account for less than 1 percent of its variation.

The paper’s contribution to the economics of information manipulation is methodological as well as empirical: it provides a comparable, country-level measure of governmental misinformation based on actual observable actions regarding a policy issue of central importance, covering a large and diverse cross-section of countries.

In depth

Q1. What is the identification strategy and what are the main threats to it?

The strategy compares officially-reported Covid deaths (the variable that attracted political attention and over which governments had strong incentives and ability to intervene) with estimated Covid deaths derived from excess all-cause mortality (a statistic collected routinely by national bureaucracies under very different incentive structures, harder to manipulate, and less visible publicly during the pandemic). The identifying assumption is that all-cause mortality data are not themselves systematically manipulated in response to Covid. The authors defend this on four grounds: (1) all-cause mortality has long been collected independently of Covid; (2) ascertaining that someone died is far easier than attributing a cause of death; (3) Covid figures attracted vastly more public attention, making their manipulation more urgent; (4) when governments appear to have discovered the evidential value of excess mortality, their response has been to delay publication of all-cause data rather than to alter it (Belarus is cited as an example). The main remaining threat is that the adjustment for non-Covid excess deaths (conflicts, disasters, traffic accidents, suicides, homicides) is imperfect in countries with poor data on those causes. The authors note this caveat but show mean adjustments are tiny (0.04% of expected deaths) and the largest individual adjustments (Armenia 6.1%, Azerbaijan 3.2%) are driven by the Nagorno-Karabakh war and are handled explicitly.

Q2. How is excess mortality estimated, and how sensitive are the results to modelling choices?

Country-specific models are estimated using 2015–2019 all-cause mortality data, including country-specific weekly or monthly fixed effects and a country-specific annual trend to capture seasonality and long-run factors (population ageing, improvements in health care, etc.). The model achieves R² = 0.997 in predicting pre-pandemic mortality. The authors report in Supplementary Material B that alternative expected-mortality approaches from the literature yield very similar results, as do alternative normalisations of the MRR. Sensitivity to model choice is low because the discrepancies between excess and reported deaths in weak-institution countries are so large that they persist across methodological variants.
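
A minimal sketch of this kind of baseline model is given below, assuming monthly data and a balanced 2015–2019 panel (every month observed in every year). Under that assumption, OLS with month fixed effects and a linear annual trend has the closed form used here. The function names and the synthetic death counts are hypothetical, and the authors' actual estimation (including weekly data) is not reproduced.

```python
from statistics import mean


def fit_baseline(deaths):
    """Fit month fixed effects plus a linear annual trend.

    deaths: dict mapping (year, month) -> death count, balanced panel.
    """
    years = sorted({y for y, _ in deaths})
    months = sorted({m for _, m in deaths})
    year_bar = mean(years)
    month_mean = {m: mean(deaths[(y, m)] for y in years) for m in months}
    # Within-month estimator of the trend (equals the OLS slope when balanced).
    num = sum((y - year_bar) * (deaths[(y, m)] - month_mean[m])
              for y in years for m in months)
    den = len(months) * sum((y - year_bar) ** 2 for y in years)
    return {"slope": num / den, "month_mean": month_mean, "year_bar": year_bar}


def expected_deaths(model, year, month):
    """Extrapolate the pre-pandemic pattern to a given year and month."""
    return model["month_mean"][month] + model["slope"] * (year - model["year_bar"])


def excess_deaths(model, observed):
    """Sum of observed minus expected deaths over the observed periods."""
    return sum(d - expected_deaths(model, y, m) for (y, m), d in observed.items())


# Synthetic example: winter peak plus a trend of 10 extra deaths per year.
SEASONAL = {m: (50 if m in (1, 2, 12) else 0) for m in range(1, 13)}
history = {(y, m): 1000 + 10 * (y - 2015) + SEASONAL[m]
           for y in range(2015, 2020) for m in range(1, 13)}
model = fit_baseline(history)
# Hypothetical 2020: 100 deaths above the extrapolated baseline every month.
observed_2020 = {(2020, m): 1050 + SEASONAL[m] + 100 for m in range(1, 13)}
total_excess = excess_deaths(model, observed_2020)
print(model["slope"], total_excess)
```

With unbalanced or weekly data the closed form no longer applies and the same model would be fitted by ordinary least squares on dummy variables.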

Q3. How do the authors distinguish intentional manipulation from limited state capacity?

They use two pre-pandemic, capacity-specific measures: (1) Death Registration Completeness (DRC) — the share of deaths captured by the vital registration system — and (2) Percent of Well-Certified Death Registrations (PWC) — the share with proper cause-of-death attribution. Both are computed before the pandemic so they are not contaminated by Covid-era behaviour. Regressions confirm that capacity predicts MRR negatively (R² up to 0.115), but the residual variation remains large. The clearest illustration is Chile vs. Russia: both have complete DRC and near-identical high PWC, yet Chile reports accurately and Russia has an MRR of 24 percent. All subsequent analysis of correlates conditions on these capacity measures.

Q4. How do the authors rule out the possibility that differences in false-positive aversion (rather than manipulation) explain MRR variation?

They construct four pre-pandemic measures from WHO Mortality Database ICD-10 cause-of-death data: (1) number of ICD codes reported; (2) share of specific-viral deaths among all viral deaths; (3) share of specific-infection deaths among all infection deaths; (4) share of specific-respiratory deaths among all respiratory deaths. A country more averse to false positives would report less specific causes. None of the four measures is significantly associated with MRR, and none accounts for more than 1 percent of its variation. This rules out differences in diagnostic/reporting standards as a driver of the observed discrepancies.

Q5. What is the direction of manipulation and what does this imply for theories of governmental information behaviour?

Of 131 countries with estimable confidence intervals, 62 significantly underreported and only 10 overreported. The four main theoretical channels for overreporting — rally-around-the-flag effects, legitimising repression, attracting foreign aid, and inducing flight-to-safety compliance — find no empirical support. The authors argue that the rally-around-the-flag mechanism requires an outgroup-related threat (Covid, unlike a foreign military attack, was not easily framed this way), that Covid mortality does not signal repressive capacity, and that international economic actors appear sufficiently sophisticated to be sceptical of inflated figures. The pattern is consistent instead with governments downplaying to project competence, reduce accountability, and justify inadequate responses.

Q6. What factors are most strongly associated with misreporting, and how are they ranked?

In joint regressions with all 12 factors from four domains, after conditioning on capacity: (1) Institutional constraints (Clean Elections, Executive Constraints, Freedom of the Press) have the highest partial R² (approximately 0.11 for Executive Constraints alone) and are jointly significant at p < 0.001; each standard deviation of stronger institutional constraint is associated with roughly 0.4–0.5 standard deviations lower MRR. (2) Audience Sophistication (tertiary education, HDI Education Index, internet access) is the second strongest domain (partial R² in the range of 0.04–0.06 per variable; jointly significant at p < 0.05). (3) Cultural factors (trust, individualism, religiosity) are individually significant in bivariate regressions but lose significance when institutional and other factors are controlled. (4) Macroeconomic incentives (tourism, unemployment, net FDI) are not jointly significant in any specification. Specification-curve analysis across all combinations of controls confirms that Executive Constraints is the single most robust predictor, retaining sign, magnitude, and significance across all models. The full model (Table 4, column 1) has R² exceeding 0.50.
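
To make the partial R² ranking concrete, the sketch below computes the partial R² of one factor after conditioning on a single control, using the fact that for one added regressor it equals the squared correlation between the two sets of residuals. This is a simplified two-variable illustration with made-up data; the paper's regressions condition on many more variables, and the function names are hypothetical.

```python
from statistics import mean


def residuals(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    beta = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))
    return [b - y_bar - beta * (a - x_bar) for a, b in zip(x, y)]


def partial_r2(y, control, factor):
    """Share of the variation in y left after `control` that `factor` explains."""
    e_y = residuals(y, control)        # outcome purged of the control
    e_f = residuals(factor, control)   # factor purged of the control
    cov = sum(a * b for a, b in zip(e_y, e_f))
    return cov ** 2 / (sum(a * a for a in e_y) * sum(b * b for b in e_f))


# Made-up data: `control` stands in for a capacity measure,
# `factor` for an institutional-constraint index.
control = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
factor = [1.0, 0.0, 2.0, 1.0, 3.0, 1.0]

# If the outcome is an exact linear function of both, the factor explains
# all of the variation the control leaves behind.
y_exact = [c - 0.5 * f for c, f in zip(control, factor)]
r2_exact = partial_r2(y_exact, control, factor)

# With noise in the outcome the partial R-squared falls between 0 and 1.
y_noisy = [0.9, 1.4, 0.8, 2.7, 2.4, 4.9]
r2_noisy = partial_r2(y_noisy, control, factor)
print(r2_exact, r2_noisy)
```

Read this way, a partial R² of about 0.11 means the variable accounts for roughly 11 percent of the MRR variation that the other regressors leave unexplained.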

Q7. What is the communist legacy finding and how is it interpreted?

Countries defined as having had a communist or socialist regime for at least 10 years (34 countries) show significantly higher MRRs even after conditioning on contemporary institutional constraints, audience sophistication, culture, and capacity. The coefficient is statistically significant at p < 0.05 or better in the main and most robustness specifications. The authors point to Harrison (2017) on the pervasiveness of information manipulation in communist states as a historical precedent, and interpret the finding as a persistent legacy operating through channels not fully captured by current measures. This suggests that historical exposure to a political culture of systematic information manipulation may have durable effects on bureaucratic behaviour or political norms that current V-Dem indices do not fully absorb.

Q8. What is the elections finding?

Countries holding national parliamentary or presidential elections during 2020–2021 (76 of 134 countries) show significantly higher misreporting, consistent with electoral incentive theories of information manipulation. This finding is robust to including controls for GDP per capita, population age structure, and other domains, and is stable across the 2020-only and 2021-only sub-samples.

Q9. What robustness checks are performed?

The authors conduct: (1) specification-curve analysis across all combinations of covariates; (2) a joint model with all 12 individual factors; (3) principal component analysis within each domain to recover common variation and reduce dependence on specific measurement choices; (4) alternative expected-mortality models (Supplementary Material B.1); (5) alternative MRR normalisations (Supplementary Material B.2); (6) separate year-by-year analysis for 2020 and 2021; (7) inclusion of Bangladesh, China, and Indonesia as robustness cases despite lower data reliability; (8) addition of GDP per capita to check whether the institution-misreporting link is proxying for development; (9) analysis using underdispersion (Kobak 2022) and Benford’s law deviations as alternative manipulation measures; (10) exploration of colonial legacy as an additional historical variable (no significant effect found). The primacy of institutional constraints is robust across all of these.

Q10. How do the authors treat China, Bangladesh, and Indonesia?

These three large countries are excluded from the main analysis because their all-cause mortality data come from surveys (Bangladesh, China) rather than vital registration systems, or are very incomplete (Indonesia), making excess mortality estimation unreliable. They are included in a robustness regression (Table 4, column 6) and results are described as qualitatively similar. The authors flag that China’s data may itself be informative as a potential indicator of data suppression.

Q11. How does this paper relate to and differ from prior work?

The paper is closest in spirit to Olken (2007), who uses the gap between reported and actual infrastructure spending to measure corruption, and Martinez (2022), who compares GDP growth to night-time-light-implied growth and finds autocracies overstate growth by more than a third. The authors extend this approach to a different domain (health/mortality) with broader country coverage. Prior Covid-specific work documented anomalies — underdispersion (Kobak 2022) and Benford’s law deviations (Kapoor et al. 2020; Kilani 2021) — and noted that autocratic regimes reported lower-than-expected deaths (Annaka 2021; Cassan and Van Steenvoort 2021), but these studies relied on regime type as the sole or primary explanatory variable and did not systematically rank competing factors. Neumayer and Plümper (2022) and Wigley (2024) used the authors’ own World Mortality Dataset to test data manipulation. This paper is distinctive in that it: (a) provides what the authors describe as the most systematic estimates to date of Covid mortality and misreporting; (b) examines a broad range of factors across four domains without a priori privileging any; (c) directly tests and rejects capacity and false-positive aversion as alternative explanations; and (d) identifies communist legacy and elections as additional significant correlates.

Q12. What are the policy implications and their scope conditions?

Three implications are highlighted. First, unconstrained regimes appear to manipulate not only economic statistics but also health information during the most salient public policy event of the era; travel restrictions and multilateral actions during the pandemic relied on reported Covid figures, so manipulation had direct international externalities. This raises broader questions about the credibility of official data from such governments across domains — foreign aid targeting, climate action, vaccination campaigns. Second, the MRR provides a comparable cross-country measure of institutional quality grounded in actual governmental behaviour, potentially useful as an input to studies of institutions, conflict, electoral outcomes, and economic performance. Third, some countries that score respectably on conventional executive constraint indices — Albania, El Salvador, India, Serbia — show high MRRs, suggesting these rates may be leading indicators of democratic erosion not yet captured by standard measures. The scope condition the authors flag is external validity: if pandemic mortality is an extreme case with unique incentive structures (tourism, investment, aid eligibility), then findings about determinants of manipulation may not generalise beyond crisis settings. The authors argue against this interpretation on the grounds that macroeconomic factors — which would be pandemic-specific — are not significant, while institutional constraints — which reflect general governmental behaviour — are.

Q13. What limitations do the authors acknowledge?

First, the analysis is explicitly descriptive rather than causal; factors are correlates, not proven determinants. Second, the MRR may understate true manipulation if all-cause mortality data are themselves selectively withheld or manipulated; the authors argue this is probably modest but acknowledge it cannot be fully ruled out. Third, important large countries — Pakistan, Nigeria, Ethiopia, Venezuela — cannot be scored because sufficient all-cause mortality data are not publicly available; the authors note this absence may itself be informative but cannot be quantified. Fourth, data on other causes of excess deaths (traffic accidents, suicides, homicides) are patchy in many countries, though the scale of these adjustments is very small. Fifth, some capacity controls (PWC) use data from as early as 2003, introducing measurement error. The paper does not claim to fully separate the channels through which institutions reduce manipulation (electoral accountability, press scrutiny, judicial oversight, professional agency independence), treating them as joint constraints rather than separately identified mechanisms.

Key Concepts

Misreporting Rate (MRR): The paper’s central measure, defined as (estimated Covid deaths minus officially reported Covid deaths) divided by expected total deaths for the country in the same period based on pre-pandemic trends. A positive MRR indicates underreporting; a negative MRR indicates overreporting. Normalising by expected total deaths rather than by reported Covid deaths accounts for differences in population size, age structure, and baseline mortality across countries.

Excess mortality: The number of deaths above and beyond what would have been expected in the absence of the pandemic, estimated from country-specific models with weekly or monthly fixed effects and an annual trend fitted to 2015–2019 data. Used as the primary building block for estimated Covid deaths after subtracting excess deaths due to identified non-Covid causes (conflict, natural disasters, traffic accidents, homicides, suicides).

Death Registration Completeness (DRC): In this paper’s usage, the share of all deaths in a country captured by its vital registration system each year, measured using pre-pandemic data. Treated as the most basic indicator of a country’s capacity to count deaths. Used as a control to separate capacity constraints from intentional manipulation.

Percent of Well-Certified Death Registrations (PWC): The share of death certificates in a country that carry a properly specified cause of death, measured using pre-pandemic data. Used alongside DRC as a second capacity control capturing not just whether deaths are registered but whether causes are correctly attributed.

Informational Autocrat: Following Guriev and Treisman (2022), the paper uses this concept to describe executives in countries where formal and informal checks and balances are weak, who systematically manipulate public information to project competence and reduce accountability. The paper’s empirical results are interpreted as evidence that such executives behave as informational autocrats not only in economic statistics but also in health data.

False-positive aversion: The tendency of some countries to apply a higher evidentiary bar before attributing a death to a specific cause — such as Covid — rather than leaving the cause unspecified, independently of capacity or intention to deceive. The paper operationalises this using pre-pandemic ICD-10 data on specificity of reported causes of death and shows it is uncorrelated with MRR, ruling it out as a driver of observed discrepancies.

Communist legacy: The paper’s binary indicator for countries that had a communist or socialist regime for at least 10 consecutive years (34 countries). The variable captures historical exposure to a political culture of systematic information manipulation and is found to be a significant positive predictor of MRR even after conditioning on current institutional constraints, consistent with persistent norms or bureaucratic practices.

Monetary financing produces neither high inflation nor miraculous fiscal multipliers

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

When central banks pay interest on reserves — as the Federal Reserve has done since October 2008 and as is standard operating procedure today — does financing fiscal stimulus by permanently expanding the central bank’s balance sheet produce higher output than debt-financed stimulus? Van der Kwaak (2024) argues the answer is no in most model configurations, and only modestly yes in a specific extension.

The motivation is practical: with government debt at high levels in many advanced economies, the private sector may be unable or unwilling to absorb additional bonds needed to fund fiscal stimuli. One alternative is monetary financing, in which the central bank permanently purchases the extra bonds issued to fund the stimulus (as proposed by Gali 2020b for COVID-era policy). A key earlier paper (Gali 2020a) found money-financed stimuli to be substantially more effective than debt-financed ones, but that result was derived in a model where the central bank does not pay interest on reserves, so the policy rate becomes endogenous under money financing. Van der Kwaak shows this assumption is at odds with how modern central banks operate: post-GFC balance sheet expansions by the Federal Reserve and ECB have been financed almost entirely by interest-bearing reserves, with non-interest-paying currency showing no meaningful deviation from trend.

The paper employs a New Keynesian DSGE model with labor as the sole production factor, a central bank that holds government bonds funded by non-interest-paying money and interest-paying reserves (with the composition endogenous), financial intermediaries subject to a Gertler-Kiyotaki (2010) / Gertler-Karadi (2011) incentive-compatibility leverage constraint on bond holdings, and a standard active Taylor rule bounded by the ZLB. Fiscal stimulus takes the form of either (i) a lump-sum tax cut or (ii) an increase in government spending, each equal to 1% of steady-state output. Money financing is modeled as the central bank acquiring the additionally issued bonds and retaining them permanently in nominal terms.

The central analytical result (Proposition 1) is a proof of “extended Ricardian equivalence”: the consolidated government’s funding mix among money, reserves, government bonds, and lump-sum taxes has zero effect on inflation and the equilibrium allocation in the real economy. This holds whether or not the incentive-compatibility constraint of financial intermediaries is binding — that is, even when bonds and reserves are not perfect substitutes and money financing genuinely reduces the government’s funding costs. The key mechanism: because the central bank pays interest on reserves, the deposit rate equals the policy rate in equilibrium, and the policy rate is the sole endogenous variable on which households’ deposit return depends. As a result, household consumption-savings decisions are completely decoupled from the financing mix; inflation and real quantities are pinned down entirely by the standard NK equilibrium conditions plus the Taylor rule. Proposition 2 further shows that net cash flows between households and the government/financial sector ultimately just finance exogenous government expenditures, so changes in bond prices and lump-sum taxes produce no net wealth effects on households.

This irrelevance result is shown to extend analytically to: (i) the ZLB regime (since the central bank still controls the policy rate under money financing), (ii) any maturity structure of government debt, (iii) the ECB’s two-tiered reserve system (where minimum reserves earn zero and excess reserves earn the policy rate), (iv) ex ante sovereign default risk, (v) an alternative leverage constraint form (deposits capped relative to reserves plus a fraction of bonds), and (vi) a model with physical capital when corporate securities are held by unconstrained households.

The irrelevance breaks only when balance-sheet-constrained financial intermediaries also hold corporate securities financing the physical capital stock (Section 4.2 / Sims-Wu 2021 extension). In that case, central bank bond purchases under money financing compress bond yields, which via the intermediaries’ portfolio-choice condition also compresses expected returns on corporate securities, stimulating investment. The quantitative difference between money- and debt-financed stimuli, measured by the discounted cumulative fiscal multiplier over 1,000 quarters, is 0.26 — substantially smaller than the 0.50 difference found by Gali (2020a). For the spending stimulus, the debt-financed multiplier is 0.9103 and the money-financed multiplier is 1.1719, giving a money-over-debt advantage of 0.2616. For the tax cut, the debt-financed multiplier is -0.0219 and the money-financed multiplier is 0.2397, again a difference of 0.2616. The smaller advantage relative to Gali (2020a) reflects the fact that in Gali’s framework the policy rate is not controlled by the central bank under money financing, so households’ saving return falls endogenously and consumption expands sharply — an effect that is entirely absent here because the central bank retains full control of the policy rate.

The policy implication is that proposals to use monetary financing to achieve “miraculous” multipliers beyond the normal spending multiplier are misguided in modern institutional settings where central banks pay interest on reserves. Money financing avoids increasing private-sector-held debt but does not amplify macroeconomic stimulus relative to conventional debt financing in the baseline case, and offers only a small incremental boost in the more structured extension.

In depth

Q1. What is the key analytical result and what is the formal proposition that establishes it?

Proposition 1 proves ‘extended Ricardian equivalence’: the consolidated government’s funding mix among money, reserves, government bonds, and lump-sum taxes has zero impact on inflation and the equilibrium allocation in the real economy. The proof works by exhibiting a self-contained subset of equilibrium conditions (households’ first-order conditions for consumption, labor, and deposits; the Taylor rule; firms’ pricing conditions; and market clearing) that uniquely pins down all real quantities and inflation without including any equation governing the government’s or central bank’s financing mix. Because reserves are not subject to the incentive-compatibility constraint, the deposit rate equals the policy rate in equilibrium, and the policy rate is set by the Taylor rule. Households’ saving return therefore depends only on inflation and real variables, so the funding mix drops out entirely.

Q2. Why does the irrelevance result hold even when the incentive-compatibility constraint of financial intermediaries is binding and bonds and reserves are NOT perfect substitutes?

When the constraint binds, reserves earn a lower return than bonds, so the central bank’s bond purchases do increase bond prices and reduce government funding costs, but these price changes generate no net wealth effects on households. Proposition 2 shows formally that all cash flows between households on one side and the government and financial intermediaries on the other ultimately just finance (exogenous) government expenditures on final goods. Changes in bond prices, intermediary dividends, and households’ bond and deposit returns cancel out in the household budget constraint, so the net cash flow from households (written W_t) equals government spending g_t, that is, W_t = g_t regardless of the financing mix. The intuition is that the financial sector and government together form a closed circuit relative to households, and because government spending is exogenous, the circuit’s net effect on household wealth is always the same.

Q3. How does this result differ from Gali (2020a), and why is the multiplier advantage of money financing larger in that paper?

Gali (2020a) assumes the monetary base consists solely of non-interest-paying money. In that setting, when the central bank permanently expands the monetary base to finance a fiscal stimulus, it cannot simultaneously control the policy rate and the money supply, so the policy rate becomes endogenous and falls relative to a debt-financed stimulus. This endogenous reduction in the rate at which households can save causes a substantial increase in consumption. In van der Kwaak’s framework, the central bank pays interest on reserves and retains full control of the policy rate regardless of whether the stimulus is debt- or money-financed, eliminating this consumption-expansion channel. As a result, Gali finds a money-over-debt multiplier advantage of 0.50, while van der Kwaak finds 0.26 in the one model extension where irrelevance is broken, and zero in the baseline.

Q4. In what model extension is the irrelevance result broken, and what is the mechanism?

The irrelevance breaks when balance-sheet-constrained financial intermediaries hold both government bonds and corporate securities (financing the physical capital stock), as in Sims and Wu (2021) and van der Kwaak (2023). In this configuration, the incentive-compatibility constraint links the expected excess returns on bonds and corporate securities through a fixed ratio lambda_b / lambda_k. When money financing causes the central bank to acquire additional bonds, bond prices rise and expected bond returns fall. Via the portfolio-choice optimality condition, this also compresses expected returns on corporate securities, which encourages investment. A direct link thus emerges from the government’s financing mix to the real economy through the financial sector’s balance sheet. Without this channel — whenever corporate securities are held by unconstrained households, or the model has no physical capital — the irrelevance holds exactly.

Q5. What are the exact quantitative multiplier results from the numerical exercise?

Using the discounted cumulative multiplier formula summed over 1,000 quarters (Table 2): (i) Debt-financed tax cut: -0.0219. (ii) Money-financed tax cut: 0.2397. Difference: 0.2616. (iii) Debt-financed spending stimulus: 0.9103. (iv) Money-financed spending stimulus: 1.1719. Difference: 0.2616. The money-over-debt advantage is identical (0.2616) for both types of stimulus, though the levels differ substantially. The debt-financed tax-cut multiplier is negative because higher bond issuance generates capital losses on intermediaries’ bond portfolios, tightening the incentive-compatibility constraint and reducing credit provision and investment. Money financing mitigates these losses by having the unconstrained central bank absorb the newly issued bonds, raising bond prices and net worth.
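
The multiplier arithmetic can be sketched as follows. The formula in the function is the usual definition of a discounted cumulative multiplier (discounted output deviations over discounted stimulus deviations) and is an assumption here, since the summary does not spell out the paper's exact expression; the discount factor and the toy impulse responses are hypothetical. The Table 2 values are those quoted above.

```python
def discounted_cumulative_multiplier(output_dev, stimulus_dev, beta, horizon=1000):
    """Discounted sum of output deviations over discounted sum of stimulus deviations."""
    num = sum(beta ** t * dy for t, dy in enumerate(output_dev[:horizon]))
    den = sum(beta ** t * dg for t, dg in enumerate(stimulus_dev[:horizon]))
    return num / den


# Toy two-quarter impulse responses (hypothetical, not model output).
toy_multiplier = discounted_cumulative_multiplier(
    output_dev=[1.0, 0.5], stimulus_dev=[1.0, 0.0], beta=0.99
)

# Multipliers as quoted from Table 2.
table2 = {
    ("tax cut", "debt"): -0.0219,
    ("tax cut", "money"): 0.2397,
    ("spending", "debt"): 0.9103,
    ("spending", "money"): 1.1719,
}
advantage = {
    kind: round(table2[(kind, "money")] - table2[(kind, "debt")], 4)
    for kind in ("tax cut", "spending")
}
print(toy_multiplier, advantage)
```

The identical money-over-debt gap for both stimulus types can be read off directly from the `advantage` dictionary.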

Q6. What robustness checks does the paper conduct on the irrelevance result?

The paper proves the irrelevance analytically for: (1) Both binding and slack incentive-compatibility constraints (Section 3.1). (2) Any maturity structure of government debt — the maturity parameter rho drops out of the relevant equilibrium conditions (Section 3.2.1). (3) The ZLB — since the central bank still controls the reserve rate even under money financing (Section 3.2.1). (4) An alternative leverage constraint where deposit capacity depends on reserves plus a discounted fraction of bonds rather than a fixed fraction of bond value (Appendix C.2). (5) The ECB’s two-tiered reserve system, where minimum reserves receive zero interest and excess reserves receive the policy rate; the deposit rate becomes (1-theta)*policy rate instead of the policy rate itself, but is still solely determined by the policy rate (Proposition 3, Section 3.2.2). (6) Models with physical capital when households hold the corporate securities (Proposition 4, Section 4.1). (7) Ex ante sovereign default risk following Corsetti et al. (2013) (Appendix C.1).
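
For item (5), a small sketch shows why the two-tier system leaves the irrelevance result intact: the deposit rate is still a function of the policy rate alone. The function name is hypothetical, and the summary does not define theta beyond the expression (1-theta)*policy rate, so the numerical values are illustrative only.

```python
def deposit_rate(policy_rate, theta=0.0):
    """Deposit rate under the two-tier system: (1 - theta) * policy rate.

    theta = 0 recovers the baseline case where the deposit rate
    equals the policy rate.
    """
    return (1.0 - theta) * policy_rate


baseline_rate = deposit_rate(0.04)               # single-tier baseline
two_tier_rate = deposit_rate(0.04, theta=0.01)   # hypothetical theta
print(baseline_rate, two_tier_rate)
```

No term for money, reserves, bonds, or taxes enters the function, which is the sense in which the financing mix cannot affect households' saving return.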

Q7. What is ‘extended Ricardian equivalence’ as defined by the author, and how does it differ from the original Barro (1974) result?

Barro’s (1974) Ricardian equivalence shows that the funding mix between government debt and lump-sum taxes has zero effect on the real economy. Van der Kwaak extends this to include the monetary base — the funding mix among money, reserves, government bonds, and lump-sum taxes has zero impact on inflation and the real equilibrium. This is a strictly more general result because it covers the substitution of money/reserves for bonds (i.e., monetary financing), not just the substitution of debt for taxes. Crucially, the extension holds even when bonds and reserves are not perfect substitutes (when the incentive-compatibility constraint binds), which is the nontrivial part of the contribution.

Q8. How is ‘money financing’ modeled in the paper?

A money-financed stimulus is modeled as one in which the government bonds newly issued to fund the additional spending or the tax cut are acquired by the central bank and permanently retained on its balance sheet in nominal terms. For a spending stimulus, the parameter kappa_g = 1 means the central bank’s nominal assets expand by the amount of each period’s additional government purchases (g_t - g_bar). For a tax cut, kappa_tau = 1 means the central bank acquires bonds equal to the tax-cut component tau_tilde_t. Debt financing corresponds to kappa_g = 0 or kappa_tau = 0. The central bank’s dividends (profits net of interest on reserves and seigniorage on currency) are returned to the fiscal authority each period, so central bank net worth is zero. The author notes this is consistent with the legal constraints on central banks (Buiter 2014) since it takes the form of permanent QE rather than overt fiscal transfers.

Q9. What is the role of the incentive-compatibility constraint in generating the bond-price spread, and why does the irrelevance result still hold?

The Gertler-Kiyotaki constraint limits the volume of government bonds intermediaries can hold relative to their net worth (chi_t * n_t = lambda_b * q^b_t * s^{b,f}_t when binding). When binding, intermediaries cannot freely expand bond holdings in response to higher bond supply, so an increase in bond supply under a debt-financed stimulus depresses bond prices and creates capital losses. Conversely, the unconstrained central bank buying additional bonds under money financing raises bond prices. So the constraint creates a genuine price and funding-cost differential between money- and debt-financed stimuli. Yet the irrelevance still holds because, as shown in Proposition 2, these bond-price changes, together with changes in intermediary dividends, net out from the household budget constraint — the household sees the same net obligation regardless of financing mix.

Q10. How does Corollary 1 relate to the empirical observation about the monetary base composition?

Corollary 1 proves analytically that any expansion of the monetary base under money financing consists entirely of an expansion in interest-paying reserves — non-interest-paying money holdings are unchanged. This is because, in equilibrium, households’ demand for non-interest-paying money depends only on consumption and the nominal deposit rate (via the money-in-utility first-order condition), neither of which changes under money financing (by the irrelevance result). This directly matches the empirical evidence shown in Figures 1 and 4 for the Federal Reserve and ECB respectively: post-GFC balance-sheet expansions were almost entirely in interest-paying reserves, with currency in circulation showing no deviation from trend.

Q11. What is the tax-cut mechanism under debt financing in the numerical exercise, and why is the multiplier negative?

Under a debt-financed tax cut (kappa_tau = 0), the fiscal authority must issue more bonds to offset the revenue shortfall. Because financial intermediaries’ incentive-compatibility constraint is binding, they cannot perfectly elastically absorb the additional bond supply; bond prices fall, causing capital losses on intermediaries’ existing holdings. This reduces net worth, tightens the constraint further, and forces intermediaries to reduce lending to the real economy. The capital price and investment therefore fall. The trough in output is at most about 0.03% of steady-state output, but the cumulative multiplier is -0.0219 — negative because the adverse financial amplification from falling bond prices more than offsets any direct effect of the lump-sum transfer on households. This mechanism is similar to van der Kwaak and van Wijnbergen (2017).

Q12. What is the calibration strategy, and how closely does it follow Gali (2020a)?

The calibration of the model with financial intermediaries holding corporate securities follows Gali (2020a) for most household and production parameters: discount factor beta = 0.995, risk aversion sigma_c = 1, inverse Frisch elasticity phi = 5, price semi-elasticity of money demand eta = 7, Calvo probability psi_p = 3/4, elasticity of substitution epsilon = 9, labor share = 0.75, steady-state government debt / output = 2.4 (60% of annual GDP), AR(1) for government spending rho_g = 0.5. Deviations from Gali include: government spending share of output set at g_bar/y_bar = 0.2 (consistent with advanced economy averages), steady-state investment share i_bar/y_bar = 0.2, and a monetary base equal to 1/3 of quarterly output (as in Gali) now split into non-interest-paying money (10% of quarterly output) and interest-paying reserves (1.63 times currency). For financial intermediaries: average banker tenure 24 quarters (sigma = 0.9583), adjusted leverage ratio 5, steady-state spread on corporate securities and bonds over deposits = 25 quarterly basis points (100 annual basis points), implying lambda_b = lambda_k. Capital adjustment cost gamma_k = 2.5.
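Several of these calibration targets are unit conversions that can be verified directly. The sketch below assumes the usual Gertler-Karadi mapping between the banker survival probability sigma and expected tenure, tenure = 1 / (1 - sigma); that mapping and all variable names are assumptions for illustration, while the numbers are those quoted above.

```python
# Banker survival probability implied by an average tenure of 24 quarters,
# assuming expected tenure = 1 / (1 - sigma).
tenure_quarters = 24
sigma = 1 - 1 / tenure_quarters

# Debt of 2.4 times quarterly output, expressed relative to annual GDP.
debt_to_quarterly_output = 2.4
debt_to_annual_gdp = debt_to_quarterly_output / 4

# Quarterly spread of 25 basis points, annualized by simple multiplication.
spread_quarterly_bp = 25
spread_annual_bp = 4 * spread_quarterly_bp

print(round(sigma, 4), round(debt_to_annual_gdp, 2), spread_annual_bp)
```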

Q13. How does the paper relate to Wallace (1981) and when does the neutrality argument break down?

Wallace (1981) first showed that open-market operations are neutral in complete-markets models where all investors can purchase any asset at market prices without binding constraints. Woodford (2012) distills the key conditions: assets are valued only for pecuniary returns, and all investors face the same market prices with no binding position constraints. Van der Kwaak’s irrelevance extends the Wallace neutrality to incomplete markets with binding leverage constraints on bond holdings, which go beyond Woodford’s conditions. The neutrality breaks only when the binding constraint links together multiple asset classes — specifically when the same constraint covers both government bonds and corporate securities, creating a direct transmission from bond prices to the cost of capital.

Q14. How does the paper relate to Reis and Tenreyro (2022) on helicopter money?

Reis and Tenreyro (2022) study helicopter drops — direct transfers of newly created central bank liabilities to households — and derive an irrelevance result that applies only when bond and reserve interest rates are equal (perfect substitutes). Van der Kwaak’s irrelevance extends to the case where the return on bonds exceeds that on reserves (binding incentive-compatibility constraint). A second difference is that Reis-Tenreyro focus on helicopter money (a liability-side transfer), while van der Kwaak models money financing as permanent QE (an asset-side expansion). Third, van der Kwaak also studies money-financed government spending stimuli, which Reis-Tenreyro do not.

Q15. What are the implications for policy proposals to use monetary financing in high-debt environments?

The core message for policy is nuanced. On the fiscal side, monetary financing does achieve its main stated goal: it prevents private-sector-held government debt from rising, since the additional bonds are absorbed by the central bank. On the stimulus effectiveness side, however, money financing has no macroeconomic advantage over debt financing in the baseline model (and in most extensions). The one setting where there is an advantage — intermediaries holding both bonds and corporate securities — yields only a modest multiplier boost of 0.26 relative to debt financing, compared to the 0.50 suggested by Gali (2020a). This smaller number reflects the fundamental institutional difference: with interest-on-reserves, the policy rate stays fixed under money financing, eliminating the consumption-expansion channel. The paper also implies there is no inflationary danger from money financing in this setup — the irrelevance result holds for inflation as well as real variables — directly contradicting fears that monetary financing inherently produces high inflation.

Q16. What happens to inflation under money financing compared to debt financing in the analytical result?

The extended Ricardian equivalence result covers inflation explicitly: the path of inflation is identical under money financing and debt financing in all the analytical baseline cases. This is because inflation is pinned down by the New Keynesian Phillips curve and the Taylor rule, neither of which depends on the financing mix. The central bank retains full control of the policy rate under money financing (because it pays interest on reserves), so the Taylor rule continues to govern inflation dynamics. This directly contradicts the claim that monetary financing is inherently inflationary; in the model, it is neither inflationary nor expansionary relative to debt financing.

Key Concepts

Extended Ricardian equivalence: The author’s label for the proposition that the consolidated government’s funding mix among money, reserves, government bonds, and lump-sum taxes has zero effect on both inflation and the equilibrium allocation in the real economy. It extends Barro (1974)’s original Ricardian equivalence (which covered only debt vs. taxes) to include the monetary base, and holds even when bonds and reserves are not perfect substitutes due to binding intermediary leverage constraints.

Money-financed fiscal stimulus: In this paper’s modeling: a fiscal stimulus (tax cut or spending increase) in which the additional government bonds issued to fund it are acquired by the central bank and permanently retained on its balance sheet in nominal terms. This is equivalent to a permanent expansion of the monetary base equal to the size of the stimulus, and is distinct from helicopter drops (which involve direct transfers rather than bond purchases).

Incentive-compatibility constraint (binding case): A Gertler-Kiyotaki (2010) / Gertler-Karadi (2011) constraint limiting financial intermediaries’ bond holdings relative to net worth: chi_t * n_t = lambda_b * q^b_t * s^{b,f}_t when binding. When binding, it creates a spread between bond and reserve returns, meaning bonds and reserves are not perfect substitutes. The paper’s irrelevance result holds whether or not this constraint binds, which is the nontrivial analytical contribution.

Interest-paying reserves (interest on reserves): Central bank liabilities that pay a nominal interest rate set by the central bank, distinct from non-interest-paying currency (‘outside money’). The paper argues this is the empirically relevant form of modern monetary base expansion: post-GFC balance-sheet growth by the Fed and ECB was almost entirely in interest-paying reserves. Paying interest on reserves allows the central bank to simultaneously control the policy rate and the size of its balance sheet, which is the feature that drives the irrelevance result.

Cumulative (discounted) fiscal multiplier: As computed in the paper following Gali (2020a): the ratio of the sum of output deviations from steady state over 1,000 quarters to the sum of the fiscal instrument deviations over the same horizon. The relevant multiplier here is the difference between money- and debt-financed versions: 0.26 in the extension with corporate securities held by intermediaries, compared to 0.50 in Gali (2020a).

Two-tiered reserve system: The ECB framework (in operation since July 2023) under which intermediaries must hold minimum reserves equal to a fixed fraction of deposits (currently 1%) at zero interest, while excess reserves earn the policy rate. The paper proves (Proposition 3) that extended Ricardian equivalence carries over to this system: the nominal deposit rate becomes (1-theta)*policy rate, but since the policy rate remains the sole endogenous variable determining the deposit rate, the irrelevance result is unaffected.

Source-text-origin note: The working paper title reads ‘Monetary financing does not produce miraculous fiscal multipliers’; the published EJ title adds ‘neither high inflation nor’. The summary uses the published title, which also reflects the paper’s second finding (no inflationary effect).

Nonlinear Monetary Policy Tradeoffs

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper measures how the inflation-unemployment tradeoff associated with monetary policy varies with both the sign of the monetary intervention (easing versus tightening) and the state of the business cycle (booms versus recessions) for the US economy over 1973:M1 to 2019:M6. The motivation is that standard linear Phillips-curve estimates implicitly impose a constant tradeoff, yet a flat Phillips curve would simultaneously predict that (i) stimulating activity during a recession costs nothing in terms of inflation and (ii) reducing inflation costs very large amounts of unemployment — both empirically extreme predictions that have very different policy implications. The paper challenges both extremes.

The empirical strategy extends the Proxy-SVAR approach of Mertens-Ravn (2013) and Stock-Watson (2018) to a nonlinear setting. The economy is described by a Vector Moving Average augmented with nonlinear functions of the monetary policy shock — specifically its absolute value (capturing sign dependence) and its interaction with a recession indicator (capturing state dependence). Under a finite-order VARX representation assumption and a linear monetary policy rule assumption, the paper proves (Proposition 1) that even though the underlying VARX is nonlinear, the monetary shock can be recovered as the projection of an external instrument onto residuals of a misspecified linear VAR. Once the shock is recovered, it and its nonlinear functions are used as regressors in a VARX to estimate nonlinear impulse responses. The instrument is the Degasperi-Ricco (2022) extension of Miranda-Agrippino and Ricco (2021), with a baseline span of 1991:M1-2015:M12 extrapolated to the full sample. The VAR contains five variables: the 1-year Treasury bond rate, industrial production growth, the Gilchrist-Zakrajsek excess bond premium, the unemployment rate, and CPI inflation, estimated with 7 lags. The recession indicator equals 1 when average GDP growth over the previous 12 months is negative.

The monetary policy tradeoff is defined analogously to the fiscal multiplier: the ratio of the cumulative average impulse response of inflation (unemployment) to the cumulative average impulse response of unemployment (inflation) over horizons H. In a nonlinear setting the easing tradeoff and tightening tradeoff are no longer inverses of one another and must be treated separately.
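A minimal sketch of this definition, assuming the cumulative average is taken over horizons 0 through H and using made-up linear impulse responses. The function names and the example paths are hypothetical. In this linear example the easing and tightening tradeoffs are exact inverses, which is the property that fails once sign and state dependence are allowed.

```python
def cumulative_average(irf, H):
    """Average impulse response over horizons 0..H (indexing is an assumption)."""
    return sum(irf[: H + 1]) / (H + 1)

def tradeoff(irf_numerator, irf_denominator, H):
    """Ratio of cumulative average impulse responses at horizon H."""
    return cumulative_average(irf_numerator, H) / cumulative_average(irf_denominator, H)

# Hypothetical responses to a tightening shock in a linear model:
# inflation falls, unemployment rises.
infl_tight = [-0.01 * h for h in range(49)]
unemp_tight = [0.02 * h for h in range(49)]

# In a linear model an easing shock simply flips the sign of both responses.
infl_ease = [-x for x in infl_tight]
unemp_ease = [-x for x in unemp_tight]

H = 24
t_minus = tradeoff(unemp_tight, infl_tight, H)  # unemployment cost per unit of disinflation
t_plus = tradeoff(infl_ease, unemp_ease, H)     # inflation cost per unit of lower unemployment
print(t_minus, t_plus, t_minus * t_plus)
```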

The main quantitative findings are as follows. For monetary easing during recessions, the inflation cost of reducing unemployment is small and statistically insignificant: point estimates of T+ range from -0.03 to -0.17 across horizons H = 12 to H = 48 months, with 68% confidence intervals spanning approximately -5.3 to +2.8 at H = 12 and -3.4 to +2.7 at H = 48. For monetary tightening during booms, the unemployment cost of reducing inflation is moderate and statistically significant: T- estimates range from -0.51 to -0.61 across H = 12 to H = 48, with 68% confidence intervals entirely below zero (e.g., -1.10 to -0.26 at H = 12 and -1.23 to -0.24 at H = 48). In other words, reducing inflation by 1 percentage point during a boom requires raising unemployment by roughly 0.5 to 0.6 percentage points. These results are qualitatively robust to excluding the post-2008 zero-lower-bound period (pre-2009 subsample) and to alternative specifications. By contrast, monetary tightening during recessions implies a very large and unfavorable tradeoff, and easing during booms is extremely inflationary with virtually no real effect.

A Likelihood Ratio test rejects the null hypothesis that all nonlinear terms are zero at the 1% level, confirming the statistical importance of nonlinearities. The null hypothesis of shock invertibility (Assumption A4) is not rejected at the 5% level for any combination of VAR lags and residual leads tested.

A simple model with downward nominal wage rigidities — in which the wage floor introduces a kink in the aggregate supply curve — provides a theoretical rationale for the sign- and state-dependent tradeoff: an expansionary shock in a full-employment economy raises inflation with no output effect (the economy sits on the vertical AS segment), while a contractionary shock makes the wage rigidity binding and reduces output with no price effect (the horizontal AS segment). Monte Carlo validation using artificial data generated by the calibrated DSGE model shows that the proposed empirical procedure recovers the theoretical nonlinear impulse responses very accurately.

In depth

Q1. What is the identification strategy and what are the main assumptions required?

Identification proceeds in two steps. First, the monetary shock is recovered by projecting an external instrument (Degasperi-Ricco 2022) onto the residuals of a standard linear VAR. This is justified by Proposition 1, which shows that even though the VAR is misspecified (it omits the nonlinear terms), the shock can still be recovered as a linear combination of VAR residuals under five assumptions: (A0) a structural VMA representation in which the shock is orthogonal to past observables and to the remaining structural shocks at all leads and lags; (A1) a finite-order VARX representation; (A2) invertibility of the Wold representation; (A3) a valid instrument (relevance and exogeneity); and (A4) informational sufficiency, meaning the monetary shock can be expressed as a linear combination of current and past observables, a condition implied by a linear monetary policy rule. Second, once the estimated shock and its nonlinear functions (absolute value and interaction with the state dummy) are in hand, they are used as exogenous regressors in a VARX to estimate nonlinear impulse response functions.
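The first step can be illustrated on simulated data. The sketch below is not the authors' code: it skips VAR estimation entirely and uses two simulated series as stand-ins for linear-VAR residuals, each an invertible mix of a monetary shock and one other shock. The mixing weights, the instrument's noise level, and all names are assumptions. Projecting the noisy instrument onto the residuals then recovers a series that tracks the true shock up to scale.

```python
import random

random.seed(0)
T = 2000

# Structural shocks: u is the monetary shock, e is some other shock.
u = [random.gauss(0, 1) for _ in range(T)]
e = [random.gauss(0, 1) for _ in range(T)]

# Stand-ins for linear-VAR residuals: invertible mixes of the two shocks.
r1 = [1.0 * a + 0.3 * b for a, b in zip(u, e)]
r2 = [0.5 * a + 1.0 * b for a, b in zip(u, e)]

# Noisy external instrument: relevant for u, unrelated to e.
z = [0.8 * a + random.gauss(0, 1) for a in u]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# OLS projection of z on (r1, r2) via the 2x2 normal equations (no intercept).
s11, s12, s22 = dot(r1, r1), dot(r1, r2), dot(r2, r2)
b1, b2 = dot(r1, z), dot(r2, z)
det = s11 * s22 - s12 * s12
c1 = (s22 * b1 - s12 * b2) / det
c2 = (s11 * b2 - s12 * b1) / det
u_hat = [c1 * a + c2 * b for a, b in zip(r1, r2)]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    return dot(cx, cy) / (dot(cx, cx) * dot(cy, cy)) ** 0.5

rho_hat = corr(u_hat, u)  # correlation of the recovered shock with the true shock
rho_raw = corr(z, u)      # correlation of the raw instrument with the true shock
print(rho_hat, rho_raw)
```

The recovered series is far more closely aligned with the true shock than the raw instrument is, because the projection strips out the instrument's measurement noise.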

Q2. What are the main threats to identification?

Three main threats are acknowledged. (1) Instrument validity: if the instrument (Degasperi-Ricco 2022) is weak or contaminated by information shocks, the first-stage projection may recover a mislabeled shock. The authors note the first-stage F-statistic is adequate per Miranda-Agrippino and Ricco (2021) but acknowledge that the weak-instrument problem in the nonlinear context is non-trivial and left for future research. (2) Assumption A4 (informational sufficiency): if the central bank follows a nonlinear rule or the VAR variables are not sufficient to recover the shock, identification fails. The authors test this using the Forni-Gambetti-Ricco (2023) invertibility test — regressing the instrument on current and future VAR residuals and checking whether future residuals matter — and fail to reject invertibility at 5% across all lag/lead combinations. (3) Model misspecification in the nonlinear VARX: the VARX approximation may not capture all relevant nonlinearities generated by the true DSGE. The Monte Carlo validation on artificial DSGE data provides reassurance that the approach recovers the true nonlinear responses accurately.

Q3. How does the paper distinguish sign dependence from state dependence?

The paper includes two nonlinear terms as regressors in the VARX: the absolute value of the shock |u_t^r|, which captures sign-dependent effects (i.e., whether a tightening and an easing of equal magnitude have asymmetric effects), and the product s_{t-1} * u_t^r, which captures state-dependent effects (i.e., whether the same-sign shock has different effects depending on whether the economy was in a recession before the shock arrived). The two components are estimated simultaneously, allowing their separate contributions to be read off impulse responses in Figure 3. Robustness checks in the Online Appendix report models estimated with only sign dependence and only state dependence in isolation, with results described as qualitatively similar to Barnichon-Matthes (2018) and Tenreyro-Thwaites (2016), respectively.
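The two nonlinear regressors can be built as follows. This is a sketch under stated assumptions: the state dummy s_t is set to 1 when average growth over the window ending in period t is negative, it is set to 0 until a full window of history is available, and the example uses a 2-period window only to keep the data short (the paper uses 12 months). Function names and data are hypothetical.

```python
def recession_indicator(growth, window=12):
    """s_t = 1 when average growth over the `window` periods ending at t is negative."""
    s = []
    for t in range(len(growth)):
        if t < window - 1:
            s.append(0)  # assumption: no recession flag until a full window exists
        else:
            recent = growth[t - window + 1 : t + 1]
            s.append(1 if sum(recent) / window < 0 else 0)
    return s

def nonlinear_regressors(shocks, state):
    """Rows of (u_t, |u_t|, s_{t-1} * u_t) for t = 1, ..., T-1."""
    rows = []
    for t in range(1, len(shocks)):
        u_t = shocks[t]
        rows.append((u_t, abs(u_t), state[t - 1] * u_t))
    return rows

# Hypothetical data with a short window for readability.
growth = [1.0, -2.0, -1.0, 3.0]
shocks = [0.0, -0.5, 0.25, -1.0]
s = recession_indicator(growth, window=2)
rows = nonlinear_regressors(shocks, s)
for row in rows:
    print(row)
```

Lagging the state by one period keeps the dummy predetermined with respect to the shock, which is why the interaction uses s_{t-1} rather than s_t.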

Q4. What are the key quantitative results on impulse responses?

In the full nonlinear model, monetary tightening generates large and significant effects on real variables (unemployment, industrial production) regardless of the state, while monetary easing has more muted real effects. For prices, sign and state components operate in opposite directions: the largest inflation responses are associated with tightening during expansions. Numerically, the tradeoff estimates from Table 2 show: (a) easing during recessions — T+ point estimates of -0.03 at H=12, -0.12 at H=24, -0.17 at H=36, -0.17 at H=48 months (all statistically insignificant at 68%); (b) tightening during booms — T- point estimates of -0.51 at H=12, -0.61 at H=24, -0.59 at H=36, -0.53 at H=48 months (all statistically significant at 68%). For the pre-2009 subsample (excluding the ZLB period), tightening-in-booms estimates are somewhat larger in absolute value (-0.63 to -0.70) but confidence intervals widen to include zero at longer horizons.

Q5. What is the key implication for ‘pushing on a string’ results in the prior literature?

Tenreyro-Thwaites (2016) and Barnichon-Matthes (2018) document that monetary easing is less effective at stimulating real activity, especially during recessions — an apparent ‘pushing on a string’ result. The current paper accepts that the real effect of easing in recessions is muted, but adds a crucial dimension: price responses are also muted in the same circumstances, so the inflation-unemployment tradeoff is actually favorable even when the absolute size of real effects is small. The policy implication is that central banks can still usefully deploy monetary easing during recessions as long as interventions are sufficiently aggressive to achieve the desired stimulus, since the inflationary cost of doing so is low.

Q6. How does this paper measure the tradeoff differently from Phillips-curve regressions?

The tradeoff is defined as the ratio of the cumulative average impulse response of inflation to the cumulative average impulse response of unemployment (or vice versa) in response to an identified monetary shock, analogous to a fiscal multiplier. This approach avoids three problems that plague standard Phillips-curve estimates: (i) it does not require specifying a structural Phillips-curve equation, reducing misspecification risk; (ii) it does not require data on inflation expectations or the natural rate of unemployment, which are unobserved and introduce measurement error; (iii) identification comes from exogenous monetary shocks rather than OLS variation in unemployment, so the endogeneity problem is avoided.

Q7. What theoretical mechanism rationalizes the nonlinear tradeoffs?

A simple New-Keynesian-style model with downward nominal wage rigidities (W_t >= theta * W_{t-1}) generates a kink in the aggregate supply curve. When the economy operates at full employment and inflation is non-negative, an expansionary monetary shock stimulates demand but the wage rigidity is non-binding, so the economy sits on the vertical segment of the AS curve: output cannot exceed its natural level, and the only effect is higher inflation. By contrast, a contractionary shock makes the wage rigidity binding, pushing the economy onto the flat segment of the AS curve: firms cut employment rather than nominal wages, so output falls but prices are unaffected. More generally, averaging over periods of full employment and periods of involuntary unemployment, tightening has larger real effects and weaker price effects than easing, matching the empirical pattern, because a contractionary shock keeps the economy below full employment for a longer time.
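The kink can be illustrated with the wage floor alone. The sketch below is a deliberately stripped-down illustration, not the paper's model: it assumes a hypothetical sequence of flexible (market-clearing) wages and applies the constraint W_t >= theta * W_{t-1}, flagging the periods in which the floor binds. Those are the periods in which adjustment would fall on employment rather than on wages.

```python
def wage_path(flex_wages, w_initial, theta=1.0):
    """Apply W_t = max(theta * W_{t-1}, flexible wage) and flag binding periods."""
    wages, binding = [], []
    previous = w_initial
    for w_flex in flex_wages:
        floor = theta * previous
        wage = max(floor, w_flex)
        binding.append(w_flex < floor)  # True when the floor prevents a wage cut
        wages.append(wage)
        previous = wage
    return wages, binding

# Hypothetical flexible wages: a rise, then a fall, then a recovery.
wages, binding = wage_path([1.05, 0.95, 1.10], w_initial=1.0, theta=1.0)
print(wages)
print(binding)
```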

Q8. What robustness checks are conducted?

Three main robustness checks are reported in the main text, each presented with impulse-response figures (Figures 6, 7, 8): (1) replacing the authors’ state dummy (based on 12-month average GDP growth) with NBER recession dates; (2) replacing the 1-year Treasury bond rate with the Federal Funds rate and with the 6-month Treasury Bill rate; (3) replacing the baseline Degasperi-Ricco instrument with the Jarocinski-Karadi (2020) instrument, both raw and cleaned (regressed on six lags of VAR variables). In all cases the qualitative result is preserved (tightening in booms produces larger real effects than easing in recessions, while price responses are more muted in recessions), and the tradeoff pattern remains favorable for easing in recessions and tightening in booms. The Online Appendix additionally reports results using: the unemployment rate as the state variable (instead of industrial production); the VAR extended with the 10-year Treasury yield and the M2 monetary aggregate; models with only sign dependence; models with only state dependence; and an alternative estimation using the instrument directly in place of the estimated shock (which yields implausible results, validating the two-stage procedure).

Q9. What does the Monte Carlo validation using the DSGE model establish?

The paper generates 1000 artificial realizations from a calibrated downward-nominal-wage-rigidity DSGE model (beta=0.99, sigma=1, theta=1, phi_pi=1.5, rho_m=0.5, sigma_r=0.25%, sigma_a=0.45%, solved by nonlinear global projection using Chebyshev polynomials). It then applies the nonlinear Proxy-SVAR procedure to each artificial dataset and compares average estimated impulse responses with average true (model-generated) generalized impulse responses. The two are described as ‘very similar’ (Figure 10), demonstrating that the empirical nonlinear VARX representation accurately approximates the nonlinearities of the DSGE even though the VARX is in principle misspecified relative to the true model. This validates both the econometric procedure and the interpretive link between the empirical findings and the theoretical mechanism.

Q10. Why does the paper estimate the shock from a misspecified linear VAR rather than the VARX directly?

The monetary shock is latent. Proposition 1 shows that, under the stated assumptions, the monetary shock equals (up to a scaling constant) the projection of the external instrument onto the residuals of the linear VAR, even though that VAR omits the nonlinear terms. This is because the linear monetary policy rule implies the shock is a linear combination of current observables, and the VAR residuals span the same space. Using the instrument directly in the VARX, instead of first estimating the shock from the linear VAR, introduces a non-proportional bias in the nonlinear case (unlike the linear case, where the attenuation bias from measurement error in the instrument is proportional across responses and cancels under normalization). The Online Appendix shows that bypassing the two-stage shock-estimation procedure yields implausible impulse response estimates.

Q11. What is the scope of the empirical findings and what caveats apply?

Three scope conditions are explicitly stated. (1) State uncertainty: the tradeoff varies significantly with the state of the economy, so if the central bank is uncertain about current economic conditions, interventions carry considerable risk — a disinflation during what turns out to be a weaker-than-anticipated economy could incur very large unemployment costs. (2) Historical average: estimates reflect the effects of average monetary interventions over 1973-2019 and may not generalize to unusually large, persistent, or unconventional policy actions. (3) Accompanying fiscal policy: the tradeoff could be influenced by fiscal policy measures that accompanied monetary interventions during the sample period. The sample also excludes the post-2019 inflation surge, so inference about that episode is not direct. The identification requires a valid external instrument, whose strength in the nonlinear context is an open question.

Q12. How does this paper relate to Barnichon-Mesters (2020, 2021) and Gali-Gambetti (2020)?

Barnichon-Mesters (2020, 2021) and Gali-Gambetti (2020) also exploit identified monetary shocks to estimate the conditional inflation-unemployment relationship (the ‘Phillips multiplier’) and to investigate whether the Phillips curve slope has changed over time. The main additional contribution of the present paper is to show that the relationship is not only time-varying but specifically sign- and state-dependent, driven by the direction of monetary intervention and the current phase of the business cycle. The sign- and state-dependent tradeoff framework provides a richer characterization that can explain why a flat aggregate Phillips curve is compatible with moderate costs of disinflation and low inflationary costs of stimulus — something a time-varying-slope model alone does not deliver.

Q13. What does the paper say about the implications for disinflation episodes like 2022-23?

The paper does not directly analyze the 2022-23 episode (the sample ends at 2019:M6; the online appendix is dated November 2025). However, the results imply that if the economy is in a boom when disinflation begins, as was broadly the case in 2022, the unemployment cost of reducing inflation is moderate (roughly 0.5-0.6 percentage points of unemployment per percentage point of inflation at a 24-36 month horizon), substantially less than would be implied by a flat Phillips curve. The authors explicitly note that their results suggest central banks can pursue disinflation without necessarily incurring very large unemployment costs, subject to the caveats about state uncertainty and scale of the intervention.

Key Concepts

Monetary policy tradeoff: In this paper’s usage: the ratio of the cumulative average impulse response of inflation to the cumulative average impulse response of unemployment (for easing) or vice versa (for tightening), in response to an identified monetary shock, averaged over a horizon H. In a linear model easing and tightening tradeoffs are inverses; in the nonlinear model they must be estimated separately. The concept is deliberately defined without assuming a Phillips curve and without requiring inflation expectations or the natural rate.

Sign dependence: The property that a monetary easing and a monetary tightening of equal magnitude have asymmetric effects on inflation and unemployment, not just opposite-signed effects of the same absolute magnitude. Captured in the VARX by including the absolute value of the monetary shock as an exogenous regressor.

State dependence: The property that the effects of a monetary shock of given sign and magnitude differ depending on whether the economy was in a recession or a boom in the period before the shock arrived. Captured in the VARX by including the product of the recession indicator (s_{t-1}) and the monetary shock as an exogenous regressor.

Nonlinear Proxy-SVAR: The paper’s proposed econometric framework: a Vector Moving Average augmented with nonlinear functions of the monetary shock, which admits a VARX representation. Identification extends the standard Proxy-SVAR by showing — via Proposition 1 — that the latent monetary shock can be recovered from the residuals of a misspecified linear VAR, using an external instrument, under a linear monetary policy rule. The estimated shock and its nonlinear functions are then used as exogenous regressors to recover nonlinear impulse response functions.

Downward nominal wage rigidity: A labor market friction, modeled as the constraint W_t >= theta * W_{t-1}, that creates a kink in the aggregate supply curve. When the constraint binds (during downturns), firms respond to contractionary shocks by cutting employment rather than nominal wages, generating unemployment without deflation. When the constraint is non-binding (during expansions), expansionary shocks raise nominal wages and prices without affecting employment beyond full-employment output. In this paper the rigidity is the key mechanism generating a sign- and state-dependent monetary tradeoff.

Informational sufficiency (Assumption A4): The identifying assumption that the monetary policy shock can be expressed as a linear combination of current and past observable variables, a condition implied by a linear monetary policy rule. This allows the shock to be recovered from the residuals of a standard linear VAR even when the true model is nonlinear. Tested empirically via the Forni-Gambetti-Ricco (2023) invertibility test (regressing the instrument on current and future VAR residuals and checking whether the future residuals matter); not rejected at the 5% level in the authors’ data.

Generalized Impulse Response Function (GIRF): In this nonlinear context, defined as E(x_{t+h} | u_t^r = u-bar) - E(x_{t+h} | u_t^r = 0) for h = 0, 1, …, where u-bar is a given shock size. Unlike linear IRFs, GIRFs depend on the sign and magnitude of the shock and on the state of the economy, and are computed by summing the linear response alpha(L)*u-bar and the nonlinear response Phi(L)*g(u_t^r, …).
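The sum of a linear and a nonlinear response can be sketched as follows. The lag coefficients and the choice g(u) = u^2 are hypothetical placeholders used only to show why the response depends on the sign of the shock; the paper's own g also depends on the state of the economy, which is omitted here.

```python
def girf(alpha, phi, u_bar, g):
    """GIRF at horizons h = 0..H-1 for a shock of size u_bar.

    Linear part: alpha_h * u_bar.
    Nonlinear part: phi_h * (g(u_bar) - g(0)), i.e. relative to the no-shock path.
    """
    return [a * u_bar + p * (g(u_bar) - g(0.0)) for a, p in zip(alpha, phi)]


def g(u):
    # Hypothetical nonlinear function of the shock (not the paper's choice).
    return u * u


# Hypothetical moving-average coefficients for three horizons.
alpha = [-0.50, -0.30, -0.10]
phi = [-0.20, -0.10, 0.00]

tightening = girf(alpha, phi, 1.0, g)
easing = girf(alpha, phi, -1.0, g)
print("tightening:", tightening)
print("easing:    ", easing)
```

With these placeholder numbers the impact response to a tightening is larger in absolute value than the response to an easing of the same size, which a purely linear IRF cannot produce.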

Optimal Fiscal Policy in a Climate-Economy Model with Heterogeneous Households

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks whether inequality and redistributive taxation should make climate policy more or less ambitious, and how optimal carbon taxes interact with optimal income taxes when households differ in productivity, wealth, and energy demand. The motivation is twofold: equity considerations belong at the center of normative climate analysis, and the distributional consequences of environmental policies are increasingly recognized as critical for their political feasibility, as illustrated by the Yellow Vests episode in France. The paper extends the representative-agent dynamic climate-Ramsey model of Barrage (2020) to a heterogeneous-agent setting, using the Werning (2007) technique to characterize the Ramsey optimum in terms of aggregate variables. The government maximizes utilitarian social welfare by choosing linear taxes on labor income, capital income, energy, and pollution, plus a uniform lump-sum transfer.

The climate module is calibrated to DICE 2016 (Nordhaus, 2017). Household heterogeneity is calibrated to US data: ten productivity groups from SCF 2013 hourly wages ranging from $6.44 (bottom decile) to $101.35 (top decile), yielding a model consumption Gini of 0.33, very close to the empirical value of 0.32 (Heathcote et al., 2010). Tax rates are set at effective US rates from Trabandt and Uhlig (2012): capital income tax of 41.1% and labor income tax of 25.5%. The model period is five years beginning in 2015, and the discount factor follows DICE at beta = 1/(1.015) per year, with inverse IES sigma = 1.45. The main quantitative exercise compares optimal policy to a climate-skeptic planner who sets carbon taxes to zero.

Key findings:

(i) Tax distortions have a negligible effect on the optimal carbon tax in the heterogeneous-agent setting. The second-best carbon tax is initially only 0.5% below the social cost of carbon (SCC) and subsequently fluctuates within about 0.2% above or below it, in sharp contrast to Barrage (2020), who finds that tax distortions reduce optimal carbon taxes by 8% in the representative-agent setting. The key mechanism is that, with heterogeneous agents, the government optimally levies distortionary taxes for redistributive purposes (not merely to finance public spending), so the marginal cost of public funds (MCF) averages to 1 over time and its temporal deviations are quantitatively trivial.
(ii) Income inequality only slightly reduces the optimal carbon tax: residual consumption inequality after optimal income-tax redistribution lowers the SCC by 3.9% in the baseline. The mechanism is that inequality raises the average marginal utility of consumption (because the marginal utility function is convex), increasing the opportunity cost of abatement; this effect dominates when IES < 1 (sigma > 1 in the calibration).
(iii) The optimal carbon tax path starts at $21.7/tCO2 in 2020 and reaches $229.2/tCO2 one century later, levels consistent with Barrage (2020) and Nordhaus (2017/2018) but insufficient to achieve the Paris +2°C target under baseline damages.
(iv) Comparing optimal policy to the climate-skeptic baseline, the additional carbon tax revenue is split nearly equally: the present value of labor taxes falls by 0.7% of GDP, while transfers rise by 0.8% of GDP. This violates the weak double-dividend hypothesis, which prescribes using carbon tax revenue exclusively to cut distortionary taxes.
(v) The optimal policy has progressive welfare effects in the 21st century, because increased tax progressivity benefits lower-income households. The average discounted welfare gain is 5.8% of consumption under baseline damages. In the long run, gains become regressive because richer households (with IES < 1) are willing to pay proportionally more in consumption to avoid temperature increases. By contrast, a representative-agent double-dividend policy (using all carbon revenue to cut labor taxes) is regressive from the outset, with low-income households bearing a net cost even in the short run.

The 3.9% inequality effect on the SCC is robust to changes in fiscal pressure and damage calibration but is sensitive to sigma: with sigma = 2, inequality reduces optimal carbon taxes by 16.2% rather than 3.9%. Extensions with wealth heterogeneity, heterogeneous energy demand (calibrated to CEX), and heterogeneous environmental damage sensitivity confirm that the MCF remains negligible and the inequality effect on carbon taxes remains small in quantitative terms.
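For a sense of scale, the two endpoints of the optimal carbon tax path quoted in the overview ($21.7/tCO2 in 2020 and $229.2/tCO2 one century later) imply an average growth rate of roughly 2.4% per year. The sketch below assumes constant geometric growth over exactly 100 years, which is a simplification; the paper's actual path need not be smooth.

```python
# Endpoints quoted in the overview (USD per ton of CO2).
tax_start = 21.7
tax_end = 229.2
years = 100  # "one century later", taken as exactly 100 years (assumption)

# Constant annual growth rate that links the two endpoints.
implied_growth = (tax_end / tax_start) ** (1 / years) - 1
print(f"implied average annual growth: {implied_growth:.2%}")
```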

In depth

Q1. What is the core theoretical result on the optimal carbon tax and why does it differ from Barrage (2020)?

The optimal carbon tax is approximately Pigouvian (set equal to the social cost of carbon) because the MCF averages to 1 over time with balanced-growth preferences when households are heterogeneous and the government can optimize a uniform lump-sum transfer. In the representative-agent model of Barrage (2020), the government cannot choose the level of lump-sum taxes or transfers because there is no redistribution motive, so distortionary taxes are the only way to finance public spending and the MCF exceeds 1, reducing optimal carbon taxes by 8%. With heterogeneous agents, the government optimally provides lump-sum transfers for redistribution, so the constraint on transfers is barely binding and the MCF is close to 1 even when the ability to adjust transfers is removed.

Q2. What is the mechanism by which inequality affects the optimal carbon tax, and what is the sign?

Inequality reduces the optimal carbon tax when IES < 1 (sigma > 1). The mechanism operates through the Pigouvian tax formula: pollution abatement reduces aggregate consumption, and the welfare cost of this reduction depends on the social marginal utility of consumption (Vc,t). With inequality, Vc,t is affected by two opposing forces. First, the average marginal utility of consumption is higher because of Jensen’s inequality (convex marginal utility function), increasing the opportunity cost of abatement and pushing the pollution tax down. Second, additional consumption goes disproportionately to richer households with lower marginal utilities, reducing Vc,t and pushing the tax up. When IES < 1, the first (higher average marginal utility) effect dominates, so inequality unambiguously reduces the SCC and hence the optimal pollution tax. When IES = 1, the two effects exactly cancel and inequality has no effect.
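The first of the two forces, Jensen's inequality applied to a convex marginal utility function, can be illustrated numerically. This sketch assumes CRRA marginal utility u'(c) = c^(-sigma) with the calibrated sigma = 1.45 and two hypothetical consumption distributions with the same mean; it does not model the second, offsetting force.

```python
def marginal_utility(c, sigma):
    """CRRA marginal utility u'(c) = c^(-sigma), convex in c for sigma > 0."""
    return c ** (-sigma)


def average_marginal_utility(consumption, sigma):
    return sum(marginal_utility(c, sigma) for c in consumption) / len(consumption)


sigma = 1.45  # calibrated inverse IES from the summary

# Hypothetical consumption levels; both lists have mean 1.0.
equal = [1.0, 1.0, 1.0, 1.0]
unequal = [0.4, 0.8, 1.2, 1.6]

mu_equal = average_marginal_utility(equal, sigma)
mu_unequal = average_marginal_utility(unequal, sigma)
print(f"average marginal utility, equal:   {mu_equal:.3f}")
print(f"average marginal utility, unequal: {mu_unequal:.3f}")
```

Holding mean consumption fixed, the unequal distribution has the higher average marginal utility, which is the sense in which inequality raises the opportunity cost of abatement.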

Q3. What is the MCF and why does it average to 1 in the heterogeneous-agent setting?

The MCF is defined as the ratio of the public (planner’s Lagrange multiplier on the resource constraint) to the private (aggregate welfare-weighted) marginal utility of consumption. It measures the social cost of transferring resources from the private to the public sector. The MCF averages to 1 because the first-order condition for the uniform lump-sum transfer implies that the sum of the Lagrange multipliers on agents’ implementability constraints is zero. With balanced-growth preferences, this implies the welfare-weighted average MCF equals 1 from period 0. The temporal covariance between type-specific shadow costs (theta_i) and the type-specific implementability term (I_{c,i,t}) averages to zero over time, so while the MCF can deviate temporarily from 1, it is 1 on average.

Q4. What is the double-dividend hypothesis and how does the paper’s optimal policy relate to it?

The weak double-dividend hypothesis holds that it is optimal to use carbon tax revenue exclusively to reduce distortionary taxes, yielding both environmental and efficiency dividends. The paper shows this does not hold with heterogeneous agents: at the optimum, the welfare gain from a marginal reduction in tax distortions equals the welfare loss from increased inequality, so the government splits carbon revenue between cutting distortionary taxes and increasing redistribution. In the baseline quantification, the split is roughly equal: present-value labor taxes fall by 0.7% of GDP and lump-sum transfers rise by 0.8% of GDP. By contrast, following the double-dividend prescription — using all carbon revenue to reduce labor taxes without raising transfers — generates a strongly regressive policy in which low-income households bear net welfare costs even in the short run.
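The "roughly equal" split can be checked with one line of arithmetic. The sketch assumes that the two reported changes (0.7% and 0.8% of GDP) together account for all of the recycled revenue, which the summary suggests but does not state exactly.

```python
# Present-value changes relative to the climate-skeptic baseline, in % of GDP.
labor_tax_cut = 0.7
transfer_increase = 0.8

# Assumption: these two uses exhaust the recycled carbon tax revenue.
recycled = labor_tax_cut + transfer_increase
share_tax_cuts = labor_tax_cut / recycled
share_transfers = transfer_increase / recycled
print(f"to labor tax cuts: {share_tax_cuts:.1%}, to transfers: {share_transfers:.1%}")
```

Under that assumption a little under half of the revenue goes to labor tax cuts, whereas the weak double-dividend prescription would allocate all of it there.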

Q5. What is the calibration strategy and how does the model match US inequality data?

The economic side is calibrated to the US, while the climate side uses DICE 2016. The discount factor follows DICE (beta = 1/(1.015) per year), and sigma = 1.45 (IES = 1/1.45). Household productivity is calibrated using SCF 2013 hourly wage deciles, yielding ten equal-sized groups with hourly wages from $6.44 (bottom) to $101.35 (top), normalized so that the productivity-weighted average is 1. Although productivity inequality is directly targeted rather than moments of the consumption distribution, the model correctly predicts the consumption Gini of 0.33, close to the empirical 0.32 (Heathcote et al., 2010). Capital and labor income tax rates are from Trabandt and Uhlig (2012): 41.1% and 25.5% respectively. Government debt-to-GDP is approximately 111% (average 2011-2015, IMF). The Frisch elasticity of labor supply is targeted at 0.75 (Chetty et al., 2011). Production in both sectors is Cobb-Douglas with energy share nu = 0.04 from Golosov et al. (2014).
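A few derived quantities follow directly from the calibration values above. The sketch assumes that the annual discount factor is simply compounded over the five-year model period mentioned in the overview; the paper may implement the conversion differently.

```python
annual_beta = 1 / 1.015   # DICE annual discount factor
years_per_period = 5      # model period length stated in the overview

# Discount factor per model period under simple compounding (assumption).
period_beta = annual_beta ** years_per_period

sigma = 1.45
ies = 1 / sigma           # intertemporal elasticity of substitution

# Ratio of top-decile to bottom-decile hourly wages (SCF 2013 values quoted above).
wage_ratio = 101.35 / 6.44

print(f"five-year discount factor: {period_beta:.4f}")
print(f"IES: {ies:.3f}")
print(f"top/bottom decile wage ratio: {wage_ratio:.1f}")
```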

Q6. What happens to optimal income taxes in the model?

The optimal labor income tax roughly doubles from its calibrated level of 25.5% to about 50% in the first period and stabilizes there. Revenue from these taxes is rebated via the uniform lump-sum transfer, achieving most of the desired redistribution. Because optimal labor income taxes are approximately constant over time, the associated intertemporal distortions are small, and the optimal capital income tax converges to zero quickly after the second period. The mechanism is that, with access to lump-sum transfers, the only reason to tax capital income is to mitigate intertemporal distortions created by labor income taxation; when labor taxes are roughly constant, this motive is weak.

Q7. What does the sensitivity analysis reveal about the robustness of the 3.9% inequality effect?

The effect of inequality on optimal carbon taxes is robust along several dimensions but sensitive to sigma. Under the high-damage scenario (cubic rather than quadratic damage function, yielding an SCC about four times larger), the inequality effect falls to 2.6% rather than 3.9%, because higher carbon taxes reduce warming and thus the share of utility (rather than production) damages. The effect is roughly proportional to the degree of productivity inequality: half the inequality implies about half the effect on the carbon tax. The effect changes more than proportionally with sigma: with sigma = 2 (IES = 0.5), inequality reduces carbon taxes by 16.2%, versus 3.9% with the DICE value of sigma = 1.45. With sigma = 1, the effect is exactly zero. Government expenditure levels and fiscal pressure have negligible effects on the results. The share of damages entering utility directly matters: if only 10% of damages affect utility directly (versus the baseline 26%), the inequality effect falls to 1.8%; if 40% affect utility directly, it rises to 5.2%.
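The "more than proportional" claim can be checked against the reported numbers. The sketch below only tabulates the figures quoted above and compares ratios; it does not reproduce any model computation.

```python
# Reported reduction in the optimal carbon tax due to inequality (percent), by sigma.
effect_by_sigma = {1.0: 0.0, 1.45: 3.9, 2.0: 16.2}

# Reported effect by share of damages entering utility directly.
effect_by_utility_share = {0.10: 1.8, 0.26: 3.9, 0.40: 5.2}

sigma_ratio = 2.0 / 1.45
effect_ratio = effect_by_sigma[2.0] / effect_by_sigma[1.45]
print(f"sigma rises by a factor of {sigma_ratio:.2f}")
print(f"inequality effect rises by a factor of {effect_ratio:.2f}")

for share, effect in sorted(effect_by_utility_share.items()):
    print(f"utility-damage share {share:.0%}: inequality effect {effect}%")
```

Moving from sigma = 1.45 to sigma = 2 raises sigma by about 38% but multiplies the inequality effect by more than four.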

Q8. What is the role of initial wealth inequality?

Initial wealth inequality (studied in Section 6.1) creates an additional motive for deviating from Pigouvian taxation in period 0 only. Because the planner cannot use the period-0 capital tax to expropriate initial wealth (it is fixed at 41.1%), higher damages would reduce interest rates and thereby partially mitigate wealth inequality (a subtle indirect redistribution mechanism), calling for lower pollution taxes in period 0. Quantitatively, this produces a significant reduction in the initial-period optimal carbon tax. However, from period 1 onward, the optimal tax rules are unaffected by initial wealth heterogeneity, and the effects of MCF and income inequality remain very similar to the baseline. Welfare gains from carbon taxation in the wealth-heterogeneity extension are U-shaped with income but strictly increasing in initial wealth.

Q9. How does energy-demand heterogeneity (Stone-Geary extension) affect the results?

The extension introduces a second dirty consumption good with Stone-Geary preferences, calibrated using CEX data to match the average energy expenditure share of 10.8% and the observed distribution of energy budget shares across and within income groups. The targeted share of emissions from household energy consumption is 30%. The optimal pollution tax formula remains a modified Pigouvian rule (the MCF structure is unchanged), and the MCF effect remains negligible. The inequality effect on carbon taxes stays near 3.9%, rising marginally to 4.1% under both identical and heterogeneous energy necessity. Theoretically, the optimal excise tax on the energy good is zero when energy preferences are homogeneous; with heterogeneous necessity levels calibrated to the US, the optimal energy excise tax is quantitatively tiny: about -0.4% of energy prices (a small subsidy). The negative sign arises because within-income-group heterogeneity in energy needs means that energy-intensive households (who are valued more by the planner on average) can be partially targeted via a subsidy. Under the double-dividend scenario with energy inequality, regressive effects are magnified: the poorest, most energy-intensive households actually lose in welfare terms even accounting for long-run climate mitigation benefits.

Q10. What does the paper establish theoretically about heterogeneous environmental damages?

Proposition 6 (Section 6.3) shows that with additively separable environmental utility and a utilitarian planner, heterogeneous marginal utility damages from pollution have no effect on the optimal pollution tax: they enter the welfare criterion symmetrically and cancel in the aggregate. The pollution tax increases relative to the utilitarian benchmark only if the planner’s welfare weights are positively correlated with marginal utility damages — that is, if the planner cares relatively more about the households that are more exposed. A Rawlsian planner would set a higher pollution tax if and only if the least-well-off household is also more sensitive to environmental degradation.

Q11. What are third-best policy results when either income tax is fixed?

The paper analyzes policies where either the labor or capital income tax is fixed at its current calibrated level (studied in Appendix E, with results referenced in the main text). These constraints introduce an additional fiscal interaction effect on the optimal carbon tax — the carbon tax is pushed below its second-best Pigouvian level when the fixed tax is set at a sub-optimally low level, and above it when the fixed tax is sub-optimally high. The roles of the MCF and income inequality remain similar to the second-best baseline under these third-best constraints.

Q12. How does the paper relate to and differ from the double-dividend and pollution taxation literatures?

The paper builds on three earlier pillars. First, Pigou (1920) established first-best Pigouvian taxation. Second, a large literature (Sandmo, 1975; Bovenberg and de Mooij, 1994; Bovenberg and Goulder, 1996) showed that in representative-agent second-best settings the MCF exceeds 1 and optimal pollution taxes fall below the Pigouvian level. Barrage (2020) is the closest dynamic general-equilibrium predecessor, finding the 8% reduction from tax distortions. Third, Jacobs and de Mooij (2015) and Jacobs and van der Ploeg (2019) showed in static models with heterogeneous agents and a uniform lump-sum transfer that the MCF equals 1. This paper extends this insight to a fully dynamic climate-economy framework with general equilibrium and a rich model of household heterogeneity. The key innovation relative to Barrage (2020) is agent heterogeneity, which both provides microfoundations for distortionary taxation and significantly changes the quantitative implications for optimal carbon taxes. Relative to Jacobs and de Mooij (2015), the contribution is the dynamic setting, the linkage to the DICE climate module, and the full quantitative characterization including distributional welfare analysis and multiple sources of heterogeneity.

Q13. What are the policy implications and their scope conditions?

The primary policy implication is that a carbon tax should be set approximately equal to the SCC (Pigouvian level) and the associated revenue should be split roughly equally between increasing lump-sum transfers and reducing distortionary labor taxes — rather than following the double-dividend prescription of using all revenue to reduce distortionary taxes. This combination is both more efficient (the MCF argument) and more equitable (progressive in the short run). The scope conditions are: (a) the result applies under a utilitarian welfare criterion with linear income taxes and a uniform lump-sum transfer; (b) it requires that the government can optimize the level of lump-sum transfers for redistribution; (c) the approximately Pigouvian result is quantitatively robust to alternative damage functions, fiscal pressure, and energy demand heterogeneity, but the degree to which inequality lowers the carbon tax depends sensitively on the IES/inequality aversion parameter sigma; (d) the calibration is designed to capture US conditions assuming that the US internalizes the full global impact of its emissions (strategic considerations are abstracted away); (e) heterogeneous environmental damage sensitivity does not affect the utilitarian optimum, but would increase the optimal carbon tax under a more inequality-averse social planner.

Key Concepts

Marginal Cost of Public Funds (MCF): The ratio of the public (planner’s shadow price on the resource constraint) to the private (aggregate welfare-weighted) marginal utility of consumption. In this paper, it captures the divergence between second-best and first-best pollution taxes due to fiscal distortions. With heterogeneous agents and an optimized uniform lump-sum transfer, the MCF averages to 1 over time under balanced-growth preferences, implying that tax distortions do not systematically push the carbon tax below the Pigouvian level — unlike in the representative-agent setting where the MCF exceeds 1.

Pigouvian tax (second-best): In this paper’s context, the Pigouvian tax refers to the pollution tax equal to the social cost of pollution (the discounted present value of marginal production and utility damages), evaluated at the second-best allocation rather than the first-best. When the MCF equals 1 (as it approximately does in the heterogeneous-agent setting), the second-best optimal pollution tax is equal to this second-best Pigouvian level, which may itself differ from the first-best Pigouvian level due to residual consumption inequality.

Social Cost of Carbon (SCC): The present discounted value of marginal climate damages (both production and utility losses) from emitting one additional ton of CO2, converted into consumption units using the social marginal utility of consumption. In the paper, the SCC corresponds to the case where the MCF is set to 1 in every period, and it is affected by consumption inequality through its effect on the social marginal utility of consumption. With sigma > 1, residual inequality raises the opportunity cost of abatement, reducing the SCC by 3.9% in the baseline calibration.

Double-dividend hypothesis (weak): The claim that it is optimal to use the entire proceeds of a carbon tax to reduce existing distortionary taxes, yielding both an environmental dividend (less pollution) and an efficiency dividend (lower tax distortions). The paper shows this does not hold with heterogeneous agents: because distortionary taxes serve a redistributive purpose, reducing them at the margin has a welfare cost (increased inequality), so the planner optimally splits revenue between tax reduction and increased transfers.

Ramsey problem (climate-economy): The government’s optimization problem in this paper: maximizing utilitarian social welfare over an infinite horizon by choosing paths for linear taxes on labor income, capital income, energy, and pollution, plus a uniform lump-sum transfer, subject to households’ optimality conditions (implementability constraints), resource constraints, climate dynamics from DICE, and abatement technology constraints. The approach extends Werning (2007) to a dynamic climate-economy context.

Implementability condition: The constraint in the Ramsey problem that captures each household’s lifetime budget constraint in terms of aggregate variables and market weights. It requires that the present value of a household’s consumption minus labor income equals its initial assets plus its share of the present value of lump-sum transfers, evaluated using the social marginal utilities implied by the planner’s choice of taxes. The shadow cost of this constraint for each household type (theta_i) determines the MCF through its covariance with a fiscal externality term.

Residual inequality: The level of inequality that remains after the planner has optimally set all income taxes and the lump-sum transfer — i.e., the inequality that cannot be eliminated because individualized lump-sum transfers are not feasible and only linear instruments are available. In the paper, it is this residual inequality (not total inequality) that affects the optimal carbon tax: the carbon tax responds to the inequality that income-tax policy cannot address, not to the underlying productivity or wealth dispersion per se.

Balanced-growth preferences: A preference specification of the form u(c, h, Z) = [c(1 - varsigma*h)^gamma]^(1-sigma)/(1-sigma) + u_hat(Z), with 1/sigma the intertemporal elasticity of substitution. This specification ensures that the economy admits a balanced growth path and plays a key role in the paper’s theoretical results: under balanced-growth preferences, the welfare-weighted average MCF equals 1 from period 0, and when IES = 1 (sigma = 1) the MCF is exactly 1 in every period.
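The preference specification can be written as a small function. The parameter values for gamma and varsigma and the form of u_hat(Z) used in the example are hypothetical, and the log branch at sigma = 1 is the conventional limiting case rather than something stated in the summary.

```python
import math


def balanced_growth_utility(c, h, Z, sigma, gamma, varsigma, u_hat):
    """u(c, h, Z) = [c (1 - varsigma*h)^gamma]^(1-sigma) / (1-sigma) + u_hat(Z).

    For sigma = 1 the power term is replaced by its conventional log limit
    (an assumption; the summary does not spell out this case).
    """
    composite = c * (1.0 - varsigma * h) ** gamma
    if composite <= 0:
        raise ValueError("consumption-leisure composite must be positive")
    if math.isclose(sigma, 1.0):
        private = math.log(composite)
    else:
        private = composite ** (1.0 - sigma) / (1.0 - sigma)
    return private + u_hat(Z)


def u_hat(Z):
    # Hypothetical utility damage from temperature change Z (not the paper's form).
    return -0.01 * Z ** 2


# Hypothetical gamma and varsigma; sigma = 1.45 follows the calibration.
params = dict(sigma=1.45, gamma=1.5, varsigma=1.0, u_hat=u_hat)
base = balanced_growth_utility(1.0, 0.3, 1.0, **params)
more_consumption = balanced_growth_utility(1.1, 0.3, 1.0, **params)
more_hours = balanced_growth_utility(1.0, 0.4, 1.0, **params)
print(base, more_consumption, more_hours)
```

Utility rises with consumption and falls with hours worked, and the environmental term enters additively, so it does not affect the marginal rate of substitution between consumption and labor.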

Populism and the Skill-Content of Globalization

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper investigates how the skill structure of globalization shocks — rather than globalization per se — drives the long-run evolution of populism across countries, making a unified empirical case that what gets imported or who immigrates matters as much as how much.

Research question and motivation. The literature has documented that trade exposure and immigration fuel populist voting, but prior work has studied these channels separately, used narrow time windows, and relied on binary party classifications that cannot capture shifts in populism across the full party landscape. Rodrik’s (2018) widely-cited hypothesis holds that trade shocks drive left-wing populism (as in Latin America) and immigration drives right-wing populism (as in Europe). The authors examine whether this hypothesis survives when skill content is explicitly disaggregated and both channels are studied jointly in a unified long-panel setting.

Data, sample, and empirical strategy. The authors construct a new continuous, time-varying populism score for 3,860 party-election pairs covering 1,206 unique parties across 628 national elections in 55 countries from 1960 to 2018. The score is built from the Manifesto Project Database (MPD) using two dimensions identified in the political-science literature: an anti-establishment stance (AES) and a commitment-to-protect stance (CTP). A two-stage polychoric PCA extracts synthetic indices for each dimension and then combines them into a single populism score. The paper defines populist parties as those scoring more than one standard deviation above the mean (threshold validated by comparison with four external databases — Van Kessel, Swank, PopuList, GPop 1 — with ratios of accurate forecasts ranging from 80 to 91 percent). Two dependent variables are studied: (i) the volume margin of populism, the vote share of classified populist parties, estimated with PPML given many zero observations (about 60 percent of the full sample); and (ii) the mean margin of populism, the vote-weighted average populism score of all parties, estimated with OLS. Globalization regressors are skill-specific: imports of low-skill and high-skill labor-intensive goods (as shares of GDP, sourced from Feenstra et al. 2005 and UN Comtrade) and immigration inflows of low-skill and high-skill workers (from Abel 2018, skill-level imputed from dyadic migrant-stock selection ratios). To address reverse causality — populist governments restrict trade and immigration, biasing OLS downward — the authors implement a gravity-based IV strategy: a zero-stage PPML regression predicts bilateral flows using time-invariant dyadic fixed effects interacted with a post-1990 dummy and origin-country-year fixed effects, then aggregates to the destination level; these predicted flows serve as instruments. 
For the volume margin, a reduced-form IV approach replaces actual with predicted flows (to avoid the incidental-parameter problem in PPML with fixed effects). For the mean margin, standard 2SLS is used; the Kleibergen-Paap F-statistic is around 10–12, reasonable given four instruments.

Main quantitative findings. (All claims below are with country and year fixed effects throughout; IV results reinforce baseline OLS/PPML results.)

Low-skill labor-intensive imports raise total and right-wing populism along both the volume margin and the mean margin. In the OLS mean-margin specification the coefficient on low-skill imports is approximately 4, implying a 1 percentage-point increase in the import-to-GDP ratio for low-skill goods is associated with a 0.04 increase in the mean margin of populism (scaled in standard deviations of the populism score). The 2SLS coefficient on the total mean margin is approximately 5.0 (significant at 5%), and on the right-wing mean margin approximately 4.1 (significant at 5%). For the volume margin, the reduced-form IV coefficient on low-skill imports is 0.91 (significant at 10%) for total and 1.82 (significant at 5%) for right-wing populism. These effects are larger by a factor of approximately 1.3 when IV is used relative to OLS/PPML, consistent with downward bias from reverse causality. Low-skill imports do not significantly affect left-wing populism in baseline estimates; a left-wing response cannot be ruled out during severe crises, when shocks are persistent, or among EU countries specifically.
High-skill labor-intensive imports reduce the volume of populism, especially right-wing populism. In the reduced-form IV specification the coefficient on high-skill imports is -1.22 (significant at 10%) for total volume and -2.14 (significant at 5%) for right-wing volume. The mean-margin effect of high-skill imports is insignificant.
Low-skill immigration induces a transfer of votes from left-wing to right-wing populist parties, leaving total volume and the mean margin unchanged. The baseline PPML coefficient on low-skill immigration is 1.52 (significant at 1%) for right-wing volume and -1.78 (significant at 1%) for left-wing volume. In the reduced-form IV the right-wing volume coefficient is 1.97 (significant at 1%) and the left-wing coefficient is -1.70 (significant at 10%). The mean margin of total populism is not significantly affected by low-skill immigration in any specification.
High-skill immigration reduces the volume of right-wing populism (PPML coefficient -1.32, significant at 1%; IV coefficient -2.02, significant at 5%) and generates a weak substitution toward left-wing populism in the baseline.
Descriptive findings: populism has fluctuated since the 1960s, peaking after major economic crises (the oil shocks of the 1970s, the deep crises of the 1990s, and the aftermath of 2008). Right-wing populism reached an all-time high in the EU after 2005. The share of elections with at least one right-wing populist party rose from about 5 percent to more than 50 percent in EU member states over the study period.

Mechanisms. Decomposing the volume margin into extensive (number of populist parties) and intensive (average vote share per party) sub-margins reveals that the trade channel operates primarily through the intensive margin (existing populist parties gaining more votes), whereas the immigration channel operates through the extensive margin (new right-wing populist parties with moderate scores entering parliament). Low-skill trade and immigration never increase the populism score of parties that have never been classified as populist, indicating that globalization shifts the composition of the party system rather than radicalizing mainstream parties.

Amplifiers and heterogeneity. The right-wing populism response to low-skill imports is amplified during periods of de-industrialization and when internet coverage is high. Diversity in the origin mix of imported goods dampens the right-wing response. The populism response to low-skill immigration is not amplified by cultural distance between natives and immigrants; if anything, high cultural distance slightly reduces the centrist and left-wing populist responses. The effects on volume margin are primarily driven by EU28 countries.

Scope conditions and caveats. Analysis is at the country level; party-level repositioning dynamics are left for further research. The unified trade-plus-immigration framework is new, but the long panel setting, unbalanced sample, and aggregate data impose limits on identifying specific mechanisms. The finding that globalization does not affect never-populist parties’ scores limits concerns about contamination through party contagion in the short run. These results only partially confirm Rodrik’s (2018) hypothesis — left-wing populism is not robustly driven by trade shocks at the aggregate level, and trade’s effects are not confined to non-European contexts.

In depth

Q1. What is the identification strategy and what are the main threats to it?

The identification relies on a gravity-based IV approach in two steps. In the first step (the zero-stage gravity model), the authors predict bilateral flows of low- and high-skill goods and migrants using (i) time-invariant dyadic fixed effects interacted with a post-1990 structural-break dummy and (ii) origin-country-year fixed effects capturing time-varying push factors at the source. Critically, destination-country-time characteristics are excluded from the zero-stage, so the predicted aggregated flows capture only supply-side variation and bilateral connectivity, not demand-side populism dynamics in the destination. In the second step, these predicted flows are used as instruments. For the mean margin, standard 2SLS is implemented; for the volume margin, a reduced-form IV approach replaces actual flows with predicted flows to avoid the incidental-parameter problem in a PPML model with many fixed effects. The main threats are: (1) correlated origin shocks: if a push shock in origin country j simultaneously triggers populism in destination i through channels other than trade/migration (e.g., financial contagion), the exclusion restriction is violated; the authors cannot fully rule this out but note that including year fixed effects absorbs common global shocks; (2) the post-1990 structural break is used as an additional source of variation for bilateral dyadic ties, but this "Berlin Wall" dummy simultaneously captures many unobserved structural changes; (3) imputation of the skill structure of migration flows from census-round selection ratios (1990, 2000, 2010) introduces measurement error, though the authors show robustness to using only the year-2000 ratio; (4) Kleibergen-Paap F-statistics are around 10–12 when all four endogenous variables are instrumented simultaneously, which is modest; the authors show values are substantially larger when instrumenting one or two variables at a time.

Q2. How are trade and immigration distinguished empirically, and how is the skill content measured?

Trade data come from Feenstra et al. (2005) for 1962–2000 and UN Comtrade for 2001–2015. Product categories at the SITC 3-digit level are classified by skill and technology intensity following the Trade and Development Report (2002), yielding five categories: primary commodities, labor-intensive/resource-based, and manufacturing with low-, medium-, and high-skill labor intensity. The baseline uses only the low-skill and high-skill manufacturing ends; medium-skill goods are tested in robustness (their inclusion causes collinearity that kills volume-margin significance while preserving mean-margin results). Migration data come from Abel (2018) — five-year bilateral migration flow estimates interpolated to annual frequency. The skill level of migration flows is imputed by applying census-round skill-selection ratios (ratio of college graduates in the dyadic migrant stock to the native pre-migration population, from the closest available census round of 1990, 2000, or 2010) to the interpolated flows. Both trade and immigration variables enter as percentages — imports as share of GDP, immigration as share of destination population — averaged over the election year and the preceding year.
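The scaling and averaging step in the last sentence can be written out as follows. This is a minimal sketch: the function name, the dictionary layout, and the numbers are hypothetical, and the same helper would apply to immigration with destination population as the scaling variable.

```python
def two_year_average_share(flow, scale, election_year):
    """Average of 100 * flow / scale over the election year and the year before.

    flow, scale: dicts mapping year -> level (e.g. imports and GDP).
    """
    years = (election_year - 1, election_year)
    shares = [100.0 * flow[year] / scale[year] for year in years]
    return sum(shares) / len(shares)

# Hypothetical values: low-skill imports and GDP in the same currency units.
imports = {2009: 30.0, 2010: 44.0}
gdp = {2009: 1000.0, 2010: 1100.0}
exposure = two_year_average_share(imports, gdp, election_year=2010)
```

Here the yearly shares are 3 and 4 percent of GDP, so the exposure attached to the 2010 election is their average.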

Q3. What is the difference between the volume margin and the mean margin of populism, and why does it matter?

The volume margin is the aggregate vote share of parties classified as populist (using a binary threshold of one standard deviation above the mean populism score); it equals zero in elections with no populist party (about 60 percent of observations). The mean margin is the vote-weighted average populism score of all parties, populist and non-populist alike, so it is always defined and continuous. The mean margin captures the average ideological ‘exposure’ of voters to populist ideas in a given election, including the spillover of populist ideas into mainstream parties. The distinction matters because globalization can affect the political landscape through multiple channels: it may shift votes toward existing populist parties (the intensive margin of the volume margin), it may encourage new populist parties to enter (the extensive margin), or it may shift the policy positions of all parties toward more populist stances (captured by the mean margin). The paper finds that low-skill trade raises both margins, but through different mechanisms: the volume effect operates through the intensive margin, while the mean-margin effect partly reflects score increases among centrist populist parties. Low-skill immigration raises only the volume margin, through extensive-margin changes, and does not move the mean margin.
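A small sketch makes the two outcome variables concrete. It assumes party-level data given as (vote share in percent, populism score) pairs and normalizes the weights by the sum of the listed shares; the cutoff of 0.81 corresponds to one standard deviation above a zero mean, using the standard deviation reported for the score. The function name and the example elections are hypothetical.

```python
def populism_margins(parties, threshold):
    """Return (volume margin, mean margin) for one election.

    parties: list of (vote_share_percent, populism_score) tuples.
    threshold: score above which a party counts as populist.
    """
    volume = sum(share for share, score in parties if score > threshold)
    total_share = sum(share for share, _score in parties)
    mean = sum(share * score for share, score in parties) / total_share
    return volume, mean

ONE_SD_CUTOFF = 0.81  # one SD above the zero mean of the score

# Election with two populist parties (scores 1.2 and 0.9).
with_populists = [(40.0, -0.5), (35.0, 0.1), (15.0, 1.2), (10.0, 0.9)]
# Election with no party above the cutoff.
without_populists = [(60.0, -0.2), (40.0, 0.3)]

volume_a, mean_a = populism_margins(with_populists, ONE_SD_CUTOFF)
volume_b, mean_b = populism_margins(without_populists, ONE_SD_CUTOFF)
```

The second election shows why the two margins call for different estimators: its volume margin is exactly zero, while its mean margin is still a well-defined continuous number.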

Q4. How is the populism score constructed, and how is it validated?

The score is built from the Manifesto Project Database, which counts quasi-sentences associated with specific political topics as shares of party manifestos. Six MPD variables are selected, grouped into two dimensions: anti-establishment stance (AES — political corruption mentions and anti-pluralism/political authority mentions) and commitment-to-protect stance (CTP — protectionism, internationalism, EU institutions, and nationalization). A polychoric PCA within each dimension extracts the first principal component (by Kaiser criterion — eigenvalues above one). The two synthetic indices are then combined into a single populism score by equal weighting. A party is classified as populist if its score exceeds one standard deviation above the mean. This threshold maximizes the partial correlation with three of four external databases and maximizes accurate-forecast rates across all four databases. Probit regressions of existing binary classifications (Van Kessel 2015, Swank 2018, PopuList 2019, GPop 1 2020) on the continuous score yield ratios of accurate forecasts between 80 and 91 percent. OLS correlations with continuous external measures (GPop 2 leader-speech scores, CHES expert survey) are positive and significant. Unsupervised k-means clustering on the (AES, CTP) space confirms that parties above the one-SD threshold cluster distinctly in a well-separated region of the two-dimensional space. Extended scores using more MPD variables do not improve fit, confirming parsimony.
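The last two construction steps (equal weighting and the one-SD cutoff) can be sketched as below. The sketch takes the two dimension indices as given and does not attempt the polychoric PCA. Treating equal weighting as a simple average and using the population standard deviation are assumptions, and the index values are made up.

```python
from statistics import mean, pstdev

def populism_scores(aes, ctp):
    """Combine the two dimension indices with equal weights (assumed: simple average)."""
    return [0.5 * a + 0.5 * c for a, c in zip(aes, ctp)]

def classify_populist(scores, k=1.0):
    """Flag parties whose score exceeds mean + k * SD of the score distribution."""
    cutoff = mean(scores) + k * pstdev(scores)
    return [score > cutoff for score in scores]

# Hypothetical first-component indices for five parties.
aes = [-1.0, -0.5, 0.0, 0.5, 3.0]
ctp = [-0.8, -0.4, 0.2, 0.0, 2.0]

scores = populism_scores(aes, ctp)
baseline = classify_populist(scores, k=1.0)
lax = classify_populist(scores, k=0.9)     # lax threshold from the robustness checks
strict = classify_populist(scores, k=1.1)  # strict threshold from the robustness checks
```

Varying `k` in this way mirrors the threshold robustness exercise: parties whose scores sit close to the cutoff are the ones that enter or leave the populist set.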

Q5. What heterogeneity across left-wing and right-wing populism is documented?

The paper systematically decomposes results by political orientation (terciles of the RILE left-right index from MPD). Key heterogeneities: (1) Low-skill imports raise total and right-wing populism but not left-wing populism along the volume margin — this holds in baseline PPML and reduced-form IV. The mean-margin result is also concentrated in total and right-wing. (2) Low-skill immigration shifts votes from left-wing to right-wing populism (with opposing-sign PPML coefficients of 1.52 and -1.78, both significant at 1%), leaving total populism unchanged. High-skill immigration reverses this — it reduces right-wing and weakly increases left-wing populism. (3) High-skill imports reduce right-wing populism particularly (PPML -1.30, IV -2.14) and weakly shift votes toward left-wing populism. (4) Descriptively, the average populism score of right-wing populist parties increased since 2005 and reached 1.7 (2.1 standard deviations) in 2018, while left-wing populist parties’ average score declined to 1.4 (1.75 standard deviations) — for the first time since the 1960s, radical-right populism is more intense than radical-left. (5) The volume-margin effects of globalization are primarily driven by EU28 countries. Among non-EU countries or when Latin America is excluded, results are directionally preserved but sometimes less precisely estimated.

Q6. What robustness checks are run?

The authors conduct an extensive battery documented in Appendix D: (1) Lag structure: the globalization variables are redefined using flows at t, t-1, t-2, the average of t and t-1 (baseline), and the sum between elections; results on immigration are robust across lags; trade significance holds except at very short (election year) or very long (between elections) windows. (2) Populism threshold: results are preserved at the lax (0.9 SD) threshold and mostly preserved at the strict (1.1 SD) threshold, though some become insignificant when well-known parties like Syriza, M5S, and La France Insoumise exit the classification. (3) Skill imputation for immigration: using only year-2000 selection ratios yields similar results; interactions with migrant-stock quartile dummies are mostly insignificant. (4) Skill content of imports: adding labor-intensive and medium-skill imports leaves the mean-margin results intact, but collinearity from medium-skill imports removes the significance of trade on the volume margin. (5) Origin-country income level: positive populism responses are concentrated in flows from low-income countries on the volume margin, but the mean-margin positive response is more driven by North-North movements. (6) Sub-samples: results are not driven by post-1990 years alone (interaction with a post-1990 dummy attenuates but does not eliminate effects), not by Latin American countries (exclusion leaves results unchanged), and not by the unbalanced panel structure (restricting to countries present since 1970 confirms results). (7) Turnout: globalization variables do not significantly predict turnout, and results are robust to controlling for turnout. (8) Electoral system: results hold when controlling for electoral system; proportional representation systems show a significant effect of low-skill imports on left-wing populism volume. (9) Exports and emigration: including skill-specific export and emigration flows does not substantially alter the main coefficients; export and emigration effects are less significant and less robust than import and immigration effects. (10) Vote-share normalization: results are robust to normalizing vote shares to sum to 100 percent.

Q7. How does the paper relate to and extend the prior literature?

Autor, Dorn, Hanson, and Majlesi (2020) study the electoral consequences of the China trade shock in the US, documenting polarization effects concentrated in a specific trade shock and a narrow time frame. The present paper extends this by: (1) spanning 60 years and 55 countries (vs. US-focused short panels); (2) studying trade and immigration jointly in one specification; (3) using continuous populism scores rather than party platforms; (4) distinguishing left- vs. right-wing populism responses; (5) examining skill content rather than origin-country GDP growth. On immigration, Edo et al. (2019) and Moriconi et al. (2022, 2019) document that the skill structure of immigration matters for voting — high-skill immigration reduces far-right votes while low-skill immigration raises them. The present paper confirms these findings in a much larger multi-decade panel and adds the novel result that low-skill immigration does not affect total populism but merely shuffles votes between left-wing and right-wing populism. On Rodrik’s (2018) taxonomy, the paper only partially confirms his hypothesis: left-wing populism is not robustly driven by trade shocks in the cross-country aggregate (only under specific amplifying conditions), and trade’s effects are not confined to non-European settings. A key novelty vs. the entire prior literature is the simultaneous inclusion of skill-specific trade and immigration flows — no prior cross-country long-panel study had done this.

Q8. What are the policy implications and their scope conditions?

The skill-content result implies that globalization’s effect on populism depends critically on whether economic integration predominantly involves low-skill or high-skill goods and workers. Policies that shift the composition of globalization toward high-skill activities — skill-upgrading policies, investment in education and retraining, managed migration policies that attract high-skill workers — could mechanically reduce populist pressures. The finding that low-skill immigration transfers votes from left to right without increasing total populism has a nuanced implication: reducing low-skill immigration may primarily benefit left-wing parties at the expense of right-wing ones rather than reducing aggregate political instability. The amplification by de-industrialization and internet access suggests that the populist dividend of adverse trade shocks is largest precisely when affected regions are also losing manufacturing jobs and when social media spreads grievance discourse. The attenuation by diversity in imported goods suggests that more geographically diversified trade may reduce the cultural-threat salience of any single origin. Scope conditions: the volume-margin effects are largely driven by EU28 countries, so the quantitative magnitudes may not generalize to other institutional contexts with different electoral systems; the analysis is at the country level and abstracts from regional labor-market dynamics; party-level repositioning of mainstream parties is not modeled.

Q9. How does the paper handle the measurement challenge of comparing populism scores across countries and time?

This is a central methodological concern. The authors use party manifestos, which are available consistently across the 55 countries and the full 1960–2018 period in the Manifesto Project Database, allowing a principled content-based scoring without relying on expert surveys (which are available only for limited periods) or dichotomous external classifications (which are time-invariant in some datasets and country-limited in others). The two-stage PCA with polychoric principal components ensures that the dimensions are extracted from the structure of the data without imposing cardinal interpretations on ordinal quasi-sentence counts. The populism score has zero mean by construction with a standard deviation of 0.81, making cross-country and cross-time comparisons meaningful within the sample. The authors validate cross-country comparability by showing that the GPop 1 classification (which spans 1960–2018 for 36 countries) is well predicted by the score even though the score was not calibrated to that dataset specifically. An unsupervised clustering algorithm (k-means on the two dimensions) independently recovers the same set of parties as those above the one-SD threshold, without using any external label. The authors also deliberately exclude immigration and multiculturalism variables from the score construction, which prevents mechanical correlation between the populism measure and the globalization regressors and is an important design choice for the causal analysis.

Q10. What are the trends in the right-left decomposition of populism over the study period?

Descriptively (Section 3): the number of left-wing populist parties (as counted by the extensive margin) increased more than the number of right-wing populist parties in the most recent period, partly because centrist parties are entering the populist bucket. However, the vote share gains (intensive margin) are dominated by right-wing populist parties. The share of elections with at least one left-wing populist party rose from about 15 to 30 percent globally over the study period. The share of elections with at least one right-wing populist party rose from about 5 to more than 50 percent in the EU and from about 10 to 25 percent in the rest of the world. The average populism score of right-wing populist parties increased since 2005, reaching 1.7 (about 2.1 standard deviations) in 2018, while the average score of left-wing populist parties declined to 1.4 (about 1.75 standard deviations). This means that for the first time since the 1960s, right-wing populist parties are on average more populist (as measured by the paper’s score) than left-wing populist parties. The gap between populist and non-populist parties’ average scores has widened since 2008, consistent with the within-country Theil inequality increase after the financial crisis.

Key Concepts

Volume margin of populism: The aggregate vote share obtained by parties classified as populist (those with a populism score exceeding one standard deviation above the mean). Estimated with PPML given the large share of zero observations (about 60 percent of the sample). Captures whether populist parties win more votes.
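To illustrate why PPML copes with zeros, the sketch below fits E[y | x] = exp(a + b·x) by maximizing the Poisson pseudo-likelihood with Newton steps. It is a minimal pure-Python illustration with one regressor and no fixed effects, which is far simpler than the paper's specification; the function name and data are hypothetical.

```python
import math

def ppml_fit(x, y, max_iter=100, tol=1e-8):
    """Poisson pseudo-maximum-likelihood fit of E[y|x] = exp(a + b*x).

    Zeros in y are handled directly because y enters in levels, not logs.
    """
    def loglik(a, b):
        return sum(yi * (a + b * xi) - math.exp(a + b * xi) for xi, yi in zip(x, y))

    a = math.log(sum(y) / len(y))  # start from the log of the sample mean
    b = 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score of the pseudo-likelihood: sums of residuals and x-weighted residuals.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        if abs(g_a) < tol and abs(g_b) < tol:
            break
        # Negative Hessian (2x2), inverted by hand to get the Newton direction.
        h_aa = sum(mu)
        h_ab = sum(mi * xi for xi, mi in zip(x, mu))
        h_bb = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        # Halve the step while it would lower the pseudo-likelihood.
        step, base = 1.0, loglik(a, b)
        while loglik(a + step * d_a, b + step * d_b) < base - 1e-12 and step > 1e-8:
            step /= 2.0
        a, b = a + step * d_a, b + step * d_b
    return a, b

# Hypothetical data: half of the outcomes are zero, as with populist vote shares.
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 0, 1, 0, 3, 2, 6]
a_hat, b_hat = ppml_fit(x, y)
fitted = [math.exp(a_hat + b_hat * xi) for xi in x]
```

At the solution the fitted values sum to the observed total, which is the first-order condition for the intercept. A log-linear OLS regression could not use the zero observations at all without an ad hoc transformation.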

Mean margin of populism: The vote-weighted average populism score of all parties that obtained at least one seat in an election, regardless of whether they are classified as populist. Captures the average ideological ‘exposure’ of voters to populist ideas, including spillovers into mainstream parties. Estimated with OLS.

Anti-establishment stance (AES): One of two dimensions underlying the paper’s populism score. Measured from Manifesto Project Database quasi-sentences on political corruption and anti-pluralism (political authority), capturing the core populist premise that the people are virtuous and the ruling class corrupt, leaving no room for pluralism or minority protection.

Commitment-to-protect stance (CTP): The second dimension underlying the populism score. Measured from Manifesto Project Database quasi-sentences on protectionism, internationalism, EU institutions, and nationalization, capturing populists’ claim to shield ‘the people’ from external or alien economic and cultural threats.

Skill-content of globalization: The decomposition of import flows into goods intensive in low-skill vs. high-skill labor (using the SITC 3-digit classification from the Trade and Development Report 2002), and of immigration inflows into low-skill and high-skill workers (using dyadic skill-selection ratios from census rounds). The key empirical innovation of the paper: it is the skill content, not the size, of globalization flows that determines the direction and ideological valence of populist responses.

Gravity-based IV strategy: An instrumentation approach that predicts bilateral skill-specific flows of goods and migrants using a zero-stage PPML regression with time-invariant dyadic fixed effects (interacted with a post-1990 structural-break dummy) and origin-country-year fixed effects, then aggregates predicted flows to the destination level. Excludes destination-country-time characteristics to purge reverse causality (populist governments restricting trade and immigration) and omitted variable bias.

Extensive vs. intensive margin of the volume margin: The decomposition of the total vote share for populist parties into the number of populist parties running (extensive margin) and the average vote share per populist party (intensive margin). Low-skill imports primarily affect the intensive margin (existing populist parties gain more votes); low-skill immigration primarily affects the extensive margin (new right-wing populist parties enter parliament).
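Written as an identity, the decomposition reads as follows. The symbols are assumptions introduced for illustration (V_e for the total populist vote share in election e, N_e for the number of populist parties, and v̄_e for the average vote share per populist party); the summary does not state how the paper operationalizes the split, so this is only the accounting relationship implied by the definition.

```latex
\[
V_e \;=\; N_e \times \bar{v}_e ,
\qquad
\bar{v}_e \;=\; \frac{V_e}{N_e} \quad (N_e \ge 1),
\]
\[
\Delta \log V_e \;=\; \Delta \log N_e \;+\; \Delta \log \bar{v}_e .
\]
```

A change in the volume margin can therefore come from more populist parties (the first term), from larger vote shares per party (the second term), or from both.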

Vote-transfer mechanism of low-skill immigration: The paper’s finding that low-skill immigration reallocates votes between left-wing and right-wing populist parties without changing total populism. The authors interpret this as low-skill immigration enabling new right-wing populist parties with moderate populism scores to gain at least one seat in parliament (an extensive-margin effect), while simultaneously reducing the vote share and/or number of left-wing populist parties.

Taxation of Capital: Capital Levies and Commitment

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Barro and Chari (2024) revisit the long-standing debate over optimal capital income taxation, unifying the Chamley-Judd zero-tax result, the Straub-Werning positive-tax amendment, and the Chari-Nicolini-Teles (2020) commitment-based framework into a single coherent analysis centered on the treatment of the “period-zero problem.”

The research question is fundamental: under what commitment assumptions is the optimal long-run tax rate on capital income zero, positive, or negative, and does optimal policy require special treatment of the initial period? The paper operates entirely within a deterministic neoclassical growth model with a representative household whose preferences are time-separable, separable between consumption and labor, and homothetic — the “standard preferences” of Chari et al. (2020). The government’s tax instruments are proportional consumption tax rates (τ_t^c), proportional asset-income tax rates (τ_t^k), and possibly a one-time proportional levy on initial assets (l_0 ≤ 1). No empirical estimation is performed; the contribution is analytical and quantitative through calibrated simulation.

The central theoretical finding is that the transitional dynamics of Chamley-Judd and the fully positive long-run capital taxes of Straub-Werning both derive from the same source: the period-zero Ramsey planner’s incentive to impose capital levies on assets that happen to exist at the start of the optimization. In Chamley et al., direct levies are precluded (l_0 = 0) and the capital-income tax rate is capped at 100%, so the planner engineers indirect levies via positive future τ_t^k (possibly forever, as Straub-Werning show) and time-varying consumption taxes. In the Chari-Nicolini-Teles (2020) formulation, the planner instead faces a constraint that household initial wealth in utility units (W_0) must meet a designated threshold (W̃_0). Under this constraint, the optimal policy features a one-time direct capital levy l_0 in period zero, zero asset-income taxes in all periods (τ_t^k = 0 for t ≥ 0), and a uniform consumption tax for all t ≥ 0. The level of l_0 and the consumption tax rate are jointly determined to satisfy the wealth constraint and the government budget.

The paper’s main contribution is extending the Chari et al. period-zero commitment to all periods, thereby achieving time-consistency and eliminating period zero’s special status. If each period-t policymaker faces a wealth constraint W_t ≥ W̃_t with W̃_t set high enough that the policymaker voluntarily chooses l_t = 0, the full sequence of policies is time-consistent and accords with Woodford’s (1999) “timeless perspective”: period zero is like any other period, capital-income tax rates are always zero, and consumption taxes are constant.

The appendix provides quantitative validation using a U.S.-calibrated model: government consumption = 20% of output, capital-income tax rate = 38% (initial steady state, from Barro-Furman 2018), public debt = 70% of output, labor-income tax rate = 26%, discount factor β = 0.97 (implying a 3% real interest rate), capital share α = 0.34, and depreciation δ = 0.08. Welfare gains from switching to the Ramsey policy (with the wealth-in-utility constraint set to the pre-reform steady-state value) are 0.82% of steady-state consumption under standard preferences, 0.76% under balanced-growth preferences, and 0.62% under zero-wealth-effect preferences. Under balanced-growth preferences, the capital stock rises monotonically to a new steady state approximately 12% higher, government debt rises about 6 percentage points, the labor-income tax rate stays essentially constant at approximately 30% (roughly 4 percentage points above the old steady state), and the capital-income tax rate is approximately 1% in the first period and then drops quickly to zero. Under zero-wealth-effect preferences, the initial capital-income tax rate is slightly higher at approximately 7% before dropping sharply. Under an extreme scenario with the initial capital stock at half its steady-state level and public debt at twice its normal ratio, the capital-income tax rate starts at approximately 3% and gradually approaches zero. In all three cases, constraining the capital-income tax rate to zero and holding the labor-income tax rate constant yields welfare indistinguishable from the unconstrained Ramsey optimum. The paper concludes that zero taxation of capital income is approximately optimal across all three preference specifications, and that the apparent necessity of positive long-run capital taxes in existing literature is an artifact of the period-zero commitment asymmetry.

In depth

Q1. What is the ‘period-zero problem’ and why is it central to the paper’s argument?

The period-zero problem refers to the asymmetry in the standard Ramsey formulation whereby the period-zero policymaker can commit to all future tax rates but is not bound by any commitments made in the past. Because assets already in existence at period zero are inelastically supplied ex post, the planner has a strong incentive to expropriate them via a capital levy — directly (l_0) or indirectly through high early tax rates on asset income or non-constant consumption tax rates. Chamley-Judd and Straub-Werning results, while superficially different, both arise from this same incentive. The Barro-Chari paper argues that period zero is in reality just an arbitrary starting point for analysis, not a date on which commitment ability uniquely materializes, and that correctly accounting for this eliminates the period-zero problem.
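To see why non-constant consumption tax rates can stand in for a tax on asset income, consider a textbook household budget constraint and Euler equation. This is a generic sketch rather than the paper's exact formulation: the timing convention, the symbols c_t, a_t, r_t, w_t, n_t, and the utility function u are assumptions, and labor taxes are left out for brevity.

```latex
\begin{align*}
(1+\tau^c_t)\,c_t + a_{t+1}
  &= \bigl[1 + (1-\tau^k_t)\,r_t\bigr]\,a_t + w_t n_t , \\[4pt]
\frac{u'(c_t)}{1+\tau^c_t}
  &= \beta\,\frac{u'(c_{t+1})}{1+\tau^c_{t+1}}\,
     \bigl[1 + (1-\tau^k_{t+1})\,r_{t+1}\bigr] , \\[4pt]
\text{effective gross return}
  &= \frac{1+\tau^c_t}{1+\tau^c_{t+1}}\,
     \bigl[1 + (1-\tau^k_{t+1})\,r_{t+1}\bigr] .
\end{align*}
```

Under these assumptions, a consumption tax that rises over time (τ_{t+1}^c > τ_t^c) lowers the effective return to saving in the same way as a positive τ_{t+1}^k, while a constant consumption tax leaves the intertemporal margin undistorted. This is why the optimal policy without a period-zero motive pairs zero asset-income taxes with a uniform consumption tax.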

Q2. How does the Chari-Nicolini-Teles (2020) formulation differ from Chamley et al., and what does it imply?

Chamley et al. preclude direct capital levies (l_0 = 0) and cap τ_t^k ≤ 1, so the planner engineers indirect capital levies via positive future asset-income taxes and time-varying consumption taxes. Chari et al. (2020) instead constrain the household’s initial wealth in utility units (W_0) to be at least a designated threshold W̃_0, but leave all tax instruments unrestricted. Under this constraint, the optimal policy selects a one-time direct capital levy l_0, zero asset-income taxes forever, and uniform consumption taxes. The critical difference is that when l_0 = 0 is the outcome under the Chari et al. formulation, it is an optimizing response to a high W̃_0 rather than an arbitrary restriction, so there is no incentive for indirect levies.

Q3. How is time-consistency achieved, and what is the ‘timeless perspective’?

Time-consistency fails if future policymakers are unconstrained, because they will repeat the period-zero capital levy logic for their own ‘initial’ period. The paper shows that introducing a series of per-period wealth constraints (W_t ≥ W̃_t for all t ≥ 0, where W_t is period-t household wealth in utility units) achieves time-consistency if each W̃_t is set high enough that each policymaker voluntarily chooses l_t = 0. The required sequence of W̃_t corresponds exactly to the wealth path generated by the period-0 policymaker’s committed Ramsey plan. When this holds, the analysis conforms to Woodford’s (1999) ‘timeless perspective’: each policymaker adopts the program that would have been committed to far in the past, period zero is not special, capital-income taxes are always zero, and consumption taxes are constant.

Q4. What role do restrictions on tax instruments play, and why does the paper prefer wealth constraints over direct instrument restrictions?

Direct instrument restrictions — such as banning capital levies (l_t = 0) or forcing τ_t^k = 0 and constant consumption taxes — are vulnerable to circumvention through other instruments. For example, time-varying labor-income tax rates (τ_t^n) introduce intertemporal wedges equivalent to indirect capital levies, so a prohibition on capital-income taxes can be undone by varying labor taxes. Constraints on household wealth in utility units (Eqs. 7 and 8) are robust to this vulnerability because any tax instrument that reduces household utility-unit wealth below the threshold violates the constraint, regardless of which specific instrument is used.

Q5. What is the ‘partial commitment’ interpretation of the per-period wealth constraints?

The paper offers two interpretations. The first is that the sequence of W̃_t was set at the founding of a country (e.g., 1789 for the United States). The more palatable ‘partial commitment’ interpretation is that each period-t policymaker specifies the wealth commitment W̃_{t+1} for the next policymaker, in exchange for adhering to the commitment W̃_t set by the preceding policymaker. This bilateral exchange generates the same sequence of wealth constraints that would have been set arbitrarily far into the past.

Q6. What happens in the stochastic extension of the model?

In a stochastic setting with fluctuations in government spending, technology, war and peace, etc. (as in Chari et al. 2020, proposition 3), choices of capital levies and tax rates become state-contingent rules, following the Lucas-Stokey (1983) framework. Non-zero direct capital levies are optimal under emergency conditions such as war, pandemic, or major financial crisis, and levies are correspondingly below average during non-emergencies. Consumption and labor-income tax rates follow random-walk-like processes, analogous to the tax-rate smoothing predictions of Barro (1979, 1990) that apply when state-contingent capital levies are unavailable.

Q7. How is the COVID inflation episode interpreted within this framework?

The paper interprets the post-2020 rise in the U.S. price level through the fiscal theory of the price level (Cochrane 2023; Barro-Bianchi 2023; Bianchi-Faccini-Melosi 2023). The surge in ‘unfunded’ government spending during and after the COVID pandemic was financed by the inflation that eroded the real value of nominally-denominated government bonds. This constitutes a state-contingent capital levy on bondholders. A cautionary note is added: the availability of such a mechanism may encourage excessive spending, analogous to Ricardo’s (1820) argument for balanced-budget war finance.

Q8. What is the role of heterogeneity among households in potentially generating commitment?

The paper discusses two sources. First, drawing on Broner-Martin-Ventura (2010), if the government cares about domestic holders of its bonds but not foreign holders, and if bonds can be traded on secondary markets so the two groups cannot be separated, then default becomes unattractive ex post because it harms domestic residents. This gives the government an incentive to promote secondary markets as a commitment device against sovereign default — potentially extensible to capital taxation commitments. Second, the distinction between old and new capital (e.g., via investment tax credits) partially limits the attractiveness of high capital-income taxes by tying the tax rate on old capital to the rate on new capital, which creates investment disincentives. However, as Straub-Werning demonstrate, this commitment may be too weak to drive the optimal capital-income tax to zero.

Q9. What are the calibration targets and preference specifications used in the quantitative experiments?

The model is calibrated to represent the U.S. economy with: government consumption = 20% of output, capital-income tax rate = 38% (from Barro-Furman 2018), public debt = 70% of output, labor fraction of time endowment = 1/3, discount factor β = 0.97 (3% real interest rate), capital share α = 0.34, depreciation δ = 0.08. Three preference specifications are explored: (1) standard preferences (time-separable, separable, homothetic in c and n); (2) balanced-growth preferences with consumption-leisure Cobb-Douglas aggregator and IES = 0.5; (3) zero-wealth-effect preferences. The wealth constraint W̃_0 is set to match the pre-reform steady-state wealth in utility terms.
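Two pieces of steady-state arithmetic behind these targets can be checked directly. The first (the real interest rate implied by β) follows from the stated calibration. The second is only a stylized sketch: it assumes Cobb-Douglas production and a capital-income tax levied on the return net of depreciation, neither of which is confirmed by the summary, so the resulting capital-output ratios are illustrative and are not meant to reproduce the paper's transition results. The function names are hypothetical.

```python
def implied_real_rate(beta):
    """Steady-state after-tax real interest rate implied by the discount factor."""
    return 1.0 / beta - 1.0

def steady_state_capital_output_ratio(alpha, delta, beta, tau_k):
    """K/Y solving (1 - tau_k) * (alpha * Y/K - delta) = 1/beta - 1.

    Assumptions (not taken from the paper): Cobb-Douglas technology, so the
    marginal product of capital is alpha * Y/K, and a tax base net of depreciation.
    """
    r = implied_real_rate(beta)
    return alpha / (delta + r / (1.0 - tau_k))

BETA, ALPHA, DELTA = 0.97, 0.34, 0.08

r_star = implied_real_rate(BETA)
ky_initial = steady_state_capital_output_ratio(ALPHA, DELTA, BETA, tau_k=0.38)
ky_zero_tax = steady_state_capital_output_ratio(ALPHA, DELTA, BETA, tau_k=0.0)
```

With β = 0.97 the implied rate is a little above 3 percent, matching the calibration note. Under the stated assumptions, removing the 38 percent capital-income tax raises the steady-state capital-output ratio, which is the direction of the capital deepening reported in the quantitative experiments.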

Q10. What are the detailed quantitative results across preference specifications?

Under standard preferences: capital-income tax rate is always exactly zero, labor-income tax rate is constant, welfare gain = 0.82% of steady-state consumption. Under balanced-growth preferences (IES = 0.5): initial capital-income tax ≈ 1%, quickly drops to zero; capital stock rises ≈ 12% to new SS; government debt rises ≈ 6 pp; labor-income tax ≈ 30% (constant, ≈ 4 pp above old SS of 26%); welfare gain = 0.76%; steady-state public debt under zero-capital-tax policy = 33% of output; initial capital levy l_0 = 0.126; new SS labor tax = 0.297. Under zero-wealth-effect preferences: initial capital-income tax ≈ 7%, drops sharply; welfare gain = 0.62%; l_0 = 0.160; new SS labor tax = 0.301; maximum capital tax rate = 0.070. Under extreme initial conditions (balanced-growth, capital stock at half SS level, debt at twice normal ratio): capital-income tax ≈ 3% initially, approaches zero; l_0 = 0.033; new SS labor tax = 0.400. Across all cases, constraining capital-income tax to zero with constant labor tax yields welfare nearly identical to the unconstrained Ramsey optimum.

Q11. What is the scope of the zero-capital-tax result and what preference conditions support it?

The zero-capital-tax result holds exactly under standard preferences (time-separable, separable between consumption and labor, and homothetic in consumption and labor), which satisfy the Diamond-Mirrlees-Sandmo-Sadka conditions for uniform taxation of goods. Under balanced-growth preferences, it holds with σ = 1 but not necessarily when σ ≠ 1. Under zero-wealth-effect preferences it does not hold if V is strictly concave. However, the quantitative experiments show that deviations from zero are small and short-lived under all three specifications, so zero capital taxation is approximately optimal across the board.

Q12. What is the relationship between the paper’s results and tax-rate smoothing models?

Barro (1979, 1990) showed that optimal income-tax rates follow a random walk when capital levies are unavailable. The present paper shows that, once state-contingent capital levies are available (the Lucas-Stokey stochastic extension), consumption and labor-income tax rates also exhibit random-walk-like behavior, as realizations of spending and technology shocks move the optimal tax rates. This provides a unified framework connecting capital levy theory and tax-rate smoothing.

Q13. What are the survival/institutional arguments for why commitment constraints might exist in practice?

The paper suggests a selection argument: societies that fail to maintain commitments of the form W_t ≥ W̃_t severely under-accumulate capital because anticipating capital levies causes households and firms not to invest, potentially causing the economy to effectively disappear. This selection pressure may explain why functioning market economies tend to develop institutions (constitutions, property rights, secondary markets) that approximate the required commitments. Major regime changes, such as the Bolshevik revolution (100% default on Czarist bonds), can destroy these commitments, but many regime changes (e.g., France after World War II) do not fully repudiate prior obligations.

Q14. How does this paper relate to and differ from the three main antecedents (Chamley-Judd, Straub-Werning, and Chari et al. 2020)?

Chamley (1986) and Judd (1985, 1999) showed zero long-run capital-income tax is optimal under the Ramsey formulation with l_0 = 0 and τ_t^k ≤ 1. Straub-Werning (2020) showed that positive capital-income taxes can be optimal even in the steady state under the same constraints when the IES is below one. Chari et al. (2020) replaced instrument restrictions with a utility-wealth constraint for period zero, obtaining a direct capital levy in period zero plus zero capital-income taxes thereafter. Barro-Chari extend Chari et al.’s period-zero constraint to all periods, achieving time-consistency and removing period zero’s special status. The novel contribution is the multi-period, time-consistent version of the Chari et al. framework and the quantitative demonstration that zero capital taxation is approximately optimal across preference specifications.

Key Concepts

Period-zero problem: The asymmetry in the standard Ramsey formulation in which the period-zero policymaker can commit to all future tax rates but faces no commitments from the past, creating a strong incentive to expropriate existing assets via capital levies (direct or indirect); the paper’s central target of critique.

Capital levy: A proportional confiscation of asset holdings (l_t), distinct from ongoing taxes on the flow of asset income; a direct capital levy takes a fraction of the stock outright, while indirect capital levies are engineered through high asset-income tax rates or time-varying consumption taxes that reduce the real value of existing wealth.

Wealth constraint in utility units (W_t ≥ W̃_t): A commitment device, following Chari-Nicolini-Teles (2020) and Armenter (2008), that requires each period’s policymaker to leave households with at least a threshold level of wealth measured in units of utility rather than goods; instrumental in eliminating the period-zero problem without directly restricting tax instruments.

Timeless perspective: Woodford’s (1999) principle that the policymaker should adopt the behavior that would have been committed to far in the past contingent on current events, rather than optimizing from the current period taking past expectations as given; the paper shows its Ramsey results conform to this principle once per-period wealth constraints are imposed.

Time-consistency (in optimal taxation): The property that a tax plan chosen at date 0 will be voluntarily continued by each subsequent policymaker. It fails in the Chari et al. (2020) baseline formulation when future policymakers are unconstrained, because each will want to re-impose a ‘period-zero’ capital levy; it is achieved here only when the per-period wealth constraints W_t ≥ W̃_t are sufficient to deter direct levies.

Indirect capital levy: The engineering of a de facto reduction in the real value of existing wealth through policy instruments other than a direct asset levy — specifically positive tax rates on future asset income (τ_t^k > 0) or non-constant consumption tax rates that alter the present value of after-tax consumption; the mechanism underlying both Chamley-Judd transitional dynamics and Straub-Werning permanent positive capital taxes.

Standard preferences: Preferences that are time-separable, separable between consumption and labor, and homothetic in consumption and labor (Eq. 1 in the paper: u(c,n) = [c^{1-σ}/(1-σ)] − η·n^{1+Ψ}); the class under which uniform taxation of consumption at all dates and zero tax rates on asset income are exactly optimal, satisfying Diamond-Mirrlees-Sandmo-Sadka conditions.

State-contingent capital levy: In the stochastic extension (following Lucas-Stokey 1983), a capital levy whose magnitude depends on the realized state of the world (e.g., war, pandemic, financial crisis); the optimal levy is above average in emergencies, when emergency government spending must be financed, and below average during normal times. The paper interprets post-2020 U.S. inflation as an implicit state-contingent levy on nominal government bonds via the fiscal theory of the price level.

The (In)effectiveness of Targeted Payroll Tax Reductions

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper studies the cost-effectiveness of targeted payroll tax reductions as a tool for stimulating labor demand among marginalized workers, using a natural experiment from Italy. The motivation is policy-relevant: governments routinely deploy targeted payroll tax cuts to combat youth and low-skill unemployment, but such subsidies risk subsidizing inframarginal hiring — employment that would have occurred without the incentive — rather than creating net new jobs. Rigorous evaluation requires two features that are rarely satisfied simultaneously: (1) the subsidy must target genuinely marginalized workers so estimates pertain to the population of interest, and (2) variation in incentives across firms must be quasi-random so firm responses are causally identified. This paper exploits a policy that satisfies both.

The data are confidential matched employer-employee records from the Italian Social Security Institute (INPS), covering the universe of private non-agricultural firms with at least one employee from January 2003 to December 2009. The main analysis sample comprises 1,015,619 firms with policy-relevant firm size between 3 and 15 employees — the stratum containing the policy threshold. The study period spans 84 months.

The policy variation is the Italian 2007 Budget Bill (Law 296/2006), which raised employer social security contributions (SSCs) on apprenticeship contracts from a flat rate of 148 euros per year to 10 percent of annual earnings (approximately 1,200 euros per year for an average apprentice earning 12,000 euros). Firms with at most 9 full-time-equivalent employees (excluding apprentices), however, received a graduated discount, paying a reduced rate of 1.5 percent of earnings in the first year (180 euros) and 3 percent in the second year (360 euros). This generated a clean discontinuity in incentives at the 9-employee threshold. The discount is equivalent to roughly two months of earnings per apprentice, or about 8 percent of the cost of a typical 19-month apprenticeship.
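The size of the discount can be checked with a few lines of arithmetic. The sketch below assumes that the 1.5 and 3 percent figures are the rates small firms pay in place of the 10 percent rate and that earnings are spread evenly over the year; the function name `ssc_saving` is illustrative and does not come from the paper.

```python
# Worked arithmetic for the post-2007 apprentice SSC schedule.
# Assumption: 1.5% and 3% are the reduced rates small firms pay
# instead of the 10% rate charged to firms above the threshold.

ANNUAL_EARNINGS = 12_000        # euros, average apprentice (from the text)
FULL_RATE = 0.10                # rate for firms above the threshold
REDUCED_RATES = (0.015, 0.03)   # first-year and second-year rates, small firms
APPRENTICESHIP_MONTHS = 19      # typical apprenticeship length (from the text)


def ssc_saving(months: int) -> float:
    """Euros of SSC a small firm saves over an apprenticeship of `months`."""
    if months > 12 * len(REDUCED_RATES):
        # The summary only describes the first two years of the schedule.
        raise ValueError("schedule beyond the second year is not specified")
    monthly_earnings = ANNUAL_EARNINGS / 12
    saving = 0.0
    for month in range(months):
        year = month // 12  # 0 for the first year, 1 for the second
        saving += (FULL_RATE - REDUCED_RATES[year]) * monthly_earnings
    return saving


monthly_earnings = ANNUAL_EARNINGS / 12
saving = ssc_saving(APPRENTICESHIP_MONTHS)
saving_share = saving / (monthly_earnings * APPRENTICESHIP_MONTHS)
saving_in_months = saving / monthly_earnings

print(f"full-rate SSC per year: {FULL_RATE * ANNUAL_EARNINGS:.0f} euros")
print(f"saving over {APPRENTICESHIP_MONTHS} months: {saving:.0f} euros")
print(f"saving as share of earnings: {saving_share:.1%}")
print(f"saving in months of earnings: {saving_in_months:.2f}")
```

Under these assumptions the saving comes to about 1,510 euros over 19 months, close to 8 percent of earnings over the apprenticeship and about one and a half months of earnings, the same order of magnitude as the figures quoted above.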

The empirical strategy is a difference-in-discontinuities design. For each calendar month, the authors estimate a regression discontinuity specification comparing firms just above and just below the 9-employee threshold, then subtract the estimated baseline discontinuity from January 2006 (before the policy existed). This normalizes away pre-existing size-related differences in outcomes, yielding reduced-form estimates of how the policy-induced difference in SSC costs between small and large firms changed over time. The policy variation is used as an instrument for actual SSC payments to compute IV estimates of jobs supported per euro of foregone revenue.

The main finding is a precise zero: the SSC discount does not increase the number of apprenticeship contracts. The reduced-form estimates of the policy’s effect on apprentice hiring are not statistically different from zero and are tightly estimated. Firms below the threshold pay approximately 25 euros less per month in SSCs than firms above, confirming the policy has fiscal bite (first-stage F-statistic = 230), but this differential generates no detectable behavioral response in employment.

The policy also does not increase the rate at which apprentices are converted to permanent contracts (“transformations”). Firms do not adjust apprentice wages, do not substitute toward other contract types, do not churn through more apprentices, do not re-label existing contracts, and do not lower hiring standards for apprentices.

For cost-effectiveness, the IV estimates imply that each 1 million euros of foregone SSC revenue supports the employment of 29 apprentices for one year, a point estimate not statistically different from zero. The point estimate for supported permanent-contract transformations is slightly negative (-2) and also indistinguishable from zero. By comparison, directly hiring apprentices at their prevailing wage of 1,050 euros per month would employ 79 apprentices for one year per million euros, making direct hiring about 2.7 times as cost-effective as the subsidy’s point estimate. The paper surveys the broader literature and finds that once existing studies’ employment effects are normalized against fiscal costs, targeted subsidies rarely appear cost-effective; hiring credits that require a new hire may outperform payroll tax cuts because they are harder to claim for inframarginal employment.

The underlying mechanism is inelastic labor demand for apprentices. Survey evidence from the RIL firm survey confirms that when firms do not hire apprentices, cost is rarely the stated reason — the most common answer is that they do not need more people. When firms do hire apprentices, the most common reason is to provide training before converting them to permanent employees, not to economize on labor costs.

In depth

Q1. What is the identification strategy and what are the main threats to it?

The identification strategy is a difference-in-discontinuities design. In each month, a regression discontinuity (RD) specification compares firms just above and just below the 9-employee SSC eligibility threshold; the authors then subtract the baseline (January 2006, pre-policy) discontinuity estimate to remove pre-existing size-related level differences. The key identifying assumption is a ‘weak parallel trends’ assumption: the curvature of the conditional expectation function of untreated potential outcomes at the threshold is time-invariant. Threats and the evidence against them: (1) Manipulation of firm size at the threshold — addressed by showing that the CDF of policy-relevant firm size is virtually identical across all 84 months with no bunching at 9 employees before or after the reform; (2) Pre-existing trends — no pre-trends are found in the estimated discontinuity in outcomes for the four years before January 2007; (3) Compositional shifts — covariate balance tests show that firm characteristics (age, type, industry, region) at the threshold do not change over time relative to baseline; the covariate index (predicted apprentice hiring based on time-invariant firm characteristics) fluctuates between -0.0005 and +0.0005 — nearly two orders of magnitude smaller than the employment estimates; (4) Imperfect compliance — handled explicitly: the design estimates an intention-to-treat effect, which is attenuated relative to the treatment on the treated; (5) Measurement error in running variable — addressed by excluding firms within one unit of the threshold in the preferred specification; null results are robust to varying the exclusion window.
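The mechanics of the design can be illustrated on simulated data. The sketch below is not the authors' specification: the data-generating process, the linear fits on each side of the cutoff, the noise level, and the names `rd_jump` and `simulate_period` are all assumptions chosen for illustration. It builds in a pre-existing gap at the threshold and a true policy effect of zero, then shows that subtracting the baseline discontinuity removes the gap.

```python
import numpy as np

THRESHOLD = 9  # eligibility cutoff in policy-relevant firm size


def rd_jump(size, outcome, threshold=THRESHOLD, donut=1.0):
    """Discontinuity at the threshold from separate linear fits on each side.

    Firms within `donut` units of the threshold are dropped. Returns the
    below-threshold intercept minus the above-threshold intercept, so a
    positive value means eligible (small) firms have the higher outcome.
    """
    x = size - threshold
    below = x < -donut
    above = x > donut
    # np.polyfit with degree 1 returns [slope, intercept].
    _, intercept_below = np.polyfit(x[below], outcome[below], 1)
    _, intercept_above = np.polyfit(x[above], outcome[above], 1)
    return intercept_below - intercept_above


def simulate_period(rng, n, slope, level_gap, policy_effect):
    """Hypothetical data-generating process for one calendar period."""
    size = rng.uniform(3, 15, n)
    eligible = size <= THRESHOLD
    outcome = (0.05 + slope * (size - THRESHOLD)
               + (level_gap + policy_effect) * eligible
               + rng.normal(0, 0.005, n))
    return size, outcome


rng = np.random.default_rng(0)
# Baseline period: a pre-existing gap at the threshold, no policy.
size0, y0 = simulate_period(rng, 50_000, slope=0.004, level_gap=0.002,
                            policy_effect=0.0)
# Post period: same gap, a different slope, and a true policy effect of zero.
size1, y1 = simulate_period(rng, 50_000, slope=0.002, level_gap=0.002,
                            policy_effect=0.0)

baseline_jump = rd_jump(size0, y0)
post_jump = rd_jump(size1, y1)
diff_in_disc = post_jump - baseline_jump

print(f"baseline discontinuity: {baseline_jump:.4f}")
print(f"post-period discontinuity: {post_jump:.4f}")
print(f"difference-in-discontinuities: {diff_in_disc:.4f}")
```

In this simulation each period's discontinuity recovers the built-in gap of about 0.002, and the difference between periods is close to zero, which is the true policy effect by construction.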

Q2. Why is the difference-in-discontinuities design superior to a standard difference-in-differences design in this context?

The paper provides a formal and empirical case that standard difference-in-differences applied to a continuous firm-size running variable produces spurious results. When the conditional expectation function of outcomes with respect to firm size rotates over time (i.e., the slope changes), a DiD estimator that discretizes firms into treated and control groups will detect this rotation as a treatment effect, even if the true policy effect is zero. This is because the DiD constrains the slopes of the conditional expectation function above and below the threshold to be zero, making them implicit omitted variables. In the Italian data, the conditional expectation function of apprentice hiring with respect to firm size rotates clockwise between 2007 and 2009, coinciding with a general slowdown in hiring during the Great Recession. This rotation would cause a naive DiD analysis to conclude, spuriously, that the subsidy supported hiring. The difference-in-discontinuities design controls flexibly for the running variable in each period and isolates only the variation near the threshold, where firm size cannot proxy for trends unrelated to the policy.
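The rotation argument can be reproduced in a small simulation. The sketch below uses an invented data-generating process in which hiring is linear in firm size, the slope flattens between two periods, and the policy effect is exactly zero; the numbers and the names `simulate`, `did`, and `rd_jump` are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

THRESHOLD = 9  # eligibility cutoff in policy-relevant firm size


def simulate(rng, n, slope):
    """Hypothetical period: hiring is linear in size, no jump at the cutoff."""
    size = rng.uniform(3, 15, n)
    hiring = 0.05 + slope * (size - THRESHOLD) + rng.normal(0, 0.005, n)
    return size, hiring


def did(size_pre, y_pre, size_post, y_post):
    """Naive DiD: change in mean outcome for small firms minus large firms."""
    small_pre = size_pre <= THRESHOLD
    small_post = size_post <= THRESHOLD
    change_small = y_post[small_post].mean() - y_pre[small_pre].mean()
    change_large = y_post[~small_post].mean() - y_pre[~small_pre].mean()
    return change_small - change_large


def rd_jump(size, y):
    """Below-minus-above intercept gap from linear fits on each side."""
    x = size - THRESHOLD
    # np.polyfit with degree 1 returns [slope, intercept].
    _, a_below = np.polyfit(x[x <= 0], y[x <= 0], 1)
    _, a_above = np.polyfit(x[x > 0], y[x > 0], 1)
    return a_below - a_above


rng = np.random.default_rng(1)
size_pre, y_pre = simulate(rng, 50_000, slope=0.004)    # steeper before
size_post, y_post = simulate(rng, 50_000, slope=0.002)  # flatter after

did_estimate = did(size_pre, y_pre, size_post, y_post)
diff_in_disc = rd_jump(size_post, y_post) - rd_jump(size_pre, y_pre)

print(f"naive DiD estimate: {did_estimate:.4f}")
print(f"difference-in-discontinuities estimate: {diff_in_disc:.4f}")
```

Because the slope flattens, mean hiring rises among small firms and falls among large ones, so the naive DiD reports a positive effect of roughly 0.012 even though none exists. The difference-in-discontinuities estimate stays near zero because each period's fit absorbs its own slope.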

Q3. What are the main mechanisms considered for why the subsidy has no employment effect, and how does the paper distinguish among them?

The paper considers and rules out seven alternative explanations before concluding that demand for apprentices is simply inelastic: (1) Measurement error — ruled out because the null holds across specifications with different exclusion windows, and measurement error does not prevent finding significant effects on fiscal outcomes; (2) Subsidy too small — ruled out because the 8% subsidy (960 euros per apprentice per year, up to 1,460 euros at the 95th percentile of earnings) is comparable in magnitude to subsidies that generate large employment effects in Cahuc et al. (2019) and Guo (2024); (3) Low awareness — ruled out because 80% of eligible firms that hire apprentices receive the discount, confirming they must claim it actively; (4) Firms restricting hiring to maintain eligibility — ruled out because apprentices are excluded from policy-relevant firm size, so hiring an apprentice does not risk crossing the threshold; the firm-size distribution also remains stable; (5) Temporary nature of subsidy — ruled out because most apprenticeships last 19 months and the subsidy covers the first two years; moreover, the literature suggests temporary subsidies should be at least as effective as permanent ones; (6) Training requirements — ruled out because training requirements are poorly enforced, and no effects are found even among firms that previously employed apprentices (lower marginal training costs) or firms that rarely cite training costs as a deterrent; (7) Great Recession — ruled out because no effects appear in the year before the recession began, and effects are not larger or smaller for liquidity-constrained firms.

Q4. What heterogeneity analyses are conducted and what do they show?

The authors estimate pooled post-reform difference-in-discontinuities coefficients separately across multiple dimensions and find consistently null effects with no evidence of heterogeneous treatment effects: (1) by industry: estimates across manufacturing, transportation and construction, trading, services, and other sectors are all tightly centered on zero; (2) by region: null across all Italian regions; (3) by baseline apprentice earnings quartile: null across Q1 through Q4 and for firms with no apprentices at baseline; (4) by contemporaneous apprentice earnings quartile: null; (5) by three measures of liquidity constraints (liquid assets to total assets, cash flow to total assets, revenues above/below median): null in all six groups; and (6) by prior apprenticeship training status: null for both firms that employed at least one apprentice in 2006 and those that did not. The authors note the scope condition: estimates are internally valid for firms in a neighborhood of 9 employees, and they cannot rule out that effects differ for substantially larger firms.

Q5. What robustness checks are conducted beyond the main heterogeneity analysis?

The main robustness checks are: (1) sensitivity of apprentice hiring effects to the amount of excluded data around the threshold (the ‘donut bandwidth’) — the null holds across all exclusion windows (Appendix Figure A.2); (2) placebo tests using the pre-reform periods (January 2003 through December 2006) — no pre-trends in the estimated discontinuity for any outcome; (3) covariate stability tests — the discontinuity in a covariate index predicting apprentice hiring from time-invariant firm characteristics shows no change over time, with point estimates between -0.0005 and +0.0005 versus employment estimates between -0.01 and +0.01; (4) comparison of results to a standard DiD specification — the DiD produces spurious positive effects driven by rotation of the conditional expectation function, while the difference-in-discontinuities estimate remains precisely zero; (5) examination of other outcomes (contract churn, re-labeling, worker quality, contract type substitution, temporary worker stocks) — all null.

Q6. How is cost-effectiveness formally measured and what does the IV estimate imply?

Cost-effectiveness is defined as the number of jobs supported per unit of foregone revenue: omega = E[L(1) - L(0)] / E[R(0) - R(1)], where L is employment and R is tax payments. Rather than back-of-the-envelope calculation, the authors estimate this with 2SLS, instrumenting for actual SSC payments with the interaction of being below the eligibility threshold and the post-2007 indicator. This allows them to compute standard errors, which back-of-the-envelope methods do not provide. The first-stage F-statistic is 230, confirming instrument strength. Point estimates from Table 4: 29 apprentice-years supported per 1 million euros of foregone SSC (standard error 58, not significant); 647,237 euros of apprentice compensation supported per 1 million euros (standard error 921,320, not significant); and -2 permanent-contract transformations per 1 million euros (standard error 21, not significant). For context, directly hiring apprentices at 1,050 euros per month would generate 79 apprentice-years per million euros — 2.7 times more than the point estimate from the subsidy.
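The benchmark comparison follows from simple arithmetic on the figures quoted above. The sketch below recomputes the direct-hiring benchmark, the ratio to the subsidy's point estimate, and the implied t-statistic; the 95 percent interval uses a normal approximation, which is an assumption made here rather than something the summary reports.

```python
# Figures taken from the summary of Table 4 and the direct-hiring benchmark.
SUBSIDY_JOBS_PER_MILLION = 29   # IV point estimate, apprentice-years
SUBSIDY_STANDARD_ERROR = 58     # reported standard error
MONTHLY_WAGE = 1_050            # euros, prevailing apprentice wage
BUDGET = 1_000_000              # euros

# Apprentice-years that the same budget buys through direct hiring.
direct_hire_jobs = BUDGET / (MONTHLY_WAGE * 12)

# Direct hiring relative to the subsidy's point estimate.
relative = direct_hire_jobs / SUBSIDY_JOBS_PER_MILLION

# t-statistic and a normal-approximation 95% interval (an assumption).
t_stat = SUBSIDY_JOBS_PER_MILLION / SUBSIDY_STANDARD_ERROR
ci_low = SUBSIDY_JOBS_PER_MILLION - 1.96 * SUBSIDY_STANDARD_ERROR
ci_high = SUBSIDY_JOBS_PER_MILLION + 1.96 * SUBSIDY_STANDARD_ERROR

print(f"direct hiring: {direct_hire_jobs:.1f} apprentice-years per million")
print(f"direct hiring / subsidy point estimate: {relative:.1f}")
print(f"t-statistic of subsidy estimate: {t_stat:.2f}")
print(f"approximate 95% interval: [{ci_low:.0f}, {ci_high:.0f}]")
```

The direct-hiring figure rounds to 79 and the ratio to 2.7, matching the text, while the t-statistic of 0.5 shows how far the subsidy estimate is from conventional significance.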

Q7. How does the paper benchmark its cost-effectiveness estimates against the broader literature?

The authors normalize employment effects from nine other studies against their fiscal costs to produce a common metric of jobs or job-years per 1 million dollars of foregone revenue. The studies span payroll tax cuts (Egebark and Kaunitz 2013; Saez, Schoefer, and Seim 2021), hiring credits (Cahuc, Carcillo, and Le Barbanchon 2019; Neumark 2013), and fiscal stimulus programs (Bartik 2001; Bartik and Erickcek 2010; Dupor and Mehkari 2016; Dupor and McCrory 2018; Feyrer and Sacerdote 2011; Wilson 2012). The conclusion is that most wage subsidies, including those that generate positive reduced-form employment effects, produce very high costs per job. With two exceptions (Bartik 2001 and Cahuc et al. 2019), cost-effectiveness estimates across the literature are extremely low. The paper argues that hiring credits may be more cost-effective than payroll tax cuts because the requirement to make a new hire makes it harder to subsidize inframarginal employment. Importantly, the Italian study’s cost-effectiveness estimates — though imprecisely estimated — are broadly consistent with the cross-study pattern once fiscal costs are accounted for.

Q8. What are the welfare and public finance implications of the null employment effects?

Because the behavioral response is zero and the fiscal cost is non-zero, the policy functions as a pure transfer from the government to firms. The paper invokes the framework of Hendren and Sprung-Keyser (2020) to note that the marginal value of public funds is essentially 1 — there is no distortion introduced but also no welfare gain from resource reallocation. This interpretation cuts in two directions: (1) the pre-reform apprentice SSC subsidies (which were larger than the post-2007 discount) were also essentially transfers with large fiscal costs and no employment-creation value; and (2) the SSC increase imposed on larger firms (those with more than 9 employees) effectively raised revenue without causing meaningful employment losses, since labor demand for apprentices is inelastic. The policy is thus deemed inefficient in the sense that taxpayer revenue is lost without generating the intended social return of increasing employment of marginalized workers.

Q9. What are the scope conditions and limitations of the estimates?

The difference-in-discontinuities design provides internally valid estimates only for firms in a neighborhood of 9 employees, which in Italy means firms with 3 to 15 employees (90% of Italian firms and 65% of all apprentices). The paper cannot rule out that larger firms respond differently to similar subsidies. The analysis is partial equilibrium: it cannot measure spillovers, general equilibrium effects on wage-setting across the firm-size distribution, or displacement effects between firms. Cost-effectiveness estimates reflect only the direct fiscal cost of foregone SSCs and do not include fiscal externalities (e.g., effects on income tax revenues or social insurance outlays) or administrative and political costs. The exclusion of workers from the public sector means the results pertain solely to private-sector apprenticeships.

Q10. How does this paper relate to prior studies on payroll tax cuts, and what distinguishes it methodologically?

Prior national studies (e.g., Saez et al. 2012, 2019, 2021; Egebark and Kaunitz 2013; Huttunen et al. 2013; Bozio et al. 2020; Rubolino 2021) estimate labor demand responses by comparing employment of targeted versus untargeted workers, which can overstate policy effectiveness if firms substitute targeted for untargeted workers (a SUTVA violation that would not be detected by parallel pre-trend tests). Cross-regional studies (e.g., Bennmarker et al. 2009; Benzarti and Harju 2021a; Bohm and Lind 1993; Guo 2024) study firms but typically do not target genuinely marginalized workers, so estimates reflect average rather than marginal labor demand. This paper satisfies both requirements simultaneously: the discontinuity in incentives provides quasi-random variation across firms (avoiding the SUTVA violation), and the policy specifically targets apprentices, a non-random, marginalized group, so the estimated elasticities pertain to the actual population of interest. The paper is also the first (to the authors’ knowledge) to use a formal IV strategy to estimate cost-effectiveness with standard errors, enabling comparisons of statistical precision across estimates.

Q11. What does survey evidence from the RIL data contribute to the interpretation?

The RIL (Rilevazione Longitudinale su Imprese e Lavoro), a representative firm survey collected in 2005, provides direct evidence on firms’ stated reasons for their apprenticeship hiring decisions. Among firms that do not hire apprentices, the most common reason by far is ‘we don’t need more people,’ with cost cited rarely. Among firms that do hire apprentices, the dominant reason is to train workers prior to hiring them as permanent employees; ‘lower labor costs’ is a secondary consideration. This corroborates the paper’s interpretation that demand for apprentices is driven by training-for-retention motives rather than cost arbitrage, which explains why a cost reduction leaves hiring behavior unchanged.

Q12. What is the policy recommendation and its scope?

The paper urges caution in using payroll tax credits to stimulate employment, particularly for targeted groups with inherently low or inelastic labor demand. The results suggest that, for apprentices, firms hire based on training-and-conversion needs rather than cost considerations, so subsidizing cost does not expand hiring. More broadly, the cross-study cost-effectiveness comparison suggests that hiring credits — which require a new hire as a prerequisite for receiving the subsidy — may be more efficient than payroll tax cuts precisely because they screen out inframarginal firms. The paper does not rule out effectiveness for other worker types or for much larger subsidies, but the documented uniformity of null effects across industries, regions, and firm types suggests the inelasticity finding is robust within the studied population.

Key Concepts

Inframarginal hiring: Employment that would occur absent the subsidy; when a policy subsidizes inframarginal hiring, it transfers resources to firms without generating net new jobs, making it fiscally costly but behaviorally inert.

Difference-in-discontinuities: An empirical design that combines regression discontinuity with difference-in-differences: in each period a discontinuity at the policy threshold is estimated, and the pre-policy baseline discontinuity is subtracted to remove pre-existing size-related level differences and time-invariant non-linearities in the conditional expectation function.

Policy-relevant firm size: As defined by INPS under the 2007 Budget Bill: total full-time equivalent employment minus apprentices, temporary agency workers, workers on leave (unless replaced), and workers on specific on-the-job training contracts; this is the running variable determining SSC eligibility.

Cost-effectiveness (jobs per foregone revenue): The number of job-years supported per unit of foregone tax revenue (here, per 1 million euros of lost SSCs), formally estimated via instrumental variables to allow statistical inference — as opposed to back-of-the-envelope calculations that provide no standard errors.

Inelastic labor demand for apprentices: In this paper’s sense: firms’ demand for apprenticeship contracts does not respond to changes in their labor cost, because hiring decisions are driven by training-and-conversion motives (hiring to eventually retain as permanent employees) rather than by cost minimization at the margin.

Rotation of the conditional expectation function: A change over time in the slope of the relationship between an outcome (e.g., apprentice hiring) and the running variable (firm size); when the slope changes, standard DiD specifications that discretize firms into treated/control groups will spuriously detect a treatment effect even when the true policy effect is zero.

Transformation (apprentice to permanent contract): The event of a firm converting an existing apprenticeship contract into an open-ended (permanent) employment contract at the end of the apprenticeship; used as an alternative outcome to evaluate whether the subsidy increased the ultimate goal of permanent employment, not just temporary apprenticeships.

The macroeconomics of automation

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks a foundational question: can the economy-wide degree of automation be measured coherently from standard macroeconomic data, without relying on technology-specific proxies such as robot counts or AI investment surveys? Existing micro-level proxies are fragmented across technologies and difficult to aggregate, leaving it unclear how automation evolves at the macro level or how it relates to capital deepening, factor shares, and productivity growth. The authors, Hideki Nakamura, Masakatsu Nakamura, and Shota Moriwaki, address this by developing a task-based general equilibrium framework in which the aggregate degree of automation emerges endogenously and is fully identified from observable macroeconomic aggregates.

The theoretical architecture begins with a continuum of tasks, each exhibiting Leontief technology at the task level. Within each task, capital and labor are perfectly substitutable, but firms choose the least-cost input given factor prices. Tasks are ordered by the relative efficiency of capital to labor; as the wage-to-capital-service-price ratio rises with capital deepening, capital performs an expanding range of tasks. Aggregating task-level Leontief decisions over a firm generates a global (envelope) production function. The paper’s first main theorem shows that under a mild regularity condition on task efficiency orderings, this aggregation delivers a standard neoclassical production function. Its second set of results identifies the precise efficiency structure under which the aggregate function takes the CES form: that structure corresponds to a Pareto cumulative distribution of input efficiencies. This Pareto structure yields a clean closed-form relationship: the degree of automation is determined entirely by the capital-labor ratio (in efficiency units) and the elasticity of substitution. When the elasticity exceeds one, the degree of automation equals the capital income share; when the elasticity falls below one, it equals the labor income share. Neutral technical progress leaves the degree of automation unchanged at a given capital-labor ratio; capital-augmenting progress raises it; labor-augmenting progress lowers it.

The empirical application uses panel data from the 2023 Japan Industrial Productivity (JIP) database covering 52 manufacturing industries from 1994 to 2020 (N = 1,404 industry-year observations; two industries excluded for data quality). The CES production function is estimated via GMM using first-differenced factor-share equations derived from the normalized CES system (de La Grandville 1989 normalization), with five sets of instrumental variables drawn from lagged factor prices, information stock and its price, trade openness, workforce age composition, and part-time employment shares.

The main quantitative findings are as follows. Under the assumption of neutral technical progress, the elasticity of substitution sigma is significantly above one but close to it, ranging from 1.049 to 1.102 across the five IV sets (all significant at the 10 percent level or better). Under the assumption of capital-augmenting technical progress (gK > 0, gL = 0), sigma ranges from 1.035 to 1.068, again robustly greater than one. Capital-augmenting technical progress is statistically significant across all specifications; labor-augmenting technical progress cannot be confirmed in any specification. The average estimated degree of automation across the 52 industries over the full sample period is 0.417 (standard deviation 0.171, minimum 0.138, maximum 0.811). The average rises from 0.407 in 1994 to 0.426 in 2020, with a temporary decline around the 2008 financial crisis. Substantial heterogeneity persists across industries throughout the sample. The distribution shifts rightward over time but retains a fat left tail, with the mode just above 0.3 and several industries exceeding 0.7.

The two-level CES extension decomposes aggregate capital into industrial robots and other capital, exploiting a purpose-built robot capital stock constructed via the RAS and perpetual inventory methods (initial year 1985). Industrial robots account for only 0.44 percent of aggregate capital stock on average. The two-level estimation yields higher elasticities (sigma-a between 1.191 and 1.346 across IV sets for the composite-labor margin; sigma-b between 1.049 and 1.096 for the robots-other-capital margin). The degree of automation for the composite rises from 0.398 to 0.430 over the sample, a more pronounced increase than the standard CES estimate, reflecting robots’ amplifying role in automation.

The paper benchmarks three automation measures against an internal consistency criterion: the squared distance between the automation degree inferred from the capital-labor ratio and that inferred from output per worker, given the same CES structure. The Pareto-based measure (the paper’s preferred measure) achieves a distance of 0.0000319, far below the Cobb-Douglas alternative (0.002484) and the continuity-preserving alternative (0.00999), validating the Pareto efficiency-distribution assumption. The Cobb-Douglas alternative yields a mean automation of 0.500 rising from 0.454 to 0.529; the continuity alternative rises more sharply from 0.208 to 0.589 but is discontinuous and sometimes falls outside the unit interval.

For policy and theory, the paper’s framework implies that Japan’s sustained capital accumulation during its prolonged stagnation after 1990 translated into rising automation even without commensurate TFP growth, connecting automation dynamics to the “productivity paradox.” The model also shows that automation can rise alongside an increasing labor income share when sigma is below one, which cautions against interpreting a stable or rising labor share as evidence against ongoing automation. The degree of automation provides a unified lens connecting capital deepening, factor shares, and productivity in a single theory-consistent measure.

In depth

Q1. What is the core identification strategy and what observables are used to infer the degree of automation?

The degree of automation is identified from the first-order conditions of the CES production function. Under the Pareto efficiency-distribution assumption, the CES structure implies a one-to-one mapping from the aggregate capital-labor ratio (in efficiency units), the share parameter s, and the elasticity of substitution rho to the degree of automation (Theorem 4, Eq. 25 and 31). In practice, the authors estimate the CES production function via GMM on first-differenced factor-share equations, recover rho and gK, and plug those into the formula for the degree of automation. No direct observation of tasks, robots (in the standard CES step), or technology-specific adoption decisions is required.

Q2. What are the main threats to identification and how do the authors address them?

The main threats are endogeneity of the output-to-labor and output-to-capital ratios (both simultaneously determined with factor prices) and measurement error in the capital-labor ratio (arising from industry classification changes and the RAS procedure used to construct robot data). The authors address endogeneity via GMM estimation using five distinct IV sets that include lagged factor prices, information stock and its price, trade openness, and workforce composition variables. They report that elasticity estimates are stable across all five IV sets and across alternative sample windows (including a longer 1973-2011 sample from pre-SNA-revision data), and conclude that measurement error is unlikely to drive the results. The overidentification test is not rejected for any IV set in the baseline CES specification (and for most in the two-level specification).

Q3. What theoretical result connects the degree of automation to factor income shares?

Corollary 1 establishes that under the Pareto efficiency structure (Eq. 22) with competitive factor markets, the degree of automation equals the capital income share when sigma > 1, and equals the labor income share when sigma < 1. This makes the degree of automation directly readable from income-share data in the theoretically preferred case (sigma > 1 for Japan). The empirical results are consistent with this: the average degree of automation across manufacturing industries is close to the average capital income share over the sample, providing a cross-check for Corollary 1.

Q4. Why does the paper use a Leontief production function at the task level while obtaining a CES function at the aggregate level?

The Leontief specification at the task level reflects the idea of a bottleneck in production: within a single narrowly-defined task, only capital or labor is used (once a task is automated, capital fully replaces labor in that task). Perfect substitutability between capital and labor operates at the extensive margin (which tasks are automated) rather than within a task. The aggregate (envelope) function, formed by varying the automation cutoff as the capital-labor ratio changes, generates any elasticity of substitution from zero to infinity. The Pareto efficiency-distribution assumption pins down the specific case of a CES aggregate.

Q5. How does the two-level CES extension work, and what does it add?

The two-level CES nests industrial robots and other capital into a capital composite at the inner level (robots vs. other capital, with elasticity sigma-b), then combines that composite with labor at the outer level (composite vs. labor, with elasticity sigma-a). Robot data for 52 industries are constructed via the RAS and perpetual inventory methods with an initial year of 1985. Because robots account for only 0.44 percent of aggregate capital on average, they have a small direct weight, but the two-level decomposition isolates their specific contribution to the automation margin. The two-level CES estimates sigma-a between 1.191 and 1.346 (higher than the standard CES estimates), and finds that the test of equality between sigma-a and sigma-b is rejected for three of five IV sets, suggesting the two elasticities genuinely differ. The average degree of automation rises more steeply under the two-level estimate (0.398 to 0.430) than under the standard CES estimate (0.407 to 0.426), indicating that explicitly accounting for robots reveals a more pronounced automation trend.
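The nesting can be sketched with a generic CES aggregator applied twice. This is only an illustration of the structure described above, not the paper's estimating equation: the share parameters (share_b, share_a) are hypothetical, efficiency terms and technical progress are omitted, and the elasticities are picked from inside the reported ranges.

```python
def ces(x1, x2, share, sigma):
    """Generic CES aggregate of x1 and x2 with elasticity sigma (sigma != 1)."""
    rho = (sigma - 1.0) / sigma
    return (share * x1 ** rho + (1.0 - share) * x2 ** rho) ** (1.0 / rho)


def two_level_ces(robots, other_capital, labor,
                  share_b, share_a, sigma_b, sigma_a):
    """Inner level: robots vs. other capital. Outer level: composite vs. labor."""
    composite = ces(robots, other_capital, share_b, sigma_b)
    return ces(composite, labor, share_a, sigma_a)


# share_b and share_a are hypothetical; sigma values lie in the reported ranges.
params = dict(share_b=0.1, share_a=0.4, sigma_b=1.05, sigma_a=1.2)
y_base = two_level_ces(1.0, 1.0, 1.0, **params)
y_double = two_level_ces(2.0, 2.0, 2.0, **params)
```

Because both levels are homogeneous of degree one, doubling all three inputs doubles output, which is a quick sanity check on any implementation of the nest.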

Q6. What is the paper’s internal consistency criterion, and how does it rank alternative automation measures?

Internal consistency is defined as the mean squared gap between the degree of automation inferred from the capital-labor ratio (Eq. 37, the paper’s preferred measure) and the degree of automation implied by observed output per worker given the same CES structure (Eq. 41). A smaller gap means the measure is more coherent with the CES framework from which it is derived. The Pareto-based measure achieves a distance of 0.0000319, more than seventy times smaller than the Cobb-Douglas alternative (0.002484) and over three hundred times smaller than the continuity-preserving alternative (0.00999). The authors therefore select the Pareto-based measure as most internally consistent with CES production.
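The criterion and the reported ratios can be checked with a few lines of arithmetic. The function below is a sketch of a mean squared gap between two automation series; the two short series are invented for illustration, while the three distances are the values quoted above.

```python
def mean_squared_gap(a_from_capital_labor, a_from_output):
    """Average squared difference between two automation series."""
    pairs = list(zip(a_from_capital_labor, a_from_output))
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


# Invented series, only to show how the function is used.
example_gap = mean_squared_gap([0.40, 0.42], [0.41, 0.42])

# Distances reported in the text.
pareto, cobb_douglas, continuity = 0.0000319, 0.002484, 0.00999
ratio_cobb_douglas = cobb_douglas / pareto   # roughly 78
ratio_continuity = continuity / pareto       # roughly 313
```

The ratios confirm the "more than seventy times" and "over three hundred times" comparisons.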

Q7. What is documented about heterogeneity in automation across industries?

The degree of automation varies substantially across the 52 manufacturing industries, with a standard deviation of 0.171 and a range from 0.138 to 0.811 in the standard CES estimation. The kernel density in 1994 has a fat left tail with a mode just above 0.3, and several industries already exceed 0.7. The distribution shifts rightward by 2020 but remains dispersed. The authors split industries into those with an increasing capital income share (34 industries) and those with a decreasing share (18 industries) and test whether the elasticity of substitution differs between groups; they find no statistically significant difference for any IV set, implying the CES structure is uniform across industries even though automation levels differ.

Q8. How does the paper connect automation to TFP and the productivity paradox?

The theoretical framework shows that automation via task reallocation shifts the production function in a northeast direction in (k, y) space but does not shift it upward in a way that registers as TFP growth. Formally, increasing automation does not appear to impact TFP growth (citing Nakamura and Nakamura, 2008). The empirical finding that the degree of automation rose from 0.407 to 0.426 during Japan’s prolonged stagnation (1994-2020), a period of slow output-per-worker growth, is consistent with this: capital accumulation drove automation forward even though measured TFP growth was subdued. The paper thus links automation dynamics to Japan’s productivity paradox and implies that standard TFP accounting may understate the technological transformation underway.

Q9. What does the framework imply about the relationship between automation and the labor income share?

The CES framework implies that when sigma > 1 (capital and labor more substitutable), capital accumulation raises the capital income share and lowers the labor share; the degree of automation equals the capital income share. When sigma < 1, capital accumulation raises the wage-to-rental ratio more than proportionally, increasing the labor income share; the degree of automation equals the labor income share. In both cases automation rises with capital deepening. A key implication is that observing a stable or rising labor income share does not rule out rising automation when sigma is below one or close to one. The authors’ estimate of sigma slightly above one for Japanese manufacturing implies a slightly rising capital share, consistent with the panel-estimated trend (b-hat = 0.00102, t-value = 6.84).
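The two cases can be illustrated with a textbook CES function under competitive factor markets. The sketch below is a simplification rather than the paper's measure: it ignores efficiency units and technical progress, the share parameter alpha is hypothetical, and the mapping from income shares to the degree of automation simply follows the statement above.

```python
def capital_share(k, alpha, sigma):
    """Competitive capital income share for a textbook CES with K/L ratio k."""
    rho = (sigma - 1.0) / sigma
    weighted_capital = alpha * k ** rho
    return weighted_capital / (weighted_capital + (1.0 - alpha))


def automation_degree(k, alpha, sigma):
    """Capital share if sigma > 1, labor share if sigma < 1 (as stated above)."""
    s_k = capital_share(k, alpha, sigma)
    return s_k if sigma > 1.0 else 1.0 - s_k


ALPHA = 0.4  # hypothetical share parameter
high_sigma = [automation_degree(k, ALPHA, 1.1) for k in (1.0, 2.0)]
low_sigma = [automation_degree(k, ALPHA, 0.8) for k in (1.0, 2.0)]
```

In both lists the second entry exceeds the first: capital deepening raises the measure whether the capital share is rising (sigma above one) or the labor share is rising (sigma below one).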

Q10. What are the robustness checks and how stable are the estimates?

Robustness checks include: (1) five distinct IV sets spanning different combinations of lagged wages, capital rental prices, information stock, trade openness, and workforce composition; (2) estimation under both neutral and capital-augmenting technical progress assumptions; (3) estimation using a longer sample (1973-2011 using pre-SNA-revision data), which yields a sigma that remains significantly above, though close to, one, with slightly larger capital-augmenting technical progress reflecting higher growth in that period; (4) estimation of the full CES production function equation simultaneously with the two FOC equations (Appendix E.2), yielding similar elasticity estimates; (5) a structural change test splitting industries by capital-share trend, finding no significant difference in elasticity between subgroups. Unit root tests (Harris-Tzavalis and augmented Dickey-Fuller) confirm stationarity of all key variables except the part-time ratio, which nonetheless passes the ADF test.

Q11. What are the caveats and acknowledged limitations?

The authors acknowledge several limitations. First, three conditions cannot be simultaneously satisfied: a CES aggregate, the degree of automation lying in the unit interval, and continuity of the automation measure at unit elasticity (sigma = 1). The preferred measure prioritizes the unit-interval restriction and sacrifices continuity at sigma = 1, making direct comparisons across the sigma < 1 and sigma > 1 cases problematic (an alternative continuous measure is derived in Appendix C but may fall outside the unit interval). Second, the framework abstracts from the creation of new tasks; changes in the total number of tasks over time would affect the automation measure. Third, the paper does not decompose automation by skill level; the observed differences between skilled and unskilled labor in automation suggest a need for nested CES structures in future work. Fourth, the two-level CES nesting (robots within capital composite) is dictated by data availability; alternative nestings, such as grouping robots and labor at the first level, are not separately identifiable.

Q12. How does this paper differ from and improve upon the prior literature?

The paper improves on micro-proxy approaches (robot counts, AI investment, task-exposure indices from Acemoglu-Restrepo 2020, Adachi 2025, etc.) by providing an aggregate, theory-consistent measure that does not require technology-specific data. It extends prior CES microfoundation work (Jones 2005 Pareto-Cobb-Douglas result, Growiec 2008 Weibull-CES results) by deriving the Pareto efficiency structure that yields CES specifically from task-level automation decisions. It improves on the authors’ own prior work (Nakamura and Nakamura 2008, Nakamura 2009, 2010) by providing a complete theoretical justification for input efficiencies, a full treatment of the elasticity of substitution, and an empirical implementation. Relative to Artuc et al. (2023) and Adachi (2025), which use Frechet distributions for task productivity, this paper uses a deterministic framework with Pareto-distributed input efficiencies and emphasizes aggregate-level identification rather than cross-occupational substitution.

Q13. What are the policy implications?

The paper does not make direct policy prescriptions, but its framework has several implications. First, policymakers tracking automation can use standard national accounts data (capital stock, labor input, output, factor shares) rather than waiting for technology-specific surveys, enabling faster and more comprehensive monitoring. Second, the result that automation can advance during periods of slow TFP growth suggests that technology policy focused solely on productivity metrics may underestimate the pace of labor displacement. Third, the finding that Japan’s capital accumulation drove automation even through prolonged stagnation implies that capital subsidies or policies encouraging investment could accelerate automation independent of TFP. Fourth, the model’s prediction that automation rises alongside increasing labor shares under low substitutability (sigma < 1) warns against complacency: labor-income gains and technology-driven labor displacement can coexist. Fifth, the need for future work on skill heterogeneity and task creation suggests that the framework can be extended to inform distributional policies.

Key Concepts

Degree of automation: In this paper, the share of the unit task continuum performed by capital rather than labor, denoted a_t, ranging from 0 to 1. It is determined endogenously in equilibrium by relative factor prices and increases with the capital-labor ratio. It is distinct from any technology-specific proxy and emerges as a function of aggregate macroeconomic observables.

Task-based production framework: A model in which output requires completing a continuum of tasks, each exhibiting Leontief technology at the task level (within a task, either capital or labor is used exclusively: the firm either fully automates the task or leaves it to labor). Tasks are ordered by the relative efficiency of capital to labor, and firms choose the automation cutoff that minimizes cost given factor prices.

Pareto efficiency distribution: The specific parametric form of aggregate capital- and labor-input efficiency functions (Eq. 22) under which the task-level aggregation yields a CES production function at the macro level. The relationship between the degree of automation and aggregate input efficiencies follows a Pareto cumulative distribution, which also delivers the highest internal consistency among automation measures tested.

Internal consistency criterion: A criterion for selecting among automation measures, defined as the mean squared gap between the automation degree inferred from the capital-labor relationship and the automation degree implied by the output-per-worker relationship, within the same CES structure (Eq. 42). A smaller gap indicates that the measure is more coherent with the CES production framework from which it is derived.

Capital-augmenting technical progress: An exogenous shift in the efficiency of capital inputs (A_K,t) that raises the effective capital-labor ratio and therefore the degree of automation at any given physical capital-labor ratio. Distinguished from labor-augmenting and neutral technical progress. In the empirical estimation, capital-augmenting technical progress is statistically significant across all specifications, while labor-augmenting technical progress cannot be confirmed.

Two-level CES production function: An extension of the standard CES that nests industrial robots and other capital into a capital composite at the inner level (with substitution elasticity sigma-b), then combines the composite with labor at the outer level (with elasticity sigma-a). Allows separate identification of the automation role of robots versus other capital, yielding a more pronounced increase in the degree of automation than the standard CES when robots are explicitly accounted for.

Automation frontier: The marginal task at which the cost of capital use exactly equals the cost of labor use, i.e., the task a_t at which lambda(a_t)/theta(a_t) = w_t/R_t. Tasks with indices below this frontier are automated; tasks above are performed by labor. As the wage-to-rental ratio rises, the frontier expands (more tasks become automated), capturing the central mechanism by which capital deepening drives automation.

The Unequal Costs of Carbon Pricing: Economic and Political Effects Across European Regions

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks whether carbon pricing through the EU Emissions Trading System (EU ETS) imposes economic costs that are unequally distributed across European regions, and whether those economic costs translate into political costs in the form of votes for extremist and populist parties. The motivation is both practical — political opposition has blocked or rolled back climate policies in several countries — and analytical: no prior study had systematically estimated the political consequences of carbon pricing at the subnational level.

The authors build a panel dataset covering 224 NUTS2 regions from 20 European countries (covering 97% of EU GDP, plus Norway) over 2000–2019. Economic data come from the European Commission’s ARDECO database; emission data from EDGAR (aggregate GHG) and the EU ETS Transaction Log (verified ETS emissions from regulated installations, mapped to NUTS2 via zip codes); voting data from the EU-NED dataset with party classifications from The PopuList. Household expectations are measured from 34 Eurobarometer survey waves (2004–2019). The dataset spans 114 elections (110 national, four European Parliament).

Identification rests on the carbon policy shocks of Kanzig (2023), constructed from high-frequency movements in EU carbon allowance futures prices around 126 regulatory events between 2005 and 2019, instrumented in a monthly VAR and aggregated to annual frequency. These shocks are orthogonal to contemporaneous economic conditions by construction, and are normalized so that the on-impact effect equals a 1% rise in Euro Area HICP energy prices. The main estimator is Jorda (2005) local projections in a panel with region fixed effects, lagged controls, and Driscoll-Kraay standard errors, estimated over a four-year horizon.
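The estimator can be sketched as one regression per horizon. The code below is a simplified illustration on simulated data, not the authors' specification: it uses a single lagged-growth control, implements region fixed effects by demeaning within region, and returns point estimates only (Driscoll-Kraay standard errors are not computed). All parameter values are invented.

```python
import numpy as np


def demean_by_region(a):
    """Within transformation: subtract each region's time average."""
    return a - a.mean(axis=1, keepdims=True)


def local_projection(y, shock, max_horizon=4):
    """y: (regions, years) outcome; shock: (years,) common shock series.

    Returns the coefficient on shock_t for horizons h = 0..max_horizon.
    """
    n_regions, n_years = y.shape
    betas = []
    for h in range(max_horizon + 1):
        t = np.arange(2, n_years - h)            # dates with t-2 and t+h in sample
        dep = y[:, t + h] - y[:, t - 1]          # cumulative change from t-1 to t+h
        lag = y[:, t - 1] - y[:, t - 2]          # lagged growth as a control
        shk = np.tile(shock[t], (n_regions, 1))  # same shock for every region
        X = np.column_stack([demean_by_region(shk).ravel(),
                             demean_by_region(lag).ravel()])
        coef, *_ = np.linalg.lstsq(X, demean_by_region(dep).ravel(), rcond=None)
        betas.append(coef[0])
    return np.array(betas)


# Simulated panel with invented parameters (not the paper's data).
rng = np.random.default_rng(0)
n_regions, n_years, beta_true = 30, 60, -0.7
shock = rng.normal(size=n_years)
drift = rng.normal(scale=0.5, size=(n_regions, 1))
growth = drift + beta_true * shock + rng.normal(scale=0.1, size=(n_regions, n_years))
y = np.cumsum(growth, axis=1)
betas = local_projection(y, shock)
```

In this simulation the shock has a permanent level effect, so the impact coefficient should sit close to beta_true; estimates at longer horizons are noisier because later shocks enter the dependent variable.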

Main economic findings (average region): A 1%-energy-price-equivalent carbon shock reduces real GDP by approximately 0.7% — a contraction that persists for four years. Employment, real net disposable household income, real GVA, real compensation, real investment, and hours worked all decline significantly and persistently. GHG emissions fall by roughly 1% one year after the shock, confirming the policy’s effectiveness.

Main political findings: The combined extremist vote share (far-left plus far-right) rises by 0.3 to 0.4 percentage points two years after the shock and remains elevated. Populist and Eurosceptic vote shares also rise significantly in the medium term. Political fragmentation (1 minus the HHI) increases persistently. The shift is primarily toward far-right parties.

Survey-based expectations: The share of respondents citing environmental issues as a top concern falls by approximately 2 percentage points and remains depressed for four years. Respondents become significantly more pessimistic about national economic and employment prospects and their own financial situation.

Role of the economic channel: Using the Holm-Paul-Tischbirek (2021) decomposition, up to two thirds of the total rise in the extremist vote share over the four-year horizon is attributed to the decline in GDP, employment, and household income. The first year is more dominated by non-economic attribution effects (roughly 25% of the effect is explained by the economic channel at h=1), consistent with voters initially blaming the government’s policy choice rather than responding to realized economic deterioration.

Regional heterogeneity and inequality: Regions one standard deviation above mean ETS emission intensity experience a meaningfully larger output contraction and a 20–50% larger and more persistent rise in the extremist vote share relative to the average region. Regions receiving fewer free ETS allowances face analogously larger economic and political costs. The within-country 90–10 ratio of real disposable household income rises by approximately 0.05 percentage points, with widening concentrated at the lower tail (the median-to-10th-percentile gap), meaning poorer regions bear disproportionate costs. These heterogeneous effects imply that carbon pricing contributes to regional inequality within countries.

Policy implication: The EU ETS lacks direct redistribution mechanisms. The authors argue that progressive revenue recycling — household rebates calibrated to income — is necessary to cushion vulnerable regions, limit inequality, and rebuild public support for climate policy. These concerns are especially pressing given the EU ETS’s scheduled expansion to buildings and transportation in 2027.

In depth

Q1. What is the identification strategy and what are the main threats to it?

The key identifying assumption is that the carbon policy shocks of Kanzig (2023) are exogenous with respect to regional economic conditions. The shocks are constructed from high-frequency daily movements in EU carbon allowance futures prices on days of regulatory announcements, relative to wholesale electricity prices on the prior day; the narrow event window ensures that confounding macroeconomic factors are already priced in. The shocks are then instrumented in a monthly VAR to extract structural shocks with a higher signal-to-noise ratio before being aggregated to annual frequency. The main threat would be that major regulatory announcements happened to coincide with other economic news. The authors defend against this by showing robustness to controlling for unemployment, stock market indices, monetary policy rates, oil prices, and a global financial crisis dummy. For the heterogeneity analysis, ETS intensity and free allowance share are fixed at their pre-sample values (end of ETS pilot phase, 2008) to rule out reverse causality from carbon pricing to the exposure measures.

Q2. How is the economic voting channel distinguished empirically from other channels?

The authors use the decomposition approach of Holm, Paul, and Tischbirek (2021). They re-estimate the extremist vote share local projection while controlling for the contemporaneous path of GDP, employment, and household income over the same h-year horizon. The residual coefficient on the carbon shock captures voting effects not attributable to economic deterioration. Comparing the controlled and uncontrolled responses shows that over the full four-year horizon, roughly two thirds of the voting increase is explained by economic variables. In the first year, the economic channel explains only about 25% of the response, consistent with non-economic attribution effects — voters blaming a government policy choice rather than an exogenous shock — being more prominent early on.
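One way to read the comparison is as a ratio of two coefficients at each horizon: the share attributed to the economic channel is one minus the controlled response divided by the uncontrolled response. The sketch below encodes that reading; the coefficient values are hypothetical and chosen only so that the resulting shares match the ones quoted above.

```python
def economic_channel_share(uncontrolled, controlled):
    """Share of the total response that disappears once economic
    variables are controlled for: 1 - controlled / uncontrolled."""
    return 1.0 - controlled / uncontrolled


# Hypothetical coefficients (percentage points), not estimates from the paper.
share_first_year = economic_channel_share(uncontrolled=0.20, controlled=0.15)
share_full_horizon = economic_channel_share(uncontrolled=0.36, controlled=0.12)
```

With these inputs the first-year share is about one quarter and the full-horizon share about two thirds.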

Q3. What additional evidence distinguishes ETS-driven political effects from other energy price effects?

Two benchmarks are used. First, national carbon taxes, which prior literature shows have muted economic effects, produce no statistically significant response in either real GDP or the extremist vote share (Appendix A.2), consistent with the economic channel being essential for the political response. Second, oil supply news shocks (Kanzig, 2021), constructed with a comparable high-frequency methodology and producing a similarly sized GDP decline, generate a statistically significantly smaller increase in the extremist vote share over the first two years (Appendix A.3). The excess political response to carbon shocks over oil shocks is interpreted as reflecting voters attributing policy-driven economic pain to the government, analogously to Gabriel, Klein, and Pessoa (2023) finding that austerity-induced recessions elicit stronger political responses than general downturns.

Q4. What heterogeneity across regions is documented and how is it measured?

Two exposure dimensions are explored. First, ETS emission intensity (verified ETS emissions scaled by GDP) captures direct agglomeration of installations covered by the carbon market. Second, the share of freely allocated ETS allowances relative to verified emissions captures the effective carbon price faced by firms in the region. Regions one standard deviation above mean ETS intensity experience meaningfully larger output and employment contractions, and 20–50% larger and more persistent increases in the extremist vote share. Regions with fewer free allowances bear analogously larger costs. Results hold when GHG intensity (covering non-ETS sectors) replaces ETS intensity, and when sectoral composition is controlled in the free allowance analysis. A country-level inequality analysis using local projections on the 90–10 ratio of regional household income shows that carbon pricing raises within-country dispersion by approximately 0.05 percentage points, driven primarily by widening of the lower tail (50th to 10th percentile gap), indicating that poorer regions suffer most.

Q5. What robustness checks are run?

Vote share results are robust to: (a) excluding parties coded as borderline by The PopuList; (b) excluding European Parliament elections and using only national elections; (c) averaging national and European election outcomes in years when both occur; (d) a minimal control set of only lagged dependent variable and region fixed effects; (e) an expanded control set adding country-level unemployment rate, stock market index, monetary policy rate, Brent oil price, and a GFC dummy variable. The inequality results are robust to using the 75–25 ratio and the Gini coefficient in addition to the 90–10 ratio. The heterogeneity results are robust to including time fixed effects, which absorb the aggregate carbon shock but preserve cross-sectional variation, confirming that heterogeneous responses are not driven by aggregate confounders. Driscoll-Kraay standard errors are used throughout to allow for cross-sectional and serial dependence; clustering at region-year level delivers nearly identical results.

Q6. How does the paper relate to the prior literature?

Most directly related is Mangiante (2024), which documents that regions in poorer Euro Area countries are more exposed to carbon policy shocks. The present paper complements this by identifying within-country variation driven by ETS intensity and free allowance allocation, and by adding the political dimension. Kanzig and Konradt (2024) establish country-level economic effects of EU ETS shocks; this paper shows that those findings carry over to the regional level with comparable magnitudes. Gabriel, Klein, and Pessoa (2023) use the same econometric approach to study the political costs of austerity in European regions; the present paper finds analogous results for carbon pricing and attributes the political response similarly to economic deterioration. The finding that national carbon taxes lack economic or political bite echoes Metcalf and Stock (2023) and Konradt and Weder di Mauro (2023). The paper adds to the globalization-and-populism literature (Funke et al., 2016; Pastor and Veronesi, 2021; Colantone and Stanig, 2018) by identifying carbon pricing as another channel through which economic shocks drive extremist voting.

Q7. What is the direction of the political shift — toward far right or far left?

The decomposition in Appendix A.2 shows the increase in the combined extremist vote share is driven primarily by far-right parties. The far-right vote share rises significantly, while the far-left vote share shows a smaller and less precisely estimated increase. This is consistent with prior literature (Funke, Schularick, and Trebesch, 2016) documenting that far-right parties disproportionately benefit from recessions. A small decline in voter turnout is also documented, which may amplify measured increases in extremist vote shares by reducing the denominator (valid votes).

Q8. What do the results imply for environmental concern and the political sustainability of climate policy?

Eurobarometer data show that the share of respondents ranking environmental issues among the two most important problems facing their country falls by approximately 2 percentage points following a carbon policy shock, a persistent decline lasting four years. The authors interpret this as a self-interest crowding-out effect: when carbon pricing imposes economic costs, concern for the environment is displaced by concern for living standards, consistent with Douenne and Fabre (2022). This creates a potential self-undermining dynamic: carbon pricing erodes the popular support needed to sustain and strengthen climate policy over time, particularly given that carbon-intensive regions — which suffer most economically — also see the largest decline in public support for environmental issues.

Q9. What are the scope conditions on the policy implications?

The findings pertain to ETS-style cap-and-trade pricing based on regulatory-driven supply restriction, not to national carbon taxes, which the paper shows have much smaller economic and political footprints. The sample covers 20 European countries with NUTS2 regional data over 2000–2019. The carbon policy shocks are derived from EU ETS regulatory events and are specific to that institutional context; generalization outside the EU ETS requires caution. Political effects operate primarily over a two-to-four-year horizon coinciding with electoral cycles. The paper’s redistribution prescription (progressive revenue recycling) presupposes a policy instrument capable of targeting household income; the EU ETS currently lacks such a mechanism, which is precisely the gap the authors flag as most urgent given the ETS expansion to buildings and transportation scheduled for 2027.

Key Concepts

Carbon policy shock: A series of exogenous regulatory surprises in EU ETS carbon allowance markets, constructed by Kanzig (2023) from high-frequency futures price movements around 126 regulatory events (2005–2019), instrumented in a monthly VAR, and normalized to produce a 1% on-impact increase in Euro Area HICP energy prices. Distinct from carbon price levels or oil shocks; isolates policy-driven changes in the supply of emission allowances, orthogonal to contemporaneous economic conditions.

ETS emission intensity: Verified ETS emissions from regulated industrial installations in a NUTS2 region, scaled by regional GDP. The primary measure of a region’s direct exposure to EU carbon pricing; regions with higher ETS intensity experience larger economic contractions and larger shifts toward extremist parties when carbon prices rise.

Share of free allowances: The ratio of freely allocated ETS emission permits to a region’s verified ETS emissions, used as a second regional exposure measure. A higher share implies a lower effective carbon price faced by firms; regions with fewer free allowances bear larger economic and political costs from carbon policy shocks. Free allowances were originally granted to protect energy- and trade-intensive sectors from rapid cost increases.

Extremist vote share: The combined vote share of far-left and far-right parties in a region-election observation, using party classifications from The PopuList expert-coding database. The primary political outcome variable in the paper; empirically driven mainly by the far-right component in response to carbon policy shocks.

Political fragmentation: Defined in the paper as one minus the Herfindahl-Hirschman Index computed over all parties’ vote shares in an election (1 − sum of squared vote shares). Captures the dispersion of votes across parties beyond the extremist vote share; used as a summary indicator of political polarization.
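The index is straightforward to compute from a list of party vote shares. A minimal sketch follows; normalizing the inputs so they sum to one is an assumption added here so that raw percentages or vote counts can be passed in.

```python
def fragmentation(votes):
    """One minus the Herfindahl-Hirschman Index of party vote shares."""
    total = sum(votes)
    shares = [v / total for v in votes]   # normalize to sum to one (assumption)
    return 1.0 - sum(s * s for s in shares)


one_party = fragmentation([100.0])                    # no fragmentation
two_equal = fragmentation([50.0, 50.0])               # two equal parties
four_equal = fragmentation([25.0, 25.0, 25.0, 25.0])  # four equal parties
```

The index is zero when a single party takes every vote and approaches one as votes spread evenly over many parties; with n equal parties it equals 1 − 1/n.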

Economic voting channel: The mechanism by which voters respond to carbon-pricing-induced economic deterioration — falling GDP, employment, and household income — by shifting support away from mainstream parties toward extremist alternatives. Isolated empirically via the Holm-Paul-Tischbirek (2021) decomposition; accounts for approximately two thirds of the total extremist voting response over the four-year impulse response horizon.

Regional inequality (90–10 ratio): Within-country dispersion of regional real disposable household income (or employee compensation) measured as the difference between the 90th and 10th percentile NUTS2 regions. Carbon pricing raises this measure persistently, with widening concentrated at the lower tail (the median-to-10th-percentile gap), indicating that poorer regions bear disproportionate economic costs.

The Winners and Losers of Climate Policies: A Sufficient Statistics Approach

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks who wins and loses from climate policies — carbon taxes, renewable subsidies, and carbon tariffs — across 193 heterogeneous countries, and by how much. The motivation is that the standard IAM literature aggregates welfare into a global number, obscuring the distributional structure that determines political feasibility. Without knowing which countries gain and lose, and through which channels, it is impossible to understand why international cooperation is so difficult or which club structures can sustain themselves.

The authors build a static Integrated Assessment Model (IAM) with heterogeneous countries, international trade in goods (Armington CES), international trade in fluid fossil fuels (oil and gas), locally traded coal, and locally supplied renewables. Production uses a nested CES combining labour with a composite of three energy types. A reduced-form climate system maps world emissions linearly to global temperature, then to country-specific local temperatures, which damage TFP through a quadratic damage function. The key methodological contribution is a first-order (log-linear) decomposition of welfare around the current equilibrium, which expresses welfare changes analytically as a function of five observable sufficient statistics: (i) direct TFP damage, (ii) export terms-of-trade, (iii) import price index, (iv) energy cost effects (change in energy prices faced by producers), and (v) energy rent effects (change in profits of domestic fossil and renewable producers). This decomposition requires no model simulation; it reads off welfare directly from observables and a small set of elasticities.

Two sets of structural parameters are estimated. First, a structural damage function is estimated using bilateral trade data from the ITPD-E dataset (2000–2016, 169 countries) via a Poisson pseudo-maximum-likelihood gravity regression that identifies the effect of temperature shocks from within-country-pair variation in import penetration, controlling for energy market effects. The preferred specification recovers a global peak temperature of T* = 14.02°C and a damage slope parameter γ = 0.012. This strategy is designed to be robust to the Lucas critique: unlike reduced-form GDP regressions, it nets out general-equilibrium spillovers through trade and energy channels. Second, country-specific energy supply elasticities for oil-gas and coal are estimated from time-series variation in fossil rent shares and international prices (1985–2019 data), using OLS country-by-country and then an empirical Bayes shrinkage procedure with a truncated-normal prior that enforces positive elasticities. Coal is found to be substantially more elastically supplied than oil-gas; OPEC nations (e.g., Saudi Arabia) have near-inelastic oil-gas supply, while the US has relatively elastic supply.

Key quantitative results from the policy experiments follow.

(1) Business-as-usual: a 3°C warming by 2100 generates a 17% loss in consumption-equivalent world welfare under utilitarian weights, implying a Social Cost of Carbon of $203/tCO₂ when the approximation is taken at the current equilibrium, rising to $302/tCO₂ if computed at 3°C of warming. Under Negishi (income-proportional) weights, the SCC falls to $3.31/tCO₂, because damages are concentrated in low-income countries, which receive little weight under this scheme. Winners include Canada and Russia; losers are concentrated in Africa, Latin America, and South-East Asia.

(2) Unilateral carbon tax (China, $50/tonne): global emissions rise, by less than 0.07%, rather than fall. China’s carbon tax shifts its energy mix from coal toward oil-gas (coal is ~1.44× dirtier per unit of energy), raising the international oil-gas price by approximately 5%, which boosts fossil exporters’ rents and induces other countries to substitute back to coal. Global utilitarian welfare falls by 0.2%. China itself gains on net through falling coal prices and improved terms of trade. EU nations lose from higher energy import costs.

(3) Unilateral carbon tax (USA, $50/tonne): global emissions fall by 0.8%; US welfare effects are small but positive (energy cost increases are largely offset by terms-of-trade gains with Canada and Europe).

(4) Renewable subsidies (42.6%, calibrated to produce the same average relative-price shift as a $50 carbon tax): on average substantially less effective than carbon taxation and more harmful to welfare, because subsidies push countries up their upward-sloping domestic renewable supply curves, wasting resources on costly domestic generation (especially in countries with high baseline renewable shares such as France).

(5) EU climate club ($50 carbon tax + CBAM tariffs): global emissions fall by 3%; global utilitarian welfare rises by around 5% (1% under Negishi weights), but the EU itself is a net loser. Only Southern Europe (Spain, Portugal, Italy) gains; Germany and the Scandinavian nations lose both from direct policy costs and because cooling harms countries that benefit from warming. The oil-gas price falls by 4.6% within the club.

(6) ASEAN climate club (same structure): global emissions fall by 0.5%; global utilitarian welfare rises by about 0.8% (0.2% under Negishi weights); ASEAN members broadly benefit because they are already losers from climate change and the carbon-reduction benefit outweighs policy costs. The oil-gas price falls by 0.6%.

(7) Global $50 carbon tax (all 193 countries): global emissions fall by 3.82%; the global oil-gas price rises by 0.96% (substitution from coal toward oil-gas under a global carbon tax); global utilitarian welfare rises by about 6% (1% under Negishi weights). Most of the utilitarian gain reflects reduced international inequality, since benefits concentrate in low-income tropical countries. Fossil exporters such as Saudi Arabia and Nigeria see energy rents rise as oil-gas substitutes for coal globally.

The central mechanism finding is that leakage operates primarily through energy trade, not goods trade: energy market effects are consistently larger than goods-market terms-of-trade effects across all policy experiments. This quantifies why unilateral climate policy is so limited in effectiveness. International coordination through climate clubs overcomes leakage but creates winners and losers within member coalitions depending on each member’s energy mix, trade exposure, and baseline climate damage.

In depth

Q1. What is the identification strategy for the structural damage function and what are the main threats to it?

The authors estimate the damage function using a Poisson pseudo-maximum-likelihood gravity regression on bilateral import penetration ratios (Xij/Xii) as a function of temperature differences between exporters and importers (and their squares), with country-pair fixed effects and year fixed effects. Controls for GDP/capita (polynomial), oil rent share, and renewable energy share proxy for the time-varying component of factory-gate prices driven by energy prices and wages. The key identifying assumption is that conditional on these controls and fixed effects, temperature shocks are uncorrelated with time-varying bilateral preference or cost shifters. Threats include: (1) confounding time-varying bilateral shocks correlated with temperature, such as ENSO events or specific geopolitical shocks; (2) the possibility that global (rather than local) temperature drives damages, which the paper cannot address given limited time-series variation and potential spurious correlation concerns (following Goulet Coulombe and Klieber, 2025); (3) the treatment of θ = 5 as a known parameter in computing γ from the regression coefficient, which propagates calibration error. The authors argue their strategy is robust to the Lucas critique because it nets out general-equilibrium effects on GDP that would contaminate GDP-based damage regressions.

Q2. How does the paper’s welfare decomposition work and what are its five channels?

The welfare decomposition is a first-order log-linearisation of the indirect utility around the current equilibrium. Changes in consumption-equivalent welfare for country i decompose into: (i) direct climate TFP damage (change in Dy_i); (ii) export terms-of-trade effect (change in domestic good price p_i); (iii) import price-index effect (change in price index P_i); (iv) energy cost effects (changes in oil-gas price q^f, coal price q^c_i, and renewable price q^r_i weighted by their shares in production); and (v) energy rent effects (changes in profits from fossil, coal, and renewable extraction weighted by their shares in household income). The key insight is that none of these five terms requires solving the full model; each can be computed from observable data moments (energy mix, energy rent shares, trade shares) and a small number of estimated or calibrated elasticities.
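The additive structure of the five channels can be sketched as follows. This is an illustration under stated assumptions, not the paper's formula: the summary does not give the exact weights, so the sketch assumes unit weights on the TFP, export-price, and import-price-index terms, cost shares on energy price changes, and income shares on energy profit changes. All function and argument names are hypothetical.

```python
def welfare_decomposition(dlog_tfp, dlog_export_price, dlog_import_index,
                          dlog_energy_prices, energy_cost_shares,
                          dlog_energy_profits, rent_shares):
    """First-order welfare change split into five additive channels.

    All inputs are log changes (e.g. 0.05 = +5%). Dictionaries are keyed by
    energy type. The weighting scheme is an assumption of this sketch.
    """
    # (iv) higher energy prices hurt producers in proportion to cost shares
    energy_cost = -sum(energy_cost_shares[k] * dlog_energy_prices[k]
                       for k in energy_cost_shares)
    # (v) higher energy profits help households in proportion to rent shares
    energy_rent = sum(rent_shares[k] * dlog_energy_profits[k]
                      for k in rent_shares)
    channels = {
        "climate_damage": dlog_tfp,                  # (i)
        "export_terms_of_trade": dlog_export_price,  # (ii)
        "import_price_index": -dlog_import_index,    # (iii)
        "energy_cost": energy_cost,
        "energy_rent": energy_rent,
    }
    channels["total"] = sum(channels.values())
    return channels


# Hypothetical numbers for one oil-exporting country.
example = welfare_decomposition(
    dlog_tfp=-0.02,
    dlog_export_price=0.01,
    dlog_import_index=0.005,
    dlog_energy_prices={"oil_gas": 0.05, "coal": -0.01, "renewables": 0.0},
    energy_cost_shares={"oil_gas": 0.04, "coal": 0.02, "renewables": 0.01},
    dlog_energy_profits={"oil_gas": 0.05, "coal": 0.0, "renewables": 0.0},
    rent_shares={"oil_gas": 0.10, "coal": 0.0, "renewables": 0.0},
)
```

The point of the sketch is that every input is either an observed share or a price change, so no model solution is needed to add the channels up.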

Q3. What heterogeneity in climate damages is documented and what drives it?

Winners from climate change (3°C warming) are primarily cold countries: Canada, Russia, Scandinavian nations. Losers are concentrated in Africa (Djibouti, Niger, Burkina Faso, Sudan), Latin America, and South-East Asia. The heterogeneity arises from: (1) differences in baseline temperature relative to the estimated global peak productivity temperature T* = 14.02°C; countries hotter than T* lose productivity with further warming, while colder countries gain; (2) partial local adaptation (αT = 0.5) so each country’s effective peak temperature is halfway between T* and its current local temperature; (3) indirect effects through trade networks — cold, open economies can lose if major trading partners are damaged; (4) energy rent effects — fossil exporters lose energy rents as warming reduces global energy demand, partially offsetting their direct productivity gains.
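A small sketch makes the first two drivers concrete. The parameter values T* = 14.02, γ = 0.012, and αT = 0.5 come from the summary above; the functional form (log TFP falling with the squared distance from an effective peak) and the one-for-one pass-through of warming to local temperature are assumptions of this sketch, and the function names are hypothetical.

```python
T_STAR = 14.02   # estimated global peak temperature (deg C)
GAMMA = 0.012    # damage slope parameter
ALPHA_T = 0.5    # partial local adaptation


def effective_peak(baseline_temp, t_star=T_STAR, alpha=ALPHA_T):
    """Peak temperature halfway between T* and the local baseline when alpha = 0.5."""
    return alpha * t_star + (1.0 - alpha) * baseline_temp


def log_tfp(temp, baseline_temp, gamma=GAMMA):
    """Assumed quadratic form: log TFP = -gamma * (T - effective peak)^2."""
    return -gamma * (temp - effective_peak(baseline_temp)) ** 2


def warming_effect(baseline_temp, warming):
    """Change in log TFP when local temperature rises by `warming` degrees."""
    return (log_tfp(baseline_temp + warming, baseline_temp)
            - log_tfp(baseline_temp, baseline_temp))
```

Under these assumptions a country with a baseline of 0°C gains productivity from 3°C of warming, while a country at 28°C loses, which is the sign pattern described above.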

Q4. Why does China’s unilateral carbon tax at $50/tonne raise global emissions rather than lower them?

China relies heavily on coal, and the carbon content of coal relative to oil-gas is ξc/ξf ≈ 1.44 (coal is ~44% dirtier per unit of energy than oil-gas). A carbon tax on both fuels therefore raises the effective cost of coal more than that of oil-gas, inducing China to substitute toward oil-gas imports. This raises the international oil-gas price by approximately 5%, which (1) increases energy rents for fossil exporters (Gulf states, Russia) and (2) makes oil-gas costlier for other countries, incentivising them to substitute back toward coal. The net effect on global emissions is a slight increase of less than 0.07% rather than a decline. This is the carbon leakage effect operating through energy trade.

Q5. Why are renewable subsidies substantially less effective than carbon taxes?

Several mechanisms distinguish the two policies. First, a carbon tax directly raises the relative price of all fossil fuels versus renewables and pushes production only modestly up the upward-sloping renewable supply curve. A renewable subsidy instead lowers the cost of renewables directly, which expands renewable supply, but doing so requires moving further up the domestic renewable supply curve, wasting real resources in countries where the marginal renewable site is expensive (e.g., France, with a baseline renewable share above 40%). Second, a carbon tax creates a reallocation from coal to oil-gas (since the tax raises the coal price more per unit of energy), which can inadvertently raise oil-gas prices and redistribute income to exporters. A renewable subsidy does not trigger this reallocation to the same degree. Third, the lump-sum financing of subsidies has a direct income cost, while carbon tax revenues are rebated, so only general equilibrium price effects matter for welfare. On average across countries, renewable subsidies cause more harm and generate smaller emission reductions per dollar.

Q6. What is the distinction between the EU and ASEAN climate clubs, and why do outcomes differ so substantially?

The EU club ($50 carbon tax + CBAM on imports from non-members) reduces global emissions by 3%, raises global utilitarian welfare by about 5%, but makes EU members net losers on average. The reason is that EU countries include many cold nations (Germany, Scandinavia) that benefit from warming; by cooling the climate, the policy harms them. Additionally, energy cost effects within the EU are heterogeneous — energy costs rise in France but fall in Poland and Germany — and Ireland is harmed through goods trade with Great Britain. The ASEAN club reduces global emissions by only 0.5% (ASEAN is smaller and less fossil-intensive in global terms), raises global utilitarian welfare by 0.8%, and ASEAN members broadly benefit because: (1) all ASEAN members are in the tropical/sub-tropical zone and thus lose from warming; (2) reducing global temperature yields direct productivity gains for members; (3) the energy rent loss for fossil exporters within ASEAN (Brunei, Indonesia) is outweighed by the climate benefit for others. The key structural difference is that the ASEAN club’s members are already losers from warming and hence have aligned incentives for carbon reduction.

Q7. How does the Social Cost of Carbon depend on the welfare weights and the point of approximation?

Under utilitarian Pareto weights (ωi = 1, equal weight per person) and a 3°C warming by 2100, the global consumption-equivalent welfare loss is 17%, implying SCC = $203/tCO₂ at the current baseline temperature. Changing the point of linearisation to the 3°C warmer world raises the SCC to $302/tCO₂, indicating that damages accelerate as warming progresses and that the baseline approximation understates future costs. Under Negishi weights (proportional to income, ωi ∝ 1/u’(ci)), the SCC falls dramatically to $3.31/tCO₂, because damages are concentrated in low-income countries which receive little weight under income-proportional welfare aggregation. The authors note their static, log-linearised model provides a lower bound: fully dynamic IAMs with nonlinearities, uncertainty, or catastrophic-tail risks would further raise the SCC.
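The effect of the weighting scheme can be illustrated with a two-country sketch. It assumes log utility, so that u'(c) = 1/c and Negishi weights are proportional to consumption; under that assumption the utilitarian aggregate is a population-weighted average of percent consumption changes and the Negishi aggregate is an income-weighted average. The function name and all numbers are hypothetical.

```python
def global_welfare_change(consumption, population, dlog_c, weights="utilitarian"):
    """Aggregate country-level log consumption changes into a global number.

    Assumes log utility (an assumption of this sketch):
      utilitarian -> weight each country by population
      negishi     -> weight each country by population * consumption
    """
    if weights == "utilitarian":
        w = list(population)
    elif weights == "negishi":
        w = [p * c for p, c in zip(population, consumption)]
    else:
        raise ValueError("weights must be 'utilitarian' or 'negishi'")
    return sum(wi * d for wi, d in zip(w, dlog_c)) / sum(w)


# Hypothetical world: a poor, populous country that loses 20% of consumption
# and a rich, smaller country that gains 1%.
consumption = [2_000.0, 50_000.0]
population = [1_000.0, 300.0]
dlog_c = [-0.20, 0.01]

utilitarian = global_welfare_change(consumption, population, dlog_c, "utilitarian")
negishi = global_welfare_change(consumption, population, dlog_c, "negishi")
```

In this example the utilitarian loss is roughly ten times the Negishi loss, mirroring the direction (though not the magnitude) of the gap between the two SCC figures.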

Q8. How does the paper estimate energy supply elasticities and what are the key findings?

The authors regress changes in the oil-gas rent share of GDP on changes in the international oil-gas price (and changes in GDP as a control) country-by-country using first differences, recovering country-specific supply elasticities. Because some OLS estimates are noisy, negative, or below 1 (implying negative supply elasticity, inconsistent with theory), the authors apply an empirical Bayes shrinkage procedure: they impose a truncated-normal prior (truncated below 1) whose hyperparameters come from a pooled regression, and compute the posterior mean for each country. Key findings: oil-gas supply is nearly inelastic in OPEC nations (e.g., Saudi Arabia), Russia, and China, consistent with market power compressing effective supply elasticity; the US has relatively elastic oil-gas supply. Coal supply is substantially more elastic on average than oil-gas; the US and India have relatively inelastic coal supply; Russia and China have more elastic coal supply. Coal rents never exceed 1% of GDP even in the largest producers, consistent with near-competitive flat supply curves. These spatial patterns matter significantly for which countries gain or lose from energy price changes induced by climate policy.
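The shrinkage step can be sketched for a single country. The sketch assumes the country's OLS coefficient is normally distributed around the true coefficient with a known standard error, and that the prior is a normal distribution truncated below at 1; under those assumptions the posterior is itself a truncated normal, whose mean has a closed form. The function name and example inputs are hypothetical, and the paper's exact implementation may differ.

```python
import math


def shrunk_coefficient(b_hat, se, prior_mean, prior_sd, lower=1.0):
    """Posterior mean under a normal likelihood and a truncated-normal prior.

    b_hat, se            : country-level OLS estimate and its standard error
    prior_mean, prior_sd : hyperparameters of the untruncated prior
    lower                : truncation point (coefficients below it are ruled out)
    """
    # Precision-weighted combination, as in the usual normal-normal update.
    precision = 1.0 / se ** 2 + 1.0 / prior_sd ** 2
    var = 1.0 / precision
    mean = var * (b_hat / se ** 2 + prior_mean / prior_sd ** 2)
    sd = math.sqrt(var)

    # Mean of a normal(mean, sd) truncated to [lower, infinity).
    a = (lower - mean) / sd
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # P(Z >= a)
    if tail == 0.0:
        return lower  # numerical underflow: all mass sits at the bound
    return mean + sd * pdf / tail
```

A noisy, theory-violating estimate (for example `b_hat=-0.5` with `se=1.0`) is pulled above the bound, while a precise estimate well above 1 is left almost unchanged.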

Q9. What is the main mechanism through which leakage operates — energy trade or goods trade — and how is this established?

The paper establishes that energy market effects are consistently larger in magnitude than goods-market terms-of-trade effects across all policy experiments (see Appendix Table A3). Leakage through energy trade operates because: (1) a domestic carbon tax reduces domestic demand for fossil fuels, lowering the international price of oil-gas (for small countries) or shifting demand between fuels; (2) lower oil-gas prices benefit importing countries and encourage them to use more fossil fuels, partially offsetting the original emission reduction. Goods-market leakage (productivity and competitiveness effects through the trade network) exists but is secondary. This finding has implications for policy: carbon border adjustment mechanisms (CBAMs) target goods trade leakage, but the model suggests the larger channel — energy trade leakage — is not addressed by CBAM alone.

Q10. What robustness checks or sensitivity analyses does the paper report?

The paper reports several robustness exercises: (1) The damage function estimation reports results under OLS (Columns 1-2) and Poisson (Columns 3-4), with separate or restricted coefficients on importer and exporter temperatures; the preferred Poisson specification with restricted coefficients yields T* = 14.02°C and γ = 0.012, and the separate-coefficient specification yields statistically indistinguishable estimates. (2) The SCC is computed at two points of approximation (the current baseline and a 3°C warmer world), yielding $203 and $302/tCO₂ respectively, which gives a sense of the nonlinearity bias from log-linearisation. (3) Welfare is reported under both utilitarian (ωi = 1) and Negishi (ωi ∝ 1/u’(ci)) weights throughout, and the results differ sharply, highlighting how inequality weighting matters. (4) The partial local adaptation parameter is set to αT = 0.5, midway between the pure global-peak (αT = 1) and pure local-baseline (αT = 0) damage specifications. (5) Appendix Table A3 provides a comprehensive decomposition of welfare into climate, energy, and trade effects for all six policy scenarios (BAU, global carbon tax, China tax, US tax, EU club, ASEAN club), enabling consistency checks across experiments.

Q11. How does this paper relate to the broader literature on IAMs and sufficient statistics?

The paper makes three connections. First, it is related to the large IAM literature (Nordhaus and Yang 1996; Barrage and Nordhaus 2024; Cruz and Rossi-Hansberg 2024) but differs by explicitly decomposing welfare into observable sufficient statistics, avoiding the need to solve a large dynamic system. Second, it is related to the sufficient statistics literature in trade (Lashkaripour 2021 on trade wars; Baqaee and Farhi 2024 on trade barriers; Kleinman, Liu, and Redding 2024 on productivity shocks in trade models) — the paper extends this approach to a broad set of climate instruments in a model with detailed energy markets. Third, it differs from Bourany (2025) — a companion paper by one author — which solves for optimal climate agreement design; the present paper instead uses sufficient statistics to evaluate many given policies, trading optimality for analytical tractability and decomposability. The paper also distinguishes from Krusell and Smith (2022), which does not allow cross-border energy trade, and from Cruz and Rossi-Hansberg (2024), which does not model heterogeneous energy rents across space.

Q12. What are the scope conditions and limitations of the approach?

Scope conditions and limitations are significant. (1) The model is static, so it cannot capture dynamic considerations: optimal intertemporal extraction paths, green paradox effects (whether carbon taxes accelerate fossil extraction), directed innovation toward renewables, adaptation capital accumulation, or dynamic leakage in energy markets. (2) The first-order log-linearisation abstracts from nonlinearities in the climate system, making the results most relevant as marginal effects near the current equilibrium rather than for large climate-policy changes or for evaluating policies at future, warmer states of the world. (3) The paper does not model market power in international energy markets (OPEC behaviour), abstracting from strategic behaviour by fossil exporters. (4) Labour is internationally immobile, so migration as a margin of adaptation is excluded. (5) Utility damages from climate change (mortality, amenity loss) are excluded — only productivity (TFP) damages are modelled; including utility damages would amplify gains and losses proportionally. (6) The framework cannot evaluate dynamic policy environments such as climate coordination with commitment problems or intergenerational redistribution from carbon taxation.

Q13. What are the policy implications of the paper’s findings?

Several policy implications follow from the paper’s results, with important scope conditions. (1) Unilateral climate policy is largely ineffective for reducing global emissions and can even increase them (as in China’s carbon tax case); the standard free-rider analysis understates the problem because energy-market leakage can reverse the direction of emissions. (2) Renewable energy subsidies are generally a worse policy instrument than carbon taxes, because they push countries up costly domestic supply curves rather than reallocating away from fossil fuels through price signals; policy prescriptions that favour subsidies (such as the US Inflation Reduction Act) should account for this comparative inefficiency. (3) Climate clubs with both a domestic carbon tax and carbon tariffs (CBAMs) can overcome leakage effects and yield positive global welfare gains, but impose net costs on members whose composition makes them net losers from cooling (cold, energy-exporting member nations). This suggests club membership incentives are heterogeneous even within a bloc and require side payments or complementary redistribution to be stable. (4) ASEAN-style clubs where all members are hot-country losers from warming can achieve a Pareto-improvement for members while also improving global welfare, making them potentially more robust to free-riding than clubs like the EU where some members prefer a warmer climate. (5) The SCC estimated under utilitarian weights ($203/tCO₂) is substantially higher than under Negishi weights ($3.31/tCO₂), implying that the appropriate SCC for policy depends critically on how inequality across countries is weighted in the social welfare function.

Key Concepts

Sufficient statistics (for climate policy): In this paper’s sense, a set of observable data moments and estimable elasticities — specifically nations’ energy mix (shares of oil-gas, coal, renewables), energy rent shares of GDP, bilateral trade shares, energy supply and demand elasticities, and damage parameters — that fully characterise, to the first order, the welfare impact of a climate policy change without requiring the full model to be solved. The approach follows Chetty (2009) and extends it from tax incidence to climate policy in an IAM with trade.

Carbon leakage: In this paper’s framework, the phenomenon by which a unilateral domestic carbon tax reduces domestic fossil demand and lowers the international price of oil-gas, inducing countries outside the policy to increase their fossil fuel consumption, partly or fully offsetting the original emission reduction. The paper shows leakage operates primarily through energy trade (oil-gas price channel) rather than through goods trade competitiveness effects, with energy effects consistently dominating in magnitude across all policy experiments.

Local Cost of Carbon (LCC): The country-specific welfare cost of an additional unit of global carbon emissions, measured in monetary units as the negative of the partial derivative of country i’s welfare with respect to aggregate emissions, divided by the marginal utility of consumption. Distinct from the global Social Cost of Carbon (SCC), which aggregates LCCs across countries with Pareto weights. Countries whose productivity is harmed more by warming have a higher LCC; cold countries may have a negative LCC (they benefit from marginal warming).

Structural damage function: The function Dy_i(E) mapping world cumulative emissions E to country i’s TFP via a quadratic temperature-productivity relationship with peak temperature T* and slope parameter γ, estimated in this paper from bilateral trade data (import penetration ratios and temperature differences) rather than from GDP-temperature regressions. The estimation is designed to be robust to the Lucas critique by netting out general-equilibrium propagation through trade and energy markets that would bias GDP-based estimates.

Climate club: In this paper’s usage (following Nordhaus 2015), a coalition of countries that jointly impose a domestic carbon tax on their own emissions and levy carbon tariffs (carbon border adjustment mechanism, CBAM) on imports from non-member countries scaled by the carbon intensity of those imports. The paper studies EU and ASEAN climate clubs and finds they differ sharply in welfare distribution: the EU club creates net losers among members (because some EU countries benefit from warming), while the ASEAN club delivers welfare gains for all members because all are hot-country losers from climate change.

Energy rent effect: The component of the welfare decomposition arising from changes in profits of domestic energy producers (fossil extractors, coal producers, renewable firms) due to changes in energy prices. Captured in the sufficient statistics formula as the profit share of GDP weighted by the relevant price change. Fossil-fuel-exporting countries have large positive exposure to oil-gas price increases (gains from price rises) and are harmed when global carbon policy reduces the fossil price — this is a key redistribution channel distinct from both climate damages and goods trade.

Empirical Bayes shrinkage (energy supply elasticities): In this paper, a procedure that estimates country-specific fossil and coal supply elasticities by first running OLS regressions of rent share changes on price changes country-by-country, then shrinking noisy or negative estimates toward a pooled mean by imposing a truncated-normal prior (truncated below 1 to enforce positive elasticities) and computing posterior means. Used because country-level time series are short and noisy, while the prior encodes the theoretical constraint that supply must be upward-sloping.

Negishi weights vs. utilitarian weights: Two distinct social welfare aggregation methods used throughout the paper to aggregate country-level welfare changes into global welfare. Utilitarian weights (ωi = 1 per person) put equal importance on each person globally, so welfare gains in low-income tropical countries count fully; this yields high SCCs ($203/tCO₂) and large global welfare gains from carbon taxation. Negishi weights (ωi ∝ 1/u’(ci), proportional to income) downweight poor countries and upweight rich ones, yielding dramatically lower SCCs ($3.31/tCO₂) and smaller measured global welfare gains because damages concentrate in low-income countries that receive little weight.

Train to Opportunity: the Effect of Infrastructure on Intergenerational Mobility

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks whether proximity to transport infrastructure can sever the occupational tie between parents and children — a question with direct bearing on the debate over place-based versus people-based policies. The authors exploit the nineteenth-century expansion of the railroad network across England and Wales, a setting where the First and Second Industrial Revolutions were remaking the occupational structure at the same time that the railroad was knitting together local labor markets and enabling geographic mobility.

The empirical strategy centers on a novel dataset of 980,848 father-son pairs constructed from the full digitized population censuses of England and Wales in 1851, 1881, and 1911 (I-CeM project). Individuals are tracked across consecutive censuses using the Abramitzky-Mill-Perez (2019) linking procedure, which achieves match rates of 43–50% for men aged 40–52. Crucially, each individual is geolocated to the street level by matching census addresses to the GB1900 gazetteer, allowing railroad access to be measured as the straight-line distance from the childhood residence to the nearest train station, a finer measure than the district-level presence indicators used in prior work. Sons’ occupations are observed at ages 40–52; fathers’ occupations are measured 30 years earlier when sons were aged 10–22. Occupational mobility uses two complementary scales: HISCO categories (farming, laborer, services, sales, clerical, managerial, professional) and the continuous HISCAM social-interaction-distance ranking (scores 28–99, mean 50, SD 10).

The key endogeneity problem is that railroad companies targeted low-density, cheap land, and that wealthy landowners and local politicians influenced station placement. To isolate exogenous variation, the authors construct a dynamic least-cost path (DLCP) network connecting 53 major towns identified by their 1801 populations (top 10% of the population distribution, threshold 9,172 inhabitants). The DLCP assigns slope costs to 50×50 meter grid cells and finds the minimum-cost path between every town pair. Lines are ranked by betweenness centrality to separate “early” 1851 lines from “late” 1881 lines, giving a time-varying instrument. Proximity to the nearest DLCP line is used as the instrument for proximity to the nearest actual train station, with standard errors clustered at the parish level. Controls include county and census-year fixed effects, distance to the nearest 1801 major town and its population, distance to Roman roads, ancient ports, and navigable waterways, plus household characteristics (number of servants as a wealth proxy, household size, and father’s foreign birth).

Main results (preferred IV specification with full controls): sons who grew up one standard deviation — approximately 5 km, or about one hour’s walk — closer to a train station were 11 percentage points more likely to work in an occupation category different from their father’s. They were 5 percentage points more likely to be upwardly mobile, defined as a son’s HISCAM score exceeding his father’s by more than one standard deviation of the son’s distribution. The downward mobility estimate is 3 percentage points — positive but smaller in magnitude — indicating that railroad access raises occupational churn asymmetrically, predominantly upward. First-stage F-statistics exceed the Staiger-Stock threshold comfortably (135–414 across specifications). OLS estimates are uniformly smaller than IV estimates, consistent with historical evidence that the railroad targeted areas with weaker growth trajectories.
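The outcome variables can be made concrete with a short sketch. The upward-mobility rule follows the definition above (son's HISCAM score exceeds the father's by more than one standard deviation of the sons' distribution); the symmetric downward rule, the use of the population standard deviation, and the function name are assumptions of this sketch.

```python
from statistics import pstdev


def mobility_flags(father_scores, son_scores, father_cats, son_cats):
    """Flag occupation switching and upward/downward mobility per father-son pair."""
    sd_son = pstdev(son_scores)  # one SD of the sons' HISCAM distribution
    flags = []
    for f, s, fc, sc in zip(father_scores, son_scores, father_cats, son_cats):
        flags.append({
            "switched": fc != sc,          # different occupation category
            "upward": (s - f) > sd_son,    # son outranks father by > 1 SD
            "downward": (f - s) > sd_son,  # assumed symmetric definition
        })
    return flags


# Hypothetical father-son pairs.
flags = mobility_flags(
    father_scores=[40, 50, 60],
    son_scores=[65, 50, 45],
    father_cats=["farming", "clerical", "professional"],
    son_cats=["clerical", "clerical", "sales"],
)
```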

The occupational transitions underlying these results run strongly out of farming and into professional, clerical, sales, and services categories, regardless of the father’s own occupation (Table IV). Sons growing up closer to the railroad were 19 percentage points less likely to work in a declining occupation and 16 percentage points more likely to work in a growing occupation. The distributional pattern shows an inverted-U relationship with father’s occupational decile for occupation-category switching and rank divergence, with the greatest gains concentrated among sons of middle-ranking fathers. For upward mobility specifically, the benefits diminish monotonically as father’s rank rises — sons from blue-collar backgrounds gained more (upward mobility coefficient 0.064) than sons from white-collar backgrounds (0.031).

The authors decompose the total railroad effect on intergenerational mobility into three channels using a structural decomposition applied to a sample of 342,715 brothers: (1) changes in local labor-market opportunities, estimated as the effect on mobility for stayers; (2) changes in the returns to spatial mobility, estimated via a within-family comparison of brothers who moved versus stayed; and (3) changes in the rate of spatial mobility itself. Better railroad access raised the probability of moving away from the birth county by 15 percentage points. However, the estimated return to spatial mobility — the extra boost from actually moving — was reduced by railroad access (negative interaction between proximity and mover status), meaning the railroad decreased the relative advantage of leaving. The decomposition (Table C.6) shows that changes in local opportunities account for the great majority of the total mobility effect. Parish-level evidence confirms the local opportunity mechanism: better-connected parishes saw population growth, more industrial chimneys, more entrepreneurs, higher shares of skilled and literate workers, higher Gini coefficients, and higher median occupational ranks — consistent with agglomeration, industrialization, and skill-biased structural change.

The policy implication is that transport infrastructure investment can reduce intergenerational persistence in occupational status, primarily by restructuring the local labor market rather than by enabling workers to exit. The caveat is that these gains were unevenly distributed — middle- and lower-ranking families benefited most, and the railroad simultaneously raised local inequality alongside local mobility.

In depth

Q1. What is the core identification strategy and what are the main threats it addresses?

The authors use a ‘dynamic least-cost path’ (DLCP) instrument. They connect 53 major English and Welsh towns (defined as the top 10% of the 1801 population distribution, with at least 9,172 inhabitants) via least-cost routes computed over a 50×50 meter terrain grid that assigns slope-based costs to each cell. The instrument is proximity from the childhood residence to the nearest line in this DLCP network. The logic is that individuals incidentally located near the geographic route between major historical towns are more likely to be near an actual railroad — but the DLCP route is based purely on terrain costs, not on local demand, local resources, or the political lobbying that shaped where stations were actually placed. The strategy addresses: (a) reverse causality from high-growth areas attracting railroad placement; (b) sorting of ambitious or wealthy households toward connected parishes; (c) railroad companies’ demand-driven routing choices. The exclusion restriction could be violated if location along least-cost paths between 1801 major towns is directly correlated with intergenerational mobility for reasons other than the railroad. The paper addresses this by controlling for distance to the nearest 1801 major town and its population (proximity to nodes), proximity to Roman roads, ancient ports, and navigable waterways (pre-existing trade routes), and household wealth proxies.
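
The route-finding step can be illustrated with a minimal sketch: Dijkstra's algorithm over a small elevation grid, where each step costs more on steeper terrain. The cost function, the `slope_penalty` parameter, and the toy terrain are assumptions chosen for illustration; the paper's actual cost weights are not reproduced here.

```python
import heapq

def least_cost_path(elev, start, goal, cell=50.0, slope_penalty=10.0):
    """Dijkstra over a 4-connected elevation grid.

    Step cost = cell * (1 + slope_penalty * |rise| / cell). This is a
    hypothetical slope-based cost, not the paper's cost function.
    """
    rows, cols = len(elev), len(elev[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                slope = abs(elev[nr][nc] - elev[r][c]) / cell
                nd = d + cell * (1.0 + slope_penalty * slope)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to the start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Toy terrain (meters): a 40 m hill in the center cell.
elevation = [
    [0, 0, 0],
    [0, 40, 0],
    [0, 0, 0],
]
path, cost = least_cost_path(elevation, (1, 0), (1, 2))
```

In this toy grid the route goes around the hill rather than over it, which is the sense in which the hypothetical network depends on terrain alone and not on local demand.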

Q2. How is the instrument made dynamic, and why does this matter?

The authors divide the hypothetical network into ‘early’ (1851) and ‘late’ (1881) lines by ranking lines in decreasing order of betweenness centrality (the number of times a line connects major towns via shortest paths) and assigning them to the early network until the total cost of the 1851 observed network is exhausted. This dynamic structure means the instrument varies across both space and census cohorts (sons measured in 1851-1881 versus 1881-1911). Without the dynamic feature, the instrument could conflate the effects of lines that were built early (and thus had decades to affect local economies) with lines built later. The temporal variation bolsters the plausibility of the exclusion restriction, and the results are robust to alternative instruments based on static least-cost paths and slope-free least-cost paths.
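
A rough sketch of the early/late split, assuming each hypothetical line already has a betweenness-centrality score and a construction cost. The line names, numbers, and the rule of stopping at the first line that no longer fits the budget are illustrative assumptions, not details taken from the paper.

```python
def split_early_late(lines, budget):
    """Label lines 'early' in decreasing centrality order until the budget runs out.

    `lines` maps a line name to (betweenness_centrality, construction_cost).
    Stopping at the first line that no longer fits is an assumption.
    """
    ranked = sorted(lines, key=lambda name: lines[name][0], reverse=True)
    labels, spent = {}, 0.0
    exhausted = False
    for name in ranked:
        cost = lines[name][1]
        if not exhausted and spent + cost <= budget:
            labels[name] = "early"
            spent += cost
        else:
            exhausted = True
            labels[name] = "late"
    return labels

# Hypothetical lines: (centrality, cost). Not data from the paper.
toy_lines = {
    "A-B": (12, 40.0),
    "B-C": (9, 35.0),
    "C-D": (5, 50.0),
    "A-D": (2, 10.0),
}
labels = split_early_late(toy_lines, budget=80.0)
```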

Q3. What are the four dependent variables and how is intergenerational mobility defined?

The paper uses four measures: (1) an indicator equal to one if the son works in a different HISCO occupation category than his father; (2) the absolute value of the difference in HISCAM scores between son and father; (3) ‘upward mobility,’ an indicator equal to one if the son’s HISCAM score exceeds his father’s by more than one standard deviation of the son’s score distribution; (4) ‘downward mobility,’ the symmetric indicator for a decline greater than one standard deviation. Sons’ occupations are observed when sons are 40–52 years old; fathers’ occupations are measured 30 years earlier when sons were 10–22. The HISCAM scale is held constant over the period (national GB scale, 1800–1938) so that rankings reflect fixed social stratification positions rather than period-specific prestige. The paper also uses time-varying HISCAM, HISCLASS, Woollard, and Armstrong classifications as robustness checks.
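
The four outcome definitions can be written down compactly. The sketch below uses made-up father-son pairs; the field names and the use of the population standard deviation of sons' scores as the threshold are assumptions for illustration.

```python
from statistics import pstdev

def mobility_measures(pairs):
    """Return the four outcomes for each father-son pair.

    Each pair is a dict with keys father_cat, son_cat (HISCO category)
    and father_score, son_score (HISCAM). Using the population SD of the
    sons' scores as the threshold is an assumption of this sketch.
    """
    sd_son = pstdev([p["son_score"] for p in pairs])
    results = []
    for p in pairs:
        diff = p["son_score"] - p["father_score"]
        results.append({
            "switch": int(p["son_cat"] != p["father_cat"]),  # measure (1)
            "abs_diff": abs(diff),                           # measure (2)
            "up": int(diff > sd_son),                        # measure (3)
            "down": int(diff < -sd_son),                     # measure (4)
        })
    return results

# Hypothetical pairs, not data from the paper.
toy_pairs = [
    {"father_cat": 6, "son_cat": 6, "father_score": 55, "son_score": 40},
    {"father_cat": 6, "son_cat": 4, "father_score": 50, "son_score": 50},
    {"father_cat": 9, "son_cat": 3, "father_score": 45, "son_score": 60},
    {"father_cat": 2, "son_cat": 2, "father_score": 85, "son_score": 70},
]
measures = mobility_measures(toy_pairs)
```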

Q4. What is the first-stage performance of the instrument?

The first-stage relationship between proximity to the nearest DLCP line and proximity to the nearest actual train station is positive and statistically significant across all specifications. The Sanderson-Windmeijer F-statistic is 414 in the specification without controls and 136 with county and year fixed effects and full controls, both well above the conventional threshold of 10. The first-stage coefficient drops from 0.640 to 0.339 when full controls are added, indicating that a portion of the geographic correlation between the DLCP and the actual network reflects the pre-existing economic importance of towns and travel routes, which is precisely what the controls absorb.

Q5. What are the main mechanisms and how are they distinguished empirically?

The paper decomposes the total IV effect on intergenerational mobility using a three-part decomposition: (1) Changes in local opportunities, measured as the effect of proximity on mobility for sons who stayed in their birth county (stayers); (2) Changes in the returns to spatial mobility, estimated by comparing brothers who moved with brothers who stayed (using family fixed effects), and interacting this comparison with railroad proximity; (3) Changes in the rate of spatial mobility itself, estimated from the effect of proximity on the probability of county-to-county migration. Table C.6 shows that local opportunities account for the dominant share of the total effect. The railroad raised the migration probability by 15 percentage points (Table VI), so spatial mobility channels exist — but the railroad decreased the relative advantage of actually moving (negative interaction term in Table V), meaning the local opportunity channel more than offsets the spatial channel. Supporting evidence from parish-level regressions (Table VII) shows that better-connected parishes experienced significantly higher population growth, more industrial chimneys, more entrepreneurs per 100 square meters, higher shares of skilled and literate workers, higher Gini coefficients, and higher median occupational ranks — consistent with agglomeration and skill-biased industrialization.
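
One way to write the three-part decomposition, as a sketch under assumed notation (the symbols are introduced here for illustration and are not the paper's): let $r$ be railroad proximity, $Y_S(r)$ the expected mobility outcome of stayers, $m(r)$ the share of sons who move, and $\rho(r)$ the return to moving.

```latex
\begin{align*}
  \mathbb{E}[Y \mid r] &= Y_S(r) + m(r)\,\rho(r), \\
  \frac{d\,\mathbb{E}[Y \mid r]}{dr}
    &= \underbrace{Y_S'(r)}_{\text{(1) local opportunities}}
     + \underbrace{m(r)\,\rho'(r)}_{\text{(2) change in returns to moving}}
     + \underbrace{\rho(r)\,m'(r)}_{\text{(3) change in migration rate}} .
\end{align*}
```

In this notation the reported findings correspond to $m'(r) > 0$ (the 15 percentage point rise in migration) and $\rho'(r) < 0$ (the negative interaction term), with term (1) accounting for the dominant share of the total.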

Q6. What heterogeneity is documented by father’s occupation and position in the distribution?

The effects are heterogeneous by the father’s occupational position. Figure 6 shows an inverted-U pattern for occupation-category switching and absolute rank divergence: sons of middle-ranking fathers benefit most from railroad access. For upward mobility (Figure 6c), the benefits diminish monotonically from the lower end of the father’s distribution — sons of low-ranking fathers are most likely to move up. Sons of white-collar fathers see smaller (and sometimes statistically insignificant) upward mobility gains (0.031) compared with sons of blue-collar fathers (0.064), while the occupation-category switching benefit is also larger for blue-collar sons (0.108 vs. 0.057) (Table C.1). Separate transition matrices by HISCO category (Table IV) show that railroad access reduces the probability of farming for sons of all father types, and raises probabilities of clerical, sales, and services occupations. Effects on becoming a laborer are heterogeneous: for sons of farmers, proximity raises the probability of becoming a laborer; for sons in service occupations, it decreases it.

Q7. What robustness checks are run?

The paper performs an extensive battery. (1) Alternative connectivity measures: distance to the nearest railroad line, indicator variables for train station within 5, 10, and 15 km, and parish-level station presence. (2) Alternative mobility thresholds: 0.5, 1.5, and 2 standard deviations for upward and downward mobility; time-varying HISCAM to account for changing occupational prestige. (3) Removing railroad-specific occupations (train conductors, controllers) to check for mechanical effects. (4) Alternative specifications: second-order polynomials, parish fixed effects (10,419 parishes), and fully nonparametric covariate controls via k-means clustering (500 clusters). (5) Alternative instruments: a slope-free DLCP and a static (non-dynamic) least-cost path. (6) Geolocation robustness: using parish centroids instead of street-level addresses. (7) Linking bias: controlling for the individual probability of being linked using cubic polynomials on linkage probability and surname-frequency dummies; also checking that the railroad network explains little of the share of linked individuals at the parish level. (8) Subsamples: by census year (1851-1881 vs. 1881-1911), by county (leave-one-out), by rural/urban status, by father’s age, by son’s age, by birth order, by native/first-/second-generation immigrant status, by whether the son was born in the same county he grew up in, and by whether the father was in farming. (9) Causal response weighting: the Loken-Mogstad-Wiswall decomposition shows positive IV weights across the entire proximity distribution, consistent with a LATE interpretation. Results are stable across all checks.

Q8. How does the paper handle the selection-into-migration problem in estimating returns to spatial mobility?

The authors follow Abramitzky, Boustan, and Eriksson (2012) and use a within-family comparison of brothers: a subsample of 342,715 sons from 157,369 households, in which brothers grew up in the same household but some moved county while others stayed. Family fixed effects absorb the shared household characteristics (wealth, motivation, family networks, financial constraints) that jointly determine the propensity to migrate and the baseline mobility trajectory. The interaction of railroad proximity with mover status is instrumented by the interaction of the DLCP instrument with the mover indicator, via a control function approach. The estimated baseline return to spatial mobility (the mover premium) is positive and significant (movers have higher occupation-category divergence and shift more in both directions), but the railroad-induced change in the return to mobility is negative, meaning that proximity to the railroad reduced the additional mobility benefit of actually migrating. This finding is the core of the conclusion that local opportunities, not spatial mobility, dominate.

Q9. What does the paper document about local labor market changes induced by the railroad?

Parish-level IV regressions (Table VII) show that better proximity to the 1851 network (instrumented by the DLCP) is associated with: significantly higher population growth between 1851 and 1881; a significantly larger number of industrial chimneys (proxying factory concentration, sourced from Heblich-Trew-Zylberberg (2021)); more entrepreneurs per 100 square meters (from the British Business Census of Entrepreneurs); higher shares of high-skilled and literate workers; a higher Gini coefficient over occupational ranks; and a higher median occupational rank. Additionally, sons in better-connected parishes were 19 percentage points less likely to work in a declining occupation and 16 percentage points more likely to work in a growing occupation (Table C.3). Sons were also 3 percentage points more likely to be literate and 7 percentage points more likely to work in a non-manual occupation (Table C.5). These findings collectively point to agglomeration, industrialization, skill-biased technological change, and the creation of a new entrepreneur class as the mechanisms by which the railroad transformed local labor market structure.

Q10. What prior work does this paper relate to most closely, and what distinguishes it?

The paper sits at the intersection of the railroad-infrastructure and intergenerational-mobility literatures. In the infrastructure tradition, it relates closely to Donaldson (2018, AER) on railroads in India, Donaldson and Hornbeck (2016, QJE) on US market access, Bogart et al. (2022, JUE) on population and structural change in England and Wales, and Heblich-Redding-Sturm (2020, QJE) on London commuting and urban growth. The closest prior paper is Perez (2017) on nineteenth-century Argentina, who finds railroad access shifted children from agricultural into white-collar and skilled blue-collar occupations; this paper provides similar evidence for England and Wales at individual level and adds a full mechanism decomposition. In the intergenerational mobility tradition it relates to Long and Ferrie (2013, AER) and Long (2013, ERH) on census-based occupational mobility in Victorian Britain. The key methodological advantages of the current paper are: (a) use of the full (not 2%) census for all three years, yielding close to 1 million father-son pairs with match rates of 43–50% versus 15–33% in prior work; (b) street-level geolocation enabling individual-level rather than district-level measurement of railroad access; (c) the explicit three-way mechanism decomposition separating local opportunities, returns to migration, and migration rates; and (d) documenting rich heterogeneity by father’s occupational rank and occupation category.

Q11. What are the policy implications and what scope conditions limit their external validity?

The paper’s core policy message is that transport infrastructure investment can be an effective mechanism for reducing intergenerational occupational persistence — primarily by creating new local labor market opportunities rather than by enabling low-income workers to reach distant job centers. This provides historical support for place-based policies of the sort embodied in the Biden ‘Build Back Better’ infrastructure proposals or the UK HS2 high-speed railway project (mentioned in the paper). The main scope conditions limiting generalizability are: (1) The setting is nineteenth-century England and Wales during the Industrial Revolution, when the occupational structure was shifting rapidly from farming to industry and commerce — the railroads arrived at a moment of latent demand for new labor market structures; (2) The benefits were not evenly distributed: middle-ranking families (by father’s occupational rank) gained most in absolute occupational switching and rank divergence, while the lowest-ranked families gained most specifically in upward mobility; (3) The railroad simultaneously raised local inequality alongside local mobility, suggesting infrastructure investment can be inequality-increasing in the cross-sectional distribution of wages even as it reduces intergenerational persistence; (4) The effects are highly localized — even 5 km of additional distance matters — implying that the placement of stations relative to where low-income families actually live is crucial for achieving distributional goals.

Q12. What does the paper document about the baseline patterns of intergenerational mobility in the sample?

In the full sample of 980,848 father-son pairs covering 1851-1881 and 1881-1911, 80% of sons do not remain in the same HISCO occupation category as their father. The correlation between father’s and son’s HISCAM ranks is 0.28. Among sons, 18% experienced upward mobility (son’s HISCAM rank more than one SD higher than father’s) and 15% experienced downward mobility (more than one SD lower). About 31% of sons moved to a different county from where they grew up, settling on average 100 km away. Sons grew up on average 3.28 km from the nearest train station (SD 5.45 km). These descriptives reveal strong spatial clustering in intergenerational mobility patterns at the parish level.

Q13. Does the LATE interpretation hold and what does the weighting function show?

The authors verify the LATE interpretation via two approaches. First, following Loken-Mogstad-Wiswall (2012), they compute the causal response weighting function as the covariance between each discrete proximity indicator and the DLCP instrument, divided by the covariance between the proximity measure and the DLCP instrument. They find positive weights across the entire distribution of proximity to the nearest train station, concentrated most heavily for individuals residing 0.5 to 1.5 proximity units (approximately 2.7 to 8.1 km) from a train station — these are the individuals whose proximity is most affected by incidental location along the DLCP. The absence of negative weights indicates the IV estimate does not mix complier and never/always-taker effects in a sign-reversing way. Second, following Blandhol et al. (2022), a fully nonparametric specification using 500 k-means clusters for covariates yields estimates very close to the parametric baseline, consistent with a LATE interpretation of the linear IV estimator.
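
The weighting function described above can be sketched for a treatment measured in integer bins. For a treatment taking values 0, 1, ..., J, the weight on step j is Cov(1{T >= j}, Z) / Cov(T, Z), and the weights sum to one because T equals the sum of the step indicators. The data below are hypothetical, and the binning of proximity into integers is an assumption of this sketch.

```python
def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def response_weights(treatment, instrument):
    """Weight on each unit step j: Cov(1{T >= j}, Z) / Cov(T, Z)."""
    denom = cov(treatment, instrument)
    levels = range(1, max(treatment) + 1)
    return {
        j: cov([1.0 if t >= j else 0.0 for t in treatment], instrument) / denom
        for j in levels
    }

# Hypothetical data: integer proximity bins T and a continuous instrument Z.
T = [0, 1, 1, 2, 3, 3, 2, 0]
Z = [0.1, 0.4, 0.5, 0.9, 1.4, 1.2, 0.8, 0.2]
weights = response_weights(T, Z)
```

A negative entry in `weights` would indicate a margin where the instrument moves treatment in the opposite direction; the paper reports none.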

Key Concepts

Dynamic Least-Cost Path (DLCP) Network: The paper’s instrument for railroad access. A hypothetical railroad network connecting England and Wales’s 53 largest towns in 1801 via routes that minimize geographic cost (distance plus slope-based terrain costs), ignoring all demand-side factors. Lines are classified as ‘early’ (1851) or ‘late’ (1881) by betweenness centrality until the cost budget of the actual 1851 network is exhausted. Proximity from childhood residence to the nearest DLCP line instruments proximity to the nearest actual train station.

Intergenerational Occupational Mobility: In this paper, the degree to which a son’s adult occupation differs from his father’s, measured both categorically (same versus different HISCO category) and cardinally (difference in HISCAM scores). Upward (downward) mobility is specifically defined as the son’s HISCAM score exceeding (falling below) the father’s by more than one standard deviation of the son’s HISCAM distribution.

HISCAM Score: A continuous occupational ranking (range 28–99, mean 50, SD 10) derived from the frequency of social interactions — marriages, friendships, parent-child links — between occupations in historical data. Higher scores indicate a more advantageous position in the social stratification structure. The paper uses the national Great Britain scale, held constant for 1800–1938, to make rankings comparable across census years.

Local Opportunities Channel: The mechanism by which railroad access improved intergenerational mobility through restructuring the local labor market — enabling commuting, attracting factories and entrepreneurs, spurring urbanization and industrialization, and creating new occupations requiring new skills — without requiring sons to migrate away from their birth county. Identified empirically as the effect of railroad proximity on mobility outcomes for sons who stayed in their birth county (stayers).

Returns to Spatial Mobility: The additional intergenerational mobility benefit (or penalty) associated with actually migrating to another county, estimated using within-family variation among brothers (one who moved and one who stayed) to net out shared household-level determinants of mobility. The paper finds that railroad access reduced the returns to spatial mobility (the interaction between proximity and mover status is negative), meaning that the relative advantage of leaving shrank as local opportunities expanded.

Inconsequential Place IV Approach: An identification strategy (following Chandra-Thompson 2000 and Michaels 2008) in which the instrument for infrastructure access is constructed from the geographic convenience of locations lying between endpoints of a planned network, rather than from demand-side factors at those locations. The DLCP instrument in this paper is a specific implementation: individuals living between 1801 major towns incidentally receive railroad access because the low-cost route between towns passes near their residence.

Occupational Tie (Father-Son): The tendency for sons to remain in the same occupation category or same position in the occupational ranking as their father. In this paper, severing the occupational tie means a son moves to a different HISCO category and/or achieves a HISCAM score meaningfully different from his father’s. The railroad’s main effect is framed as reducing this tie, with upward mobility being the dominant direction of change.

Universal Daycare and Mothers' Working Lifetime

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper estimates the causal effects of universal daycare access on mothers’ labor force participation, full-time employment, hours worked, and earnings across 34 years after the birth of their first child — the longest window examined in this literature. The motivation is twofold: the existing evidence base is overwhelmingly short-run, and the human capital channel (reduced depreciation of skills, accumulation of experience) implies that early labor market attachment during child-rearing years could compound over decades in ways that short-run estimates miss entirely.

The identification exploits Denmark’s 1964 reform that converted a targeted (means-tested) childcare system into a universal one, which triggered a staggered geographic roll-out of daycare centers from 1966 onward across the country’s 2,033 neighborhoods nested in 277 municipalities. The paper combines digitized historical daycare yearbooks (1964–1975), the 1970 census, and administrative registers from Statistics Denmark covering 370,602 mothers who had their first child between 1964 and 1975. Employment is measured via annual contributions to the Supplementary Pension Fund (ATP); earnings from tax records are available from 1980 through 2015, adjusted to 2016 USD. The empirical strategy is a difference-in-differences design comparing mothers in neighborhoods with versus without daycare within the same municipality over time. Daycare availability when the first-born child turns four is used as the fixed treatment indicator for the long-run regressions. Municipality fixed effects absorb cross-sectional confounders; year-of-first-birth dummies capture macro trends.

The contemporaneous effects are already substantial. Once year and municipality fixed effects and covariates are included, daycare availability raises the probability of participation by 1.5 percentage points when the child is two, rising to 5.3–5.7 percentage points for years three through six, an increase of roughly 9 percent relative to the mean. Full-time employment rises by 9–12 percent relative to the mean for years three through six; hours worked increase by 0.27 hours per week (1.8 percent) when the child is four.

The long-run effects persist throughout the entire working life. Relative to the sample mean, mothers with daycare access are 9.7 percent more likely to participate when the first child turns four, declining to 5.7 percent at child age 14, 3.1 percent at child age 22, and still 1.2 percent at child age 34 (when the average mother is approximately 57.7 years old). Full-time employment effects follow a parallel trajectory: 11 percent higher at child age four, 8.2 percent at child age 14, and 4.4 percent at child age 34. Log earnings (conditional on employment) range between 3 and 6 percent higher throughout the observation window; mothers earn 5.3 percent more when the child is 16 and 4.2 percent more when the child is 34.

Heterogeneity by education is a central finding. For low-educated mothers (no post-secondary education, 50 percent of the sample), participation effects are 10.1 percent at child age 10, 5.1 percent at child age 17, and remain statistically significant through 32 years. For higher-educated mothers, participation effects are 3.9 percent at child age 10, fall below 1 percent by child age 17, and become statistically indistinguishable from zero by child age 23. Employment effects are thus larger and more persistent for low-educated mothers. Earnings effects, however, are more closely aligned across education groups and show a distinctive pattern for higher-educated mothers: earnings effects persist and remain significant long after employment effects have faded, suggesting that sustained attachment during child-rearing years translates into qualitative career advancement (not just more years worked) for the more educated group.

Potential mediators include reduced secondary fertility and increased parental separation. Daycare for children aged three to six reduces the total number of children by 0.036 (1.6 percent relative to the mean of 2.2), reduces the probability of having more than two children by 1.8 percentage points (6.0 percent), and increases birth spacing by 0.137 years, making mothers 2.2 percentage points less likely to have a second child within two years. Additionally, mothers with daycare access are 2 percentage points more likely to live apart from the first-born child’s father when that child turns 16 — consistent with greater female economic independence. These mediator effects do not vary systematically by education level. Daycare access does not affect additional educational attainment after first birth, ruling out re-skilling as a channel.

The policy implication is that subsidized universal daycare is not merely a short-run labor supply intervention but a persistent investment in female human capital accumulation, with effects that compound over careers and remain economically meaningful into near-retirement ages.

In depth

Q1. What is the identification strategy and what are the key threats to it?

The paper uses a staggered difference-in-differences design. The key variation is the timing of daycare center openings across neighborhoods within municipalities following the 1964/1966 Danish reform. Daycare availability in the year the first-born child turns four is the fixed treatment indicator for long-run regressions; current-year daycare availability is used for contemporaneous regressions. Municipality fixed effects absorb time-invariant local differences; year-of-first-birth dummies absorb aggregate time trends. The main threat is non-random placement of daycare centers — if centers opened in areas where female labor force participation was already rising, the estimates would be upward biased. The paper addresses this with (1) an event study at the neighborhood level using data from 1960 through 2003 showing no pre-reform differential trends between neighborhoods that later received daycare and those that did not (compared against placebo neighborhoods assigned fictitious opening dates mimicking the actual distribution), and (2) a selective migration check showing that mothers who moved longer distances from their birthplace were no more likely to reside in a neighborhood with daycare once the full conditioning set is included. A residual concern is that for mothers having their first child before 1970, neighborhood assignment is measured post-birth (1970 census), which is addressed by a robustness check excluding the pre-1970 first-birth cohort.
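
A stylized sketch of a two-way fixed effects estimate under staggered openings, using a balanced neighborhood-by-year panel. This is a generic illustration, not the paper's mother-level specification with municipality fixed effects and year-of-first-birth dummies; the panel, the fixed effects, and the effect size are all hypothetical.

```python
def twfe_estimate(y, d):
    """Two-way within estimator on a balanced panel.

    y[i][t], d[i][t]: outcome and treatment dummy for unit i in period t.
    On a balanced panel, subtracting unit and period means and adding back
    the grand mean is equivalent to including both sets of dummies.
    """
    n, T = len(y), len(y[0])

    def demean(x):
        grand = sum(map(sum, x)) / (n * T)
        unit = [sum(row) / T for row in x]
        time = [sum(x[i][t] for i in range(n)) / n for t in range(T)]
        return [[x[i][t] - unit[i] - time[t] + grand for t in range(T)]
                for i in range(n)]

    yd, dd = demean(y), demean(d)
    num = sum(yd[i][t] * dd[i][t] for i in range(n) for t in range(T))
    den = sum(dd[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den

# Hypothetical noiseless panel: 3 neighborhoods, 4 years, staggered openings.
unit_fe = [0.30, 0.45, 0.50]
year_fe = [0.00, 0.02, 0.05, 0.07]
treat = [
    [0, 0, 1, 1],   # daycare opens in the third year
    [0, 1, 1, 1],   # opens in the second year
    [0, 0, 0, 0],   # never opens
]
true_effect = 0.05
outcome = [[unit_fe[i] + year_fe[t] + true_effect * treat[i][t]
            for t in range(4)] for i in range(3)]
beta_hat = twfe_estimate(outcome, treat)
```

With a constant effect and no noise the estimator returns the true effect exactly. When effects differ across cohorts or over time, this estimator can be biased, which is why heterogeneity-robust alternatives matter.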

Q2. How does the paper deal with heterogeneous treatment effects and two-way fixed effects bias?

The paper acknowledges the recent literature on TWFE bias under treatment effect heterogeneity (De Chaisemartin and d’Haultfoeuille 2020; Callaway and Sant’Anna 2021; Sun and Abraham 2021; Borusyak et al. 2024). It replicates the pre-reform event study using the Borusyak et al. (2024) imputation estimator, which is robust to heterogeneous treatment effects and allows for covariates, and finds similar results to the standard TWFE event study (Appendix Figure A.2). The main long-run regressions fix the treatment indicator to daycare availability when the child is four, so there is no variation in treatment timing within a regression, limiting but not eliminating TWFE concerns for the long-run estimates.

Q3. What is the main mechanism behind the persistent effects?

The paper attributes the persistence to human capital dynamics: labor force participation during the child-rearing years reduces depreciation of previously accumulated human capital (from education and prior work experience) and enables new on-the-job human capital accumulation through the current job. For low-educated mothers, the primary channel appears to be the extensive margin — daycare moves mothers who would otherwise become homemakers into paid employment, and the employment effects persist because once labor market attachment is established, it is durable. For higher-educated mothers, the earnings-employment gap is the key signal: employment effects fade within roughly 23 years (consistent with convergence once children are no longer preschool age and informal care becomes feasible), yet earnings remain elevated for decades, suggesting that the women who maintained employment during child-rearing years accrued qualitatively better positions — more experience, better job-match, more promotions — compared to those who did not.

Q4. What are the main mediators and how are they distinguished?

Three mediators are examined. First, secondary fertility: daycare for children aged 3–6 reduces number of children by 0.036, probability of a third child by 1.8 percentage points, and probability of a fourth child by 0.5 percentage points. The effect operates through daycare for children 3–6 (not 0–2), consistent with the main employment effects operating when the child is three or older. The fertility reduction is consistent with an opportunity-cost interpretation: daycare raises the effective wage, making additional children more costly in terms of forgone earnings. Second, birth spacing: mothers with daycare access wait 0.137 more years between first and second child, and are 2.2 percentage points less likely to have the second child within two years, allowing longer uninterrupted work spells. Third, parental separation: mothers with daycare access are 2 percentage points more likely to live apart from the child’s father at child age 16, consistent with greater economic independence from labor market participation reducing barriers to separation. Additional educational attainment after first birth is tested and found to be an insignificant channel (no significant effect overall, a marginal effect only for low-educated mothers), ruling out re-skilling as a mediator.

Q5. What heterogeneity is documented beyond the education split?

The paper’s primary heterogeneity analysis is by maternal education level (low: no post-secondary education versus higher: any post-secondary education including vocational training, college, or university). The education split produces the most substantive finding: employment effects are larger and more persistent for low-educated mothers, while the earnings-employment divergence is the distinctive feature for higher-educated mothers. No other dimensions of heterogeneity (by birth cohort, by municipality type beyond the urban indicator, by parity) are formally reported in the main results, though geographic robustness checks (exclusion of the three largest cities, exclusion of suburbs) implicitly test whether effects are concentrated in particular settings and find they are not.

Q6. What robustness checks are run?

Four main sets of robustness checks are reported. First, selective migration: regressions of daycare availability on distance moved from birthplace (linear, quadratic, and IHST-transformed) with the full conditioning set show no significant relationship, ruling out systematic sorting into daycare neighborhoods. Second, pre-1970 cohort exclusion: restricting to mothers with first birth after 1970 (for whom the 1970 census address is predetermined relative to birth) yields qualitatively similar results, though participation effect sizes are somewhat smaller. Third, urban geography: excluding the municipalities covering the three largest cities (Copenhagen, Frederiksberg, Aarhus, Odense) and separately excluding suburbs of Copenhagen and Aarhus both leave the main results intact. Fourth, differential time trends: allowing the most populous neighborhood within each municipality to have its own set of time dummies (to capture potentially faster urban trend evolution) does not change the finding that participation and earnings effects persist beyond 30 years. The paper also shows that results are robust to an alternative participation definition based solely on ATP contributions for all years (versus mixing ATP pre-1980 and earnings post-1980).

Q7. How does this paper relate to prior work and what is its main contribution?

The prior literature falls into two camps. The short-run camp (Havnes and Mogstad 2011 for Norway; Carta and Rizzica 2018 for Italy; Bettendorf et al. 2015 for Netherlands; Cascio 2009 and Fitzpatrick 2012 for the US) documents modest to moderate employment effects during the preschool years. The medium-run camp (Lefebvre et al. 2009 and Haeck et al. 2015 for Quebec; Nollenberger and Rodriguez-Planas 2015 for Spain; Herbst 2017 for the US Lanham Act) tracks effects up to about 11–17 years. This paper’s first contribution is extending the window to 34 years — covering the majority of the working life — using Danish administrative data that allow continuous observation rather than decennial census snapshots. The second contribution is documenting the earnings-employment divergence for higher-educated mothers specifically, which was not visible in shorter windows. The third contribution is the simultaneous analysis of fertility, spacing, and parental separation as mediators using the same administrative data and identification strategy, rather than treating these as separate exercises in different papers.

Q8. What are the scope conditions and policy implications?

Several scope conditions qualify the policy implications. First, the context is a universal reform in a Nordic welfare state with strong labor market institutions and universal access; the results may not directly generalize to settings with low baseline female employment or weak formal sector employment. Second, the relevant margin for the 1960s–70s cohorts was daycare for children aged three to six; the paper notes that by recent decades the relevant margin has shifted to children under two (consistent with Simonsen 2010 finding effects for younger children in 2001 data), possibly reflecting changing cultural norms or the fact that 1960s–70s mothers had multiple children before returning to work. Third, the employment effects are larger for low-educated mothers, so the labor market attachment argument applies most forcefully to this group. Fourth, the negative fertility effects mean that the total welfare calculation must weigh labor market gains against reductions in desired family size. The policy implication the paper emphasizes is that universal daycare is an investment in long-run economic output, not merely a short-run participation subsidy, because the labor market attachment it induces during child-rearing years compounds over careers through human capital accumulation.

Q9. What is the sample and data structure?

The sample consists of 370,602 mothers who had their first child between 1964 and 1975 and were resident in Denmark in 1970 (from the census), after excluding women with immigrant backgrounds (2.2 percent) and those who died or emigrated before the first child turned 16 (0.6 percent). Employment is observed from the birth of the first child through 34 years after (1964–2009 approximately); earnings from 1980 through 2015. The daycare panel is constructed from historical yearbooks (1964–1975) and administrative registers (1976–1993) and provides yearly neighborhood-level data on daycare availability. The average mother in the sample was born in 1945, was 23.7 years old at first birth, had 10.8 years of education, and had 2.2 children total. The sample is split roughly 50/50 between low-educated and higher-educated mothers.

Q10. Why do effects appear only when the child is three, not earlier?

The paper finds that contemporary participation effects are small and statistically insignificant for years zero through two, then jump sharply at year three. The paper attributes this to two factors: (1) the universal daycare reform primarily expanded slots for children aged three to six, with nurseries for children under three expanding much more slowly through the 1980s and 1990s (Figure A.1 in the paper); and (2) cultural norms and the multi-child fertility pattern of this cohort — mothers in the 1960s–70s were more likely to have multiple children before returning to work, implying that the eldest child often reached age three or four before the mother re-entered employment. This contrasts with more recent periods (Simonsen 2010 uses 2001 data) where the relevant margin has shifted to children under two.

Key Concepts

Universal daycare: In the paper’s sense, daycare centers open to children from all socioeconomic backgrounds (not means-tested), with building costs fully publicly funded and operating costs split among state, municipality, and parents (with parents paying 30 percent), following the 1964 Danish reform. This contrasts with the pre-reform ‘targeted’ system, which subsidized only institutions where two-thirds of children came from low-income families.

Working lifetime effects: The paper’s central object of analysis: the causal impact of early daycare access on maternal labor outcomes measured annually across 34 years after the birth of the first child, covering the majority of the working life. Distinguished from short-run (0–7 year) and medium-run (up to 11–17 year) effects documented in prior work.

Labor market attachment: As used in the paper, the sustained connection to paid employment during the child-rearing years (when children are of preschool age). The paper argues that attachment during this period is the mechanism for long-run effects because it reduces human capital depreciation and enables on-the-job accumulation of experience and job-specific skills.

ATP (Supplementary Pension Fund) contributions: The paper’s primary employment measure for years before 1980. Annual ATP contributions are proportional to hours worked: one-third contribution corresponds to 10–19 hours/week, two-thirds to 20–29 hours/week, and full contribution to 30 or more hours/week. Used to construct both a participation dummy and a full-time employment dummy (full ATP contribution = at least 30 hours/week). Crucially, the unemployed, self-employed, and those outside the labor force made no ATP contributions during this period.
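
As a rough illustration of how the two employment dummies could be derived from the contribution bands described above, the sketch below maps a contribution share to the indicators. The function name, the use of a share between 0 and 1 as input, and the rule that any positive contribution counts as participation are assumptions made for illustration; they are not taken from the paper's code.

```python
from fractions import Fraction

# Hours bands as stated in the text, keyed by share of the full ATP contribution.
ATP_HOURS_BANDS = {
    Fraction(1, 3): "10-19 hours/week",
    Fraction(2, 3): "20-29 hours/week",
    Fraction(1, 1): "30+ hours/week",
}


def atp_indicators(contribution_share):
    """Return (participation, full_time) dummies for one year.

    contribution_share: annual ATP contribution as a share of the full
    contribution (hypothetical input format).
    Assumption: any positive contribution counts as participation.
    """
    share = Fraction(contribution_share)
    participation = int(share > 0)
    full_time = int(share >= 1)  # full contribution = at least 30 hours/week
    return participation, full_time


print(atp_indicators(Fraction(2, 3)), ATP_HOURS_BANDS[Fraction(2, 3)])
```

Someone with no contribution (unemployed, self-employed, or outside the labor force in this period) gets zeros on both dummies under this sketch.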

Human capital depreciation channel: The mechanism by which absence from the labor market during child-rearing years erodes previously accumulated skills (from education and prior work). The paper uses this concept, following Adda et al. (2017) and Lefebvre et al. (2009), to explain why participation effects on earnings can persist long after direct employment effects have diminished: mothers who worked during preschool years entered subsequent career phases with a larger, less-depreciated human capital stock.

Secondary fertility decisions: The paper’s term for fertility choices conditional on already having a first child, i.e., the decision to have additional children. Examined on the intensive margin (number of additional children, spacing between births) rather than extensive margin (whether to have any children), because the sample consists entirely of women who already have at least one child.

Daycare for 3–6 year olds vs. 0–2 year olds: The paper distinguishes between two types of daycare that expanded at different speeds: daycare for children aged 3–6 expanded rapidly from 1966, while nurseries for children under 3 (crèches) expanded only from the 1980s–1990s. All significant effects in the paper — on employment, fertility, and parental separation — load onto access to daycare for children aged 3–6, not 0–2, consistent with the historical timing of the expansion.

Within-Firm Pay Inequality and Productivity

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper investigates how within-firm pay inequality relates to firm-level labor productivity, using a novel linkage of three confidential U.S. Census Bureau datasets covering millions of workers at hundreds of thousands of firms from 2003 to 2015.

The motivating puzzle is that the dramatic rise in U.S. wage inequality since the 1970s is well documented, but the firm-side determinants of within-firm pay dispersion have been difficult to study due to the absence of comprehensive matched employer-employee data in the United States. The paper asks whether firms’ own productivity levels can explain the structure of pay inequality within firms, and whether rising aggregate productivity can account for the secular increase in the CEO-to-median-worker pay gap.

The data come from three linked sources. The Longitudinal Employer-Household Dynamics (LEHD) program provides quarterly earnings for essentially all UI-covered workers from 2003 to 2015, covering all 50 states and Washington, D.C. These earnings encompass salaries, wages, bonuses, and exercised stock options, making them comprehensive for top earners. The Longitudinal Business Database (LBD) supplies annual firm-level revenue and employment, from which the key productivity measure — real revenue per worker, deflated to 2010 dollars using the PCE deflator — is constructed. The Management and Organizational Practices Survey (MOPS), a supplement to the Annual Survey of Manufactures conducted in 2010 and 2015, provides structured management scores (scaled 0 to 1) measuring the intensity of performance monitoring, target-setting, and incentive use across manufacturing firms. The main analysis sample restricts to firms with at least 100 full-year “6-quarter sandwich” workers to ensure clean measurement of annual earnings; it covers approximately 443,000 firm-year observations and 73,000 unique firms. A supplementary Execucomp sample (4,681 firms, 2006–2016) validates results for large publicly traded firms.

Three main findings are reported. First, employees at more productive firms earn more across the entire within-firm pay distribution — from the 1st to the 99th percentile. A 10 percent increase in productivity is associated with a 0.7 percent increase in average worker pay (elasticity 0.068). Moving from the 10th to the 90th percentile of the firm productivity distribution projects an 18 percent increase in average pay.
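
To make the elasticity arithmetic concrete, the following minimal sketch converts a constant pay-productivity elasticity into a percent pay change. It assumes a log-log (constant-elasticity) relationship; the function name is hypothetical and the only inputs taken from the text are the elasticity of 0.068 and the 10 percent productivity increase.

```python
import math


def pay_change_pct(elasticity, productivity_change_pct):
    """Percent change in pay implied by a constant elasticity.

    Assumes log(pay) moves by elasticity * change in log(productivity).
    """
    log_change = elasticity * math.log1p(productivity_change_pct / 100.0)
    return 100.0 * math.expm1(log_change)


avg_pay_effect = pay_change_pct(0.068, 10.0)
print(round(avg_pay_effect, 2))  # about 0.65, which the text rounds to 0.7 percent
```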

Second, the pay-productivity relationship is steeper at higher pay ranks: it strengthens monotonically with a worker’s rank in the firm’s pay distribution. For a given doubling of firm productivity, the top-paid employee (likely the CEO) sees approximately 15 percent more pay, while the median-paid employee sees approximately 7 percent more. Equivalently, the pay-productivity elasticity is 0.15 for the top earner and 0.07 for the median earner. At the percentile level, a 10 percent productivity increase predicts a 0.86 percent pay increase at the 90th percentile but only 0.53 percent at the 10th percentile. Consequently, more productive firms have higher within-firm inequality: a 10 percent productivity increase widens the top-earner-to-median-worker log pay gap by 0.9 percent, and moving from the 10th to the 90th percentile of productivity projects a 23.1 percent increase in this gap. These cross-sectional results survive firm fixed effects, demographic controls (sex, education, age), industry fixed effects at the 6-digit NAICS level, and 2SLS instrumentation with industry exposures to seven major currencies, oil prices, and economic policy uncertainty (Alfaro, Bloom, and Lin 2024). Within-worker, within-firm estimates confirm the pattern dynamically: when a firm’s productivity doubles, workers earning $45,000–$65,000 expect roughly a 1 percent pay increase while workers earning above $300,000 expect nearly a 2 percent increase. The pay-productivity relationship is roughly twice as strong for top earners at publicly traded firms as at private firms (coefficient of 0.22 vs. 0.13 for rank-1 earners), while workers outside the top 50 ranks show similar coefficients across ownership types.

Third, the mechanism is traced to performance-based pay. More productive firms exhibit higher within-year pay volatility (measured as the standard deviation of quarterly log earnings within a year), particularly for top earners, consistent with larger bonus payments. Firms with higher structured management scores — capturing more intensive performance monitoring, goal-setting, and incentive pay — also show higher pay levels and higher pay volatility for top earners, with the gradient across ranks matching the productivity results.

Finally, a back-of-the-envelope calculation applies the estimated pay-productivity elasticities to observed aggregate productivity growth. Aggregate U.S. labor productivity roughly doubled (96 percent compounded growth) from 1980 to 2013. The top-earner-to-median-worker pay ratio at firms with at least 100 employees rose from 7.55 in 1980 to 8.69 in 2013 (an increase of 1.14). Applying the paper’s elasticities for rank-1 (0.1534) and rank-50 (0.0657) earners to the observed productivity doubling predicts a ratio of 8.01 in 2013 — accounting for 40 percent of the actual increase. The authors interpret this as evidence that rising productivity, channeled through differential performance pay, is a quantitatively important driver of rising within-firm inequality.

In depth

Q1. What is the primary identification strategy and what are the main threats to it?

The core cross-sectional estimates in models (1) and (2) regress percentile- or rank-specific pay on log revenue per worker, controlling for a quadratic expansion of firm-level worker demographic composition (sex, education, age and their interactions), year fixed effects, and 6-digit NAICS industry fixed effects. The main threat is omitted variable bias: unobserved firm characteristics correlated with both productivity and pay (e.g., high-skill worker sorting into high-productivity firms) could inflate estimates. The paper addresses this in three ways. First, specifications with firm fixed effects (Appendix Figure A.1) use only within-firm changes in productivity and pay, producing similar convex-across-ranks patterns. Second, the within-worker, within-firm change specification (model 4, Figure 2) holds individual workers fixed and relates earnings growth to productivity growth. Third, a 2SLS approach instruments log productivity (and its interaction with rank) using industry-level exposures to seven currency pairs, oil prices, and economic policy uncertainty constructed from rolling 10-year daily stock-return regressions by Alfaro, Bloom, and Lin (2024); the logic is that industries have idiosyncratic exposure to these aggregate shocks, so productivity movements attributable to the instruments are exogenous to individual pay-setting. The 2SLS results are broadly similar to OLS in sign and pattern, though first-stage F-statistics are approximately 3, which is weak by conventional standards. Additional tests using lagged productivity (Appendix Table A.3) show if anything stronger relationships, consistent with productivity causally passing through to pay rather than pay determining past productivity.

Q2. What are the main mechanisms proposed and how are they distinguished empirically?

The primary mechanism proposed is performance-based pay (bonuses and incentive compensation) that is disproportionately concentrated among senior managers at more productive firms. The paper cannot directly observe bonus pay in the LEHD, which reports total quarterly earnings. Instead, it uses within-year pay volatility — the standard deviation of log quarterly earnings within a calendar year — as a proxy for bonus income (most visibly fourth-quarter bonus payments). Figure 4 shows that top earners at more productive firms have significantly higher pay volatility, and this relationship is steeper at higher ranks, exactly paralleling the pay-level results. The management channel is examined separately: Figure 5 shows that firms with higher MOPS structured management scores (capturing explicit monitoring, target-setting, and incentive-pay practices) display higher pay levels and higher pay volatility for top earners, again with the gradient increasing at the top. The public-vs.-private ownership comparison is a further diagnostic: if performance-based executive compensation is the mechanism, it should be stronger at publicly traded firms, where stock grants, option awards, and formal incentive contracts are more prevalent. Panel a of Figure 3 confirms the top-earner pay-productivity coefficient is 0.22 at public firms and 0.13 at private firms, while workers outside the top 50 show similar coefficients across ownership type. This asymmetry is robust to reweighting public firms to match the employment distribution of private firms (panel b of Figure 3), ruling out pure size effects as the explanation.

Q3. What heterogeneity is documented across sectors, firm age, and ownership type?

Across sectors (Appendix Figure A.2), the positive and convex pay-productivity gradient across earnings ranks is present in nearly all 18 two-digit NAICS sectors. Shallower (less convex) patterns appear in utilities, finance and insurance, and health, which the authors attribute to heavy regulation limiting scope for differential performance pay across ranks. Across firm age groups (Appendix Figure A.3), the pattern holds across firms younger than 10 years, between 10 and 25 years, and 25 or more years. Across ownership, the pay-productivity relationship for top earners is roughly twice as large in publicly traded firms as in privately held firms, while the relationship for workers outside the top 50 is similar. Within publicly traded firms, the LEHD top-earner coefficients closely match those for named executives in the Compustat Execucomp data (Figure 3, panel a), validating both the LEHD measure of top earnings and the Execucomp-based executive pay literature.

Q4. What robustness checks are run?

The paper runs the following robustness checks: (1) Full demographic controls — a quadratic expansion of firm-level shares by sex, education category, and age group, plus interactions — included in all baseline regressions to account for worker sorting. (2) 6-digit NAICS industry fixed effects to net out cross-industry pay and productivity variation. (3) Firm fixed effects (Appendix Figure A.1): the convex pattern across ranks survives when only within-firm variation in productivity and pay is used. (4) Sector heterogeneity analysis (Appendix Figure A.2): the main pattern holds across nearly all 18 two-digit NAICS sectors. (5) Firm age heterogeneity (Appendix Figure A.3): results hold across all age groups. (6) Reweighting public firms to match private firms’ employment distribution (Figure 3, panel b): the stronger pay-productivity gradient for top earners at public firms is not explained by their greater average size. (7) Size controls: including log total LEHD employment does not eliminate the pattern. (8) 2SLS with macroeconomic instruments: similar signs and pattern to OLS, supporting causal interpretation despite weak first stages. (9) Lagged productivity (Appendix Table A.3): if anything, the pay-productivity relationship by rank is slightly stronger when using prior-year productivity, reducing reverse-causality concerns. (10) Comparison to Execucomp: the LEHD public-firm top-earner coefficients align with those from Execucomp named executives. (11) Analysis of sandwich-worker selection (Appendix Table A.1): workers at more productive firms are marginally more likely to remain sandwich workers the following year, with this pattern slightly stronger at lower earnings ranks; the paper discusses this selection and argues it does not drive the main results.

Q5. What exactly is the LEHD earnings measure and how does it capture bonuses?

The LEHD is based on state unemployment insurance (UI) wage records submitted by employers. It captures total quarterly earnings, including salaries, wages, bonuses, stock option exercises, and restricted stock awards when vested. Qualified (incentive) stock options are not subject to UI tax and are excluded, but these are capped and the paper judges them immaterial for top earners. The quarterly frequency of the data allows the paper to construct within-year pay volatility (the standard deviation of log quarterly earnings in a year) as a proxy for bonus income, since bonus payments typically appear as spikes in Q4. The paper uses only non-imputed demographic characteristics from ancillary LEHD sources; imputed values (e.g., education, which is imputed for 88 percent of individuals) are replaced with a constant and flagged with a missing-value indicator.

Q6. How exactly is firm productivity measured and what are its limitations?

Productivity is measured as real revenue per worker (log scale), with nominal revenue deflated to 2010 dollars using the PCE deflator. Revenue and employment come from the LBD, which covers all non-farm sectors from 1997 onward. This is a revenue-based labor productivity measure, not total factor productivity, and no industry-level price deflators are used beyond the economy-wide PCE; instead, 6-digit NAICS industry fixed effects control for cross-industry differences in revenue-per-worker levels. The LBD’s revenue coverage may be biased toward older, more stable firms, but the paper argues this has minimal impact because its sample is already restricted to large firms (at least 100 full-year workers). The paper explicitly contrasts its broad economy-wide measure with more granular TFP measures available only for manufacturing and in Economic Census years.

Q7. What is the structured management score and what does it measure?

The structured management score is derived from 16 core questions in the MOPS asking plant managers about practices in three domains: performance monitoring, target setting, and incentivization of workers. Each question is scored 0 to 1, where 0 reflects least structured (less explicit, formal, frequent, or specific) and 1 reflects most structured (more explicit, formal, frequent, or specific). The firm-level score is an employment-weighted average of establishment-level scores (requiring at least 10 non-missing responses per establishment). It ranges from 0 to 1 and follows the methodology of Bloom et al. (2019), who establish that higher scores predict higher establishment-level productivity. Because MOPS targets manufacturing establishments surveyed in the ASM, the management sample is a 2.5 percent subset of the main sample, resulting in wider standard errors for management-related estimates. The paper treats this score as an indirect proxy for the adoption of performance-based incentive systems.
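
The aggregation described above can be sketched as follows. This is a minimal illustration, assuming each establishment's score is the simple mean of its answered questions (the text only says the scoring follows Bloom et al. 2019); the function names and the input layout are hypothetical.

```python
def establishment_score(answers, min_answered=10):
    """Mean of non-missing question scores in [0, 1].

    answers: list of 16 values, with None marking a missing response.
    Returns None when fewer than `min_answered` questions were answered.
    """
    valid = [a for a in answers if a is not None]
    if len(valid) < min_answered:
        return None
    return sum(valid) / len(valid)


def firm_score(establishments):
    """Employment-weighted average of establishment scores.

    establishments: list of (employment, answers) pairs.
    Establishments without a valid score are skipped.
    """
    weighted_sum = 0.0
    total_employment = 0.0
    for employment, answers in establishments:
        score = establishment_score(answers)
        if score is None:
            continue
        weighted_sum += employment * score
        total_employment += employment
    if total_employment == 0:
        return None
    return weighted_sum / total_employment


plants = [
    (300, [0.5] * 16),             # large plant, mid-range practices
    (100, [1.0] * 16),             # small plant, most structured practices
    (50, [1.0] * 9 + [None] * 7),  # too few responses, excluded
]
print(firm_score(plants))  # 0.625
```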

Q8. How does this paper relate to and differ from Song et al. (2019) and the broader between-firm vs. within-firm inequality literature?

Song et al. (2019), using U.S. Social Security Administration earnings records, document that the rise in U.S. earnings inequality between 1978 and 2013 was driven predominantly by increases in between-firm pay dispersion, with within-firm inequality rising more modestly. This paper takes the within-firm inequality result as a starting point and asks what firm characteristics predict cross-sectional and dynamic variation in within-firm inequality. The key addition is connecting within-firm pay dispersion to revenue labor productivity and to management practices, neither of which Song et al. (2019) directly analyze. The paper uses Song et al.’s published aggregate statistics on top-earner and median-earner pay (from their Figure VI) as the benchmark for the back-of-the-envelope calculation linking rising productivity to rising inequality. More broadly, the paper contributes to a cross-country literature (Barth et al. 2016; Card, Heining, and Kline 2013; Faggio, Salvanes, and Van Reenen 2010; Mueller, Ouimet, and Simintzi 2017) that documents firms as the locus of increasing wage dispersion, by providing a specific firm-level mechanism: productivity and performance-pay practices.

Q9. How does this paper relate to and differ from the CEO pay literature?

The CEO pay literature (Gabaix and Landier (2008), Frydman and Jenter (2010), Kaplan (2013), Edmans and Gabaix (2016)) debates whether rising CEO pay reflects performance, firm size, or rent extraction, but typically studies only the named top executives at large publicly traded firms covered by Execucomp. This paper’s key innovation is extending the analysis to all workers across the full within-firm pay distribution, for millions of U.S. workers at firms of all sizes and ownership types. It finds that the pay-productivity gradient is present across all earnings ranks, not only at the CEO level, though it is steeper at the top. The paper validates its LEHD-based top-earner results against Execucomp, finding close agreement for publicly traded firms, and interprets the public-vs.-private differential as consistent with formal performance-based executive contracts being more prevalent at public firms — a finding consistent with Gao and Li (2015), who show CEO pay-performance sensitivity is greater at public firms.

Q10. What are the aggregate inequality implications and how robust is the 40 percent estimate?

The 40 percent figure comes from a back-of-the-envelope calculation in Table 4. Using Song et al.’s (2019) data, the top-earner-to-median-worker pay ratio rose from 7.55 in 1980 to 8.69 in 2013 (a change of 1.14). Aggregate U.S. labor productivity grew 96 percent compounded over this period (sourced from FRED series PRS85006092). The paper applies the pay-productivity elasticities for rank-1 (0.1534) and rank-50 (0.0657) earners from Figure 1 to this productivity growth to predict earnings levels in 2013. Predicted mean earnings are $224,357 for the top earner (versus actual $301,614) and $28,013 for the median worker (versus actual $34,702), yielding a predicted ratio of 8.01 and an explained change of 0.46, which is 40.13 percent of the actual change of 1.14. The authors label this a ‘simple back-of-the-envelope’ calculation and do not claim it as a structural decomposition. Key caveats: (i) the cross-sectional elasticities from 2003–2015 are applied to a 1980–2013 trend, assuming stability of these relationships over time; (ii) aggregate productivity growth may also shift the productivity distribution of firms, which the calculation does not fully model; (iii) the calculation leaves the remaining 60 percent unattributed, which could reflect technology, globalization, changing labor market institutions, or other forces.
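
The arithmetic can be approximately reproduced from the rounded figures quoted above. The sketch below assumes a constant-elasticity form, in which each group's earnings scale with the productivity growth factor raised to its elasticity; that functional form is an assumption made here because it recovers the reported ratio of 8.01, and it is not confirmed as the paper's exact procedure. Variable names are illustrative.

```python
# Inputs quoted in the text (rounded).
ratio_1980 = 7.55
ratio_2013_actual = 8.69
productivity_growth = 0.96   # 96 percent compounded growth, 1980 to 2013
elasticity_top = 0.1534      # rank-1 earner
elasticity_median = 0.0657   # rank-50 earner

# Assumed constant-elasticity scaling: earnings grow by factor ** elasticity,
# so the ratio grows by factor ** (difference in elasticities).
growth_factor = 1.0 + productivity_growth
predicted_ratio = ratio_1980 * growth_factor ** (elasticity_top - elasticity_median)

explained_change = predicted_ratio - ratio_1980
actual_change = ratio_2013_actual - ratio_1980
explained_share = explained_change / actual_change

print(round(predicted_ratio, 2))   # roughly 8.01
print(round(explained_share, 2))   # roughly 0.40
```

With these rounded inputs the explained share comes out slightly above the reported 40.13 percent, which presumably reflects the paper's use of unrounded earnings levels.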

Q11. What is the role of firm size in explaining the results?

Publicly traded firms in the sample are substantially larger than private firms on average (mean 7,763 versus 491.7 full-year employees). To ensure the stronger pay-productivity gradient at public firms is not simply a size artifact, the paper reweights public firms to match the employment distribution of private firms (using ventile-based inverse-probability weights) and finds the differential persists (panel b of Figure 3). The paper also includes log total LEHD employment as a control in additional specifications and reports similar results. The large-firm pay premium literature (Brown and Medoff 1989; Oi and Idson 1999) posits that large firms pay more due to compensating differentials, monitoring difficulties, or rent-sharing. The paper’s finding that pay is higher at more productive firms across the entire earnings distribution is interpreted as more supportive of the rent-sharing explanation, since explanations based on compensating differentials or monitoring would not apply uniformly to all workers.

Q12. What are the policy implications and their scope conditions?

The main policy-relevant implication is that rising productivity — itself associated with technology adoption and innovation — contributes substantially (estimated 40 percent) to the CEO-to-median-worker pay gap that the Dodd-Frank Act requires publicly traded firms to disclose annually from 2018. This implies that policies targeting within-firm pay inequality may need to grapple with the fact that a significant share of observed inequality is tied to real productivity differences and performance-pay practices, not purely to governance failures or rent extraction. However, several scope conditions limit this implication: the 40 percent figure is an economy-wide back-of-the-envelope estimate with caveats about stability of elasticities over time; the paper does not assess whether performance pay practices are optimally structured or reflect rent-seeking; the mechanism analysis uses pay volatility and management scores as proxies rather than direct observation of bonus contracts; and the remaining 60 percent of the inequality increase is left unaccounted for, potentially reflecting factors outside the paper’s framework.

Q13. What are the key data limitations and potential measurement concerns?

Several limitations are acknowledged or implicit. (1) Revenue labor productivity is used rather than TFP; the measure conflates product demand and productivity shocks and does not adjust for industry-specific output price variation. (2) LEHD earnings exclude qualified (incentive) stock options not subject to UI tax; the paper argues these are capped and immaterial for top earners, but this may understate total compensation for senior executives, especially at technology firms. (3) Within-year pay volatility is used as a proxy for bonus income rather than direct bonus data. (4) The management sample is confined to firms with at least one manufacturing establishment in the MOPS, covering only 2.5 percent of main-sample firm-year observations, limiting precision. (5) Education is imputed for 88 percent of individuals in the LEHD; the paper uses only non-imputed values and controls for missingness, but this reduces demographic control precision. (6) The IV first-stage F-statistics are approximately 3, suggesting weak instruments, so 2SLS standard errors are wide and the causal interpretation should be taken cautiously. (7) The sample is restricted to firms with at least 100 full-year workers, so results do not speak to smaller firms, which employ a large share of the U.S. workforce.

Key Concepts

Revenue labor productivity: Real revenue per worker at the firm level, computed from LBD annual revenue deflated to 2010 dollars using the PCE deflator and divided by total firm employment; the paper’s primary measure of firm performance, entered in log form in all regressions.

Pay-productivity elasticity (by rank): The regression coefficient on log firm productivity in a regression of mean log annual earnings for a given within-firm earnings rank or percentile; the paper documents that this elasticity rises monotonically from approximately 0.07 for the median earner to 0.15 for the top earner (rank 1), producing a convex schedule across ranks.

Within-firm earnings inequality: Dispersion in annual earnings among full-year workers within a single firm in a given year; measured variously as the 90th-10th percentile log earnings gap, the 99th-10th gap, the top-earner-to-50th-percentile gap, and the top-earner-to-10th-percentile gap.

Within-year pay volatility: The standard deviation of log quarterly earnings within a calendar year for a given worker rank; used as a proxy for variable (bonus) compensation since it captures deviations from a constant salary path, particularly fourth-quarter bonus payments.
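
A minimal sketch of this volatility measure for a single worker-year is shown below. The choice of the population standard deviation (rather than the sample version) and the function name are assumptions; the text does not specify them.

```python
import math
import statistics


def within_year_pay_volatility(quarterly_earnings):
    """Standard deviation of log quarterly earnings within one calendar year.

    quarterly_earnings: four positive earnings values (Q1..Q4).
    Assumption: population standard deviation over the four quarters.
    """
    if len(quarterly_earnings) != 4 or min(quarterly_earnings) <= 0:
        raise ValueError("expected four positive quarterly earnings values")
    logs = [math.log(e) for e in quarterly_earnings]
    return statistics.pstdev(logs)


flat_salary = within_year_pay_volatility([25_000, 25_000, 25_000, 25_000])
with_q4_bonus = within_year_pay_volatility([25_000, 25_000, 25_000, 40_000])
print(flat_salary, with_q4_bonus)
```

A constant salary path yields zero volatility, while a fourth-quarter bonus spike raises the measure, which is the sense in which it proxies for variable pay.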

Structured management score (MOPS): A continuous index bounded between 0 and 1 derived from 16 MOPS survey questions on performance monitoring, target-setting, and worker incentivization practices; higher values indicate more explicit, formal, frequent, and specific management practices, following the scoring methodology of Bloom et al. (2019).

6-quarter sandwich worker: An individual who is employed at and earns above the minimum wage at the same firm in all four quarters of the current year, the fourth quarter of the prior year, and the first quarter of the following year; the restriction ensures that measured annual earnings reflect genuine full-year employment rather than partial-year spells or job transitions.
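
The six-quarter restriction can be expressed as a simple check, sketched below. The dictionary layout keyed by (year, quarter), the function name, and the use of a single quarterly earnings threshold to stand in for "above the minimum wage" are all illustrative assumptions rather than the paper's implementation.

```python
def is_sandwich_worker(earnings_at_firm, year, min_quarterly_earnings):
    """Check the 6-quarter sandwich condition for one worker at one firm.

    earnings_at_firm: dict mapping (year, quarter) -> earnings at the firm.
    min_quarterly_earnings: hypothetical threshold representing
        minimum-wage earnings for a quarter.
    """
    required_quarters = (
        [(year - 1, 4)]                       # last quarter of the prior year
        + [(year, q) for q in range(1, 5)]    # all four quarters of the year
        + [(year + 1, 1)]                     # first quarter of the next year
    )
    return all(
        earnings_at_firm.get(quarter, 0.0) > min_quarterly_earnings
        for quarter in required_quarters
    )


full_year = {(2009, 4): 9_000, (2010, 1): 9_100, (2010, 2): 9_100,
             (2010, 3): 9_200, (2010, 4): 12_000, (2011, 1): 9_300}
left_in_december = {k: v for k, v in full_year.items() if k != (2011, 1)}

print(is_sandwich_worker(full_year, 2010, 3_000))         # True
print(is_sandwich_worker(left_in_december, 2010, 3_000))  # False
```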

DHS (Davis-Haltiwanger-Schuh) growth rate: A symmetric growth rate measure defined as (x_t - x_{t-1}) / (0.5 * (x_t + x_{t-1})), bounded between -2 and 2; used in the within-worker, within-firm change analysis to measure both earnings growth and productivity growth while accommodating entry and exit.
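
The formula above translates directly into code. The sketch below is illustrative only; the function name and the choice to raise an error when both values are zero are assumptions.

```python
def dhs_growth(x_now, x_prev):
    """Symmetric DHS growth rate: (x_t - x_{t-1}) / (0.5 * (x_t + x_{t-1}))."""
    midpoint = 0.5 * (x_now + x_prev)
    if midpoint == 0:
        raise ValueError("growth rate undefined when both values are zero")
    return (x_now - x_prev) / midpoint


print(dhs_growth(110.0, 100.0))  # modest growth, close to the ordinary growth rate
print(dhs_growth(50.0, 0.0))     # entry: 2.0
print(dhs_growth(0.0, 50.0))     # exit: -2.0
```

Entry from zero and exit to zero map to the bounds of 2 and -2, which is why the measure accommodates both without producing undefined or unbounded values.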

Top-earner-to-median-worker pay ratio: The ratio of mean annual earnings of the highest-paid worker to mean annual earnings of the median-paid worker within firms, aggregated across firms of different sizes using employment weights; the Dodd-Frank Act metric that publicly traded firms have been required to disclose annually since 2018, and the paper’s primary metric for the aggregate inequality calculation.

Zero-hours Contracts in a Frictional Labour Market

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Dolado, Lalé, and Turon build a structural equilibrium model of the U.K. low-wage labour market to evaluate zero-hours contracts (ZHCs), employment agreements under which firms are not required to guarantee any minimum working hours and workers may decline any hours offered. The paper’s central question is whether ZHCs raise or lower welfare in general equilibrium, and through which channels. The model features two-sided heterogeneity in a random-search-and-matching environment: firms differ in the volatility of their labour demand, workers differ in their relative preferences for flexible versus regular employment, and wages are fixed at or near the statutory minimum wage. Three mechanisms operate simultaneously. First, a job-creation effect: firms facing highly volatile demand that cannot profitably hire under regular terms enter the market only because ZHCs exist. Second, a substitution effect: some firms that could hire under regular contracts instead post ZHC vacancies, crowding out regular employment. Third, a labour-force-participation effect: workers with a strong preference for flexible schedules join the labour force specifically because ZHCs exist and would withdraw if ZHCs were banned.

The model is calibrated to U.K. Labour Force Survey data for the low-pay segment (roughly 16 percent of total employment), covering September 2018 through March 2020, with a sample of 9,342 individuals aged 16 to 69. A mixture-of-exponentials approach due to Karlis and Xekalaki (1999), applied to the job-tenure and unemployment-duration distributions, identifies exactly two statistically distinguishable worker types in both ZHC employment and unemployment, and only one in regular employment. This is consistent with the presence of R-best workers (who prefer regular employment but accept ZHCs as a stepping stone) and Z-only workers (who would exit the labour force without ZHCs), but not R-only or Z-best workers. Calibrated parameters include a biweekly job-finding rate of λ(θ) = 0.051, a job-destruction probability of δ = 0.005, an on-the-job search efficiency of x = 0.352, and a share of R-best workers of ζ_{R-best} = 0.969. The matching function elasticity ψ is estimated to be 0.65 from U.K. occupation-level hiring and vacancy data (range 0.60–0.70 across specifications). ZHC employment accounts for 6.5 percent of the low-wage employment stock but 19.4 percent of vacancies, because higher turnover in ZHC jobs causes them to be re-advertised more frequently.

A ban on ZHCs, simulated as an extreme tightening of flexible-work regulation, raises the unemployment rate by 2.0 to 2.7 percentage points depending on the assumed volatility of ZHC firms’ demand. When ZHC workers have a low enough disutility of labour that they remain in the workforce after a ban (accepting regular jobs instead), the employment rate falls by the same 2.0 to 2.7 p.p., and sectoral GDP falls by only 0.02 to 0.14 percent, because higher average hours per employed worker partially offset the employment decline. When ZHC workers’ disutility is high enough that they withdraw from the labour force, the employment-rate fall is larger (4.8 to 5.4 p.p.) and sectoral GDP falls by 2.9 to 3.2 percent. Decomposing the effect on regular employment via the model’s analytical formula (Proposition 4a), lower job creation alone would reduce regular employment by almost 30 percent (λ(tilde-θ)/λ(θ) = 0.71), but this is offset by reduced vacancy competition (+24 percent, ceteris paribus) and improved search efficiency for regular jobs (+15 percent, ceteris paribus), so that regular employment ends up slightly higher after the ban.

Welfare effects are measured in consumption-equivalent variation units. In general equilibrium, R-best workers (those who prefer regular jobs but sometimes hold ZHCs as a stepping stone) suffer welfare losses of −0.5 to −0.6 percent of consumption from a ZHC ban, driven primarily by longer expected unemployment spells. Yet in a partial equilibrium experiment that converts their ZHC jobs to regular jobs while holding all other equilibrium objects fixed, these same workers gain approximately +0.2 percent: the substitution effect is genuinely welfare-improving for them in isolation, but the job-creation channel dominates in general equilibrium and more than reverses that gain. Z-only workers — those who would exit the labour force if ZHCs were banned — suffer general-equilibrium welfare losses of −1.7 to −2.0 percent (low-disutility scenario) or approximately −1.8 to −2.1 percent (high-disutility scenario). These losses exceed the losses to R-best workers because Z-only workers are also forced into a type of employment they strictly prefer to avoid. The paper concludes that a ZHC ban is welfare-reducing for all workers in general equilibrium, and proposes that policy instead target ZHC use toward matches where workers voluntarily choose flexibility (Recommendation P1) and toward small firms that cannot diversify demand volatility across many positions (Recommendation P2).

In depth

Q1. What is the model’s core structure and what frictions drive the results?

The model is a discrete-time steady-state random-search-and-matching model with two-sided heterogeneity. Workers are heterogeneous in their flow payoffs from regular employment (ω^i_R), flexible ZHC employment (ω^i_Z), and non-employment (ω^i_N), with these payoffs shaped by CRRA utility over consumption and a type-specific disutility of hours worked (α^i). Firms are heterogeneous in the volatility of their demand shock (σ_j), which determines the expected profit flow under each contract type. Flow profits depend on how actual hours h deviate from a stochastic target h-tilde via a quadratic loss specification. Market tightness θ is determined endogenously by free entry. The key friction is random search: workers cannot direct their search to their preferred contract type, so R-best workers sometimes end up in ZHCs and must search on-the-job to move to regular employment.
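
To see why demand volatility matters under a quadratic loss, consider a rough Python sketch. It assumes the loss is cost × (h − h-tilde)² with hours fixed in advance under a regular contract; the function name, the unit cost, and the choice of 18 fixed hours are illustrative assumptions rather than the paper's exact specification. The expected loss then equals the variance of the target plus the squared gap between fixed hours and the mean target.

```python
def expected_quadratic_loss(fixed_hours: float, mean_target: float,
                            sd_target: float, cost: float = 1.0) -> float:
    """E[cost * (h - h_tilde)^2] when h is fixed and h_tilde has the given mean and sd."""
    return cost * (sd_target ** 2 + (fixed_hours - mean_target) ** 2)


# Regular contract: hours fixed at the mean target, so only the variance term remains.
loss_moderate = expected_quadratic_loss(18.0, 18.0, sd_target=3.0)
loss_high = expected_quadratic_loss(18.0, 18.0, sd_target=6.0)

# ZHC: hours track the realised target, so the deviation is zero by construction.
loss_zhc = 0.0

print(loss_moderate, loss_high, loss_zhc)
```

Doubling the standard deviation quadruples the expected loss under fixed hours, which is the sense in which high-volatility firms may find regular contracts unprofitable while ZHCs remain viable.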

Q2. How are worker types identified empirically, and why only two types?

The paper adapts a mixture-of-exponential distributions procedure from Karlis and Xekalaki (1999), applied separately to the duration distribution of ZHC employment, regular employment, and unemployment in LFS data. A bootstrapped sequential hypothesis test determines the number of latent classes M* that best fits the survival function. For ZHC employment, two exponential components are needed (p-value for M=1 vs. M≥2 is 0.01; for M=2 vs. M≥3 it is 0.74). For regular employment, one component suffices (p-value for M=1 vs. M≥2 is 0.99). For unemployment, again two components (p-values 0.01 and 0.93 respectively). Cross-referencing which types are present in which states using the model’s theoretical exit-rate table rules out R-only and Z-best workers, leaving only R-best and Z-only workers as consistent with all three distributions simultaneously.
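
The idea of fitting duration data with a mixture of exponentials can be sketched with a plain EM algorithm in Python. This is not the paper's bootstrapped sequential test: it only fits a two-component mixture to synthetic spells and compares its log-likelihood with the one-component fit. All function names, starting values, and simulated rates are assumptions made for illustration.

```python
import math
import random


def log_likelihood(durations, weight, rate_a, rate_b):
    """Log-likelihood of a two-component exponential mixture."""
    return sum(
        math.log(weight * rate_a * math.exp(-rate_a * t)
                 + (1.0 - weight) * rate_b * math.exp(-rate_b * t))
        for t in durations
    )


def fit_two_exponentials(durations, n_iter=200):
    """Plain EM for a two-component exponential mixture (illustrative only)."""
    mean = sum(durations) / len(durations)
    weight, rate_a, rate_b = 0.5, 2.0 / mean, 0.5 / mean  # crude starting values
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that each spell comes from component a.
        resp = []
        for t in durations:
            dens_a = weight * rate_a * math.exp(-rate_a * t)
            dens_b = (1.0 - weight) * rate_b * math.exp(-rate_b * t)
            resp.append(dens_a / (dens_a + dens_b))
        # M-step: weighted maximum-likelihood updates.
        total_a = sum(resp)
        total_b = len(durations) - total_a
        weight = total_a / len(durations)
        rate_a = total_a / sum(r * t for r, t in zip(resp, durations))
        rate_b = total_b / sum((1.0 - r) * t for r, t in zip(resp, durations))
        history.append(log_likelihood(durations, weight, rate_a, rate_b))
    return weight, rate_a, rate_b, history


# Synthetic spells: 70% exit at rate 1.0, 30% at rate 0.1 (made-up numbers).
rng = random.Random(0)
spells = [rng.expovariate(1.0 if rng.random() < 0.7 else 0.1) for _ in range(2000)]
weight, fast_rate, slow_rate, history = fit_two_exponentials(spells)

single_rate = len(spells) / sum(spells)  # one-component maximum-likelihood rate
loglik_one = log_likelihood(spells, 1.0, single_rate, single_rate)
print(round(weight, 2), round(fast_rate, 2), round(slow_rate, 2))
print(history[-1] > loglik_one)
```

A formal choice of the number of components would compare such fits through a likelihood-ratio statistic whose null distribution is obtained by bootstrap, which is the role of the sequential test described above.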

Q3. What is the identification strategy and what are the main threats to it?

Identification rests on three steps. First, the mixture-of-exponentials procedure identifies the number of worker types from the shape of the duration distributions; this step relies on recalled job tenure and unemployment duration, which the authors acknowledge may suffer from recall bias and heaping (rounding to salient durations). Second, the turnover parameters are calibrated by minimizing the distance between model-implied and empirical transition matrices across U, Z, and R states from the longitudinal LFS; the main limitation noted is that the two sets of moments (transitions and durations) are not jointly consistent because they come from different measurement processes. Third, flow profits and payoffs are calibrated to external moments (minimum wage, replacement rate, business creation costs) and the preference for ZHC hours; the hours volatility parameter σ_Z has no direct empirical counterpart and is varied across scenarios. The model abstracts from wage bargaining, treating wages as fixed at the minimum wage, which reduces scope for confounding but is an approximation even in the low-wage sector.

Q4. How are the three channels — job creation, substitution, and labour-force participation — distinguished in the quantitative analysis?

The job-creation channel is captured by Z-only firms (firms with σ_Z = 6 such that regular employment is not profitable): removing ZHCs forces these firms out of the market entirely, reducing labour market tightness θ and hence the aggregate job-finding rate λ(θ). The substitution channel is captured by Z-best firms (σ_Z = 3): these firms could profitably hire under regular contracts but choose ZHCs, and after a ban they convert vacancies to regular posts, with incomplete crowd-out due to general equilibrium adjustment. The labour-force-participation channel is captured by Z-only workers: those with disutility α^i above the threshold (WTP > £7.9 per week to avoid regular work) withdraw from the labour force when ZHCs are banned, while those below the threshold remain and take regular jobs. The paper runs scenarios that vary both the firm side (low vs. high volatility) and the worker side (low vs. high disutility) to disentangle the magnitude of each channel.

Q5. What is the decomposition of the effect on regular employment (Proposition 4a)?

Under the calibrated parameters (no Z-best workers), regular employment in the baseline relative to the no-ZHC counterfactual equals the product of three multiplicative terms. The job-creation term is λ(θ)/λ(tilde-θ) = 1/0.71 ≈ 1.41, meaning that ZHCs raise the job-finding rate by about 41 percent relative to the no-ZHC counterfactual. The vacancy-competition term is vR/v ≈ 0.81: 80.6 percent of vacancies are for regular jobs, while the remaining 19.4 percent, posted for ZHC jobs, dilute the pool. The search-efficiency term captures the fact that some R-best workers are in ZHC employment and search on the job at reduced intensity x < 1. Read in the direction of the ban, the ceteris paribus decomposition indicates that job creation alone would cut regular employment by 29 percent, reduced vacancy competition would add 24 percent, and search-efficiency gains would add 15 percent, so the post-ban equilibrium has slightly higher regular employment despite worse job creation overall.
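
The arithmetic of the three multiplicative terms can be checked with a few lines of Python. This is a sketch using only the rounded figures quoted above; the variable names are illustrative, and the search-efficiency factor is taken directly from the reported +15 percent rather than derived from the model.

```python
# Ratios are expressed as post-ban regular employment relative to baseline.
job_creation = 0.71               # lambda(theta_tilde) / lambda(theta)
vacancy_competition = 1 / 0.806   # regular vacancies go from 80.6% to all vacancies
search_efficiency = 1.15          # reported ceteris paribus gain of 15 percent

net_ratio = job_creation * vacancy_competition * search_efficiency

print(f"vacancy-competition effect: {vacancy_competition - 1:+.1%}")
print(f"net change in regular employment: {net_ratio - 1:+.1%}")
```

With these rounded inputs the product comes out slightly above one, consistent with regular employment being marginally higher after the ban even though the job-finding rate falls.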

Q6. How does the paper handle the partial versus general equilibrium distinction for welfare?

For R-best workers, the PE experiment replaces their ZHC jobs with regular jobs while keeping all other equilibrium objects (tightness θ, vacancy composition, etc.) fixed. This isolates the substitution effect and yields a welfare gain of approximately +0.15 to +0.18 percent for R-best workers. In general equilibrium, the full ban requires θ to fall (less job creation), which extends unemployment spells, and the net welfare effect is −0.50 to −0.62 percent. The difference between GE and PE therefore quantifies the job-creation externality that ZHCs provide — approximately 0.65 to 0.80 percentage points of consumption equivalent variation for R-best workers. For Z-only workers, the PE experiment replaces ZHC jobs with non-employment (their next-best option in the baseline), yielding PE welfare changes of −2.94 to −3.28 percent, which overstates the GE loss (−1.65 to −2.0 percent) because GE adjustment allows some Z-only workers to take regular jobs, partially compensating for the loss of ZHC access.
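
The size of the job-creation externality for R-best workers follows from subtracting the general-equilibrium welfare change from the partial-equilibrium one. A short Python sketch of that subtraction, assuming the lower and upper ends of the two reported ranges correspond to the same scenarios:

```python
pe_gain = (0.15, 0.18)     # partial-equilibrium gains for R-best workers, percent
ge_change = (-0.50, -0.62)  # general-equilibrium changes, percent

# Pairing the first figures together and the second figures together is an
# assumption about which scenarios correspond to each other.
externality = [round(pe - ge, 2) for pe, ge in zip(pe_gain, ge_change)]
print(externality)
```

The result reproduces the range of roughly 0.65 to 0.80 percentage points quoted in the text.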

Q7. What heterogeneity is documented in the data for U.K. ZHC workers?

ZHC employment is concentrated at both ends of the age distribution: workers aged 16–29 are over-represented, as are workers aged 55–69, relative to regular employment. Mean age is 40.8 years for ZHC workers vs. 46.3 for regular workers. Gender composition is similar: 56.5 percent female in ZHCs vs. 60.4 percent female in regular employment, a difference that is not statistically significant. Educational attainment distributions are similar: 21.9 percent of ZHC workers hold a degree vs. 18.0 percent of regular workers. By industry, ZHC employment is heavily concentrated in Accommodation and food services (19.9 percent), Health and social work (20.5 percent), and Arts, entertainment and recreation (6.7 percent). Average hours worked are 18.4 per week for continuously employed ZHC workers vs. 28.1 for regular contract workers; the standard deviation of hours is 7.8 vs. 7.2. 16.6 percent of ZHC workers report wanting more hours vs. 10.1 percent in regular contracts, and 18.2 percent of ZHC workers are looking for another/additional job vs. 5.0 percent of regular workers, suggesting a minority are in involuntary underemployment while a majority are not actively seeking to change.

Q8. What are the key calibrated parameter values and how do they compare to the broader literature?

The calibrated values are as follows. The biweekly job-finding rate is λ(θ) = 0.051; the biweekly job-destruction probability is δ = 0.005; on-the-job search efficiency is x = 0.352 (the authors note this is on the high end but consistent with estimates accounting for flexible work); the share of R-best workers is ζ_{R-best} = 0.969; and the share of type-R vacancy-posting firms is γ_R = 0.950. The matching function elasticity is ψ = 0.65 (estimated from U.K. data, range 0.60–0.70; higher than the commonly used 0.50 but consistent with bias-corrected estimates from Borowczyk-Martins et al. 2013). The job-filling rate is 0.21 per biweekly period, consistent with Kuhn et al. (2021) U.K. estimates of 0.35–0.38 per month. The vacancy posting cost is κ = £36.3 per week and the startup cost is K = £4,376, the latter close to the £4,500 implied by U.K. business creation data. Non-employment income is b = £148.8 per week (replacement ratio 80 percent). The minimum wage is set to £7.50 per hour (2017 U.K. National Living Wage), and labour productivity is p = £8.25, implying a 10 percent productivity premium over the minimum wage.
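
Two of the comparisons above are simple conversions that can be verified in Python. The sketch assumes a month is treated as two biweekly periods with a constant per-period probability; that timing convention is an assumption for illustration, not something the summary states.

```python
biweekly_filling_rate = 0.21

# Assumption: one month = two biweekly periods, constant filling probability.
monthly_filling_rate = 1 - (1 - biweekly_filling_rate) ** 2

minimum_wage = 7.50   # pounds per hour
productivity = 8.25   # pounds per hour
premium = productivity / minimum_wage - 1

print(f"monthly filling rate: {monthly_filling_rate:.3f}")
print(f"productivity premium: {premium:.0%}")
```

Under that convention the implied monthly rate falls inside the 0.35–0.38 range attributed to Kuhn et al. (2021).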

Q9. What robustness checks are run, and do the main results change?

The authors run three main robustness analyses. First, they vary the hours parameters: an alternative calibration uses σ_Z = 4.5 for both firm types but differentiates by mean hours (µ_Z = 20 for Z-best, µ_Z = 16 for Z-only); employment and unemployment effects are modestly smaller than the baseline but welfare effects are nearly identical. Second, they hold µ_Z = 18 and vary σ_Z to 1.0 (low) and 8.0 (high); results move in the expected direction and remain broadly consistent. Third, they vary the targeted job-filling rate: at λ(θ)/θ = 0.16 (25 percent lower than baseline), the unemployment response to a ZHC ban is only 0.33–0.51 p.p. and GDP effects are positive in the low-disutility case; at λ(θ)/θ = 0.26 (25 percent higher), unemployment rises by 4.1–5.5 p.p. and sectoral GDP falls by up to 6 percent. The authors conclude that the baseline calibration of 0.21 is the most plausible. The qualitative conclusions — that GE welfare effects are negative for all workers — are robust across specifications.

Q10. How does the paper relate to the existing literature?

The closest model-based study is Scarfe (2019) on casual work in Australia. Scarfe’s model features homogeneous agents ex ante, with contract choice driven by luck (stochastic match productivity), while Dolado et al. emphasise ex ante heterogeneity in preferences/profitability as the primary source of variation. The empirical study of Datta et al. (2019) documents U.K. ZHC characteristics using LFS, online survey, and matched employer-employee data from the social care sector; Dolado et al. use the LFS but impose structural discipline to recover preference parameters and conduct GE welfare analysis. The paper differs from the dual labour market literature (Cahuc et al. 2016, 2020; Créchet 2022) in that temporary jobs in that literature have a fixed expiration date, whereas ZHCs are jobs with potentially long tenure but endogenously lower expected duration because workers quit after on-the-job search, not because of contractual termination. Mas and Pallais (2017) and Angelici and Profeta (2020) use field experiments to estimate workers’ valuation of flexibility; Dolado et al. instead recover this from duration distributions, allowing for general equilibrium job-creation and participation effects that field experiments cannot capture.

Q11. What are the sorting patterns in the equilibrium, and what sustains ZHC jobs?

In the baseline equilibrium, 66.8 percent of filled ZHC jobs are held by R-best workers (workers who prefer regular employment but accept ZHCs as a stepping stone). Only 4.8 percent of employed R-best workers are in ZHCs at any point in time, because most vacancies are for regular jobs (80.6 percent of vacancies). This sorting has a crucial implication: ZHC vacancies would not be viable without the presence of R-best workers, because Z-only workers alone are too few to sustain the ZHC sector in equilibrium. A firm posting a ZHC vacancy accepts a higher worker-turnover risk (R-best workers quit on-the-job once they find a regular vacancy) in exchange for the profit advantage of hours flexibility; the trade-off is viable only because the random search pool contains enough R-best workers willing to take ZHC jobs temporarily.

Q12. What are the policy implications and their scope conditions?

The paper identifies four recommendations. P1: restrict ZHCs to matches where the worker voluntarily chooses the flexible contract when offered a choice; this would shield R-best workers, who currently end up in ZHCs because of search frictions, from the substitution effect without eliminating the job-creation channel. P2: prioritise access to ZHCs for small firms (as a proxy for inability to diversify demand shocks), limiting substitution by large firms while preserving genuine job creation by high-volatility operators. P3: recognise that the allocation of hours flexibility between firms and workers is often an implicit and incomplete contract rather than an explicit one. P4: regulate the sharing of hours flexibility (specifically, who controls the timing and quantity of work) to reduce the income uncertainty that generates the main political objections to ZHCs. All four recommendations are scoped to the low-wage sector of the U.K. labour market; the results do not directly apply to higher-wage workers with more bargaining power, or to markets where exclusivity clauses remain common.

Q13. What key empirical facts about ZHC flows does the paper document?

From the transition matrix estimated from LFS data: 11 percent of exits from unemployment are to ZHC employment. The rate of transition to unemployment is about 40 percent larger in ZHC employment than in regular employment (6.2 percent vs. 4.4 percent semi-annually). Job-to-job transitions from ZHC to regular employment are 6.5 percent semi-annually; the reverse (regular to ZHC) is only 0.5 percent. Nearly half of ZHC workers report job tenures longer than two years. Among ZHC workers, 9.2 percent were recruited in the last three months vs. 3.4 percent of regular workers, and 30.3 percent have been with their employer less than one year vs. 14.3 percent in regular contracts. The non-employment rate for this low-pay segment is 11.2 percent; ZHCs account for 4.6 percent of the overall sample (5.2 percent of employees), about 1.5 times the aggregate U.K. incidence rate.

Q14. What does the model say about time spent out of regular employment following a ZHC ban?

Despite higher aggregate unemployment rates after the ban, R-best workers spend less total time out of regular employment: the duration of non-regular-employment spells decreases by 7 weeks. This is because ZHCs, by acting as a stepping stone, expose workers to more frequent labour market transitions — they cycle through unemployment, ZHC employment, and regular employment rather than simply unemployment and regular employment. The ban removes the ZHC stepping stone, so workers face longer individual unemployment spells but avoid the ZHC-employment phase, and on net spend more time in regular employment. However, this does not translate into a welfare gain because (a) ZHC employment, even if imperfect, provides utility above the unemployment level, and (b) the longer unemployment spells that do occur under a ban are more costly than the shorter ZHC spells they replace.

Key Concepts

Zero-hours contract (ZHC): In the paper’s sense, an employment arrangement under which the employer is not obligated to provide any minimum guaranteed hours of paid work, and the worker is not required to accept any hours offered. Workers on ZHCs in the U.K. hold ‘worker’ status (between employee and self-employed), entitling them to holiday pay, minimum wage protections, and Universal Credit, but not redundancy pay. The key feature for the model is that actual hours worked equal the firm’s demand realisation, eliminating the quadratic deviation costs that arise under fixed-hours regular contracts.

R-best workers: In the paper’s worker taxonomy, individuals for whom the asset value of regular employment strictly exceeds that of ZHC employment, which in turn exceeds the asset value of non-employment (W^i_R > W^i_Z > N^i). These workers accept ZHCs as a stepping stone when regular jobs are unavailable, and search on-the-job (at reduced efficiency x) for regular vacancies. They constitute 96.9 percent of the low-wage sector in the calibration and account for two-thirds of filled ZHC jobs.

Z-only workers: Workers for whom, in the baseline equilibrium, the asset value of ZHC employment exceeds the value of non-employment, which in turn exceeds the value of regular employment (W^i_Z > N^i > W^i_R). Without ZHCs, whether these workers stay in the labour market depends on whether their disutility parameter α^i implies ω^i_R > ω^i_N. A subset, those with high disutility (WTP > £7.9 per week to avoid regular work), exit the labour force if ZHCs are banned, generating the participation effect.

Z-only firms: In the paper’s firm taxonomy, firms with high demand volatility (σ_Z = 6 in the calibration) for which regular employment is not profitable (V^j_R < 0 < V^j_Z). These firms can only operate and post vacancies because ZHCs allow them to set actual hours equal to realised demand. A ban on ZHCs causes Z-only firms to exit entirely, generating the pure job-creation loss.

Z-best firms: Firms with moderate demand volatility (σ_Z = 3 in the calibration) that could profitably post regular vacancies (V^j_R > 0) but prefer ZHC vacancies because the hours-flexibility profit advantage outweighs the higher quit risk from R-best workers. A ban redirects these firms to regular contracts, constituting the substitution effect on the firm side.

Stepping-stone effect: The mechanism by which R-best workers accept ZHC employment when unemployed, using it as a bridge to search on-the-job for regular employment. ZHCs therefore simultaneously reduce unemployment duration and extend the time workers spend out of regular employment. The paper documents that a ZHC ban reduces total time out of regular employment by 7 weeks for R-best workers despite raising the unemployment rate, precisely because the stepping-stone pathway — which adds a ZHC phase before reaching regular employment — is eliminated.

Consumption equivalent variation (welfare measure): The percentage permanent change in consumption that would make a worker indifferent between the baseline equilibrium (with ZHCs) and the counterfactual (ZHC ban). The paper uses this metric to express welfare effects: R-best workers suffer losses of −0.50 to −0.62 percent, and Z-only workers suffer losses of −1.65 to −2.0 percent, in general equilibrium following a ZHC ban.

Mixture-of-exponentials identification of worker types: A statistical procedure adapted from Karlis and Xekalaki (1999) that fits the empirical distribution of job tenure or unemployment duration as a mixture of M exponential distributions. Each component corresponds to a latent class of workers exiting the labour market state at a distinct rate. The optimal number of components M* is chosen via a bootstrapped sequential hypothesis test. Applied to U.K. LFS data, the procedure identifies M* = 2 for ZHC employment and unemployment, and M* = 1 for regular employment, which the model interprets as evidence for R-best and Z-only worker types.

Estimating the Interest Rate Trend in a Shadow Rate Term Structure Model

Wed, 01 Jan 2025 00:00:00 +0000

This paper proposes a shadow rate no-arbitrage dynamic term structure model (SDTSM) with drifting trends to estimate the long-run trend of the real interest rate using yield curve data from the U.S., U.K., and Germany from January 1972 to April/March 2022. The model combines the shadow rate approach of Wu and Xia (2016) to handle the zero lower bound with the shifting endpoint of Bauer and Rudebusch (2020) to capture low-frequency movements. Interest rate trends in all three countries have declined since the 1990s, with strong co-movement among them. The model provides better yield forecasts than existing models. Term premium estimates from the model are stationary and positively correlated with inflation uncertainty measures, corroborating Wright (2011). Under the convention that all permanent shocks to real interest rates are derived from real shocks, the model’s trend estimate also serves as a measure of the natural rate of real interest.

In depth

Q1. What are the two key modeling innovations?

The model combines two innovations: (1) a shadow rate approach following Wu and Xia (2016) to handle the zero lower bound (ZLB)—defining the policy rate as max(shadow rate, lower bound) so that the model remains valid when rates are near zero; and (2) a drifting trend (shifting endpoint) following Bauer and Rudebusch (2020) to capture the slow downward movement of the interest rate trend since the 1990s. Combining these two features is the paper’s key contribution: existing shadow rate models (Wu-Xia) do not model the low-frequency trend; existing shifting-endpoint models (Bauer-Rudebusch) do not account for the ZLB. The combination produces better-identified trend estimates because the shadow rate summarizes financial conditions including the effects of unconventional monetary policy.
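
The truncation at the heart of the shadow rate approach is easy to illustrate in Python. The sketch below uses a made-up shadow-rate path and an illustrative lower bound of 0.25 percent; neither comes from the paper's estimates, and the function name is hypothetical.

```python
def policy_rate(shadow_rate: float, lower_bound: float = 0.0) -> float:
    """Observed short rate: the shadow rate, truncated from below at the bound."""
    return max(shadow_rate, lower_bound)


# Hypothetical shadow-rate path in percent; not estimates from the paper.
shadow_path = [1.5, 0.4, -0.8, -2.1, -0.3, 0.9]
observed_path = [policy_rate(s, lower_bound=0.25) for s in shadow_path]
print(observed_path)
```

While the observed rate sits flat at the bound, the shadow rate keeps moving, which is what lets it carry information about financial conditions during lower-bound episodes.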

Q2. Why use the full yield curve rather than a few selected maturities?

Using the full yield curve with no-arbitrage restrictions allows the model to exploit all information in the Treasury bond market and impose internally consistent restrictions on how maturities are related, improving estimation efficiency relative to models that select a few yields and do not impose no-arbitrage restrictions (e.g., Del Negro et al. 2017; Johannsen and Mertens 2021). The failure of the pure expectations hypothesis implies that a model handling term premiums coherently and flexibly is necessary to correctly extract interest rate trends from long-term yields; the no-arbitrage DTSM provides this structure while also being free of the liquidity premium complications in TIPS-based models.

Q3. What are the main empirical findings about the interest rate trend?

Interest rate trends in the U.S., U.K., and Germany have all declined since the 1990s, with strong co-movement among them; under the convention that all permanent shocks to real interest rates are derived from real shocks, the paper’s trend estimate can be interpreted as a trend estimate of the natural rate of real interest. The strong international co-movement is consistent with global factors—such as declining trend output growth, rising savings, and global safe asset demand—driving the secular decline in real interest rates rather than purely country-specific factors.

Q4. What is the relationship between term premiums and inflation uncertainty?

Term premium estimates from the model are stationary (rather than trending downward as in some models where the trend and the term premium are not well separated) and are positively correlated with inflation uncertainty measures, corroborating Wright’s (2011) finding that term premiums are driven partly by inflation risk. The stationarity of term premiums is a desirable property that results from properly separating the trend component (modeled via the shifting endpoint) from the cyclical component; models that do not include a shifting endpoint may attribute some of the trend to the term premium, producing non-stationary term premium estimates.

Key concepts

Shadow rate dynamic term structure model (SDTSM): A term structure model in which the policy rate is defined as the maximum of a latent shadow rate and the effective lower bound, following Wu and Xia (2016); allows the model to be estimated without modification when short-term rates are near zero.

Drifting trend (shifting endpoint): A slow-moving unconditional mean of interest rates that evolves over time, following Bauer and Rudebusch (2020); captures the secular decline in interest rates since the 1990s and separates trend from cyclical variation and term premiums.

Natural rate of real interest: The long-run equilibrium real interest rate consistent with stable inflation and output at potential; under the assumption that all permanent shocks to real rates are real shocks, the paper’s trend estimate provides a measure of this rate.

Beveridge-Nelson trend: The long-run forecast of the shadow rate derived from the model; used here as the operational definition of the interest rate trend; transforms the information in the entire yield curve into a single macroeconomic equilibrium measure.

Heterogeneity in Manufacturing Growth Risk

Wed, 01 Jan 2025 00:00:00 +0000

Layer 1: Overview

Research question and motivation. Since the Great Recession, quantifying downside risks to economic activity (rather than only expected outcomes) has become central for policymakers and investors. A large “growth-at-risk” literature documents that tightening financial conditions sharply raise downside risks to aggregate output while leaving upside potential roughly unchanged (Adrian, Boyarchenko and Giannone, 2019). This paper argues that the aggregate focus misses important structure: aggregate fluctuations can originate from industry-specific shocks, and recessions sharply raise cross-industry dispersion in growth (Bloom, 2014). The authors ask how downside output-growth risk from tight financial conditions differs across U.S. manufacturing industries, and which industry characteristics explain that heterogeneity.

Data and method. They use monthly industrial production (IP) growth for 74 U.S. manufacturing industries at the four-digit NAICS level over January 1973–July 2020 (Federal Reserve G.17; same industry selection as Chang and Hwang, 2015), and the Chicago Fed’s National Financial Conditions Index (NFCI) as the financial-conditions gauge. The method is a two-level (multi-level) quantile regression. Level 1 (following Adrian et al., 2019) regresses the τ-th quantile of average h-month-ahead IP growth on the current NFCI and current IP growth, industry by industry, focusing on h=3. Level 2 (inspired by Petersen and Strongin, 1996) regresses the estimated level-1 NFCI quantile coefficients cross-sectionally on standardized, time-invariant industry characteristics (capital, materials, energy, production-labor and overhead-labor intensities; a correlation-based labor-hoarding measure; four-firm concentration ratio; industry size measured by value-added share; and a durability dummy). Inference uses a stationary bootstrap (1,000 replications) that propagates level-1 estimation uncertainty into level 2. Industries split into 45 durables and 29 nondurables.
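
Quantile regressions of the kind used at level 1 are computed by minimising the check (pinball) loss. The Python sketch below covers only the intercept-only case, where the minimiser is a sample quantile; the growth figures are made up and the function names are hypothetical, so it illustrates the loss function rather than the paper's regressions with the NFCI and lagged IP growth.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss for quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Intercept-only quantile fit: the data point with the smallest total check loss."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Made-up growth rates in percent, for illustration only.
growth = [-4.0, -1.5, -0.5, 0.0, 0.2, 0.4, 0.5, 0.7, 0.9, 1.2, 1.5]
print(constant_quantile_fit(growth, 0.05), constant_quantile_fit(growth, 0.5))
```

At a low quantile level the loss penalises over-prediction far more than under-prediction, which pulls the fit toward the left tail of the distribution; adding regressors turns the same minimisation into a linear program.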

Main quantitative findings. Deteriorating financial conditions hit downside risk far harder than the center or upside of the growth distribution. On average across industries, a one-standard-deviation positive NFCI shock lowers three-month-ahead IP growth by 0.237% at the median and 0.773% at the 5% quantile, and raises the 95% quantile by 0.042%. The average 5% NFCI coefficient is -0.77 across all industries versus -0.31 (linear) and -0.24 (median); 47 of 74 industries (63.5%) have significant 5% coefficients, only 5 (6.8%) have significant 95% coefficients. Durables are about twice as sensitive in the left tail: average 5% coefficients are -0.96 (durables) versus -0.48 (nondurables), with 75.6% of durables versus 44.8% of nondurables significant at 5%. Some industries (computer, aerospace, food, dairy) are essentially unaffected across the whole distribution. The relationship is nonlinear for 46 of 74 industries (62.2%) at the 5% quantile (77.8% of durables, 37.9% of nondurables). Galvao et al. (2018) slope-homogeneity tests reject coefficient equality across industries for lower quantiles. Subsample analysis (1973-84 / 1985-2006 / 2007-2020) shows tail effects strongest in the most recent period (average 5% coefficient -1.38 vs -0.73 and -0.49), weakest during the Great Moderation.

Explaining heterogeneity / implications. In the all-manufacturing second level, large industries and durable-goods producers have significantly more vulnerable downside growth, while capital-intensive, overhead-labor-intensive, and labor-hoarding industries are less vulnerable. Within durables, size, materials intensity (more vulnerable) and overhead labor intensity (less vulnerable) matter; within nondurables, energy intensity (more vulnerable) and labor hoarding (less vulnerable) matter. Implication: industry-targeted stabilization policy may be more effective than nationwide policy given the heterogeneity, and investors can build industry-rotation strategies less exposed to financial-market shocks.

In depth

Q1. What is the empirical/identification strategy, and what are the main threats to it?

The strategy is descriptive-predictive rather than causal. Level 1 estimates industry-specific quantile regressions of average h-month-ahead IP growth on the current NFCI and current IP growth (Koenker-Bassett check-function minimization via the Frisch-Newton interior-point algorithm). Level 2 regresses the estimated NFCI quantile coefficients on standardized industry characteristics via OLS. The key inferential innovation is a stationary bootstrap (Politis-Romano 1994; block length via Politis-White 2004 with Patton et al. 2009 correction, expected block ~36.76 set by the NFCI series) that jointly resamples industry IP and NFCI and feeds level-1 estimation uncertainty into level-2 confidence bands. Main threats: (i) the relationship is associational, not identified as causal — the NFCI is endogenous to the macroeconomy; (ii) generated-regressor problem in level 2 (coefficients are estimates), addressed by the bootstrap; (iii) small cross-sections (45 durables, 29 nondurables, even fewer at the three-digit level) reduce power to detect characteristic effects; (iv) time-invariant characteristics are averaged over varying available windows, abstracting from time variation.
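
To make the level-1 objective concrete, here is a minimal sketch of the Koenker-Bassett check function, applied to the simplest case of fitting a constant. It is not the Frisch-Newton interior-point routine the paper uses; it is a brute-force search over the data points, and the growth numbers are hypothetical.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(q, ys, tau):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)

def quantile_by_check_loss(ys, tau):
    """Minimise the check loss over a constant; a minimiser always exists among the data."""
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))

# Hypothetical three-month-ahead growth outcomes (percent), for illustration only.
growth = [-3.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]

low_tail = quantile_by_check_loss(growth, 0.05)    # downside risk
high_tail = quantile_by_check_loss(growth, 0.95)   # upside potential
print(low_tail, high_tail)
```

In the paper's setting the constant is replaced by a linear function of the NFCI and current IP growth, so the minimiser is a coefficient vector per quantile rather than a single number.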

Q2. How is nonlinearity established, and against what benchmark?

Quantile coefficients are compared to OLS linear coefficients (constant across quantiles) using 95% bootstrap bands generated under a null that the data-generating process is a VAR(4) for the NFCI and IP growth (the Adrian et al. 2019 approach). Quantile estimates falling outside those bands are evidence of nonlinearity. 46 of 74 industries (62.2%) have a 5% coefficient significantly different from OLS; the total manufacturing sector is also nonlinear, mirroring Adrian et al. (2019) for aggregate GDP.

Q3. What heterogeneity is documented?

Three layers. (1) Durables vs nondurables: durables roughly twice as sensitive in the left tail (avg 5% coefficient -0.96 vs -0.48). (2) Within sectors: e.g. motor vehicles, motor bodies and motor parts have significant 5% coefficients below -2; resin and fiber below -1.5; while computer, aerospace and food are insignificant/unaffected. (3) Across the distribution: strong effects at low quantiles, near-zero at high quantiles (avg 95% coefficient 0.04). Industries with large negative 5% coefficients also tend to have larger positive 95% coefficients (higher conditional volatility under tight conditions), most clearly iron, motor vehicles, fiber and resin — though upside gains are generally smaller than the downside increase.

Q4. Which industry characteristics explain the heterogeneity, and in which direction?

All-manufacturing (74 industries): negative effects on lower-quantile NFCI coefficients (i.e. more downside vulnerability) from industry size and durability; positive effects (less vulnerability) from overhead labor intensity, labor hoarding, and capital intensity. Durables: significant negative effect of materials intensity, negative (small) effect of size, positive effect of overhead labor intensity; production labor intensity significant at some higher quantiles. Nondurables: significant negative effect of energy intensity, positive effect of labor hoarding. Energy intensity, production labor intensity and concentration ratio are NOT significant for total manufacturing or durables in the way Petersen-Strongin found for cyclicality.

Q5. What economic mechanisms are offered for each characteristic effect?

Size: mean reversion — an industry larger than average is more likely to see growth fall (Braun-Larrain 2005). Durability: durable production is inherently more cyclical (Petersen-Strongin 1996). Labor hoarding / overhead labor: firms retain trained (especially nonproduction) workers due to sunk hiring/training costs (Becker 1962; Oi 1962; Parsons 1986), lowering the incentive to cut production in downturns. Capital intensity: higher fixed-to-variable cost ratio reduces incentive to cut output, and tangible capital provides collateral easing financing (consistent with Braun-Larrain 2005). Materials intensity (durables): higher share of variable costs raises cyclicality; also links to the negative materials-intensity/TFP relation of Baptist-Hepburn (2013).

Q6. What robustness checks are run?

(i) Additional controls (Gilchrist-Zakrajsek variables: term spread, real federal funds rate, credit spread, excess bond premium, plus extra IP lags) — qualitatively similar, wider bands. (ii) Unobserved heterogeneity via Ando-Bai (2020) interactive-fixed-effects panel quantile model (one common factor optimal) — highly similar. (iii) Alternative NAICS disaggregation: three-digit (21 industries; capital intensity dropped for multicollinearity; only labor hoarding and durability significant) and six-digit (101 industries; more characteristics significant, including production labor intensity and concentration ratio). (iv) Longer horizons h=6 and h=12 — qualitatively similar but weaker/less significant as horizon lengthens. (v) Subsample analysis of both the growth-risk coefficients and the characteristic construction windows (1973-84, 1985-2006, 2007-2020; and start dates 1958/1973/1987) — effects relatively stable; size and labor-hoarding effects weaken in recent periods while overhead labor and durability stay significant.

Q7. How does this relate to and differ from Petersen and Strongin (1996) and Adrian et al. (2019)?

It extends Adrian et al. (2019) from aggregate to industry-level growth-at-risk, documenting substantial cross-industry variation that is invisible at the aggregate level — to the authors’ knowledge the first disaggregate growth-at-risk study. It extends Petersen-Strongin (1996), who used a linear cyclicality framework, by allowing a flexible/nonlinear quantile relationship specifically with financial conditions. Findings broadly echo Petersen-Strongin for downside risk (materials intensity most important in durables; labor hoarding for nondurables — their only significant nondurable effect), but deviate by NOT finding energy intensity, production labor intensity, or concentration ratio significant in durables, and by adding size and capital intensity (cf. Braun-Larrain 2005) as relevant for total manufacturing. The agreement is attributed to business and financial cycles being closely intertwined (Claessens et al. 2012).

Q8. What are the policy implications and their scope conditions?

Because vulnerability is highly heterogeneous, industry-level stabilization policy may be more effective than nationwide policy (OECD 2003), and policies can be targeted using the signalling characteristics (size, durability, materials/energy intensity vs capital/overhead-labor intensity and labor hoarding). Investors can build industry-rotation strategies less exposed to financial shocks. Scope conditions: evidence is U.S. manufacturing only, associational not causal, conditional on the NFCI as the financial-conditions measure, strongest at the three-month horizon and in the post-2007 subsample, and characteristic effects rest on relatively small cross-sections.

Q9. Are there caveats the authors themselves flag?

Yes: after splitting into durables/nondurables, fewer characteristic effects are significant, which the authors attribute to smaller cross-sections rather than absence of effects; the two-level model is estimated sequentially (two-step) not simultaneously; characteristics are treated as time-invariant averages (justified by stable cross-industry rankings, though production labor intensity shows a downward trend); and upside potential, while present, is generally smaller than the increased downside risk.

Key Concepts

Growth-at-risk / downside growth risk: A lower quantile (e.g. the 5% quantile) of the conditional distribution of future output growth given current conditions; here the 5% quantile of average three-month-ahead industry IP growth conditional on the NFCI, capturing how bad growth could plausibly get under tight financial conditions.

Multi-level quantile regression: The authors’ two-step procedure: level 1 estimates industry-specific quantile regressions of future IP growth on the NFCI and current IP growth; level 2 regresses the estimated NFCI quantile coefficients cross-sectionally on industry characteristics, with a bootstrap carrying level-1 uncertainty into level-2 inference.
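
The joint resampling step of the bootstrap can be sketched as follows. This is a minimal illustration of a Politis-Romano style stationary bootstrap, assuming a fixed expected block length (36.76, the value quoted in this summary) rather than the data-driven Politis-White selection; the function names and the two series are hypothetical.

```python
import random

def stationary_bootstrap_indices(n, expected_block, rng):
    """Draw n time indices whose block lengths are geometric with the given mean."""
    p_new_block = 1.0 / expected_block
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p_new_block:
            idx.append(rng.randrange(n))       # start a new block at a random date
        else:
            idx.append((idx[-1] + 1) % n)      # continue the block, wrapping at the end
    return idx

def resample_jointly(ip_growth, nfci, expected_block, rng):
    """Apply the SAME indices to both series so their dated pairing is preserved."""
    idx = stationary_bootstrap_indices(len(nfci), expected_block, rng)
    return [ip_growth[t] for t in idx], [nfci[t] for t in idx]

rng = random.Random(0)
ip = [0.1 * t for t in range(200)]         # hypothetical industry IP growth series
nfci = [-(0.1 * t) for t in range(200)]    # hypothetical NFCI series
ip_star, nfci_star = resample_jointly(ip, nfci, 36.76, rng)
```

Each bootstrap draw would then be passed through both levels of estimation, which is how level-1 sampling uncertainty reaches the level-2 confidence bands.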

NFCI (National Financial Conditions Index): Chicago Fed weekly index of U.S. money, debt, equity, and (shadow) banking conditions built from a large dynamic factor model; positive values mean tighter-than-average financial conditions, negative values looser-than-average. Averaged to monthly here.

Labor hoarding: Retention of employees during downturns because of sunk search, hiring and training costs; measured here as the negative correlation between changes in materials usage and changes in production-worker hours (a value of -1 = no hoarding), so higher values indicate more hoarding and predict less cyclical, less vulnerable growth.
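
A minimal sketch of this measure, assuming it is the negated Pearson correlation of period-to-period changes; the exact construction in the paper may differ, and both input series are hypothetical.

```python
import math

def diffs(xs):
    """Period-to-period changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def labor_hoarding(materials, hours):
    """Negative correlation of changes: -1 means hours track materials one for one."""
    return -pearson(diffs(materials), diffs(hours))

materials = [100.0, 110.0, 105.0, 120.0, 115.0]      # hypothetical materials usage
lockstep_hours = [2.0 * m for m in materials]        # hours move with materials
sticky_hours = [200.0, 201.0, 202.0, 201.0, 202.0]   # hours barely respond

no_hoarding = labor_hoarding(materials, lockstep_hours)
some_hoarding = labor_hoarding(materials, sticky_hours)
print(round(no_hoarding, 6), some_hoarding > no_hoarding)
```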

Overhead labor intensity: Cost of nonproduction (overhead) labor relative to value added. Because nonproduction workers embody more firm-specific investment, they are more subject to labor hoarding, so overhead-labor-intensive industries have less vulnerable downside growth.

Durable vs nondurable goods sector: Federal Reserve classification (45 durable, 29 nondurable industries here). Durable-goods production is more cyclical and, in this paper, about twice as sensitive in the left tail of the growth distribution to adverse financial conditions.

Slope homogeneity test: Galvao et al. (2018) Swamy-type and standardized Swamy-type tests for a quantile-regression fixed-effects panel, used to formally reject equality of NFCI quantile slopes across industries, especially at lower quantiles.

Real Effects of Exchange Rate Depreciation: The Roles of Bank Loan Supply and Interbank Markets

Wed, 01 Jan 2025 00:00:00 +0000

Layer 1: Overview

Research question and motivation. The paper asks how exchange rate movements affect the real economy and what role the banking system’s foreign-asset exposure plays in transmitting exchange rate shocks. The motivation is concrete: with the Federal Reserve’s “tapering” of quantitative easing, the euro lost slightly more than 20% against the US dollar between 2014:Q2 and 2015:Q1, a sharp, persistent and largely unanticipated move. Standard open-economy models predict depreciations raise output via the trade balance, but recent work questions this classical trade channel and emphasizes firm/bank balance-sheet channels. The paper complements this by examining how a depreciation reshapes the composition of bank credit and, ultimately, regional output—working through banks’ net foreign asset (NFA) exposure rather than trade.

Data and empirical strategy. The authors build two datasets. The first is a matched bank-firm panel from the German credit registry (quarterly; reporting threshold 1 million euro, 1.5 million before 2014; ~two-thirds of German bank loans), merged with Bundesbank bank balance-sheet data and Amadeus firm accounts, yielding more than 300,000 bank-firm observations (Table 1: 344,777 for the loan-growth variable). The second matches INKAR region-level data on 401 German administrative regions with local savings-bank balance sheets, exploiting that savings banks lend within a fixed administrative district. Identification uses a difference-in-differences design around 2014:Q2-2015:Q1. The dependent variable is the log change in bank b’s credit to firm f from the pre-depreciation average (2013:Q2-2014:Q1) to the post average (2015:Q2-2016:Q1). Identification rests on banks’ differential pre-shock USD NFA share; firm fixed effects (sample restricted to firms borrowing from at least two banks) absorb loan demand (Khwaja-Mian, 2008), and bank fixed effects are added in the interaction model. Regressions are weighted by credit exposure.

Main quantitative findings. (1) Only large banks with higher USD NFA expand lending after the depreciation. In the full sample the NFA coefficient is positive but just below 10% significance; for systemically important banks (SIBs) it is 5.651 (significant at 5%): a SIB with a 1-percentage-point higher NFA share than the median SIB has a 5.65 pp smaller credit contraction, and given the overall ~-7% credit decline, a SIB with a 1.24 pp higher NFA share than the median turns overall credit growth positive. (2) The effect is driven by interbank lending: dropping financial-sector borrowers makes the NFA coefficient negative and insignificant; for financial borrowers it is positive (significant at 10%), and for SIBs lending to financial borrowers the coefficient is 10.915 (1%). (3) Credit shifts toward export-intensive firms, not riskier firms: the NFA × export-intensity interaction is 0.092 (10%); a firm at the 75th vs 25th export-intensity percentile sees a credit-growth differential of about 2.4 pp per 1 pp higher NFA; Z-Score and leverage interactions are insignificant. (4) Large banks act as a central intermediary: NFA × borrowing-bank export-portfolio share is 0.268 (10%), implying a 6.9 pp credit-growth differential between borrowing banks at the 75th vs 25th portfolio-export-share percentile per 1 pp higher NFA, driven by small borrowing banks. (5) Small banks with high interbank dependence and high export-firm portfolio shares raise lending (coefficient 0.609, 5%). (6) Regional real effects: for high-interbank-dependence regions, the export-share coefficient is 0.030-0.031 (10%/5%), implying regions at the 75th vs 25th export-share percentile grow 1.2 pp more cumulatively over the two post-depreciation years relative to the two pre years; no effect (even negative) in low-dependence regions.
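
The break-even figure in finding (1) follows from simple arithmetic. The sketch below assumes a linear reading of the SIB coefficient and treats the roughly -7% overall credit decline as the baseline for the median SIB; both are simplifications made here for illustration.

```python
NFA_COEF_SIB = 5.651     # reported SIB coefficient: pp of credit growth per pp of NFA share
BASELINE_GROWTH = -7.0   # approximate overall credit decline in percent (reported as ~-7%)

def predicted_growth(nfa_gap_pp):
    """Credit growth for a SIB whose NFA share exceeds the median SIB's by nfa_gap_pp."""
    return BASELINE_GROWTH + NFA_COEF_SIB * nfa_gap_pp

# NFA gap at which predicted credit growth crosses zero.
break_even_gap = -BASELINE_GROWTH / NFA_COEF_SIB

print(round(break_even_gap, 2))
print(round(predicted_growth(1.0), 3))
```

A gap of 1 pp still leaves predicted growth negative, while a gap slightly above 1.24 pp turns it positive, matching the threshold stated in the text.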

Mechanisms and implications. The depreciation raises NFA-rich banks’ net worth (Appendix B: NFA coefficient on equity growth is 4.571 for SIBs, 1%), expanding their lending capacity. They channel this mostly via interbank loans to small, geographically constrained banks holding many exporters, which pass liquidity to export firms whose demand rises post-depreciation. Investment (not employment) of more-affected firms rises (Appendix C). The policy implication: exchange-rate depreciations can have sizeable real effects via interbank liquidity even when local banks have no direct foreign exposure; estimates are likely downward-biased since cooperative and private banks are excluded.

In depth

Q1. What is the identification strategy and what are the main threats to it?

A difference-in-differences design around the 2014:Q2-2015:Q1 euro depreciation. The dependent variable is the log change in bank-to-firm credit from a four-quarter pre-average (2013:Q2-2014:Q1) to a four-quarter post-average (2015:Q2-2016:Q1); this pre/post averaging mitigates serial correlation (Bertrand et al., 2004) and seasonality (Duchin et al., 2010). Cross-bank identification rests on differential pre-shock USD NFA shares. The Khwaja-Mian (2008) within-firm approach restricts to firms borrowing from at least two banks and includes firm fixed effects to absorb loan demand and isolate supply; bank fixed effects are added in the interaction model. The key threat is that the depreciation may be endogenous to German bank lending; the authors address this by arguing that the shock was driven largely by Fed tapering (exogenous to German lending) and by ECB policy calibrated for the euro area as a whole, not Germany. A second threat is that NFA correlates with other exposures (e.g., interest-rate risk, since rates also fell); column (4) of Table 3 controls for interest-rate exposure and the NFA coefficient survives (if anything increases). A third threat is the parallel-trends assumption, addressed by placebo tests around 2002 and for all quarters 2001-2014, in which the NFA coefficient is never positive and significant at the 5% level or better. Concerns about selective matching between firms and banks are addressed by low correlations between firm characteristics and bank NFA (-4% for leverage, -0.5% for export shares, 7% for size).
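
The within-firm logic can be illustrated with a small sketch: demeaning credit growth and bank NFA within each firm removes anything common to all of a firm's lenders (such as loan demand), so the remaining slope reflects cross-bank supply differences. The data, the function name and the true slope of 5 are hypothetical, and the weighting by credit exposure is a simplified stand-in for the paper's weighted regressions.

```python
from collections import defaultdict

def within_firm_slope(rows):
    """Weighted slope of credit growth on bank NFA share after removing firm means.

    rows: iterable of (firm_id, bank_nfa_share, credit_growth, weight).
    Equivalent to weighted least squares with firm fixed effects and one regressor.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])   # per firm: sum w, sum w*x, sum w*y
    for firm, x, y, w in rows:
        s = sums[firm]
        s[0] += w
        s[1] += w * x
        s[2] += w * y
    num = den = 0.0
    for firm, x, y, w in rows:
        sw, swx, swy = sums[firm]
        xd = x - swx / sw     # deviation from the firm's weighted mean NFA share
        yd = y - swy / sw     # deviation from the firm's weighted mean credit growth
        num += w * xd * yd
        den += w * xd * xd
    return num / den

# Hypothetical data: credit growth = firm demand shock + 5 * bank NFA share.
rows = [
    ("A", 0.0, -10.0, 1.0),   # firm A, demand shock -10
    ("A", 1.0, -5.0, 2.0),
    ("B", 0.5, 5.5, 1.0),     # firm B, demand shock +3
    ("B", 2.0, 13.0, 1.0),
]
slope = within_firm_slope(rows)
print(round(slope, 6))
```

Because each firm's demand shock is constant across its banks, it drops out of the demeaned data and the slope recovers the supply coefficient, which is why the sample must be restricted to firms with at least two lenders.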

Q2. What are the two competing hypotheses on credit allocation and how are they distinguished?

H1 (export channel): the depreciation disproportionately increases credit supply to firms with higher ex-ante export intensity, because exporters’ cash flows and creditworthiness improve. H2 (risk-taking channel): the depreciation disproportionately increases lending to riskier firms, because higher net worth loosens capital constraints (Martynova et al., 2020). They are distinguished by interacting bank NFA with (a) industry-median export intensity and proxies (size, TFP, labor productivity, capital intensity) for H1, and (b) Altman Z-Score and leverage for H2. The export interaction is positive and significant (0.092, 10% in Table 5 col 1), all four proxies are positive/significant, and in a horserace using residuals orthogonal to export intensity (col 6) only export intensity (and capital intensity) survives. The Z-Score and leverage interactions are insignificant. Conclusion: H1 confirmed, H2 rejected—no evidence of increased risk-taking.

Q3. How is the interbank intermediation mechanism established?

In three steps. First (Table 2), dropping financial borrowers kills the NFA effect while restricting to financial borrowers preserves it (col 7: 1.947, 10%; col 9 for SIBs: 10.915, 1%), showing the lending increase is interbank, not corporate. Second (Table 6), restricting to large lenders and financial borrowers, the NFA × borrowing-bank export-portfolio-share interaction is 0.268 (10%), a 6.9 pp differential per 1 pp NFA between borrowing banks at the 75th vs 25th portfolio export-share percentile—driven by small borrowing banks (col 2: 0.359 significant; col 3 large borrowers: 0.046 insignificant). Third (Table 7), small banks with high export-firm portfolio shares raise lending (full sample 0.452, 10%), and splitting by interbank dependence the effect is significant only for high-dependence small banks (0.609, 5%) and insignificant for low-dependence (0.141), confirming interbank liquidity—not pre-existing excess liquidity—drives the result. A double interaction (col 4: 0.025, 10%) shows small banks pass the liquidity especially to export-intensive firms.

Q4. What heterogeneity is documented?

Large vs small banks: only large/SIB banks with high NFA respond; small banks do not (Table 2 cols 3,5). Section 4.3 shows this is because only the largest banks have economically meaningful NFA (SIB average USD NFA/assets 4.6% vs 0.3% for others); dropping the 5 largest NFA banks among SIBs renders the coefficient insignificant (4.899) and dropping the 10 largest turns it negative and imprecise (-3.257). So it is NFA level, not size per se, that drives the response. Firm heterogeneity: export-intensive firms gain, riskier firms do not. Interbank-dependence heterogeneity: regional GDP and small-bank lending effects appear only for high-interbank-dependence banks/regions. Firm real outcomes (Appendix C): investment of exporters rises only when relationship banks have high interbank dependence (col 6: 0.146, 10%); employment effects are insignificant throughout.

Q5. What robustness checks are run?

Table 3: (1) broadening NFA to include CHF, JPY, GBP (5.850, 5%); (2) disaggregating into gross USD assets (3.829, 5%) and gross USD liabilities (4.369, 10%, counter-intuitive but attributed to 89% asset-liability correlation acting as a proxy); (4) adding interest-rate exposure as a control (NFA rises to 6.847, 5%); (5) eight-quarter pre/post windows (4.996, 5%); (6) a 2002 placebo where NFA is insignificant, plus all-quarters-2001-2014 placebos never positive-and-significant at 5%+, supporting parallel trends. Table 8 col 5 runs a regional placebo around 2002 with no disproportionate growth. Appendix D between-firm regressions (controlling for demand via Abowd et al. 1999 firm fixed effects) confirm more-exposed firms get higher overall credit (0.868, 5%), though the export interaction there is insignificant (all exposed firms benefit, no extra amplification for exporters in the between-firm dimension). Appendix B confirms the net-worth channel.

Q6. How does this relate to and differ from prior work?

It is closest to Agarwal (2019), who exploits the 2015 Swiss franc appreciation and shows banks with high foreign-currency liabilities changed domestic credit and growth. This paper differs by: (i) studying a depreciation rather than appreciation; (ii) using disaggregated bank-firm credit-registry data covering non-listed firms (Agarwal uses listed firms); (iii) identifying interbank lending as the dominant channel explaining the credit increase; (iv) showing banks use interbank liquidity to lend especially to exporters; and (v) documenting higher regional GDP growth. It also contrasts with Bruno and Shin (2019), who find Mexican firms reliant on high-dollar-funding banks suffer credit and export declines after the taper tantrum; here the same taper tantrum has a positive credit effect because USD appreciation raises the value of USD assets where domestic banks hold significant foreign-currency exposure. It contributes to the interbank-markets-and-monetary-policy literature (Abbassi et al., 2014; Freixas et al., 2011; Allen et al., 2014) by showing monetary policy can affect interbank markets indirectly via the exchange rate.

Q7. What are the policy implications and their scope conditions?

Exchange-rate depreciations can have sizeable real effects through bank-balance-sheet and interbank channels, distinct from the trade channel, and these effects reach banks with no direct foreign exposure via interbank liquidity reallocation. Scope conditions: the result requires (a) a banking sector with significant, imperfectly hedged net foreign-currency (USD) assets concentrated in large banks; (b) an export-intensive economy where credit to exporters has aggregate bite (Germany has one of the world’s largest net-exports-to-GDP ratios); (c) a geographically segmented banking system (German savings banks) that lets regional output be linked to local-bank exposure; and (d) the depreciation being large, persistent, and largely exogenous/unanticipated (driven by Fed tapering). The 1.2 pp regional growth differential is between high- vs low-export-share regions among high-interbank-dependence regions only. The authors stress estimates are likely downward-biased because cooperative and private credit banks are omitted from the regional analysis.

Q8. What are the most important caveats and limitations?

(1) Export turnover is reported by only a minority of Amadeus firms, so export intensity is proxied by industry medians, introducing measurement error. (2) Regional GDP is nominal (no regional CPI), justified by low, stable German inflation. (3) Within-firm regressions capture only the intensive margin; new and terminated relationships are handled separately in Appendix D between-firm regressions. (4) Firm-level real-outcome regressions (Appendix C) have small samples covering a small subset of German firms and compare 2014 vs 2012 (firm data end 2014), so they are interpreted as merely indicative. (5) The gross-foreign-liability robustness result is counter-intuitive and attributed to high asset-liability correlation. (6) The paper studies a depreciation only; asymmetric responses to appreciation and the source of the exchange-rate move (domestic vs foreign monetary policy) are left for future research.

Key Concepts

"Compensate the Losers?" Economic Policy and the Origins of U.S. Partisan Realignment

Layer 1: Overview

Research Question. Why have less-educated voters in the United States abandoned the Democratic Party over recent decades? The paper argues that the Democratic Party’s evolution on economic policy — specifically its retreat from “predistribution” — is a central, previously understudied driver of partisan realignment by education.

Conceptual Framework. The authors distinguish between two categories of egalitarian economic policy: (1) predistribution — policies that alter the pre-tax-and-transfer earnings distribution, including job guarantees, minimum wage increases, union support, and protectionist trade policies (following Hacker 2011); and (2) redistribution — taxes and transfers. The paper’s central claim is that these two types of policy have sharply different educational gradients among voters, and that the Democratic Party moved away from predistribution beginning in the 1970s, triggering educational realignment.

Data and Methodology. The authors harmonize over 1,000 surveys (N ≈ 2.2 million observations) spanning 1942–2020, drawn from Gallup, ANES, GSS, CCES, and historical survey archives housed at iPoll/Cornell. Education is translated into a common metric (adjusted years of schooling) using Census data, controlling for sex, race, year, and birth cohort to address the changing selectivity of educational categories over time. Congressional roll-call data come from the Comparative Agendas Project (CAP). Campaign finance data come from FEC filings, Congressional hearing records, and watchdog sources. DLC membership data are compiled from official Democratic Leadership Council records (available for 1985, 1986, 1991, 1993, and 1997 onward) and DLC-aligned Congressional caucus lists. House election returns are taken from King and Palmquist (1997) at the minor-civil-division-group (MCDG) level (~60 units per Congressional district), matched to 1980 Census demographic data.

Main Findings.

Voter preferences (demand side): The educational gradient for predistribution is large and negative: averaged across the four predistribution questions (job guarantee, minimum wage, union support, trade protection), each additional year of education reduces support by 0.044 standard deviations (p < 0.001). A college graduate relative to a high school graduate supports predistribution 0.176 standard deviations less — equivalent to roughly half the average Democrat-Republican gap in predistribution support (which is 0.34 standard deviations). This gradient has been stable since at least the 1940s. By contrast, the educational gradient for redistribution (higher taxes on the rich, views on own taxes, welfare spending) is close to zero (summary β = 0.004, not distinguishable from zero in the full sample). The difference between the two gradients is statistically significant (p < 0.001). These results replicate in white-only samples. Notably, the educational gradient on social issues — measured across nine questions on racial attitudes, gender roles, sexual norms — is positive (more education predicts more liberal positions) but has been largely stable since the 1940s, not increasing, conditional on the long-run sample.

Party supply (supply side): Before 1976, predistribution topics accounted for roughly one-quarter of Democratic House roll-call votes when Democrats controlled the chamber. After 1976 (taking Jimmy Carter’s presidency as the start of the “New Democrat” era), this share falls by approximately nine to ten percentage points, while the redistribution share of votes holds steady. Between 1968 and 1980, the union share of total PAC donations to Democratic Congressional candidates falls from approximately 90 percent to 40 percent, coincident with 1970s campaign finance reforms that placed union and corporate PACs on equal legal footing and allowed corporations to exploit their naturally deeper pockets. Corporate PAC share of Democratic donations correspondingly rises from approximately 10 percent to 45 percent over the same period. In individual contributions to primary elections (data beginning in 1980), Democratic primaries rely on increasingly more-educated census tracts relative to Republican primaries; by 2018 Democratic primaries are financed from census tracts averaging 0.41 more years of education than Republican primaries (against a within-year standard deviation of 1.56 years).

The New Democrat/DLC faction: The authors identify the anti-predistribution faction through official DLC membership records and aligned caucus lists. DLC membership as a share of Democratic House seats grows from near zero in the mid-1970s to approximately half by the early 2000s. Roll-call voting analysis (N = 3,428,405 vote-observations) shows DLC members are more conservative than other Democrats overall, and especially so on predistribution: for a 10-percentage-point increase in the share of Republicans voting for a bill, the probability a DLC member votes in favor increases 36 percent more on predistribution bills than on other bills. DLC members show no differential conservatism on redistribution. They are also significantly more socially conservative — more likely than other Democrats to support the Defense of Marriage Act (by 16 pp), the Partial-Birth Abortion Ban (by 7 pp), and restrictive immigration bills (by 10 pp). DLC candidates receive significantly less from labor PACs and significantly more from corporate PACs, and draw their out-of-district individual donations from census tracts averaging more than 0.1 years more educated than non-DLC Democrats.

Voter reaction and the inflection point: Using the N ≈ 2.2 million partisan identification dataset, the authors estimate a structural break in the education-party identification gradient. From the 1940s through the mid-1970s, each additional year of education reduces the probability of identifying as a Democrat by approximately 3 percentage points. A Chow breakpoint test identifies 1976 as the inflection point. Since 1976, the gradient steadily rises; by 2000 it reaches zero; and today (as of the sample period end ~2020) each additional year of education increases Democratic identification by approximately 3 percentage points — an almost exact reversal. The breakpoint for Republican identification occurs later, in 1992, consistent with the Democratic agenda changing first. A Gallup prosperity question (“which party will better keep the country prosperous?”) shows a parallel pattern: controlling for views on parties’ economic performance explains approximately 44 percent of partisan realignment, interpreted as an upper bound on economic policy’s contribution.
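
The idea behind the breakpoint test can be sketched with a Chow-type F statistic that compares one pooled linear trend against separate trends before and after a candidate break year. The series below is synthetic, constructed to mimic the described pattern (flat at about -3 pp through 1976, then rising to about +3 pp by 2020); it is not the paper's data, and the paper's exact test specification may differ.

```python
def ssr_linear(xs, ys):
    """Sum of squared residuals from an OLS fit of y on a constant and x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

def chow_f(xs, ys, break_x, k=2):
    """Chow F statistic for a break after break_x, with k parameters per regime."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= break_x]
    right = [(x, y) for x, y in zip(xs, ys) if x > break_x]
    ssr_pooled = ssr_linear(xs, ys)
    ssr_split = ssr_linear(*zip(*left)) + ssr_linear(*zip(*right))
    n = len(xs)
    return ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))

def synthetic_gradient(year):
    """Hypothetical education gradient in pp per year of schooling (illustration only)."""
    base = -3.0 if year <= 1976 else -3.0 + 6.0 * (year - 1976) / 44.0
    noise = 0.01 if year % 2 == 0 else -0.01   # small deterministic wiggle
    return base + noise

years = list(range(1942, 2021))
gradients = [synthetic_gradient(y) for y in years]

f_at_1976 = chow_f(years, gradients, 1976)
f_at_1950 = chow_f(years, gradients, 1950)
print(f_at_1976 > f_at_1950)
```

Scanning the statistic over candidate years and picking the largest value is one common way to date a break; on this synthetic series the statistic at 1976 far exceeds the one at 1950.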

Factional tests — hypothetical elections and actual results: In hypothetical general-election matchups from 1972–1992 Democratic primaries (in which most contests pitted a “New Democrat” against an “Old Democrat”), a voter with a college degree is roughly 3 percentage points more likely to vote Democratic when the candidate is a New Democrat rather than an Old Democrat. In 1980s actual House elections using MCDG-level data, DLC candidates out-perform other Democrats in more educated neighborhoods by a magnitude large enough to erase approximately 90 percent of the general Democratic underperformance in highly educated areas. Combining these estimates, the party’s shift toward the DLC accounts for a lower bound of approximately 20 percent, and an upper bound (from the prosperity question) of approximately 50 percent, of educational realignment.

Scope Conditions. The analysis focuses on the United States, 1942–2015 (with some post-2015 discussion in the conclusion). The faction analysis focuses on the Democratic side; Republican faction changes are discussed but are not the primary focus. The paper is explicit that between 20 and 50 percent of realignment is explained, leaving room for other factors, including social issues. The analysis ends mostly before 2016 to avoid complications from the closure of the DLC in 2011 and shifting post-2010 party dynamics.

In depth

Q1. What is the paper’s central conceptual innovation, and how does it differ from prior realignment research?

The paper separates egalitarian economic policies into “predistribution” (pre-tax-and-transfer market interventions such as minimum wages, job guarantees, union support, and protectionism) and “redistribution” (taxes and transfers) and shows these two types have sharply different educational gradients. Prior work typically aggregated all economic policies into a single index, which the authors argue masks essential heterogeneity. By documenting that the educational gradient is large and negative for predistribution but close to zero for redistribution — a pattern stable since the 1940s — the paper reframes the “voting against economic interest” puzzle: less-educated voters leaving the Democratic Party may be responding rationally to changes in the supply of the type of economic policy they actually prefer.

Q2. How large is the educational gradient for predistribution, and how does it compare with the gradient on social issues?

The average coefficient on adjusted years of schooling across the four predistribution questions is -0.044 (p < 0.001), stable over eight decades. A four-year difference in education (high school vs. college) shifts an individual’s support for predistribution by 0.176 standard deviations in the conservative direction — about half the average Democrat-Republican gap in predistribution support (0.34 standard deviations). For social issues, the summary gradient is positive (+0.028, p < 0.001 for the full sample), but this gradient has been largely stable since the 1940s across nine social issue questions, not increasing over time. This stability undermines the interpretation that rising social liberalism among the educated is a new phenomenon driving realignment, at least through the supply of parties’ social positions.

Using the Comparative Agendas Project classification, predistribution topics (labor regulation, industrial policy, public works, trade) accounted for roughly one-quarter of all House roll-call votes during years Democrats controlled the Speakership before 1977. After 1977, this share falls by approximately 9–10 percentage points (a decline of nearly half from its pre-1977 share), and the decline is statistically significant (p < 0.001). The redistribution share of votes holds essentially constant. Party platform data from Hopkins et al. (2022) show a sharp decline in Democratic use of terms like “minimum wage,” “full employment,” and labor-relations language beginning in the 1970s and 1980s, while Republican platforms use these terms sparingly throughout.

Q4. How did 1970s campaign finance reforms change the financial composition of the Democratic Party?

Before the early 1970s, unions enjoyed substantially more freedom than corporations under separate legal regimes governing PAC donations; mid-1970s reforms placed them on equal legal footing, enabling corporations to exploit their deeper pockets. The union share of total PAC donations to Democrats fell from approximately 90 percent in 1968 to approximately 40 percent by 1980, while the corporate share rose from approximately 10 percent to 45 percent. For Republicans, both series barely changed: unions had never donated substantially to the GOP, and the corporate share rose only modestly (from approximately 70 to 80 percent). The authors note the rapid decline cannot be attributed to falling union density in the economy, since both union and corporate PAC donations grew in absolute terms during this period; the relative shift was the result of the regulatory change.

Q5. Who are the “New Democrats” / DLC, and when did they emerge?

The DLC officially operated from 1985 to 2011, but members who would join it began entering Congress in large numbers in the 1970s (“Watergate Babies” of 1974, “Atari Democrats”). The DLC grew to approximately half of all Democratic House seats by the early 2000s. Members were drawn from suburban, affluent districts; their founder Al From explicitly criticized all four predistribution policies the paper studies (minimum wage, job guarantees, unions, and protectionism). The breakpoint test on DLC share in Congress identifies 1975 as the pivotal year — one year before the 1976 inflection point in partisan identification.

Q6. How do DLC members vote differently from other Democrats, and how is this differential conservatism distributed across policy types?

In roll-call regressions (N = 3,428,405 observations, with roll-call fixed effects), a 10 pp increase in the Republican vote share for a bill increases the probability a DLC member votes in favor by 1.48 pp more than for other Democrats (baseline result for all bills). For predistribution-classified bills, this excess alignment with Republicans is 36 percent larger than for non-predistribution bills. Crucially, DLC members are no more conservative than other Democrats on redistribution-classified votes (the interaction with redistribution is near zero and insignificant). DLC members are also differentially more conservative on social issues, a result that proves useful in separating economic from social-issue explanations of realignment.

Q7. Do DLC members finance differently from other Democrats?

Yes. In primary elections, DLC candidates receive approximately 9.7 pp less of their PAC financing from labor unions and approximately 6.7 pp more from corporate PACs (with state fixed effects) relative to non-DLC Democrats. Out-of-district individual contributions to DLC primary candidates come from census tracts averaging more than 0.1 years more educated than those for non-DLC Democrats, while within-district contributions show no significant difference (0.060 years, insignificant). This pattern suggests educated out-of-district donors, rather than local constituency demands, drive DLC candidates’ anti-predistribution orientation.

Q8. When precisely did educational realignment in Democratic party identification begin, and what does the inflection-point analysis show?

Using N ≈ 2.2 million observations from 1,006 surveys, a Bai-Perron breakpoint test on the year-by-year education gradient in Democratic party identification identifies 1976 as the inflection point (with robustness to alternative specifications yielding breakpoints of 1978–1980 for white-only samples and unadjusted years of schooling). Before 1976, each additional year of education reduces the probability of Democratic identification by approximately 3 percentage points (a stable, significantly negative relationship since the 1940s). After 1976, the gradient steadily rises; it reaches zero around 2000 and today is approximately +3 percentage points per year of education — nearly an exact reversal of the baseline. The corresponding Republican inflection point occurs in 1992, about 16 years later, consistent with the Democratic Party’s agenda changing first.
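The inflection-point logic can be illustrated with a much simpler stand-in for the Bai-Perron procedure: a grid search over candidate break years for a single kink in an otherwise flat gradient. This is only a sketch. The function names, the single-break kinked-trend specification, and the noise-free synthetic series are all assumptions made for illustration; they are not the authors' data or estimator.

```python
def fit_kink(years, gradients, break_year):
    """OLS fit of g_t = a + b * max(0, t - break_year); returns (a, b, ssr)."""
    x = [max(0, t - break_year) for t in years]
    n = len(years)
    x_mean = sum(x) / n
    y_mean = sum(gradients) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, gradients))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, gradients))
    return a, b, ssr


def find_break(years, gradients, trim=5):
    """Return the candidate break year with the smallest sum of squared residuals."""
    candidates = years[trim:-trim]  # keep a few observations on each side
    return min(candidates, key=lambda tau: fit_kink(years, gradients, tau)[2])


# Hypothetical series: gradient flat at -3 pp until 1976, then rising to +3 pp by 2020.
years = list(range(1946, 2021))
slope_after = 6.0 / (2020 - 1976)
gradients = [-3.0 + slope_after * max(0, t - 1976) for t in years]

estimated_break = find_break(years, gradients)
print(estimated_break)
```

On real survey data the year-by-year gradients would be noisy, and the Bai-Perron method additionally allows for multiple breaks and provides inference on the break dates, which this grid search does not.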

Q9. How do hypothetical presidential matchup surveys test the DLC mechanism?

The authors identify six Democratic primaries from 1972–1992 where a “New Democrat” and an “Old Democrat” were the top two contenders (e.g., Hart vs. Mondale in 1984, Clinton vs. Brown in 1992). Gallup and other surveys asked all respondents — regardless of party — whom they would vote for if either the New or the Old Democrat faced the eventual Republican nominee. A voter with a college BA is approximately 3 percentage points more likely to vote for the Democrat when the candidate is a New Democrat versus an Old Democrat (the “difference in differences” of hypothetical vote shares). This holds after controlling for state × election fixed effects and in five of the six election cycles studied (the 1976 exception is attributed to Mo Udall’s low name recognition, with 28 percent of respondents unfamiliar with him in a May 1976 poll). The result is attenuated but remains marginally significant when excluding non-white respondents, consistent with New Democrats’ success with white voters due in part to their more conservative civil rights positioning.

Q10. What do actual House election results (MCDG-level data) show about DLC electoral performance by neighborhood education?

Using 1980s House returns at the MCDG level (~60 neighborhoods per Congressional district), the authors regress Democratic vote share on neighborhood years of education interacted with a DLC candidate indicator, with Congressional district fixed effects. More-educated neighborhoods generally depress Democratic vote share (reflecting the still-negative overall educational gradient in the 1980s), but DLC candidates dramatically out-perform other Democrats in educated areas: the interaction coefficient is positive and significant, and its magnitude is large enough to erase approximately 90 percent of the general Democratic underperformance in highly educated neighborhoods. This result is robust to including District × Year fixed effects (so the identification comes from within-election, cross-neighborhood variation) and to adding controls for share white and share under age 35.

Q11. How much of educational realignment can the paper’s mechanism account for, and how is this calculated?

Two bounding estimates are provided. Upper bound (~44–50%): controlling for a respondent’s view on which party is better for economic prosperity (from Gallup since 1950) explains approximately 44 percent of the change in the education-party identification gradient (specifically, the total difference in the unconditional gradient between the 1948–1967 baseline and 2001–2020 is 2.411 pp per year of schooling; after controlling for the prosperity question, the unexplained residual is 1.342 pp, leaving a share explained of 44.3 percent). Lower bound (~20%): the difference in the education gradient between matchups involving New versus Old Democrats in Table 4 (~0.75 pp) divided by the total realignment shift (~4 pp from pre-1976 to post-2008 for presidential voting) implies the faction shift accounts for at least approximately one-fifth of realignment. The authors interpret these as bounds because the prosperity question may partly capture party identification itself (upper bound concern), while the hypothetical matchup estimate misses the broader ideological shift not captured in a single election (lower bound).
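The arithmetic behind the two bounds can be reproduced directly from the figures quoted above. The variable names are illustrative; the numbers are the ones reported in the summary (2.411 pp, 1.342 pp, roughly 0.75 pp, and roughly 4 pp).

```python
# Upper bound: share of the gradient change absorbed by the prosperity question.
total_shift_pp = 2.411   # unconditional gradient change, 1948-1967 vs. 2001-2020
residual_pp = 1.342      # unexplained change after controlling for prosperity views
upper_share = (total_shift_pp - residual_pp) / total_shift_pp

# Lower bound: New-vs-Old Democrat matchup gap relative to the total voting shift.
matchup_gap_pp = 0.75    # approximate education-gradient difference (Table 4)
voting_shift_pp = 4.0    # approximate total shift in presidential voting
lower_share = matchup_gap_pp / voting_shift_pp

print(f"upper bound share explained: {upper_share:.3f}")
print(f"lower bound share explained: {lower_share:.4f}")
```

The upper share comes out near 0.443 and the lower share near 0.19, matching the "44.3 percent" and "approximately one-fifth" figures in the text.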

Four alternative explanations are addressed. (1) Civil Rights: Regional analysis shows that educated white Southerners left the Democrats in the 1940s–1960s (not the 1970s), consistent with their realignment being driven by Democrats’ liberal turn on civil rights rather than economic policy. After the 1960s, the South follows all other regions in the pace of educational realignment. (2) Republican changes: The Republican party identification inflection point occurs in 1992, about 16 years after the Democratic inflection in 1976. Reagan elections in 1980 and 1984 do not appear to have differentially attracted less-educated voters (the “Reagan Democrats” were not differentially less educated). (3) Social issues: The New Democrats were actually more socially conservative than other Democrats (more likely to vote for DOMA, anti-abortion bills, restrictive immigration legislation), yet they disproportionately attracted educated voters. This internal inconsistency rules out a pure social-issues explanation for why educated voters preferred the DLC faction. (4) Religion: Flexibly controlling for religious affiliation explains essentially none of partisan realignment (Appendix Figure A.24).

Q13. What is the role of out-of-district individual donors in shifting Democratic Party positions?

Out-of-district primary donors are analytically important because they influence candidate supply without being able to vote in the election, isolating the “within-party” financial influence of educated supporters. By 1980, out-of-district primary donors to Democratic candidates already come from census tracts more educated than those for Republican candidates, even as local Democratic voters and within-district donors remain less educated than Republican counterparts. Democratic candidates also receive a substantially higher share of out-of-district contributions than Republican candidates — by almost 10 percentage points (Appendix Table A.7). Out-of-district donors thus represent a channel through which educated, anti-predistribution preferences are transmitted into the Democratic Party’s candidate supply before the electoral realignment is visible in vote totals.

Q14. Are predistribution policies becoming less popular overall, which might independently push Democrats away from them?

The paper tests this alternative in Appendix Table A.9 and finds no evidence that predistribution has become less popular relative to redistribution over time. Predistribution appears on average more popular than redistribution across the sample period. If anything, support for predistribution has held steady or slightly risen relative to redistribution over time, conditional on the paper’s survey harmonization. The stability of the educational gradient (shown in Appendix Table A.10 to be unchanged even using educational rank within cohort rather than raw years of schooling) further suggests the negative education-predistribution relationship is a relative, not absolute, phenomenon — consistent with rising average education and stable preferences by education rank.

Key Concepts

Predistribution: Policies that aim to change the distribution of earnings or income before taxes and transfers are applied. In this paper, this comprises government job guarantees, minimum wage increases, support for unions and collective bargaining, and protectionist trade policies. Distinguished from redistribution in that it operates on pre-tax market income rather than post-tax outcomes. The paper uses this term following Hacker (2011): “a focus on market reforms that encourage a more equal distribution of economic power and rewards even before government collects taxes or pays out benefits.”

Redistribution: Policies that change post-market income through the tax and transfer system, including higher taxes on the rich, views on own tax burden, prioritization of tax cuts, and transfers to the poor (welfare spending). In the paper’s usage, redistribution is analytically distinct from predistribution and has a near-zero educational gradient, in contrast to predistribution’s strongly negative gradient.

Educational Gradient: The coefficient on adjusted years of schooling in a regression of an outcome variable (policy preference or partisan identification) on education, estimated separately by time period. The paper’s core finding is that the educational gradient for predistribution is stably negative (approximately -0.044 per year of schooling over the full sample), while the gradient for redistribution is close to zero, and the gradient for Democratic party identification shifts from approximately -0.03 to +0.03 per year of schooling between the 1940s and 2020.

New Democrats / DLC (Democratic Leadership Council): An explicitly anti-predistribution faction within the Democratic Party, identified through official DLC membership records and affiliated Congressional caucus lists. Founded formally in 1985 (operating through 2011), the DLC arose in part from the “Watergate Babies” cohort of 1974. DLC members were more conservative than other Democrats especially on predistribution and social issues, relying differentially on corporate PACs and educated out-of-district donors. The paper treats DLC membership as a proxy for an anti-predistribution faction that gained bargaining power within the Democratic Party from the 1970s onward.

Adjusted Years of Schooling (AdjYearsEduc): The paper’s harmonized education variable across more than 1,000 surveys spanning eight decades. Because raw educational categories change over time and represent different selectivity (e.g., in 1940 only one-quarter of adults had completed twelfth grade, versus nearly 90 percent today), the authors use Census microdata to predict years of schooling as a function of self-reported educational category, sex, race, year, and birth cohort in ten-year bins. This provides a common unit of measurement across surveys with incompatible category systems.

Inflection Point (1976): The structural break in the trend of the education-Democratic identification gradient, estimated using Bai-Perron (1998) methods on N ≈ 2.2 million observations. The data select 1976 as the year at which the previously stable negative gradient begins its upward trajectory. The corresponding Republican inflection point occurs in 1992. The paper argues that identification of this inflection point — not previously documented in the realignment literature — is made possible only by the large historical dataset assembled.

Minor Civil Division Group (MCDG): The granular geographic unit used in the House election analysis for the 1980s, with approximately sixty MCDGs per Congressional district. Matched to 1980 Census demographic data to assign average years of education. Used to test whether DLC candidates out-perform other Democrats in more-educated neighborhoods, within the same Congressional district and election year, to address the concern that DLC candidates sort into more-educated districts.

A Cognitive Theory of Reasoning and Choice

Mon, 01 Jan 0001 00:00:00 +0000

Bordalo, Gennaioli, Lanzani, and Shleifer develop a cognitive theory of choice in which a decision maker’s attention to the features of options is determined by her categorization of the current problem against a memory database of problems she solved in the past. The core claim is that before solving a problem, the decision maker asks “what kind of problem is this?” and resolves it by selecting the category — indexed by a prototype attention-plus-context vector and a time-discounted frequency — whose similarity to the current problem is maximized. This problem recognition step then pins down which features (price, quality, probabilities) receive attention, which in turn shapes valuation and choice.

The model formalizes two-step choice. In step one (recognition), the decision maker jointly chooses an attention vector alpha_P and a category c* to maximize a separable similarity function S[(alpha_P, kappa_P), (alpha_c, kappa_c)] weighted by category frequency F_c, plus a Type I extreme-value shock that yields a logit probability over categories. In step two, she maximizes perceived value over the menu using the endogenously determined weights. Perceived hedonic value of feature i shrinks toward the menu average when alpha_{P,i} < 1; perceived probabilities compress toward uniform when the event-attention weight falls below 1, so unlikely events are overweighted. Full attention recovers expected utility.
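A minimal sketch of the recognition step's logit probabilities follows. It assumes that the deterministic part of each category's score is the product F_c * S_c, that the extreme-value shock has a scale parameter (called `scale` here), and that similarities are positive numbers supplied by the user. The function name and the example values are hypothetical; the paper's exact functional form may differ.

```python
import math


def recognition_probabilities(similarity, frequency, scale=1.0):
    """Logit probabilities over categories from frequency-weighted similarity.

    similarity, frequency: dicts keyed by category name (assumed inputs).
    """
    scores = {c: frequency[c] * similarity[c] / scale for c in similarity}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical similarities of the current problem to two category prototypes.
similarity = {"buying": 0.8, "consuming": 0.6}

equal_experience = recognition_probabilities(
    similarity, {"buying": 1.0, "consuming": 1.0}
)
more_buying = recognition_probabilities(
    similarity, {"buying": 2.0, "consuming": 1.0}
)
print(equal_experience)
print(more_buying)
```

Raising the frequency of "buying" raises the probability that the same problem is recognized as a buying problem, which is the experience-driven heterogeneity the summary describes.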

The model yields three structural predictions that hold without changing tastes or information. First, within-person multi-modal attention: because categorization is stochastic, the same person can cluster on entirely different features (e.g., the base rate vs. the likelihood in an inference problem) across otherwise identical choice occasions. Second, systematic context-driven instability: when an irrelevant context feature kappa_{P,i} drifts away from a category’s diagnostic kappa_{c,i}, the probability of that category falls discontinuously, causing a discrete switch in the attention profile and hence in valuation. Third, experience-driven heterogeneity: people more frequently exposed to a category (higher F_c) are more likely to use it, producing persistent differences in price elasticities or probability weighting at constant income and tastes.

Applied to riskless consumer choice, the paper introduces two categories: “buying” (full attention to price, partial to quality: alpha_{M_g}=1 > alpha_{Q_g}=alpha) and “consuming” (full attention to quality, partial to price: alpha_{Q_g}=1 > alpha_{M_g}=alpha). A jam problem categorized as buying yields valuation v = alpha * q - eta * p; categorized as consuming, v = q - alpha * eta * p. The valuation jumps discontinuously as context crosses a threshold kappa*, which shifts when relative category frequency F_{buy}/F_{con} changes. This framework accounts for context-dependent price elasticities (Wakefield and Inman 2003), poverty-driven excess price focus (Shah et al. 2018), de-commoditization through advertising, and mental accounting anomalies including opportunity cost neglect and the sunk cost fallacy. Both anomalies arise because the consuming category neglects capital gains (alpha_{con,Delta_M}=0) and the buying category neglects quality shocks (alpha_{buy,Delta_Q}=0).

Applied to statistical judgment, the paper introduces two categories — “frequency estimation” (attention alpha_1=1 to a single i.i.d. draw from a known DGP) and “agnostic inference” (attention alpha_S=1 to the share of heads as a sufficient statistic). The threshold N* separates recognition: for sequence length N_P < N*(F_{freq}/F_{inf}), the decision maker categorizes as frequency and correctly assesses odds; for N_P >= N*, she switches to inference and overweights balanced sequences, producing the Gambler’s Fallacy. The same competition between categories also accounts for base rate neglect, conjunction fallacy, and correlation neglect, with the bias strengthening as sequences grow longer.

Applied to risky choice, bottom-up salience (sensory prominence and contrast) interacts with categorization. A publicity shock drawing attention to a low-probability contamination risk raises similarity to “consuming,” triggering a category switch that amplifies attention to quality broadly and reduces attention to price, producing large valuation drops disproportionate to the actual probability shift. This mechanism generates the framing effects of prospect theory without a stable S-shaped utility function: gain and loss frames correspond to different contexts activating different categories.

Scope conditions: the theory applies when features and their values are fully known to the decision maker (no uncertainty about attributes), so the distortions take the form of altered sensitivity to known features rather than missing information. The set of categories C is taken as given in the formal analysis, though the authors discuss endogenization as future work.

Q: What is the paper’s central departure from standard rational inattention and noisy-perception models?

A: Standard models (Sims 2003, Woodford 2012, Enke and Graeber 2023) produce unimodal, stably weighted valuations — the decision maker’s weighting of features is a smooth function of payoff-relevant costs or priors. In this paper, the weighting is determined by problem recognition, which is discrete and stochastic, producing within-person multi-modal attention: the same person can cluster on entirely different features across identical problems. The authors cite direct evidence from Bordalo, Conlon, Gennaioli, Kwon, and Shleifer [20] showing bimodal clustering on base rates vs. likelihoods in statistical problems, a pattern inconsistent with stable-weighting models.

Q: How is perceived value distorted when the attention weight on a hedonic feature is below 1?

A: The perceived value of hedonic feature i is u_i(alpha_P) = alpha_{P,i} * u_i + (1 - alpha_{P,i}) * u_bar_i, where u_bar_i is the average value of that feature across options in the menu. An attention weight of zero collapses perceived variation in that feature to zero; full attention recovers the true value. The implication is that under-attention shrinks the decision maker’s effective sensitivity to a known attribute, causing systematic under- or over-valuation relative to a rational benchmark while tastes (marginal utilities) are held fixed.
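The shrinkage formula can be checked with a short sketch. The function name and the two-option menu are hypothetical; the formula is the one stated above, applied to a single hedonic feature with a common attention weight.

```python
def perceived_values(true_values, attention):
    """Shrink each option's feature value toward the menu average.

    Implements u_i(alpha) = alpha * u_i + (1 - alpha) * u_bar_i for one feature.
    """
    menu_average = sum(true_values) / len(true_values)
    return [attention * u + (1 - attention) * menu_average for u in true_values]


qualities = [2.0, 6.0]  # hypothetical quality of two jams; menu average is 4.0

full_attention = perceived_values(qualities, 1.0)  # [2.0, 6.0]
half_attention = perceived_values(qualities, 0.5)  # [3.0, 5.0]
no_attention = perceived_values(qualities, 0.0)    # [4.0, 4.0]
print(full_attention, half_attention, no_attention)
```

With half attention the perceived quality gap falls from 4 to 2, and with zero attention the two options look identical on that feature, even though the underlying tastes have not changed.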

Q: How is perceived probability distorted?

A: With attention weight alpha_{P,W} on event W, the perceived probability of event e is P(e)^{alpha_{P,W}} / sum_{e’} P(e’)^{alpha_{P,W}}, which compresses the distribution toward uniform as alpha_{P,W} falls toward 0 and recovers the true distribution at alpha_{P,W}=1. In the jam example, under-attention to the small probability of spoilage causes the decision maker to overestimate the risk of contamination. For multi-dimensional event vectors the formula generalizes multiplicatively, allowing “editing out” of entire event dimensions (e.g., urn selection in a balls-and-urns problem) when their attention weight hits zero.
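The probability distortion can be sketched the same way. The function name and the 1 percent spoilage probability are hypothetical; the formula is the power-and-normalize expression given above, and the sketch assumes all true probabilities are strictly positive.

```python
def perceived_probabilities(probs, attention):
    """Compress a distribution toward uniform as attention falls below 1.

    Implements P(e)^alpha / sum_e' P(e')^alpha; assumes every P(e) > 0.
    """
    weights = [p ** attention for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


true_probs = [0.99, 0.01]  # hypothetical: jam is fine vs. jam is spoiled

full = perceived_probabilities(true_probs, 1.0)     # recovers the true distribution
partial = perceived_probabilities(true_probs, 0.5)  # spoilage looks more likely
none = perceived_probabilities(true_probs, 0.0)     # uniform over the two events
print(full, partial, none)
```

At an attention weight of 0.5 the perceived spoilage probability rises from 1 percent to roughly 9 percent, which is the overestimation of a rare contamination risk described in the jam example.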

Q: What is the mechanism for context-dependent price elasticity?

A: When context kappa_P is below threshold kappa*(F_{buy}/F_{con}), the decision maker categorizes the problem as “buying” and her valuation is v = alpha * q - eta * p, giving a high price sensitivity (coefficient eta) and attenuated quality sensitivity (coefficient alpha < 1). Above kappa*, she categorizes as “consuming” and valuation is v = q - alpha * eta * p, reversing the emphasis. Because the threshold kappa* is increasing in relative frequency F_{buy}/F_{con}, a decision maker with more buying experience has a higher threshold and thus acts as more price-elastic at any given context level. These elasticity differences arise without any change in the true marginal utility of money eta or quality q.
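The two valuation rules and the threshold switch can be put side by side in a small sketch. It treats the threshold kappa* as a number supplied by the user rather than deriving it from F_{buy}/F_{con}, and it uses a deterministic cutoff instead of the stochastic recognition step. The function names and numerical values are hypothetical.

```python
def categorize(context, threshold):
    """Deterministic cutoff: below the threshold the problem is seen as buying."""
    return "buying" if context < threshold else "consuming"


def valuation(q, p, alpha, eta, category):
    """Perceived value of a good with quality q and price p under each category."""
    if category == "buying":
        return alpha * q - eta * p      # full weight on price, partial on quality
    if category == "consuming":
        return q - alpha * eta * p      # full weight on quality, partial on price
    raise ValueError(f"unknown category: {category}")


q, p, alpha, eta = 10.0, 6.0, 0.5, 1.0  # hypothetical jam and attention parameter

v_buy = valuation(q, p, alpha, eta, categorize(context=0.2, threshold=0.5))
v_con = valuation(q, p, alpha, eta, categorize(context=0.8, threshold=0.5))

# A more experienced buyer is modeled here simply as having a higher threshold.
v_experienced = valuation(q, p, alpha, eta, categorize(context=0.8, threshold=0.9))
print(v_buy, v_con, v_experienced)
```

The same jam is valued at -1.0 when recognized as a buying problem and at 7.0 when recognized as a consuming problem. Raising the threshold moves the context value 0.8 back into the buying region, so the experienced buyer behaves as the more price-sensitive type at the same context.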

Q: How does the model generate the sunk cost fallacy and opportunity cost neglect as a unified phenomenon?

A: Both anomalies arise because buying and consuming categories selectively neglect shocks. In the football example, recognizing the problem as “buying” activates alpha_{buy,Delta_Q}=0, so the blizzard quality shock Delta_q<0 is ignored and the decision maker drives to the game as if the shock did not occur — the sunk cost fallacy. In the wine example, recognizing the problem as “consuming” activates alpha_{con,Delta_M}=0, so the capital gain Delta_p is ignored and the decision maker reports a zero or purchase-price cost — opportunity cost neglect. The unifying mechanism is that each category attends only to the features diagnostic of its prototypical experiences: buying attends to price paid and normal quality; consuming attends to realized quality and partly to price, but not to capital gains.

Q: What comparative static does the model predict for sunk cost susceptibility based on experience?

A: People with higher F_{buy} (more buying experiences, e.g. poverty experiences or having recently purchased but not yet consumed the good) exhibit more sunk cost fallacy and less opportunity cost neglect. Conversely, season ticket holders face many consuming experiences relative to one buying event, raising F_{con} and thus reducing susceptibility to the sunk cost fallacy for sports events. Making the blizzard more salient in the description shifts similarity toward “consuming,” also reducing the sunk cost fallacy through a different channel (bottom-up salience rather than experience).

Q: What is the paper’s explanation for the Gambler’s Fallacy, and what distinguishes it from prior accounts?

A: The Gambler’s Fallacy arises when sequence length N_P exceeds threshold N*(F_{freq}/F_{inf}), causing the decision maker to switch from the frequency category (which attends to the 50:50 fairness of the coin) to the inference category (which attends to the share of heads). Under inference, the decision maker treats balanced and unbalanced sequences as representatives of their “share of heads equivalence class,” and the class of balanced sequences is larger, so balanced sequences receive higher estimated probability — the Gambler’s Fallacy. This differs from Rabin and Vayanos (2010), where the bias stems from a belief that the coin is drawn from a pool; here the decision maker knows the coin is fair (kappa_{P,U}=0.5) but the inference representation causes question substitution rather than a wrong model of the DGP.
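The equivalence-class argument rests on simple counting, which the sketch below illustrates for a fair coin. It is not the paper's full formula: it only compares the probability of a specific sequence with the probability of its share-of-heads class, under the assumption that the inference representation leads the decision maker to judge a sequence by its class. Function names are hypothetical.

```python
from math import comb


def sequence_probability(n_flips):
    """True probability of any one specific sequence of fair-coin flips."""
    return 1 / 2 ** n_flips


def class_probability(n_flips, n_heads):
    """Probability that a fair-coin sequence of this length has n_heads heads."""
    return comb(n_flips, n_heads) / 2 ** n_flips


n = 6
specific = sequence_probability(n)     # identical for HTHTHT and HHHHHH
balanced = class_probability(n, 3)     # class containing HTHTHT
unbalanced = class_probability(n, 6)   # class containing only HHHHHH

ratio_n6 = balanced / unbalanced                              # size of the balanced class
ratio_n4 = class_probability(4, 2) / class_probability(4, 4)
print(specific, balanced, unbalanced, ratio_n4, ratio_n6)
```

Every specific six-flip sequence has probability 1/64, but the balanced class contains 20 sequences against one in the all-heads class. The ratio grows from 6 at four flips to 20 at six flips, in line with the claim that the bias strengthens as sequences lengthen.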

Q: How does the model make the Gambler’s Fallacy testable beyond length effects?

A: The model predicts the bias is stronger for decision makers who recently solved many inference problems (lower F_{freq}/F_{inf}), and weaker when the 50:50 nature of flips is made bottom-up salient in the choice context (because salience raises similarity to the frequency category, hindering recognition of inference). These cognitive proxies — experience frequencies and bottom-up salience — are orthogonal to the statistical content of the problem and thus allow identification of the mechanism separately from changes in information or incentives.

Q: How does the model produce framing effects in risky choice without a stable S-shaped utility function?

A: Gain and loss frames are modeled as different context vectors kappa_P that differentially increase similarity to a “safe outcome” category or a “risk” category. Recognizing the problem as the safe-outcome category shifts attention toward the certain option; recognizing it as the risk category shifts attention toward variance. The reversal of preferences between gain and loss frames (the Asian Disease problem, Tversky and Kahneman 1981) thus emerges from context-driven re-categorization rather than from a fixed probability weighting function. The novel prediction is that framing effects should be stronger for decision makers with more experience with the category activated by each frame, and weaker when bottom-up salience of the alternative frame’s features is raised.

Q: How does bottom-up salience interact with top-down categorization in the contamination example?

A: A publicity shock alpha_{delta,Q_b}>0 raises baseline attention to the spoiled-jam quality feature, increasing the similarity of the current problem to the “consuming” category (where quality is focal). This triggers a category switch for marginal agents, activating the full consuming attention profile — which attends to quality broadly, not just to contamination specifically, and reduces attention to price. The resulting valuation drop is therefore disproportionate to the actual probability of contamination and exhibits price insensitivity, because re-categorization shifts the entire attention profile rather than just updating a single probability.

Q: How does the model relate to and distinguish itself from case-based decision theory (Gilboa and Schmeidler 1995) and analogical reasoning (Mullainathan 2002, Fryer and Jackson 2008)?

A: In Gilboa-Schmeidler and related models, the decision maker uses past cases to resolve uncertainty about unknown attributes of current options; attention is full and the mechanism is extrapolation of payoffs from similar cases. In Mullainathan’s (2002) memory-based model, categories again serve to fill in missing information. In this paper, there is no uncertainty about attributes (features and their values are fully known), and the distortion instead takes the form of altered sensitivity to known features through selective attention. This allows the model to produce biases even in simple problems with full data disclosure, and to explain phenomena like base rate neglect and price insensitivity that are not primarily about missing information.

Q: What does the model predict about within-person versus across-person distributions of valuations?

A: Within a person, attention is multi-modal (bimodal in the two-category case) because categorization is stochastic. However, if many categories are possible across the population, the aggregate distribution of valuations can appear approximately unimodal even though each individual’s distribution is not. This distinction is empirically important: a researcher observing average choices may incorrectly infer smooth preference heterogeneity when the underlying mechanism is discrete category switching.

Q: What cognitive proxies does the model propose for empirical identification?

A: The theory links endogenous attention and choice to three observable (or measurable) proxies: (1) past experience frequencies F_c, measurable from administrative histories, surveys about past exposure, or experimental manipulation of training; (2) contextual similarity, measurable from field or experimental variation in irrelevant context features; and (3) bottom-up salience, experimentally controllable via prominence or contrast manipulations. The key identification logic is that these proxies are payoff-irrelevant — they do not change tastes, information, or the objective choice problem — yet predict systematic shifts in choice through their effect on recognition.

Problem Recognition: The first step in the decision maker’s choice process, in which she jointly selects an attention vector alpha_P and a category c* by maximizing weighted similarity between the current problem (characterized by its context vector kappa_P) and the prototype of a past category (alpha_c, kappa_c), multiplied by the category’s time-discounted frequency F_c. Recognition is not about resolving uncertainty over attributes but about selecting which known attributes to attend to.

Category: A partition element of the decision maker’s memory database, indexed by a prototype attention-plus-context vector (alpha_c, kappa_c) and a frequency scalar F_c. The prototype encodes both the context features diagnostic of experiences in that category (binary alpha_{c,i} for i in Phi_K) and the attention to hedonic and event features (alpha_{c,i} for i in Phi_H union Phi_E) used when solving problems in that category. Examples in the paper: “buying” and “consuming” for riskless choice; “frequency estimation” and “agnostic inference” for statistical judgment.

Attention Weight (alpha_{P,i}): A scalar in [0,1] assigned to feature i of the current problem P. For hedonic features, alpha_{P,i}<1 collapses perceived variation toward the menu average; for event features, alpha_{P,i}<1 compresses perceived probabilities toward uniform. Full attention alpha_{P,i}=1 recovers expected utility. Attention weights are the endogenous output of the recognition step, not fixed preference parameters.
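
The two compression effects can be illustrated with a minimal sketch. It assumes simple linear shrinkage (perceived value = menu average + alpha × deviation; perceived probability = alpha × p + (1 − alpha)/n). These functional forms and the function names are illustrative assumptions, not the paper's exact specification.

```python
def perceived_values(values, alpha):
    """Shrink each option's feature value toward the menu average (assumed linear form)."""
    mean = sum(values) / len(values)
    return [mean + alpha * (v - mean) for v in values]


def perceived_probs(probs, alpha):
    """Shrink event probabilities toward the uniform distribution (assumed linear form)."""
    n = len(probs)
    return [alpha * p + (1 - alpha) / n for p in probs]


menu = [10.0, 20.0, 30.0]                 # hypothetical values of one hedonic feature
print(perceived_values(menu, 1.0))        # full attention: values are unchanged
print(perceived_values(menu, 0.0))        # no attention: every option looks like the average
print(perceived_probs([0.9, 0.1], 0.5))   # partial attention: probabilities move toward 1/2
```

Under these assumed forms, alpha = 1 leaves values and probabilities untouched, which is the sense in which full attention recovers expected utility.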

Contextual Similarity S: A separable function measuring how close the current problem (alpha_P, kappa_P) is to a category prototype (alpha_c, kappa_c). It decreases in discrepancies in the attention vector (measured by a strictly increasing, convex function d) and in discrepancies in the values of context features diagnostic of the category (d_i(kappa_{P,i}, kappa_{c,i}) * alpha_{c,i}). Endogenous attention to context is set to reduce sensitivity to discrepancies, not to eliminate them.

Mental Accounting (as categorization): In the paper’s account, non-fungibility, sunk cost fallacy, and opportunity cost neglect all arise because buying and consuming categories selectively attend to different monetary and quality features. The sunk cost effect is alpha_{buy,Delta_Q}=0; opportunity cost neglect is alpha_{con,Delta_M}=0. Mental accounts are not separate budget constraints but the by-product of category-specific attention profiles that were calibrated to normal-state experiences and do not generalize to shocks.

Bottom-up Salience: Exogenous attention to a feature driven by sensory prominence (described by alpha_{delta,i} in the problem’s presentation vector) or payoff contrast (the DM attends more to features where her option’s value deviates more from the menu average relative to total menu variance). Bottom-up salience raises baseline attention to a feature before top-down categorization acts, and can trigger a category switch by raising similarity to the category for which that feature is focal.

Gambler’s Fallacy via Question Substitution: In the model, the Gambler’s Fallacy arises when a long sequence length kappa_{P,N} causes recognition of the “agnostic inference” category, which focuses attention on the share of heads alpha_S=1. The decision maker then treats sequences as representatives of a “share of heads equivalence class,” and since the balanced class is larger than the unbalanced class, balanced sequences are assigned higher estimated probability. This is not a belief that the coin is unfair; it is question substitution induced by the inference representation.
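
The size argument behind the equivalence classes can be checked by counting. The sketch below assumes, as one illustrative reading of question substitution, that the decision maker reports the probability of a sequence's share-of-heads class in place of the probability of the sequence itself; the variable names are hypothetical.

```python
from math import comb

N_FLIPS = 6

# True probability of any one specific sequence of a fair coin.
true_prob = 0.5 ** N_FLIPS

# Number of sequences in each "share of heads" class (k heads out of N_FLIPS).
class_size = {k: comb(N_FLIPS, k) for k in range(N_FLIPS + 1)}
total = sum(class_size.values())          # equals 2 ** N_FLIPS

# Assumed substitution: judge a sequence by the probability of its class.
class_prob = {k: size / total for k, size in class_size.items()}

print(class_size[3], class_size[6])       # balanced class vs. all-heads class
print(true_prob, class_prob[3], class_prob[6])
```

With six flips the balanced class contains 20 sequences and the all-heads class contains 1, so a judgment based on class size favors balanced sequences even though every specific sequence is equally likely.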

A Heterogeneous Agent Model of Energy Consumption and Energy Conservation


Layer 1: Overview

Audzei and Sutóris ask whether inflation-targeting monetary policy affects households’ incentives to invest in energy conservation, and whether the standard central bank response to energy price shocks is welfare-optimal when agents are heterogeneous. They embed energy in both the consumption bundle and the production function of a tractable heterogeneous-agent New Keynesian (HANK) model that features Challe–Ravn–Sterk search-and-matching frictions in the labor market, nominal bond holdings, and, as the paper’s central innovation, household-level energy conservation (abatement) capital that converts raw energy into energy services. The model is calibrated to the Czech Republic, with an energy share in household consumption of 10%, an energy share in production of 5%, a steady-state job-finding rate of 0.15 (targeting a poor hand-to-mouth share of 9%), and a capitalist share of 12%. The main quantitative findings are twofold. First, a contractionary monetary policy shock reduces abatement capital investment, increases the energy intensity of consumption, and depresses the job-finding rate, and these effects fall disproportionately on lower-wealth households. Second, a weaker policy response to a persistent energy price shock, with a lower inflation coefficient (φ_π = 1.1 rather than the baseline φ_π = 2), generates welfare gains for all agent groups (capitalists, employed workers, newly unemployed, long-term unemployed) despite higher measured inflation, because it preserves employment and stimulates abatement investment, reducing households’ long-run exposure to energy price shocks. The paper also shows that a “looking-through” policy (reacting to core rather than CPI inflation) does not deliver welfare benefits because it is too accommodative when energy prices rise but too restrictive once they start to fall; Ramsey-optimal policy instead features a sharp front-loaded rate spike followed by a rapid decline, minimizing aggregate consumption volatility through higher abatement capital.

In depth

Q1. What is energy conservation capital, how is it modeled, and why does it matter for the monetary policy transmission channel?

Energy conservation capital (abatement capital) is a durable investment good held by households that reduces the raw energy required to produce a unit of energy service; because unemployed workers cannot afford it and its return competes with nominal savings, it creates a novel interaction between labor market outcomes and monetary policy. Households derive utility from a CES composite of non-energy consumption and energy services, where energy services are produced from raw energy multiplied by an efficiency factor that is increasing in abatement capital: $E^s = f(K^e_{t-1}) E^r$, with $f(K^e) = \varphi_{1,e} (K^e)^{\varphi_{2,e}}$ and $\varphi_{2,e} = 2$. The elasticity of substitution between energy and non-energy goods is set to $\lambda_e = 0.3$, reflecting limited short-run substitutability. Abatement capital depreciates at 1% per quarter (roughly 4% annually, matching housing and heating-system lifetimes of ~25 years). Crucially, workers lose their abatement capital when they become unemployed (they move to a communal stock at the steady-state unemployed level $\bar{K}^e_u$), so abatement capital is not a precautionary savings vehicle and unemployed workers have no incentive to invest in it. Employed workers who optimally invest must account for the probability of becoming unemployed and therefore losing their capital. This structure means that monetary policy tightening, by raising unemployment and raising the return on nominal bonds, simultaneously pushes more workers into the non-investing unemployed pool and reduces the relative attractiveness of abatement investment for employed workers, raising the energy intensity of consumption.
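
A minimal numerical sketch of the energy-services technology and the depreciation arithmetic follows. The exponent and the quarterly depreciation rate are the values quoted in this answer; the scale parameter `PHI_1` and the capital levels are hypothetical, chosen only to make the example concrete.

```python
PHI_1 = 1.0      # hypothetical scale parameter (not reported in this summary)
PHI_2 = 2.0      # exponent on abatement capital, as quoted above
DELTA_Q = 0.01   # quarterly depreciation rate, as quoted above


def efficiency(k_abate):
    """Efficiency factor f(K^e) = phi_1 * (K^e) ** phi_2."""
    return PHI_1 * k_abate ** PHI_2


def energy_services(k_abate_prev, raw_energy):
    """E^s = f(K^e_{t-1}) * E^r."""
    return efficiency(k_abate_prev) * raw_energy


def raw_energy_needed(k_abate_prev, services):
    """Raw energy required to deliver a given level of energy services."""
    return services / efficiency(k_abate_prev)


# Doubling abatement capital cuts the raw energy needed for the same service.
print(raw_energy_needed(1.0, 1.0), raw_energy_needed(2.0, 1.0))

# 1% per quarter compounds to slightly under 4% per year.
annual_depreciation = 1 - (1 - DELTA_Q) ** 4
implied_lifetime_years = 1 / (4 * DELTA_Q)   # simple 1/rate approximation
print(round(annual_depreciation, 4), implied_lifetime_years)
```

The lifetime figure uses the rough rule that expected lifetime is the inverse of the annual depreciation rate, which reproduces the roughly 25 years mentioned above.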

Q2. What are the four agent types, and how do their asset positions differ?

The model compresses the household distribution into four types (employed workers, first-period unemployed, long-term unemployed, and capitalists), each with sharply different asset positions that determine how they are affected by monetary policy. Employed workers hold positive nominal bonds ($B_{e,t-1} > 0$) and invest in abatement capital ($K^e_{e,t-1}$); they are the only group making active portfolio and investment decisions. First-period unemployed workers consume all their precautionary savings in a single period (their IMRS × R < 1) and receive 75% of unemployment benefits; they hold $B_{e,t-1} > 0$ (inherited from their last employed period) but make no new saving or abatement decisions. Long-term unemployed workers hold zero assets, receive full unemployment benefits indexed to the real wage, and maintain abatement capital at the fixed communal level $\bar{K}^e_u$. Capitalists ($\xi = 12\%$ of the population) own all firms, invest in productive capital and abatement capital, and are net borrowers in the steady state (rich hand-to-mouth in the Kaplan–Moll–Violante sense); they are subject to an endogenous discount factor that stabilizes the capital stock. Risk-sharing among employed workers, who pool their nominal bonds, enables tractability while preserving precautionary saving motives.

Q3. How does a monetary policy shock propagate through energy conservation decisions?

A 0.25 percentage-point positive monetary policy shock reduces abatement capital and raises energy intensity, operating through two reinforcing channels: the labor market channel (more unemployment, fewer households able to invest) and the intertemporal substitution channel (higher returns on nominal bonds reduce the relative attractiveness of abatement investment). Following the shock, the policy rate rise suppresses output and raises unemployment (Figure 3 of the paper). The increase in the job-destruction-net-of-finding probability $\omega(1-\eta_t)$ shifts more workers into the first-period unemployed pool, which carries no abatement investment. Among employed workers, the higher nominal bond return means that saving in bonds is relatively more attractive than investing in illiquid abatement capital, so their abatement holdings fall. The result is a rise in raw energy per unit of consumption, meaning the economy becomes more energy-intensive precisely when energy prices may also be elevated — a double vulnerability.

Q4. What are the welfare effects of different policy rules in response to a persistent energy price shock, and what are the magnitudes?

After a persistent hump-shaped energy price shock, welfare losses (measured as discounted infinite-horizon utility) are smaller for all agent groups under the weak-reaction policy (φ_π = 1.1, φ_y = 0) than under the baseline (φ_π = 2, φ_y = 0), even though inflation is higher under the weaker rule; the welfare gap is largest for employed workers and capitalists, and is broadly preserved under alternative calibrations. Policies that react more weakly to inflation result in a smaller output recession and lower unemployment (Figures 7–9 of the paper). In the welfare simulation (Figure 9), all four agent types (capitalists, employed workers, newly unemployed, and long-term unemployed) show smaller welfare declines under the weak-reaction rule than under the baseline. Capitalists benefit because lower interest rates reduce their debt service and higher output raises firm profits. Employed and unemployed workers benefit primarily because of the higher job-finding rate, which lowers the probability of falling into the HtM state. Additionally, accommodative policy supports more investment in abatement capital, which reduces all agents’ long-run exposure to energy price fluctuations, further boosting welfare. The welfare ranking is broadly robust to three alternative calibrations: (i) benefits fixed in nominal terms (narrower but preserved gap), (ii) more flexible wages (narrower gap, with the ranking reversing for capitalists only), and (iii) larger steady-state household savings (wider gap).

Q5. Why does the “looking-through” policy fail, and how does it differ from the weak-reaction policy?

The looking-through policy (φ_π = 2 on core inflation, ignoring energy-price CPI inflation) does not deliver welfare gains because it creates an asymmetric response profile: it is too accommodative during the energy price surge and too restrictive once energy prices start to fall, generating a welfare trajectory that is inferior to a consistently weaker policy. When energy prices are rising, CPI inflation exceeds core inflation; reacting only to core means the central bank does not raise rates as much as under the baseline, so the policy is more stimulative in the short term and supports output and abatement investment in the near term. However, once energy prices start declining, CPI inflation reverts to the steady state faster than core inflation (which is still elevated due to nominal rigidities), meaning the looking-through policy becomes more restrictive relative to the baseline at precisely the time when agents need support. The result is that long-run welfare, which discounts the entire future path, does not improve under looking-through relative to either the baseline or the weak-reaction rule. This finding provides an important caution against the standard “look through supply shocks” recommendation in a HANK environment with abatement capital.
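
The asymmetry can be made concrete with a stylized comparison of the three rules. The sketch assumes a rule of the form rate = φ_π × inflation (in deviations from steady state, with φ_y = 0) and holds the inflation paths fixed, whereas in the model inflation responds to policy. The CPI and core paths below are hypothetical numbers chosen only to mimic a surge followed by a lagging core, not output from the paper.

```python
def policy_rate(inflation, phi_pi):
    """Assumed simple rule: rate deviation = phi_pi * inflation deviation."""
    return phi_pi * inflation


# Hypothetical inflation deviations (percentage points) over five periods.
cpi = [2.0, 3.0, 2.0, 0.5, 0.0]    # surges with energy prices, then falls quickly
core = [0.5, 1.0, 1.5, 1.0, 0.5]   # rises later and stays elevated longer

baseline = [policy_rate(p, 2.0) for p in cpi]        # phi_pi = 2 on CPI
weak = [policy_rate(p, 1.1) for p in cpi]            # phi_pi = 1.1 on CPI
look_through = [policy_rate(p, 2.0) for p in core]   # phi_pi = 2 on core

for t in range(len(cpi)):
    stance = "looser" if look_through[t] < baseline[t] else "tighter"
    print(t, baseline[t], round(weak[t], 2), look_through[t], stance)
```

In this illustration the looking-through rate sits below the baseline while CPI inflation is surging and above it once CPI inflation has fallen back, while the weak-reaction rate is never above the baseline.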

Q6. What does Ramsey-optimal policy look like, and why does it differ from Taylor-type rules?

Ramsey-optimal policy — which minimizes the volatility of population-share-weighted aggregate utility — features a sharper and faster initial rate spike than the baseline Taylor rule, followed by a more rapid decline; it results in the highest abatement capital investment and lowest energy intensity of all policies considered. The Ramsey planner’s first-order conditions (solved with Dynare’s Ramsey tool, taking private-sector FOCs as constraints) imply that the policy rate peaks before the energy price shock itself peaks, reflecting the planner’s desire to front-load inflation stabilization while ensuring that rates fall quickly enough to not suppress abatement investment in the medium term. The Ramsey rate path is lower than the baseline Taylor rule after the shock peak. Compared with all Taylor-type rules, Ramsey policy results in the largest negative deviation in consumption energy intensity and the largest positive deviation in abatement capital (Figure 8). Ramsey policy also delivers the highest welfare for all agent groups (Figure 9), validating the intuition that protecting abatement investment is an important channel for central bank welfare optimization in this setting.

Q7. What is the role of heterogeneity in shaping these results, and what would be missed by a representative-agent model?

The distributional effects are essential to the paper’s core conclusions: a representative-agent model would miss the asymmetric impact of unemployment risk on energy conservation investment and would fail to generate the welfare reversal whereby a weaker inflation response dominates. Figure 6 of the paper shows the distributional responses to an energy price shock: capitalists reduce energy intensity the most because they can invest in abatement capital and their consumption is less constrained; employed workers also reduce energy intensity but less so; poor HtM households (unemployed workers) cannot adjust abatement capital and their energy intensity rises because the raw energy share in their limited consumption basket increases. The welfare comparison across agent types in Figure 9 shows that even newly unemployed workers — who lose their abatement investment and consume their precautionary savings — are better off under accommodative policy because the higher job-finding rate reduces the expected duration of unemployment. The key heterogeneity-driven mechanism absent from representative-agent models is the labor market channel: changes in unemployment risk affect who can and cannot invest in energy conservation, generating an indirect channel from monetary policy to aggregate energy intensity.

Q8. What are the model’s main limitations and scope conditions?

The paper abstracts from variable policy rule coefficients, wage-price spirals, unanchoring of inflation expectations, and open-economy dimensions beyond energy-price pass-through; the welfare ranking is conditional on the persistent energy price shock used for calibration and should not be extrapolated to short-lived or demand-driven inflation episodes. The authors explicitly note that the model operates under full-information rational expectations, which rules out the possibility that accommodation generates self-fulfilling inflation or credibility loss. Wage rigidity plays an important role: with more flexible wages, the welfare benefit of accommodative policy narrows and the capitalist welfare ranking reverses (baseline strict inflation targeting is preferred by capitalists). The “looking-through” and weak-reaction findings are specific to the persistent, hump-shaped energy price shock analyzed; for short-lived shocks the standard result (no reaction) would reassert itself. The model is also calibrated to the Czech Republic as a small open economy with above-average energy intensity; the qualitative conclusions extend to other European small open economies with similar energy share profiles, but quantitative magnitudes may differ.

Key Concepts

energy conservation capital (abatement capital) : a durable household investment good that converts raw energy into energy services more efficiently; modeled as $E^s = f(K^e_{t-1}) E^r$ with a quadratic abatement function; the level determines the energy intensity of consumption and is chosen optimally only by employed workers and capitalists.

energy intensity of consumption : the ratio of raw energy used to final consumption $E^r / C$; the paper’s key outcome variable for tracking how efficiently households use energy; a rise signals less efficient usage, a fall signals improved conservation.

looking-through policy : a monetary policy rule that reacts to core inflation (excluding energy) rather than CPI inflation, intended to avoid responding to transient supply shocks; the paper finds this does not improve welfare in a HANK setting because it creates an asymmetric response profile that is too accommodative when energy prices rise and too restrictive when they fall.

Ramsey-optimal policy : the interest-rate path that minimizes the volatility of population-share-weighted aggregate utility subject to the full set of private-sector equilibrium conditions; in this model it features a sharper front-loaded rate spike than Taylor-type rules followed by a rapid decline, and delivers the highest welfare for all agent groups by protecting abatement investment.

hand-to-mouth (HtM) households : households that are highly sensitive to income shocks but do not respond to interest rate changes as predicted by the Euler equation; in this model, poor HtM are both types of unemployed workers (zero savings, zero abatement investment), and rich HtM are capitalists (large debt, no labor income); their presence is central to the distributional welfare results.

search-and-matching frictions : the Challe–Ravn–Sterk labor market structure in which the job-finding rate $\eta_t$ is determined endogenously by the vacancy-unemployment ratio (Cobb-Douglas matching function) and job destruction is exogenous at rate $\omega$; this structure makes unemployment risk stochastic and endogenous to monetary policy, creating the key link between policy rates and energy conservation decisions.
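
A minimal sketch of how a Cobb-Douglas matching function maps labor market tightness into the job-finding rate, and how that rate enters the inflow into unemployment $\omega(1-\eta_t)$. The matching efficiency `m`, the elasticity `gamma`, the separation rate `OMEGA`, and the vacancy levels are hypothetical values, picked so that the steady-state job-finding rate is about 0.15.

```python
def job_finding_rate(vacancies, unemployed, m=0.3, gamma=0.5):
    """eta = M / u with M = m * u**gamma * v**(1 - gamma); m and gamma are hypothetical."""
    theta = vacancies / unemployed            # labor market tightness v / u
    return min(1.0, m * theta ** (1.0 - gamma))


def unemployment_inflow(omega, eta):
    """Probability of separating and not finding a new job within the period."""
    return omega * (1.0 - eta)


OMEGA = 0.05  # hypothetical exogenous job-destruction rate

eta_steady = job_finding_rate(vacancies=0.25, unemployed=1.0)   # about 0.15
eta_tight = job_finding_rate(vacancies=0.16, unemployed=1.0)    # fewer vacancies posted

print(round(eta_steady, 3), round(unemployment_inflow(OMEGA, eta_steady), 4))
print(round(eta_tight, 3), round(unemployment_inflow(OMEGA, eta_tight), 4))
```

When vacancies fall, the job-finding rate drops and the inflow into unemployment rises, which is the link from policy tightening to the pool of households that cannot invest in abatement capital.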

A Housing Portfolio Channel of QE Transmission


Layer 1: Overview

Research Question

This paper identifies and quantifies a housing portfolio channel of quantitative easing (QE) transmission that operates through household portfolio rebalancing toward second homes (as opposed to the well-studied bank credit channel). The central question is whether, and how much, the ECB’s formal adoption of QE in January 2015 induced households with larger pre-existing bond holdings to shift wealth into residential real estate—specifically second homes held for investment—and what the downstream effects on regional housing market outcomes were.

Setting and Motivation

Germany is used as the empirical laboratory because it experienced a sustained housing boom from 2009 onward that was not accompanied by a household credit boom—a “housing boom without a credit boom.” The national house price-to-rent ratio rose markedly from 2009, especially accelerating after QE adoption in 2015, while the stock of mortgage credit to households as a share of GDP was flat or declining. This decoupling makes Germany well-suited for isolating a non-credit portfolio rebalancing mechanism.

Data

Household-level data come from the Deutsche Bundesbank’s Panel on Household Finances (PHF), a triennial survey fielded in 2011, 2014, and 2017, from which the authors construct a panel of 1,651 households. The key exposure variable is each household’s pre-QE (2014) share of total wealth invested in bonds, both directly and indirectly via mutual funds and insurance. Regional housing outcomes (prices, rents, rental yields) are from Bulwiengesa AG for all 401 German administrative regions (Kreise) at annual frequency, and listing data come from Immoscout 24, Germany’s largest online real estate platform.

Methodology

The household-level analysis uses a difference-in-differences (DiD) specification comparing changes in housing portfolio shares between the pre-QE wave (2014) and the post-QE wave (2017), against the pre-period change (2011 to 2014), with the degree of exposure measured by the 2014 bond share. The specification includes household and time fixed effects. A parallel-trends check using all three survey waves (Figure 2) shows that more- and less-exposed households tracked identically before QE adoption, diverging sharply thereafter. Two indirect placebo tests—using households’ share in non-financial, non-housing assets as a spurious treatment, and using the change in non-financial assets as a spurious outcome—both return null results, supporting the identification assumption. For regional housing outcomes, the authors use a panel regression interacting lagged ECB debt-securities-to-GDP (the QE intensity measure) with a regional exposure variable—the 2008 pre-QE share of refugees housed in independent accommodations—across 401 regions from 2010 to 2017.
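
The logic of the household-level comparison can be sketched on synthetic data. Everything below is simulated: the bond shares, fixed effects, time effects, noise, and the "true" coefficient of 0.18 are hypothetical values, and the estimator is a simple slope of wave-to-wave changes on the 2014 bond share rather than the authors' exact regression. Differencing removes the household fixed effect, and the time effect is absorbed by the intercept.

```python
import random

random.seed(0)
TRUE_BETA = 0.18   # hypothetical effect of the 2014 bond share after QE
N = 2000           # hypothetical number of households


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


bonds_2014 = [random.uniform(0.0, 0.5) for _ in range(N)]
household_fe = [random.gauss(0.10, 0.05) for _ in range(N)]
TIME_EFFECT = {2011: 0.00, 2014: 0.01, 2017: 0.03}


def second_home_share(h, year):
    post = 1 if year == 2017 else 0
    return (household_fe[h] + TIME_EFFECT[year]
            + TRUE_BETA * bonds_2014[h] * post + random.gauss(0.0, 0.01))


y = {yr: [second_home_share(h, yr) for h in range(N)] for yr in (2011, 2014, 2017)}
pre_change = [b - a for a, b in zip(y[2011], y[2014])]    # placebo window
post_change = [b - a for a, b in zip(y[2014], y[2017])]   # QE window

pre_slope = ols_slope(bonds_2014, pre_change)
post_slope = ols_slope(bonds_2014, post_change)
print(round(pre_slope, 3), round(post_slope, 3))
```

In the simulated data the pre-period slope is close to zero, which is the analogue of the parallel-trends check, while the post-period slope recovers the assumed coefficient.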

Main Findings with Quantitative Magnitudes

Benchmark portfolio rebalancing: A household with an ex-ante bond share that is 10 percentage points higher (roughly the interquartile range of the bond share distribution) increases its portfolio share of second homes by 1.72 to 1.87 percentage points more than a less-exposed household after QE adoption, conditional on household and time fixed effects. This result is statistically significant at the 1% level across multiple specifications and is robust to alternative bond share definitions, alternative portfolio denominators, and controlling for negative interest rate policy exposure (via initial deposit shares).
Equity rebalancing: Controlling for risk aversion does not attenuate the second-home result. Strikingly, households with larger ex-ante bond shares reduce, rather than increase, their equity shares after QE (coefficient: −0.042, significant at 5%), ruling out the interpretation that the housing result merely picks up broad rebalancing toward all risky assets. This implies that cash purchases of second homes are funded by liquidating bonds, drawing down deposits, and also selling equities.
Heterogeneity—household characteristics: Rebalancing is stronger for (a) bank-advised households (triple-interaction significant at 5%), (b) financially more literate households (significant at 1%), and (c) households aged 40–60 (significant at 5%), consistent with a lifetime-income-peak, tax-optimization motive rather than a bequest motive. The result for age 61+ is positive but statistically insignificant.
Tax-motive heterogeneity: In Germany, rented-out second homes (or those declared for future letting) benefit from substantial tax deductions not available for owner-occupied primary residences, with the advantage rising in marginal tax rates. Rebalancing is stronger for higher-income households (triple interaction with income per capita positive and significant, especially after controlling for deposit shares) and for church-affiliated households, who face an additional 8–9% church tax surcharge on their regular tax bill, amplifying the tax gain from rental property deductions. For church members, the income-interaction triple coefficient is statistically significant; for non-church members it is not, directly linking the rebalancing gradient to the church tax burden.
Buy-to-let motive: The benchmark result is driven entirely by households that already owned a second home in the pre-QE period and were generating rental income from it (coefficient 0.821, significant at 1%); households without a pre-owned second home show a near-zero, statistically insignificant coefficient (0.000). This establishes that the rebalancing is driven by experienced buy-to-let investors, not vacation-home buyers or commuters.
Credit channel control: The portfolio rebalancing result is not driven by credit access or credit growth. The triple interactions of the bond-share × Post term with both (a) pre-QE leverage (mortgage credit to housing wealth) and (b) post-QE mortgage credit growth are statistically insignificant. Restricting the sample to households with no mortgage credit growth leaves the main coefficient essentially unchanged (0.175, significant at 1%). Nonetheless, an independent credit-channel effect is also present: mortgage credit growth has its own positive and significant effect on second-home share increases, confirming the two channels operate in parallel but independently.
Regional housing market outcomes—prices and yields: In regions more exposed to rental market tightness (higher refugee-in-independent-accommodation share), QE is associated with larger declines in rental yields. A one-standard-deviation increase in QE (approximately 4.3 pp higher ratio of ECB debt securities to GDP) reduces the rental yield in the 75th-percentile-exposure region relative to the 25th-percentile region by 2 to 12 basis points per year (depending on whether the refugee share or the renter share is used as the exposure measure). As ECB holdings rose from 7% of GDP in 2014 to 24% in 2017, the cumulative implied rental yield decline at the regional interquartile range is 8 to 48 basis points, sizable relative to the average regional rental yield decline of 140 basis points (from 7.4% to 6.0%) over the same period. House prices increase more than rents in more exposed regions.
Regional housing market outcomes—listings: Using Immoscout 24 data, both sale and rental listings decline in more exposed regions as QE expands, but the ratio of sale to rental listings falls significantly: sale listings decrease significantly more than rental listings in more exposed regions. This relative shift in supply toward the rental market is interpreted as evidence consistent with the buy-to-let motive documented at the household level and as potentially having benign implications for housing affordability through increased rental supply.
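
The cumulative rental-yield figures quoted in the regional findings can be reproduced from the numbers in the text. The sketch below uses only values stated above (ECB holdings of 7% and 24% of GDP, a 4.3 pp standard deviation, an effect of 2 to 12 basis points per standard deviation, and a 140 basis point average decline); the variable names are hypothetical.

```python
ECB_2014_PCT_GDP = 7.0
ECB_2017_PCT_GDP = 24.0
SD_QE_PP = 4.3                    # one standard deviation of the QE measure
EFFECT_PER_SD_BP = (2.0, 12.0)    # interquartile yield effect per SD (low, high)
AVG_YIELD_DECLINE_BP = 140.0      # average regional decline, 7.4% to 6.0%

# Number of standard deviations by which QE expanded between 2014 and 2017.
n_sd = (ECB_2017_PCT_GDP - ECB_2014_PCT_GDP) / SD_QE_PP

# Cumulative implied interquartile yield decline, in basis points.
cumulative_bp = tuple(effect * n_sd for effect in EFFECT_PER_SD_BP)

# Share of the average regional yield decline that this represents.
share_of_avg = tuple(c / AVG_YIELD_DECLINE_BP for c in cumulative_bp)

print(round(n_sd, 2))
print([round(c, 1) for c in cumulative_bp])
print([round(s, 3) for s in share_of_avg])
```

The expansion is close to four standard deviations, which is consistent with the reported range of roughly 8 to 48 basis points.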

Scope Conditions

All household-level findings are conditional on the German institutional setting: Germany’s combination of a low-homeownership norm, substantial tax incentives favoring rental properties, triennial household survey data spanning two pre-QE waves (2011 and 2014) and one post-QE wave (2017), and a housing boom that was decoupled from household credit prior to 2015. The regional results apply to 401 German administrative regions (Kreise) over 2010–2017, using exposure instruments that are argued to capture rental-market tightness or depth rather than direct household bond holdings.

In depth

Q1. What is the housing portfolio channel of QE transmission, and how does it differ mechanically from the credit channel?

A: In the housing portfolio channel, the ECB’s bond purchases reduce the net supply of bonds available to private investors, raising bond prices and reducing expected bond returns. Under the assumption that bonds and houses are substitutes in household portfolios, households with larger initial bond positions rebalance toward housing to restore their target allocation, bidding up house prices. This mechanism operates through changes in risk premia rather than through future short-term rates or bank reserves and loan supply. The credit channel, by contrast, operates through increased bank reserves enabling expanded mortgage lending. The authors show empirically that the two channels operate in parallel and independently, but that greater prior credit access and post-QE mortgage credit growth do not amplify the portfolio rebalancing effect.

Q2. What is the key exposure variable and why is it a valid identification strategy?

A: The exposure variable is each household’s 2014 (pre-QE) share of total wealth invested in bonds, including both direct holdings and indirect holdings via mutual funds and insurance companies. The logic, drawn from the bank-portfolio-rebalancing literature (Rodnyansky and Darmouni, 2017; Luck and Zimmermann, 2020) and from the authors’ own portfolio model, is that the larger a household’s bond share, the stronger its incentive to rebalance when the central bank reduces bond supply. Identification rests on the parallel-trends assumption: Figure 2 shows that before 2015, more- and less-exposed households (defined by a median split on the 2014 bond share) followed identical trends in second-home shares; the trends diverge sharply post-QE. Two indirect placebo tests corroborate this: using a spurious treatment variable (non-financial, non-housing asset share) and using a spurious outcome (change in non-financial, non-housing asset share) both yield null results.

Q3. What is the benchmark magnitude of the portfolio rebalancing effect and how robust is it?

A: A 10-percentage-point higher 2014 bond share (the approximate interquartile range) is associated with a 1.72–1.87 percentage point larger increase in the second-home portfolio share post-QE relative to the pre-QE period (Table 3, columns 1–2, significant at 1%). This result is robust to: scaling second-home shares by a model-consistent denominator (bonds + housing + deposits, column 3); using total housing wealth instead of second-home wealth alone (column 4); using the count of second homes rather than their value share to rule out valuation-effect confounds (column 5); using direct bond holdings without imputation, or indirect holdings only, as alternative exposure measures (columns 7–8, where the coefficients are if anything larger at 0.403 and 0.420); controlling for a broad set of time-varying household characteristics including net worth, age, household size, financial literacy, and risk aversion (Table 4, range 0.19–0.23); and explicitly controlling for the deposit-share post-interaction to rule out the negative interest rate policy as a driver (column 6, main bond coefficient unchanged at 0.122).

Q4. Do households with higher bond exposure also rebalance toward equities after QE?

A: No. Column (7) of Table 4 shows that households with larger ex-ante bond shares reduce their equity shares after QE adoption (coefficient: −0.042, significant at 5%). This rules out the interpretation that the second-home finding merely captures broad rebalancing toward all risky assets due to general risk-appetite changes. Combined with the evidence that deposit shares also decline (though not precisely estimated), the result implies that households fund second-home purchases by selling bonds, drawing down deposits, and reducing equity positions.

Q5. Which household characteristics amplify the rebalancing, and what do they reveal about the mechanism?

A: Five characteristics are shown to amplify rebalancing (Table 5 and Table 7): (1) being actively advised by a bank on asset allocation (triple interaction significant at 5%), consistent with banks that own real estate agencies steering clients toward property; (2) higher financial literacy (significant at 1%), consistent with more informed investors acting more quickly on QE-induced return differentials; (3) middle age (40–60), significant at 5%, but not older age (61+), ruling out bequest motives and pointing to households near their lifetime income peak optimizing their tax burden; (4) higher income per capita (positive and significant, especially among church members), reflecting the progressive German tax schedule that makes property-related deductions more valuable; and (5) church affiliation (the income-triple interaction is significant only for church members, who face an 8–9% church tax surcharge, amplifying the tax advantage of rental property ownership). Tenure status (renter vs. owner of main residence) shows that both groups rebalance, but the triple interaction is significant only at 10%, suggesting the effect is not confined to existing homeowners.
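
The interaction between income and church affiliation can be illustrated with a stylized tax calculation. The sketch assumes that the church tax is a proportional surcharge on the income tax bill, so that each euro of deduction saves marginal rate × (1 + surcharge); it ignores other surcharges and the deductibility of the church tax itself. The marginal rates and the deduction amount are hypothetical.

```python
def tax_saving(deduction, marginal_rate, church_surcharge=0.0):
    """Tax saved by a deduction when church tax is a surcharge on the tax bill (assumed)."""
    return deduction * marginal_rate * (1.0 + church_surcharge)


DEDUCTION = 1000.0   # hypothetical rental-property deduction in euros
SURCHARGE = 0.09     # upper end of the 8-9% range quoted above

for rate in (0.25, 0.42):   # hypothetical lower and higher marginal tax rates
    member = tax_saving(DEDUCTION, rate, SURCHARGE)
    non_member = tax_saving(DEDUCTION, rate)
    print(rate, round(non_member, 2), round(member, 2), round(member - non_member, 2))
```

Under these assumptions the extra saving for church members grows with the marginal tax rate, which matches the direction of the income gradient described for church-affiliated households.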

Q6. How is the buy-to-let motive established directly in the data, as opposed to vacation-home or commuter motives?

A: The authors use variation in whether households owned a second home and generated rental income from it before QE adoption (Table 8). Households that owned a second home and reported rental income in the pre-QE wave rebalance very strongly (coefficient 0.821 on Bonds × Post, significant at 1%). Households that owned a second home but did not generate rental income show a positive but imprecisely estimated coefficient (0.641, significant at 10% in a very small sub-sample of 138 households). Critically, households that did not own any second home prior to QE show a coefficient of essentially zero (0.000). This pattern establishes that rebalancing is driven by experienced buy-to-let investors rather than by households acquiring second homes for personal use, and is consistent with the income-seeking motive documented in the Australian context by Gargano and Giacoletti (2022).
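
The Bonds × Post coefficients quoted in these answers come from a difference-in-differences design with a continuous treatment. As a minimal sketch, assuming only two survey waves and household fixed effects (a simplification, not the paper's exact specification), the interaction coefficient equals the OLS slope from regressing each household's change in second-home share on its ex-ante bond share. The function name and the toy numbers below are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def bonds_post_coefficient(
    bond_share_pre: Sequence[float],
    second_home_pre: Sequence[float],
    second_home_post: Sequence[float],
) -> float:
    """OLS slope of the change in second-home share on the ex-ante bond share.

    With two periods and household fixed effects, this first-difference slope
    is the coefficient on Bonds x Post.
    """
    n = len(bond_share_pre)
    changes = [post - pre for pre, post in zip(second_home_pre, second_home_post)]
    mean_x = sum(bond_share_pre) / n
    mean_dy = sum(changes) / n
    s_xy = sum((x - mean_x) * (dy - mean_dy) for x, dy in zip(bond_share_pre, changes))
    s_xx = sum((x - mean_x) ** 2 for x in bond_share_pre)
    return s_xy / s_xx


# Toy panel (made-up numbers): four households with the same pre-QE second-home share.
bonds = [0.0, 0.1, 0.2, 0.3]
pre = [0.10, 0.10, 0.10, 0.10]
post = [0.11, 0.13, 0.15, 0.17]
beta = bonds_post_coefficient(bonds, pre, post)
print(f"Bonds x Post coefficient: {beta:.3f}")
```

In the toy panel the second-home share rises by 0.2 percentage points for every percentage point of ex-ante bond share, so the slope is 0.2 by construction. Parallel trends is what licenses reading such a slope as the effect of QE exposure rather than as a pre-existing difference in trajectories.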

Q7. How does the paper demonstrate that the effect is independent of the credit channel, while also acknowledging the credit channel operates?

A: The paper employs three complementary tests (Table 6). First, triple interactions of the Bonds × Post coefficient with pre-QE leverage (mortgage-to-housing-wealth ratio) and with post-QE mortgage credit growth are both statistically insignificant (columns 5–6 of Table 5), meaning that greater credit access does not amplify the bond-share rebalancing effect. Second, restricting the sample to households with zero mortgage credit growth between 2014 and 2017 leaves the main coefficient unchanged at 0.175 (column 1 of Table 6). Third, including the two credit variables as additional controls only marginally reduces the bond-share coefficient without affecting its significance (columns 2–3 of Table 6). At the same time, column 3 of Table 6 shows that mortgage credit growth does have its own statistically significant positive effect on second-home shares (coefficient 0.009, significant at 1%), confirming a separate, independently operating credit channel.

Q8. How is regional exposure to the channel proxied, given that household survey data cannot be aggregated to the regional level?

A: Because the 1,651-household panel provides only 3–4 observations per region on average across 401 German Kreise, the authors cannot construct representative regional averages of household bond shares. Instead, they use the pre-QE (2008) share of refugees housed in independent accommodation in each region as developed by Bednarek et al. (2021), arguing that a larger refugee share creates tighter rental housing market conditions and therefore makes buy-to-let investment more attractive. For robustness, they also use the 2011 census share of renters in each region as an alternative measure of rental market depth. Both regional exposure variables take higher values in urban areas (refugee share: 21% urban vs. 10% rural; renter share: 70% urban vs. 46% rural), consistent with household-level rebalancing being stronger in urban regions.

Q9. What are the quantitative effects on regional rental yields, house prices, and rents?

A: Table 9 shows that a one-standard-deviation increase in QE (approximately 4.3 percentage points higher ECB debt securities-to-GDP ratio) reduces the rental yield in a region at the 75th percentile of the exposure distribution relative to one at the 25th percentile by 2 basis points per year (using the refugee share) to 12 basis points per year (using the renter share). Comparing the 5th vs. 95th percentile of exposure, the yield differential is 5–24 basis points per year. Over the full 2014–2017 QE expansion (from 7% to 24% of GDP, roughly four standard deviations), the cumulative implied rental yield decline at the interquartile range of exposure is 8 to 48 basis points, sizable relative to the average regional decline of 140 basis points. House prices increase more than rents in more exposed regions. Using the Campbell-Shiller decomposition, about 70% of return variation is attributable to future price-to-rent increases, 36% to lower future rent growth (consistent with more rental supply), and only 5% to discount rate differentials.
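
The cumulative figures follow from scaling the per-standard-deviation effects to the size of the full QE expansion. The sketch below redoes that arithmetic using only the numbers quoted in the answer; the linear scaling and the variable names are assumptions made for illustration.

```python
# Back-of-the-envelope check of the cumulative rental-yield effect.
# Inputs are the figures quoted in the text; linear scaling is an assumption.

QE_SD_PP = 4.3      # one standard deviation of QE, in percentage points of GDP
QE_START_PP = 7.0   # ECB debt securities-to-GDP ratio at the start of the expansion (%)
QE_END_PP = 24.0    # ratio at the end of the expansion (%)

# Interquartile effect on the rental yield per standard deviation of QE (basis points)
EFFECT_PER_SD_BP = {"refugee share": 2.0, "renter share": 12.0}


def cumulative_effect_bp(effect_per_sd_bp: float) -> float:
    """Scale a per-standard-deviation effect to the full QE expansion."""
    n_sd = (QE_END_PP - QE_START_PP) / QE_SD_PP  # about 3.95 standard deviations
    return effect_per_sd_bp * n_sd


for measure, effect in EFFECT_PER_SD_BP.items():
    print(f"{measure}: {cumulative_effect_bp(effect):.1f} bp")
```

The expansion of 17 percentage points is just under four standard deviations, which gives about 7.9 and 47.4 basis points, in line with the rounded range of 8 to 48 basis points.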

Q10. What do the listing data reveal about the supply implications of the channel?

A: Table 10 shows that QE reduces both sale and rental listings in more exposed regions (both significant at 1%), consistent with the aggregate national decline visible from 2015 onward. Critically, the ratio of sale listings to rental listings declines significantly in more exposed regions: sale listings fall more than rental listings (columns 3 and 6, significant at 1% with both exposure measures). This relative shift implies that the share of properties available for rent increases relative to properties available for sale in regions more exposed to the portfolio rebalancing channel, providing evidence of an expanded rental supply. This finding is interpreted as a potentially beneficial side effect of QE-induced buy-to-let investment for housing affordability, to the extent that a larger rental supply mitigates rent increases even as house prices rise.

Q11. What is the theoretical model underlying the empirical analysis?

A: The model (Appendix C) features a representative local household with mean-variance preferences managing a portfolio of bonds, housing, and cash (equities are omitted for tractability). Preferred habitat investors segment both the national bond market and the local housing market. QE reduces the fixed net supply of bonds, raising bond prices and reducing expected bond returns. Under the substitutability of bonds and houses, households rebalance toward housing to restore optimal allocation, bidding up house prices; the larger the initial bond share, the larger the required rebalancing. Housing supply constraints determine how much rebalancing depresses expected housing returns (rental yields). The model does not unambiguously predict the response of the cash (deposit) share, motivating the empirical investigation reported in column (6) of Table 3.
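
The direction of the rebalancing can be illustrated with a textbook mean-variance allocation. This is a partial-equilibrium sketch and not the paper's Appendix C model: returns are taken as given rather than determined by fixed asset supplies, cash is treated as a riskless asset with zero excess return, and every parameter value is hypothetical. It shows only that, when bond and housing returns are positively correlated (substitutes), a lower expected bond return raises the optimal housing share.

```python
from typing import Tuple


def mean_variance_weights(
    mu_bond: float,
    mu_house: float,
    sd_bond: float,
    sd_house: float,
    corr: float,
    risk_aversion: float,
) -> Tuple[float, float, float]:
    """Return (bond, housing, cash) weights w = (1/gamma) * inv(Sigma) * mu.

    mu_* are expected excess returns over cash; the 2x2 covariance matrix
    is inverted by hand so the example needs no third-party library.
    """
    var_b = sd_bond ** 2
    var_h = sd_house ** 2
    cov = corr * sd_bond * sd_house
    det = var_b * var_h - cov ** 2
    w_bond = (var_h * mu_bond - cov * mu_house) / (risk_aversion * det)
    w_house = (var_b * mu_house - cov * mu_bond) / (risk_aversion * det)
    return w_bond, w_house, 1.0 - w_bond - w_house


# Hypothetical parameters; QE is represented as a fall in the expected bond return.
common = dict(mu_house=0.04, sd_bond=0.05, sd_house=0.10, corr=0.3, risk_aversion=10.0)
before_qe = mean_variance_weights(mu_bond=0.02, **common)
after_qe = mean_variance_weights(mu_bond=0.01, **common)

print("before QE (bond, housing, cash):", [round(w, 3) for w in before_qe])
print("after QE  (bond, housing, cash):", [round(w, 3) for w in after_qe])
```

In this sketch the cash weight is simply the residual, so its response depends on the chosen parameters, which echoes the point that the model gives no unambiguous prediction for the deposit share.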

Q12. What are the aggregate household balance sheet patterns consistent with the individual-level results?

A: Table 1 shows that Germany’s aggregate household real estate share rose from 55% of total assets in 2014 to 56–57% in 2017–2018, while the bond share declined by roughly 0.5 percentage points. The homeownership rate declined by about 1 percentage point over the sample period (from 52.5% in 2014 to 51.4–51.5% in 2017–2018), consistent with an increasing share of landlords and renters. This is compatible with the buy-to-let mechanism, since more than 60% of German renters lease from other households. Household leverage also declined (loans-to-assets from 13% in 2014 to 12% in 2017), consistent with portfolio rebalancing rather than credit-driven housing acquisition. The deposit share remained constant over the period, weighing against the negative-interest-rate policy as a driver of portfolio rebalancing.

Key Concepts

Housing portfolio channel of QE transmission: The paper’s central concept—a mechanism by which central bank bond purchases (QE) induce households holding bonds to rebalance their portfolios toward second homes held for investment (buy-to-let), operating through changes in risk premia (bond prices and expected returns) rather than through bank lending channels or future short-term interest rates.

Ex-ante bond share (QE exposure measure): Each household’s share of total wealth invested in bonds (direct holdings plus indirect holdings via mutual funds and insurance) measured in the 2014 pre-QE survey wave. Used as a continuous household-level treatment intensity: the larger this share, the stronger the portfolio pressure to rebalance when the ECB reduces bond supply to the private sector. The interquartile range of this share is roughly 10 percentage points.

Buy-to-let motive: In the paper’s usage, the investment purpose of purchasing second homes specifically to rent them out—or to declare them for future letting—in order to exploit Germany’s substantial tax advantages for rented properties (depreciation allowances, deductibility of mortgage interest, management costs, and property taxes against rental income), which are unavailable for owner-occupied primary residences. Distinguished from vacation-home or commuter motives by the presence of pre-QE rental income.

Segmented housing markets / preferred habitat investors: Assumptions embedded in the paper’s theoretical model (following Flavin and Yamashita, 2002; Gete and Reher, 2018; Greenwald and Guren, 2021) that local real estate markets are insulated from national or international housing markets, and that some investors have a binding preference to hold bonds or local housing, so that QE-induced price changes in the bond market are not fully arbitraged away by shifting into liquid alternatives.

Parallel trends (DiD validity): The identifying assumption that, absent QE, households with larger and smaller initial bond shares would have followed the same trajectory in their second-home portfolio shares. The paper documents this graphically using all three survey waves (Figure 2) and supports it with two indirect placebo tests involving unrelated treatment and outcome variables.

Regional rental yield: The rent-to-price ratio at the regional (Kreise) level, derived from Bulwiengesa data. Used as the primary regional outcome variable because it jointly captures discount rate, rent-growth, and price-to-rent dynamics. In the German regional panel, a Campbell-Shiller decomposition splits its predictive content into three components: discount rates (5%), future rent growth (36%), and future price-to-rent ratio changes (70%).

Sale-to-rental listing ratio: The ratio of sale listings to rental listings for apartments on Immoscout 24, used as a quantity-side outcome variable. A decline in this ratio in more-exposed regions is interpreted as evidence of a relative increase in rental supply, consistent with the buy-to-let motive and with potentially beneficial implications for housing affordability.

Church tax (Kirchensteuer): A German institutional feature—formally affiliated church members pay an additional 8–9% surcharge on their regular income tax bill (varying by state). Because the tax advantage of owning rental property is proportional to the marginal tax rate, church members face a higher effective marginal tax rate and thus derive larger tax benefits from buy-to-let investment, producing stronger QE-induced portfolio rebalancing for this sub-group.

A Model of Multiple Hypothesis Testing

This paper develops an economic framework for determining when and how much multiple hypothesis testing (MHT) adjustment is warranted in research settings. The research question is: under what conditions do MHT adjustments arise as an optimal solution to incentive misalignment between a researcher and a mechanism designer (social planner)?

The model is a two-stage game. In the first stage, a benevolent social planner commits to a hypothesis testing protocol. In the second stage, a researcher decides whether to conduct a pre-specified experiment based on private costs and benefits. The planner’s utility function combines an ambiguity-averse (maximin) component—limiting harm from mistaken conclusions—with an expected-utility component capturing the generic benefits of research production. The framework focuses on multiplicity arising from testing multiple treatments or estimating effects within multiple subpopulations; multiple outcomes are treated as an economically distinct case covered in a companion paper.

The main theoretical result is that separate t-tests are uniformly globally optimal under linearity of the researcher’s payoff and welfare functions and normality of test statistics. The optimal critical value takes the explicit form: t(J, Σ) = Φ⁻¹(1 − C(J, Σ) / (b · |J|)), where |J| is the number of hypotheses, C(J, Σ) is the experiment cost, and b is the researcher’s per-rejection benefit. This formula nests two limiting cases. When costs are fully fixed (invariant to |J|), the formula delivers a Bonferroni correction. When costs scale proportionally with the number of hypotheses, no MHT adjustment is warranted—because the researcher already faces sufficient deterrent from the incremental cost of each additional test.
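
The formula and its two limiting cases can be checked numerically. The sketch below is an illustration, not the paper's code: it normalizes b = 1 and uses a benchmark level of 0.05 (the symbol ᾱ in the Q&A further down), both of which are assumptions, and the function names are hypothetical. The test level implied by the critical value is C(J, Σ) / (b · |J|).

```python
from statistics import NormalDist


def optimal_level(cost: float, benefit: float, n_hyp: int) -> float:
    """Per-test level implied by the formula: C(J, Sigma) / (b * |J|)."""
    return cost / (benefit * n_hyp)


def optimal_critical_value(cost: float, benefit: float, n_hyp: int) -> float:
    """t(J, Sigma) = Phi^{-1}(1 - C(J, Sigma) / (b * |J|))."""
    return NormalDist().inv_cdf(1.0 - optimal_level(cost, benefit, n_hyp))


ALPHA_BAR = 0.05  # assumed benchmark level for a single hypothesis
B = 1.0           # per-rejection benefit, normalized to 1 (assumption)

for n_hyp in (1, 2, 5, 10):
    # Fully fixed cost: C does not change with |J|, giving Bonferroni.
    fixed = optimal_level(ALPHA_BAR, B, n_hyp)
    # Proportional cost: C grows one-for-one with |J|, giving no adjustment.
    proportional = optimal_level(ALPHA_BAR * n_hyp, B, n_hyp)
    t_fixed = optimal_critical_value(ALPHA_BAR, B, n_hyp)
    print(f"|J|={n_hyp:2d}  fixed: level={fixed:.4f}, t={t_fixed:.3f}  "
          f"proportional: level={proportional:.4f}")
```

With fixed costs the level falls as 0.05/|J| and the critical value rises with the number of hypotheses; with proportional costs the level stays at 0.05 for every |J|.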

The key economic mechanism is as follows. In the worst states of the world (where all treatments are harmful relative to the status quo), a research study has only downside risk for society. The planner must keep the researcher’s expected payoff from false positives low enough that she chooses not to experiment. If critical values were invariant to |J|, for sufficiently many hypotheses the researcher’s expected payoff from false positives alone would exceed costs, inducing unwanted experimentation. Some upward adjustment to critical values (i.e., tighter thresholds) is therefore generically optimal. The same logic implies that critical values should also adjust for sample size, since larger samples raise costs.

The framework is calibrated to two empirical applications. For FDA clinical trial approval, using Sertkaya et al. (2016) data on approximately 31,000 U.S. pharmaceutical trials (2004–2012), fixed costs constitute approximately 46% of average total trial cost. At a benchmark significance level of 5% and benchmark sample size, the optimal level is approximately 3.2% for two tests, 2.6% for three tests, and asymptotes to approximately 1.4% as |J| → ∞. Sidak’s correction yields 2.5% and 1.7% for two and three tests respectively, and tends to zero as |J| → ∞—more conservative than the model implies. Optimal adjustments must also be less conservative for larger samples to preserve researcher incentives to bear the correspondingly larger costs.

For program evaluation in development economics, the paper uses a unique dataset of funding proposals submitted to J-PAL from 2009 to 2021. The estimated cost elasticity with respect to the number of treatment arms ranges from 0.13 to 0.22 (p < 0.05), indicating costs rise significantly but far less than proportionally. The implied optimal significance levels are slightly less conservative than Bonferroni/Sidak corrections but more conservative than unadjusted testing.

Scope conditions: the framework assumes pre-specified experiments (no p-hacking), linear payoffs, normally distributed statistics, and a researcher whose preferences are common knowledge. The analysis focuses on multiple treatments and subpopulations, not multiple outcomes. Results extend to imperfectly informed researchers and heterogeneous variances.

Q: What is the core mechanism by which MHT adjustments arise as optimal in this framework? A: The planner must deter experimentation in the worst-case states—those where all treatments are harmful. If the testing protocol did not adjust for the number of hypotheses, a researcher testing sufficiently many hypotheses could earn enough expected payoff from false positives alone to justify experimentation, even when all treatments are truly harmful. Tighter critical values (higher thresholds) reduce the probability of false positives and thus cap the researcher’s expected payoff in the null space, deterring unwanted experimentation. This is the maximin optimality condition: the researcher’s expected payoff must be non-positive over the null space.
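
A small numerical sketch of this deterrence argument follows. It assumes the linear payoff from the baseline model and evaluates the researcher at a least-favorable null point where each separate test rejects with probability equal to its level; the cost and benefit values are hypothetical.

```python
def expected_payoff_under_null(level: float, n_hyp: int, benefit: float, cost: float) -> float:
    """Researcher's expected payoff when every rejection is a false positive.

    Each of the n_hyp tests rejects with probability `level`, each rejection
    pays `benefit`, and running the experiment costs `cost`.
    """
    return benefit * n_hyp * level - cost


BENEFIT = 1.0      # hypothetical per-rejection benefit b
FIXED_COST = 0.2   # hypothetical cost that does not vary with |J|

for n_hyp in (1, 2, 5, 10):
    # Unadjusted 5% tests: the payoff grows with |J| and eventually turns positive.
    unadjusted = expected_payoff_under_null(0.05, n_hyp, BENEFIT, FIXED_COST)
    # Cost-adjusted level C / (b * |J|): the payoff is held at zero for every |J|.
    adjusted_level = FIXED_COST / (BENEFIT * n_hyp)
    adjusted = expected_payoff_under_null(adjusted_level, n_hyp, BENEFIT, FIXED_COST)
    print(f"|J|={n_hyp:2d}  unadjusted payoff={unadjusted:+.3f}  adjusted payoff={adjusted:+.3f}")
```

With these numbers the unadjusted payoff becomes positive once |J| exceeds C / (b · 0.05) = 4, which is the point at which a researcher would experiment even though all treatments are harmful.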

Q: What are the two limiting cases of the optimal critical value formula, and what do they correspond to? A: The optimal level of the separate t-tests is α(J, Σ) = C(J, Σ) / (b · |J|). When C(J, Σ)/b = ᾱ (costs are fixed, invariant to the number of hypotheses), this reduces to ᾱ/|J|, the Bonferroni correction. When C(J, Σ)/b = ᾱ · |J| (costs scale proportionally with the number of hypotheses), the optimal level equals ᾱ regardless of |J|, so no MHT adjustment is warranted. The intuition for the second case is that proportional costs already deter excess testing: the researcher has no undue incentive to test many hypotheses because each additional test adds the same incremental cost.

Q: Why do optimal critical values also depend on sample size, and what is the policy implication? A: Since research costs C(J, Σ) increase with sample size (Σ captures design features including sample size), the optimal test level α(J, Σ) = C(J, Σ)/(b·|J|) rises with sample size. Equivalently, larger studies warrant less conservative significance thresholds. The policy implication is that a single uniform correction (e.g., Bonferroni at the 5% level) applied without regard to sample size is suboptimal: it is too conservative for large studies and would over-deter valuable high-powered research.

Q: What are the two optimality properties required of protocols in the paper’s main characterization? A: The paper shows (Proposition 3.1) that a protocol is uniformly globally optimal—optimal for all values of the welfare weight λ and prior π—if and only if it is both maximin optimal and unbiased. Maximin optimality (Proposition 3.2) requires two conditions: the researcher’s expected payoff must be non-positive over the null space (deterring experimentation when all treatments are harmful), and expected welfare must be non-negative when some treatments are beneficial. Unbiasedness requires that the researcher’s maximum power strictly exceeds the test size, ensuring that experimentation is motivated when treatments are genuinely beneficial.

Q: How does the paper rationalize conventional hypothesis testing asymmetry (type I vs. type II error weighting) without extreme restrictions? A: In Tetenov (2012), justifying 5%-level testing with minimax regret in a single-agent model requires the decision-maker to place 102 times more weight on type I than type II regret—an extreme restriction. In this paper, the asymmetry arises naturally from the planner’s desire to prevent harmful treatment implementation: the planner is willing to forgo some power (probability of detecting beneficial treatments) to ensure that harmful treatments are not implemented. The researcher’s private incentives and the planner’s objective diverge in a way that makes tight size control endogenously optimal.

Q: What does the FDA empirical calibration imply quantitatively about optimal versus standard adjustments? A: Using Sertkaya et al. (2016) data showing that fixed costs are 46% of average total trial cost for U.S. pharmaceutical trials, and using Pocock et al. (2002) to set J̄ = 3 (average number of subgroups), the paper calculates that at a benchmark level of ᾱ = 0.05: the optimal level is approximately 3.2% for two tests, 2.6% for three tests, and asymptotes to approximately 1.4% as |J| → ∞. By contrast, Sidak’s correction yields 2.5%, 1.7%, and zero, respectively. Both the unadjusted 5% and the Sidak/Bonferroni levels are therefore suboptimal—the unadjusted level is too permissive while standard FWER corrections are too conservative.
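
The quoted levels can be reproduced under one simple reading of the calibration, which is an assumption of this sketch rather than something the summary states: costs are affine in the number of tests, C(J) = F + v · |J|, the fixed part F is 46% of total cost at the benchmark of J̄ = 3 tests, and the level is scaled so that a single test is run at ᾱ = 5%. The variable and function names are hypothetical.

```python
ALPHA_BAR = 0.05     # benchmark level for a single test
FIXED_SHARE = 0.46   # fixed costs as a share of total cost at the benchmark
J_BAR = 3            # benchmark number of tests

# Normalize the per-test cost v to 1 and solve F / (F + v * J_BAR) = FIXED_SHARE for F.
V = 1.0
F = FIXED_SHARE * V * J_BAR / (1.0 - FIXED_SHARE)


def optimal_level(n_tests: int) -> float:
    """Level proportional to cost per test, scaled so one test gets ALPHA_BAR."""
    cost_per_test = (F + V * n_tests) / n_tests
    return ALPHA_BAR * cost_per_test / (F + V)


def sidak_level(n_tests: int) -> float:
    """Sidak per-test level that holds the family-wise error rate at ALPHA_BAR."""
    return 1.0 - (1.0 - ALPHA_BAR) ** (1.0 / n_tests)


# As the number of tests grows, cost per test tends to v, so the level tends to this limit.
asymptote = ALPHA_BAR * V / (F + V)

for n_tests in (2, 3):
    print(f"|J|={n_tests}: optimal={optimal_level(n_tests):.3f}, Sidak={sidak_level(n_tests):.3f}")
print(f"limit as |J| grows: optimal={asymptote:.3f}, Sidak tends to 0")
```

Under these assumptions the sketch returns about 3.2% and 2.6% for two and three tests with a limit of about 1.4%, against Sidak levels of about 2.5% and 1.7%, matching the figures in the answer.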

Q: What do the J-PAL data reveal about optimal MHT adjustment in program evaluation? A: Using the universe of J-PAL funding proposals from 2009 to 2021, the paper estimates the cost elasticity with respect to the number of treatment arms to be 0.13–0.22, which is statistically significant (p < 0.05) but far below 1 (the proportional case). This means costs rise with the number of arms but much less than proportionally. As a result, optimal significance levels for program evaluation studies are slightly less conservative than Sidak/Bonferroni corrections (e.g., approximately 3.8–4.5% versus 2.5% for a two-arm study with ᾱ = 5%) but more conservative than unadjusted testing. The testing thresholds also vary moderately with sample size, with larger samples implying less conservative procedures.

Q: When are cross-study MHT adjustments warranted according to the framework? A: Cross-study MHT adjustments are warranted only when there are cost complementarities across those studies. If studies are conducted independently with separate cost structures, each study’s costs do not depend on the number of hypotheses tested in other studies, so no cross-study adjustment is optimal. This provides a principled resolution to the disputed question of whether researchers should correct for tests performed in other papers.

Q: When is FWER control (e.g., Bonferroni or Sidak) the appropriate form of MHT adjustment? A: Appendix B.2 shows that FWER control is appropriate when the researcher’s payoff is nonlinear—specifically when the researcher requires at least one positive finding to receive any benefit (e.g., to publish). In the baseline linear payoff model, average size control (Bonferroni) is the correct adjustment only when all costs are fixed. The broader insight is that the form of compound error control—whether average error rate or FWER—is itself determined by economic fundamentals rather than being a statistical choice made in advance.

Q: How does the paper extend to cases of heterogeneous variances across hypotheses? A: Proposition 5.2 shows that under heterogeneous variances, the optimal protocol uses separate t-tests based on sample-equalizing allocations—dividing the sample equally across treatment arms—with critical values t*(J, n(J)) = Φ⁻¹(1 − C(J, n(J))/(b·|J|)), where n(J) is the total sample size. This protocol remains maximin optimal and unbiased, preserving the main qualitative results.

Q: What does the paper contribute relative to Tetenov (2016) on single-hypothesis testing? A: Tetenov (2016) showed that in the single-hypothesis case, separate t-tests are maximin optimal and uniformly most powerful (UMP) unbiased. This paper extends that result to multiple hypotheses, but two major complications arise: first, maximin optimality in the multi-hypothesis case requires verifying that welfare is non-negative even when treatment effects have opposite signs, which requires a non-trivial argument absent in the single-hypothesis case; second, no protocol is UMP unbiased in the multi-hypothesis case, so the paper develops a weaker notion of unbiasedness (power exceeding size) that is sufficient to motivate experimentation.

Q: Why do multiple outcomes require different procedures than multiple treatments or subpopulations? A: Multiple outcomes and multiple treatments are economically distinct types of multiplicity. For multiple outcomes that are noisy proxies for a common underlying quantity, the optimal rule tests an index formed using statistical weights (as in Anderson, 2008). When outcomes capture distinct components of the planner’s utility, economic weights are appropriate. In contrast, multiple treatments or subpopulations lead to separate t-tests with cost-adjusted critical values. Conflating these two forms of multiplicity leads to incorrect inferences about what procedures are appropriate.

Maximin optimality: A hypothesis testing protocol is maximin optimal if it maximizes the planner’s worst-case welfare across all parameter values, equivalent to two conditions: deterring researcher experimentation over the null space (where all treatments are harmful), and ensuring non-negative expected welfare when some treatments are beneficial.

Unbiasedness (in the paper’s sense): A protocol is unbiased if the researcher’s maximum achievable power strictly exceeds the test size, ensuring that experimentation is motivated when treatments are genuinely beneficial. This is a weaker condition than UMP unbiasedness, which does not exist in the multi-hypothesis case.

Uniform global optimality: A protocol is uniformly globally optimal if it maximizes the planner’s objective for all values of the welfare weight λ ≥ 0 and all priors π over the parameter space, making it robust to uncertainty about the relative importance of deterrence versus research motivation.

MHT correction factor: Defined as C(J, Σ) / (C̄ · |J|), this factor captures how the cost per test varies as the number of hypotheses grows. It equals 1/|J| (Bonferroni) when all costs are fixed, and equals 1 (no correction) when costs are proportional to the number of tests; the empirically appropriate correction lies strictly between these extremes.

Cost function C(J, Σ): The private cost borne by the researcher for conducting the experiment, which depends on both the set of treatments J and the experimental design Σ (including sample size). The degree of optimal MHT adjustment is a direct function of how this cost varies with the number of hypotheses tested.

Global null space Θ₀(J): The set of parameter vectors θ for which the welfare effect of implementing any combination of treatments is strictly negative—i.e., the status quo of no treatment dominates all interventions. Maximin optimality requires deterring researcher experimentation over this set.

Cost complementarities across studies: Cost structures in which conducting multiple studies together is cheaper than conducting them separately. Cross-study MHT adjustments are warranted if and only if such complementarities exist; absent complementarities, each study’s optimal threshold is set independently of others.

A Robust Test for Weak Instruments for 2SLS with Multiple Endogenous Regressors

This paper develops a test for instrument strength based on the bias of two-stage least squares (2SLS) that: (1) generalizes the Stock-Yogo (2005) and Sanderson-Windmeijer (2016) tests to be robust to heteroskedasticity and autocorrelation (HAC), and (2) extends the Montiel Olea-Pflueger (2013) robust test from models with a single endogenous regressor to models with multiple endogenous regressors—the important remaining gap identified by Andrews et al. (2019). The test is based on a weighted quadratic loss in the asymptotic bias of 2SLS and can use either the Stock-Yogo absolute bias criterion or the 2SLS bias relative to Montiel Olea-Pflueger’s worst-case benchmark. Extensions are developed to test whether instruments are weak for individual 2SLS coefficients. In simulations, the test controls size and is powerful, and the authors provide efficient code packages. The test is applied to state-dependent fiscal multipliers (Ramey-Zubairy 2018).

In depth

Q1. What is the key gap in the existing weak instrument testing literature that this paper fills?

The key gap is the absence of a test for weak instruments that is both HAC robust and applicable to models with multiple endogenous regressors. Stock-Yogo (2005) requires conditionally homoskedastic and serially uncorrelated (CHSU) errors. Montiel Olea-Pflueger (2013) introduced a HAC-robust effective F-statistic for a single endogenous regressor but their test does not extend to multiple regressors. Sanderson-Windmeijer (2016) addressed multiple endogenous regressors but retained the CHSU assumption. This paper combines HAC robustness with multiple-regressor generality, filling the gap Andrews et al. (2019) identify as the most important remaining open problem in the literature.

Q2. What is the test statistic and what are its two bias criteria?

The test statistic is based on a weighted quadratic loss in the asymptotic bias of the 2SLS estimates when first-stage coefficients are close to zero, with two criteria: (i) the absolute bias criterion of Stock-Yogo (2005)—the 2SLS bias relative to the maximum OLS bias; and (ii) the 2SLS bias relative to Montiel Olea-Pflueger’s (2013) worst-case benchmark. The test accommodates both the Stock-Yogo setting (instruments weak because the first-stage coefficient matrix is near rank zero) and the Sanderson-Windmeijer setting (instruments weak because the first-stage coefficient matrix is near having a rank reduction of one rather than near rank zero).

Q3. What extensions are provided for individual coefficient testing?

Extensions are developed to test whether instruments are weak for individual 2SLS coefficients, by applying the test to a transformed regression that isolates the coefficient of interest, accommodating the Sanderson-Windmeijer (2016) setting in which one regressor is locally under-identified while others may not be. This is important in practice because researchers with multiple endogenous regressors often care about whether instruments are weak for each coefficient separately, not just for the system as a whole; the extension provides a formal basis for this common applied practice.

Q4. What does the empirical application show?

The paper demonstrates the testing procedures in the context of estimating state-dependent fiscal multipliers as in Ramey and Zubairy (2018), where the two endogenous regressors are lagged spending interacted with a state variable (recession/expansion indicator), illustrating both the implementation of the test and how inference differs from relying on CHSU-based critical values. In simulations, the test controls size accurately and is powerful against alternatives where instruments are strong, providing a reliable and practically useful tool with efficient code packages distributed for applied researchers.

Key concepts

weak instruments test: a test assessing whether the first-stage regression is sufficiently strong to make 2SLS inference reliable; based on the maximum bias of 2SLS relative to a benchmark; weak instruments cause 2SLS to inherit the bias of OLS.

HAC robustness: robustness to heteroskedasticity and autocorrelation; absent from Stock-Yogo (2005), meaning researchers who use their critical values while allowing for HAC errors in second-stage inference apply mismatched validity assumptions.

effective F-statistic: the statistic introduced by Montiel Olea and Pflueger (2013) for HAC-robust weak instruments testing with a single endogenous regressor; generalized in this paper to the multiple-regressor setting.

absolute bias criterion: the criterion that the 2SLS relative bias (standardized absolute bias) is below a threshold; equivalently, the 2SLS bias as a proportion of the maximum OLS bias; defined by Stock-Yogo (2005) and generalized here to the HAC-robust multi-instrument setting.

A Tale of Two Bailouts and Their Impact on Subprime Consumer Debt

This paper examines the effects of the Troubled Asset Relief Program (TARP) and the Paycheck Protection Program (PPP)—two government bailout programs during the Global Financial Crisis and the COVID-19 crisis, respectively—on subprime consumer debt, using over 11 million credit bureau observations of individual consumer debt combined with banking, bailout, and local market data. TARP and PPP are found to have opposite effects: subprime consumers in markets with more TARP institutions experienced significantly increased debt burdens following the bailouts, while PPP was associated with reduced subprime consumer debt. Both programs are treated as quasi-natural experiments due to their rapid, largely unanticipated assembly. The findings yield policy implications regarding bailout structures and the conditions attached to bailout funds.

In depth

Q1. What are the two bailout programs studied and why are they treated as natural experiments?

TARP (2008) and PPP (2020) are treated as quasi-natural experiments because they were assembled quickly during crisis conditions and were largely unanticipated, providing relatively exogenous financial shocks to markets based on the presence of eligible institutions, rather than on prior local demand for credit. Both programs had distinct structures and intended targets—TARP aimed at stabilizing financial institutions directly, while PPP aimed at supporting small business payrolls to prevent employment losses—making their differential effects on subprime consumer debt informative about the channels through which bailout design matters.

Q2. How did TARP affect subprime consumer debt and why?

Subprime consumers in markets with more TARP institutions had significantly increased debt burdens following TARP, consistent with a channel in which bank stabilization via TARP relaxed credit supply conditions (especially for lower-quality borrowers) or with a moral hazard channel in which TARP-recipient banks extended credit more aggressively knowing they had government backing. Subprime mortgages played a central role in the buildup to the GFC, growing from 2.5% to 8.4% of mortgage balances outstanding between 2001 and 2007; the finding that TARP increased rather than reduced subprime debt burdens raises concerns about whether bank stabilization programs sufficiently constrain the subsequent lending behavior of recipient institutions.

Q3. How did PPP affect subprime consumer debt and why?

PPP was associated with reduced subprime consumer debt, consistent with a channel in which the payroll support prevented the expected wave of unemployment-driven debt distress and credit score deterioration that would otherwise have converted prime consumers into subprime borrowers during the COVID-19 crisis. Prior to PPP, the COVID-19 recession—with unemployment peaking at 14.7% in April 2020—was expected to cause a ballooning of subprime consumer debt; the failure of this ballooning to materialize and the actual decline in subprime debt is attributed in part to PPP’s employment and income support function.

Q4. What are the policy implications for bailout design?

The opposite effects of TARP (which increased subprime debt) and PPP (which reduced it) yield policy implications for bailout structures and the conditions attached to bailout funds: bailouts directed at banks without explicit restrictions on subsequent lending behavior may inadvertently stimulate the accumulation of high-risk household debt, while bailouts directed at supporting household incomes and employment may reduce systemic credit risk. These findings suggest that the distribution channel of bailout funds (through banks vs. directly to households and employers) has first-order effects on the resulting debt accumulation and credit risk in the household sector.

Key concepts

TARP (Troubled Asset Relief Program) : the 2008 U.S. government program that provided capital injections to financial institutions during the Global Financial Crisis; found in this paper to be associated with increased subprime consumer debt burdens in affected markets.

PPP (Paycheck Protection Program) : the 2020 U.S. government program that provided small business loans/grants to support payrolls during the COVID-19 crisis; found in this paper to be associated with reduced subprime consumer debt, opposite to TARP’s effect.

subprime consumer debt : obligations of consumers with low credit scores; the paper’s key outcome measure; elevated levels associated with systemic credit risk (as seen in the buildup to the GFC) and used as a barometer of financial vulnerability in the household sector.

A Temporary VAT Cut as Unconventional Fiscal Policy

The paper studies Germany’s temporary 3 percentage-point VAT cut from July 1 to December 31, 2020 (standard rate 19%→16%, reduced rate 7%→5%), combining two causal identification strategies with microdata and a HANK model to establish that intertemporal substitution drove a large spending response concentrated in durable goods.
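
As a rough worked example of the statutory change, the sketch below computes the shelf-price change implied by each rate cut. Full pass-through (an unchanged net-of-tax price) is an assumption made only for illustration, and the function name is hypothetical.

```python
def full_pass_through_price_change(old_rate, new_rate):
    """Percent change in the VAT-inclusive price if the net price is unchanged.

    Assumes full pass-through of the rate cut to consumers (illustrative only).
    """
    return ((1 + new_rate) / (1 + old_rate) - 1) * 100


standard = full_pass_through_price_change(0.19, 0.16)  # standard rate 19% -> 16%
reduced = full_pass_through_price_change(0.07, 0.05)   # reduced rate 7% -> 5%

print(f"standard rate: {standard:.2f}%")  # -2.52%
print(f"reduced rate:  {reduced:.2f}%")   # -1.87%
```

Under this assumption a 3 percentage-point rate cut lowers shelf prices by about 2.5 percent rather than 3 percent, because VAT is levied on the net price.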

Ex-ante approach (July 2020 BOP-HH survey, fielded immediately after the cut took effect): The survey distinguishes households informed about the January 2021 reversal (treated) from those who believed the cut was permanent (control). Treated households are approximately 10 percentage points more likely to increase durable purchases on the extensive margin. This is a lower bound on the intertemporal substitution effect because some “control” households likely learned about the reversal before the survey, attenuating the control group’s spending behavior toward that of the treated group.

Ex-post approach (January 2021 BOP-HH survey and GfK scanner data): Cross-household variation in perceived VAT pass-through identifies the spending effect. Households perceiving high pass-through (those who saw prices actually fall at their usual stores) spent approximately 37 percent more on durables in 2020HY2 than those perceiving low or no pass-through (preferred OLS/IV specification, Table 3). GfK scanner data on semi-durables shows approximately 10 percent higher spending for high vs. low perceived pass-through (coefficient ≈ 0.093, Table 5). Non-durable spending shows no statistically significant response. The response grows with the durability of the good and builds over time toward the December 2020 cutoff, both consistent with intertemporal substitution: a more durable good generates larger discounted savings from buying before the reversal, and households can delay a purchase within the cut period as long as it still falls before the January reversal.

Direct evidence of intertemporal pull-forward (Table 4): Households reporting high perceived pass-through in 2020HY2 planned to spend approximately 1,642 EUR less on durables in the first half of 2021 than those with low pass-through in the GfK survey. This direct “spend now, buy less later” pattern confirms temporal shifting rather than a pure income effect.

Cross-sectional heterogeneity: The response is driven by young, low net-wealth households and price-sensitive “bargain hunters” who actively compare prices across stores. Critically, the response is NOT concentrated in financially literate households or those reporting long planning horizons, which distinguishes the VAT policy from forward guidance (which requires understanding and acting on future rate paths) and implies the policy reaches a broad spectrum of household types.

No COVID-19 confound: The paper finds no significant interaction between a household’s pandemic exposure (work disruption, income loss, health shock) and its durable spending response, confirming the intertemporal substitution mechanism operated independently of the concurrent COVID-19 environment.

HANK model (based on the Bayer, Born, Luetticke 2024a two-asset heterogeneous-agent New Keynesian framework, adapted with illiquid durable goods and a Calvo durable-adjustment friction):

Durable adjustment probability per semi-annual period: λ = 18% (Calvo friction calibrated to the spread of the durable spending response through 2020HY2)
Perceived-pass-through heterogeneity: 65% of households perceive high pass-through; perceived average cut among treated = 2.4pp (both calibrated to BOP-HH data)
Calibration targets: durable spending response elasticity = 0.32; X/Y = 0.08 (durable expenditure share); B/Y = 0.86 (liquid bond share); (B+qΠ)/Y = 1.90 (total liquid wealth); G/Y = 0.29; top-10% wealth share = 52%; fraction liquidity-constrained = 18%
Structural parameters: β = 0.92 (semi-annual discount factor); ξ = 2.0 (CRRA coefficient); ϑ = 0.5 (Frisch labor supply elasticity); ν = 0.80 (non-durable expenditure weight); τc = 17.5% (baseline VAT rate); τ = 31% (income tax rate); δ = 5% (semi-annual durable depreciation rate)
Impact effects: total consumption +4.3%; durable consumption +29.4%; the VAT-inclusive price level falls by approximately 1.0pp on impact (less than the 2.4pp perceived cut because of demand-driven upward pressure on prices)
Multipliers at ELB: impact consumption multiplier = 3.0; cumulative two-year consumption multiplier = 1.7
Multipliers with Taylor rule: impact = 2.2; cumulative two-year = 0.9 (lower because the central bank raises nominal rates in response to the demand boost, partly crowding out consumption)
Decomposition: the direct effect, computed holding general-equilibrium objects (wages, asset prices, aggregate demand) fixed, accounts for approximately 90% of the durable consumption response and approximately 4/5 of the non-durable response; the remaining indirect effect operates through positive Keynesian income spillovers
Comparison to interest rate cuts: the VAT cut delivers a larger aggregate consumption response per unit of fiscal cost than a comparable nominal interest rate reduction, because interest rate cuts create countervailing income effects for net savers (who lose interest income) that partially offset the stimulus for net borrowers

Scope conditions: Empirical estimates are local to Germany’s 2020 economic environment (near-zero ECB policy rate, partial COVID-19 demand suppression). The causal identification exploits cross-household variation in perceived pass-through, instrumented by bargain-hunting behavior; the exogeneity assumption requires that price-searching behavior affects spending through perceived prices rather than through other channels. The HANK quantitative results are conditional on the Calvo durable adjustment friction and the 65%/35% perceived-pass-through split; sensitivity to these calibration choices is explored but not the primary focus.

Note on working paper versions: This summary is based on NBER Working Paper 29442 (August 2024 revision), which uses a HANK framework and reports a 4.3% impact on total consumption. A Bundesbank Discussion Paper (24/2025, April 2025) describes the model as a “RANK” (representative-agent) framework with a 4.4% impact. The published RES version (June 2026) may differ from both working paper versions in its model specification; the core empirical findings (37% durable response, 10% semi-durable response, 10pp ex-ante effect) are unlikely to have changed.

In depth

Q1. What is the ex-ante identification strategy, and what does it identify?

The July 2020 BOP-HH survey ran immediately after the VAT cut took effect and identifies the causal effect of expecting a tax cut to be temporary by comparing households informed about the January 2021 reversal (treated) with those who believed the cut was permanent (control); treated households are approximately 10 percentage points more likely to report an intention to increase durable purchases. This is a lower bound on the true intertemporal substitution effect: if some “control” households learned about the reversal through other channels between the survey date and December 2020, they would have behaved more like treated households, compressing the gap. The ex-ante design also measures the extensive-margin decision (whether to increase purchases) rather than the total spending level, so the 10pp estimate is not directly comparable to the 37% ex-post level estimate.

Q2. What is the ex-post identification strategy, and how does it address endogeneity?

The January 2021 BOP-HH survey asks respondents how their 2020HY2 spending compared to a counterfactual without the VAT cut, and instruments perceived price pass-through with bargain-hunting behavior (price comparison across stores) — a variable that predicts who notices price changes but should not directly affect intertemporal allocation decisions. OLS and IV estimates are close (Table 3), suggesting limited endogeneity bias; the IV result of 37% more durable spending for high vs. low perceived pass-through is the preferred causal estimate. GfK scanner data provides an independent corroboration using objective purchase records rather than survey recall, yielding the 10% semi-durable estimate (Table 5, coefficient ≈ 0.093 in IHS-transformed spending).
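
The step from a coefficient of about 0.093 on IHS-transformed spending to "approximately 10 percent" can be checked with a short calculation. The sketch assumes the coefficient can be read like a log-point difference, which holds only for spending levels well above zero; the spending level of 500 is a hypothetical value, not a figure from the paper.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine, log(x + sqrt(x^2 + 1)); defined at zero."""
    return math.asinh(x)


coef = 0.093  # coefficient on the high-pass-through indicator (Table 5)

# Log-point reading: percent difference implied by the coefficient.
approx_pct = math.expm1(coef) * 100

# Exact check at a hypothetical spending level (assumption, for illustration).
low_spending = 500.0
high_spending = math.sinh(ihs(low_spending) + coef)
exact_pct = (high_spending / low_spending - 1) * 100

print(f"approximate: {approx_pct:.2f}%  exact at 500: {exact_pct:.2f}%")
```

Both numbers come out a little under 10 percent, which matches the rounded figure quoted in the text. For spending close to zero the two would diverge, so the percent reading should be treated as an approximation.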

Q3. Why does the response increase with the durability of the good?

A durable good yields a flow of consumption services over multiple periods; purchasing it before the January 2021 VAT reversal locks in tax savings for the entire lifetime of the good, while purchasing a non-durable before the reversal saves taxes only on a single-period consumption unit, so the present-discounted-value gain from intertemporal substitution rises with the good’s durability. This prediction is confirmed empirically: durables (white goods, electronics) show the largest response (37%); semi-durables (clothing, textiles in GfK) an intermediate response (~10%); non-durables no significant response. The spending response also builds toward the December cutoff, with the largest response in November and December 2020, which further supports intertemporal substitution (households delay purchases even within the cut period, maximizing the remaining time advantage).

Q4. Why was the VAT cut effective despite the concurrent COVID-19 shock?

The paper finds no statistically significant interaction between household-level COVID-19 exposure (income loss, work disruption, health shock) and the durable spending response to the VAT cut; the intertemporal price channel operated independently of pandemic-related income and uncertainty effects. This is consistent with the bargain-hunting interpretation: price-sensitive households who actively compare prices adjusted toward durables regardless of their pandemic-specific economic circumstances. The finding also implies that the simultaneous COVID-19 shock does not confound the identification, because the cross-household variation in perceived pass-through is independent of COVID-19 exposure.

Q5. Why is a HANK model appropriate, and what does durable heterogeneity add?

A HANK model is needed because the spending response is driven disproportionately by young, low net-wealth households who at times face binding liquidity constraints. In a representative-agent model every household responds at once to the intertemporal price signal, which would predict a front-loaded spike in spending. In the HANK model with Calvo durable adjustment, constrained households adjust their durable stock only when they receive an adjustment opportunity (λ=18% per semi-annual period), which spreads the response through time and matches the observed gradual build-up of durable spending through 2020HY2. The illiquid-durable extension of the Bayer-Born-Luetticke framework separately tracks liquid financial assets and illiquid durables, allowing the model to capture both the temporal dynamics of the spending response and the cross-household variation in responses across the wealth distribution.
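
To see what a Calvo adjustment probability of 18% per semi-annual period implies for timing, the sketch below treats adjustment opportunities as independent draws with a constant probability. That is the textbook Calvo assumption, applied here uniformly to all households for illustration; the function name is hypothetical.

```python
LAMBDA = 0.18  # probability of a durable-adjustment opportunity per half-year


def share_adjusted(n_periods, lam=LAMBDA):
    """Share of households that have had at least one opportunity after n periods."""
    return 1 - (1 - lam) ** n_periods


# Mean of a geometric distribution: expected periods until the first opportunity.
expected_wait_periods = 1 / LAMBDA
expected_wait_years = expected_wait_periods / 2  # two semi-annual periods per year

print(f"within one half-year: {share_adjusted(1):.1%}")
print(f"within two years:     {share_adjusted(4):.1%}")
print(f"expected wait:        {expected_wait_years:.2f} years")
```

Under these assumptions only about 18 percent of households can adjust during the half-year of the cut and roughly 55 percent within two years, which is why the friction spreads the response over time instead of producing a single spike.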

Q6. What is the impact consumption multiplier, and why is it larger at the ELB?

The impact consumption multiplier, defined as the increase in total consumption divided by the fiscal cost of the VAT cut (measured as the VAT rate reduction times baseline consumption), is 3.0 at the effective lower bound (ELB) and 2.2 with an active Taylor rule. At the ELB, the demand boost from the VAT cut raises inflation expectations; since the nominal rate cannot rise, the real rate falls, providing a secondary stimulus through the intertemporal Euler equation. With an active Taylor rule, the central bank raises the nominal rate in response to higher inflation, crowding out some consumption and reducing the multiplier. The 3.0 impact multiplier exceeds the standard Keynesian multiplier because the durable sector amplifies the effect: a 2.4pp perceived price cut induces a 29.4% jump in durable purchases, whose production generates large income spillovers.
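
Written out, the verbal definition of the impact multiplier in the preceding paragraph takes the form below. The symbols are notation chosen here for illustration and are not taken from the paper: $\Delta C_0$ is the impact change in total consumption, $\Delta\tau^c > 0$ the size of the VAT rate reduction, and $\bar{C}$ baseline consumption.

```latex
M_{\text{impact}} \;=\; \frac{\Delta C_0}{\Delta\tau^c \,\bar{C}}
```

The denominator is the fiscal cost as described in the text (rate reduction times baseline consumption). How the paper aggregates and discounts the cumulative two-year version is not stated in this summary, so it is left unwritten.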

Q7. Why does the cumulative two-year multiplier fall below the impact multiplier?

The cumulative two-year multiplier is 1.7 at the ELB (vs. 3.0 on impact) because durable purchases pulled forward into 2020HY2 create a “payback effect”: households that already upgraded their durables need fewer new purchases in 2021, reducing durable consumption below the counterfactual path for several quarters after the reversal. This is directly documented in Table 4: high perceived pass-through households planned to spend approximately 1,642 EUR less on durables in 2021H1, and the GfK data confirms a spending decline in early 2021. The cumulative multiplier nonetheless remains above 1.0, confirming the policy provides net stimulus over the two-year horizon even accounting for the post-cut hangover.

Q8. Why is the VAT cut more powerful than a comparable interest rate cut?

An interest rate cut stimulates borrowers but simultaneously reduces interest income for net savers, who respond to the lower income by consuming less. The VAT cut lowers current prices for all households without changing the interest rate, so there is no countervailing income effect for savers, and the consumption stimulus is less diluted by redistribution. A further dimension, reflected in the HANK calibration, is that the VAT cut operates through a perceived price channel that requires only that households notice lower prices in stores, a much lower bar than the financial sophistication required to respond to forward guidance or interest rate signals, so the policy reaches a broader share of the household distribution than monetary easing.

Q9. What does the distributional evidence imply for fiscal stimulus design?

Young, low net-wealth households respond most strongly to the VAT cut, the opposite of the pattern expected if the response required financial sophistication; combined with the bargain-hunting identification, this implies the policy’s effectiveness does not depend on forward-looking planning or consumption-smoothing capacity — it is triggered simply by noticing prices are lower at the store. This finding challenges the conventional view that temporary fiscal policies are less effective than permanent ones because households do not optimize over them; instead, the price-noticing channel bypasses the forward-looking optimization entirely and generates a large spending response among households who do not match the life-cycle model assumptions. The distributional progressivity (young, low-wealth households drive the response) also contrasts with unconventional monetary policy (which benefits asset-holders through wealth effects) and improves the equity case for temporary VAT cuts as a stimulus instrument.

Key concepts

intertemporal substitution : the mechanism by which a temporary price reduction — here a VAT cut that will be reversed — induces households to shift consumption from the post-cut period to the cut period; the paper’s primary transmission channel, more powerful for durable goods because the present-value savings scale with the good’s lifetime.

perceived pass-through : the fraction of the statutory VAT rate reduction that a household perceives as an actual reduction in the prices it faces in its usual stores; the paper’s main source of cross-sectional identification in the ex-post strategy, correlated with bargain-hunting behavior.

ex-ante approach : the identification strategy using the July 2020 BOP-HH survey; identifies the causal effect of expecting a cut to be temporary by comparing informed (reversal known) vs. uninformed (thought permanent) households on their intended durable purchase behavior.

ex-post approach : the identification strategy using the January 2021 BOP-HH survey and GfK scanner data; identifies the causal effect of perceived price changes on realized spending by comparing high vs. low perceived pass-through households and instrumenting with bargain-hunting behavior.

payback effect : the reduction in durable spending in 2021H1 among households that pulled forward purchases during the 2020 cut; documented through the 1,642 EUR planned spending gap in Table 4 and GfK scanner data; makes the cumulative two-year multiplier (1.7) substantially lower than the impact multiplier (3.0).

HANK model with durable Calvo friction : the Bayer-Born-Luetticke (2024a) two-asset heterogeneous-agent New Keynesian framework adapted with illiquid durable goods and a Calvo probability of durable adjustment (λ = 18% per semi-annual period); the Calvo friction matches the gradual build-up of the durable spending response through 2020HY2 rather than an immediate front-loaded spike.

Aggregate Implications of Heterogeneous Inflation Expectations: The Role of Individual Experience

Consumers’ inflation expectations are heterogeneous across birth cohorts and history-dependent: using panel data from the Survey of Consumer Expectations (SCE), the paper documents that each cohort’s inflation forecast is anchored to its cumulative inflation history, with the degree of anchoring estimated structurally. The authors model this via an experience-based Kalman filter in which each agent’s forecast combines a common Kalman-filtered signal (derived from food prices) with a cohort-specific reference term built from the cohort’s entire prior sequence of expected inflation. The estimated history-weight parameter θ is negative, confirming that agents positively weight their inflation history rather than overreacting to current news, a pattern that holds not only in US SCE and Michigan Survey of Consumers data but also across six European countries in the ECB Consumer Expectations Survey.

Embedded in a Blanchard–Yaari perpetual-youth OLG New Keynesian model, where households hold experience-based expectations but firms set prices under rational Calvo frictions, the mechanism produces qualitatively different aggregate dynamics from full-information rational expectations (FIRE): after inflationary shocks, expectations initially underreact (agents anchor to the low-inflation steady state) and then persist well beyond the shock horizon as high inflation is gradually incorporated into cohort memory, generating hump-shaped expectation dynamics.

For monetary policy, the optimal Taylor rule must be more aggressive after cost shocks than under FIRE: an energetic early response prevents the high-inflation episode from entering cohort memories, avoiding a self-reinforcing upward drift in inflation expectations. Applied to the 2021 high-inflation episode, the model predicts that the youngest cohorts, experiencing high inflation for the first time, will exhibit persistently elevated inflation expectations long after the supply shocks that caused the episode have dissipated.

In depth

What are the four empirical patterns in the survey data, and do they hold outside the US?

Using the New York Fed’s Survey of Consumer Expectations (a monthly panel from 2013), the paper documents four patterns: (i) inflation expectations differ substantially across birth cohorts; (ii) cohort-specific inflation experience is age-clustered; (iii) individual inflation history is positively correlated with individual inflation expectations; (iv) cohorts do not differ in how they update to current information once their own inflation history is controlled for. These patterns hold in the Michigan Survey of Consumers and, with cohort fixed effects, across the ECB Consumer Expectations Survey covering six European countries — suggesting the mechanism is not US-specific.

How does the experience-based Kalman filter work, and what does estimation yield?

Each consumer’s forecast has two components: a standard Kalman filter signal common to all agents (extracted from food price data) and a cohort-specific reference term that is a weighted average of all past expectations formed by that cohort, governed by the parameter θ. Structurally estimated from SCE data using time fixed effects, θ is negative — meaning consumers positively anchor to their inflation history rather than over-extrapolating from current news. In a goodness-of-fit regression, the experience-based Kalman filter predicts observed cohort-level heterogeneity with a slope coefficient of 1.069, dominating lifetime average inflation and lagged inflation as predictors.
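
The two-component structure can be illustrated with a toy example. The sketch below is not the paper's specification: the common signal is a standard scalar Kalman filter for a random-walk state, the cohort reference is an equal-weight average of the cohort's own past forecasts, and the way θ enters (a negative θ pulls the forecast toward the reference) is a hypothetical functional form chosen only to match the sign convention described above. All parameter values and the signal series are made up.

```python
def kalman_local_level(signals, q=0.1, r=1.0, x0=2.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    x, p = x0, p0
    filtered = []
    for y in signals:
        p += q                # predict: variance grows by process noise q
        gain = p / (p + r)    # Kalman gain given signal noise variance r
        x += gain * (y - x)   # update the estimate toward the new signal
        p *= 1 - gain         # posterior variance
        filtered.append(x)
    return filtered


def cohort_forecasts(common, theta, entry):
    """Forecasts of a cohort that starts forming expectations in period `entry`.

    Hypothetical form: forecast = common - theta * (reference - common).
    """
    past, out = [], []
    for f in common[entry:]:
        if past:
            reference = sum(past) / len(past)  # equal-weight memory (assumption)
            forecast = f - theta * (reference - f)
        else:
            forecast = f  # no history yet: rely on the common signal only
        past.append(forecast)
        out.append(forecast)
    return out


# Synthetic signal: ten high-inflation periods followed by ten low ones.
signals = [8.0] * 10 + [2.0] * 10
common = kalman_local_level(signals)
old = cohort_forecasts(common, theta=-0.5, entry=0)     # lived through both
young = cohort_forecasts(common, theta=-0.5, entry=10)  # entered after the drop

print(round(common[-1], 2), round(young[-1], 2), round(old[-1], 2))
```

In this toy setup both cohorts see the same common signal, yet the cohort whose memory includes the high-inflation periods ends with the higher forecast, which is the kind of cohort heterogeneity the text describes.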

What is the general equilibrium model, and how do heterogeneous expectations enter the IS curve?

The model is a Blanchard–Yaari perpetual-youth OLG New Keynesian economy. Each surviving cohort solves a standard Euler equation using the experience-based expectations operator rather than rational expectations, yielding a history-dependent IS curve in which the effective real rate depends on the weighted average of each cohort’s reference inflation. Intermediate goods producers set prices under Calvo frictions with rational expectations, yielding a standard New Keynesian Phillips curve. The central bank follows a Taylor rule. The IS curve’s history-dependence means that past inflationary episodes — absorbed into cohort memory — affect present aggregate demand.

What do the impulse responses show under experience-based versus FIRE expectations?

Under a taste (demand) shock, experience-based expectations generate lower inflation on impact — agents anchor to the low-inflation steady state — but inflation remains elevated for longer as the shock is incorporated into cohort memory. Under a cost (supply) shock, two forces compete: anchoring to the steady state damps initial price pressure, but rational firms can raise prices by more because the IS curve becomes more inelastic; the net effect requires a stronger interest rate response than under FIRE. In both cases, household expectation dynamics are hump-shaped — initial underreaction followed by gradual build-up — consistent with evidence in Angeletos et al. (2021) and Pfajfar and Roberts (2018).

How does the optimal Taylor rule change under experience-based expectations?

After a cost shock the central bank should be more aggressive than under FIRE. The social cost of tolerating a transitory inflationary episode is much higher under experience-based expectations because it permanently shifts cohort memory upward, creating self-reinforcing dynamics in future periods. An aggressive early response prevents the episode from entering cohort references. After a taste shock the optimal response is similarly strong under both FIRE and experience-based expectations, so the memory channel adds little incremental urgency on the demand side.

What does the model predict about the 2021 high-inflation episode?

Feeding the model with actual monthly data through December 2021, average inflation expectations post-2021 are predicted to be both higher and more persistent under experience-based expectations than under FIRE or diagnostic expectations. Young cohorts, who experienced only low inflation in the 2010s, are updating their memory of inflation upward for the first time, creating a cohort-specific anchoring shift. The model implies that the 2021 episode could have long-lasting effects on consumer price expectations even if the supply shocks that caused it are fully transitory.

All Along the Watchtower: Military Landholders and Serfdom Consolidation in Early Modern Russia

This paper investigates the origins of serfdom in early modern Russia, arguing that the institution consolidated primarily through political economy dynamics between the crown and a landholding military class, rather than from economic fundamentals such as labor scarcity, land-labor ratios, or grain trade opportunities. The central argument is that the prolonged defense of Russia’s southern frontier against Crimean Tatar nomadic raids generated a class of military landholders who possessed both the coercive capacity and the political leverage to press the state into restricting peasant labor mobility.

The mechanism runs as follows. The Russian state, lacking the fiscal capacity to pay soldiers directly, granted frontier lands along the Tula defense line to high-ranked soldiers in exchange for military service under the pomest’e system. These lands were selected for their defensive rather than agricultural value and sat on the forest-steppe boundary roughly 180 km south of Moscow. Since soldiers could not farm while on duty and could not compete in free labor markets given the area’s low agricultural attractiveness, the arrangement was only sustainable if peasants were bound to the land. Military landholders collectively petitioned the Tsar repeatedly — with petition volumes peaking during urban uprisings (9 petitions in 1648, 13 in 1682) when the government’s political vulnerability increased the military’s bargaining power — until serfdom was codified in the Law Code of 1649.

The authors test this theory using newly digitized data from the 1678 household census, which records male population by six legally distinct peasant categories across 172 districts of Muscovy, combined with data on landholder estate counts and sizes. The primary empirical finding is that serfs made up approximately 40% of the population in districts on the Tula defense line, compared to roughly 14% nationally. The gap remains at about 25 percentage points after the inclusion of geographic and climatic controls (grain suitability, temperature seasonality, precipitation, terrain ruggedness, river location, distance to Moscow, and regional fixed effects). Placebo tests confirm this pattern is specific to the most legally dependent peasant groups: the defense line is negatively associated with royal peasants and statistically insignificant for church peasants, free peasants, and non-Russian peasants.

To address potential endogeneity of the defense line’s location, the authors construct an instrumental variable using a novel geospatial algorithm. The algorithm computes optimal nomadic invasion routes from Crimea to Moscow via topographic cost rasters (using flow accumulation values as proxies for river-crossing barriers), then intersects these routes with the historically stable forest-steppe boundary (identified through FAO/UNESCO soil types — Podzoluvisols versus Chernozems). Districts at this intersection were 70 percentage points more likely to host the actual defense line. Two-stage least squares estimates confirm and slightly exceed the OLS magnitudes, supporting the causal interpretation.
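
The idea of an optimal route over a cost raster can be sketched with a generic least-cost path search. This is a minimal illustration using Dijkstra's algorithm on a small grid, not the authors' algorithm: the grid, the cost values, and the function name are hypothetical, with high-cost cells standing in for river-crossing barriers.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; stepping into a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]


# Hypothetical raster: column 2 is a river (cost 9) with one ford (cost 2).
grid = [
    [1, 1, 9, 1],
    [1, 1, 9, 1],
    [1, 1, 2, 1],
    [1, 1, 9, 1],
]
total, path = least_cost_path(grid, start=(0, 0), goal=(0, 3))
print(total, path)
```

On this grid the cheapest route costs 8 and detours through the ford at row 2 instead of crossing the river directly, which is the sense in which crossing barriers shape the computed routes.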

The paper further tests two canonical alternative explanations and finds them insufficient. Domar’s (1970) labor-scarcity hypothesis predicts serfdom should be higher where population density is lower; the data show the opposite sign, contradicting this prediction. The Baltic grain trade hypothesis yields only a small, unstable positive interaction between river access to the Baltic and grain suitability, which disappears when the defense line variable is included. A horse race including all variables simultaneously shows the defense line coefficient at approximately 24 percentage points remains stable while alternative predictors become insignificant.

Mechanism tests show that defense line districts had 3.2 more estates per 100 square kilometers than the national average of 2.3, with the excess concentrated in very small (up to 5 serf households) and small (6–25 households) estates — consistent with the state’s strategy of maximizing soldier count by allocating the minimum serf labor sufficient to sustain a cavalryman. A bigram similarity analysis of collective petitions versus the 1649 Law Code yields a correlation coefficient of 0.7 for the top twenty bigrams between a 1637 petition and Chapter 11 (restricting peasant mobility), with no comparable similarity to other chapters. Persistence is documented through 1719, 1795, and 1858 censuses: defense line districts maintained the highest serf concentration through to three years before emancipation in 1861.
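
A bigram similarity of the kind mentioned above can be sketched as a correlation between bigram frequencies in two texts. The details here are assumptions: the top bigrams are taken from the first text, tokenization is a plain whitespace split, and the sample strings are invented stand-ins, not excerpts from the petitions or the Law Code.

```python
from collections import Counter
import math


def bigram_counts(text):
    """Count adjacent word pairs after lowercasing and whitespace splitting."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector; report 0
    return cov / math.sqrt(vx * vy)


def bigram_similarity(text_a, text_b, top_k=20):
    """Correlate counts of text_a's top_k bigrams across the two texts."""
    a, b = bigram_counts(text_a), bigram_counts(text_b)
    top = [bigram for bigram, _ in a.most_common(top_k)]
    if not top:
        return 0.0
    return pearson([a[bg] for bg in top], [b[bg] for bg in top])


# Invented toy strings (hypothetical, for illustration only).
petition = "peasants flee the estate and peasants flee the land"
unrelated = "grain prices rose in the northern towns"

print(bigram_similarity(petition, petition))
print(bigram_similarity(petition, unrelated))
```

A text compared with itself scores 1, and texts sharing none of the top bigrams score 0, so a value near 0.7 for one chapter and nothing comparable for the others would single that chapter out.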

In depth

Q1. What is the paper’s central argument about the origins of Russian serfdom?

A: The paper argues that serfdom consolidated primarily due to political economy dynamics: the crown’s dependence on a landholding military class for frontier defense against steppe nomads gave that class sufficient political leverage to secure the legal restriction of peasant labor mobility. The military landholders’ coercive capacity and proximity to their small estates made labor coercion a viable complement to their military function. This explanation dominates alternative accounts based on labor scarcity, grain trade, or soil quality in all specifications tested.

Q2. What was the Tula defense line and why was it located where it was?

A: The Tula defense line (Great Abatis Line) was a chain of about 40 fort towns stretching over 500 km east-west, centered on Tula approximately 180 km south of Moscow, erected in the 1560s using felled trees, earth mounds, ditches, and watchtowers. Its location on the forest-steppe boundary was determined by two military-logistical constraints: it had to block the main nomadic invasion routes from Crimea, and it had to lie within the forest zone where timber was the cheapest construction material and which provided natural shelter. The paper documents that the defense line area did not differ from the rest of Muscovy in agricultural suitability, annual precipitation, seasonality, or terrain ruggedness — its distinctive feature was purely defensive.

Q3. How large is the estimated effect of defense line proximity on serf concentration?

A: In the unconditional specification, defense line districts had a 30 percentage point higher share of serfs than the rest of the country. After adding geographic controls (grain suitability, seasonality, precipitation, terrain ruggedness, river dummy, distance to Moscow, and regional fixed effects), the coefficient stabilizes at approximately 25 percentage points. Given that serfs averaged about 14% of total population nationally but about 40% in defense line districts, the estimated effect is substantial relative to the baseline.

Q4. How do the authors address endogeneity of the defense line location?

A: They construct an instrumental variable defined as the intersection of two variables: districts lying on the computed optimal nomadic invasion routes (covering 98 of 172 districts, or 57% of the sample), and districts on the forest-steppe soil boundary (38 districts, or 22% of the sample). Their interaction covers 23 districts and is the excluded instrument. In the first stage, this interaction term raises a district’s probability of hosting the actual defense line by 70 percentage points, while the linear terms become essentially zero once the interaction is included. The 2SLS second-stage estimates of the serf-share effect are slightly higher than OLS and statistically significant, confirming the direction and approximate magnitude of the OLS results.
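
The two-stage logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ten toy districts, their values, and the helper names `ols` and `two_sls` are hypothetical and are not the authors' data or code. The sketch keeps the two linear terms as controls in both stages and uses their interaction as the excluded instrument.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_sls(y, d, excluded, controls):
    """2SLS coefficient on the endogenous regressor d.

    excluded: excluded instrument(s); controls: exogenous regressors,
    including the constant, that appear in both stages.
    """
    first_X = np.column_stack([controls, excluded])
    d_hat = first_X @ ols(first_X, d)            # first-stage fitted values
    second_X = np.column_stack([controls, d_hat])
    return ols(second_X, y)[-1]                  # coefficient on fitted d

# Hypothetical toy districts (made-up values, not the paper's data).
on_route     = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 0], dtype=float)
on_boundary  = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 0], dtype=float)
defense_line = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0], dtype=float)
serf_share   = np.array([0.10, 0.12, 0.15, 0.42, 0.38, 0.11, 0.16, 0.20, 0.14, 0.13])

instrument = on_route * on_boundary              # intersection of the two indicators
controls = np.column_stack([np.ones(10), on_route, on_boundary])
beta_iv = two_sls(serf_share, defense_line, instrument, controls)
print(f"2SLS estimate on toy data: {beta_iv:.3f}")
```

With a single binary instrument and only a constant as control, this estimator reduces to the Wald ratio of the difference in mean outcomes to the difference in mean treatment across instrument groups. Standard errors are omitted here and would need a dedicated 2SLS formula rather than the second-stage OLS ones.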

Q5. What does the paper find about Domar’s labor-scarcity hypothesis?

A: The paper finds no support for Domar’s (1970) prediction that serfdom should be more prevalent where labor is scarcer (lower population density). Controlling for grain suitability and geographic factors, population density enters with a positive and statistically significant coefficient at the 5% level — the opposite sign from what Domar’s theory predicts. When the defense line dummy is added, population density becomes insignificant while the defense line coefficient remains at approximately 25 percentage points, consistent with the baseline.

Q6. What does the paper find about the Baltic grain trade hypothesis?

A: An exogenous measure of Baltic trade potential — a dummy for districts with river access to the Baltic, interacted with grain suitability — yields a small and marginally positive effect on serf share in Baltic districts with higher grain suitability. However, this effect disappears when the defense line dummy is included, and is also sensitive to alternative spatial clustering (becoming insignificant at the 300 km clustering radius even without the defense line dummy). The authors interpret this instability as inconsistent with grain trade being a primary driver of serfdom.

Q7. What is the evidence for the estate-size mechanism?

A: Defense line districts had on average 3.2 more estates per 100 square kilometers than the national average of 2.3 per 100 square kilometers. Among estate-size brackets, very small (up to 5 serf households) and small (6–25 serf households) estates were disproportionately concentrated in defense line districts, while the location of medium-sized and large estates was statistically independent of the defense line. This pattern is consistent with the state’s strategy of allocating minimum viable serf endowments to maximize the number of soldiers supportable along the line.

Q8. What is the textual evidence linking military petitions to the 1649 Law Code?

A: A bigram similarity analysis between a 1637 collective petition and Chapter 11 of the 1649 Law Code reveals a correlation coefficient of 0.7 for the top twenty bigrams. The five most common bigrams appear in both texts: “runaway peasants,” “commoner peasants,” “census books,” “search years,” and “tsar’s decree.” This correlation does not extend to other chapters of the Law Code that regulate non-peasant matters, establishing specificity of the legislative influence.
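
A bigram comparison of this kind can be sketched as follows. The tokenization (lower-casing and whitespace splitting), the choice of Pearson correlation over the first text's top-k bigram counts, and the two short sample strings are assumptions made for illustration; the paper's exact preprocessing is not described in this summary.

```python
from collections import Counter
import math

def bigram_counts(text):
    """Count adjacent word pairs in lower-cased, whitespace-tokenized text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)  # undefined if either series is constant

def top_bigram_correlation(text_a, text_b, k=20):
    """Correlate bigram counts in two texts over text_a's k most common bigrams."""
    a, b = bigram_counts(text_a), bigram_counts(text_b)
    top = [bigram for bigram, _ in a.most_common(k)]
    return pearson([a[bg] for bg in top], [b[bg] for bg in top])

# Hypothetical stand-ins for the petition and the Law Code chapter.
petition = ("runaway peasants fled and runaway peasants hid "
            "while census books list runaway peasants")
law_code = ("search for runaway peasants using census books "
            "and return runaway peasants")
similarity = top_bigram_correlation(petition, law_code, k=5)
```

Running the same function against a chapter on unrelated matters would give the placebo comparison described above.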

Q9. How does the timing of collective petitions relate to political crises?

A: Over a corpus of 96 petitions between 1608 and 1698, landholders petitioned on average once per year, but activity spiked sharply during domestic uprisings: 9 petitions in 1648 (the “Salt Riot” urban uprising) and 13 petitions in 1682 (the musketeers’ revolt). These peaks coincide with moments when the government’s political vulnerability increased the military’s bargaining power, and in both cases were followed by legislative concessions — the 1649 Law Code and new decrees in 1683–85 on harsher punishment for harboring runaways, respectively.

Q10. What do the placebo tests show?

A: Regressions of non-serf peasant shares on the defense line dummy show that the defense line is negatively associated with royal peasants and statistically insignificant for church peasants, free peasants, and non-Russian peasants. A placebo test replacing military landholders with merchants and artisans shows no significant defense line effect on the latter group, while Moscow has an 11 percentage point higher merchant/artisan share. The specificity of the defense line effect to legally dependent peasants and military landholders supports the military-political mechanism rather than a generic frontier-area effect.

Q11. How persistent was the spatial distribution of serfdom after 1649?

A: The authors estimate their baseline equation with serf share from the 1719, 1795, and 1858 censuses as dependent variables. Defense line districts maintained disproportionately higher serf densities in all three periods, including when the sample is restricted to the original Muscovite districts to exclude post-18th century territorial acquisitions. By 1858, three years before emancipation, the spatial distribution of serfs remained similar to that observed 200 years earlier at the time of serfdom’s consolidation — despite the defense line having been militarily obsolete for over a century.

Q12. What explains the persistence of serfdom beyond its original military rationale?

A: The persistence reflects a mutually beneficial exchange between the crown and former military landholders. Landholders provided local state capacity — overseeing tax collection, administering military conscription, and adjudicating peasant disputes through estate courts — in lieu of a centralized bureaucracy. In return, the crown granted successive expansions of landholder rights: Peter I equalized military landholdings with hereditary estates in 1714, and Peter III in 1762 freed landholders from military service obligations while retaining their property rights over land and serfs. This fiscal-administrative dependency is also cited as a reason for the late timing and unfavorable-to-peasants terms of the 1861 emancipation reform.

Q13. How does this paper’s explanation relate to Eastern/Western European institutional divergence?

A: The paper argues that while the military revolution in Western Europe generated fiscally capable centralized states with regular infantry armies, Russia’s peripheral nomadic threat prolonged the feudal cavalry model supported by land grants and serf labor. This delayed the formation of Weberian bureaucracy and entrenched what the authors term a “garrison state” — one whose institutions and social structure were shaped primarily by military-security considerations. The paper positions military factors alongside existing divergence explanations emphasizing land property rights, political institutions, demographic regimes, and Enlightenment ideas.

Q14. What is the methodological contribution of the optimal invasion route algorithm?

A: The algorithm uses flow accumulation rasters (proportional to river width and basin size) as a cost function to compute the lowest-cost travel paths from Crimea to Moscow, iteratively penalizing cells within 15 km of each computed route and re-running the path search to generate four distinct routes per origin point (eight total, including routes from the Don River steppe). This produces a high-resolution, geographically continuous measure of military threat exposure that the authors argue provides statistical power in contexts where terrain ruggedness or simple distance measures lack variation — particularly relevant for flat plains with a single threat origin correlated with other variables.
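
The iterate-and-penalize idea can be sketched on a small grid. This is a simplified illustration, not the authors' implementation: it assumes a 4-connected grid, Dijkstra's algorithm as the path search, a multiplicative penalty, and a radius measured in cells rather than kilometers. The uniform cost raster and all function and parameter names are hypothetical.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering cell (r, c) costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt[0]][nxt[1]]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt], prev[nxt] = nd, cell
                    heapq.heappush(heap, (nd, nxt))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def distinct_routes(cost, start, goal, n_routes=4, radius=1, penalty=10.0):
    """Repeat the path search, inflating costs near each route already found."""
    cost = [list(row) for row in cost]  # copy so the caller's raster is untouched
    rows, cols = len(cost), len(cost[0])
    routes = []
    for _ in range(n_routes):
        path = least_cost_path(cost, start, goal)
        routes.append(path)
        near = {(r, c)
                for pr, pc in path
                for r in range(max(0, pr - radius), min(rows, pr + radius + 1))
                for c in range(max(0, pc - radius), min(cols, pc + radius + 1))}
        for r, c in near:
            cost[r][c] *= penalty
    return routes

grid = [[1.0] * 7 for _ in range(7)]  # hypothetical uniform cost raster
routes = distinct_routes(grid, start=(6, 3), goal=(0, 3), n_routes=4)
```

In an application along the lines described above, the grid would be replaced by the flow accumulation raster, and the routes from each origin would then be intersected with district boundaries to flag exposed districts.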

Pomest’e system: The institutional arrangement by which the Russian state granted frontier lands to high-ranked soldiers in exchange for military service, under the rule that “the land must not leave the service.” Unlike hereditary estates, pomest’e holdings were conditional on active service and could not be passed to heirs unless sons continued military service. This system enabled the formation of a permanent cavalry force despite the state’s low fiscal capacity, but required binding peasants to the land to make the arrangement viable for the soldier-landholders.

Serfs (bobyli and dvorovye): In the paper’s 1678 census framework, serfs are defined as the two most legally dependent subgroups of private peasants: cotters (bobyli), who owned no property and worked full-time for their landlord in exchange for payment in kind, and servants (dvorovye), who performed household and support functions on the estate. These groups, which constituted about 14% of the total population nationally, were totally dependent on their landlord and could not retain the marginal product of any part of their labor. After the 1649 Law Code, villeins (krest’yane) gradually converged to this status as well.

Collective petitions (chelobitnye): The primary institutional channel through which the military landholder class communicated collective interests and applied political pressure on the crown in 17th-century Muscovy. The paper documents 96 such petitions between 1608 and 1698, showing that their volume, timing (peaking during urban uprisings), and textual content (closely matching Chapter 11 of the 1649 Law Code) were the proximate mechanism by which landholders converted military leverage into legal codification of serfdom.

Optimal defense line (instrumental variable): The paper’s constructed instrument, defined as the intersection of computed optimal nomadic invasion routes (based on topographic cost rasters approximating river-crossing barriers) and the forest-steppe soil boundary (Podzoluvisols/Chernozems boundary from the FAO/UNESCO Soil Map). This instrument captures the geographically and militarily determined placement of defensive fortifications, purging variation in actual defense line location that might reflect agricultural or economic value.

Garrison state: Used by the authors (adapting Lasswell’s term) to describe a state whose institutions and social structure are shaped primarily by military security considerations. In the Russian context, this refers to the persistence of a feudal cavalry system, land-grant-based military compensation, and labor coercion that together delayed centralized state formation and Weberian bureaucracy relative to Western European states undergoing the military revolution toward regular infantry armies.

Labor coercion complementarity: The paper’s mechanism whereby employers with high coercive capacity (proximity to weapons, military training) can deploy that same capacity to restrict workers’ outside options and extract labor surplus. In the defense line context, soldiers’ military skills and armament made them effective at preventing serf flight and enforcing labor obligations — creating a complementarity between military capacity and serfdom that was absent among merchants or church institutions with comparable landholdings elsewhere.

Bank Information Production Over the Business Cycle

Research Question

Banks produce private information about borrowers that is inherently unobservable to outside researchers. Howes and Weitzner ask whether the quality of this private information is countercyclical — that is, whether banks invest more in learning about borrowers when local economic conditions deteriorate — and whether any such cyclicality reflects endogenous information production incentives rather than exogenous changes in the information environment.

Data and Methodology

The paper uses the Federal Reserve’s Y-14Q Schedule H.1 confidential regulatory data, which covers commercial and industrial (C&I) loans exceeding $1 million originated by bank holding companies with $50 billion or more in total assets. This universe covers 85.9% of all banking sector assets and approximately 70% of all C&I loan volume (as documented by Bidder, Krainer, and Shapiro (2020)). A distinctive feature is that qualifying banks must report their internal probability of default (PD) estimates for each loan to the Federal Reserve. The sample is restricted to newly originated loans from 2014Q4 through 2019Q1 — the window over which PD data are well populated — with at least one year of subsequent observation to allow defaults to materialize. The outcome variable is a binary default indicator equal to one if the borrower defaults within two years of origination (0.41% of firms in the sample).

The measure of information quality is defined as the OLS coefficient on PD when regressing realized default on the bank’s internal PD estimate. A larger coefficient indicates that the bank’s private risk assessment carries more predictive content for realized default outcomes, above and beyond observable firm and loan characteristics. The authors identify cyclical effects by exploiting cross-sectional variation in county-level unemployment rates across the US at each point in time, controlling for bank-by-quarter fixed effects (to absorb supply-side bank-level factors), industry-by-quarter fixed effects, and bank-by-county fixed effects. The key interaction is between PD and the local unemployment rate.
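
One way to write the specification described above is given below. The notation is assumed here for illustration and is not taken from the paper: $i$ indexes loans, $b$ banks, $c$ counties, $j$ industries, $t$ quarters, $\text{UR}_{c,t}$ is the county unemployment rate, and $X$ collects firm and loan controls.

```latex
\[
\text{Default}_{i,b,c,t}
  = \beta\,\text{PD}_{i,b,c,t}
  + \gamma\,\bigl(\text{PD}_{i,b,c,t} \times \text{UR}_{c,t}\bigr)
  + \delta\,\text{UR}_{c,t}
  + \theta^{\prime} X_{i,b,c,t}
  + \alpha_{b,t} + \alpha_{j,t} + \alpha_{b,c}
  + \varepsilon_{i,b,c,t}
\]
```

In this notation $\beta$ is the information-quality measure at a zero unemployment rate and $\gamma$ captures cyclicality: the coefficient on PD at unemployment rate $\text{UR}$ is $\beta + \gamma\,\text{UR}$, so $\gamma > 0$ corresponds to countercyclical information quality.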

Main Findings

The paper establishes three main results:

Banks’ PDs predict default and contain private information. Even after controlling for firm size, leverage, profitability, tangibility, log loan size, loan maturity, loss given default (LGD), loan type fixed effects, bank-quarter fixed effects, and industry-quarter fixed effects, PD remains a statistically and economically significant predictor of realized default. A one-percentage-point increase in PD increases the probability of default by approximately 25 basis points (coefficient of 0.245).
Information quality is countercyclical. A one-percentage-point increase in the local county unemployment rate increases the sensitivity of realized default to PD by approximately 8 basis points, roughly one-third of the average unconditional PD coefficient. When the unemployment rate is above a county’s median, the PD coefficient is approximately three times as large as during low-unemployment periods. Correspondingly, during high-unemployment periods, the total R-squared of a regression predicting default from observable firm and loan characteristics falls from 0.311 to 0.264 (a decline of about 15%; equivalently, the low-unemployment R-squared is about 18% higher), while the marginal contribution of PD to the R-squared increases. This pattern is consistent with observable characteristics doing a worse job at predicting default in bad times, which in turn incentivizes banks to invest more in their internal risk assessments.
The cyclicality is driven by newly originated loans and more information-sensitive loans. The triple interaction between PD, the new-loan indicator, and the unemployment rate is positive and statistically significant across all specifications; the interaction between PD and unemployment for previously issued (non-new) loans is consistently less than half the size of the triple interaction term. The cyclical sensitivity also decreases by more than 0.1 (against a base of 0.08) in the year after origination and continues to fall over the loan’s life. Additionally, a one-standard-deviation increase in log loan size (approximately 1.29) increases the sensitivity of realized default to PD by about 0.085 — roughly one-quarter of the unconditional effect — and a one-standard-deviation increase in LGD (0.158) increases the PD coefficient by 0.098, or about one-third of the unconditional effect. Both the loan-size and LGD interactions are amplified when the local unemployment rate is high, consistent with Dang, Gorton, and Holmstrom (2012). The cyclical sensitivity of information quality is statistically significant only for firms in nontradeable industries (e.g., utilities, construction, retail, professional services), not for tradeable-sector firms.

Scope Conditions

Results are conditional on: large US bank holding companies ($50bn+ in assets) lending to non-financial, non-public domestic corporate borrowers with at least $100k in reported assets; a sample period from 2014Q4 to 2019Q1, covering a predominantly expansionary phase of the US business cycle; and county-level rather than aggregate time-series variation in economic conditions.

Policy Implications

Countercyclical information production implies that bank lending stimulus policies — including interest rate cuts, liquidity facilities, and asset purchase programs — may be less effective in recessions because banks simultaneously increase screening intensity. The marginal borrowers who gain access to credit from stimulus will differ across states of the cycle: in downturns, banks grant credit to fewer but higher-quality firms, so the incremental impact of expanding the credit supply on the number and type of firms funded may be attenuated. The authors connect this mechanism to prior empirical evidence that monetary policy is less effective in recessions (Tenreyro and Thwaites (2016)) and to LTRO and QE program evidence showing no increase in lending to riskier firms.

In depth

Q1. What is the precise definition of “bank information quality” used in this paper, and why is this measure preferred over alternatives?

Information quality is defined as the OLS coefficient β on the bank’s internal PD estimate when predicting realized two-year default in a regression that also includes firm and loan characteristics and a rich set of fixed effects. A higher coefficient indicates that the bank’s private risk assessment contains more predictive content for actual default beyond what is captured by observable firm and loan characteristics. This approach is preferred because it directly quantifies the marginal information content of the bank’s private assessment and can be estimated at the loan level using the cross-sectional variation in county-level economic conditions, rather than relying on aggregate time-series variation that would confound bank supply-side factors.

Q2. How do the authors establish that the PD estimates contain genuine private information rather than merely reflecting publicly observable characteristics?

Column (1) of Table 3 shows a PD coefficient of 0.245 in a regression predicting default without controls. Columns (2) and (3) add firm and loan characteristics (size, leverage, profitability, tangibility, log loan size, maturity, LGD, and loan type fixed effects) plus bank-quarter, industry-quarter, and bank-county fixed effects, and also add the interest rate as an additional control; the PD coefficient remains statistically and economically significant across all specifications. This demonstrates that PD retains predictive power for realized default even after absorbing all variation captured by observable firm-level fundamentals and pricing signals, implying the PD estimate contains private information not contained in observables.

Q3. What is the baseline magnitude of the cyclicality finding, and how is it identified?

A one-percentage-point increase in the county-level unemployment rate increases the PD coefficient by approximately 8 basis points (Table 5, Column 1). This represents about one-third of the average unconditional PD coefficient estimated in Section 3.1. Identification uses bank-by-quarter fixed effects so that the effect is estimated by comparing two loans made by the same bank at the same time to borrowers in counties with different unemployment rates, ruling out bank-level supply-side confounders such as changes in a bank’s cost of capital or risk appetite.
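
The role of the bank-by-quarter fixed effects can be illustrated with a within-group (demeaning) estimator on synthetic data. Everything here is an assumption made for illustration: the cell identifiers, the coefficient values 0.20, 0.08, and 0.01, and the noiseless outcome, which is chosen so that the estimates can be checked exactly. It is not the authors' code and uses no Y-14Q data.

```python
import numpy as np

def demean_by_group(a, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= a[mask].mean(axis=0)
    return out

def within_ols(y, X, groups):
    """OLS after absorbing group fixed effects by demeaning y and X."""
    coef, *_ = np.linalg.lstsq(demean_by_group(X, groups),
                               demean_by_group(y, groups), rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2000
bank_quarter = rng.integers(0, 40, size=n)   # hypothetical bank-by-quarter cell ids
cell_effect = rng.normal(size=40)[bank_quarter]
pd_ = rng.uniform(0.0, 5.0, size=n)          # synthetic PD, in percent
ur = rng.uniform(3.0, 9.0, size=n)           # synthetic county unemployment rate

# Noiseless synthetic outcome: cell effect + PD + PD x UR + UR terms.
y = cell_effect + 0.20 * pd_ + 0.08 * pd_ * ur + 0.01 * ur
coef = within_ols(y, np.column_stack([pd_, pd_ * ur, ur]), bank_quarter)
print(coef)  # coefficients on PD, PD x UR, UR
```

Because each cell's mean is removed, anything common to all loans of one bank in one quarter drops out, and the interaction coefficient is identified only from differences across counties within that cell.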

Q4. How does the split-sample analysis (above/below county-median unemployment) further characterize the cyclicality?

Columns (3) and (4) of Table 4 show that, when predicting default with PD alone (no controls), the PD coefficient is approximately three times as large during high-unemployment periods as during low-unemployment periods, and the R-squared is substantially higher for high-unemployment observations. The R-squared from a regression of default on observable controls alone is 17.8% higher when unemployment is low (0.311 versus 0.264), while the marginal contribution of PD to the R-squared is higher when unemployment is high (going from 0.264 to 0.267, versus 0.311 to 0.313). This pattern — observables explain less but PD explains more in bad times — is consistent with information frictions being more severe in downturns, which in turn raises banks’ incentives to invest in private information production.
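
The percentages in this answer follow directly from the reported R-squared values. The short calculation below makes the arithmetic explicit; the subscripts "low" and "high" are labels introduced here for the low- and high-unemployment subsamples.

```latex
\[
\frac{R^2_{\text{low}}}{R^2_{\text{high}}} - 1 = \frac{0.311}{0.264} - 1 \approx 0.178,
\qquad
\frac{R^2_{\text{high}} - R^2_{\text{low}}}{R^2_{\text{low}}} = \frac{0.264 - 0.311}{0.311} \approx -0.151
\]
\[
\Delta R^2_{\text{high}} = 0.267 - 0.264 = 0.003,
\qquad
\Delta R^2_{\text{low}} = 0.313 - 0.311 = 0.002
\]
```

The same gap is therefore a 17.8% increase measured from the high-unemployment value and a 15.1% decrease measured from the low-unemployment value, while the increment from adding PD is larger in the high-unemployment subsample.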

Q5. How do the authors distinguish endogenous information production from a purely exogenous improvement in information quality during downturns?

Three tests are designed to be difficult to rationalize under a purely exogenous information channel. First, the cyclicality is concentrated in newly originated loans: the triple interaction term (PD × unemployment × new-loan indicator) is positive and statistically significant, while the PD × unemployment interaction for previously originated loans is less than half the size of the triple interaction. If information quality improved exogenously during downturns, there is no clear reason why this improvement would be far larger for loans where the bank is making a new capital commitment. Second, the cyclicality declines by more than 0.1 (relative to a base of 0.08) in the year after origination and continues to fall — simultaneously, the unconditional predictive power of PD increases over the loan life. This divergence is inconsistent with a purely exogenous mechanism. Third, the cyclical sensitivity is concentrated in loans that theory (Dang, Gorton, and Holmstrom (2012)) predicts to have higher information production incentives: larger loans, higher-LGD loans, and loans to nontradeable-sector borrowers.
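
The first test can be written compactly as a triple-interaction specification. The notation is assumed here for illustration: $\text{New}_i$ is an indicator for a newly originated loan, $\text{UR}_{c,t}$ is the county unemployment rate, and the dots stand for the remaining lower-order terms, controls, and fixed effects.

```latex
\[
\text{Default}_{i}
  = \beta\,\text{PD}_i
  + \gamma\,(\text{PD}_i \times \text{UR}_{c,t})
  + \beta^{N}\,(\text{PD}_i \times \text{New}_i)
  + \gamma^{N}\,(\text{PD}_i \times \text{UR}_{c,t} \times \text{New}_i)
  + \dots + \varepsilon_i
\]
\[
\frac{\partial^2\,\mathbb{E}[\text{Default}_i]}{\partial \text{PD}_i\,\partial \text{UR}_{c,t}}
  = \begin{cases}
      \gamma & \text{if } \text{New}_i = 0,\\
      \gamma + \gamma^{N} & \text{if } \text{New}_i = 1.
    \end{cases}
\]
```

In these terms, the reported pattern is $\gamma^{N} > 0$ with $\gamma$ less than half of $\gamma^{N}$, so most of the cyclical sensitivity comes from loans at origination.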

Q6. How do loan characteristics (size and LGD) relate to information quality, and how does this relationship evolve over the business cycle?

Table 7 shows that a one-standard-deviation increase in log loan size (approximately 1.29) increases the sensitivity of realized default to PD by about 0.085, or roughly one-quarter of the unconditional PD coefficient. A one-standard-deviation increase in LGD (0.158) increases the PD coefficient by 0.098, or about one-third of the unconditional effect. Table 8 shows that both of these interaction coefficients have the same sign and are amplified during periods of high unemployment, consistent with Dang, Gorton, and Holmstrom (2012)’s prediction that information production decisions become more sensitive to loan features following negative aggregate shocks.

Q7. What does the tradeable versus nontradeable industry test contribute?

Because nontradeable-sector firms (utilities, construction, retail, transportation, accommodation, food services, information and communication, professional services) are more likely to depend on local demand, the same change in the county-level unemployment rate will have a larger impact on their default probability. Table 9 shows that the cyclical sensitivity of PD’s predictive power — the PD × unemployment interaction — is statistically significant only for nontradeable-sector firms, not for firms in tradeable industries. This provides additional evidence that the mechanism operates through local economic conditions affecting borrower riskiness in a way that raises information production incentives, rather than through some aggregate or bank-level mechanism.

Q8. Do composition effects (changes in the pool of borrowers) account for the main findings?

Table 11 shows that observable loan characteristics — average loan size, interest rate, LGD, and maturity — do not vary meaningfully with the local unemployment rate. Realized default rates increase slightly with unemployment but the effect is not statistically significant. The PD itself increases by only about 3 basis points for a one-percentage-point increase in unemployment (significant only at the 10% level). Loan volume declines: a one-standard-deviation increase in the unemployment rate (1.3 percentage points) leads to a 1.6% decrease in loan volume and a 5.46% decrease in the number of loans. The minimal variation in the risk profile of loans actually granted suggests that composition effects in the pool of approved borrowers are unlikely to explain the main result.

Q9. What are the implications of countercyclical information production for monetary policy transmission?

When unemployment is high, banks screen potential borrowers more intensively, which changes the composition of firms that gain access to credit. Policies designed to expand credit supply (interest rate cuts, liquidity facilities, asset purchase programs) face a more heavily screened pool of potential recipients during downturns. The marginal firms that receive additional credit following a stimulus in a recession will therefore be of higher quality than the marginal recipients in an expansion, implying that the credit transmission of monetary policy reaches a different, and potentially smaller, set of firms in recessions. The authors connect this to three strands of evidence: Tenreyro and Thwaites (2016)’s finding that monetary policy is less effective in recessions; evidence from the Eurosystem’s LTRO program that aggregate lending rose but lending to riskier firms did not; and UK QE evidence finding no stimulation of bank lending.

Q10. How does this paper differ from Becker, Bos, and Roszbach (2020)?

Becker, Bos, and Roszbach (2020) also find that bank credit ratings predict default better in bad economic times, using data from a single Swedish bank and relying on aggregate time-series variation. The present paper differs in three ways. First, it uses cross-sectional variation across US counties within each time period, exploiting bank-by-quarter fixed effects to rule out bank supply-side confounders. Second, it uses loan-level rather than firm-level data, enabling the analysis of how loan characteristics (size and LGD) interact with information quality and cyclicality. Third, Becker, Bos, and Roszbach interpret the cyclicality as exogenous; Howes and Weitzner provide evidence against this interpretation, specifically the concentration in newly originated loans and in loans with characteristics that theoretical models predict should generate higher endogenous information production.

Key Concepts

Bank Information Quality (as used in this paper): The size of the OLS coefficient on a bank’s internal probability of default (PD) estimate in a regression predicting realized loan default. A larger coefficient means the bank’s private risk assessment carries more predictive content for actual default beyond observable firm and loan characteristics. It is a measure of how much private information the PD encodes about borrower risk, not a measure of accuracy in an absolute sense.

Probability of Default (PD), Y-14Q Internal Estimate: Banks’ own model-based estimate of each corporate borrower’s likelihood of defaulting, reported confidentially to the Federal Reserve under Y-14Q Schedule H.1 filings. In the paper, PD is used as the observable proxy for the bank’s private risk assessment; its predictive power for realized default is the object being studied, not the PD level itself.

Countercyclical Information Production: The property that banks’ incentives to invest in learning about borrower quality increase as economic conditions deteriorate. In the theoretical literature the paper tests empirically, the returns to distinguishing between borrower types rise in downturns (because the distribution of borrower quality widens and the consequences of adverse selection increase), inducing banks to produce more private information at loan origination. The paper uses “information quality is countercyclical” to mean that the predictive content of PD for realized default is higher when the local unemployment rate is higher.

Information Sensitivity (of a loan): The degree to which the value of a loan depends on information that is privately held by potential borrowers. Following Dang, Gorton, and Holmstrom (2012), loans are more information-sensitive when they are larger (larger potential loss from adverse selection) or when they have higher loss given default (lower expected recovery value). The paper uses loan size and LGD as proxies for information sensitivity and tests whether banks invest more in information about higher-information-sensitivity loans.

Loss Given Default (LGD): The bank’s estimate of the fraction of the loan’s value that would be lost if the borrower defaults, reflecting the expected recovery value of collateral and other loan features. In the paper, higher LGD (lower recovery) is a proxy for higher information sensitivity, since the consequences of lending to a bad borrower are larger when recovery is low.

Bank-by-Quarter Fixed Effects: A set of fixed effects that absorbs all variation in outcomes attributable to a particular bank at a particular point in time. In the context of this paper, including bank-by-quarter fixed effects means the cyclicality results are identified from variation across counties for loans made by the same bank in the same quarter, ruling out supply-side explanations such as changes in a bank’s cost of capital, risk appetite, or credit standards that affect all of its loans uniformly.

Endogenous versus Exogenous Information Quality: A core distinction in the paper. Exogenous information quality would mean banks passively receive more precise signals about borrowers during downturns regardless of their investment in screening. Endogenous information quality means banks actively choose to invest more in information production during downturns because the returns to distinguishing borrower types are higher. The paper argues its results (especially the concentration of cyclical effects in newly originated loans and in loans with characteristics that theory predicts should generate higher screening incentives) are consistent with the endogenous channel and are difficult to rationalize under a purely exogenous mechanism.

Bank Opacity and Safe Asset Moneyness

This paper studies when a bank is more effective as a supplier of privately produced money-like safe assets (repo, commercial paper), finding that a bank produces safer, more liquid assets when (1) its return on equity (ROE) is relatively lower, and (2) it is relatively more opaque about its balance sheet. A three-period model is presented in which safe asset investors focus on the left tail of the bank asset value distribution that ultimately determines the debt’s moneyness: a higher ROE signals riskier investment activities with higher return volatility, exposing investors to greater left-tail risk and lowering the moneyness of the bank’s debt. Bank opacity mitigates the strength of the ROE-moneyness relationship because opacity limits investors’ ability to infer asset risk, making it optimal for the banking system to maintain a certain level of opacity. Empirical tests on the funding relationships between dealer banks and money market mutual funds (MMFs) confirm that higher ROE leads to MMF withdrawal due to the lower moneyness of the bank’s safe assets.

In depth

Q1. Why does higher ROE lower the moneyness of a bank’s safe assets?

Higher ROE signals that a bank is more likely to be engaging in riskier investment activities with higher return volatility. Safe asset investors care almost entirely about the left tail of the bank asset value distribution, so this volatility exposes them to a higher likelihood of complete insolvency and lowers the moneyness of the bank’s debt. The intuition is asymmetric: for a debt holder, the upside is limited to the contracted interest rate, while the downside involves potential total loss if the bank becomes insolvent. From the safe asset investor’s perspective, a higher ROE thus signals higher left-tail risk rather than higher credit quality, in contrast to the positive signal that higher ROE sends to equity investors.
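
A stylized numerical example can make the left-tail logic concrete. This is not the paper's three-period model: it assumes, purely for illustration, that end-of-period asset value is normally distributed and that the bank is insolvent whenever assets fall below the face value of debt. All numbers and function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def insolvency_probability(mean_assets, vol_assets, debt_face):
    """P(asset value < debt face value) under a normal asset-value assumption."""
    return normal_cdf((debt_face - mean_assets) / vol_assets)

debt = 90.0  # face value of the money-like debt (hypothetical)

# Lower expected return, lower volatility: insolvency needs a 2-sigma loss.
safe_bank = insolvency_probability(mean_assets=100.0, vol_assets=5.0, debt_face=debt)

# Higher expected return (higher expected ROE) but much higher volatility:
# insolvency needs only a 1-sigma loss.
risky_bank = insolvency_probability(mean_assets=104.0, vol_assets=14.0, debt_face=debt)

print(f"safe bank:  {safe_bank:.4f}")   # about 0.0228
print(f"risky bank: {risky_bank:.4f}")  # about 0.1587
```

The debt holder's payoff is capped at the face value, so the higher mean of the second bank brings no benefit to the debt holder, while the fatter left tail raises the chance of loss roughly sevenfold in this example.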

Q2. How does the model formalize the moneyness concept?

In the three-period model, the bank issues a money-like safe asset (deposit) to finance itself, and the household holds it both to transfer wealth intertemporally and to use it as a medium of exchange; moneyness captures both the safety and the liquidity of the asset as experienced by the holder. The model embeds the Gorton-Pennacchi (1990) and Dang-Gorton-Holmström (2012) notion that money-like assets are purposefully designed to be information-insensitive, so that investors have little incentive to acquire private information about them. The model shows how ROE—a piece of public information—nonetheless predicts moneyness and triggers withdrawal.

Q3. Why is bank opacity an equilibrium feature that improves moneyness?

Bank opacity mitigates the predictive power of ROE for the moneyness of safe assets: if investors cannot observe detailed information about the bank’s asset side, they cannot fully infer the riskiness of the investments backing the bank’s debt from the ROE signal. It is therefore optimal for the banking system to maintain a certain level of opacity to preserve the information-insensitive character of its safe assets. This result is consistent with the argument of Dang et al. (2017) that banks are intentionally opaque: opacity is not merely a byproduct of complexity but a deliberate design feature that preserves the moneyness of privately produced safe assets.

Q4. What is the empirical evidence using MMF and dealer bank data?

Empirical tests using data on MMF funding of dealer banks confirm that higher bank ROE leads to MMF withdrawal from the bank, consistent with the model’s prediction that higher ROE reduces the moneyness of the bank’s safe assets for institutional investors; the relationship is attenuated for more opaque banks, consistent with the model’s opacity mechanism. The wholesale banking sector (dealer banks and institutional investors like MMFs) is the natural testing ground because its participants are more informed than retail depositors and therefore more sensitive to signals about the riskiness of the assets backing the bank’s debt.

Key concepts

moneyness of safe assets : the degree to which a financial asset is safe and liquid (traded at par with no questions asked); determined in this paper by how well a bank’s debt protects investors against the left tail of the bank asset value distribution.

return on equity (ROE) as a risk signal : the paper’s key insight that, for safe asset investors (debt holders), higher bank ROE signals riskier investments with higher return volatility rather than lower credit risk; this contrasts with the positive signal ROE sends to equity investors.

information-insensitive safe asset : a financial asset purposefully designed to be immune to private information acquisition by investors (Gorton-Pennacchi 1990; Dang et al. 2012); bank opacity preserves this property by limiting investors’ ability to infer asset-side risk from public signals.

Banking with Inside Money: An Efficiency Analysis

Layer 1: Overview

This paper demonstrates that the canonical efficiency result of Diamond and Dybvig (1983), that banks using maturity transformation can decentralize the first-best risk-sharing allocation, breaks down when banking is conducted with inside money rather than real contracts. The paper constructs a minimal modification of the Diamond-Dybvig (DD) model in which output requires combining labor (supplied by workers) and technology (owned by entrepreneurs), so that bank deposits arise as inside money created ex nihilo when loans are extended. It shows three results. (1) Non-contingent nominal demand deposits cannot reproduce the first-best allocation, because the constraint that nominal deposits earn the same real return as the productive technology prevents banks from providing state-contingent real payoffs. (2) State-contingent deposit rate contracts, which are proposed as an efficiency fix in the DD tradition, also fail to reach the first best: Proposition 2 establishes that contingent deposit rates produce a consumption allocation inconsistent with efficiency (specifically, aggregate consumption at each date cannot satisfy the efficiency ratio required by equation 8), and the allocation under contingent contracts is no better in welfare terms than the non-contingent baseline. (3) Allowing entrepreneurs to liquidate loans before maturity (Proposition 3) likewise leaves the equilibrium inefficient, because competition equalizes deposit and lending rates in a way that prevents supply of goods from matching the efficient schedule across periods. The paper then characterizes when central bank intervention can improve welfare and shows that outside money is not demanded in the baseline economy, limiting the central bank’s leverage, and that the lender-of-last-resort function can prevent bank runs even when efficiency is unachievable.

In depth

Q1. What is the core model and how does inside money arise?

The paper adds a single departure from the original DD real model: output requires labor from workers and technology from entrepreneurs, which introduces a motive for money to be valued — entrepreneurs borrow units of account (inside money/deposits) from banks at date 0 to pay workers’ wages, and these deposits then circulate as a means of payment for consumption goods at dates 1 and 2. Unlike the outside-money models in Allen and Gale (1998), Skeie (2008), and Allen et al. (2014), inside money is created ex nihilo on the bank’s balance sheet when loans are extended — deposits do not represent a transfer of pre-existing funds but are liabilities created through lending. Banks in this model are price-takers and cannot take direct decisions on real investments or liquidations, which are the responsibility of entrepreneurs. This is the key distinction from the DD and subsequent literature: it is the production of deposits in the provision of loans that generates inside money, and it is the impossibility of making these nominal claims produce state-contingent real payoffs that prevents efficiency.

Q2. Why can’t non-contingent nominal deposits achieve the first best?

In any competitive equilibrium with valued deposits, the no-arbitrage condition requires that the real return on deposits equals the real return on the productive technology R in each period, so the ratio of patient-to-impatient consumption (c₂/c₁) for workers must equal R. The first-best allocation, however, requires c₁* and c₂* to satisfy the planner’s Euler equation u′(c₁*) = Ru′(c₂*), which for a coefficient of relative risk aversion greater than 1 implies 1 < c₂*/c₁* < R, not c₂/c₁ = R. This is formalized by comparing the equilibrium allocation (Proposition 1 and the Corollary), where workers’ consumption satisfies cᵢW(1) = 1/p₁ and cᵢW(2) = R/p₁ with p₁ ∈ (0.5, ∞), against the efficiency condition (equation 8). Because the real value of deposits is pinned by the price level in the goods market, and competitive banks have no power to engineer the price adjustments needed to create state-contingency, the nominal deposit contract is generically inefficient. This contrasts with Allen and Gale (1998) and Skeie (2008), where central bank control over either prices or real investment liquidation allows efficient outcomes.
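
The gap between the two consumption ratios can be made concrete under an assumed functional form. The sketch below takes CRRA marginal utility u′(c) = c^(−γ), which is an assumption for illustration and not necessarily the paper's specification; the value of R is hypothetical. Under that assumption the planner's condition gives c₂*/c₁* = R^(1/γ), which lies strictly between 1 and R whenever γ > 1.

```python
# Assumption: CRRA utility, so u'(c) = c ** (-gamma).
# The planner's condition u'(c1) = R * u'(c2) then implies
#   c1 ** (-gamma) = R * c2 ** (-gamma)  =>  c2 / c1 = R ** (1 / gamma).

def first_best_ratio(R, gamma):
    """Ratio c2*/c1* implied by u'(c1*) = R u'(c2*) under CRRA utility."""
    return R ** (1.0 / gamma)

R = 1.5  # hypothetical gross return on the productive technology
for gamma in (1.0, 2.0, 4.0):
    ratio = first_best_ratio(R, gamma)
    print(f"gamma = {gamma}: first-best c2/c1 = {ratio:.4f}, "
          f"equilibrium c2/c1 = {R}")
```

Only in the log case (γ = 1) do the two ratios coincide; for any γ > 1 the equilibrium ratio R is too steep relative to the first best.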

Q3. What is the formal result on state-contingent deposit contracts?

Proposition 2 establishes that introducing contingent deposit rates (paying a higher rate to impatient depositors, id₂(1) > id₂(2)) yields an aggregate allocation in which total consumption at date 1 is at most 2 (the liquidation value) and total consumption at date 2 is at least 2R — the same aggregate feasibility constraints as the non-contingent case — and this allocation is incompatible with efficiency and no better in welfare terms than the baseline. The reason is structural: for goods to be supplied at both dates 1 and 2, the rate id₂(2) must satisfy id₂(2)·(P₁/P₂) < R ≤ id₂(1)·(P₁/P₂), but this means only impatient entrepreneurs supply goods at date 1, leaving the aggregate supply schedule identical to the non-contingent case. Even if banks had perfect information about depositor types and could implement contingent contracts without incentive compatibility concerns, the first-best allocation would remain outside the consumption possibility set of the competitive equilibrium.

Q4. What is the result on early loan liquidation?

Proposition 3 shows that allowing entrepreneurs to choose how much of their loan to repay early (at date 1 versus date 2) produces a unique equilibrium in which entrepreneurs are indifferent about when to liquidate, equilibrium deposit and loan rates satisfy id₁ = ib₁ = 0 and (1 + ib₂)(P₁/P₂) = (1 + id₂)(P₁/P₂) = R, and the resulting allocation remains inefficient. The key constraint is unchanged: competition across banks drives both deposit and lending rates to equalize in real terms, so the supply of goods at each date is still not controlled by the bank and cannot reproduce the first-best schedule. Allowing borrowers to prepay their loans does not alter the fundamental tension between fixed nominal contracts and state-contingent real outcomes.

Q5. When can banks be welfare-dominated by bilateral trade?

In the symmetric equilibrium (P₁ = D₁), the banking allocation gives E(uB) = λu(1) + (1−λ)u(R), which is welfare-dominated by the bilateral labor market allocation E(uLM) whenever the coefficient of relative risk aversion and/or the technology return R exceed a threshold — specifically, when agents are risk averse enough that the midpoint consumption available under bilateral bargaining (2R/(R+1)) is preferred to the lottery {1 with probability λ, R with probability 1−λ} — contradicting the presumption that bank intermediation is necessarily superior to direct contracting. This result, formalized by condition (41), implies that the social value of banking as an institution depends on the degree of risk aversion and the illiquidity premium R: the banking allocation is preferred when agents are relatively risk tolerant and/or R is large (so the lottery’s spread is attractive), but bilateral trade may dominate when agents are risk-averse and R is modest.
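
A rough numerical comparison of the two allocations, as a sketch only. It assumes CRRA utility, treats the bilateral outcome as the certain consumption level 2R/(R+1) read off the description above, and uses hypothetical values R = 2 and λ = 0.5; none of these choices come from the paper's condition (41).

```python
import math

def crra_utility(c, gamma):
    """CRRA utility; gamma = 1 is the log case."""
    if gamma == 1.0:
        return math.log(c)
    return c ** (1.0 - gamma) / (1.0 - gamma)

def banking_welfare(R, lam, gamma):
    """E(uB) = lam * u(1) + (1 - lam) * u(R), as stated in the text."""
    return lam * crra_utility(1.0, gamma) + (1.0 - lam) * crra_utility(R, gamma)

def bilateral_welfare(R, gamma):
    """Assumption: bilateral trade delivers the certain consumption 2R / (R + 1)."""
    return crra_utility(2.0 * R / (R + 1.0), gamma)

R, LAM = 2.0, 0.5  # hypothetical parameter values
for gamma in (1.0, 1.5, 3.0, 5.0):
    bank = banking_welfare(R, LAM, gamma)
    bilateral = bilateral_welfare(R, gamma)
    winner = "bilateral" if bilateral > bank else "banking"
    print(f"gamma = {gamma}: E(uB) = {bank:.4f}, "
          f"u(2R/(R+1)) = {bilateral:.4f} -> {winner}")
```

With λ = 0.5, the quantity 2R/(R+1) is the harmonic mean of 1 and R, so under these assumptions the two allocations tie at γ = 2; banking is preferred below that value and bilateral trade above it.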

Q6. What role can central banks play and what is the lender-of-last-resort result?

The paper shows that in the baseline nominal economy, outside money is not demanded by any agent — deposits dominate cash in rate of return and the interbank payment flows net to zero — so the central bank has no leverage to affect real allocations through open-market operations; efficiency is out of reach even for a central bank. However, the paper identifies a limited but important role for central bank intervention: the lender-of-last-resort function can prevent bank runs that would otherwise be self-fulfilling equilibria in the model, even though the central bank cannot restore the first-best allocation. This is because the existence of an emergency liquidity backstop eliminates the coordination failure that makes runs self-fulfilling, without requiring the central bank to replicate the state-contingent real payoffs needed for efficiency. A central bank could potentially be incorporated into an extended model with an uneven distribution of payment flows across banks (creating a demand for reserves), but the paper argues that even then, competition across banks would still prevent contingent deposit rates from achieving efficiency.

Key concepts

inside money : bank-created deposits that arise ex nihilo when loans are extended to borrowers and circulate as means of payment between agents; the paper’s key departure from the prior banking literature, which modeled deposits as outside money (central-bank-issued fiat money) intermediated by banks rather than money created through lending.

consumption possibility set : the set of feasible allocations achievable by the competitive equilibrium with inside-money banking; the paper’s central result is that the efficient first-best allocation — satisfying u′(c₁*) = Ru′(c₂*) — lies outside this set, so the inefficiency is not correctable by improving incentive design within the existing contract space.

nominal deposit contract : a demandable deposit that specifies a fixed nominal interest rate independent of the realization of individual liquidity preference shocks; the paper’s analysis shows that such contracts cannot produce the state-contingent real payoffs required for efficient risk-sharing in an inside-money economy, even when supplemented with contingent rates or early loan liquidation.

lender of last resort : the central bank’s capacity to provide emergency liquidity to banks facing runs by coordinating expectations away from the bank-run equilibrium; the paper’s limited positive result for central bank policy — it can prevent runs even when it cannot achieve efficiency.

Beliefs About the Economy are Excessively Sensitive to Household-Level Shocks: Evidence from Linked Survey and Administrative Data

Bridges

This paper measures the causal effects of land transport infrastructure on economic activity, exploiting quasi-experimental variation in bridge construction over the Mississippi and Ohio Rivers in the United States. The central empirical puzzle motivating the study is a hump-shaped relationship between per capita income and distance to major land transport routes in contemporary U.S. data: income peaks around 5 km from a transport route, with an elasticity of 0.072 closer than 4.1 km and -0.096 at greater distances, so that 85% of Americans live where local income increases with distance to transport routes rather than decreasing. The question is whether this pattern reflects causal effects of infrastructure, selection, or sorting.

The paper develops two complementary identification strategies. The first exploits tributary confluences — where smaller rivers join larger rivers, sharply raising downstream flow rates and bridge construction costs — to generate quasi-random variation in bridge location. Because bridge construction costs increase convexly with river flow (maximum bending moment scales with span length squared), bridges are disproportionately built just upstream of confluences. The median upstream census tract lies 0.7 km from a bridge versus 2.3 km for the median downstream tract, making upstream tracts on average 60% closer to bridges and 27% closer to the nearest major land transport route. This asymmetry dates to at least 1880 and persists to 2010. Despite this persistent connectivity advantage, by 2010 upstream tracts have 13% lower per capita incomes and 63% higher population densities than downstream neighbours. The implied elasticity of per capita income with respect to distance to land transport, scaling the income effect by the distance-to-transport effect, is approximately 0.44. Income density (income per unit area) is higher upstream, though the difference is not statistically significant. Historical placebo tests using pre-bridge-construction data show no asymmetry in land values or population upstream versus downstream, supporting the identification assumption.

The second strategy exploits variation in the timing of bridge construction. Because major bridge projects involve decades of planning, financing, design, and construction — the Wheeling Suspension Bridge was chartered in 1816 but opened in 1849 — the precise opening date is argued to be exogenous to short-run deviations from local growth trends. Using a county-level panel from 1860 to 2010 (432 counties, 14–19 states), the paper estimates event-study regressions around the first time a county experiences a 50% reduction in distance to a bridge. After such a reduction, farm land values (the best available consistent proxy for total economic activity in historical data) rise immediately and cumulatively by approximately 9% over 30 years. Population rises by approximately 5% over the same period. The proportionally larger rise in land values than population implies higher per capita economic activity in better-connected counties after 30 years.

These two sets of results are reconciled through a narrative account of development. Better bridge access drives industrialization — manufacturing employment shares rise in counties experiencing improved connectivity — and urbanization. Cities form around historical transport routes and expand. Richer households then sort away from historical city centres into lower-density suburban areas, while lower-income households remain near or selectively migrate to the historical transport corridors. This within-city sorting produces the observed cross-sectional gradient: areas nearest transport routes end up with higher population density but lower per capita incomes. The negative local income effect of proximity to transport routes is larger in more urbanized areas and areas with higher income inequality, and is concentrated among non-white and low-education populations.

The paper also contributes a new dataset covering every road and rail bridge (237 total) ever constructed over the Mississippi and Ohio Rivers from 1849 to 2010, assembled from the National Bridge Inventory and extensively cross-checked with satellite imagery and historical sources.

Q: What is the motivating empirical puzzle about transport infrastructure and income?

A: In contemporary U.S. census data, per capita income does not monotonically increase with proximity to land transport routes. Instead, the relationship is hump-shaped: income peaks around 5 km from a major transport route, with a positive elasticity of 0.072 within 4.1 km and a negative elasticity of -0.096 beyond that distance. Population density, by contrast, falls monotonically with distance to transport routes. As a result, 85% of Americans live in places where local mean income increases with distance to transport infrastructure rather than decreasing.

Q: How does the tributary confluence identification strategy work?

A: Tributary confluences — where smaller rivers join the main river — cause sharp, localized increases in river flow rates and thus in bridge construction costs, because cost scales convexly with required span length. This makes bridges systematically more likely to be built just upstream of confluences than just downstream. The strategy compares census tracts located upstream versus downstream of the 27 major tributary confluences identified on the Mississippi and Ohio Rivers, controlling for nearest-tributary fixed effects and distance to the confluence.
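
The convexity of cost in span length can be illustrated with a textbook beam formula. The case below, a simply supported beam of span L carrying a uniform load w per unit length, is an assumption chosen for illustration; the paper's engineering cost model may differ.

```latex
M_{\max} = \frac{w L^{2}}{8},
\qquad
\frac{M_{\max}(2L)}{M_{\max}(L)} = \frac{w (2L)^{2} / 8}{w L^{2} / 8} = 4 .
```

Under that assumption, doubling the required span quadruples the maximum bending moment the structure must resist, so a jump in river width at a confluence raises cost more than proportionally.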

Q: What is the magnitude of the connectivity difference between upstream and downstream census tracts?

A: Upstream census tracts are approximately 60% closer to a bridge than downstream tracts (coefficient of 0.91 in log distance to bridge, p < 0.01), and consequently 27% closer to the nearest major land transport route (coefficient of 0.32, p < 0.10). This asymmetry is established by 1880 and persists through 2010. The advantage arises approximately equally from proximity to railroads and primary roads.

Q: What are the causal effects of this connectivity advantage on per capita income and population density?

A: Despite being better connected, upstream census tracts have 13% lower per capita incomes (coefficient 0.14 on the downstream indicator in log per capita income, p < 0.05) and 63% higher population densities (coefficient -0.49 on the downstream indicator in log population density, p < 0.05) in 2010. Income density is higher upstream, but the difference is not statistically distinguishable from zero. Scaling the income effect by the effect on distance to land transport implies an elasticity of approximately 0.44.
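
The percentages quoted above follow from the reported log-point coefficients through the usual exp(β) − 1 conversion. The sketch below reproduces that arithmetic; it assumes that all four coefficients are on the downstream indicator, so the upstream-versus-downstream comparison flips the sign.

```python
import math

def pct_change_from_log_coef(beta):
    """Exact percent change implied by a coefficient beta in a log-outcome regression."""
    return (math.exp(beta) - 1.0) * 100.0

# Coefficients as reported in the text, assumed to be on the downstream indicator.
# Upstream relative to downstream is the negative of each coefficient.
DOWNSTREAM_COEFS = {
    "log distance to bridge": 0.91,
    "log distance to land transport": 0.32,
    "log per capita income": 0.14,
    "log population density": -0.49,
}

for outcome, beta in DOWNSTREAM_COEFS.items():
    upstream_pct = pct_change_from_log_coef(-beta)
    print(f"{outcome}: upstream vs downstream = {upstream_pct:+.0f}%")

# Implied elasticity of income with respect to distance to land transport:
# ratio of the income coefficient to the distance coefficient.
elasticity = (DOWNSTREAM_COEFS["log per capita income"]
              / DOWNSTREAM_COEFS["log distance to land transport"])
print(f"implied elasticity = {elasticity:.2f}")
```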

Q: What pre-bridge-era placebo tests support the identifying assumption for the tributary confluence strategy?

A: Matching modern census tracts to county-level historical data from 1840 and 1850 (before substantive bridge construction began), the paper finds no statistically significant asymmetry in land values or population density upstream versus downstream of tributary confluences. Asymmetric patterns emerge only after bridge construction begins. Ferry crossing locations, traced through place names in the USGS Geographic Names database, also appear equally frequently upstream and downstream, suggesting ferries did not differentially locate upstream.

Q: How does the timing-based identification strategy work, and what is its key assumption?

A: The strategy uses a county-level panel from 1860 to 2010 and estimates event-study regressions around the first time a county experiences a 50% reduction in distance to a bridge. County fixed effects and county-specific quadratic time trends absorb all fixed differences across counties and average changes in trends. The key assumption is that the exact opening date of a bridge is exogenous to short-run deviations from local long-run growth trends — supported by the argument that major bridges involve decades-long planning processes that evolve independently of local economic fluctuations. Pre-trend tests show no significant differences in outcomes before the event.
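
As a sketch of how the event date might be constructed from a county's distance series: the function names, the sample county, and the choice to measure the 50% reduction between consecutive panel observations are all assumptions made for illustration, not the paper's code or exact definition.

```python
def first_halving_year(distance_by_year, threshold=0.5):
    """Return the first year in which distance to a bridge is at most
    `threshold` times its value in the previous observed year, else None.

    Assumption: the 50% reduction is measured between consecutive
    observations of the panel; the paper may define it differently.
    """
    years = sorted(distance_by_year)
    for prev_year, year in zip(years, years[1:]):
        if distance_by_year[year] <= threshold * distance_by_year[prev_year]:
            return year
    return None

def event_time(distance_by_year):
    """Map each year to years elapsed since the event (negative before it)."""
    event_year = first_halving_year(distance_by_year)
    if event_year is None:
        return {}
    return {year: year - event_year for year in sorted(distance_by_year)}

# Hypothetical county: distance to the nearest bridge (km) by census year.
county = {1860: 80.0, 1870: 78.0, 1880: 30.0, 1890: 30.0, 1900: 12.0}
print(first_halving_year(county))
print(event_time(county))
```

Event-time indicators built this way would then enter a regression alongside county fixed effects and county-specific trends; only the first qualifying reduction defines the event, so the later drop in 1900 in the sample county does not reset the clock.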

Q: What are the quantitative effects of a major improvement in bridge access on land values and population?

A: After a county first experiences a 50% reduction in distance to a bridge, farm land values rise immediately and cumulatively by approximately 9% (cumulative effect on log land values of about 0.09) over 30 years, relative to counties with no such change. Population rises by approximately 5% (cumulative log effect of about 0.05) over the same period. The proportionally larger effect on land values than on population implies that per capita economic activity is higher in better-connected counties 30 years after the event. The divergence between land value and population effects grows over time, suggesting productivity advantages accumulate.
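
The per capita implication is a difference of log effects, which a short calculation makes explicit. The two inputs are the approximate cumulative effects quoted above; treating land value per capita as the ratio of the two outcomes is the only added assumption.

```python
import math

LOG_EFFECT_LAND_VALUE = 0.09   # cumulative 30-year effect, from the text
LOG_EFFECT_POPULATION = 0.05   # cumulative 30-year effect, from the text

def implied_per_capita_log_effect(log_total, log_population):
    """log(total / population) = log(total) - log(population)."""
    return log_total - log_population

per_capita = implied_per_capita_log_effect(LOG_EFFECT_LAND_VALUE,
                                           LOG_EFFECT_POPULATION)
print(f"log effect on land value per capita: {per_capita:.2f}")
print(f"percent effect: {(math.exp(per_capita) - 1) * 100:.1f}%")
```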

Q: Why does the paper use farm land values rather than other income measures in the historical panel?

A: Farm land values — the total value of farm land and buildings — are the best consistently measured proxy for total economic activity available throughout the 1860–2010 census panel. The paper notes explicitly that as the economy industrializes and urbanizes, farm land values increasingly miss urban land values, implying that the estimated effects on farm land values are likely lower bounds on the true effects on total economic activity.

Q: How does the paper address the concern that bridge timing might reflect anticipated local growth?

A: The paper shows that results hold when restricting to counties whose distance to a bridge is only affected by bridges constructed in other counties, addressing the concern that local planners might time construction in anticipation of local growth. The results are also insensitive to controlling for pre-period trends, and outcomes of interest are uncorrelated with future changes in distance to a bridge in preferred specifications.

Q: How does the paper reconcile the negative local income effect (tributary confluence strategy) with the positive aggregate effect (timing strategy)?

A: The reconciliation proceeds through a narrative account combining industrialization, urbanization, and within-city sorting. Better bridge access drives a shift toward manufacturing employment and attracts population, consistent with a productivity advantage enabling exploitation of economies of scale. Cities form around historical transport routes. As cities mature and expand, richer households sort into lower-density suburban areas further from the historical transport corridor, while lower-income households remain near or migrate to the city centre. This within-city sorting produces lower per capita incomes near transport routes even as aggregate economic activity is higher in better-connected areas.

Q: What evidence supports the within-city sorting mechanism specifically?

A: The negative income effect of proximity to transport routes is larger in more urbanized areas and in areas with higher income inequality. The effect is concentrated in areas that were more rapidly urbanizing in the 19th century, and it is stronger for non-white and low-education populations. Upstream census tracts simultaneously show higher manufacturing employment shares and higher population densities, consistent with cities having formed around transport routes, followed by residential sorting away from the core.

Q: What are the two novel identification strategies and their broader applicability?

A: The tributary confluence strategy exploits discontinuities in bridge construction costs generated by sharp increases in river flow rates at confluences; it requires only that bridges are more likely built upstream of confluences than downstream, an asymmetry the paper shows is detectable elsewhere in the world from satellite imagery. The timing strategy exploits the multi-decade planning and construction process for major bridges as a source of near-exogenous variation in opening dates. Both strategies can be applied in other settings where major rivers form substantial barriers to land transport networks.

Q: What does the paper contribute to the debate about whether early U.S. transport infrastructure followed or led economic development?

A: The results support the view that early investments in land transport infrastructure led to meaningful changes in economic geography rather than merely following pre-existing growth patterns. However, the paper finds a moderate level of responsiveness — population density responds to bridge access over several decades, not immediately — consistent with a broader literature documenting sluggish population responses to changes in economic conditions.

Tributary confluence: A location where a smaller river (tributary) joins a larger river, causing a sharp, localized increase in downstream flow rates and therefore a discontinuous increase in bridge construction costs, generating the quasi-experimental variation in bridge location exploited in the paper.

Within-city sorting: The process by which, as cities expand around historical transport routes, richer households differentially relocate to lower-density suburban areas further from the transport corridor while lower-income households remain near or migrate to the historical city centre, reversing the income gradient at small spatial scales.

Income density: The product of population density and per capita income, corresponding to total economic activity per unit area; the paper finds income density is higher in better-connected upstream census tracts even when per capita income is lower, reflecting the dominant effect of higher population density.

Farm land values: The total value of farm land and buildings, used as the best consistently available proxy for total economic activity in the 1860–2010 historical county panel; the paper treats estimated effects on farm land values as lower bounds on effects on total economic activity because farm values increasingly miss urban land as the economy industrializes.

Structural transformation: The shift in the composition of employment away from agriculture and toward manufacturing, which the paper documents occurring in counties that experience improved bridge access, interpreted as evidence that transport infrastructure provides a productivity advantage attracting industrial activity.

Distance to a bridge (as proxy for land transport access): In the study area along the Mississippi and Ohio Rivers, where all land has comparable water access, distance to the nearest bridge strongly predicts distance to the nearest major land transport route (rail or primary road), allowing bridge distance to serve as a consistent measure of transport connectivity throughout the entire study period.

Market access: A measure of economic connectivity that captures both the state of the transport network and the size of accessible markets; the paper notes that log distance to a bridge explains 46% of the variation in market access in 1890 (from Donaldson and Hornbeck’s data) with an elasticity of approximately 0.1 in absolute value (market access falls as distance rises), and that halving distance to a bridge increases market access by approximately 7%.
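
The 7% figure is consistent with the quoted elasticity, as a short check shows. The constant-elasticity functional form and the negative sign on the elasticity are assumptions inferred from the statement that halving distance raises market access.

```python
def market_access_change(distance_ratio, elasticity=-0.1):
    """Proportional change in market access when distance to a bridge is
    multiplied by `distance_ratio`, under a constant-elasticity relation
    MA proportional to distance ** elasticity (an assumed functional form)."""
    return distance_ratio ** elasticity - 1.0

halving = market_access_change(0.5)
print(f"halving distance raises market access by {halving * 100:.1f}%")
```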

Business, Liquidity, and Information Cycles

The paper studies how the two roles of stock markets, revealing information about firms’ fundamentals (which guides capital allocation) and providing liquidity, interact, arguing that when stocks are used more intensively for liquidity, their prices reveal less information about fundamentals. The authors build a Grossman-Stiglitz-style trading model with two types of rational traders (‘day’ traders who value liquidity and ‘night’ traders who value fundamentals) that generates endogenous noise in prices, derive an analytical measure of price informativeness (PI), and structurally estimate PI from firm-level panel data for 16 countries over 1984-2022, finding that PI declines in periods of insufficient funding liquidity (such as the Great Recession and the COVID-19 pandemic) and that these fluctuations are explained mostly by changes in trading activity rather than information quality. Integrating the trading module into a real business cycle model with heterogeneous firms calibrated to the United States, they simulate recessions: a stand-alone recession is ‘cleansing’, in that prices become more informative and allocation improves, mitigating output losses by 4.4%, whereas a recession coinciding with banking distress is ‘sullying’, in that agents rely more on stocks for liquidity, prices become less informative, and worsened misallocation magnifies output losses by 22%. A counterfactual with exogenous (rather than endogenous) information implies output would fall about 43% more than in the benchmark, which the authors read as evidence that endogenous information acquisition lets stock markets ‘lean against the wind’ in recessions. All magnitudes are model-based and specific to the U.S. calibration.

In depth

Q1. What interaction between stock-market roles does the paper study?

The paper studies how the liquidity role of stock markets affects their information role: if stocks are used more intensively for liquidity, prices reveal less information about firms’ fundamentals. While the information and liquidity roles of stock markets are each well studied, their interaction is less understood; the authors ask whether using stocks for liquidity enhances or weakens their information role, how distress in other liquidity sources (such as banks) affects price informativeness, and how this contributes to the depth of recessions.

Q2. How does the trading model generate the information-liquidity tradeoff?

The authors extend Grossman and Stiglitz (1980) by replacing noise traders with two types of rational traders (‘day’ traders interested in liquidity and ‘night’ traders interested in fundamentals), so that each type’s trades act as endogenous noise for the other. In equilibrium a linear pricing function exists in which price informativeness depends on the relative weights of fundamental versus liquidity information in prices, and those weights are determined by how many day and night traders operate, their information choices, and how aggressively they trade. When funding markets malfunction, the economy relies more on stocks for liquidity, there are more day traders, and price informativeness declines.

Q3. What is Price Informativeness (PI), and how is it estimated?

Price Informativeness (PI) is defined analytically as a function of the dispersion of firm productivity, the dispersion of stock-price fluctuations, and their respective price loadings; in a high-PI market, a firm’s high relative stock price is a strong signal of positive information about its fundamentals. The authors estimate PI structurally using firm-level panel data from 16 countries spanning 1984 to 2022. The linear relationship among stock prices, earnings, and stock liquidity holds independently of general-equilibrium considerations, which is what makes the structural estimation tractable.

Q4. What are the empirical cyclical properties of PI?

PI exhibits cyclicality and, more importantly, declines in periods of insufficient funding liquidity, such as the Great Recession and the COVID-19 pandemic. Decomposing PI into its four components, the authors show its fluctuations are mostly explained by changes in trading activity rather than by changes in information quality or the amount of information acquired.

Q5. How is the trading module embedded in a general-equilibrium model and disciplined?

The trading module is integrated into a real business cycle model with heterogeneous firms in which stock prices guide capital allocation, calibrated to the United States with two possibly correlated aggregate shocks — one to aggregate productivity and one to funding liquidity — to capture recessions with and without banking distress. The calibrated model replicates the cyclical properties of the empirical PI measure without targeting them. The authors also discipline how much new information prices convey using price-investment correlations across firms and over time, concluding that new stock-price information is roughly as important as what decision makers already know.

Q6. What are the quantitative real effects in recessions?

In a stand-alone recession, increased uncertainty induces all traders to acquire more information, raising price informativeness and improving allocation, which mitigates output losses by 4.4% (‘cleansing’); when a recession coincides with funding-market distress, heightened liquidity-driven trading makes prices less informative and worsens allocation, magnifying output losses by 22% (‘sullying’). The authors interpret the 22% figure as a sizable real effect of banking problems operating through a novel channel: the weakening of the information and allocative role of stock markets.

Q7. What do the information-structure counterfactuals show?

If information were exogenous rather than endogenously acquired, liquidity distress would reduce PI by more and output would decline about 43% more than in the benchmark, implying that endogenous information acquisition lets stock markets ‘lean against the wind’ during recessions. The authors further find that halving the cost of information about fundamentals would make output declines about 5% smaller, whereas halving the cost of information about a stock’s liquidity would make declines about 2% larger, leading them to conclude that the welfare effect of transparency is nuanced: easier access to one type of information can make it harder to infer another.

Q8. What are the main limitations and scope conditions?

The authors flag two limitations: the framework assumes no feedback from the real economy back to financial markets (prices affect investment, but investment does not affect prices), and the counterfactuals focus on how the information environment affects price informativeness, abstracting from other channels through which information affects production. Adding two-way feedback would sacrifice the tractability of linear pricing but could introduce additional magnification forces. All quantitative magnitudes are specific to the U.S. calibration.

Key concepts

price informativeness (PI) : the extent to which stock prices reveal to an outside observer the information that informed traders hold about firms’ fundamentals; defined in the paper as an analytical function of productivity dispersion, price-fluctuation dispersion, and their price loadings, and estimated structurally.

day traders vs. night traders : the paper’s two types of rational traders — day traders trade to satisfy liquidity needs, night traders trade on information about fundamentals — whose trades act as endogenous noise for one another, replacing the exogenous noise traders of Grossman-Stiglitz.

funding liquidity vs. market liquidity : funding liquidity is liquidity provided by intermediaries through credit; market liquidity is the ability to trade stocks to meet liquidity needs; when funding liquidity is scarce, agents substitute toward market liquidity, raising liquidity-driven trading.

cleansing vs. sullying recessions : in the paper’s usage, a cleansing recession improves allocation (here via more informative prices), while a sullying recession worsens it; a recession is cleansing without banking distress and sullying when it coincides with funding-market distress.

Capital Income Taxation and Self-Fulfilling Aggregate Instability

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

This paper overturns the longstanding consensus established by Schmitt-Grohé and Uribe (1997) that relying on capital income tax adjustments to balance the government budget immunizes the economy against self-fulfilling aggregate instability. The key departure from the prior literature is endogenous capital utilization: when the capital income tax rate adjusts to close budget imbalances and capital utilization is an optimal decision by households, a “fiscal increasing returns” mechanism emerges in which higher economic activity lowers the tax rate, raises the after-tax return to capital, and induces further expansion — rendering the economy prone to sunspots-driven fluctuations. Calibrated to the United States, United Kingdom, and Japan using effective tax rates and public debt-to-GDP ratios, the paper finds that all three economies lie within the indeterminacy region under their current capital income tax rates and capital depreciation allowances of approximately 0.2; stabilization would require raising the depreciation allowance rate from 0.2 to 0.76 or reducing income tax rates by 39–52 percent. Capital depreciation allowances serve as a stabilization device: full allowances (allowance rate = 1) make indeterminacy entirely impossible regardless of the tax rate, because they extinguish the fiscal increasing returns mechanism, and the paper also shows analytically that public debt can be destabilizing rather than stabilizing when capital taxes are used for fiscal adjustment.

In depth

Q1. What is the fiscal increasing returns mechanism that overturns the Schmitt-Grohé-Uribe result?

When the government adjusts the capital income tax rate to balance the budget, higher labor input raises output and the capital tax base, allowing a lower tax rate; under endogenous capital utilization, this triggers an additional channel in which a lower after-tax depreciation cost induces firms to utilize capital more intensively, further raising the effective capital stock and output — generating fiscal increasing returns to scale and a factor share redistribution from capital to labor that together make indeterminacy possible. In log-linearized terms, the effective output-labor elasticity in the equilibrium aggregate production function exceeds unity for tax rates in the interval (τ̄, τ̂) where τ̄ = ρ/(ρ+δ) and τ̂ is the Laffer-curve peak, and this greater-than-unity elasticity is the formal condition for indeterminacy (Corollary 1). With a constant utilization rate as assumed in prior work, both the factor share redistribution and fiscal increasing returns effects vanish, the effective output-labor elasticity falls below unity, and indeterminacy becomes impossible — confirming that endogenous capital utilization is the essential ingredient.

Q2. What are the formal conditions for indeterminacy under the baseline capital tax rule?

Proposition 1 establishes that the fiscal policy with capital income taxation induces indeterminacy of equilibrium if and only if the long-run capital income tax rate τk lies strictly in the open interval (τ̄, τ̂), where τ̄ = ρ/(ρ+δ) and τ̂ is the unique Laffer-curve peak. Under the standard calibration (ρ = 0.04, δ = 0.1, α = 0.3), this interval is (0.286, 0.717), a wide range covering the effective capital income tax rates of the U.S., UK, and Japan. The determinant of the Jacobian of the linearized dynamic system is positive and the trace is negative over this interval, implying that both eigenvalues have negative real parts, which is the condition for indeterminacy with one predetermined variable (capital) and one jump variable (marginal utility of income).
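
The lower bound and the eigenvalue test in Proposition 1 can be checked numerically. The following is a minimal sketch, not the paper's code: the function names are hypothetical, the upper bound 0.717 is copied from the text rather than computed, and the tax rate, trace, and determinant values in the usage lines are made-up illustrations rather than values from the paper's Jacobian.

```python
import cmath

def tau_lower_bound(rho: float, delta: float) -> float:
    """Lower bound of the indeterminacy interval, tau_bar = rho / (rho + delta)."""
    return rho / (rho + delta)

def eigenvalues_2x2(trace: float, det: float) -> tuple[complex, complex]:
    """Roots of lambda^2 - trace * lambda + det = 0 for a 2x2 Jacobian."""
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def is_indeterminate(trace: float, det: float) -> bool:
    """With one predetermined and one jump variable, two stable roots mean indeterminacy."""
    return all(ev.real < 0.0 for ev in eigenvalues_2x2(trace, det))

# Calibration quoted in the text.
tau_bar = tau_lower_bound(rho=0.04, delta=0.1)
tau_hat = 0.717  # Laffer-curve peak, copied from the text, not computed here

# Hypothetical tax rate and Jacobian summary statistics, for illustration only.
tau_k = 0.40
in_interval = tau_bar < tau_k < tau_hat
stable_pair = is_indeterminate(trace=-0.3, det=0.02)   # det > 0, trace < 0
saddle = is_indeterminate(trace=-0.3, det=-0.02)       # det < 0: one unstable root
print(round(tau_bar, 3), in_interval, stable_pair, saddle)
```

A positive determinant with a negative trace gives two roots with negative real parts, while a negative determinant gives the saddle-path case with a unique equilibrium.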

Q3. How do capital depreciation allowances serve as a stabilization device?

When the taxable capital income base is reduced by a fraction γ ∈ [0, 1] of depreciation expenses, the effective degree of fiscal increasing returns to scale decreases strictly with γ, and the lower bound of the indeterminacy interval τ̄D strictly rises with γ; with full depreciation allowances (γ = 1), the quadratic equation characterizing the lower bound has a repeated unit root, the indeterminacy interval becomes empty, and multiplicity of equilibria is entirely impossible regardless of the capital income tax rate. Corollary 2 formalizes this result analytically. The intuition is that depreciation allowances reduce the procyclicality of the effective tax burden on capital, so the after-tax return to capital responds less strongly to activity, weakening the self-fulfilling loop. Partial allowances — even well below γ = 1 — can sufficiently shrink the indeterminacy region to require implausibly high tax rates for instability.

Q4. What is the role of public debt and what new result does the model deliver?

Contrary to the established view that public debt can serve as an automatic stabilizer that exempts balanced-budget fiscal policy from beliefs-driven instability (Schmitt-Grohé and Uribe 1997, Huang et al. 2018), this paper shows that public debt can be destabilizing when capital income taxes adjust to balance the budget: a higher public debt-to-GDP ratio expands the indeterminacy region, and this destabilizing effect is amplified when capital depreciation allowances are low. Figure 5 in the paper illustrates numerically that raising the public debt-to-GDP ratio from 0.975 (the US/UK average) to 1.429 (Japan) dramatically widens the indeterminacy region, particularly at low depreciation allowance rates. This novel result, that public debt destabilizes rather than stabilizes under capital income tax adjustment, constitutes a third main contribution of the paper alongside the indeterminacy result and the stabilization role of depreciation allowances.

Q5. What are the quantitative results for the US, UK, and Japan?

Under the calibrated depreciation allowance rate of approximately 0.2 (the GDP-weighted European average from D’Erasmo et al. 2017, also consistent with the US), all three large economies lie within the indeterminacy region at their current effective income tax rates; stabilization requires either raising the depreciation allowance rate to 0.76 for all three, or reducing income tax rates by 47% for the US, 52% for the UK, and 39% for Japan from their calibrated levels while holding depreciation allowances at 0.2. Less dramatic combination policies also work: for the US, a 10% income tax cut combined with raising the depreciation allowance to 0.67 would suffice, as would a 5% tax cut combined with raising the allowance to 0.70. These calculations are calibrated to effective factor income tax rates from Mendoza et al. (1994) updated to 1996 and public debt-to-GDP ratios from OECD Economic Outlook (2014).

Q6. How does the paper relate to and contribute to the broader indeterminacy literature?

The paper’s mechanism — fiscal increasing returns arising from the interaction of optimal capital utilization and capital income taxation — is novel relative to both strands of the indeterminacy literature: unlike Benhabib-Farmer-style models that require the aggregate production function to have increasing returns as a primitive assumption, and unlike the Schmitt-Grohé-Uribe labor-tax indeterminacy that also does not require increasing returns but found capital taxation immune, this paper shows that increasing returns can emerge endogenously from a constant-returns-to-scale production technology via fiscal policy, requiring no externalities or other non-standard features. The mechanism provides a policy-based micro-foundation for aggregate increasing returns that resolves the empirical criticism of the prior indeterminacy literature; it also distinguishes the result from Huang et al. (2018), who showed that endogenous capital utilization under labor income tax adjustment raises indeterminacy likelihood but leaves the production function at constant returns to scale.

Key concepts

fiscal increasing returns : the mechanism in this paper whereby higher economic activity lowers the capital income tax rate (via a higher tax base), raises the after-tax return to capital, and induces greater capital utilization and further output expansion; operationally defined by the effective output-labor elasticity exceeding unity in the equilibrium aggregate production function.

equilibrium indeterminacy : the existence of multiple rational-expectations equilibria converging to the same steady state, arising from the fiscal increasing returns mechanism and permitting self-fulfilling sunspot fluctuations unrelated to economic fundamentals; characterized by both eigenvalues of the Jacobian having negative real parts even though the dynamic system has only one predetermined variable.

capital depreciation allowance : the fraction γ ∈ [0, 1] of capital depreciation costs deductible from the taxable capital income base; the stabilization device the paper identifies, which works by attenuating the procyclical component of the effective capital tax burden and thereby reducing the fiscal increasing returns to scale.

factor share redistribution : in this paper, the shift of the effective factor income share from capital to labor that results from endogenous capital utilization interacting with the capital tax rule; contributes to indeterminacy by raising the effective output-labor elasticity above the share of capital in the production function.

Catastrophes, Delays, and Learning

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops a general model of experimentation under catastrophe risk in which the catastrophe is triggered when a stock variable exceeds an unknown threshold, but occurs only after a stochastic delay. The central contribution is the concept of the “legacy of the past”: at any planning date, past experiments may have already triggered a catastrophe that has not yet materialized, and the planner cannot observe whether triggering has occurred. The legacy is formally defined as the probability, conditional on survival, that a catastrophe was triggered in the past.

The model unifies two canonical but previously incompatible approaches in the literature. In the hazard-rate approach, the catastrophe is bound to happen and the planner manages its timing and severity. In the unknown-threshold approach, learning is instantaneous and the catastrophe is certainly avoided if the stock has not yet exceeded the threshold. Neither approach captures the intermediate case where the planner remains uncertain about whether the catastrophe is already underway. By introducing a delay governed by an exponential distribution with parameter α, the authors show that both approaches are limiting special cases: as α → ∞ (no delay), the legacy vanishes and the unknown-threshold approach is recovered; when the legacy is set permanently to one (catastrophe triggered with certainty), the hazard-rate approach is recovered.

Three benchmark stock levels anchor the analysis. QN is the long-run target absent any catastrophe risk. QD (“Damages”) is the optimal stabilization target when the planner knows a catastrophe was triggered in the past — it lies weakly below QN because the planner trades off current gains against the discounted marginal damage from raising the stock at the moment of eventual catastrophe occurrence. QE (“Experimentation”) is the stock level below which stabilization is suboptimal when the planner is certain no triggering has occurred — it also lies weakly below QN.

The paper’s two main theorems are distinguished by the ranking of QD and QE, which reflects whether mitigation strategies are effective.

Theorem 1 (QE < QD): When damage is not highly sensitive to the stock level at catastrophe time — so mitigation is relatively ineffective — optimal paths are monotonically increasing and converge to a long-run stock level Q∞ ∈ [QE, QD]. The stopping condition equates the marginal benefit of experimentation to a weighted average of the expected cost under the unknown-threshold approach (weight 1 − π) and the marginal damage under the hazard-rate approach (weight π), where π is the legacy at stopping time. A higher legacy at the stopping time is associated with a higher long-run stock level. A higher initial legacy induces fatalism: since the catastrophe is more likely already triggered, the planner shifts priority toward current consumption rather than caution, leading to more total experimentation.

Theorem 2 (QD < QE): When damage is highly sensitive to the stock level — so mitigation is valuable — the long-run target is uniquely QE regardless of the initial legacy. However, the short-run path is non-monotonic: for a sufficiently high initial legacy, the planner first reduces the stock sharply (lockdown, emissions cut) to mitigate pending catastrophe damages, then, as the legacy declines because no catastrophe occurs, gradually allows the stock to rise back toward QE. The direction of caution reverses relative to Theorem 1: a higher legacy now induces more caution, not less.

Applications include pandemic management (stock = infected population, catastrophe = health system collapse) and climate change (stock = cumulative CO2 emissions or atmospheric pollution stock). In the disease control application, whether a planner prioritizes economic production or mortality reduction determines which theorem governs, with the key ratio being production losses relative to mortality increases. For pandemic policy, Theorem 2 produces a formal learning-based rationale for non-monotonic “hammer-and-dance” policies (strict early lockdown followed by relaxation) that differs from prior explanations in the literature. In the carbon budget application, Proposition 5 formally proves that a higher initial legacy raises the optimal carbon budget under Theorem 1 conditions, and that above a critical legacy threshold π* the planner consumes maximally forever, triggering the catastrophe with certainty. Under Theorem 2 conditions (Proposition 6), the optimal policy can involve first reducing then expanding the stock before stabilizing, with both transition dates increasing in the initial legacy.

Q: What is the “legacy of the past” and how is it computed? A: The legacy πt is defined as the probability, conditional on survival to date t, that a catastrophe was already triggered by past experiments. Formally, πt = 1 − [1 − F(Qt)] / pt, where Qt is the highest stock level ever reached, F is the prior distribution over the threshold, and pt is the survival probability. A past experiment at time t’ contributes to the current legacy with weight exp[−α(t − t’)], so recent experiments matter more than distant ones. As time passes without catastrophe, the contribution of any fixed past experiment decays exponentially at rate α.
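
The legacy formula can be traced through time once it is combined with the law of motion for the survival probability, ṗt = α[1 − F(Qt) − pt], given in the glossary entry for pt. The sketch below is illustrative only: it assumes the record stock stays constant (so F(Qt) is a fixed number), and the values of F(Q), p0, and α are hypothetical rather than taken from the paper.

```python
import math

def legacy(F_Q: float, p: float) -> float:
    """Legacy pi_t = 1 - [1 - F(Q_t)] / p_t."""
    return 1.0 - (1.0 - F_Q) / p

def survival_fixed_stock(p0: float, F_Q: float, alpha: float, t: float) -> float:
    """Closed-form solution of dp/dt = alpha * (1 - F(Q) - p) for a constant record stock."""
    floor = 1.0 - F_Q  # prior probability that the threshold lies above the record stock
    return floor + (p0 - floor) * math.exp(-alpha * t)

# Hypothetical inputs: F(Q) = 0.3, initial survival probability 0.9, delay parameter 0.5.
F_Q, p0, alpha = 0.3, 0.9, 0.5
dates = (0.0, 1.0, 5.0, 20.0)
legacy_path = [legacy(F_Q, survival_fixed_stock(p0, F_Q, alpha, t)) for t in dates]
print([round(x, 4) for x in legacy_path])
```

With the stock held fixed, the survival probability falls toward 1 − F(Q) and the legacy falls toward zero: each period without a catastrophe makes it less likely that one was triggered in the past.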

Q: How do the three benchmark stock levels QN, QD, and QE relate to each other? A: QN is the optimal long-run stock without any catastrophe. QD is defined by the condition that the marginal net benefit of increasing the stock, ν(Q) − [α/(α+δ)]D’(Q), equals zero, and satisfies QD ≤ QN. QE is defined by ν(Q) − [α/(α+δ)]ρ(Q)D(Q) = 0, and also satisfies QE ≤ QN. The ranking between QD and QE depends on whether damage is more sensitive to the marginal increase in stock at catastrophe time (which pushes QD below QE) or to the level of the stock at triggering (which pulls QD above QE).

Q: What is the key optimality condition in Theorem 1 and how does it unify prior approaches? A: The stopping condition (equation 15) states: ν(QT) = [α/(α+δ)] × [(1 − πT)ρ(QT)D(QT) + πT D’(QT)]. When πT = 0 (no legacy, unknown-threshold limit), this reduces to the experimentation stopping condition of Tsur and Zemel, governed by the hazard rate ρ(QT) times expected loss D(QT). When πT = 1 (full legacy, hazard-rate limit), it reduces to the damage-mitigation condition governed by marginal damage D’(QT). The legacy at stopping time thus serves as the mixing weight between the two canonical approaches, embedding both as special cases.
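
The mixing role of the legacy in the stopping condition can be illustrated by solving it numerically. This is a sketch under strong assumptions that are not from the paper: a linear marginal payoff ν(Q) = QN − Q, a linear damage function D(Q) = d0 + d1·Q, and a constant hazard rate ρ (as would arise from an exponential prior). All parameter values are hypothetical and chosen so that QE < QD, the Theorem 1 case.

```python
def stopping_stock(pi_T: float, alpha: float = 0.5, delta: float = 0.05,
                   Q_N: float = 10.0, d0: float = 20.0, d1: float = 0.1,
                   rho: float = 0.05) -> float:
    """Solve nu(Q) = [alpha/(alpha+delta)] * [(1 - pi) * rho * D(Q) + pi * D'(Q)] by bisection."""
    k = alpha / (alpha + delta)

    def gap(Q: float) -> float:
        nu = Q_N - Q            # assumed marginal payoff, zero at Q_N
        D = d0 + d1 * Q         # assumed damage function
        D_prime = d1            # its derivative
        return nu - k * ((1.0 - pi_T) * rho * D + pi_T * D_prime)

    lo, hi = 0.0, Q_N           # gap(lo) > 0 > gap(hi) for these parameters
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Q_E = stopping_stock(0.0)       # no legacy: unknown-threshold limit
Q_half = stopping_stock(0.5)    # intermediate legacy
Q_D = stopping_stock(1.0)       # full legacy: hazard-rate limit
print(round(Q_E, 4), round(Q_half, 4), round(Q_D, 4))
```

Under these assumed functional forms the solution rises from QE to QD as the legacy at the stopping time rises from 0 to 1, matching the statement that a higher legacy at stopping is associated with a higher long-run stock in the Theorem 1 case.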

Q: How does the initial legacy affect total experimentation under Theorem 1 versus Theorem 2? A: Under Theorem 1 (QE < QD), a higher initial legacy π0 leads to more total experimentation (higher Q∞), because the planner becomes fatalistic — since the catastrophe is more likely already triggered and mitigation is relatively ineffective, current consumption is prioritized. Proposition 5 formally proves this for the carbon budget application: the optimal stopping date T and optimal budget QT are nondecreasing in π0. Under Theorem 2 (QD < QE), a higher legacy triggers more caution in the short run (larger reduction in the stock during the mitigation phase), but the long-run target QE remains the same regardless of π0.

Q: What generates non-monotonic policies in Theorem 2, and what does this look like in the pandemic application? A: Non-monotonicity arises because the optimal response to a high legacy is first to reduce the stock sharply to limit catastrophe damages (since damage is sensitive to the stock level), and then, as time passes without catastrophe and the legacy declines, to allow the stock to recover. In the disease control application with high mortality weight, a complete lockdown is optimal in the first phase whenever the legacy is strictly positive. As the legacy declines, the lockdown is gradually relaxed, and eventually the infection level returns to its pre-lockdown level. Figures 3 and 4 show that a higher initial legacy (π0 = 0.1, 0.5, or 0.9) leads to a longer lockdown and slower recovery, though all paths converge to the same long-run infection level.

Q: How does the model’s disease control application determine which theorem governs? A: Lemma 2 states that if 1 / [1 + (Y(r+d) − Y*) / (wµdI^D)] < ρ(I^D), then I^E < I^D and Theorem 1 applies; otherwise I^E > I^D and Theorem 2 applies. The key ratio is (Y(r+d) − Y) / (wµ*d), the production loss relative to mortality increase. A planner who weights economic activity heavily (large production loss ratio) falls under Theorem 1 and tolerates rising infections; a planner who weights mortality heavily falls under Theorem 2 and imposes an initial lockdown.

Q: What is the carbon budget result under Theorem 1 (Proposition 5)? A: Under the condition u1 > [α/(α+δ)]v0 (marginal consumption value exceeds discounted marginal damage), Theorem 1 applies and there exists a critical legacy threshold π* such that: below π*, the planner consumes maximally (qt = q-bar) until a finite date T and then stops, with QE < QT < QD; above π*, the planner consumes maximally forever, triggering the catastrophe with certainty. The stopping date T and the optimal budget QT are nondecreasing functions of initial legacy π0, formally proving that higher past emissions (captured through legacy) justify higher future carbon budgets in this model.

Q: What is the carbon budget result under Theorem 2 (Proposition 6)? A: Under condition u1 < [α/(α+δ)]v0, QD < QE and Theorem 2 applies. Starting from Q0 above QE, if π0 is small enough (specifically u1 > π0[α/(α+δ)]v0), the optimal policy is to stabilize the stock forever at Q0. Otherwise, there exist two finite dates t1 < t2, both increasing in π0, such that the planner first reduces the stock at maximum rate (qt = q-bar-negative) for t < t1, then expands at maximum rate for t1 < t < t2, then stabilizes at Q0 forever. The optimal carbon budget is Q0 in all cases, showing that the long-run target is independent of legacy under Theorem 2.
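
The conditions that separate the two carbon budget results can be collected in a small classifier. This is a hedged sketch: the function name and return labels are hypothetical, the inputs in the usage lines are made up, and the Theorem 2 branch takes for granted the text's premise that the initial stock Q0 lies above QE. The critical threshold π* of Proposition 5 is not computed because the text gives no formula for it.

```python
def carbon_budget_regime(u1: float, v0: float, alpha: float, delta: float, pi0: float) -> str:
    """Classify the regime from the conditions quoted for Propositions 5 and 6."""
    k = alpha / (alpha + delta)  # discounting of damages that arrive after a delay
    if u1 > k * v0:
        return "theorem1"                      # Proposition 5 applies
    if u1 < k * v0:
        if u1 > pi0 * k * v0:
            return "theorem2_stabilize"        # hold the stock at Q0 forever
        return "theorem2_reduce_then_expand"   # cut, expand, then stabilize at Q0
    return "knife_edge"                        # u1 equals the discounted marginal damage

# Hypothetical parameter values, for illustration only.
case_a = carbon_budget_regime(u1=1.0, v0=1.0, alpha=0.5, delta=0.05, pi0=0.5)
case_b = carbon_budget_regime(u1=0.5, v0=1.0, alpha=0.5, delta=0.05, pi0=0.1)
case_c = carbon_budget_regime(u1=0.5, v0=1.0, alpha=0.5, delta=0.05, pi0=0.9)
print(case_a, case_b, case_c)
```

Cases b and c share the same preferences and differ only in the initial legacy, which is what moves the planner from immediate stabilization to the reduce-then-expand path.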

Q: How does the model relate to the hazard-rate literature formally? A: Papers such as Nordhaus and others that use an exogenous hazard rate h(Qt) for catastrophe — yielding survival probability pt = p0 exp(−∫h(Qτ)dτ) — are shown to be equivalent to the special case where the catastrophe was triggered in the past (legacy = 1 permanently). Their formulation corresponds to assuming α is constant and the legacy is identically one, which reduces the law of motion for pt to pt = p0 exp(−αt). The key difference is that in the hazard-rate approach the planner can reduce the arrival rate by lowering the stock (h is increasing in Q), whereas in the authors’ model the delay parameter α is constant and policy affects only damages.

Q: What is the role of the exponential delay distribution assumption? A: The assumption that the delay τ follows an exponential distribution with parameter α is made for tractability. Under this assumption, the entire past trajectory of the stock (Qt)t≤0 can be summarized by just two state variables — the highest stock on record Q0-bar and the initial legacy π0 — because the exponential “memoryless” property means that the additional expected waiting time until catastrophe occurrence does not depend on how long the triggering has already been in effect. Without this assumption, the full chronicle of past experiments would be required as a state variable, making the problem intractable.
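
The memoryless property behind this state-space reduction is easy to verify numerically. The sketch below compares an exponential delay with a Weibull delay of shape 2, which is an assumed counterexample chosen for illustration and not a distribution discussed in the paper; the parameter values are hypothetical.

```python
import math

def exp_survival(t: float, alpha: float) -> float:
    """P(delay > t) for an exponential delay with parameter alpha."""
    return math.exp(-alpha * t)

def weibull_survival(t: float, alpha: float, shape: float = 2.0) -> float:
    """P(delay > t) for a Weibull delay, used here as a non-memoryless contrast."""
    return math.exp(-((alpha * t) ** shape))

def conditional_survival(survival, s: float, t: float, alpha: float) -> float:
    """P(delay > s + t | delay > s): further waiting time given s periods already elapsed."""
    return survival(s + t, alpha) / survival(s, alpha)

alpha, s, t = 0.5, 3.0, 2.0
exp_cond = conditional_survival(exp_survival, s, t, alpha)
exp_fresh = exp_survival(t, alpha)
wb_cond = conditional_survival(weibull_survival, s, t, alpha)
wb_fresh = weibull_survival(t, alpha)
print(math.isclose(exp_cond, exp_fresh), math.isclose(wb_cond, wb_fresh))
```

For the exponential delay the elapsed time s drops out, so the planner does not need to know when triggering happened; for the Weibull contrast it does not drop out, which is why the full history would have to be tracked.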

Q: What happens when the delay parameter α approaches zero or infinity? A: When α → ∞ (instantaneous catastrophe upon triggering), pt = 1 − F(Qt) and the legacy is identically zero, recovering the Tsur-Zemel unknown-threshold approach (Proposition 3). The optimal path converges to QE0 from below or stabilizes if already above QE0. When α → 0 (infinite delay, effectively no catastrophe), QE = QD = QN and the problem reduces to the simple stock-flow problem (Proposition 1), with the optimal path converging monotonically to QN.

Q: Does the model allow for damage mitigation after triggering but before occurrence? A: Yes, this is a key feature. The continuation payoff after catastrophe occurrence is V(QT) where QT is the stock level at the time of occurrence T, not at triggering time T(S). This means the planner can reduce the stock after triggering to lower damages — analogous to a skater turning back toward shore after the ice first cracks. The assumption that V depends on the stock at occurrence rather than at triggering or at the maximum historical level is what allows this mitigation channel and is explicitly noted as a modeling choice.

Legacy of the past (πt): The probability, conditional on survival to date t, that past experiments have already triggered a catastrophe. Formally πt = 1 − [1 − F(Qt)] / pt. Recent experiments contribute more to the legacy than distant ones, with contribution decaying at rate α. The legacy is zero when α → ∞ and is the central state variable bridging the paper’s two canonical extremes.

QE (“Experimentation” threshold): The stock level at which the net marginal gain from further experimentation, defined as ν(Q) − [α/(α+δ)]ρ(Q)D(Q), equals zero, under the assumption that no catastrophe has been triggered. Below QE, stabilization is suboptimal; above QE, the planner does not experiment further when the legacy is zero.

QD (“Damages” threshold): The stock level at which the net marginal benefit from holding the stock, defined as ν(Q) − [α/(α+δ)]D’(Q), equals zero, under the assumption that the catastrophe is known to have been triggered. QD ≤ QN and represents the optimal long-run target when the hazard-rate approach applies.

Marginal payoff ν(Q): Defined as uq(0, Q) + (1/δ)uQ(0, Q), it measures the net gain from marginally increasing the flow when the stock is stabilized at Q. It is strictly decreasing in Q under Assumption 1 and equals zero at QN.

Damage function D(Q): Defined as (1/δ)u(0, Q) − V(Q), it measures the welfare loss from catastrophe occurrence when the stock is Q at occurrence time, relative to permanent stabilization at Q. Assumed weakly positive and weakly increasing in Q.

Survival probability (pt): The probability, computed from prior beliefs F at the beginning of time, that the catastrophe has not yet occurred by date t. Its law of motion is ṗt = α[1 − F(Qt) − pt], driven solely by the delay parameter α and the current maximum stock Qt.

Fatalism (under Theorem 1): The policy implication that a higher legacy — meaning a higher probability the catastrophe is already triggered — leads the planner to increase the stock further and accept more experimentation, because mitigation is relatively ineffective (QE < QD) and current consumption must be enjoyed before the catastrophe arrives.

Central Bank Digital Currency with Collateral-Constrained Banks

Mon, 01 Jan 0001 00:00:00 +0000

The paper analyzes the implications of introducing a retail central bank digital currency (CBDC) that competes with commercial bank deposits for household liquidity, in a model where banks must post government bonds as collateral to access central bank lending. The authors revisit Niepelt’s (2022) “equivalence of payment systems” result and find that equivalence survives even under a collateral constraint: the central bank can still offer loans to banks that replicate the no-CBDC equilibrium allocation, but at a lending rate lower than Niepelt’s unconstrained rate, because cheaper terms are needed to incentivize sufficient loan uptake when banks must redirect portfolio holdings toward government bonds to qualify. A structural cost remains: banks must hold government bonds as collateral at the expense of extending credit to firms, so equivalence in allocation does not imply full neutrality. Banks’ business models and the government’s intermediation role change even when aggregate output and prices are unchanged. In the dynamic extension where the central bank does not sterilize the CBDC introduction, banks respond by narrowing deposit spreads to attract inflows, with the result that a CBDC ramp-up to 5 percent of steady-state output expands rather than contracts bank credit to firms.

In depth

Q1. What is the equivalence of payment systems result and how does the collateral constraint change it?

Brunnermeier and Niepelt (2019) and Niepelt (2022) established that the central bank can neutralize the real effects of CBDC introduction by lending to banks at an appropriate rate to replace lost deposit funding, a result the present paper revisits by adding a collateral requirement on central bank lending — specifically, that banks must hold eligible government bonds up to a fraction θb of their central bank loan value. Under this constraint, Proposition 1 shows that equivalence survives: there exists a central bank lending rate that replicates the no-CBDC equilibrium allocation and price system. However, this lending rate is lower than Niepelt’s unconstrained rate by a factor increasing in the restrictiveness of the constraint (lower θb requires a lower lending rate), because when banks are collateral-constrained, cheaper terms are needed to induce them to borrow enough from the central bank to offset deposit outflows.

Q2. What is Corollary 1 and why does “full neutrality” fail?

Corollary 1 states that even when the central bank achieves allocation equivalence by setting the appropriate lending rate, banks must redirect portfolio holdings from firm loans to government bonds to meet the collateral requirement — crowding out bank credit to firms by an amount equal to the bond uptake, with the crowding-out diminishing as the collateral constraint becomes less restrictive (higher θb). This is the sense in which “full neutrality” fails under the collateral constraint: aggregate output and prices are unchanged, but the composition of credit changes — banks extend less to firms and hold more government bonds — and the government or household sector must absorb the gap in firm financing. In the limiting case where CBDC and deposits are equally valuable to households (λ = 1), the government alone compensates for the reduction in bank loans, effectively expanding its own intermediation role.

Q3. What does the dynamic extension show about bank disintermediation?

Simulating a gradual and near-permanent increase in CBDC to 5 percent of steady-state output without central bank sterilization, the paper finds that banks respond by narrowing their deposit interest spread, so that total deposits do not fall and bank loans to firms expand rather than contract. This is the opposite of the disintermediation hypothesis. The mechanism relies on the assumption that banks have market power in their regional deposit markets (each bank is a monopsonist): in response to CBDC competition, the bank voluntarily reduces the rent it extracts on deposits (the spread between the risk-free rate and the deposit rate), which attracts deposit inflows. These inflows, combined with central bank loan uptake, expand the bank’s balance sheet and increase credit extension to firms. The result stands in contrast to models with competitive deposit markets, where banks cannot respond to CBDC competition through deposit pricing.

Q4. What changes even if credit is not reduced?

Even when the dynamic model shows credit expansion rather than contraction, the paper establishes that CBDC introduction alters banks’ balance sheet composition and business model: banks shift toward holding more government bonds and away from firm loans, the government assumes a larger credit intermediation role, and the aggregate distribution of capital ownership changes — constituting the form of non-neutrality that survives even when total credit is unchanged. This is what Corollary 1 calls the failure of “full neutrality”: the real allocation equivalence holds at the aggregate level, but the sectoral distribution of who provides credit to firms shifts from the banking sector toward the public sector. The paper interprets this as a structural consequence of the collateral requirement on central bank lending that is absent in the frictionless equivalence benchmark.

Key concepts

equivalence of payment systems : the theoretical result (from Brunnermeier-Niepelt 2019 and Niepelt 2022) that the central bank can ensure the same equilibrium allocation whether or not CBDC exists, by adjusting its lending terms to banks; this paper revisits and extends the result to environments with a collateral constraint.

collateral constraint (θb) : the requirement in this model that banks hold eligible government bonds as a fraction of the central bank loans they take on; adding this friction to Niepelt’s framework preserves equivalence in allocation but requires a lower central bank lending rate and crowds out bank loans to firms.

disintermediation : the concern that CBDC adoption would cause households to shift en masse from bank deposits to CBDC, reducing bank funding and contracting bank credit; the paper finds this does not occur in either the equivalence analysis or the dynamic extension.

monopsony in deposits : the market structure assumption that each regional bank is the sole deposit taker in its region, giving it pricing power over deposit rates; this is what enables banks in the dynamic model to narrow the deposit spread in response to CBDC competition, generating deposit inflows rather than outflows.

full neutrality : a stronger invariance result requiring that not only the equilibrium allocation but also banks’ balance sheet composition and business model are unchanged by CBDC introduction; the paper shows this fails under the collateral constraint even when allocation equivalence holds.

Central Bank Independence at Low Interest Rates

This paper constructs a new measure of political pressure on the Federal Reserve from textual analysis of Fed Chairs’ testimonies at Humphrey-Hawkins congressional hearings, and documents that the use of non-traditional monetary policy instruments at the effective lower bound (ELB) led to increased political criticism that predicts legislative actions threatening central bank independence. A model is developed in which the probability of the monetary authority’s future loss of independence is increasing in the use of non-traditional instruments, leading to attenuated monetary responses and higher inflation volatility. The attenuation can be mitigated under an institutional framework with clearly defined targets where the central bank is evaluated by how efficiently it achieves its goals.

In depth

Q1. What is the new measure of political pressure and what does it capture?

The paper constructs a measure of political pressure on the Federal Reserve by analyzing the evolution of critical questions and statements directed at Fed Chairs during semi-annual Humphrey-Hawkins Act testimonies to Congress, and finds that the number of critical statements specifically referencing non-traditional instruments increased significantly following the 2008 financial crisis. The measure tracks not only the volume of criticism but also its content—distinguishing criticism that specifically references the ELB tools from general discontent associated with low interest rate environments—allowing the paper to isolate the effect of unconventional policy use from other factors associated with the ELB subsample.

Q2. What is the empirical link between political criticism and legislative threats?

Following Hess and Shelton (2016), the paper analyzes bills introduced to Congress that threaten the powers of the Federal Reserve, and finds that the new measure of congressional criticism correlates highly with the introduction of such threatening legislation; moreover, the number of threatening bills specifically mentioning unconventional monetary policy is predicted by the amount of criticism referencing new policy tools. This provides an empirical chain from the use of non-traditional tools to political blowback to concrete legislative risk to Fed independence, motivating the theoretical model.

Q3. How does the threat to independence affect monetary policy in the model?

In the model, when the probability of future loss of independence is increasing in the use of non-traditional instruments, the monetary authority optimally chooses attenuated responses, using non-traditional tools less aggressively than the unconstrained inflation-minimizing policy would prescribe, and thereby generates higher inflation volatility as a consequence of the political risk. The model captures the democratic reality that a central bank’s independence is inherently revocable by the legislature; a central bank that interprets congressional criticism as a credible signal of independence risk will internalize this constraint in its policy decisions.

Q4. How can institutional design mitigate the attenuation?

An institutional framework with clearly defined targets where the central bank is evaluated by how efficiently it achieves its goals—rather than by discretionary judgments about the appropriateness of its tools—mitigates the attenuation of monetary responses by narrowing the scope for politically motivated criticism of non-traditional instruments. If critics must evaluate the central bank against transparent targets, they face a higher evidentiary bar for threatening its independence when non-traditional tools are being used to meet those targets; this reduces the political risk of using such tools and restores the unconstrained optimal policy.

Key concepts

Humphrey-Hawkins testimony measure : the paper’s text-based measure of political pressure on the Fed, constructed from the volume and content of critical questions and statements directed at Fed Chairs during semi-annual congressional testimonies; found to predict threatening legislative actions.

attenuation of monetary responses : the reduction in the aggressiveness of non-traditional monetary policy use relative to the unconstrained optimal policy, arising from the central bank’s internalization of the political risk of independence loss associated with using non-traditional instruments.

clearly defined institutional targets : an institutional framework in which the central bank’s mandate is operationalized as specific measurable targets and the bank is evaluated by its efficiency in achieving them; shown here to mitigate the political risk of non-traditional instruments and restore optimal monetary responses.

Choice and Opportunity Costs

Layer 1: Overview

This paper develops a unified choice-theoretic framework in which agents evaluate alternatives not in isolation but relative to their opportunity costs — the alternatives they forgo. The central departure from classical theory is the relaxation of additive separability between benefits and costs. In the standard additive model, accounting for opportunity costs is behaviourally equivalent to simple utility maximisation: a decision maker who correctly perceives the feasible set and maximises an additively separable utility will make identical choices whether or not opportunity costs are explicitly considered (the paper calls this the irrelevance of opportunity costs under additivity, formally establishing it as a general result). Once additive separability is relaxed, however, opportunity costs become non-trivial and generate a genuinely distinct theory of choice.

The primitive of the model is a net preference — an asymmetric binary relation on pairs (x, y) of distinct alternatives, where (x, y) ≻ (w, z) means the agent strictly prefers obtaining x while forgoing y over obtaining w while forgoing z. Because the opportunity cost of a chosen alternative depends on what else the agent would choose, and vice versa, choice emerges from an intrapersonal equilibrium rather than from direct maximisation.

The paper defines and axiomatically characterises two nested models. The Recursive Opportunity Model (ROM) adopts a behavioural definition of opportunity costs: the cost of the chosen alternative x in menu A is c(A \ {x}), the alternative that would actually be chosen were x unavailable; the cost of every unchosen alternative is x itself. This recursive structure is completely characterised by a single observable condition, Weak Path Independence (WPI): if x is chosen when added to a menu A, then x must also be chosen in a pairwise comparison against c(A). WPI is shown to imply Always Chosen (AC), which requires that a Condorcet winner is always selected, but it permits pairwise cycles of choice (failures of No Binary Cycles). Rationality within the ROM requires additionally that the net preference be a strict order satisfying Congruence, an acyclicity condition on the gross preference induced by the net preference. Even then, the utility function being maximised need not coincide with the gross preference naturally implied by the underlying psychological net preference, raising a welfare identification problem.

The Opportunity Model (OM) generalises the ROM by allowing the opportunity cost of the chosen alternative to be any unchosen alternative rather than the recursively determined one. This relaxation permits both pairwise cycles and menu effects (Condorcet violations). The OM is completely characterised by Never Chosen (NC): an alternative that loses every pairwise comparison within a menu (a Condorcet loser) cannot be chosen. Imposing a strict order and Congruence on the net preference of an OM rules out only pairwise cycles, leaving menu effects intact. Full rationality within the OM is restored only with the additional assumption that opportunity costs are non-decreasing in the induced gross preference as the feasible set expands (the Increasing Opportunity Model).

Extensions characterise multivalued versions of both models (M-ROM and M-OM) via adapted axioms on choice correspondences, and show that several known behavioural models in the literature — including list-rationalizable choice and game-tree rationalizable choice — satisfy WPI and thus are instances of ROM. Applications demonstrate that OMs can represent the attraction effect and the multiple decoy effect, providing a preference-maximisation account without appealing to bounded cognition, and that ROMs can represent intransitive pairwise choices via smooth parametric net preferences, avoiding the discontinuities of lexicographic semiorder models.

Q: What is the paper’s foundational definition of opportunity cost, and how does it differ from the standard textbook definition? A: The paper defines the opportunity cost of the chosen alternative x in menu A as the alternative that would actually be chosen from A \ {x} — that is, c(A \ {x}). The opportunity cost of any unchosen alternative y is the actual choice x. The standard textbook definition — “the next-best feasible alternative” — presupposes context-independent, additively separable preferences, precisely the assumption the paper relaxes. The behavioural definition is grounded directly in the agent’s own choice function, making it consistent with non-separable evaluations.

Q: Under what conditions do opportunity costs become irrelevant, and why? A: If preferences admit an additively separable utility representation u, then for any finite menu A and any two alternatives x and y in A, u(x) ≥ u(y) if and only if u(x) − max_{a ∈ A \ {x}} u(a) ≥ u(y) − max_{a ∈ A \ {y}} u(a). Net utility maximisation and gross utility maximisation rank alternatives identically. Opportunity costs become non-trivial only when additive separability is relaxed; at that point, the agent’s comparative evaluation of (alternative, cost) pairs can produce choices that no gross utility function rationalises.
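The irrelevance result can be checked by brute force. The Python sketch below is an illustration and is not taken from the paper; the function names `gross_choice` and `net_choice` are hypothetical. It draws random utilities without ties (an assumption made so that the maximiser is unique) and counts the menus in which maximising u(x) and maximising u(x) minus the best forgone utility disagree.

```python
import itertools
import random


def gross_choice(menu, u):
    """Pick the alternative with the highest utility u."""
    return max(menu, key=lambda x: u[x])


def net_choice(menu, u):
    """Pick the alternative with the highest net utility:
    own utility minus the best utility among the other alternatives."""
    def net(x):
        return u[x] - max(u[a] for a in menu if a != x)
    return max(menu, key=net)


random.seed(0)
alternatives = list("abcde")
mismatches = 0
for _ in range(200):
    # Distinct utility values, so there are no ties to break.
    values = random.sample(range(100), len(alternatives))
    u = dict(zip(alternatives, values))
    for size in range(2, len(alternatives) + 1):
        for menu in itertools.combinations(alternatives, size):
            if gross_choice(menu, u) != net_choice(menu, u):
                mismatches += 1

print(mismatches)
```

The count stays at zero because subtracting the best forgone utility preserves the ranking: the better alternative has both a higher u and a weakly lower opportunity cost.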

Q: What is the Recursive Opportunity Model (ROM) and what single axiom characterises it? A: A choice function c is a ROM if there exists a net preference ≻ such that for every menu A and every unchosen alternative x, the chosen alternative evaluated at its opportunity cost is preferred to x evaluated at c(A). This is equivalent to the choice function satisfying Weak Path Independence (WPI): if x ∉ A and x = c(A ∪ {x}), then x = c({x, c(A)}). WPI is necessary and sufficient for a ROM (Theorem 1). It is not sufficient for full rationality, as it permits pairwise cycles while ruling out menu effects.
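Because WPI is a finite condition on the choice function, it can be tested mechanically. The following Python sketch is illustrative only: the names `make_choice` and `satisfies_wpi` and the dictionary encoding of a choice function are assumptions, not the paper's notation. It checks one choice function with a pairwise cycle and one with a decoy-style menu effect.

```python
from itertools import combinations


def make_choice(alternatives, table):
    """Build a choice function as a dict from frozenset menus to choices.
    Singleton menus trivially choose their only element."""
    c = {frozenset(x): x for x in alternatives}
    c.update({frozenset(menu): chosen for menu, chosen in table.items()})
    return c


def satisfies_wpi(c, alternatives):
    """WPI: if x is not in A and x = c(A ∪ {x}), then x = c({x, c(A)})."""
    for size in range(1, len(alternatives)):
        for A in map(frozenset, combinations(alternatives, size)):
            for x in alternatives:
                if x in A:
                    continue
                if c[A | {x}] == x and c[frozenset({x, c[A]})] != x:
                    return False
    return True


# A pairwise cycle (x beats y, y beats z, z beats x) with x chosen from the triple.
cyclic = make_choice("xyz", {"xy": "x", "yz": "y", "xz": "z", "xyz": "x"})
# A menu effect: y is chosen from the triple although x beats y pairwise.
decoy = make_choice("xyd", {"xy": "x", "yd": "y", "xd": "x", "xyd": "y"})

print(satisfies_wpi(cyclic, "xyz"))
print(satisfies_wpi(decoy, "xyd"))
```

The cyclic example passes and the decoy example fails, which matches the statement that WPI permits pairwise cycles while ruling out menu effects.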

Q: What kinds of irrationality can a ROM exhibit, and what kinds does it preclude? A: The paper establishes (Corollary 1) that WPI implies Always Chosen — a ROM always selects the Condorcet winner when one exists. Therefore, the only admissible form of irrational behaviour in a ROM is pairwise cycles (failures of No Binary Cycles). Condorcet violations (menu effects) are precluded. A ROM becomes fully rational if and only if it additionally satisfies No Binary Cycles.

Q: What additional condition on the net preference guarantees that a ROM is rational? A: Theorem 2 establishes that a choice function is rational if and only if it is a ROM generated by a net preference that is a strict order (complete, asymmetric, transitive) satisfying Congruence. Congruence requires that the induced binary relation P≻ on alternatives — defined by xP≻y whenever there exists z such that (x, z) ≻ (y, z) or (z, y) ≻ (z, x) — is acyclic. For a (u, v)-additive net preference, Congruence holds if and only if u and v are ordinally equivalent.
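The link between Congruence and ordinal equivalence can be illustrated on a small example. The Python sketch below assumes that a (u, v)-additive net preference means (x, y) ≻ (w, z) whenever u(x) − v(y) > u(w) − v(z); this functional form is an assumption made for illustration and is not spelled out in the summary. The helper names `induced_relation` and `is_acyclic` are hypothetical.

```python
def induced_relation(alternatives, u, v):
    """Induced relation P: x P y if some z (distinct from x and y) gives
    (x, z) ≻ (y, z) or (z, y) ≻ (z, x), under the assumed additive form."""
    def net(x, y):
        return u[x] - v[y]

    P = set()
    for x in alternatives:
        for y in alternatives:
            if x == y:
                continue
            for z in alternatives:
                if z in (x, y):
                    continue
                if net(x, z) > net(y, z) or net(z, y) > net(z, x):
                    P.add((x, y))
    return P


def is_acyclic(alternatives, P):
    """Repeatedly strip alternatives that nothing remaining is ranked above.
    If none can be stripped while some remain, P contains a cycle."""
    remaining = set(alternatives)
    while remaining:
        top = [x for x in remaining
               if not any((y, x) in P for y in remaining)]
        if not top:
            return False
        remaining -= set(top)
    return True


X = ["a", "b", "c"]
u = {"a": 3, "b": 2, "c": 1}
v_same = {"a": 30, "b": 20, "c": 10}   # same ordering as u
v_rev = {"a": 1, "b": 2, "c": 3}       # reversed ordering

print(is_acyclic(X, induced_relation(X, u, v_same)))
print(is_acyclic(X, induced_relation(X, u, v_rev)))
```

Under this assumed form, x P y holds whenever u(x) > u(y) or v(x) > v(y), so any pair that u and v rank in opposite directions produces a two-element cycle.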

Q: Can rational behaviour generated by a ROM be welfare-analysed using revealed preference in the standard sense? A: No — and this is a key warning in the paper. Even when a ROM with a strict order and Congruence produces fully rational behaviour, the utility function being maximised need not coincide with the gross preference P≻ naturally induced by the underlying net preference. The paper provides an explicit example (Remark 1, equation 10) in which the choice-rationalising order P is xPyPz while the induced preference is xP≻zP≻y. The utility “revealed” by choice may diverge from the psychological primitive driving that choice, undermining the normative authority of standard revealed preference welfare analysis.

Q: What is the Opportunity Model (OM) and how does it extend the ROM? A: The OM relaxes the recursive assumption by allowing the opportunity cost of the chosen alternative to be any unchosen element of the menu rather than specifically c(A \ c(A)). This breaks the recursive structure while preserving the intrapersonal equilibrium character (the choice still affects the net value of alternatives). The OM is completely characterised by Never Chosen (NC): no Condorcet loser can be chosen (Theorem 3). Unlike the ROM, an OM may fail to select the Condorcet winner, permitting both pairwise cycles and Condorcet violations.

Q: What is the Increasing Opportunity Model and when does it restore full rationality? A: An IOM is an OM in which the opportunity function o is monotone in the sense that if A ⊃ B and o(A) ≠ o(B), then o(A) is ranked higher than o(B) in the induced gross preference P≻. Intuitively, opportunity costs do not decrease as the feasible set expands. Theorem 5 establishes that a choice function is rational if and only if it is an IOM generated by a net preference that is a strict order satisfying Congruence. Full rationality within the OM thus requires both the internal consistency of the net preference (strict order, Congruence) and this monotonicity of opportunity costs.

Q: How does the paper explain the attraction effect using the OM? A: In the canonical formulation, c({x,y}) = x, c({y,d}) = y, c({x,d}) = x, and c({x,y,d}) = y, where d is a decoy. This pattern is incompatible with gross preference maximisation. The paper represents it as an OM with opportunity function o({x,y,d}) = d and a strict net preference order yd ≻ xy ≻ yx ≻ xd ≻ dx ≻ dy. The psychological interpretation is that the introduction of the decoy shifts the comparator for y from x to d; y looks more favourably comparable to d than x does, so the equilibrium where y is chosen is selected. No bounded cognition or imperfect attention is assumed.
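The attraction-effect representation can be verified directly from the listed order. The Python sketch below assumes that an OM requires the pair (c(A), o(A)) to be net-preferred to (x, c(A)) for every unchosen x, which is the ROM condition with the recursive cost replaced by o(A); the names `prefers` and `is_om_consistent` are hypothetical.

```python
# Net preference from the text, best to worst; "yd" reads
# "obtain y while forgoing d".
order = ["yd", "xy", "yx", "xd", "dx", "dy"]
rank = {tuple(pair): i for i, pair in enumerate(order)}


def prefers(p, q):
    """True if pair p is strictly net-preferred to pair q."""
    return rank[p] < rank[q]


choice = {
    frozenset("xy"): "x",
    frozenset("yd"): "y",
    frozenset("xd"): "x",
    frozenset("xyd"): "y",
}
# Opportunity cost of the chosen alternative; in binary menus it is
# forced to be the other element, in the triple it is the decoy d.
cost = {
    frozenset("xy"): "y",
    frozenset("yd"): "d",
    frozenset("xd"): "d",
    frozenset("xyd"): "d",
}


def is_om_consistent(choice, cost):
    for menu, chosen in choice.items():
        o = cost[menu]
        if o == chosen or o not in menu:
            return False
        for x in menu - {chosen}:
            if not prefers((chosen, o), (x, chosen)):
                return False
    return True


print(is_om_consistent(choice, cost))

# Under the recursive (ROM) definition the cost of y in {x, y, d}
# would be c({x, d}) = x, and y would then need to beat x.
recursive_cost = choice[frozenset("xd")]
print(prefers(("y", recursive_cost), ("x", "y")))
```

The first check succeeds and the second fails: the pattern is supported when the decoy is the comparator for y, but not when the comparator is the recursively determined alternative x.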

Q: How does the framework account for multiple decoys? A: With decoys dx and dy specific to x and y respectively, the observed pattern c({x,y}) = x and c({x,y,dy}) = y and c({x,y,dx,dy}) = y can be represented as an OM with a transitive net preference satisfying xdx ≻ ydy ≻ xy ≻ yx ≻ dyy ≻ dxx and opportunity function o({x,y,dx,dy}) = dx, o({x,y,dy}) = dy. The paper notes this net preference can be extended to a strict order while preserving the choice pattern. This accommodates a phenomenon that poses a challenge to standard theoretical choice literature (per Masatlioglu, Nakajima and Ozbay [25]).

Q: How does the ROM explain intransitive choices more smoothly than lexicographic semiorder models? A: The paper shows that the Tversky (1969) cyclical pattern c({x,y}) = x, c({y,z}) = y, c({x,z}) = z with x=(115,7), y=(117,3), z=(120,0) can be generated by net preferences that admit smooth parametric representations. Specifically, for any two alternatives w=(a,b) and z=(c,d), the paper proposes (w,z) ≻ (z,w) iff (max{a−c, b−d})² > k(min{a−c, b−d})², where k is a relative sensitivity parameter. For k=1/2 this yields the required cycle. Lexicographic models require sharp discontinuities in preference and systematic avoidance of trade-offs, which are often viewed as implausible within the standard economic paradigm; the smooth parametric form avoids these features.

Q: What is the relationship between ROMs and previously studied choice models in the literature? A: Several known models satisfy WPI and are therefore, by Theorem 1, instances of ROMs: specifically, Rationalizability by Game Trees (Xu and Zhou) and List-Rationalizable Choice (Yildiz) are shown to satisfy WPI. The two-stage choice model of Bajraj and Ulku satisfies NC but not WPI, making it an OM but not a ROM. The net preference being maximised in each case can in principle be recovered using the explicit construction in the proof of Theorem 1.

Q: How does the ROM relate to Koszegi-Rabin personal equilibrium? A: Both models involve preferences that depend on a variable determined endogenously by choice, requiring an intrapersonal equilibrium concept in which the agent’s conjectures about their own behaviour must be internally consistent. The key difference is that in Koszegi-Rabin the psychological primitive is a set of reference-dependent preferences ≻_r on alternatives in X (where r is the reference point), and equilibrium requires c(A) ≻_{c(A)} y for all y ∈ A \ c(A). In the ROM, the primitive is a preference on pairs of distinct alternatives, and the opportunity cost differs for each alternative being compared (the chosen alternative has one opportunity cost, each unchosen alternative has a different one, namely c(A) itself).

Net preference: An asymmetric binary relation on pairs (x, y) of distinct alternatives, where (x, y) ≻ (w, z) means the agent strictly prefers to be in a situation where they choose x while forgoing y over a situation where they choose w while forgoing z. The primitive is defined on the set of ordered pairs {(x, y) ∈ X × X : x ≠ y}, without imposing additive separability.

Recursive Opportunity Model (ROM): A choice function c is a ROM if there exists a net preference ≻ such that for every menu A and every unchosen x, the pair (c(A), c(A \ c(A))) ≻ (x, c(A)). The opportunity cost of the chosen alternative is defined recursively as c(A \ c(A)); choice results from intrapersonal equilibrium rather than simple maximisation.

Opportunity Model (OM): A generalisation of the ROM in which the opportunity cost of the chosen alternative can be any unchosen alternative in the menu (not necessarily the recursively determined one). Characterised by Never Chosen: no Condorcet loser can be chosen. Permits both pairwise cycles and Condorcet violations.

Weak Path Independence (WPI): The axiom characterising ROMs: if x ∉ A and x = c(A ∪ {x}), then x = c({x, c(A)}). Equivalently, if an alternative is chosen upon being added to a menu, it must also win in a pairwise comparison with what was previously chosen from the original menu.

Congruence: A consistency condition on net preferences requiring that the induced binary relation P≻ — defined by xP≻y whenever there exists z such that (x,z) ≻ (y,z) or (z,y) ≻ (z,x) — is acyclic. For a (u,v)-additive net preference, Congruence holds if and only if u and v are ordinally equivalent. Together with a strict net preference order, Congruence in a ROM is equivalent to rational choice.

Intrapersonal equilibrium: The concept underlying both models: an agent is in equilibrium when selecting x from A if they correctly anticipate their own contingent behaviour across hypothetical scenarios (i.e., they use the actual choice function c to evaluate what they would choose from A \ {x}), and the chosen alternative is net-preference-maximal given those consistent conjectures.

Never Chosen (NC): The axiom characterising OMs: an alternative that is a Condorcet loser — losing in every pairwise comparison within a menu — cannot be chosen from that menu. NC is weaker than WPI (which implies both Always Chosen and Never Chosen) and is the precise behavioural content of the OM.

Climate Policies, Macroprudential Regulation, and the Welfare Cost of Business Cycles

This paper embeds a carbon pricing sector into an extended DSGE model with a financial accelerator (E-DSGE) featuring heterogeneous firms, bank monitoring, and a borrowing-constraint amplification mechanism, then compares the welfare cost of business cycles under a cap-and-trade (CAT) scheme versus a carbon tax. The central result is that, in the presence of financial frictions, CAT generates lower welfare costs than a carbon tax: under TFP and risk shocks calibrated to US quarterly data, the baseline welfare cost of business cycles is 0.6178 percent of consumption under CAT versus 1.5231 percent under a carbon tax, roughly 2.5 times as large under a tax. The mechanism is that permit prices under CAT are procyclical (they fall in downturns, reducing firms’ carbon compliance burden precisely when balance sheets are most stressed), acting as an automatic stabilizer for financial amplification, while the carbon tax holds a fixed price and provides no such buffer. A countercyclical optimal carbon tax rule that reacts vigorously to output (optimal sensitivity parameter τ = 52.2245) can mimic CAT’s stabilizing behavior, but even optimized environmental rules leave a significant welfare gap between regimes. Reserve requirement macroprudential regulation narrows this gap substantially: a static 2 percent reserve requirement brings CAT welfare costs to 0.1957 and carbon tax costs to 0.3863; an optimal dynamic rule keyed to credit growth or asset price growth lowers them further (to 0.1207 and 0.2300, respectively, under the credit-growth rule), bringing the two regimes close together. A deposit interest rate subsidy can also narrow the gap when combined with a dynamic subsidy rule, but a static subsidy actually worsens welfare costs because it raises leverage and amplifies shocks around a more fragile steady state.

In depth

Q1. What is the model structure and how does the environmental policy sector integrate with financial frictions?

The model is an E-DSGE built on the Christiano, Motto, and Rostagno (2014) financial accelerator framework, extended to include a carbon price instrument and heterogeneous firms that face both standard borrowing constraints and carbon compliance costs. There is a representative household and three firm sectors: a continuum of capital-producing entrepreneurs, retailers, and a goods sector. Banks extend loans to entrepreneurs at a spread over the risk-free rate; the external finance premium is endogenous because bank monitoring is costly and borrowers face costly state verification (as in Bernanke, Gertler, and Gilchrist, 1999). Environmental policy is introduced through a carbon permit or tax that enters firms’ marginal cost, so the carbon price affects both production decisions and the entrepreneur’s net worth, which in turn feeds back into the spread through the financial accelerator. Calibration uses US quarterly data (Table 1 in the paper), and the model is solved by log-linearizing around a deterministic steady state.

Q2. Why do financial frictions create a welfare advantage for cap-and-trade over carbon taxes?

Under a carbon tax, the tax rate is fixed by the regulator regardless of macroeconomic conditions; when a TFP or risk shock contracts output and reduces firm net worth, the fixed carbon cost amplifies the contraction by reducing the entrepreneur’s retained earnings, worsening the external finance premium, and deepening the financial accelerator loop. Under a CAT scheme, the equilibrium permit price is endogenous: it falls when aggregate activity and emissions decline, automatically lowering the compliance cost burden for firms at exactly the moment when balance sheets are most constrained. This procyclicality of permit prices functions as an automatic stabilizer, partially offsetting the financial accelerator’s amplification. The paper shows this via impulse response functions (Figures 1 and 2 in the paper) to TFP and risk shocks: under CAT, the responses of investment, bankruptcy, spread, and output are systematically more muted than under a carbon tax. Quantitatively, the baseline welfare cost of business cycles is 0.6178 percent of consumption under CAT versus 1.5231 percent under a carbon tax — a gap of nearly 0.91 percentage points of consumption.

Q3. How do optimal environmental policy rules affect welfare costs, and do they close the gap between regimes?

An optimal flexible CAT rule that allows permit prices to respond countercyclically to a macroeconomic indicator (net output) reduces welfare costs from 0.6178 to 0.4528 percent; an optimal flexible carbon tax rule reduces costs from 1.5231 to 1.1811 percent. In both cases, the optimal rule specifies vigorous countercyclical response: the optimal sensitivity parameter for the carbon tax rule is τ = 52.2245, meaning the tax rate must decrease sharply in recessions to mimic the automatic procyclicality of permit prices under CAT. Despite these improvements, the welfare gap between the two regimes persists even under optimal environmental rules: the optimized CAT still generates roughly 0.73 percentage points lower welfare costs than the optimized carbon tax. The paper concludes that countercyclical environmental policy can reduce but not eliminate the inherent stabilization advantage of CAT in the presence of financial frictions — because the fundamental mechanism (endogenous permit prices vs. fixed tax rate) cannot be fully replicated by a tax rule with a single output-gap indicator.

Q4. How do reserve requirement macroprudential regulations interact with the carbon pricing choice?

Introducing a static 2 percent reserve requirement (banks can lend out only 98 percent of deposits) already strongly reduces welfare costs and partially aligns the two regimes: CAT welfare costs fall from 0.6178 to 0.1957, and carbon tax costs fall from 1.5231 to 0.3863 (Table 5 in the paper). The mechanism is that reserve requirements limit bank credit expansion, lowering equilibrium leverage and reducing the severity of the financial accelerator: when firms’ balance sheets are less leveraged, adverse shocks cause smaller spirals in net worth and spreads. Dynamic reserve requirement rules, keyed to credit growth (optimal ψ_B ≈ 1.047) or asset price growth (optimal ψ_Q ≈ 0.722), reduce welfare costs further, to 0.1207 under CAT and 0.2300 under a carbon tax with a credit-growth rule, narrowing the gap to around 0.11 percentage points. The optimal policy mix (jointly optimizing both the macroprudential and environmental rules) achieves minimal additional improvement beyond the macroprudential optimum alone, suggesting the dominant stabilizing role is played by financial regulation rather than the choice of carbon pricing instrument when both are available and optimally calibrated.
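The gaps between regimes follow from simple arithmetic on the welfare costs quoted in this summary. The Python sketch below only tabulates those quoted figures; the dictionary layout and the helper name `gap_and_ratio` are illustrative choices, and nothing here reproduces the paper's model.

```python
# Welfare cost of business cycles in percent of consumption,
# as quoted in this summary for each policy setting.
welfare_costs = {
    "baseline": {"cat": 0.6178, "tax": 1.5231},
    "optimal environmental rule": {"cat": 0.4528, "tax": 1.1811},
    "static 2% reserve requirement": {"cat": 0.1957, "tax": 0.3863},
    "dynamic reserve rule (credit growth)": {"cat": 0.1207, "tax": 0.2300},
}


def gap_and_ratio(costs):
    """Return the tax-minus-CAT gap (percentage points) and the tax/CAT ratio."""
    gap = costs["tax"] - costs["cat"]
    ratio = costs["tax"] / costs["cat"]
    return gap, ratio


summary = {name: gap_and_ratio(c) for name, c in welfare_costs.items()}
for name, (gap, ratio) in summary.items():
    print(f"{name:38s} gap = {gap:.4f}   ratio = {ratio:.2f}")
```

On these figures the absolute gap shrinks from about 0.91 to about 0.11 percentage points as reserve requirements are added, while the carbon tax cost remains roughly twice the CAT cost in relative terms.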

Q5. How does macroprudential regulation affect the volatility of emissions and permit prices under each regime?

Table 6 in the paper reports coefficients of variation (CVE for emissions volatility, CVP_E for permit price volatility) across policy combinations. Under baseline CAT with no macroprudential regulation, CVE = 0 (the cap fixes aggregate emissions by construction) and CVP_E = 8.3578 — permit prices are very volatile. Adding a static reserve requirement reduces CVP_E to 2.5125; an optimal credit-growth rule reduces it to 1.0935, a reduction of nearly 87 percent from baseline. Under baseline carbon tax, CVP_E = 0 (the tax price is fixed by regulation) but CVE = 0.0574 — emissions are volatile. Adding a static reserve requirement reduces CVE to 0.0273; an optimal credit-growth rule reduces it to 0.0153. The paper interprets this as macroprudential regulation fostering convergence between the two instruments in their business cycle properties: it substantially stabilizes permit prices under CAT and substantially stabilizes emissions under a carbon tax, reducing the distinguishing uncertainty of each pricing approach. The optimal policy mix under a carbon tax with a dynamic subsidy achieves CVE = 0.0090 and CVP_E = 0.4956, showing that well-designed financial regulation can make a carbon tax nearly as emissions-stable as a CAT while also reducing permit price volatility.
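The size of the volatility reductions can be recomputed from the coefficients of variation quoted above. The Python sketch below uses only those quoted numbers; the helper name `pct_reduction` and the dictionary keys are illustrative.

```python
def pct_reduction(baseline, new):
    """Percent reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Permit price volatility under CAT (CVP_E) and emissions volatility
# under a carbon tax (CVE), as quoted from Table 6 in this summary.
cvp_cat = {"baseline": 8.3578, "static": 2.5125, "dynamic": 1.0935}
cve_tax = {"baseline": 0.0574, "static": 0.0273, "dynamic": 0.0153}

reductions = {
    "CAT permit price, static rule": pct_reduction(cvp_cat["baseline"], cvp_cat["static"]),
    "CAT permit price, dynamic rule": pct_reduction(cvp_cat["baseline"], cvp_cat["dynamic"]),
    "Tax emissions, static rule": pct_reduction(cve_tax["baseline"], cve_tax["static"]),
    "Tax emissions, dynamic rule": pct_reduction(cve_tax["baseline"], cve_tax["dynamic"]),
}

for label, value in reductions.items():
    print(f"{label:32s} {value:.1f}%")
```

The dynamic credit-growth rule cuts permit price volatility under CAT by about 87 percent and emissions volatility under the carbon tax by about 73 percent, so the stabilisation is large in both regimes but somewhat stronger for permit prices.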

Q6. What happens under an interest rate subsidy to depositors as an alternative macroprudential tool?

A static deposit interest rate subsidy set at its welfare-maximizing level (1 percent) worsens welfare costs of business cycles, from 0.6178 to 1.1028 under CAT and from 1.5231 to 3.5597 under a carbon tax, because the subsidy moves the economy to a higher-leverage steady state, around which financial amplification is more severe (Table 7 in the paper). The intuition is that the subsidy encourages saving by raising the return on deposits, which raises equilibrium loan supply, which raises leverage; a more leveraged economy is more sensitive to adverse shocks. A dynamic subsidy rule that responds countercyclically to credit growth (optimal κ ≈ 1.319) mitigates this problem: it discourages saving when credit is expanding and encourages it when credit is contracting, partially stabilizing leverage dynamics. The dynamic subsidy reduces welfare costs substantially, to 0.2506 under CAT and 0.4706 under a carbon tax, and a joint optimization of the dynamic subsidy and the carbon pricing rule achieves 0.1926 under CAT and 0.4366 under a carbon tax. The authors note that the static subsidy result illustrates a general principle: macroprudential policies that move the steady state toward higher leverage can amplify cycle costs even while achieving efficiency gains around the steady state, and that distinguishing between steady-state and fluctuation welfare effects is essential when comparing such policies.

Q7. What are the main welfare and policy conclusions?

The paper establishes three conclusions. First, climate policy instrument choice has macroeconomic stabilization consequences in economies with financial frictions: CAT dominates a carbon tax for welfare when financial frictions are operative and macroprudential policy is absent or limited. Second, macroprudential regulation, particularly dynamic reserve requirement rules, is the more powerful tool for reducing the welfare cost of business cycles under both carbon pricing regimes, and can largely align the two regimes, making the instrument choice less consequential when macroprudential policy is well-calibrated. Third, the interaction between financial regulation and carbon pricing is non-trivial: the optimal sensitivity parameters for macroprudential rules differ depending on whether the economy uses CAT or a carbon tax, because the endogenous procyclicality of permit prices changes how financial shocks propagate through the economy.

Key concepts

financial accelerator: the mechanism by which adverse shocks to entrepreneurial net worth raise the external finance premium (the spread between the loan rate and the risk-free rate), reduce investment and output, further depress net worth, and generate amplified cycles; the core friction in the E-DSGE model and the channel through which carbon pricing affects welfare costs.

procyclical permit prices: the endogenous tendency of permit prices under a CAT scheme to fall when aggregate economic activity and emissions decline; the paper’s central mechanism through which CAT acts as an automatic stabilizer for the financial accelerator — permit prices fall precisely when firms’ balance sheets are most stressed, reducing compliance costs and partially offsetting amplification.

welfare cost of business cycles (Lucas measure): the percentage of consumption that a representative household would be willing to give up to move from a world with business cycle fluctuations to one without, evaluated relative to the deterministic steady state; in the paper’s baseline calibration, this is 0.6178 percent under CAT and 1.5231 percent under a carbon tax.

reserve requirement macroprudential regulation: a regulatory constraint requiring banks to hold a fraction of deposits in reserves, limiting loan supply; implemented in the model as Φ_t ∈ (0,1] where lower Φ_t requires banks to hold more reserves; a static 2 percent reserve requirement already substantially narrows the welfare gap between carbon pricing regimes, and an optimal dynamic rule nearly closes it.

E-DSGE (Environmental DSGE): the paper’s model class — a DSGE with financial frictions (Christiano-Motto-Rostagno financial accelerator) and a carbon pricing sector; used to analyze the interaction between environmental policy instruments and macroprudential regulation in an economy with both climate and financial externalities.

coefficient of variation of emissions (CVE) / permit prices (CVP_E): volatility measures used to assess how macroprudential regulation affects the business-cycle properties of each carbon pricing instrument; macroprudential regulation substantially reduces CVP_E under CAT and CVE under a carbon tax, making each instrument’s uncertainty properties more symmetric.
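
For readers unfamiliar with the statistic, a coefficient of variation is the standard deviation of a series divided by its mean. The sketch below is a generic illustration on made-up series. The paper presumably computes its values from model-simulated moments, and its exact normalization (for example, whether values are expressed in percent) is not stated in this summary.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(series):
    """Population standard deviation divided by the absolute mean."""
    mean = fmean(series)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(series) / abs(mean)


# Hypothetical series, for illustration only (not data from the paper).
capped_emissions = [100.0] * 8  # a binding cap holds emissions constant
permit_prices = [12.0, 30.0, 7.0, 22.0, 15.0, 41.0, 9.0, 24.0]

cv_emissions = coefficient_of_variation(capped_emissions)
cv_prices = coefficient_of_variation(permit_prices)
print(cv_emissions, cv_prices)
```

A constant series has a coefficient of variation of zero, which is why CVE = 0 under a cap and CVP_E = 0 under a fixed tax in the baseline comparisons.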

Closing Gender Gaps Through Workplace Diversity: The Intergenerational Effects of World War I

Mon, 01 Jan 0001 00:00:00 +0000

This paper asks whether exposure to greater female representation in the workplace can persistently reduce intergenerational gender gaps in labor market outcomes. The authors exploit the sudden, city-by-department variation in female employment within the U.S. federal government triggered by World War I mobilization. Using the Official Registers of the United States — biennial personnel rosters covering the near-universe of federal employees from 1913 to 1921 — linked to full-count decennial censuses (1900–1940), they construct a granular measure of each office’s (city × department) change in female share between 1915 and 1919, then trace labor force outcomes for the children of incumbent civil servants in the 1940 Census.

WWI caused the female share of the federal civilian workforce to jump by 13 percentage points — a doubling within two years (1917–1919). These wartime female entrants were younger, more likely to be single, more educated, more geographically mobile, and less likely to have been previously employed than their male counterparts, suggesting the war mobilized a previously untapped labor pool. The increase was driven almost entirely by clerical positions: the female share of the federal clerical workforce rose from roughly 30% to nearly 70% within two years.

The main finding is that a one standard deviation (SD) increase in parental exposure to female co-workers reduces the gender gap in labor force participation (LFP) among children of incumbent civil servants by 4.1–4.6 percentage points in the within-city, within-department specification — a decline in the mean gender LFP gap of approximately 8.6–9.6% by 1940. This effect is entirely driven by a higher propensity of daughters to work; sons’ LFP is unaffected. The intergenerational effect operates primarily through exposed fathers, including fathers without working wives, identifying a channel beyond the mother-to-daughter vertical transmission emphasized in prior literature. Children who were teenagers at the time of parental exposure show the largest effects, consistent with formative-years malleability. A placebo test using civil servants who left the same offices before the wartime shock shows no comparable effect, ruling out time-invariant office-level selection.

Parental exposure extends beyond the public sector: the private sector LFP effect is comparable in magnitude to the public sector effect. The gender earnings gap among children of exposed civil servants narrows by 12%, driven by daughters moving into higher-paying, previously male-dominated positions rather than by differences in hours or weeks worked. Marriage, fertility, and schooling differences only partially mediate the LFP effect, with a residual exposure effect remaining after controlling for these proximate determinants.

At the aggregate level, a 1 SD increase in city-level exposure to female federal workers raises overall female LFP by 0.9–1.0 percentage points, with no effect on male LFP, and the effect persists through 1940. A back-of-envelope calculation implies each additional female wartime civil service entrant generated approximately 2.4 additional women entering the workforce — a multiplier effect. Neighborhood-level analysis shows LFP gains are concentrated in enumeration districts where wartime female civil servants resided, and cities with greater female federal employment exposure also saw faster women’s club membership growth after WWI.

The scope conditions are important: the sample covers 70 cities and 8 federal departments with meaningful pre-war staffing; children must have been born by 1917; and the 1940 outcomes reflect adulthood labor decisions in a labor market shaped by subsequent decades of change. The design relies on within-city and within-department residual variation in female share change being conditionally exogenous, supported by lack of correlation with pre-war office characteristics.

Q: What was the scale of the WWI shock to female federal employment? A: The U.S. entry into WWI in April 1917 triggered a near-doubling of total federal civilian employment from roughly 150,000 to over 300,000 workers by 1919. Within this expansion, the share of female civil servants increased by 13 percentage points — a doubling of the female share within two years. The increase was driven almost entirely by clerical positions, where the female share rose from around 30% to nearly 70%.

Q: How do the authors measure parental exposure to female co-workers? A: Exposure is measured as the change in the share of female civil servants at the city-by-department (“office”) level between 1915 and 1919. The sample is restricted to offices with at least 20 civil servants in 1915 and cities with at least two federal departments, yielding 70 cities and 8 departments. The interquartile range of exposure across offices is approximately 10 percentage points, and cross-city and cross-department variation explains 58% of the overall variation, leaving substantial residual office-level variation for identification.
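
The construction of the exposure measure can be sketched on a toy roster. Everything below is hypothetical: the records, city and department names, and the function name are invented for illustration. Only the two sample restrictions and the 1915 to 1919 change in female share come from the summary. The sketch also assumes the two-department rule is applied after the office-size rule, which the summary does not specify.

```python
from collections import defaultdict

# Hypothetical roster counts: (city, department, year, n_female, n_total).
RECORDS = [
    ("CityA", "Treasury", 1915, 10, 100),
    ("CityA", "Treasury", 1919, 45, 150),
    ("CityA", "War", 1915, 5, 50),
    ("CityA", "War", 1919, 80, 200),
    ("CityB", "Treasury", 1915, 2, 10),   # fewer than 20 staff in 1915
    ("CityB", "Treasury", 1919, 8, 20),
    ("CityB", "Navy", 1915, 3, 30),       # CityB's only office passing the size rule
    ("CityB", "Navy", 1919, 12, 40),
]

MIN_STAFF_1915 = 20      # restriction stated in the summary
MIN_DEPTS_PER_CITY = 2   # restriction stated in the summary


def office_exposure(records):
    """Change in female share, 1915 to 1919, for each retained (city, department)."""
    share, staff_1915 = {}, {}
    for city, dept, year, n_female, n_total in records:
        share[(city, dept, year)] = n_female / n_total
        if year == 1915:
            staff_1915[(city, dept)] = n_total

    # Keep offices that were large enough in 1915 and are observed in 1919.
    offices = [
        office for office, n in staff_1915.items()
        if n >= MIN_STAFF_1915 and (office[0], office[1], 1919) in share
    ]

    # Assumption: count departments per city among the retained offices only.
    depts_per_city = defaultdict(set)
    for city, dept in offices:
        depts_per_city[city].add(dept)

    return {
        (city, dept): share[(city, dept, 1919)] - share[(city, dept, 1915)]
        for city, dept in offices
        if len(depts_per_city[city]) >= MIN_DEPTS_PER_CITY
    }


exposure = office_exposure(RECORDS)
print(exposure)
```

In this toy roster only the two CityA offices survive the restrictions, and they differ in exposure even though they share a city. That within-city, cross-department variation is what the preferred specification relies on.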

Q: What is the main intergenerational finding and its magnitude? A: A 1 SD increase in parental exposure to female co-workers increases the relative likelihood that a daughter works (compared to a son) by 2 percentage points in the baseline specification, and by 4.1–4.6 percentage points in the preferred within-city and within-department specification. Since daughters of civil servants are on average 48 percentage points less likely than sons to be in the labor force in 1940, this corresponds to closing the mean gender LFP gap by approximately 8.6–9.6%.
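
The conversion from percentage-point effects to a share of the gap closed is a simple ratio. The sketch below uses only the rounded figures quoted above; the variable names are illustrative.

```python
MEAN_GAP_PP = 48.0           # daughters vs. sons, 1940 LFP gap in percentage points
PREFERRED_PP = (4.1, 4.6)    # within-city, within-department estimates
BASELINE_PP = 2.0            # baseline specification


def share_of_gap_closed(effect_pp, gap_pp=MEAN_GAP_PP):
    """Effect expressed as a percent of the mean gender LFP gap."""
    return 100.0 * effect_pp / gap_pp


preferred_range = [round(share_of_gap_closed(e), 1) for e in PREFERRED_PP]
baseline_share = round(share_of_gap_closed(BASELINE_PP), 1)
print(preferred_range, baseline_share)
```

With these rounded inputs the preferred estimates correspond to roughly 8.5 to 9.6 percent of the gap. The small difference from the reported 8.6 percent lower bound presumably reflects rounding in the published inputs. The baseline estimate corresponds to about 4.2 percent.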

Q: Does the effect operate through daughters or sons? A: The effect is entirely driven by daughters. Parental exposure to female co-workers has no statistically discernible impact on the labor force participation of sons. The decline in the gender LFP gap is thus attributable to a higher propensity of daughters of exposed civil servants to work.

Q: What is the key placebo test, and what does it show? A: The authors exploit high-frequency personnel records to identify civil servants who selected into the offices that would later be exposed but who left before the wartime shock occurred. These early leavers show no intergenerational exposure effects on their children’s LFP, ruling out the interpretation that time-invariant selection into particular offices drives the results.

Q: Which parent serves as the primary channel of transmission? A: Exposed fathers are the primary conduit. The effect for daughters is precise and sizable even when restricting the sample to fathers without working wives, suggesting the channel does not depend on children observing maternal employment. While the estimated effect through mothers is positive, it is imprecise — likely due to the small sample of female incumbent civil servants. This identifies fathers as a new channel of vertical intergenerational norm transmission, beyond the mother-to-daughter pathway emphasized in prior literature.

Q: How does children’s age at the time of parental exposure moderate the effect? A: The exposure effects are concentrated among children who were teenagers at the time of parental exposure during WWI. Children who were older and more likely to have already left the household or formed fixed beliefs show little to no detectable effect. This pattern is consistent with the formative-years hypothesis that experiences during adolescence shape lifetime economic behavior.

Q: Does the intergenerational effect extend beyond the public sector? A: Yes. The private sector LFP effect for daughters is comparable in magnitude to the public sector effect, with a 1 SD increase in parental exposure having approximately equal effects on LFP within public and private employment. There is also no measurable shift toward clerical occupations specifically, suggesting the channel is a broader change in attitudes toward women working, not transmission of information about specific government or clerical jobs.

Q: What is the effect on the gender earnings gap? A: A 1 SD increase in parental exposure to female co-workers closes the gender earnings gap among children of civil servants by 12%. This is not driven by differences in weeks or hours worked, but rather by daughters of exposed parents selecting into higher-paying and previously male-dominated occupations.

Q: How do the authors address the possibility that the results reflect local labor market conditions rather than parental exposure per se? A: By 1940, 67% of civil servant children lived in a city different from their parent’s WWI-era city. Even among children who moved to the same destination city — and thus face identical labor market conditions — variation in parental exposure at the origin city-by-department remains highly predictive of daughters’ LFP. Comparing children moving from the same origin city to the same destination city, those with parents in higher-exposure departments still show higher LFP, pointing to cultural transmission rather than local labor market demand.

Q: What do the marriage and fertility results indicate about mechanisms? A: Daughters of more exposed civil servants are less likely to be married (a 1 SD increase in parental exposure reduces the relative likelihood of daughters being married by 3.7 percentage points) and tend to have fewer children by 1940. A mediation exercise shows these observable differences in marriage, fertility, and education only partially explain the LFP increase; a statistically significant and economically large residual exposure effect remains, consistent with parental exposure shifting broader gender norms rather than only proximate determinants of labor supply.

Q: What does the spousal work decision evidence contribute? A: A 1 SD increase in male civil servants’ exposure to female co-workers increases the propensity of their subsequent wives to work by 0.5 percentage points after WWI. The effect is driven by marriages formed after the exposure and is not mechanically explained by men marrying their female co-workers. This revealed preference measure supports the interpretation that exposure changed men’s attitudes toward women’s work.

Q: What do naming patterns suggest about changing attitudes? A: Exposed parents are more likely to give daughters born after WWI less feminine names, specifically names with a lower share of vowels or names that are less likely to end with a vowel. No comparable effect is observed for sons’ names. This provides supplementary evidence of a shift in paternal attitudes following workplace exposure to female co-workers.

Q: What are the aggregate city-level effects on female LFP? A: In a difference-in-differences design using cross-city variation in female federal worker exposure before and after WWI, a 1 SD increase in city-level exposure raises aggregate female LFP by 0.9–1.0 percentage points, with no effect on male LFP. The effect is persistent through 1940 and city-level exposure is uncorrelated with female LFP prior to WWI. A back-of-envelope calculation implies each additional female wartime entrant generated approximately 2.4 additional women entering the broader workforce — a social multiplier.

Q: Is there evidence of horizontal (non-family) transmission? A: Yes. The aggregate LFP gains are concentrated almost entirely in census enumeration districts where female wartime civil servants resided; neighboring districts without female entrants do not see comparable gains. Cities with greater increases in female federal employees also experienced faster growth in women’s club memberships, with this pattern appearing only after WWI and coinciding with the rise in female LFP. Both findings are consistent with social learning operating through residential proximity and community networks.

Q: How robust are the results to potential selection bias from imperfect census linking? A: Conditional on city and department fixed effects, the propensity of a civil servant’s child to be linked to the 1940 Census is uncorrelated with the parental exposure measure. The authors apply inverse probability weighting (IPW) to ensure the matched sample is balanced on baseline characteristics, and results remain virtually identical. Estimates are also stable when each linking strategy is used on its own.

Q: What instrumental variable strategy is used and what does it find? A: The authors instrument for office-level female share change using the interaction of the 1915 clerical workforce share and an indicator for war-related departments — a pre-determined source of variation in the capacity and demand for female clerical workers. The IV estimates are consistent with the OLS main specification: parental exposure to female co-workers closes the children’s gender LFP gap.

Q: What is the policy implication regarding public sector hiring? A: The paper suggests that increasing gender representation within public sector employment can have labor market implications that extend well beyond the organization itself — across generations through vertical intergenerational transmission and across the broader community through horizontal social spillovers. The findings imply that public sector diversity policies can serve as a lever for broader, persistent reductions in gender gaps in the private labor market.

Office-level exposure: The city-by-department measure of the change in female share of civil servants between 1915 and 1919, capturing the granular intensity of each workplace unit’s contact with wartime female entrants; the interquartile range across offices is approximately 10 percentage points.

Intergenerational gender gap in LFP: The difference in labor force participation rates between daughters and sons of incumbent civil servants measured in 1940 adulthood, used as the primary outcome to capture whether parental workplace exposure transmits to children’s labor supply decisions.

Vertical transmission: The intergenerational channel through which exposed parents — identified here primarily as fathers, including those without working wives — convey changed attitudes or information about female work to their children, closing the gender LFP gap.

Horizontal transmission: The community-level channel through which the increased presence of female civil servants in a city spreads changed norms or information about women’s work to women who are not daughters of exposed co-workers, operating through residential proximity and social networks such as women’s clubs.

Social multiplier: The amplification of the direct effect of hiring female workers through behavioral spillovers; the authors’ back-of-envelope calculation estimates that each additional female wartime civil service entrant generated approximately 2.4 additional women entering the workforce.

Formative years: The period of adolescence during which children are argued to be most malleable in forming preferences and beliefs; exposure effects in this paper are concentrated among children who were teenagers at the time of parental exposure, with older children showing little effect.

Coarse Bayesian Updating

Mon, 01 Jan 0001 00:00:00 +0000

This paper introduces and axiomatically characterizes Coarse Bayesian updating, a generalization of Bayes’ rule designed to accommodate the wide empirical evidence that individuals systematically deviate from standard Bayesian belief revision. The research question is: what is the minimal, tractable, axiomatically grounded generalization of Bayes’ rule that can accommodate heterogeneous non-Bayesian behaviors — including under-reaction, over-reaction, asymmetric updating, limited perception, and motivated reasoning — while remaining portable to standard economic settings?

The paper takes as primitive a finite state space Omega = {1, …, N} and an updating rule mu: S -> Delta assigning posterior beliefs to signals, where signals represent likelihood profiles from stochastic information structures. No data are used; the methodology is axiomatic decision theory combined with analysis of the model’s implications in static, dynamic, and decision-theoretic settings.

A Coarse Bayesian agent is characterized by (i) a partition of the probability simplex Delta into convex cells, and (ii) a representative distribution for each cell, one of which is the prior. Upon observing a signal, the agent determines which cell contains the Bayesian posterior and adopts the representative of that cell as his posterior belief. The agent need not point-identify the Bayesian posterior; he merely approximates it by identifying which cell it belongs to.
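
A minimal sketch of this procedure for two states may help. With two states a belief is a single number (the probability of state 1), the simplex is the interval [0, 1], and convex cells are intervals. The cutoffs, representatives, prior, and class name below are illustrative assumptions, not values or notation from the paper.

```python
from bisect import bisect_right


def bayes_posterior(prior, signal):
    """Bayesian P(state 1 | signal); signal = (P(signal | state 1), P(signal | state 2))."""
    s1, s2 = signal
    return prior * s1 / (prior * s1 + (1 - prior) * s2)


class CoarseBayesian:
    """Two-state agent: interval cells on [0, 1] with one representative per cell."""

    def __init__(self, cutoffs, representatives, prior):
        if len(representatives) != len(cutoffs) + 1:
            raise ValueError("need exactly one representative per cell")
        self.cutoffs = list(cutoffs)
        self.representatives = list(representatives)
        self.prior = prior
        # Each representative must lie in its own cell, and the prior must be
        # the representative of the cell that contains it.
        for index, rep in enumerate(self.representatives):
            if self.cell(rep) != index:
                raise ValueError("representative outside its cell")
        if self.representatives[self.cell(prior)] != prior:
            raise ValueError("prior must be a representative")

    def cell(self, belief):
        """Index of the interval containing `belief` (cells are [lo, hi), last one closed)."""
        return bisect_right(self.cutoffs, belief)

    def update(self, signal):
        """Adopt the representative of the cell containing the Bayesian posterior."""
        return self.representatives[self.cell(bayes_posterior(self.prior, signal))]


# Hypothetical agent: cells [0, 0.3), [0.3, 0.7), [0.7, 1] with prior 0.5.
agent = CoarseBayesian(cutoffs=[0.3, 0.7], representatives=[0.1, 0.5, 0.9], prior=0.5)

weak = agent.update((0.6, 0.4))      # Bayesian posterior 0.6 stays in the prior's cell
strong = agent.update((0.8, 0.2))    # Bayesian posterior 0.8 falls in the top cell
contrary = agent.update((0.2, 0.8))  # Bayesian posterior 0.2 falls in the bottom cell
print(weak, strong, contrary)
```

In this example the agent ignores the weak signal (it keeps the prior 0.5 although a Bayesian would move to 0.6) and over-reacts to the strong ones (0.9 instead of 0.8, and 0.1 instead of 0.2). A single partition can therefore produce both under-reaction and over-reaction.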

The central characterization result (Theorem 1) establishes that an updating rule has a Coarse Bayesian representation if and only if it satisfies three axioms: Homogeneity (beliefs depend only on likelihood ratios of the signal, not its scale), Cognizance (if two signals induce the same belief, then a garbled signal indicating one of them was generated also induces that belief), and Confirmation (if a signal is perfect evidence of some feasible belief, the agent adopts that belief). The representation — partition, representative points, and prior — is unique.

Proposition 1 shows that, under mild regularity conditions, strengthening any of the three axioms to an if-and-only-if form forces the agent to be perfectly Bayesian. This identifies the Coarse Bayesian framework as a qualitatively small but substantively rich departure from Bayes’ rule. The converse statements identify three necessary non-Bayesian behaviors exhibited by any proper Coarse Bayesian: (i) treating some signals as equivalent when a Bayesian would not; (ii) collapsing to a default belief when uncertain between two signals the agent would otherwise distinguish; (iii) false extrapolation — arriving at a belief via signals that are not perfect evidence of it.

In dynamic settings, Pooled Coarse Bayesian rules (which apply the full signal history at each period) are invariant to signal ordering and pooling and converge whenever Bayesian beliefs do, though to the representative point of the cell containing the true state rather than the true state itself. Sequential Signal Distortion rules are invariant to signal ordering but not pooling, and beliefs converge almost surely — but not necessarily to the true state (Example 1 illustrates convergence to the wrong state in a two-state setting). Sequential Coarse Bayesian rules need not satisfy either form of path-independence and need not converge at all.

In the decision-theoretic application (Section 4), a Coarse Bayesian’s value of information is posterior-separable and generally violates the Blackwell (1951) information ordering — more informative experiments need not be valued more highly. Two Coarse Bayesians are shown to be identical (same cells and representative points) if and only if they benefit from the same Blackwell improvements, providing a behavioral identification result. Agents with finer partitions are more sophisticated (higher ex-ante value of information), while agents with larger distortions from Bayesian posteriors are more biased (larger worst-case losses relative to a Bayesian). Neither greater sophistication nor lower bias implies being better off at all menus or signal realizations.

Q: What are the three axioms that characterize Coarse Bayesian updating, and what property of Bayes’ rule does each capture? A: Homogeneity requires that beliefs depend only on likelihood ratios of the signal — if two signals are proportional (s ~ t), they induce the same posterior. Cognizance requires that if two signals induce the same belief, then a garbled signal indicating that one of them was generated also induces that belief (mu_{s+t} = mu_s when mu_s = mu_t). Confirmation requires that if a signal is perfect evidence of some feasible belief — i.e., the Bayesian posterior at that signal equals a candidate belief — then the agent adopts that belief. Each axiom is satisfied by standard Bayesian updating.
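
The three axioms can be checked numerically on a small example. The sketch below builds a hypothetical two-state Coarse Bayesian (the cutoffs, representatives, and prior are assumptions chosen for illustration) and verifies each axiom on a finite grid of signals with exact rational arithmetic. It is a sanity check on one example, not a proof.

```python
from bisect import bisect_right
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state agent: a belief is P(state 1).
PRIOR = F(1, 2)
CUTOFFS = [F(3, 10), F(7, 10)]         # cells [0, 3/10), [3/10, 7/10), [7/10, 1]
REPS = [F(1, 10), F(1, 2), F(9, 10)]   # one representative per cell; the prior is one


def bayes(signal):
    """Bayesian posterior on state 1 for the likelihood profile signal = (s1, s2)."""
    s1, s2 = signal
    return PRIOR * s1 / (PRIOR * s1 + (1 - PRIOR) * s2)


def mu(signal):
    """Coarse Bayesian posterior: representative of the cell containing bayes(signal)."""
    return REPS[bisect_right(CUTOFFS, bayes(signal))]


grid = [F(k, 20) for k in range(1, 11)]   # likelihoods 1/20, ..., 10/20
signals = list(product(grid, grid))

# Homogeneity: rescaling a signal leaves the posterior unchanged.
homogeneity = all(mu(s) == mu((s[0] / 2, s[1] / 2)) for s in signals)

# Cognizance: if mu_s == mu_t, the garbled signal s + t induces the same belief.
cognizance = all(
    mu((s[0] + t[0], s[1] + t[1])) == mu(s)
    for s in signals for t in signals if mu(s) == mu(t)
)

# Confirmation: the signal (r(1 - p), (1 - r)p) has Bayesian posterior exactly r,
# so the agent must adopt the representative r.
confirmation = all(mu((r * (1 - PRIOR), (1 - r) * PRIOR)) == r for r in REPS)

# A proper Coarse Bayesian pools some non-proportional signals.
pools_distinct_signals = any(
    mu(s) == mu(t) and s[0] * t[1] != s[1] * t[0]
    for s in signals for t in signals
)

print(homogeneity, cognizance, confirmation, pools_distinct_signals)
```

Cognizance holds here because the Bayesian posterior after s + t is a weighted average of the posteriors after s and after t, so it stays inside any convex cell that contains both. This is where convexity of the cells is used.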

Q: In what sense is Coarse Bayesian updating a “small” departure from Bayes’ rule? A: Proposition 1 establishes that strengthening any one of the three axioms to an if-and-only-if form forces the agent to be perfectly Bayesian. The converses are: (i) different likelihood ratios lead to different posteriors; (ii) if a garbled signal does not change beliefs, then the two signals must induce the same belief individually; (iii) if a signal induces the same posterior as another, then it must be perfect evidence of that posterior. Any Coarse Bayesian satisfying any one of these is in fact perfectly Bayesian, meaning the three axioms together come very close to fully characterizing Bayesian rationality.

Q: What non-Bayesian behaviors does the model generate as special cases? A: The framework generates under-reaction (representative points of cells close to the prior boundary), over-reaction (representative points at the far boundary), asymmetric updating (favoring one state, making upward revision easier than downward), limited perception (the agent retains the prior unless the Bayesian posterior is sufficiently far from the prior), extreme-belief aversion (the agent applies Bayes’ rule except when posteriors are near degenerate distributions), and reactions to unexpected news (non-Bayesian behavior only when signals have low prior probability). In each case the Coarse Bayesian Representation provides an axiomatic foundation via Axioms 1–3.

Q: What are the three necessary non-Bayesian behaviors exhibited by any proper (non-Bayesian) Coarse Bayesian? A: These follow from the negations of properties (i)-(iii) in Proposition 1. First, there exist signals s and t that are not proportional yet induce the same posterior — the agent treats informationally distinct signals as equivalent. Second, there exist signals s and t such that mu_s ≠ mu_t but mu_{s+t} = mu_s — signals the agent distinguishes individually collapse to a default when the agent is uncertain which one was generated. Third, there exist signals s and t with mu_s = mu_t where t is not perfect evidence of mu_s — a form of false extrapolation. Together, these three biases account for all non-Bayesian behavior the model generates.

Q: How does the model accommodate globally uniform biases like always-under-reaction, and how common does it predict such behavior to be? A: Global under-reaction requires representative points of cells to sit on their cell boundaries (as close to the prior as possible given the partition). This is a non-generic, hairline case — representative points generically lie in the interior of their cells, so a typical Coarse Bayesian under-reacts to some signals and over-reacts to others depending on which cell the Bayesian posterior falls into. The model additionally predicts local stability: if an agent over-reacts to signal s, nearby signals typically produce the same response; if an agent is Bayesian at s, nearby signals are almost surely also Bayesian.

Q: What does the model imply about dynamic updating under sequential signal-by-signal processing versus pooled processing? A: Pooled Coarse Bayesian rules apply the full signal history at each period, are invariant to both signal ordering and signal pooling, and converge almost surely whenever Bayesian beliefs converge, but to the representative point of the cell containing the true state, not necessarily the true state itself. Sequential Signal Distortion rules are invariant to signal ordering but not signal pooling, and also yield almost-sure convergence, though potentially to the wrong state (Example 1 shows this for a two-state setting). Sequential Coarse Bayesian rules need not satisfy either form of path-independence (invariance to signal ordering or to signal pooling) and need not converge at all.
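
The contrast between pooled and sequential processing can be illustrated with a two-state toy example. The sketch rests on several assumptions that are not taken from the paper: signals are conditionally independent (so likelihoods multiply), the partition is held fixed across periods, and the sequential rule simply feeds the previous coarse posterior back in as the next prior. The paper's formal definitions may differ in detail.

```python
from bisect import bisect_right
from fractions import Fraction as F

# Hypothetical two-state agent: a belief is P(state 1).
PRIOR = F(1, 2)
CUTOFFS = [F(3, 10), F(7, 10)]
REPS = [F(1, 10), F(1, 2), F(9, 10)]


def bayes(prior, signal):
    """Exact Bayesian update of `prior` on the likelihood profile signal = (s1, s2)."""
    s1, s2 = signal
    return prior * s1 / (prior * s1 + (1 - prior) * s2)


def coarse(belief):
    """Representative of the cell containing `belief`."""
    return REPS[bisect_right(CUTOFFS, belief)]


def pooled(history):
    """Coarsen once, after Bayesian updating on the entire history."""
    belief = PRIOR
    for signal in history:
        belief = bayes(belief, signal)
    return coarse(belief)


def sequential(history):
    """Coarsen after every signal, using the coarse posterior as the next prior."""
    belief = PRIOR
    for signal in history:
        belief = coarse(bayes(belief, signal))
    return belief


a = (F(4, 5), F(1, 5))   # likelihood ratio 4 in favor of state 1
b = (F(1, 4), F(3, 4))   # likelihood ratio 1/3 against state 1

print(pooled([a, b]), pooled([b, a]))           # same belief in both orders
print(sequential([a, b]), sequential([b, a]))   # belief depends on the order
```

In this example the pooled rule returns 1/2 in either order, because the Bayesian posterior on the full history is 4/7 regardless of ordering. The sequential rule returns 9/10 when a arrives first and 1/2 when b arrives first.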

Q: How does the paper provide a behavioral identification of the model’s parameters? A: Theorem 1 establishes that the partition, representative points, and prior are uniquely determined by the agent’s updating rule alone — they are identifiable from observable updating behavior without additional assumptions. In the decision-theoretic setting of Section 4, a stronger result holds: two Coarse Bayesians are identical (same cells and same representative points) if and only if they benefit from the same Blackwell improvements across all menus (decision problems). This means the model’s parameters can be uniquely identified from menu-contingent rankings of Blackwell-comparable experiments.

Q: Does the Coarse Bayesian framework respect the Blackwell information ordering, and what characterizes when Blackwell improvements are beneficial? A: Unlike Bayesians, Coarse Bayesians typically violate the Blackwell ordering — they need not assign higher ex-ante value to more informative experiments. The paper characterizes the menus (decision problems) for which a given Coarse Bayesian benefits from Blackwell improvements, and shows this characterization runs deep: the complete set of such menus fully identifies the agent’s representation.

Q: How do the sophistication and bias orderings relate to welfare? A: An agent is more sophisticated if he employs a finer partition; more-sophisticated agents have a higher ex-ante value of information. An agent is more biased if his updating rule exhibits larger distortions from Bayesian posteriors; greater bias is characterized by greater worst-case losses relative to a Bayesian. Crucially, neither greater sophistication nor lower bias implies the agent is better off at all menus or signal realizations — welfare improvements require the agent to be perfectly Bayesian on a strictly larger set of signal realizations, giving rise to a third ordering that jointly refines the other two.

Q: How does the model relate to Wilson (2014) and Ortoleva (2012)? A: Wilson (2014) studies optimal updating for a boundedly rational agent with K memory states over binary decisions: each memory state is associated with a convex set of posteriors and a representative, so the optimal protocol is a dynamic Coarse Bayesian updating procedure. However, Wilson’s parameters are endogenous (determined by signal structure, stakes, and the bound K), whereas Coarse Bayesian updating does not require optimality or a bound on the number of cells — the model can accommodate behavior (e.g., Bayesian updating except at “extreme” signals) that Wilson’s model cannot. Ortoleva’s (2012) Hypothesis Testing model applies Bayes’ rule when the prior probability of a signal exceeds a threshold epsilon and otherwise uses a maximum-likelihood criterion; Coarse Bayesian updating can accommodate similar behavior, and the paper shows that Coarse Bayesian rules can be expressed as Maximum-Likelihood rules when there are only two states, but neither class subsumes the other in general — Maximum-Likelihood rules may violate Confirmation.

Q: What are the main limitations of the Coarse Bayesian framework? A: The paper identifies four. First, only likelihood ratios of the realized signal matter, so sensitivity to framing and to extraneous environmental features is ruled out. Second, beliefs must be probability distributions, so phenomena like the conjunction fallacy (where subjects assign higher probability to a conjunction than to a component event) are outside the model’s scope. Third, the model exhibits discontinuities when signal perturbations move the Bayesian posterior across a cell boundary, a feature shared with Wilson (2014), Ortoleva (2012), and related models. Fourth, cells must be convex (driven by Cognizance); dropping Cognizance allows non-convex cells but removes the normative foundation that agents correctly forecast their own updating behavior.

Coarse Bayesian Representation: A pair consisting of a partition P of the probability simplex Delta into convex cells and a profile of representative distributions (one per cell, including the prior), such that the agent’s posterior after observing signal s equals the representative of the cell containing the Bayesian posterior B(mu_e|s).
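
Written out, the definition above amounts to the following two statements. The symbols \mathcal{P} (the partition) and \mu^{P} (the representative of cell P) are notation chosen here for illustration and may not match the paper's.

```latex
% Bayesian posterior of the prior \mu_e after signal s (a likelihood profile on \Omega)
\[
  B(\mu_e \mid s)(\omega)
  = \frac{\mu_e(\omega)\, s(\omega)}{\sum_{\omega' \in \Omega} \mu_e(\omega')\, s(\omega')},
  \qquad \omega \in \Omega = \{1, \dots, N\}.
\]

% Coarse Bayesian posterior: the representative of the cell containing the Bayesian posterior
\[
  \mu_s = \mu^{P}
  \quad \text{whenever} \quad
  B(\mu_e \mid s) \in P, \quad P \in \mathcal{P}.
\]
```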

Homogeneity: The axiom that if two signals are proportional (s ~ t, meaning s = lambda*t for some lambda > 0), they induce the same posterior belief — updating depends only on likelihood ratios, not signal scale.

Cognizance: The axiom that if signals s and t induce the same posterior, then the garbled signal s+t (indicating that either s or t was generated) also induces that belief — the agent correctly forecasts his own updating behavior.

Confirmation: The axiom that if a signal constitutes perfect evidence of some feasible belief (i.e., the Bayesian posterior equals a candidate belief), the agent adopts that belief — candidate beliefs are adopted when the signal confirms them exactly.

Signal Distortion Representation: An equivalent representation of Coarse Bayesian behavior as a function d: S -> S that distorts signals before Bayesian updating is applied (mu_s = B(mu_e|d(s))), satisfying properties analogous to the three axioms; equivalent to the partition representation in static settings but distinct in dynamic settings.

Blackwell Information Ordering: The partial order on experiments under which sigma is more informative than sigma’ if sigma’ can be obtained from sigma by a garbling; Bayesians always weakly prefer more informative experiments in this ordering, but Coarse Bayesians typically do not.

Sophistication Ordering: The partial order under which one Coarse Bayesian is more sophisticated than another if he employs a finer partition; more-sophisticated agents exhibit greater responsiveness to information as measured by ex-ante value of information.

Bias Ordering: The partial order under which one Coarse Bayesian is more biased than another if his updating rule exhibits larger distortions away from Bayesian posteriors; greater bias is characterized by larger worst-case losses relative to a Bayesian benchmark.

Collusion with Optimal Information Disclosure

Mon, 01 Jan 0001 00:00:00 +0000

This paper asks how a third-party intermediary (an “algorithm”) that is better informed than competing firms about market demand or costs should optimally disclose that information to maximize the firms’ collusive profit in a repeated Bertrand competition setting. The motivation is the rise of algorithmic pricing intermediaries such as RealPage in apartment rentals, A2i Systems in retail gasoline, and Rainmaker in hotel rooms, as well as offline cartel facilitators like AC-Treuhand.

The model extends the canonical Rotemberg–Saloner (1986) repeated Bertrand framework with stochastic demand. The key technical assumption is that firm profit is affine in the unknown state s, so expected profit depends only on the expected state. This holds for binary states, linear demand with unknown intercept (D(p,s) = s − p), and linear demand with unknown per-unit cost. The algorithm observes s and commits to a known disclosure policy mapping s to a public signal. The solution concept is pure-strategy subgame-perfect equilibrium, and the paper solves for the disclosure policy and equilibrium that jointly maximize collusive profit.

The main result (Theorem 1) is that the unique optimal disclosure policy is upper censorship: there is a cutoff ŝ such that demand states s < ŝ are disclosed and result in the corresponding monopoly price p^m(s), while demand states s ≥ ŝ are pooled — only the event {s ≥ ŝ} is disclosed — and result in the monopoly price for the mean concealed state, p^m(s*), where s* = E[s | s ≥ ŝ]. The reduction to a static information design problem (Lemma 1) is the key technical step: optimal collusive profit equals V*, the greatest fixed point of V = max_{G ∈ MPC(F)} E_G[min{π^m(s), δV/((1−δ)(n−1))}]. The “capped monopoly profit” min{π^m(s), π^max} is convex-then-concave in s, and classical results from the static information design literature (Kolotilin 2018; Dworczak and Martini 2019) then imply upper censorship is uniquely optimal.

Two features of the optimal equilibrium are notable. First, prices are rigid (constant at p^m(s*)) whenever s ≥ ŝ — the opposite of Rotemberg–Saloner’s “price wars during booms.” The logic is that pooling high demand states with a lower average state is more profitable than cutting prices, because pooling reduces the current-period deviation gain without sacrificing as much on-path profit. Second, for demand states s ∈ (ŝ, s*), the equilibrium price p^m(s*) exceeds the monopoly price p^m(s) — supra-monopoly pricing occurs for a range of intermediate states. Monopoly pricing is attainable at each such state in isolation, but recommending the higher price p^m(s*) is necessary to make the pooling incentive-compatible at states s > s*.

Comparing to full disclosure, Proposition 1 shows that optimal disclosure leads to strictly higher prices at every demand state, and hence unambiguously lower consumer surplus. Proposition 3 shows that improving the algorithm’s accuracy (a mean-preserving spread of F) reduces expected consumer surplus whenever consumer surplus under monopoly pricing is concave in s — a natural condition. This result is more pessimistic than prior work (Sugaya–Wolitzky 2018; Miklos-Thal–Tucker 2019), which found ambiguous effects because those papers assumed full disclosure.

Comparative statics (Proposition 2): fewer firms or a higher discount factor δ increases collusive profit V* and makes prices more flexible (raises ŝ). Collusion is impossible if and only if δ < (n−1)/n, the same threshold as under full disclosure.
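The threshold can be read off the fixed-point equation for V* stated earlier. The following is a short derivation sketch, where c is shorthand introduced here for the coefficient on V inside the cap (it is not the paper's notation):

```latex
\[
V \;=\; \max_{G \in \mathrm{MPC}(F)} \mathbb{E}_G\!\left[\min\{\pi^m(s),\, cV\}\right] \;\le\; cV,
\qquad c \equiv \frac{\delta}{(1-\delta)(n-1)},
\]
\[
c < 1 \;\iff\; \delta < (1-\delta)(n-1) \;\iff\; \delta n < n-1 \;\iff\; \delta < \frac{n-1}{n}.
\]
```

When c < 1 and V ≥ 0, the inequality V ≤ cV forces V = 0, so no collusive profit can be sustained. When c ≥ 1 and π^m(E[s]) > 0, disclosing nothing already supports V = π^m(E[s]), so the greatest fixed point is strictly positive.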

Extensions maintain the core results. With Markov (persistent) demand (Section 4 / Theorem 2), upper censorship remains optimal but the cutoff ŝ(s) depends on last-period demand s: under positive serial correlation, ŝ(s) is decreasing in s, so the algorithm discloses less information following high demand. With differentiated products under a symmetric linear demand system (Section 5 / Theorem 3), the optimal policy censors an intermediate interval [ŝ_L, ŝ_H] and discloses both the lowest and highest demand states, because at high states the absence of an upper bound on equilibrium profit makes disclosure with price-cutting optimal.

Q: What is the core research question and why is it policy-relevant? A: The paper asks how an informed intermediary should optimally disclose demand or cost information to competing firms to maximize their collusive profit. It is directly motivated by antitrust cases against RealPage (sued by the US DOJ in August 2024), A2i Systems/Kalibrate, and Rainmaker, all of which gather market data from competing firms and recommend prices. The theory also applies to offline facilitators like AC-Treuhand, prosecuted by the European Commission for disclosing competitively sensitive information.

Q: What is the affinity assumption and why does it matter? A: The paper assumes that firm profit π(p, s) is affine (linearly increasing) in the demand or cost state s for each price p. This implies that expected profit for any distribution over states equals profit evaluated at the expected state: E[π(p,s)] = π(p, E[s]). As a consequence, any disclosure policy is equivalent, from a profit standpoint, to choosing a distribution G of the firms’ posterior mean beliefs over s, and G must be a mean-preserving contraction of the prior F (by Blackwell 1953). The assumption is satisfied for binary states, linear demand with unknown intercept, and linear demand with unknown cost.
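A worked instance of the assumption for the linear-demand case D(p,s) = s − p named in the text. Zero marginal cost is an assumption made here only to keep the algebra short:

```latex
\[
\pi(p,s) = p\,(s-p) = p\,s - p^{2}
\quad\Longrightarrow\quad
\mathbb{E}[\pi(p,s)] = p\,\mathbb{E}[s] - p^{2} = \pi\big(p,\mathbb{E}[s]\big),
\]
\[
p^{m}(s) = \arg\max_{p}\; p\,(s-p) = \frac{s}{2},
\qquad
\pi^{m}(s) = \frac{s^{2}}{4}.
\]
```

Profit is affine in s with slope p, so only the posterior mean of s matters for any fixed price, while the monopoly profit π^m(s) is convex in s, as the maximum of affine functions must be.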

Q: What is the key reduction result (Lemma 1) and what does it achieve? A: Lemma 1 reduces the problem of finding an optimal repeated-game equilibrium to a static information design problem. Optimal collusive profit equals V*, the greatest fixed point of V = max_{G ∈ MPC(F)} E_G[min{π^m(s), δV/((1−δ)(n−1))}], and this is attained by a symmetric, stationary, grim-trigger equilibrium. The reduction works because, under Bertrand competition, static deviation gains are proportional to on-path payoffs, creating a one-to-one correspondence that allows the repeated-game constraint to be folded into a single-period objective.
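The fixed point can be approximated numerically. The sketch below is illustrative rather than the paper's procedure: it assumes linear demand with zero cost (so π^m(s) = s²/4), a finite grid of equally likely states, and it searches only over upper-censorship policies whose cutoff lies on the grid, relying on Theorem 1 for the restriction. On a discrete grid the true optimum may split the cutoff state, so the result is an approximation. All function names are hypothetical.

```python
def monopoly_profit(s):
    # Assumed: D(p, s) = s - p and zero cost, so p^m(s) = s / 2 and pi^m(s) = s^2 / 4.
    return s * s / 4.0

def censored_value(states, k, cap):
    """Expected capped profit when states[:k] are disclosed and states[k:] are pooled."""
    value = sum(min(monopoly_profit(s), cap) for s in states[:k])
    pooled = states[k:]
    if pooled:
        mean_state = sum(pooled) / len(pooled)
        value += len(pooled) * min(monopoly_profit(mean_state), cap)
    return value / len(states)

def solve_collusive_profit(states, delta, n_firms, tol=1e-12, max_iter=100_000):
    """Iterate V <- T(V) downward from the upper bound E[pi^m(s)].

    T is monotone in V, so starting from an upper bound the iterates decrease
    toward the greatest fixed point. Returns (V, cutoff index on the grid).
    """
    states = sorted(states)
    scale = delta / ((1.0 - delta) * (n_firms - 1))
    v = sum(monopoly_profit(s) for s in states) / len(states)
    best_k = len(states)
    for _ in range(max_iter):
        cap = scale * v  # largest per-period profit that deters deviation
        values = [censored_value(states, k, cap) for k in range(len(states) + 1)]
        best_k = max(range(len(values)), key=values.__getitem__)
        new_v = values[best_k]
        if abs(new_v - v) < tol:
            return new_v, best_k
        v = new_v
    return v, best_k

# Illustrative grid and parameters (not from the paper).
grid = [i / 10 for i in range(1, 11)]
v_star, cutoff_index = solve_collusive_profit(grid, delta=0.6, n_firms=2)
```

Starting from the full-monopoly bound matters: V = 0 is always a fixed point, so iterating upward from zero would never leave it.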

Q: Why is upper censorship the uniquely optimal disclosure policy? A: The static information design problem has a “capped monopoly profit” objective: min{π^m(s), π^max}, where π^max = δV*/((1−δ)(n−1)) is the maximum per-period profit that satisfies incentive constraints. Because π^m(s) is convex (as the maximum of affine functions) and the cap π^max is constant, the overall objective is convex at states where π^m(s) lies below the cap and constant (hence weakly concave) at states where the cap binds, i.e., convex-then-concave in s. Classical results for linear information design (Kolotilin 2018; Dworczak and Martini 2019) imply that the unique optimal policy for a convex-then-concave objective is upper censorship.

Q: What is the supra-monopoly pricing result and why does it arise? A: For demand states s ∈ (ŝ, s*), the equilibrium price is p^m(s*) > p^m(s), meaning firms charge above the monopoly price for the current state. This arises because the pooling policy must recommend a single price for all states s ≥ ŝ, and the recommended price is p^m(s*) where s* = E[s | s ≥ ŝ]. At intermediate states s ∈ (ŝ, s*), this price exceeds the local monopoly price. The algorithm accepts lower profit at these states because it is necessary to maintain the pooled recommendation at higher states where monopoly pricing would otherwise require a price cut.
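A small numeric illustration of the price schedule. It assumes a uniform prior on [0, 1] and p^m(s) = s/2 (linear demand with zero cost), and it takes the cutoff as given rather than solving for it. The helper names and the cutoff value are hypothetical.

```python
def pooled_mean_uniform(cutoff, upper=1.0):
    """s* = E[s | s >= cutoff] under an assumed uniform prior on [0, upper]."""
    return (cutoff + upper) / 2.0

def monopoly_price(s):
    # Assumed: D(p, s) = s - p with zero cost, so p^m(s) = s / 2.
    return s / 2.0

def equilibrium_price(s, cutoff, upper=1.0):
    """Disclosed states get p^m(s); pooled states all get p^m(s*)."""
    if s < cutoff:
        return monopoly_price(s)
    return monopoly_price(pooled_mean_uniform(cutoff, upper))

def pricing_regime(s, cutoff, upper=1.0):
    """Classify the equilibrium price relative to the state's own monopoly price."""
    gap = equilibrium_price(s, cutoff, upper) - monopoly_price(s)
    if gap > 0:
        return "supra-monopoly"
    if gap < 0:
        return "below-monopoly"
    return "monopoly"

# Illustrative cutoff 0.6, so s* = 0.8 and the pooled price is 0.4.
regimes = {s: pricing_regime(s, cutoff=0.6) for s in (0.3, 0.7, 0.9)}
```

With these numbers the state 0.3 is priced at its monopoly level, 0.7 is priced above its monopoly level, and 0.9 is priced below it, matching the three regions described in the answer.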

Q: How does optimal disclosure compare to full disclosure in terms of consumer surplus? A: Proposition 1 shows that collusive prices under optimal disclosure are strictly higher at every demand state compared to full disclosure (Rotemberg–Saloner). In Rotemberg–Saloner, high demand states trigger price cuts (“price wars during booms”) to deter deviation; under optimal disclosure, high states are pooled and prices are instead rigid at p^m(s*). Because prices are higher at all states, consumer surplus is unambiguously lower under optimal disclosure.

Q: What does Proposition 3 say about the effect of algorithmic accuracy on consumer surplus? A: Proposition 3 states that if consumer surplus under monopoly pricing, CS(s), is concave in s, then a mean-preserving spread of F (i.e., improved algorithmic accuracy) reduces expected consumer surplus. This result is more pessimistic than prior work by Sugaya–Wolitzky (2018) and Miklos-Thal–Tucker (2019), which found ambiguous effects. The difference is that those papers assumed full disclosure, so better accuracy tightened incentive constraints and sometimes forced price cuts. Under optimal selective disclosure, a more accurate algorithm always raises average prices because the algorithm withholds information that would have forced price cuts.

Q: What are the comparative statics with respect to the number of firms and the discount factor? A: Proposition 2 establishes that a decrease in the number of firms n or an increase in the discount factor δ increases collusive profit V* and makes collusive prices more flexible (raises ŝ). The intuition for fewer firms making prices more flexible is that with fewer firms, incentive constraints bind for a narrower range of demand states, so less pooling is needed. Collusion is impossible if and only if δ < (n−1)/n, the same threshold as under full disclosure.

Q: How does the model generate empirically testable predictions distinct from other collusion models? A: The model predicts: (1) the equilibrium price distribution has support on an interval [p^m(s_bar), p^m(ŝ)] plus a single mass point at the higher price p^m(s*); (2) prices are pro-cyclical overall but rigidly fixed at p^m(s*) for all but the lowest demand states; (3) the gap p^m(s) − p(s) is non-monotone — zero at low states, negative (supra-monopoly) at intermediate states, and positive at high states; (4) prices are more flexible when firms are more patient or fewer. The rigid high price combined with a flexible interval of lower prices is described as a distinctive collusive marker not present in other models.

Q: How does the model relate to the empirical literature testing Green–Porter versus Rotemberg–Saloner? A: Rotemberg–Saloner predicts counter-cyclical prices (price wars during booms), while Green–Porter predicts pro-cyclical prices. Empirical tests (e.g., Porter 1983, Ellison 1994) have typically found pro-cyclical prices, favoring Green–Porter. The present model generates pro-cyclical prices through a different mechanism — perfect monitoring plus selectively disclosed demand information — showing that pro-cyclical prices are consistent with perfect monitoring when the information intermediary optimally pools high demand states. The paper suggests that distinguishing the theories requires estimating the gap between price and monopoly price over the cycle: under Green–Porter, collusion succeeds better in high demand states; under this model, collusion succeeds better in low demand states.

Q: What narrative evidence from the RealPage case corroborates the model’s predictions? A: The US DOJ complaint against RealPage states that “in down markets… [RealPage] instills pricing discipline in landlords, curbing normal fully independent competitive reactions by substituting them with interdependent decision-making,” and that RealPage advertised that its AI helps clients “avoid the race to the bottom in down markets.” This is consistent with the model’s prediction of flexible monopoly prices at low demand states and a rigid, supra-monopolistic price in normal times. The Kumatori Contractors Cooperative case (studied by Kawai, Nakabayashi, and Ortner 2024) corroborates the censorship result: that organization took drastic steps to limit bidders’ information about costs on the largest projects — exactly the states where deviation is most tempting.

Q: How do results change with persistent (Markov) demand? A: Theorem 2 shows that upper censorship remains uniquely optimal with Markov demand, but the cutoff ŝ(s) now depends on last-period demand s. Under positive serial correlation, ŝ(s) is decreasing in s: the algorithm discloses less information after high demand because firms are more optimistic and thus more tempted to deviate. Under negative serial correlation, ŝ(s) is increasing. The optimal collusive price is no longer always equal to the monopoly price for the disclosed mean demand, and the expected price conditional on last-period demand can be countercyclical (similar to Rotemberg–Saloner), even though the current-period price is always monotone in current demand.

Q: How does the optimal disclosure policy change with differentiated products? A: With a symmetric linear demand system (Section 5, Theorem 3), the optimal policy censors an intermediate interval [ŝ_L, ŝ_H] and discloses both the lowest and the highest demand states. At high demand states s > ŝ_H, the algorithm discloses the state and recommends a price below monopoly (to satisfy incentive constraints), because with differentiated goods there is no upper bound on equilibrium profit and profit is convex in s at high states, making disclosure with price-cutting optimal. Mathematically, the capped monopoly profit is piecewise-convex rather than convex-then-concave, so the optimal policy is intermediate-interval censorship rather than upper censorship. The Appendix A version extends to general demand systems and capacity constraints with the same qualitative logic.

Q: What are the main limitations and directions for future work acknowledged by the authors? A: The paper identifies three main limitations. First, if profit is not affine in s (i.e., expected profit depends on more than the mean state), the information design problem becomes non-linear and upper censorship is typically suboptimal, though it remains approximately optimal when the problem is close to linear. Second, the model assumes the algorithm’s objective is to maximize industry profit; if the intermediary is a profit-maximizing seller of software (as in Harrington 2022), the objective may instead be to maximize the profit differential between adopters and non-adopters. Third, the model assumes all firms use the algorithm; allowing partial adoption would require modeling firms’ incentives to subscribe. The paper notes that incorporating these considerations “could be an interesting direction for future research.”

Upper Censorship (disclosure policy): A disclosure policy in which demand states below a cutoff ŝ are revealed to firms (along with the corresponding monopoly price recommendation), while states above ŝ are pooled — only the event {s ≥ ŝ} is disclosed — with a single monopoly price recommendation p^m(s*) for the mean concealed state s* = E[s | s ≥ ŝ]. This is the uniquely optimal disclosure policy in the baseline model.

Capped Monopoly Profit: The per-period profit objective in the reduced static information design problem: min{π^m(s), π^max}, where π^max = δV*/((1−δ)(n−1)) is the maximum industry profit attainable in a single period without violating incentive constraints. This function is convex-then-concave in s, which drives the optimality of upper censorship.

Supra-Monopoly Pricing: Equilibrium prices that exceed the monopoly price for the realized demand state. In the model, this occurs for states s ∈ (ŝ, s*), where the algorithm’s pooled recommendation p^m(s*) is above the local monopoly price p^m(s). It arises because the pooled recommendation must be incentive-compatible at the highest concealed states.

Price Rigidity: The feature of the optimal equilibrium in which the collusive price is constant at p^m(s*) for all demand states s ≥ ŝ. The algorithm achieves this by withholding information about high demand states, preventing the “price wars during booms” predicted by Rotemberg–Saloner (1986) under full disclosure.

Algorithmic Accuracy: In the paper’s terms, the informativeness of the algorithm’s signal about s, formalized as the precision of the distribution F. Improving accuracy corresponds to a mean-preserving spread of F (Blackwell 1953). A more accurate algorithm always increases collusive profit; under the concavity condition on consumer surplus, it also reduces expected consumer surplus.

Mean-Preserving Contraction (MPC(F)): The set of distributions G of firms’ posterior mean beliefs over s that are consistent with Bayesian updating of the prior F. By Blackwell (1953), a disclosure policy is feasible if and only if it induces a distribution G ∈ MPC(F). This is the feasibility constraint in the static information design problem.

Affinity in the state: The assumption that π(p, s) is affine (linearly increasing) in s for each price p. This implies E[π(p,s)] = π(p, E[s]), so expected profit is determined entirely by the expected state, enabling the reduction of the disclosure problem to choosing a distribution of posterior means.

Competing under Information Heterogeneity: Evidence from Auto Insurance

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies imperfect competition in selection markets where competing firms have heterogeneous information about consumers — a layer of asymmetry distinct from the classic buyer-seller information gap. The central questions are: how do inter-firm information asymmetries shape equilibrium pricing, consumer sorting, and market efficiency; and whether a centralized bureau that aggregates and equalizes firms’ risk information can promote competition and improve welfare.

The empirical setting is the Italian mandatory motor vehicle liability insurance market (Responsabilità Civile Auto). The authors use the IPER dataset from IVASS, a nationally representative panel of matched insurer-insuree contracts covering 124,428 liability insurance contracts for new customers in the province of Rome from 2013 to 2021. The panel tracks consumers across insurer switches, enabling construction of individual-specific risk estimates from ex-post claim records using Poisson regressions for claim frequency and log-normal regressions for claim severity. The analysis focuses on the 10 largest firms plus a composite fringe firm.

The paper’s empirical strategy proceeds in three stages. First, individual risk types are estimated from multi-year claim panels. Second, demand parameters — price sensitivity and firm-level unobserved product attributes — are recovered using a novel fixed-point algorithm (extending Berry et al. 1995) that infers the full offered-price distribution from observed transaction prices alone, without parametric restrictions on price distributions across firms. Third, supply-side parameters — pricing coefficients, signal variances, and cost parameters — are identified by exploiting the monotone mapping between offered prices and private signals, borrowing from the nonparametric auction literature.

The model features firms that each draw a private Gaussian signal about a consumer’s true risk type theta, with firm-specific signal standard deviation sigma_j. Lower sigma_j means higher information precision. Firms set prices as a linear function of their posterior risk rating: p_j = alpha_j + beta_j * E(theta | theta_j, D=j). Firms simultaneously choose pricing coefficients to maximize expected profits.

Key empirical findings: (1) Firms differ substantially in how sensitively their premiums respond to realized consumer risk — a reduced-form measure of information precision — with Figure 2 showing wide cross-firm variation in premium-to-risk coefficients. (2) Structural estimation confirms substantial heterogeneity in signal standard deviations sigma_j across all 11 firms. Firms with less accurate risk-rating algorithms (higher sigma_j) tend to have more efficient cost structures (lower claim-processing cost parameter k_j), generating distinct comparative advantages. (3) Baseline pricing coefficients alpha_j and risk-sensitivity coefficients beta_j vary dramatically across firms. (4) Senior drivers are less price sensitive; urban drivers are more price sensitive. Lower-risk consumers show stronger preferences for Firms 3 and 5, while higher-risk consumers disproportionately choose Firm 8.

Counterfactual simulations assess three information policies relative to the baseline. Under a centralized risk bureau — which collects each firm’s signal, aggregates them weighted by precision, and distributes the combined signal equally — average premiums fall by 21.6% and consumer surplus rises by 15.7%. The efficiency benchmark (firms observe true risk perfectly) yields a 25.7% premium reduction and a 16.9% consumer surplus gain, so the bureau recovers almost all the efficiency gap. The privacy benchmark (all firms restricted to the coarsest signal in the market) raises surplus for high-risk consumers by 6.9% but harms low-risk consumers.

The bureau’s price reduction operates through two channels: it eliminates the market power that accrues to firms with superior private information, and it aligns firms’ risk evaluations, enabling sharper undercutting. The bureau also reduces average costs by 12 euros per contract by enabling more efficient insurer-insuree matching — cost-efficient claim processors can better target the consumer types they have a comparative advantage in serving.

The analysis is confined to new customers in Rome’s provincial market to avoid complications from dynamic pricing and consumer-firm learning. The model abstracts away from optional contract clauses (treated as observable characteristics) and does not model the specific mechanisms generating information heterogeneity.

Q: What is the paper’s core research question? A: The paper asks how information asymmetries between competing firms (not just between buyers and sellers) shape equilibrium pricing strategies, consumer sorting, and market efficiency in a selection market, and whether a centralized bureau that equalizes firms’ access to aggregated risk information can improve competition and welfare. This extends the classic Akerlof-Rothschild-Stiglitz framework by introducing a second layer of asymmetry — across sellers themselves.

Q: Why is the Italian auto insurance market well suited for this study? A: Italy mandates liability insurance for all drivers and prohibits rejections, so the analysis focuses entirely on how consumers sort across insurers rather than on participation margins. The IPER dataset from IVASS is a nationally representative panel tracking policyholders even across insurer switches, providing both premium and ex-post claim records needed to construct individual risk types. The market has roughly 50 competing firms using demonstrably heterogeneous pricing algorithms, documented through a survey of major insurers and reduced-form regressions.

Q: How do the authors measure firm-level information precision in the reduced-form analysis? A: They estimate individual-specific risk types from a panel of claim records using Poisson regressions (claim frequency) and log-normal regressions (claim severity), then regress each firm’s premiums on those estimated risk measures. Firms whose premiums respond more sensitively to realized risk are inferred to have higher information precision. Figure 2 shows that these premium-to-risk coefficients vary significantly across firms — for example, Firm 7’s premiums are considerably more sensitive to risk than Firm 8’s — providing reduced-form evidence of heterogeneous information precision before any structural estimation.

Q: What is the structural model’s signal structure? A: Each firm j draws a private signal theta_j ~ N(theta, sigma_j^2) about a consumer’s true risk type theta, where sigma_j is the firm-specific signal standard deviation. A smaller sigma_j means higher precision. Signals are independent across firms conditional on theta, analogous to common-value auctions where firms receive noisy estimates of a shared unknown value (expected claim payouts). The parameter sigma_j is the key structural object the paper identifies and estimates.

Q: What is novel about the demand estimation strategy? A: Standard demand estimation assumes the same price is offered to all consumers or that the full price menu is observed. Here, only transaction prices are observed — the prices of unchosen insurers are not in the data. The authors apply the Wu and Xin (2024) fixed-point algorithm, which jointly estimates consumers’ sorting probabilities, offered price distributions, and demand parameters by adding an outer loop over sorting propensities to the Berry (1994) contraction mapping. No parametric restrictions are imposed on the offered price distributions, and they are allowed to vary fully across firms.
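For orientation, the inner contraction step that the algorithm builds on can be sketched for a plain logit model. This is a generic illustration, not the authors' code: the plain-logit share function, the normalization of the first firm's mean utility to zero (used here because there is no outside option in a mandatory market), and all names are assumptions. The outer loop over sorting propensities is not shown because the summary does not spell it out.

```python
import math

def logit_shares(mean_utils):
    """Plain logit choice shares from mean utilities (max subtracted for stability)."""
    m = max(mean_utils)
    expu = [math.exp(u - m) for u in mean_utils]
    total = sum(expu)
    return [e / total for e in expu]

def invert_shares(observed_shares, tol=1e-12, max_iter=1000):
    """Contraction-style update: delta <- delta + log(observed) - log(predicted)."""
    deltas = [0.0] * len(observed_shares)
    for _ in range(max_iter):
        predicted = logit_shares(deltas)
        updated = [d + math.log(s) - math.log(p)
                   for d, s, p in zip(deltas, observed_shares, predicted)]
        base = updated[0]
        updated = [u - base for u in updated]  # normalize firm 0 to zero
        if max(abs(u - d) for u, d in zip(updated, deltas)) < tol:
            return updated
        deltas = updated
    return deltas

# Illustrative market shares for three firms (not from the paper).
mean_utilities = invert_shares([0.5, 0.3, 0.2])
```

In plain logit the inversion has a closed form (the log share ratio), so the loop settles almost immediately; the iterative form is what carries over to richer models where predicted shares have no closed-form inverse.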

Q: How are firms’ signal variances identified separately from pricing coefficients? A: There is a one-to-one mapping between a firm’s offered price and its signal (prices increase monotonically in the signal, analogous to bids in auctions). After recovering the offered price distribution from the demand step, the authors observe price dispersion at a fixed risk level. By focusing on average prices conditional on each risk level, signal noise averages out, identifying the pricing coefficients beta_j. The residual price dispersion at fixed risk then identifies signal variance sigma_j^2.
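The separation argument can be illustrated with a small simulation. The sketch simplifies the paper's setup in one labeled way: the firm prices linearly off its raw signal, p = alpha + beta * theta_j, instead of off a posterior expectation that conditions on selection. All parameter values and function names are made up for illustration.

```python
import math
import random
import statistics

def simulate_offered_prices(alpha, beta, sigma, theta, n_draws, rng):
    """Offered prices at true risk theta when the signal is theta_j ~ N(theta, sigma^2)."""
    return [alpha + beta * rng.gauss(theta, sigma) for _ in range(n_draws)]

def recover_parameters(prices_by_risk):
    """Recover (alpha, beta, sigma) from offered prices grouped by risk level.

    Signal noise averages out of the conditional mean price, so the slope of
    mean price on risk gives beta; the within-risk price variance equals
    beta^2 * sigma^2, which then gives sigma. Assumes equal group sizes.
    """
    thetas = sorted(prices_by_risk)
    means = [statistics.fmean(prices_by_risk[t]) for t in thetas]
    t_bar = statistics.fmean(thetas)
    m_bar = statistics.fmean(means)
    slope_num = sum((t - t_bar) * (m - m_bar) for t, m in zip(thetas, means))
    slope_den = sum((t - t_bar) ** 2 for t in thetas)
    beta = slope_num / slope_den
    alpha = m_bar - beta * t_bar
    within_var = statistics.fmean(
        [statistics.pvariance(prices_by_risk[t]) for t in thetas]
    )
    sigma = math.sqrt(within_var) / abs(beta)
    return alpha, beta, sigma

# Illustrative parameters (not estimates from the paper).
rng = random.Random(0)
true_alpha, true_beta, true_sigma = 200.0, 1.5, 40.0
data = {
    theta: simulate_offered_prices(true_alpha, true_beta, true_sigma, theta, 20_000, rng)
    for theta in (100.0, 300.0, 500.0)
}
alpha_hat, beta_hat, sigma_hat = recover_parameters(data)
```

The design choice to mirror is the order of operations: means first, dispersion second. Without the slope from the first step, the within-risk variance alone cannot distinguish a noisy signal from a steep pricing rule.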

Q: What does structural estimation reveal about the relationship between information precision and cost efficiency? A: Firms with higher signal standard deviations (less precise risk evaluation) tend to have lower claim-processing cost parameters k_j — they are more efficient at handling claims. This creates distinct comparative advantages: some firms excel at risk identification but face higher processing costs, while others process claims cheaply but evaluate risk less precisely. This heterogeneity means information-equalizing policies have differentiated firm-level impacts.

Q: What are the quantitative effects of the centralized risk bureau on premiums and consumer surplus? A: The bureau reduces average premiums by 21.6% relative to baseline and increases consumer surplus by 15.7%. The efficiency benchmark — where firms observe consumers’ true risk perfectly — produces a 25.7% premium reduction and a 16.9% consumer surplus gain. The bureau therefore closes nearly all of the gap to the first-best allocation in surplus terms (15.7% vs. 16.9%).

Q: Through what mechanisms does the bureau reduce prices? A: Two distinct channels are identified. First, equalizing information precision eliminates the informational market power held by firms with superior signals, compelling them to compete more aggressively on price. Second, when all firms share the same risk evaluation of a consumer, they can undercut each other more precisely, which intensifies price competition further. Both channels operate simultaneously under the bureau.

Q: How does the bureau affect consumer surplus distribution across risk types? A: The bureau primarily benefits low-risk consumers because improved information allows firms to price discriminate more accurately on risk type, lowering prices for those who are low risk. High-risk consumers see smaller benefits and may face relatively higher premiums. This contrasts with the privacy benchmark, where restricting all firms to the coarsest signal in the market raises high-risk consumers’ surplus by 6.9% — because it becomes harder for firms to distinguish them from low-risk consumers.

Q: What is the cost efficiency effect of the bureau? A: Under the centralized risk bureau, average costs per contract fall by 12 euros. This reflects more efficient insurer-insuree matching: when firms have equal and better information, those with cost advantages in claims processing can better identify and attract the consumer types they are relatively best equipped to serve. The authors note that given the scale of the Italian auto insurance market (approximately 31 million contracts annually), this per-contract saving implies a substantial aggregate impact.

Q: What happens to firm profits under the bureau, and is the impact uniform? A: Average profits decline overall due to lower prices. However, the impact is heterogeneous across firms. Firms that rely most heavily on superior information precision — often smaller, more specialized firms — experience greater profit losses, since the bureau most directly erodes their competitive advantage.

Q: How does the privacy benchmark differ from the bureau scenario? A: The privacy benchmark simulates a regulation that restricts all firms to using only basic consumer information, setting signal variance to the highest level observed in the market. Unlike the bureau (which improves and equalizes information), this benchmark degrades information uniformly. It produces opposite distributional effects: high-risk consumers gain 6.9% in surplus as cross-subsidization from low-risk to high-risk consumers increases, while low-risk consumers are worse off.

Q: Why does the paper focus on new customers only? A: Focusing on new customers avoids complications from dynamic pricing, where insurers update premiums based on accumulated claim history with a specific consumer, and from consumer-firm learning dynamics. This follows standard practice in the empirical asymmetric information literature, as cited in Chiappori and Salanie (2000) and Crawford et al. (2018).

Q: How does this paper relate to and extend prior work on selection markets? A: Prior empirical work on imperfect competition in selection markets — including Einav et al. (2010), Crawford et al. (2018), and related studies — assumes that competing firms have symmetric information about consumers. This paper is described as introducing the first tractable empirical framework for analyzing selection markets where firms have heterogeneous information. It also incorporates multidimensional cost heterogeneity on the supply side, adding to work by Salanié (2017) and Nelson (2025).

Q: What do the reduced-form regressions reveal about pricing heterogeneity across insurers? A: Firm-level regressions of premiums on observable risk factors show R-squared values ranging from 0.39 to 0.59. Estimated coefficients on key risk factors vary dramatically: being one year older reduces premiums by 0.25 to 1.68 euros depending on the firm; a higher bonus-malus class increases premiums by 12 to 32 euros; one additional accident in the previous five years raises premiums by 74 to 181 euros. These ranges reflect genuine differences in actuarial algorithms, not just sampling variation.

Q: What is the bonus-malus system and why does its saturation matter for the paper’s setting? A: Italy’s bonus-malus (BM) system assigns drivers to one of 18 risk classes based on accident history. Because approximately 80% of policyholders are in the best class (BM class 1), the public BM system provides limited granularity for risk evaluation. This saturation creates strong incentives for firms to develop proprietary risk-rating algorithms, which is the institutional basis for the substantial information heterogeneity that the paper documents and models.

Information Precision (sigma_j): In the paper’s model, the firm-specific parameter measuring the dispersion of a firm’s private signal about a consumer’s true risk type. Firm j draws signal theta_j ~ N(theta, sigma_j^2); 1/sigma_j is information precision. A smaller sigma_j means the firm more accurately identifies consumer risk. This is not merely a theoretical construct — the paper identifies and estimates sigma_j structurally for each of the 11 firms.

Heterogeneous Information: The condition where competing firms hold signals of different precision about the same consumer’s unobserved risk type, introducing asymmetry not just between buyers and sellers (as in Akerlof 1970) but among sellers themselves. This is the paper’s central departure from prior literature on selection markets, which assumed symmetric information among firms.

Centralized Risk Bureau: A policy institution that collects each firm’s analyzed risk signal, aggregates these signals weighted by each firm’s information precision (producing a combined signal more precise than any individual firm’s signal), and makes the aggregated information equally accessible to all firms. The bureau is the paper’s primary policy counterfactual, and it is modeled as raising the level of information precision and removing its heterogeneity across competitors.
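The aggregation step can be illustrated with a minimal sketch. It assumes the firms' signals are independent normal draws around the true type and pools them with inverse-variance weights, the textbook rule for combining independent Gaussian signals. The function name, the weighting rule, and the numbers are illustrative assumptions; the paper's exact aggregation procedure is not reproduced here.

```python
def pool_signals(signals, sigmas):
    """Pool independent normal signals with inverse-variance weights.

    Assumption: signal j is theta_j ~ N(theta, sigma_j^2), independent
    across firms. Returns (pooled signal, pooled standard deviation).
    """
    if not signals or len(signals) != len(sigmas):
        raise ValueError("need one positive sigma per signal")
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, signals)) / total
    pooled_sigma = (1.0 / total) ** 0.5
    return pooled, pooled_sigma


# Hypothetical signals from three firms with different noise levels.
signals = [0.9, 1.2, 1.5]
sigmas = [0.5, 1.0, 2.0]
pooled, pooled_sigma = pool_signals(signals, sigmas)
print(pooled, pooled_sigma)
```

Under these assumptions the pooled standard deviation is always below the smallest individual sigma_j, which matches the description of the bureau's signal as more precise than any single firm's.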

Offered vs. Accepted Price Distribution: A distinction central to the paper’s identification strategy. The accepted price distribution is what is observed in transaction data — prices conditional on the consumer having chosen that firm. The offered price distribution is the full set of prices the firm would charge across all consumers, including those who did not select it. The paper recovers the offered distribution from the accepted distribution using a fixed-point algorithm, without imposing parametric restrictions.

Selection Loop: The paper’s methodological extension of the Berry (1994) BLP contraction mapping for mean utilities. An outer loop iterates over consumers’ sorting propensities to jointly recover offered price distributions, sorting probabilities, and demand parameters when only transaction prices are observed. This technique handles the endogeneity of which prices are accepted.

Risk Rating: The firm’s posterior assessment of a consumer’s expected cost, computed as the posterior mean E(theta | theta_j, D=j) — the expected true risk type conditional on the firm’s private signal and the consumer selecting that firm. Firms set prices as a linear function of their risk rating: p_j = alpha_j + beta_j * E(theta | theta_j, D=j).
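A simplified sketch of the rating-then-price logic follows. It assumes a normal prior on theta (with hypothetical mean and standard deviation) and deliberately omits the conditioning on D=j, so it is a plain normal-normal update rather than the paper's selection-adjusted posterior. All parameter values are made up for illustration.

```python
def risk_rating(signal, sigma_j, prior_mean, prior_sd):
    """Posterior mean of theta given one noisy signal (normal-normal update).

    Simplification: the rating in the text also conditions on the consumer
    choosing firm j (D=j); that selection term is omitted here.
    """
    shrink = prior_sd ** 2 / (prior_sd ** 2 + sigma_j ** 2)
    return prior_mean + shrink * (signal - prior_mean)


def linear_price(rating, alpha_j, beta_j):
    """Pricing rule p_j = alpha_j + beta_j * rating."""
    return alpha_j + beta_j * rating


# Hypothetical inputs: the same signal read by a noisy and a precise firm.
noisy_rating = risk_rating(signal=2.0, sigma_j=1.0, prior_mean=1.0, prior_sd=1.0)
precise_rating = risk_rating(signal=2.0, sigma_j=0.5, prior_mean=1.0, prior_sd=1.0)
noisy_price = linear_price(noisy_rating, alpha_j=100.0, beta_j=200.0)
print(noisy_rating, precise_rating, noisy_price)
```

The more precise firm shrinks its signal less toward the prior mean, so its rating responds more strongly to the same signal.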

Comparative Advantage (information vs. cost): The paper’s finding that firms with lower information precision (higher sigma_j) tend to have more efficient cost structures (lower k_j), and vice versa. This cross-sectional negative correlation between information advantage and cost advantage means that policy interventions that equalize information precision shift the basis of competition from information asymmetry to cost specialization.

Competition in a Spatially-Differentiated Product Market with Negotiated Prices

Mon, 01 Jan 0001 00:00:00 +0000

Research Question

How does individually negotiated pricing — where buyers make discrete choices among differentiated products and negotiate transaction-specific prices — affect market power and merger effects in oligopoly markets, and how do these effects differ from the uniform-pricing benchmark?

Data and Setting

The paper estimates the model using 13,788 transactions between the four main UK brick manufacturers and national house-building firms over 2003–2006. For each transaction (defined as a unique buyer-variety-destination-year combination), the data record the chosen product, negotiated price, production and delivery locations, volume, transport costs, and brick characteristics. The market is highly concentrated: four manufacturers held an 85% share of brick sales, with a two-firm concentration ratio of 0.60 and an HHI of 2,113. Spatial differentiation is a central feature — transport costs vary substantially by project location, and prices for the same brick product vary across the different projects of the same buyer depending on local competitive conditions.

Model

The paper develops an empirical model that adapts the Berry, Levinsohn, and Pakes (1995) differentiated-products framework to individually negotiated pricing. In the model, each buyer negotiates simultaneously and bilaterally with the sellers of the first-best and runner-up products (ranked by surplus, that is, value minus cost). The equilibrium first-best markup equals the minimum of (i) the unconstrained Nash bargaining solution, bj(1)(wj(1) − w0), and (ii) the first-best seller’s surplus advantage over the runner-up, (wj(1) − wj(2)). Runner-up and lower-ranked sellers earn zero markups in equilibrium. This outcome is shown to be consistent with a range of non-cooperative bargaining models (Binmore 1985, Bolton and Whinston 1993, Manea 2018) and lies in the core of the associated coalition game. The TIOLI posted-price model is nested as the special case where seller bargaining skill equals one. A tractable likelihood for the joint probability of observed product choice and negotiated price is derived under the assumption that idiosyncratic taste terms follow a Generalized Extreme Value (GEV) distribution.

Main Findings

The estimated mean seller bargaining skill is b̄ = 0.41 (s.e. 0.03), and a likelihood ratio test rejects the TIOLI restriction with a chi-squared statistic of 847 (p < 0.001), confirming that buyer bargaining power is economically and statistically significant. The model-implied price-cost margins (Lerner index) are low on average — mean of 0.08 — but vary widely across transactions (coefficient of variation of 0.78). Project location matters: sellers extract higher margins from buyers that are relatively close, taking advantage of their transport-cost proximity. Multi-product ownership also affects markups, but its relevance varies by project.

Switching from negotiated to uniform pricing raises average markups by 34% at the observed market structure. However, effects are heterogeneous: approximately 15% of transactions see markup decreases. Buyers who benefit from uniform pricing are those with relatively little runner-up competition — precisely the buyers who face weak bargaining positions under negotiated pricing, and for whom the seller’s ability to use that position is constrained under a uniform rule.

Under negotiated pricing, a merger affects a transaction’s markup only if it brings the first-best and runner-up products for that transaction under joint ownership. A demerger to single-product manufacturers reduces total manufacturer surplus by 25%. The merger of the two largest firms increases total manufacturer surplus by 19%, but with highly unequal transaction-level effects. Comparing the same mergers across pricing regimes, negotiated pricing abates average markup-increasing merger effects but worsens them for a minority of transactions — those where the merger creates a first-best/runner-up pairing.

Scope Conditions

The model applies to complete-information settings where prices are negotiated transaction-by-transaction, buyers single-source for each discrete purchase occasion, and sellers have multiple spatially differentiated products. It is most directly applicable to business-to-business markets where individual transaction values are large enough to justify project-level negotiation.

Q: What is the fundamental difference between negotiated pricing in this paper and the standard Nash-in-Nash (NiN) bargaining framework? A: In standard NiN (Horn and Wolinsky 1988), a buyer negotiates one price per product and trades positive quantities of all products with negotiated prices, so all negotiated prices are observed in transaction data. In this paper, buyers make discrete single-sourcing choices — each project uses exactly one product — so only the chosen product’s price appears in data; the runner-up product and its counterfactual price are unobserved. Additionally, under NiN, prices are set at the buyer level and apply uniformly to all the buyer’s needs, whereas here prices are negotiated separately for each project, generating intra-buyer cross-project price variation.

Q: What is the equilibrium markup formula, and what determines whether the Nash bargaining solution or the TIOLI constraint binds? A: The equilibrium first-best markup is ρ*j(1) = min[bj(1)(wj(1) − w0), (wj(1) − wj(2))], the minimum of the unconstrained Nash bargaining solution and the first-best seller’s surplus advantage over the runner-up. The TIOLI constraint (surplus advantage) binds when the seller’s bargaining skill is sufficiently high that the unconstrained NBS would exceed the surplus advantage — that is, when bj(1)(wj(1) − w0) > (wj(1) − wj(2)). Runner-up and all lower-ranked sellers earn zero markups in equilibrium because competition from the first-best drives their outside-option constraint to bind.
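The markup rule above can be written as a small function. This is a sketch with hypothetical surplus values; the function name and the returned flag are illustrative, not the paper's notation.

```python
def first_best_markup(b, w1, w2, w0=0.0):
    """Markup of the first-best seller: min of NBS share and surplus advantage.

    b  : seller bargaining skill in [0, 1]
    w1 : surplus of the first-best product
    w2 : surplus of the runner-up product (w2 <= w1)
    w0 : surplus of the outside good
    Returns (markup, constrained), where constrained is True when the
    surplus-advantage ceiling binds.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("bargaining skill must lie in [0, 1]")
    if w2 > w1:
        raise ValueError("runner-up surplus cannot exceed first-best surplus")
    nbs = b * (w1 - w0)        # unconstrained Nash bargaining solution
    advantage = w1 - w2        # surplus advantage over the runner-up
    return min(nbs, advantage), nbs > advantage


# Hypothetical surpluses; b = 0.41 is the mean skill reported in the text.
close_rival = first_best_markup(b=0.41, w1=10.0, w2=8.0)   # ceiling binds
weak_rival = first_best_markup(b=0.41, w1=10.0, w2=4.0)    # NBS applies
tioli = first_best_markup(b=1.0, w1=10.0, w2=4.0)          # b = 1 special case
print(close_rival, weak_rival, tioli)
```

With a close runner-up the surplus advantage caps the markup; with a weak runner-up the markup is set by bargaining skill instead. Setting b = 1 makes the ceiling the only constraint that matters whenever the runner-up beats the outside good.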

Q: Why do third-best and lower-ranked sellers have no effect on equilibrium outcomes? A: Because the most attractive offer any seller below the runner-up could make is a zero markup, and the runner-up already offers a zero markup due to competition from the first-best. Since the runner-up at zero markup already offers the buyer at least as much utility as any third-best product, the third-best cannot improve the buyer’s position. Proposition 1 (part iii) shows that the equilibrium markup and choice are invariant to N for N in {2, …, N̄}.

Q: How does the paper address the econometric challenge that the runner-up product and its price are unobserved? A: The paper derives a tractable closed-form likelihood for the joint probability of the observed product choice and the observed negotiated price, integrating out the unobserved idiosyncratic taste terms along with their implications for the identity and surplus of the unobserved runner-up product. The GEV distributional assumption on taste terms is crucial: it ensures that (1) choice probabilities have a closed form, (2) the surplus advantage can be expressed in terms of observed surpluses and GEV terms, and (3) the probability that the NBS is constrained has a closed form. This reduces the full problem to a lower-dimensional numerical integral over the normally distributed random effects.

Q: What empirical evidence motivates the negotiated pricing model over simpler alternatives? A: Four data patterns motivate the model. First, prices vary across projects even after controlling for product identity and buyer identity — intra-buyer cross-project variation that is inconsistent with standard NiN where prices are set at the buyer level. Second, prices are lower, other things equal, when there is greater local competition from manufacturers not chosen for a project — inconsistent with standard NiN where excluded products play no competitive role. Third, buyers have many projects and make a discrete single-sourcing choice for each. Fourth, sellers are multi-product firms with products differentiated spatially and in other dimensions.

Q: What do the price regressions reveal about price determinants? A: Adding year effects to a simple regression explains only a small share of price variation (R² rises from 0.000 to 0.118 for the full sample). Adding variety-year effects raises R² to 0.775 and adding buyer-variety-year effects to 0.918, but still leaves substantial unexplained variation. Panel B regressions show that prices decrease with quantity, increase with input prices (gas price coefficient 27.2, wage coefficient 8.3), decrease with buyer-to-seller size ratio (coefficient −2.51), and decrease with greater local competition (a distance advantage indicator raises price by about 0.48–2.20 and N(DST) count reduces price by about 1.49–1.53 depending on specification).

Q: What do the parameter estimates imply about spatial differentiation and buyer preferences? A: Transport costs have a strongly negative effect on value (coefficient on distance is −1.27, s.e. 0.04), and the interaction of distance with fuel costs is also negative and significant. The nesting parameter σJ is estimated at 0.47, indicating substantial within-group taste correlation across products from the same firm. Product characteristics matter: red and wire-cut bricks are preferred, and there are significant interactions between weather conditions and technical brick characteristics (frost positively interacts with strength; rainfall negatively interacts with absorption), indicating that buyers value bricks whose technical performance is suited to their project’s climate.

Q: How is the mean seller bargaining skill estimated, and how is the TIOLI model rejected? A: The mean seller bargaining skill b̄ is estimated at 0.41 (s.e. 0.03), substantially below one. The TIOLI restriction corresponds to b̄ = 1 (all markup determined by surplus advantage). A likelihood ratio test rejects this restriction with a chi-squared statistic of 847 (p < 0.001), providing strong statistical evidence that buyer bargaining power — not just competitive pressure — constrains markups below the TIOLI level.

Q: What are the main findings regarding the distribution of price-cost margins? A: Price-cost margins (Lerner index form) are low on average, with a mean of 0.08, but vary widely across transactions, with a coefficient of variation of 0.78. Sellers set higher margins to buyers located relatively close to them (lower transport costs make the seller more attractive to the buyer, strengthening the seller’s position). Multi-product manufacturer portfolios also affect markups, but the relevance of multi-product ownership varies across projects depending on whether different products from the same firm compete as first-best and runner-up for a given project.

Q: What does the uniform pricing counterfactual show, and how does it differ from the Hotelling benchmark? A: Switching from individually negotiated to uniform pricing raises average markups by 34% at the observed market structure. However, effects are heterogeneous: approximately 15% of transactions see markup decreases. Buyers who benefit from the switch are those in transactions with relatively weak runner-up competition — who had weak bargaining positions under negotiated pricing — and who gain because uniform pricing prevents sellers from exploiting that weakness. This contrasts with the result from the simple Hotelling linear city model (Thisse and Vives 1988), where switching to uniform pricing raises all markups.

Q: How does the demerger counterfactual quantify multi-product effects? A: Breaking up the observed multi-product manufacturers into single-product firms reduces total manufacturer surplus by 25%. This large reduction reflects the role of multi-product ownership in determining who the runner-up is for each transaction: when a manufacturer owns multiple products, it can avoid internal competition between its own first-best and runner-up products, preserving its surplus advantage. The impact is highly unequal across individual transactions, however, because the relevance of multi-product effects depends on whether any of a manufacturer’s other products would have been the runner-up for a given project.

Q: What does the merger of the two largest firms imply for markups and surplus? A: The merger of the two largest firms (by market share) increases total manufacturer surplus in the industry by 19%. Markup increases are very unequal across transactions: the merger affects only those transactions for which the merging firms jointly become the first-best and runner-up, which is the mechanism highlighted in the 2010 US Merger Guidelines for negotiated pricing markets. The heterogeneity of effects means that aggregate market-level concentration measures (such as HHI changes) can be poor proxies for merger effects in these markets.

Q: How does the pricing regime interact with merger effects? A: Comparing the same mergers under negotiated versus uniform pricing, negotiated pricing abates the average markup-increasing effects of mergers. However, for a minority of transactions — specifically those where the merger creates a first-best/runner-up pairing that did not exist pre-merger — negotiated pricing makes the merger’s markup effect worse than it would be under uniform pricing. This implies that the direction of the pricing-regime effect on merger harm is not uniform across buyers, and that transaction-level analysis is required for accurate antitrust assessment.

Q: How does the paper relate to the Competition Commission’s 2007 assessment of the Wienerberger/Baggeridge merger? A: The CC (2007) found the market highly concentrated (HHI 2,113, implied HHI increase of 390 from the merger, both exceeding guideline thresholds) but approved the merger, judging profitability to be at or below average for comparable industries and competition to be more intense than the concentration level alone would suggest. This paper’s model provides formal underpinning for that assessment: with negotiated pricing and buyer bargaining power, markups are constrained by the runner-up competitive threat at the transaction level, not by market-wide concentration, and the low mean Lerner index of 0.08 is consistent with the CC’s profitability finding.
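For readers unfamiliar with the concentration measures quoted above, the sketch below shows the arithmetic. On the 0 to 10,000 scale the HHI is the sum of squared percentage shares, and a merger of two firms raises it by twice the product of their shares. The shares used here are hypothetical and are not the UK brick market's.

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman index from market shares in percent."""
    return sum(s ** 2 for s in shares_pct)


def merger_delta_hhi(share_a_pct, share_b_pct):
    """Mechanical HHI increase when firms a and b merge."""
    return 2 * share_a_pct * share_b_pct


# Hypothetical five-firm market; the first and fourth firms merge.
shares = [30.0, 25.0, 20.0, 15.0, 10.0]
pre_merger = hhi(shares)
post_merger = hhi([30.0 + 15.0, 25.0, 20.0, 10.0])
delta = merger_delta_hhi(30.0, 15.0)
print(pre_merger, post_merger, delta)
```

The identity holds because (a + b)^2 - a^2 - b^2 = 2ab, so the market-level change carries no information about which transactions have the merging firms as first-best and runner-up.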

Q: What external validity evidence supports the model’s cost specification? A: The paper compares the marginal costs implied by the estimated model to plant-month level production cost data that were not used in estimation. A good match between the two provides external validation of the cost specification and supports the model’s structural interpretation of the markup decomposition.

First-best and runner-up products: Defined at the project level in terms of surplus (value minus cost). The first-best product j(i,1) is the inside good yielding the highest surplus for project i; the runner-up j(i,2) is the highest-surplus inside good not sold by the first-best seller. These two products — and only these two — determine the equilibrium markup and buyer choice; third-best and lower-ranked products are irrelevant.

Surplus advantage: The difference wj(i,1) − wj(i,2) ≥ 0 between the first-best product’s surplus and the runner-up’s surplus for a given project. This is the competitive constraint on the first-best seller’s markup under TIOLI pricing and the binding ceiling on the negotiated markup whenever the unconstrained Nash bargaining solution would exceed it.

Negotiated pricing: A pricing arrangement in which buyers negotiate prices specific to the individual purchase occasion (here, each construction project), as opposed to uniform pricing where the pre-transport price is the same for all buyers. Prices are determined bilaterally between buyer and competing sellers, with the buyer’s outside option — buying the runner-up at its anticipated negotiated price — serving as the competitive constraint.

Outside option principle (Binmore et al. 1989): The principle that a rival offer (outside option) has no effect on a bilateral Nash bargaining problem unless it would leave the receiving party better off than the Nash bargaining solution; that is, the outside option acts as a constraint on the bargaining outcome rather than shifting the disagreement point. In the paper’s model, the runner-up seller’s zero-markup offer is the buyer’s outside option, and it constrains the first-best seller’s markup when seller bargaining skill is high.

GEV (Generalized Extreme Value) taste distribution: The distributional assumption on project-product idiosyncratic match terms that makes the joint likelihood of observed product choice and negotiated price tractable. The GEV structure yields closed-form choice probabilities (nested logit) and allows the surplus advantage — which depends on unobserved runner-up surplus — to be expressed analytically, enabling joint estimation from transaction-level data.

Price-cost margin (Lerner index): Markup (price minus cost) divided by price, used here at the transaction level. The estimated mean Lerner index is 0.08 with a coefficient of variation of 0.78, reflecting wide dispersion driven by spatial variation in local competition and first-best surplus advantage across transactions.
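A short sketch of how the two summary statistics are computed from transaction-level data. The prices and costs are hypothetical, chosen only so that the mean margin lands at 0.08; the use of the population standard deviation in the coefficient of variation is an assumption.

```python
from statistics import mean, pstdev


def lerner_index(price, marginal_cost):
    """Price-cost margin: (price - marginal cost) / price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - marginal_cost) / price


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical transactions (price, marginal cost).
transactions = [(100.0, 95.0), (100.0, 90.0), (100.0, 98.0), (100.0, 85.0)]
margins = [lerner_index(p, c) for p, c in transactions]
mean_margin = mean(margins)
cv = coefficient_of_variation(margins)
print(mean_margin, cv)
```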

Nash-in-Nash (NiN) vs. single-sourcing bargaining: NiN (Horn and Wolinsky 1988) applies when a buyer trades positive quantities of all products with negotiated prices (multi-sourcing); the paper’s model applies when a buyer makes a discrete single-sourcing choice per occasion, so only the chosen product’s price is observed. The distinction generates different data observability and different competitive mechanisms — in NiN, excluded products play no role; in this paper, the runner-up’s potential zero-markup offer disciplines the first-best seller’s markup.

Competitive Advertising and Pricing

Mon, 01 Jan 0001 00:00:00 +0000

Hwang, Kim, and Boleslavsky study how firms in an oligopoly simultaneously choose prices and advertising strategies, where advertising is modeled as the choice of how much product information to disclose to consumers. The paper extends the canonical Perloff-Salop (1985) random-utility discrete-choice framework, in which n firms engage in Bertrand competition for a consumer whose value for each product is independently drawn from a common distribution F, by endogenizing the information environment: each firm may choose any mean-preserving contraction (MPC) of F as its advertising strategy, with no structural restriction on feasible content. This full flexibility, drawn from the information design literature, allows each firm to choose the consumer’s effective value distribution, ranging from full information (choosing F itself) to complete concealment (a degenerate distribution at the mean). Advertising is assumed to be costless throughout.

The central result is that intense competition forces firms to provide precise product information. Formally, the full information equilibrium — in which every firm chooses F — exists in the advertising game (the subgame in which prices are fixed symmetrically) if and only if F^(n-1) is convex over its support. Because F^(n-1) represents the distribution of the consumer’s best outside option, convexity means the consumer likely faces an attractive alternative, incentivizing each firm to maximize the chance of offering the highest possible value. Crucially, this convexity condition is guaranteed to hold when n is sufficiently large, regardless of the shape of F, because the power function x^(n-1) becomes more convex as n rises. This establishes that under sufficiently intense competition, full information disclosure is the unique symmetric equilibrium.

The general equilibrium advertising strategy G* — which governs cases where full information is not an equilibrium — satisfies two necessary and sufficient conditions: (i) (G*)^(n-1) is convex over the support of G*, and (ii) for almost all values in the support, G* either coincides with F (where the MPC constraint binds, preventing further dispersion) or (G*)^(n-1) is locally linear (where the firm is locally risk-neutral and has no incentive to alter its distribution). The paper proves existence and uniqueness of G* for any F satisfying the stated regularity conditions (density positive, continuously differentiable, bounded, with finitely many peaks). When F has log-concave density, a unique symmetric pure-price equilibrium (p*, G*) exists in the full game.

The paper demonstrates that strategic advertising has ambiguous implications for prices and consumer welfare. Strategic advertising necessarily reduces social surplus through information loss, since consumers select suboptimal products with positive probability when G* differs from F. However, it compresses the support of the value distribution relative to F, which — by a new result (Proposition 3) — tends to lower the equilibrium price. Offsetting this, strategic advertising also redistributes marginal consumers in ways that may raise or lower the price. In the duopoly case with power distributions F(v) = v^alpha on [0,1], strategic advertising lowers the market price if and only if alpha > 1/sqrt(2) (approximately 0.7071), and raises consumer surplus if and only if alpha > 0.7928.

The paper examines three extensions: (1) a binding consumer outside option, (2) multi-unit (k-out-of-n) demand, and (3) asymmetric firms with two types. In all three cases, full information cannot be a strict equilibrium for any finite n under the relevant structural condition, yet the equilibrium distribution G* converges pointwise to F as n tends to infinity, preserving the paper’s core asymptotic insight.

Q: What is the main research question? A: The paper asks how much product information firms will voluntarily disclose when they compete both on price and advertising content in an oligopoly. Unlike the monopoly literature, the oligopoly context creates strategic interdependencies — each firm’s optimal disclosure depends on rivals’ disclosure choices — that the paper characterizes fully.

Q: How is advertising modeled, and why use mean-preserving contractions? A: Each firm’s advertising strategy is modeled as a choice of any mean-preserving contraction (MPC) of the true value distribution F. An MPC preserves the expected value but reduces dispersion, capturing the idea that a firm can selectively conceal information (moving toward a degenerate distribution) but cannot fabricate value dispersion beyond what F allows. Because consumers are risk-neutral and buy based on expected values net of prices, this MPC formulation captures full flexibility in information design without loss of generality.

Q: What is the precise necessary and sufficient condition for the full information equilibrium in the advertising game? A: The full information equilibrium, in which every firm chooses F, exists if and only if F^(n-1) is convex over its support [v̲, v̄]. The “only if” direction follows from Lemma 1: in any equilibrium, (G*)^(n-1) must be convex, so if F^(n-1) is not convex, F is not an equilibrium. The “if” direction follows because a convex F^(n-1) makes each firm locally risk-loving, so no MPC of F yields a higher payoff than F itself.
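The convexity condition can be probed numerically. The sketch below checks second differences of F^(n-1) on a grid and searches for the smallest n at which the check passes. It is a heuristic on a finite grid, not a proof, and the power distribution used as input is only an illustrative choice.

```python
def is_convex_on_grid(cdf, n, lo, hi, points=2001, tol=1e-9):
    """Heuristic convexity check of cdf(v)**(n-1) on [lo, hi].

    Uses second differences on an even grid; very mild concavity can fall
    below the tolerance and go undetected.
    """
    step = (hi - lo) / (points - 1)
    vals = [cdf(lo + i * step) ** (n - 1) for i in range(points)]
    return all(
        vals[i - 1] - 2.0 * vals[i] + vals[i + 1] >= -tol
        for i in range(1, points - 1)
    )


def smallest_n_passing(cdf, lo, hi, n_max=50):
    """Smallest n in [2, n_max] for which the grid check passes, else None."""
    for n in range(2, n_max + 1):
        if is_convex_on_grid(cdf, n, lo, hi):
            return n
    return None


def power_cdf(v):
    """Illustrative choice: F(v) = v**0.5 on [0, 1]."""
    return v ** 0.5


duopoly_ok = is_convex_on_grid(power_cdf, 2, 0.0, 1.0)
threshold = smallest_n_passing(power_cdf, 0.0, 1.0)
print(duopoly_ok, threshold)
```

For F(v) = v^alpha the object being tested is v^(alpha(n-1)), which is convex exactly when alpha(n-1) >= 1, so with alpha = 0.5 the check fails for n = 2 and first passes at n = 3.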

Q: Why does sufficiently intense competition force full information disclosure? A: For any distribution F with positive, continuously differentiable, bounded density f with bounded derivative f’, the second derivative of F^(n-1) satisfies (F^(n-1))’’(v) >= (n-1)F(v)^(n-3)[(n-2)epsilon^2 - M], where epsilon = min f(v) > 0 and M = max |f’(v)| < infinity. The bracketed term is strictly positive once n > 2 + M/epsilon^2, so for n sufficiently large F^(n-1) is convex and the full information equilibrium exists. Economically, with many competitors each firm wins the consumer only when it offers the highest possible value, so providing full information is optimal.
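The bound can be reconstructed in two differentiation steps. The derivation below is a sketch that writes the two constants out explicitly as the minimum of f squared and the maximum of |f'|, and uses only 0 <= F(v) <= 1.

```latex
\begin{align*}
\frac{d}{dv} F(v)^{n-1} &= (n-1)\,F(v)^{n-2} f(v), \\
\frac{d^2}{dv^2} F(v)^{n-1}
  &= (n-1)\,F(v)^{n-3}\Bigl[(n-2)\,f(v)^2 + F(v)\,f'(v)\Bigr] \\
  &\ge (n-1)\,F(v)^{n-3}\Bigl[(n-2)\,\min_{v} f(v)^2 - \max_{v}\lvert f'(v)\rvert\Bigr].
\end{align*}
```

The last line uses f(v)^2 >= min f^2 and F(v) f'(v) >= -max|f'|. The bracket is positive whenever n - 2 exceeds the ratio of max|f'| to min f^2, which is a finite number under the stated regularity conditions.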

Q: What are the two necessary and sufficient properties characterizing the general equilibrium advertising strategy G*? A: First (Lemma 1), (G*)^(n-1) must be convex over the support of G*; this prevents any firm from profitably concentrating mass to reduce dispersion. Second (Lemma 2), for almost all values in the support, either G* = F locally (the MPC constraint binds, preventing further dispersion) or (G*)^(n-1) is locally linear (the firm is locally risk-neutral and indifferent over distributions with the same local mean). Theorem 1 proves these two conditions are both necessary and sufficient, and that G* is unique for any F satisfying the stated regularity conditions.

Q: What structure does G* take when F^(n-1) has strictly quasi-concave density? A: By Corollary 2(1), there exists a cutoff v* in [v̲, v̄] such that G*(v) = F(v) for v <= v* (full information below the cutoff) and (G*)^(n-1) is linear above v*. As n increases, v* rises, meaning the region of full disclosure expands, and G* increases in convex order, so consumers receive strictly more information. One immediate implication is that consumer surplus strictly increases in n: consumers benefit both from more options and from more accurate information about each product.

Q: What happens when F^(n-1) is concave? A: By Corollary 3, when F^(n-1) is concave, (G*)^(n-1) is linear over the entire support, with lower bound v̲. In the illustrative Example 1 (truncated exponential with n=2), this yields G* = U[0, 2*mu_F], a uniform distribution on an interval whose upper bound is twice the mean of F.

Q: Does strategic advertising raise or lower equilibrium prices, and consumer surplus? A: Both effects are ambiguous and depend on the shape of F. Strategic advertising compresses the support of the value distribution (since G* is an MPC of F), which by Proposition 3(1) tends to lower equilibrium prices. But it also reshapes the distribution of marginal consumers, which may raise or lower prices. In the power distribution example (n=2, F(v) = v^alpha on [0,1]), strategic advertising lowers the market price if and only if alpha > 1/sqrt(2) ≈ 0.7071, and raises consumer surplus if and only if alpha > 0.7928. Thus even with deadweight loss from information suppression, consumers can be better off under strategic advertising than under forced full disclosure.

Q: What does Proposition 3 contribute about equilibrium prices in the Perloff-Salop model? A: Proposition 3 delivers two results about how the distribution of marginal consumers (integral (F^(n-1))’ dF) determines equilibrium prices. First, the measure of marginal consumers decreases if F is proportionally stretched over a larger support, confirming that longer support raises equilibrium prices. Second (presented as novel), among all distributions with support in [v̲, v̄], the power distribution F(v) = ((v - v̲)/(v̄ - v̲))^(2/n) minimizes the measure of marginal consumers, corresponding to the maximum equilibrium price. The key property is that marginal consumers are uniformly distributed under this power distribution, and any deviation from uniformity allows a “flattening” adjustment that increases the measure of marginal consumers and lowers the price.
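The second result can be checked within the power family on [0, 1]. For F(v) = v^alpha the integrand of the marginal-consumer measure is (n-1) F^(n-2) f^2, and elementary integration gives (n-1) alpha^2 / (alpha n - 1) when alpha n > 1. The sketch below compares a midpoint-rule integral with that closed form. It only explores power distributions, so it illustrates the proposition rather than verifying it over all distributions; function names are illustrative.

```python
def marginal_measure_numeric(alpha, n, points=20_000):
    """Midpoint-rule integral of (n-1) F^(n-2) f^2 for F(v) = v**alpha on [0, 1]."""
    if alpha * n <= 1:
        raise ValueError("integral diverges unless alpha * n > 1")
    h = 1.0 / points
    total = 0.0
    for i in range(points):
        v = (i + 0.5) * h
        density = alpha * v ** (alpha - 1.0)
        total += (n - 1) * v ** (alpha * (n - 2)) * density ** 2
    return total * h


def marginal_measure_closed_form(alpha, n):
    """Closed form of the same integral, valid for alpha * n > 1."""
    return (n - 1) * alpha ** 2 / (alpha * n - 1.0)


n = 4
at_minimizer = marginal_measure_numeric(2.0 / n, n)   # alpha = 2/n
at_uniform = marginal_measure_numeric(1.0, n)         # uniform distribution
at_other = marginal_measure_numeric(0.8, n)
print(at_minimizer, at_uniform, at_other)
```

At alpha = 2/n the integrand is constant in v, which is the "uniformly distributed marginal consumers" property, and the measure equals 4(n-1)/n^2, never above the value of 1 obtained for the uniform distribution.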

Q: Under what condition does the full game (price plus advertising) have a unique symmetric pure-price equilibrium? A: Theorem 2 states that log-concavity of the density f is sufficient for existence and uniqueness of a symmetric pure-price equilibrium (p*, G*) as characterized in Theorems 1 and 2. Log-concavity ensures that the equilibrium distribution G* has a convex-linear structure (as in Corollary 2), which preserves log-concavity of each firm’s profit function even under compound deviations (simultaneous changes to both price and advertising strategy), making the first-order conditions sufficient for global optimality.

Q: Can strategic advertising create or destroy pure-price equilibria relative to the Perloff-Salop benchmark? A: Yes, both directions are possible. When F^(n-1) is convex (so G* = F), equilibrium existence in the Perloff-Salop (PS) model is necessary but not sufficient for existence in the full model, because compound deviations (changing both price and advertising) may be profitable even when pure price deviations are not. Conversely, when G* differs from F, the changed distribution of marginal consumers can sustain an equilibrium in the full model even when none exists in PS. Appendix E of the paper provides a specific example of the latter phenomenon.

Q: What happens with a binding consumer outside option? A: Proposition 4 shows that a full information equilibrium never exists in the advertising game when the consumer has a binding outside option (p* in (v̲, v̄)). The firm’s value function acquires a discrete jump at p* due to the indicator 1_{v >= p*}, making it optimal to pool mass around p* rather than disclose fully. Nevertheless, Proposition 5 proves that G* converges pointwise to F as n tends to infinity, because the jump of size F(p*)^(n-1) vanishes exponentially fast as n grows.

Q: Does the full information result survive multi-unit demand? A: No. Proposition 6 shows that with k > 1 units demanded (out of n products), the full information equilibrium never exists for any finite n or F. The reason is that phi’(v; F) — the firm’s marginal value of offering value v — is zero at v̄ when k > 1, so the firm can profitably pool values near the top of the support. However, Proposition 7 shows that G* converges pointwise to F as n tends to infinity (with k fixed), preserving the asymptotic full information result.

Q: What happens with asymmetric firms differing in their value distribution supports? A: Proposition 8 shows a sharp dichotomy. If both firm types share the same upper bound of their value supports (v̄_1 = v̄_2), the full information equilibrium exists whenever both F_1^(n1-1) and F_2^(n2-1) are convex. If the supports have different upper bounds (v̄_1 < v̄_2), the full information equilibrium never exists regardless of n_1 and n_2, because type-2 firms face a downward kink in their winning probability at v̄_1 and always have an incentive to pool mass there. The authors conjecture that G*_1 and G*_2 still converge to F_1 and F_2 asymptotically but do not prove this due to technical complexity.

Q: How does this paper relate to Ivanov (2013)? A: Ivanov (2013) also uses the Perloff-Salop framework and shows that full information is an equilibrium when n is sufficiently large, but restricts advertising to rotation-ordered strategies (in the sense of Johnson and Myatt, 2006). The present paper imposes no structural restriction and strengthens Ivanov’s result by: (a) providing a necessary and sufficient condition for the full information equilibrium (not just a sufficient condition for large n); (b) fully characterizing G* when full information is not an equilibrium; and (c) demonstrating robustness across multiple model variants.

Q: What policy implication does the ambiguity result carry? A: The paper warns against assuming that mandating full information disclosure is unambiguously consumer-beneficial. While strategic advertising creates deadweight loss through information suppression, it can simultaneously compress support and alter the marginal consumer distribution in ways that lower equilibrium prices significantly. The power distribution example (alpha > 0.7928) shows consumers can be strictly better off under strategic advertising than under forced full disclosure. This ambiguity is a cautionary tale for disclosure regulation.

Mean-Preserving Contraction (MPC): A distribution G_i is an MPC of F if it has the same mean as F but less dispersion (in the sense of second-order stochastic dominance). In the paper, each firm’s feasible advertising strategies are exactly the set MPC(F) — this captures all informationally feasible disclosures without structural restriction on content.
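
The MPC relation can be checked numerically with the standard integrated-CDF test for second-order stochastic dominance. The sketch below is only an illustration under stated assumptions: both distributions are discrete on a common sorted support, and the function name `is_mean_preserving_contraction` and the example distributions are hypothetical, not taken from the paper.

```python
def is_mean_preserving_contraction(support, pmf_g, pmf_f, tol=1e-9):
    """Return True if G is a mean-preserving contraction of F.

    Assumes both pmfs live on the same sorted support (an illustrative
    simplification). G is an MPC of F when the means agree and the
    integrated CDF of G never exceeds the integrated CDF of F.
    """
    mean_g = sum(x * p for x, p in zip(support, pmf_g))
    mean_f = sum(x * p for x, p in zip(support, pmf_f))
    if abs(mean_g - mean_f) > tol:
        return False

    cdf_g = cdf_f = 0.0
    area_g = area_f = 0.0
    for k in range(len(support) - 1):
        cdf_g += pmf_g[k]
        cdf_f += pmf_f[k]
        width = support[k + 1] - support[k]
        # Integrated CDFs are piecewise linear, so checking at support
        # points is enough.
        area_g += cdf_g * width
        area_f += cdf_f * width
        if area_g > area_f + tol:
            return False
    return True


support = [0.0, 1.0, 2.0]
true_f = [0.5, 0.0, 0.5]        # hypothetical underlying distribution F
no_info = [0.0, 1.0, 0.0]       # all mass at the mean: least informative
print(is_mean_preserving_contraction(support, no_info, true_f))
print(is_mean_preserving_contraction(support, true_f, no_info))
```

In this toy case the point mass at the mean is feasible (it is an MPC of F), while the reverse direction fails, which matches the idea that a firm can garble information about F but cannot add dispersion.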

Advertising Game: A restricted subgame of the full market game in which firms choose their advertising strategies G_i taking the symmetric price as given. An equilibrium in the advertising game is a necessary condition for equilibrium in the full game. The advertising game’s equilibrium uniquely pins down G* independently of the price level (under the baseline model without binding outside option).

Full Information Equilibrium: An equilibrium of the advertising game in which every firm chooses the true underlying distribution F as its advertising strategy. This corresponds to complete, unobstructed product disclosure. The paper’s central result is that this equilibrium exists if and only if F^(n-1) is convex over its support.

Convexity of F^(n-1): The key distributional condition governing advertising equilibria. F^(n-1) is the distribution of the consumer’s best alternative among (n-1) rivals’ products. Convexity of F^(n-1) means its density is increasing, signaling a likely attractive outside option, which makes each firm risk-loving and induces full disclosure. This convexity is guaranteed for n sufficiently large.
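
As a rough numerical illustration of this condition, the sketch below checks convexity of F^(n-1) on a grid for a power distribution F(v) = v^alpha on [0, 1]. The choice of the power family, the grid resolution, and the helper names are assumptions made for illustration. For this family F^(n-1) = v^(alpha(n-1)), which is convex exactly when alpha(n-1) >= 1, so the grid check can be compared against a closed form.

```python
def is_power_cdf_convex(alpha, n, grid_size=2001, tol=1e-12):
    """Grid check: is F(v)**(n-1) convex on [0, 1] when F(v) = v**alpha?"""
    step = 1.0 / (grid_size - 1)
    h = [(i * step) ** (alpha * (n - 1)) for i in range(grid_size)]
    # A function is convex on the grid if all second differences are >= 0.
    return all(
        h[i + 1] - 2.0 * h[i] + h[i - 1] >= -tol
        for i in range(1, grid_size - 1)
    )


def smallest_n_with_convexity(alpha, n_max=1000):
    """Smallest number of firms n >= 2 for which the grid check passes."""
    for n in range(2, n_max + 1):
        if is_power_cdf_convex(alpha, n):
            return n
    return None


print(smallest_n_with_convexity(0.5))   # closed form: alpha * (n - 1) >= 1
print(smallest_n_with_convexity(2.0))
```

For a concave F such as alpha = 0.5, the check fails with two firms and passes once enough rivals are added, which is the sense in which more competition pushes toward full disclosure.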

Locally Linear (G*)^(n-1): A region of the equilibrium distribution where (G*)^(n-1) has constant slope, making the firm locally risk-neutral. Over such a region, the firm is indifferent among all distributions with the same local mean, and the equilibrium G* need not coincide with F; it is only required to be an MPC of F on that interval. This alternating structure (coinciding with F on strictly convex regions; linear elsewhere) fully characterizes G*.

Marginal Consumers: In the Perloff-Salop pricing formula, the equilibrium price is p* = (1/n) / integral [(G*(v)^(n-1))’ dG*(v)]. The integrand (G*(v)^(n-1))’ * g*(v) is the density of consumers who are indifferent between a given firm’s product and their best alternative at value v. A larger measure of marginal consumers implies lower equilibrium prices through greater competitive pressure.
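
The pricing formula can be evaluated numerically. The sketch below is a minimal illustration that assumes the full information case G* = F with a power distribution F(v) = v^alpha on [0, 1]; the distribution family, the midpoint-rule integration, and the function name are assumptions for illustration only. The examples use parameter values for which F^(n-1) is convex, so that G* = F is consistent with the characterization above.

```python
def perloff_salop_price(alpha, n, grid_size=100_000):
    """Approximate p* = (1/n) / integral of (G^(n-1))'(v) * g(v) dv
    for G(v) = v**alpha on [0, 1], using the midpoint rule."""
    step = 1.0 / grid_size
    marginal_mass = 0.0
    for i in range(grid_size):
        v = (i + 0.5) * step
        density = alpha * v ** (alpha - 1.0)                     # g(v)
        slope = alpha * (n - 1) * v ** (alpha * (n - 1) - 1.0)   # (G^(n-1))'(v)
        marginal_mass += slope * density * step
    return (1.0 / n) / marginal_mass


# Uniform values (alpha = 1): the integral equals 1, so p* = 1/n.
print(perloff_salop_price(1.0, 2))
print(perloff_salop_price(1.0, 5))
# alpha = 2, n = 3: the integral is 8/5, so p* = 5/24.
print(perloff_salop_price(2.0, 3))
```

The comparison between alpha = 1 and alpha = 2 shows the comparative static in the definition: a distribution that puts more mass where consumers are marginal raises the integral and lowers the price.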

Compound Deviation: In the full game, a deviation by a firm that changes both its price p_i and its advertising strategy G_i simultaneously, rather than varying only one dimension. The possibility of compound deviations is what distinguishes equilibrium existence conditions in the full model from those in the standard Perloff-Salop model, even when G* = F.

Contract Terms, Employment Shocks, and Default in Credit Cards

Layer 1: Overview

Research Question

This paper asks two related questions bearing on financial inclusion policy in developing countries: (1) How effective are credit card contract term changes — specifically interest rate reductions and minimum payment increases — in limiting default among new borrowers? (2) How large is the effect of formal-sector job loss on default relative to these contract term interventions, and can the difference in magnitudes be explained by differential cash flow impacts?

Setting and Data

The study is set in Mexico during 2007–2009 and exploits a large nationwide stratified randomized controlled trial implemented by a major commercial bank (“Bank A”) on its financial-inclusion credit card — a product that accounted for approximately 15% of all first-time formal-sector loans in Mexico as of 2010. The study card was targeted at borrowers with limited or no formal credit history (the bank’s “C, C- and D” customer segments); 47% of the experimental sample held it as their first formal loan product. A sample of 144,000 pre-existing cardholders was stratified into nine cells based on bank tenure (6–11 months, 12–23 months, 24+ months) and past repayment behavior, then randomly allocated to eight treatment arms combining two minimum payment levels (5% or 10% of the outstanding balance) and four annual interest rates (15%, 25%, 35%, 45%), for 26 months (March 2007 to May 2009). The study sample is representative of the bank’s national portfolio of approximately 1.3 million study card customers. Card-level data run through December 2014 — five years after the experiment ended — allowing examination of both short- and long-run effects. The experimental sample is matched to Mexico’s Social Security database (IMSS), providing monthly formal employment histories from January 2004 to December 2012 for 59% of the sample; and to credit bureau data, allowing observation of defaults across all formal financial institutions.

Main Findings with Quantitative Magnitudes

Result 1: Interest rate effects are modest in aggregate. A 30 percentage point (pp) decrease in the annual interest rate (from 45% to 15%, a 67% reduction relative to the baseline rate) decreased cumulative default by 2.5 pp over the 26-month experiment, for a default elasticity of +0.20. Over the 18-month horizon used for the job displacement comparisons, the corresponding effect is 1.03 pp. These magnitudes are substantially smaller than predictions elicited from Mexican central bank regulators (mean predicted decrease: 8.6 pp) and from participants on the Social Science Prediction Platform (mean predicted decrease: 5 pp). Default continued to decline in the lower-rate arm for approximately three years after the experiment ended, reaching −1 pp by March 2012, after which effects became statistically indistinguishable from zero.

Result 2: No effect on the newest borrowers. For the newest borrowers (those with 6–11 months of tenure when the experiment began, a group with a 36% cumulative default rate over 26 months versus 18% for those with 24+ months of tenure), the interest rate reduction has no detectable effect on default over the 26-month period, with point estimates consistently small and statistically indistinguishable from zero. This contrasts with longer-tenured borrowers, who are meaningfully responsive.

Result 3: Higher minimum payments raise short-run default but reduce long-run default. Doubling the minimum payment from 5% to 10% of the outstanding balance increased cumulative default by 0.8 pp by the end of the experiment (26-month elasticity: +0.04; p = 0.016), driven primarily by defaults occurring within the first year. The short-run increase is concentrated among the most liquidity-constrained borrowers: those with the highest baseline debt utilization and those in the minimum-payer stratum (baseline debt utilization rate of 85%). After the experiment ended and all arms were returned to the same 4% minimum payment, the previously higher-minimum-payment arm exhibited persistently lower default, reaching a 1 pp decline by the end of the sample (p = 0.054), relative to a base default rate of 41% at that point.

Result 4: Job displacement effects are roughly seven times as large as contract term effects. Formal-sector job displacement (identified using mass layoff events at firms with 50+ employees, defined as year-on-year employment contractions exceeding 30% of prior-year average employment) increased cumulative default by 4.8 pp after 12 months and 7.6 pp after 18 months. The 18-month effect is roughly seven times the effect of a 30 pp interest rate decrease (1.03 pp over 18 months) and roughly nine times the effect of doubling minimum payments (0.8 pp). Formal job loss alone can explain approximately 14% of total study card default during the experiment (calculation: 19.8% of formally employed study card borrowers lose their job at least once in the first 18 months; multiplied by the 7.6 pp default increase per spell, this yields 1.5 pp of the 10.8% base default rate at 18 months). Results are corroborated using a nationally representative matched credit bureau–IMSS sample of 600,339 borrowers, which yields 8,723 mass layoff events and similar estimates.
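
The arithmetic behind the relative magnitudes and the 14% figure can be reproduced directly from the numbers quoted above. The sketch below only restates that calculation; the variable names are illustrative.

```python
# Inputs quoted in the summary (18-month horizon unless noted).
displacement_effect_pp = 7.6      # default increase after job displacement
interest_rate_effect_pp = 1.03    # effect of the 30 pp interest rate decrease
min_payment_effect_pp = 0.8       # effect of doubling the minimum payment (26 months)
share_displaced = 0.198           # borrowers losing a formal job within 18 months
base_default_pp = 10.8            # base default rate at 18 months

ratio_vs_interest_rate = displacement_effect_pp / interest_rate_effect_pp
ratio_vs_min_payment = displacement_effect_pp / min_payment_effect_pp

# Default attributable to job loss, in pp and as a share of base default.
default_from_job_loss_pp = share_displaced * displacement_effect_pp
share_of_default_explained = default_from_job_loss_pp / base_default_pp

print(f"ratio vs interest rate: {ratio_vs_interest_rate:.1f}")
print(f"ratio vs minimum payment: {ratio_vs_min_payment:.1f}")
print(f"default from job loss: {default_from_job_loss_pp:.1f} pp")
print(f"share of default explained: {share_of_default_explained:.0%}")
```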

Per-peso normalization. A back-of-the-envelope calculation normalizes all three shocks by their respective cash flow impacts. The interest rate decrease reduces cumulative required minimum payments due by 2,917 MXN pesos over 18 months; the minimum payment doubling increases them by 1,325 MXN pesos; formal job loss reduces total labor earnings by an estimated 21,328 MXN pesos (adjusting formal-sector earnings losses of 77,555 MXN pesos downward by 72.5% to reflect that 82% of workers who lose formal employment transition to informal employment in the following quarter, with total earnings falling only 27.5%). The per-peso default effects are: 0.36 pp per 1,000 MXN pesos for the interest rate intervention; 0.51 pp for the minimum payment intervention; and 0.36 pp for job displacement. The null hypothesis that all three per-peso effects are equal cannot be rejected (p = 0.78).
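
The normalization can be sketched as a simple ratio of each effect to its cash flow impact. The code below is an illustration using only figures quoted in this summary; the function and variable names are hypothetical. Two caveats apply: the ratios computed from these rounded inputs land near, not exactly on, the reported 0.36, and the 18-month minimum payment effect is not stated here, so the code backs out the value implied by the reported 0.51 rather than treating it as a reported figure.

```python
def per_peso_effect(effect_pp, cash_flow_mxn):
    """Default effect in pp per 1,000 MXN pesos of cash flow change."""
    return effect_pp / (cash_flow_mxn / 1000.0)


# Earnings loss after adjusting for transitions into informal employment.
formal_earnings_loss_mxn = 77_555
total_earnings_drop_share = 0.275
adjusted_earnings_loss_mxn = formal_earnings_loss_mxn * total_earnings_drop_share

lambda_interest_rate = per_peso_effect(1.03, 2_917)
lambda_job_loss = per_peso_effect(7.6, adjusted_earnings_loss_mxn)

# Implied (not reported) 18-month minimum payment effect, from lambda = 0.51.
implied_min_payment_effect_pp = 0.51 * 1_325 / 1000.0

print(f"adjusted earnings loss: {adjusted_earnings_loss_mxn:,.0f} MXN")
print(f"lambda, interest rate: {lambda_interest_rate:.3f}")
print(f"lambda, job loss: {lambda_job_loss:.3f}")
print(f"implied 18-month minimum payment effect: {implied_min_payment_effect_pp:.2f} pp")
```

The point of the exercise is visible in the two computed ratios: job loss has a far larger absolute effect on default, but once scaled by pesos of cash flow it is about the same as the interest rate intervention.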

Interpretation

The authors present a simple two-period optimizing model emphasizing the role of previously accumulated debt and liquidity constraints. The model generates four testable predictions consistent with the data: (1) lower interest rates decrease default via reduced debt burden; (2) higher minimum payments increase short-run default by tightening liquidity constraints; (3) minimum payment increases that end by “surprise” (borrowers anticipated the higher payments would continue) reduce post-experiment default via debt reduction; (4) negative income shocks (modeled as a first-order stochastic dominance deterioration in period-2 income) increase default. The per-peso normalization supports the interpretation that cash flow impacts, rather than differential per-peso susceptibility to shocks, drive the relative magnitudes of the three effects.

In depth

Q1. Why is the interest rate elasticity of default (0.20) so much lower than prior estimates in the literature?

A: The paper contrasts its 26-month elasticity of +0.20 with estimates from Karlan and Zinman (2019) (1.8) and Adams et al. (2009) (2.2), and notes it falls in the same range as Karlan and Zinman (2009) (0.27) and DeFusco et al. (2021) (0.01). The paper proposes that variation in borrower tenure may partly explain cross-study differences, as default elasticities appear to be increasing in bank tenure. The newest borrowers — the most policy-relevant subgroup — show zero elasticity, pulling the overall estimate down. The paper also argues that in this context, interest-rate-driven moral hazard (all channels: debt burden, concurrent, and dynamic) is collectively small.

Q2. What mechanism explains why newer borrowers are entirely unresponsive to interest rate changes?

A: The paper hypothesizes that newer borrowers place a higher continuation value on the card (captured by parameter v in the model) because they have fewer formal credit alternatives; at baseline, only 64% of the 6–11 month stratum held a card with another bank versus 78% of the 24+ month stratum. A higher continuation value implies more muted responses to interest rate changes (formally derived in Appendix E.3). Newer borrowers also respond more strongly to credit limit increases, consistent with tighter liquidity constraints. A regression controlling for age, gender, baseline card ownership, debt utilization, labor force attachment, and earnings cannot explain away the differential treatment effect between new and old borrowers (differential remains significant at p = 0.05), suggesting the tenure gradient in responsiveness is not simply a composition effect.

Q3. Why does increasing minimum payments raise short-run default but reduce long-run default?

A: In the short run, the doubling of minimum payments tightens liquidity constraints for already-constrained borrowers. The increase in default is concentrated among borrowers in the highest baseline debt-utilization tercile and among minimum-payers (baseline debt utilization of 85%), and is preceded by a sharp rise in delinquencies in months 3–5 (which trigger 350 MXN peso fees per occurrence, further worsening the repayment burden). In the long run, borrowers who anticipated continuing higher minimum payments (the experiment ended without advance notice, so borrowers expected the new terms to persist) chose lower debt levels during the experiment. Since all arms were returned to the same low minimum payment when the experiment ended, the lower-debt borrowers in the higher-minimum-payment arm were better positioned to weather subsequent shocks, producing the 1 pp post-experiment decline in default. The hypothesis that this is driven by habit formation in payment behavior is ruled out by the absence of any effect of past higher minimum payments on post-experimental payment levels.

Q4. How is the mass-layoff identification strategy designed and validated?

A: The paper uses the universe of IMSS formal employment records to define a mass layoff at a firm (50+ employees) as the first month in which year-on-year employment declines by more than 30% of average employment in the prior 12 months. An individual is “displaced” if they lost their job in the same quarter as their employer’s mass layoff event. The identification assumption is that, conditional on individual and time fixed effects, the exact timing of the mass layoff is uncorrelated with workers’ potential default outcomes. This is supported by: (1) mass layoffs occurring in every period, making coincidence with credit market shocks unlikely; (2) time fixed effects absorbing common trends; and (3) the absence of statistically distinguishable pre-trends in default between displaced and non-displaced workers. The paper implements both standard two-way fixed effects and the staggered DiD estimator of de Chaisemartin and D’Haultfoeuille (2024), which remains valid under heterogeneous and dynamic effects, and the results are similar across methods.
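
The mass layoff definition can be sketched as a rule applied to a firm's monthly headcount series. The code below is a minimal illustration, not the paper's implementation. It assumes that the 50+ employee restriction is applied to average employment over the prior 12 months, that months are indexed from 0 with index 0 falling at the start of a calendar quarter, and that the function names and example series are hypothetical.

```python
def first_mass_layoff_month(employment, min_size=50, threshold=0.30):
    """Index of the first month whose year-on-year employment decline
    exceeds `threshold` times average employment over the prior 12 months.

    `employment` is a list of monthly headcounts for one firm.
    Returns None if no month qualifies.
    """
    for t in range(12, len(employment)):
        prior_year = employment[t - 12:t]
        average_prior = sum(prior_year) / 12.0
        if average_prior < min_size:
            continue  # assumed size screen: skip small firms
        year_on_year_decline = employment[t - 12] - employment[t]
        if year_on_year_decline > threshold * average_prior:
            return t
    return None


def is_displaced(separation_month, layoff_month):
    """A worker counts as displaced if the separation falls in the same
    calendar quarter as the employer's mass layoff event."""
    if separation_month is None or layoff_month is None:
        return False
    return separation_month // 3 == layoff_month // 3


headcount = [100] * 12 + [100, 95, 60, 55]   # hypothetical firm
event = first_mass_layoff_month(headcount)
print(event)
print(is_displaced(13, event), is_displaced(11, event))
```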

Q5. How does the paper account for informal employment when estimating the cash flow impact of job loss?

A: Formal-sector earnings losses over 18 months post-displacement are estimated at 77,555 MXN pesos using IMSS wage data in an event-study design paralleling the default equation. However, since more than 4/5 of workers who lose formal employment are informally employed in the following quarter (based on Mexico’s ENOE labor force survey panel), and total labor earnings fall by only an estimated 27.5% over the three post-displacement quarters, the paper scales the formal earnings loss down to 21,328 MXN pesos (≈ 0.275 × 77,555). This brings the estimated earnings loss closer to prior developed-country estimates of displacement costs and is treated as a lower bound relative to the raw formal-earnings loss figure.

Q6. Does the cost of default deter borrowers from defaulting, and what is the cost?

A: The paper argues that defaulters face substantial consequences. Using an instrumental variables strategy (treatment assignment as instrument for default on the study card), the probability of having a new loan one year after default is estimated to be 65 pp lower relative to the non-default counterfactual (p = 0.03). A selection-on-observables approach also shows that study card default is associated with the complete absence of any subsequent credit card for at least four years. These costs should provide strong incentives to remain current, making the high observed default rates primarily attributable to cash flow shocks rather than strategic default. The value of formal credit is further confirmed by the finding that a 100 MXN peso increase in the study card’s credit limit translates into 32 MXN pesos of additional debt (instrumental variable estimates are more than twice as large as OLS), and by the comparison of informal loan terms (annual rates averaging 291%, loan amounts of 3,658 MXN pesos, durations of 0.52 years) with formal loan terms (94 pp lower rates, 9,842 MXN peso average amounts, 1.07 year durations).

Q7. Are the default treatment effects different across the interest rate and minimum payment interventions, or do they interact?

A: The paper tests for and cannot reject separability between the two interventions at standard significance levels. At the end of the experiment (May 2009), the p-value for the null that the minimum payment effect is constant across interest rate arms is 0.44; five years later it is 0.65. The null that the interest rate effect is constant across both minimum payment arms yields p = 0.08 at end of experiment and p = 0.411 five years later. The fully saturated specification yields results indistinguishable from the parsimonious linear-separable specification.

Q8. Are there spillover effects from the contract term changes onto other loans held by study participants?

A: No spillover effects on default on other loans are found, either during the experiment or after it ended, based on credit bureau data covering all formal-sector loans held by the experimental sample. There is also no evidence of crowd-out or crowd-in from other lenders in terms of new loans or loan closures. The only minor exception is a small decrease in default (3%, or approximately 2 pp relative to a 61% base rate) on other Bank A loans in the high minimum payment arm.

Q9. Does job loss raise default by more than its cash flow impact alone would predict?

A: No. The paper’s back-of-the-envelope normalization finds that the per-peso effects of all three shocks on default are statistically indistinguishable (p = 0.78 for the null that all three λ estimates are equal), with point estimates of λ^IR = 0.36, λ^MP = 0.51, and λ^U = 0.36 pp per 1,000 MXN pesos. This implies that job loss does not have a larger per-peso effect on default than contract term changes; the larger absolute effect of displacement arises entirely from its larger cash flow impact. Additional consequences of job loss beyond cash flow (health, mental health) do not appear to generate additional default beyond what can be attributed to income loss.

Q10. How do the experimental results compare to what experts predicted?

A: Expert predictions were systematically too large. Mexican central bank regulators predicted a mean decrease of 8.6 pp from a 30 pp interest rate reduction at the 18-month horizon, versus the actual estimated effect of 1.03 pp. Social Science Prediction Platform respondents predicted a mean decrease of 5 pp. For minimum payments, regulators on average predicted a 0.4 pp decrease in default from doubling the minimum payment, whereas the actual effect was a 0.8 pp increase. Three-quarters of SSPP respondents correctly predicted the sign of the minimum payment effect (an increase in default), but the predicted mean increase was 6.4 pp, far larger than the estimated 0.8 pp.

Q11. Do the job displacement results generalize beyond the experimental sample?

A: Yes. The paper repeats the displacement event study on the intersection of the nationally representative credit bureau sample (600,339 individuals with both credit information and employment histories) with the universe of IMSS data for October 2011–March 2014, yielding 8,723 mass layoff events. This sample is representative of the population of Mexican borrowers with formal employment histories, and the estimated effects on default for any loan in the credit bureau are similar in magnitude to the experimental-sample results, providing a measure of external validity.

Q12. What do the debt dynamics during the experiment reveal about the mechanisms for interest rate effects on default?

A: The data show that purchases (net of payments) increase in response to interest rate decreases, consistent with downward-sloping demand for credit; yet total debt declines in lower-rate arms. This is consistent with the model’s prediction that the mechanical compounding effect (lower rate applied to previously accumulated debt) exceeds the behavioral new-purchase response. The prediction is confirmed empirically: the debt elasticity to the interest rate is estimated to be positive, with preferred estimates in the range [+0.18, +0.54]. The decline in default is further concentrated among borrowers with the highest baseline debt utilization rates, for whom the debt compounding effect is strongest, which is consistent with the debt channel as the primary mechanism.

Key Concepts

Cumulative Default Measure: Default is defined as three consecutive monthly payments each below the required minimum payment due, at which point Bank A automatically revokes the card. The outcome variable is coded as Y_it = 1 if borrower i has defaulted in any month s ≤ t and 0 otherwise, making it a cumulative (absorbing) measure. This allows estimation on an unchanging sample, avoiding attrition biases that would arise from conditioning on not having defaulted in the prior period.
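
The coding of the absorbing outcome can be sketched as follows. This is an illustrative reading of the definition above for a single borrower; the function name and the example payment series are hypothetical.

```python
def cumulative_default(payments, minimums_due):
    """Return the 0/1 cumulative default indicator for each month.

    Default occurs in the month that completes three consecutive payments
    below the minimum due; the indicator stays at 1 in every later month.
    """
    indicator = []
    shortfall_streak = 0
    defaulted = False
    for paid, due in zip(payments, minimums_due):
        if not defaulted:
            shortfall_streak = shortfall_streak + 1 if paid < due else 0
            if shortfall_streak >= 3:
                defaulted = True
        indicator.append(1 if defaulted else 0)
    return indicator


payments = [100, 0, 0, 100, 0, 0, 0, 100]
minimums = [50] * 8
print(cumulative_default(payments, minimums))
```

In the example, two missed payments followed by a full payment reset the streak, so default is only recorded once three misses occur in a row, and the later payment does not undo it.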

Minimum Payment Due (mpd): The paper uses the required minimum payment due to avoid delinquency as its central cash-flow normalization variable. This is a comprehensive measure that incorporates not only the contractually specified fraction of outstanding balance but also interest charges, fees, and endogenous borrower responses (changes in debt and purchases). It serves as the common denominator for benchmarking the cash flow impacts of the two contract term interventions and formal job loss against one another.

Free Cash Flow / Per-Peso Normalization (λ): The paper defines per-peso default effects (λ^IR, λ^MP, λ^U) by dividing each intervention’s average treatment effect on cumulative default (in percentage points) by the cumulative change in the minimum payment due (or equivalent cash flow impact) induced by that intervention over 18 months. The resulting ratio is expressed as percentage points of default per 1,000 MXN pesos of cash flow change. This normalization is explicitly not treated as an instrumental variable estimate; it is a descriptive back-of-the-envelope calculation intended to equate the scale of the three shocks.

Mass Layoff / Displacement: A mass layoff at the firm level is defined as the first month in which year-on-year firm employment declines by more than 30% of average employment in the prior 12 months, restricted to firms with 50+ employees. An individual worker is classified as displaced if they lost formal-sector employment in the same calendar quarter as their employer’s mass layoff event. This definition follows Jacobson et al. (1993) and subsequent literature and is used to isolate plausibly involuntary (exogenous) separations from voluntary quits or individually driven terminations.

Continuation Value (v): In the paper’s two-period optimizing model, v is the reduced-form utility parameter capturing future flow of card benefits, warm glow from card ownership, or the option value of retaining access to formal credit, experienced only if the card is not in default. The paper uses v to rationalize the zero interest-rate response of newer borrowers: ceteris paribus, higher v implies that borrowers will remain current on the card even when interest rates are high, because they value continued access. Higher v thus implies more muted responses to interest rate changes.

Bank Tenure Strata: Borrowers are stratified into three groups based on length of relationship with the study card: “new customers” (6–11 months), medium-term (12–23 months), and long-term (24+ months). Tenure is used both as a stratification variable for the experiment and as a primary dimension of heterogeneity in treatment effects, reflecting differing default rates (36% vs. 18% at 26 months), labor market vulnerability (1.34× higher job loss probability for new vs. long-term), and interest rate responsiveness (zero for new, significantly positive for long-term borrowers).

Debt Burden Channel vs. Concurrent Moral Hazard: The paper distinguishes three channels through which interest rate changes can affect default: (a) the debt burden channel — higher rates mechanically increase the stock of interest-accruing debt, making repayment harder; (b) concurrent moral hazard — higher current interest rates alter the incentive to default on existing obligations, holding debt constant; and (c) dynamic moral hazard — higher future interest rates reduce the benefit of remaining current. The paper’s finding of a modest total effect (elasticity 0.20) implies that the sum of all three channels is small in this context, with the debt burden channel being the primary driver of what effect does exist.

Costly Multidimensional Screening

This paper studies when a principal can improve upon simple one-dimensional mechanisms by also deploying costly nonprice screening instruments — actions that are socially wasteful yet potentially informative about the agent’s private type.

The model features a principal and an agent with quasilinear, additively separable preferences across two components: (i) a productive component, where allocations lie in a one-dimensional compact space X and generate genuine surplus, and (ii) a costly component, where any allocation y in an arbitrary measurable space Y satisfies sB(y, θB) ≤ 0 — it destroys or at best does not create social surplus. The agent’s private type is multidimensional, θ = (θA, θB), drawn from a commonly known distribution. Both components allow for nonlinear valuations and, on the principal’s side, interdependent preferences.

The central result (Theorem 1) establishes that if the agent’s preferences between the productive and costly components are positively correlated — meaning that a higher θA implies a stochastically higher θB — then there exists an optimal mechanism that involves no costly screening. Moreover, if instruments are strictly costly, every optimal mechanism involves no costly screening almost everywhere. Positive correlation is defined in terms of stochastic dominance: θB | θA is stochastically nondecreasing in θA. A sufficient but not necessary condition is affiliation in the sense of Milgrom and Weber (1982).

The intuition centers on two observations. First, under positive correlation, costly instruments can only help relax upward incentive constraints (deterring lower types from mimicking higher types). Second, under the surplus condition — a single-crossing condition on the surplus function sA(x, θA) requiring that if x generates more surplus than x’ at some type, it continues to do so at all higher types — the principal can safely ignore upward incentive constraints at the optimum. The Downward Sufficiency Theorem (Theorem 2) formalizes the second observation: in any one-dimensional screening problem satisfying the surplus condition, there exists an optimal solution to the relaxed program (with only downward IC constraints) that also satisfies all upward IC constraints. Because monetary transfers fully substitute for costly instruments in relaxing downward constraints without destroying surplus, the costly instruments add no value under positive correlation.

The proof proceeds via a monotone path decomposition of the multidimensional type space, exploiting a measurable monotone coupling (Lemma 1) to write θ = (θA, h(θA; ε)) where ε is independent of θA and h is nondecreasing. This reduces the problem to a family of one-dimensional paths, on each of which the Reconstruction Lemma (Lemma 2) shows that any costly mechanism can be weakly improved upon by one with no costly screening that satisfies all downward IC constraints.

A partial converse (Proposition 1) shows that under negative correlation — when some dimension of θB is stochastically nonincreasing in θA — there exist utility functions satisfying the surplus condition for which any mechanism screening only the productive component is strictly dominated.

The paper derives three applications. In monopoly pricing with costly signals (waiting in line, climbing stairs, collecting coupons), profit-maximizing mechanisms require no costly signals when higher-willingness-to-pay consumers also face weakly lower signal costs (Proposition 2). In monopsonistic labor market screening, the firm need not make offers contingent on costly credentials when higher-ability workers find credentialing easier — in contrast to the competitive Spence (1973) model where all screening must occur through costly effort because wages are pinned down by expected output (Proposition 3). In multiproduct pricing, the paper reinterprets bundle components as costly instruments for screening grand-bundle values, recovering Haghpanah and Hartline’s (2021) pure bundling optimality result and extending it to nested bundling (Proposition 4), under conditions that the incremental value of adding items to nested bundles is strictly increasing in type while the value of any non-nested bundle is nonincreasing relative to some nested superset.

Q: What is the paper’s central research question? A: The paper asks whether a principal can improve upon simple one-dimensional mechanisms by also deploying costly nonprice screening instruments when the agent has multidimensional private information. The goal is to characterize conditions under which augmenting a standard price menu with surplus-destroying actions — such as waiting in line, climbing stairs, or obtaining credentials — is or is not beneficial for the principal.

Q: What does “positively correlated preferences” mean precisely in this model? A: Positive correlation means that θB is stochastically nondecreasing in θA: for any θA < θ̂A, the conditional distribution of θB given θ̂A first-order stochastically dominates that given θA; that is, θB | θA ≤_st θB | θ̂A. Observing a high θA conveys good news about θB in the stochastic dominance sense. A sufficient but not necessary condition is affiliation in the sense of Milgrom and Weber (1982). The condition is asymmetric and does not require full independence or monotone dependence in a deterministic sense.
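
The stochastic ordering can be checked mechanically for a discrete example. The sketch below assumes a one-dimensional θB with a finite, ordered support shared across all values of θA; this simplification, the function names, and the example probability tables are illustrative assumptions rather than the paper's setup.

```python
def cdf(pmf):
    """Running sum of a probability mass function."""
    total, values = 0.0, []
    for p in pmf:
        total += p
        values.append(total)
    return values


def is_stochastically_nondecreasing(conditional_pmfs, tol=1e-12):
    """conditional_pmfs[k] is the pmf of theta_B given the k-th lowest theta_A.

    theta_B is stochastically nondecreasing in theta_A when a higher theta_A
    never raises the conditional CDF at any point. First-order stochastic
    dominance is transitive, so comparing adjacent rows is enough.
    """
    cdfs = [cdf(pmf) for pmf in conditional_pmfs]
    return all(
        higher[j] <= lower[j] + tol
        for lower, higher in zip(cdfs, cdfs[1:])
        for j in range(len(lower))
    )


positive = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
negative = list(reversed(positive))
independent = [[0.2, 0.5, 0.3]] * 3
print(is_stochastically_nondecreasing(positive))
print(is_stochastically_nondecreasing(negative))
print(is_stochastically_nondecreasing(independent))
```

Independence passes the check because the condition is weak: the conditional distribution is allowed to stay the same as θA rises.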

Q: What is the surplus condition and why does it matter? A: The surplus condition is a single-crossing condition on the productive surplus function: for any x < x̂ and θA < θ̂A, if sA(x̂, θA) > sA(x, θA) then sA(x̂, θ̂A) > sA(x, θ̂A). It says that if a higher allocation generates more total surplus at some type, it continues to do so at all higher types. This condition ensures the existence of a monotone efficient allocation rule, and it is the key enabling condition for the Downward Sufficiency Theorem. It is automatically satisfied when the principal has no interdependent preferences and the agent satisfies increasing differences, and also when sA is strictly increasing in x or has nonnegative cross partial derivative.
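
The single-crossing requirement can be tested by brute force on finite grids of allocations and types. The sketch below is illustrative only: the grids, the two example surplus functions, and the function name are assumptions, and a grid check cannot establish the condition on a continuum.

```python
def satisfies_surplus_condition(surplus, allocations, types):
    """Grid check of the surplus condition.

    For every x < x_hat and every type theta: if x_hat yields strictly more
    surplus than x at theta, it must also do so at every higher type.
    Both lists are assumed to be sorted in increasing order.
    """
    for i, x in enumerate(allocations):
        for x_hat in allocations[i + 1:]:
            for j, theta in enumerate(types):
                if surplus(x_hat, theta) > surplus(x, theta):
                    higher_types = types[j + 1:]
                    if any(surplus(x_hat, t) <= surplus(x, t) for t in higher_types):
                        return False
    return True


allocations = [0.0, 0.5, 1.0]
types = [0.0, 0.5, 1.0, 1.5, 2.0]

# Increasing differences: the gain from a higher x rises with the type.
print(satisfies_surplus_condition(lambda x, t: t * x - 0.5 * x * x, allocations, types))
# Violation: a higher x helps low types but hurts high types.
print(satisfies_surplus_condition(lambda x, t: (1.0 - t) * x, allocations, types))
```

The first example has a nonnegative cross partial derivative, one of the sufficient conditions mentioned above, while the second reverses the ranking of allocations as the type rises and so fails.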

Q: What is the Downward Sufficiency Theorem and why is it the key technical result? A: Theorem 2 states that in any one-dimensional screening problem satisfying the surplus condition, there exists an optimal solution to the relaxed program — which ignores all upward IC constraints — that also satisfies all upward IC constraints. This means the principal can solve the easier downward-IC-only problem and the solution is fully incentive compatible. The result is novel and uncovers a general property of one-dimensional screening problems beyond the standard monotone allocation rule setting. It is key because, combined with the observation that costly instruments under positive correlation can only relax upward constraints, it implies there is no benefit to using costly screening.

Q: How does the proof handle the case of multidimensional types? A: The proof uses a monotone path decomposition. By Lemma 1 (measurable monotone coupling), under positive correlation there exists a random variable ε independent of θA and a nondecreasing measurable function h such that θ =^d (θA, h(θA; ε)). This writes the joint type distribution as a family of monotone paths indexed by ε. On each path ε = e, the types are ordered by θA alone, reducing the problem to a one-dimensional screening problem. The Reconstruction Lemma (Lemma 2) then shows that on each such path, any mechanism involving costly screening can be replaced by one without costly screening that weakly improves principal payoff and satisfies all downward IC constraints.
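
One standard way to build such a coupling is the conditional quantile transform, h(θA; ε) = the ε-quantile of θB given θA. The sketch below uses that construction on a small discrete example; the paper's Lemma 1 may proceed differently, and the conditional distributions in `COND` are made-up numbers chosen to satisfy positive correlation.

```python
from fractions import Fraction as F

# Hypothetical conditional distributions P(theta_B = b | theta_A = a).
# Each row's CDF shifts right as a rises, so theta_B is stochastically
# nondecreasing in theta_A.
COND = {
    0: {0: F(6, 10), 1: F(3, 10), 2: F(1, 10)},
    1: {0: F(3, 10), 1: F(4, 10), 2: F(3, 10)},
    2: {0: F(1, 10), 1: F(3, 10), 2: F(6, 10)},
}

def h(theta_a, eps):
    """Conditional quantile: smallest b with P(theta_B <= b | theta_a) >= eps."""
    cdf = F(0)
    for b in sorted(COND[theta_a]):
        cdf += COND[theta_a][b]
        if cdf >= eps:
            return b
    raise ValueError("eps must lie in (0, 1]")

# A grid of residual values standing in for eps ~ Uniform(0, 1].
EPS_GRID = [F(k, 1000) for k in range(1, 1001)]

# Each path eps = e is monotone: h(., e) is nondecreasing in theta_A.
paths_monotone = all(h(0, e) <= h(1, e) <= h(2, e) for e in EPS_GRID)

def implied_conditional(theta_a):
    """Distribution of h(theta_a, eps) when eps is uniform on the grid."""
    counts = {b: 0 for b in COND[theta_a]}
    for e in EPS_GRID:
        counts[h(theta_a, e)] += 1
    return {b: F(c, len(EPS_GRID)) for b, c in counts.items()}

coupling_exact = all(implied_conditional(a) == COND[a] for a in COND)
print(paths_monotone, coupling_exact)
```

Because ε is drawn independently of θA, fixing ε = e selects one monotone path along which types are ordered by θA alone, which is the reduction the answer above describes.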

Q: What does the partial converse (Proposition 1) establish? A: Proposition 1 shows that when some dimension i of the costly component satisfies that θi is stochastically nonincreasing in θA (negative correlation), and the type distribution has a density with |X| > 1 and |Y| > 1, then there exist utility functions satisfying the surplus condition for which any mechanism screening only the productive component is strictly dominated by one involving costly screening. This is not a full converse — it establishes existence of cases where costly screening is strictly beneficial, not that it is always beneficial under negative correlation.

Q: How does the insurance example illustrate the two correlation cases? A: In Example 1 (negative correlation), a low-risk type (θA = 0) values insurance at 2, a high-risk type (θA = 1) values it at 3; costs are 0 and 5/2 respectively; and the high-risk type also has higher disutility for the costly action. Without costly screening, the optimal mechanism sells full insurance at price 2 to both types for a profit of 3/4. With costly screening (e.g., requiring the agent to climb stairs to get full insurance), only the low-risk type purchases, yielding profit of 1 > 3/4. In Example 2 (positive correlation), the high-risk type has lower disutility for the costly action; any mechanism using the costly instrument is strictly dominated by simply selling full insurance at price 2 to both types.
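
The profit figures can be reproduced with a few lines of exact arithmetic. This is a sketch under stated assumptions: the two types are taken to be equally likely (which is what makes the pooling profit equal 3/4), the stair disutilities are hypothetical values chosen only to respect the ordering in each example, and the search covers single-offer menus rather than all mechanisms.

```python
from fractions import Fraction as F

# Assumption: the two risk types are equally likely. This is what makes the
# pooling profit equal 2 - (0 + 5/2)/2 = 3/4, the figure reported above.
PROB = {"low": F(1, 2), "high": F(1, 2)}
VALUE = {"low": F(2), "high": F(3)}      # willingness to pay for full insurance
COST = {"low": F(0), "high": F(5, 2)}    # insurer's cost of covering each type

def profit(price, stairs):
    """Expected profit from one offer: full insurance at `price`, where the
    buyer must also bear the type-specific disutility in `stairs`."""
    total = F(0)
    for t in PROB:
        if VALUE[t] - stairs[t] - price >= 0:   # type t accepts the offer
            total += PROB[t] * (price - COST[t])
    return total

def best_single_offer(stairs):
    """Best profit over the prices that leave some type exactly indifferent."""
    return max(profit(VALUE[t] - stairs[t], stairs) for t in PROB)

NO_STAIRS = {"low": F(0), "high": F(0)}
# Hypothetical stair disutilities (the source gives only their ordering).
NEGATIVE_CORR = {"low": F(0), "high": F(2)}   # Example 1: high risk, high disutility
POSITIVE_CORR = {"low": F(2), "high": F(0)}   # Example 2: high risk, low disutility

print(best_single_offer(NO_STAIRS))       # 3/4
print(profit(F(2), NEGATIVE_CORR))        # 1
print(best_single_offer(POSITIVE_CORR))   # 1/4
```

Under negative correlation the stairs keep the costly high-risk type out while the low-risk type still pays 2, so profit rises from 3/4 to 1. Under positive correlation the stairs deter the wrong type, and the best single offer with stairs earns less than pooling without them.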

Q: How does the labor market application differ from Spence (1973)? A: In Spence (1973), wages are competitive and pinned down by expected output, leaving no room to screen workers via monetary payments, so all screening must occur through costly credentials. In Yang’s model, the monopsonistic firm sets wages and all types face the same outside option, so monetary transfers can screen types. Proposition 3 says that when θB is stochastically nondecreasing in θA — higher-ability workers find credentials easier — no credential is needed in the optimal mechanism. The paper thus shows that costly screening is a feature of competitive, not monopsonistic, labor markets, under positive correlation of preferences.

Q: What is the bundling application and what new results does it yield? A: The paper reinterprets the multiproduct pricing problem by treating the grand bundle as the productive component and sub-bundles as costly instruments (since selling a sub-bundle instead of the grand bundle destroys social surplus relative to selling the grand bundle). Proposition 4 (nested bundling) establishes that a nested menu B of bundles is optimal among deterministic mechanisms if: (i) the incremental value of adding items to move from bundle b to b’ ⊃ b in B is strictly increasing in θ, and (ii) for any bundle b not in B, there exists a nested superset b’ ∈ B such that the value of b relative to b’ is nonincreasing in θ. This extends and complements Haghpanah and Hartline (2021), which is recovered as the special case of pure bundling (Proposition 5).

Q: What are the key scope conditions that delimit when Theorem 1 applies? A: Theorem 1 requires: (i) additive separability of preferences across productive and costly components; (ii) the surplus condition on sA (single-crossing of total surplus in the productive component); (iii) the positive correlation condition (stochastic monotonicity of θB in θA); and (iv) the costly instruments satisfy sB(y, θB) ≤ 0 for all y, θB. The productive allocation space X must be compact and one-dimensional; Y can be any measurable space. The agent’s type space can be multidimensional. The result holds for both private values and interdependent valuations on the principal’s side.

Q: Under what conditions does costly screening arise in practice, according to the model? A: The model predicts that if costly screening instruments are observed in practice, the consumers or agents with higher willingness to pay (or ability) for the productive good must tend to face higher costs for the screening action. For instance, higher-willingness-to-pay consumers who find waiting in line less costly (positively correlated preferences) would not be subjected to waiting as a screening device. If a firm uses waiting in line, it must be because higher-willingness-to-pay consumers find waiting more costly, which is consistent with negative correlation.

Costly Instruments: Allocations in the space Y such that the ex post social surplus sB(y, θB) = uB(y, θB) + vB(y, θB) ≤ 0 for all y and all θB. These include actions like waiting in line, collecting coupons, or obtaining credentials that destroy social surplus but may convey private information useful for screening.

Productive Component: The one-dimensional allocation dimension X in which both principal and agent derive non-negative surplus, representing the intrinsically valuable output of the mechanism (e.g., insurance coverage, job placement, bundle of goods).

Positive Correlation (Stochastic Monotonicity): The condition that θB is stochastically nondecreasing in θA: for any θA < θ̂A, the conditional distribution of θB given θ̂A first-order stochastically dominates that given θA. Equivalently, observing a higher θA conveys good news about θB. A sufficient condition is affiliation (Milgrom-Weber), but positive correlation is strictly weaker.

Surplus Condition: A single-crossing condition on the total surplus function sA(x, θA) for the productive component: for any x < x̂ and θA < θ̂A, if x̂ generates strictly more surplus than x at type θA, it continues to do so at θ̂A. This ensures a monotone efficient allocation rule exists and is the enabling condition for the Downward Sufficiency Theorem.

Downward Sufficiency Theorem (Theorem 2): The result that in any one-dimensional screening problem satisfying the surplus condition, there exists an optimal solution to the relaxed program (which ignores upward IC constraints) that also satisfies all upward IC constraints. This implies the principal need only enforce downward incentive constraints at the optimum.

Monotone Path Decomposition: A proof technique that writes the multidimensional type distribution as θ =^d (θA, h(θA; ε)) where ε ⊥ θA and h is nondecreasing in θA. Borrowed from dynamic mechanism design (Eso-Szentes, Pavan-Segal-Toikka), it reduces multidimensional IC problems to families of one-dimensional paths indexed by the independent residual ε.

Nested Bundling: A menu B of product bundles that can be totally ordered by set inclusion (b1 ⊂ b2 ⊂ … ⊂ bK). The paper shows that nested bundling is optimal under conditions that the incremental value of nesting is strictly increasing in type for bundles within B, and nonincreasing relative to any nested superset for bundles outside B.

Counterfactual Analysis for Structural Dynamic Discrete Choice Models

Research Question. Discrete choice data identify only differences in agents’ utilities, not utility levels. In dynamic discrete choice (DDC) models this means many policy-relevant counterfactuals — those requiring knowledge of utility in levels — are not point-identified. Kalouptsidi, Kitamura, Lima, and Souza-Rodrigues ask: how much can researchers learn about counterfactual outcomes under mild, verifiable restrictions, without imposing the strong normalizations that are standard in applied work but often hard to justify and potentially sign-reversing in their effects?

Setting and Methodology. The paper works within a canonical infinite-horizon DDC framework where an agent chooses among a finite action set each period, with additively separable per-period payoffs and i.i.d. unobservables. The econometrician observes conditional choice probabilities (CCPs) and state transition functions from panel data, but the payoff vector is underidentified by X free parameters (one per state), which is the source of non-identification of many counterfactuals. The authors characterize the sharp identified set both for counterfactual CCPs and for low-dimensional outcomes such as average welfare, and they develop identification theory together with a feasible inference procedure.

Main Identification Results. The sharp identified set for counterfactual CCPs is a smooth, connected manifold whose dimension equals the rank of a specific matrix (CJ*QJ) that the econometrician can compute directly from the data. This rank is at most X minus the number of linearly independent equality restrictions imposed. Two classes of commonly used restrictions reduce the dimension further without requiring full point identification: (i) local counterfactuals — experiments affecting only a subset of the state-action space — reduce the dimension to at most the number of eigenvalues of the relevant transformation matrix that differ from one; (ii) parametric payoffs with ηγ free parameters reduce the dimension to at most ηγ. Combining both achieves the tightest bound. Point identification is the special case where the rank equals zero.

For scalar low-dimensional outcomes (e.g., average welfare), the identified set is a compact interval whose endpoints are obtained by solving constrained optimization programs implementable in standard nonlinear solvers (e.g., Knitro), feasible even when the state space is large.

Quantitative Illustration. In the firm entry/exit Monte Carlo with state space X = 4 and a counterfactual entry subsidy removal: under Restriction 1 alone (outside option = 0, non-negative costs, known variable profits), the identified set for the change in the long-run probability of being active is [-0.1235, 0.0000], correctly signed and containing the true value of -0.0638. Adding shape restrictions (Restrictions 1–2) tightens the upper bound to -0.0341; adding the scrap-value exclusion restriction (Restrictions 1–3) tightens it to -0.0421. Analogous patterns hold for consumer surplus (true: -0.0875; bounds narrowing from [-0.1735, 0.0000] to [-0.1735, -0.0573]) and firm value (true: 0.9513; bounds from [0.0000, 1.8229] to [0.6388, 1.8229]). Critically, the authors show that setting scrap values to zero — the standard identifying assumption — is rejected by the data under Restrictions 1 and 2, because that payoff vector does not lie in the identified set.

Empirical Application. Revisiting Das, Roberts, and Tybout (2007) on Colombian exporters, the paper re-examines the horserace among export revenue, fixed cost, and entry cost subsidies. The DRT ranking (revenue subsidies dominate, entry cost subsidies rank last) survives under weaker restrictions than originally imposed, but hinges on the assumption that scrap values do not vary across states. Without that restriction, entry cost subsidies can potentially outperform the other types, reversing the original conclusion.

Inference. The paper develops a subsampling-based inference procedure that is asymptotically uniformly valid (bootstrap fails here due to non-regularity of the set boundary). The confidence set is constructed by inverting a quadratic-form distance test statistic. The critical practical recommendation is subsample size hN = N^{2/3}. The procedure remains feasible in binary choice models with state spaces up to X = 240 (dimension of the optimization problem: 720), where standard moment-inequality approaches are computationally infeasible.

Q: Why are many counterfactuals not point-identified in DDC models, even after the model is estimated? A: Choice data identify only differences in value functions across actions, not utility levels. The identifying matrix M has rank AX, leaving X free payoff parameters undetermined. Counterfactuals that depend on utility levels — such as the welfare impact of an entry subsidy when scrap values are unknown — therefore cannot be recovered uniquely from the data, even with a fully estimated model.

Q: What is the key object the paper characterizes, and what does it look like geometrically? A: The paper characterizes the sharp identified set for the counterfactual CCP vector p̃. Proposition 1 establishes that this set is a smooth, connected manifold with boundary, whose interior dimension equals rank(CJ*QJ). Connectedness is important because it means the set has no gaps and boundary tracing is sufficient to characterize it.

Q: How does the dimension of the identified set depend on the type of model restrictions imposed? A: Equality restrictions (d of them) reduce the maximum possible dimension from X to X–d. Local counterfactuals (affecting L state-action pairs) reduce the dimension further to at most the number of eigenvalues of the payoff transformation H(L) that differ from one, which is at most L. Parametric payoffs with ηγ free parameters cap the dimension at ηγ. Combining local counterfactuals with parametric payoffs gives the tightest bound: at most the number of eigenvalues of a related matrix D that differ from one, which is at most min(L, ηγ).

Q: Under what conditions does the identified set for counterfactual behavior collapse to a point? A: When rank(CJ*QJ) = 0, every payoff vector in the identified set PI maps to the same counterfactual CCP — that is, p̃ is point-identified even though the structural payoff π may not be. This can occur through a combination of equality restrictions and specific structure of the counterfactual experiment, without requiring full identification of all model parameters.

Q: What properties does the identified set for a scalar low-dimensional outcome have, and how is it computed? A: Under continuity of the outcome function φ and boundedness of the payoff identified set, the identified set for a scalar outcome θ is a compact interval [θL, θU]. The endpoints are computed as the minimum and maximum of a constrained optimization program over the joint space of counterfactual CCPs and payoff vectors, subject to the model’s Bellman equations, model restrictions, and equality constraints linking observed to counterfactual behavior. These programs can be solved with standard nonlinear solvers.
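
The logic of bounding a scalar outcome by minimizing and maximizing over the unidentified payoff levels can be seen in a deliberately simple static analogue. The sketch below is not the paper's dynamic model or algorithm: it assumes a one-period binary logit, a single free payoff level `s` restricted to [0, 2], and a grid search in place of a nonlinear solver. All names and numbers are hypothetical.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def logit(p):
    return math.log(p / (1.0 - p))

def counterfactual_ccp(p_obs, s, scale=1.0, shift=0.0):
    """Counterfactual choice probability in a static binary logit.

    Observed data pin down only delta = pi_1 - pi_0 = logit(p_obs); the level
    pi_0 = s is free. The counterfactual replaces pi_1 = s + delta by
    scale * pi_1 + shift, so the new payoff difference is
    scale * delta + (scale - 1) * s + shift.
    """
    delta = logit(p_obs)
    return logistic(scale * delta + (scale - 1.0) * s + shift)

def identified_interval(p_obs, s_grid, **counterfactual):
    """Smallest and largest counterfactual CCP over admissible payoff levels."""
    values = [counterfactual_ccp(p_obs, s, **counterfactual) for s in s_grid]
    return min(values), max(values)

S_GRID = [2.0 * k / 200 for k in range(201)]  # assumed restriction: 0 <= s <= 2

proportional = identified_interval(0.3, S_GRID, scale=1.5)  # depends on the level s
additive = identified_interval(0.3, S_GRID, shift=0.5)      # does not depend on s
print(proportional, additive)
```

In this toy case a proportional change to payoffs yields a nondegenerate interval because its effect depends on the unknown level, while an additive shift yields a single point, the analogue of a counterfactual that is point-identified even though payoffs are not.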

Q: What do the Monte Carlo results show about the informativeness of the bounds? A: In the firm entry/exit example with X = 4, the identified sets under only mild restrictions (non-negative costs, known variable profits, zero outside option) are already informative and correctly signed. For the change in the probability of being active (true value: -0.0638), the set under Restriction 1 alone is [-0.1235, 0.0000], establishing that the probability does not increase. Adding shape restrictions and exclusion restrictions progressively tightens the interval. All intervals contain the true parameter value, as valid bounds should.

Q: What does the paper show about the assumption of zero scrap values, which is standard in the entry cost literature? A: The paper shows that setting scrap values to zero can be rejected by the data: in the firm entry/exit example, the payoff vector with s = 0 does not belong to the identified set PI under Restrictions 1 and 2. This is empirically important because Kalouptsidi, Scott, and Souza-Rodrigues (2021) had previously shown that mistakenly setting scrap values to zero not only biases estimated entry costs downward but can also reverse the sign of a subsidy’s predicted effect.

Q: What is the main finding of the empirical application to export subsidies? A: Revisiting Das, Roberts, and Tybout (2007), the paper finds that the DRT ranking — export revenue subsidies dominate, entry cost subsidies rank last — can be confirmed under restrictions weaker than those DRT originally imposed. However, the ranking is not robust to allowing scrap values to vary across states: under that generalization, entry cost subsidies can potentially outperform the other subsidy types, reversing the original policy conclusion.

Q: Why does the bootstrap fail for inference in this setting, and why does subsampling work? A: The test statistic ĴN(θ0) involves the minimum of a quadratic form over a non-regular (kinked), random, and possibly nonconvex set. Bootstrap critical values are not asymptotically uniformly valid in this non-regular setting. Subsampling with subsample size hN → ∞, hN/N → 0 (the paper recommends hN = N^{2/3}) delivers asymptotically uniformly valid critical values under weak conditions, because it does not require regularity of the constraint set boundary.
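
The mechanics of a subsampling critical value with hN = N^{2/3} can be sketched generically. The code below is a textbook-style illustration, not the paper's procedure: `scaled_squared_mean` is a placeholder statistic standing in for ĴN(θ0), and the function names, number of subsamples, and simulated data are assumptions.

```python
import math
import random

def subsample_size(n):
    """Subsample size h_N = N^(2/3), rounded to the nearest integer."""
    return max(1, round(n ** (2.0 / 3.0)))

def subsampling_critical_value(data, statistic, alpha=0.05, n_subsamples=500, seed=0):
    """(1 - alpha) quantile of the statistic across random subsamples of size h_N."""
    rng = random.Random(seed)
    h = subsample_size(len(data))
    # Draw each subsample without replacement and evaluate the statistic on it.
    draws = sorted(statistic(rng.sample(data, h)) for _ in range(n_subsamples))
    index = min(n_subsamples - 1, math.ceil((1.0 - alpha) * n_subsamples) - 1)
    return draws[index]

def scaled_squared_mean(sample):
    """Placeholder statistic: len(sample) * mean^2. Not the paper's J statistic."""
    mean = sum(sample) / len(sample)
    return len(sample) * mean * mean

data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]
critical_value = subsampling_critical_value(data, scaled_squared_mean)
print(subsample_size(len(data)), critical_value)
```

With N = 1000 the rule gives subsamples of size 100, which satisfies both requirements in the answer above: hN grows with N while hN/N shrinks.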

Q: How does the inference approach handle the high dimensionality of DDC settings? A: The paper develops a computational algorithm specifically tailored to the structure of DDC models, exploiting the linear Bellman equation constraints to reduce the effective dimensionality of the optimization problem. In a binary choice model with X = 90, the joint optimization is over a 270-dimensional space; with X = 240 (as in Blundell, Gowrisankaran, and Langer, 2020), the dimension is 720. Standard moment-inequality inference methods (Kaido, Molinari, Stoye, 2019; Bugni, Canay, Shi, 2017) are computationally infeasible at these scales; the authors’ algorithm remains tractable.

Q: How does the paper relate to Norets and Tang (2014), the closest alternative approach? A: Norets and Tang (2014) partially identify structural parameters and high-dimensional counterfactual CCPs by relaxing the assumed distribution of idiosyncratic shocks, focusing on binary choice models and using a pointwise-valid Bayesian approach. The present paper instead targets low-dimensional policy outcomes (nonlinear functions of payoffs and counterfactual CCPs), accommodates multinomial choice, provides asymptotically uniformly valid frequentist inference via subsampling, and restricts the source of underidentification to the payoff function rather than the error distribution. The two contributions are non-nested and complementary.

Q: What is the practical workflow the paper enables for applied researchers? A: A researcher can (i) select any combination of model restrictions (equality or inequality, parametric or shape), (ii) specify any counterfactual experiment via an affine payoff transformation (H, g), and (iii) define any low-dimensional outcome of interest φ, then directly compute the identified set and a valid confidence interval by solving two constrained optimization programs — without deriving new analytical identification results for each specification. The rank condition for checking the dimension of the identified set is computable from the data.

Dynamic Discrete Choice (DDC) Model. A discrete-time infinite-horizon model where agents choose among a finite action set each period, with per-period utilities additively separable into an observed payoff function π and an i.i.d. unobservable shock, and agents maximize expected discounted lifetime utility. The model is parameterized by payoffs π, transition function F, discount factor β, and shock distribution G.

Conditional Choice Probability (CCP). The probability that an agent selects a given action in a given state, integrating out the unobservable shocks. CCPs and state transitions are directly identifiable from panel data and serve as the sufficient statistics for the identified set, in place of the unidentified payoff vector.

Sharp Identified Set for Counterfactual CCPs. The set PĨ(p, F) of all counterfactual CCP vectors p̃ that are consistent with the observed data (p, F) and the imposed model restrictions, given the specified counterfactual transformation. Characterized as a smooth connected manifold with dimension equal to rank(CJ*QJ).

Local Counterfactual. A counterfactual experiment in which the payoff transformation H modifies only a subset L of the state-action pairs, leaving the rest unchanged. Local counterfactuals reduce the dimension of the identified set relative to global experiments, because only the payoffs in the affected subset matter for the unidentified component of the counterfactual response.

Partial Identification / Identified Set for Outcomes. Rather than seeking a unique estimate of a counterfactual outcome θ, partial identification recovers the set ΘI of all values of θ consistent with the data and restrictions. For scalar outcomes this is a compact interval [θL, θU] whose endpoints solve constrained optimization problems over payoff and counterfactual CCP spaces.

Subsampling Inference. A procedure for constructing asymptotically uniformly valid confidence sets by repeatedly computing the test statistic on subsamples of size hN < N, approximating the sampling distribution of ĴN(θ0) without requiring regularity (smoothness) of the boundary of the constraint set — a requirement that fails here due to the kinked, nonconvex nature of the identified set.

Rank Condition for Dimension. The dimension of the identified set for counterfactual CCPs is determined by the rank of the matrix CJ*QJ, which depends on the counterfactual transformation H, the model restrictions, and the observed data. The econometrician can compute this rank from observables to assess, before imposing any strong assumptions, how many dimensions of freedom remain in the identified set.

Credit Easing versus Quantitative Easing: Evidence from Corporate and Government Bond Purchase Programs

This paper estimates the supply effects of quantitative easing (QE) and credit easing (CE) on UK corporate bond prices, credit spreads, and new issuance, treating the two policies separately. It uses security-level data on individual corporate bond prices together with the Bank of England’s published purchase quantities across its gilt purchase programs (QE1: £200bn, QE2: £125bn, QE3: £50bn, QE4: £60bn) and its Corporate Bond Purchase Scheme (CBPS: £10bn of investment-grade sterling corporate bonds). Identification comes from cross-sectional variation in quantities purchased, exploited through an instrumental variables approach. For QE alone, supply effects on corporate bond prices are significant at announcement and larger over the full stock-effect horizon, but under normal market conditions the pass-through is limited to the default-free component of corporate yields, leaving credit spreads unchanged. The exception is QE1 during the financial crisis, when QE’s cross-asset supply effects also significantly lowered credit spreads in the longer run. CE via the CBPS is found to be more effective than QE in reducing credit spreads for higher-rated investment-grade bonds even under normal conditions, and is the only program that generates a statistically significant increase in sterling corporate bond issuance. The results are consistent with QE and CE working through partially distinct channels (QE primarily affecting the default-free component of corporate yields, CE additionally compressing the credit-spread component) and complementing each other for higher-rated bonds.

In depth

Q1. What is the empirical strategy and why use a security-level approach?

The paper uses a two-stage instrumental variables (IV) approach at the individual corporate bond level, with pre-program bond characteristics — maturity, yield-curve fitting errors, the BoE’s prior ownership share in the gilt bucket — serving as instruments for the expected distribution of purchases across bonds, allowing isolation of the supply channel from signaling and duration channels. The security-level approach offers three advantages over aggregate or event-study methods: it enables construction of “substitute buckets” (bonds whose maturity is close to the purchased bonds’) to estimate cross-asset supply effects; it permits direct comparison of the price elasticity with respect to gilt purchases (cross-asset effect) versus corporate bond purchases (within-asset effect); and it allows estimation of both the announcement-day effect and the stock effect — the cumulative price and spread change over the life of each program — which captures the longer-run portfolio-rebalancing contribution separately from the initial market reaction.
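
The reason for instrumenting purchase quantities can be illustrated with a stripped-down simulation. This is a sketch, not the paper's specification: it uses one instrument instead of several, simulated data instead of bond-level data, and hypothetical coefficients throughout. With a single instrument, two-stage least squares reduces to the ratio cov(z, y) / cov(z, x) used here.

```python
import random

def covariance(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    return covariance(x, y) / covariance(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV estimate, equal to 2SLS with one instrument."""
    return covariance(z, y) / covariance(z, x)

# Simulated bonds; every number below is hypothetical.
rng = random.Random(0)
TRUE_EFFECT = 2.0
instrument, purchases, price_change = [], [], []
for _ in range(20000):
    z = rng.gauss(0, 1)   # pre-program characteristic, e.g. a maturity measure
    u = rng.gauss(0, 1)   # unobserved shock that moves both purchases and prices
    q = 0.8 * z + u + 0.5 * rng.gauss(0, 1)           # quantity purchased
    dp = TRUE_EFFECT * q - 3.0 * u + rng.gauss(0, 1)  # price response
    instrument.append(z)
    purchases.append(q)
    price_change.append(dp)

ols_estimate = ols_slope(purchases, price_change)
iv_estimate = iv_slope(instrument, purchases, price_change)
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

In the simulation the naive regression is pulled well below the true effect of 2.0 by the unobserved shock, while the instrumented estimate recovers it, because the instrument is predetermined and unrelated to that shock.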

Q2. What are QE’s effects on corporate bond prices and credit spreads?

For QE alone (QE1–3), the instrumented gilt substitute purchases have positive and statistically significant effects on corporate bond prices at announcement across all three programs — in the case of QE1, the average 30 basis-point decline in corporate yields on the announcement day is attributed in full to QE supply effects in the paper’s regression. The stock effect — estimated over the full life of each program — is significantly larger than the announcement-day effect, consistent with gradual portfolio rebalancing as predicted by Greenwood, Hanson, and Liao (2018). However, except for QE1, the supply effects do not carry through to credit spreads in either the short run or the longer run, which the paper interprets as consistent with QE working primarily through the default-free component of the corporate yield: corporate yields fell in line with gilt yields, but spreads over gilts were unchanged.

Q3. When does QE affect credit spreads?

QE1’s cross-asset supply effects significantly lowered credit spreads in the longer run, whereas QE2 and QE3 generate no significant credit spread compression in either the short or the long run. This contrast suggests that the supply channel interacts with the liquidity channel specifically under conditions of financial market distress. The paper interprets the QE1 exception as reflecting the severe disruption during the 2008–09 financial crisis: when capital mobility across markets is constrained and liquidity premia are elevated, central bank purchases of safe assets may also improve trading conditions in indirectly targeted, less liquid markets such as the corporate bond market, reducing the liquidity component of corporate spreads. This interaction does not appear to be operative in the more normal market conditions of QE2 and QE3.

Q4. How does CE compare to QE in reducing credit spreads and stimulating issuance?

CE via the CBPS is found to be more effective than QE in reducing credit spreads for higher-rated investment-grade bonds even under normal financial market conditions, and a corporate bond’s price sensitivity to its own CBPS purchases is substantially higher than its price sensitivity to gilt substitute purchases; CE is also the only program with a statistically significant positive effect on new sterling corporate bond issuance. Across QE1–3, there is no statistically significant impact of gilt purchases on sterling corporate issuance, while CBPS purchases have positive and statistically significant effects on new sterling corporate bond issuance. The paper characterizes CE and QE as complementary for higher-rated bonds: CE’s credit-spread reduction layers on top of QE’s default-free component effect, making the total stock effect larger than either program alone.

Q5. What happens for lower-rated investment-grade bonds?

For lower-rated investment-grade bonds, the evidence for both cross-asset QE supply effects and within-asset CE supply effects is weaker, and the paper suggests that CE’s stimulation of new bond issuance may have counterbalanced its positive price effects for these bonds through the dilutive effect of new supply. The mechanism is that CE’s reduction in the cost of corporate bond issuance for lower-rated firms induced enough new bond issuance to partially offset the price increase from CBPS purchases, consistent with the issuance channel being most active for the market segment where CBPS created the largest pricing improvement. This dilution effect implies that the net price benefit of CE for lower-rated bonds is smaller than the gross supply-effect estimate.

Key concepts

stock effect : the cumulative effect of the total quantity of bonds purchased under a program on bond prices and spreads, estimated over the full life of the program; in this paper the stock effect is significantly larger than the announcement-day effect, consistent with gradual portfolio rebalancing.

cross-asset supply effect : the pass-through of government bond (gilt) purchase supply shocks to the prices of corporate bonds — an asset class not directly targeted by QE; the paper provides the first estimates of this cross-market supply channel at the security level.

credit spread : the difference between the yield on a corporate bond and the yield on a risk-free government bond of the same maturity; the paper finds QE pass-through is generally limited to the default-free component of corporate yields rather than the credit spread.

default-free component : the part of a corporate bond’s yield attributable to the risk-free interest rate rather than credit risk; the paper finds that QE supply shocks affect this component but generally leave the credit spread unchanged in normal market conditions.

within-asset substitution effect : the price effect of CE purchases on the bonds directly purchased and their corporate bond substitutes, as distinct from cross-asset effects; the paper finds this effect is substantially larger in magnitude than the cross-asset QE effect on corporate bonds.

issuance channel : the mechanism by which lower corporate borrowing costs induced by CE stimulate new corporate bond issuance; the paper finds this channel operates under CE (CBPS) but not under QE (gilt purchases).

Customer Acquisition, Business Dynamism and Aggregate Growth

This paper asks whether firm-level customer acquisition — distinct from productivity differences — is a quantitatively important driver of aggregate economic growth, and whether ignoring it distorts predictions about growth policy efficacy. The authors build a novel endogenous growth model in which innovating firms must first accumulate customers to sell their products, with two channels of customer acquisition operating simultaneously: costly sales-and-marketing expenditure and below-static-markup pricing (sales-driven accumulation). The model is estimated using indirect inference against a combination of aggregate data (U.S. real GDP per worker growth of 1.43% annually, 1979–2019), Business Dynamics Statistics (BDS) life-cycle profiles, and firm-level data from Compustat matched to Capital IQ’s sales-and-marketing expense records covering 1997–2019.

The benchmark model yields four closed-form propositions. First, a “firm-level market size effect”: higher customer retention raises a firm’s future profit base, strengthening incentives to conduct R&D. Second, an endogenous feedback loop: more productive firms invest more in customer acquisition, which expands their customer base and further strengthens R&D incentives. Third, customer base accumulation raises aggregate growth, but only indirectly — by boosting firm-level innovation rates — since aggregate productivity is a customer-weighted average of firm productivity levels. Fourth, the sensitivity of innovation to R&D subsidies increases with customer base growth, because firms with faster-growing customer bases discount future profits less steeply.

In the quantitatively estimated full model — which relaxes the benchmark’s perfect-scaling restrictions and endogenizes firm entry and exit — the authors conduct two decomposition exercises. In a counterfactual scenario where expected customer retention is reduced to make average customer base growth zero among continuing businesses, firm-level innovation rates fall by approximately 40% relative to the full model. Of this 40% decline, only about 6 percentage points are attributable to the direct firm-level market size effect alone; the vast majority is driven by the endogenous feedback loop between innovation and customer acquisition. In a second decomposition focused on aggregate growth, the firm-level market size effect and a reallocation effect — whereby the feedback loop concentrates customers among high-productivity firms — together account for 44% of aggregate growth in the full model.

On policy, the authors compare R&D subsidies and operational subsidies in the full model against an otherwise identical model that ignores customer accumulation. R&D subsidies are approximately twice as effective at boosting aggregate growth in the full model as in the model without customer accumulation. Conversely, operational subsidies produce a stronger decline in aggregate growth in the full model than in the model without customer accumulation, because aggregate growth in the full model is a customer-weighted average of firms’ productivity growth rates, making the joint distribution of productivity and customer bases the relevant object of study.

Firm-level data support three empirical predictions. Marketing expenditure, R&D intensity, and markups co-move in model-consistent directions both contemporaneously and over the life cycle. The estimated relative weight of marketing versus pricing as channels of customer accumulation is γ = 0.745, indicating marketing is the dominant channel. A model-consistent proxy for the severity of customer-base frictions, estimated in the cross-section of industries, shows that stronger frictions correlate with lower R&D investment, as predicted. The customer-base depreciation rate is estimated at ζ = 0.375, R&D cost scaling at σx = 1.264, and marketing cost scaling at σa = 1.405.

Q: What is the firm-level market size effect and why does it arise? A: When a firm retains more customers, successful innovations apply to a larger market, raising the profitability of each unit reduction in production costs. This increases the marginal benefit of R&D investment. In the benchmark model, Proposition 2(a) shows formally that firm-level innovation increases with customer base growth: ∂x/∂(1−ζ) > 0, where ζ is the customer separation rate.

Q: What is the endogenous feedback loop between innovation and customer accumulation? A: More productive firms have lower production costs and can therefore afford greater investment in marketing and can set lower markups, both of which attract more customers. A larger customer base raises firm value and strengthens R&D incentives further (Proposition 2(b)). This bidirectional feedback means that productivity growth and customer accumulation are jointly determined in equilibrium, not independent processes.

Q: How large is the quantitative effect of customer accumulation on firm-level innovation? A: In the counterfactual where expected customer retention is reduced so that average customer base growth among continuing firms is zero, firm-level innovation rates are approximately 40% lower than in the full model. Of this decline, only about 6 percentage points are attributable to the direct market size effect in isolation; the feedback loop accounts for the remaining roughly 34 percentage points.

Q: How much of aggregate growth do customer-acquisition channels explain? A: The firm-level market size effect and a customer reallocation effect together account for 44% of aggregate growth in the full model. Shutting down the firm-level market size effect alone lowers aggregate growth by about one-fifth (20%) in the relevant counterfactual. The reallocation effect, by which productive firms accumulate disproportionate market share, contributes the remainder of the 44%.

Q: What is the reallocation channel for aggregate growth? A: Because highly productive firms can invest more in customer acquisition, the feedback loop endogenously concentrates customers (market shares) among high-productivity firms. Since aggregate productivity in the model is a customer-weighted average of firm productivity levels (equation 16), this reallocation raises aggregate productivity growth beyond what the firm-level R&D incentive effect alone would produce.

Q: How does customer accumulation change the efficacy of R&D subsidies? A: R&D subsidies are approximately twice as effective at raising aggregate growth in the full model (with customer accumulation) as in an otherwise identical model that ignores customer accumulation. The mechanism is Proposition 4(b): faster customer base growth makes firms weight future profits more heavily, increasing their sensitivity to any change in R&D costs, including that brought about by a government subsidy.

Q: What happens to aggregate growth under operational subsidies in the two models? A: Operational subsidies lead to a stronger decline in aggregate growth in the full model than in the model without customer accumulation. The reason is that aggregate growth in the full model depends on the joint distribution of firm productivity and customer bases; operational subsidies alter this distribution in ways that reduce the customer-weighted average of productivity growth rates, an effect absent when customer accumulation is ignored.

Q: How are the two customer-acquisition channels (marketing and pricing) measured empirically? A: Marketing is measured using sales-and-marketing expenses from Capital IQ, available for 48% of the Compustat sample (34% report directly; an additional 14% report advertising or marketing sub-components). Markups are measured following De Loecker et al. (2020) as the inverse share of variable costs in sales multiplied by the cost-output elasticity, with variation across firms identified from balance sheet data under the assumption that cost-output elasticities are constant within industry-year cells.
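
The markup measure described above reduces to a one-line calculation. The following is a minimal sketch, not the authors' code: the firm records and the elasticity value are invented for illustration, and the function name `markup` is hypothetical.

```python
def markup(sales: float, variable_costs: float, elasticity: float) -> float:
    """Markup = cost-output elasticity x (sales / variable costs)."""
    if sales <= 0 or variable_costs <= 0:
        raise ValueError("sales and variable costs must be positive")
    return elasticity * sales / variable_costs


# Illustrative elasticity, assumed constant within one industry-year cell.
ELASTICITY = 0.85

# Hypothetical firms in that cell: name -> (sales, variable costs).
firms = {"A": (120.0, 80.0), "B": (200.0, 170.0)}

markups = {
    name: markup(sales, costs, ELASTICITY)
    for name, (sales, costs) in firms.items()
}
```

Because the elasticity is shared within the cell, all cross-firm variation in the computed markups comes from the ratio of sales to variable costs, which is the identifying assumption stated in the answer above.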

Q: What is the estimated relative strength of marketing versus pricing in customer accumulation? A: The relative weight on marketing is γ = 0.745, estimated by targeting the coefficient βµ = 0.04 (standard error 0.01) from a reduced-form regression of firm-level sales growth on changes in markups (equation 29). This implies that marketing is the dominant channel, consistent with evidence in Afrouzi et al. (2021) and Fitzgerald et al. (forthcoming).

Q: What is the estimated customer-base depreciation rate and how is it disciplined? A: The depreciation rate ζ is estimated at 0.375, targeted to match average firm-level employment growth from the BDS. This falls toward the lower end of existing estimates, which range from about 0.3 to 0.7 across studies.

Q: How do R&D costs scale with firm size in the estimated model? A: The R&D cost scaling parameter is σx = 1.264, estimated by targeting the reduced-form coefficient of −0.01 from a regression of log R&D intensity on log sales with industry-time fixed effects (equation 28). This is close to the estimate in Akcigit and Kerr (2018).

Q: How do marketing costs scale with firm size? A: The marketing cost scaling parameter is σa = 1.405, estimated by targeting a reduced-form coefficient of −0.01 from a regression of log sales-and-marketing intensity on log sales with industry-time fixed effects (equation 30).

Q: What empirical co-movement evidence supports the model’s predictions? A: In the cross-section of firms, marketing expenditure, R&D intensity, and markups all co-move in model-predicted directions, for both static (contemporaneous) relationships and dynamic (life-cycle) patterns. Additionally, a model-consistent industry-level proxy for the severity of customer-base frictions shows that stronger frictions are associated with lower R&D investment, as the model predicts.

Q: How does endogenous firm exit work in the full model and why does it differ from standard models? A: Firms pay a stochastic per-period operational cost and exit when that cost exceeds a threshold κ*_j = v(q_j, b_j)/W. Unlike standard growth models where exit depends only on productivity, here the exit threshold depends on both productivity and accumulated customers, so customer loss can trigger exit even for relatively productive firms.
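
The exit rule can be illustrated with a short sketch. The firm values below are invented, the function names are hypothetical, and `W` is simply the scaling term from the threshold formula in the text; nothing here reproduces the paper's value function.

```python
def exit_threshold(firm_value: float, W: float) -> float:
    """Threshold kappa* = v(q, b) / W, with v supplied as a number."""
    return firm_value / W


def exits(cost_draw: float, firm_value: float, W: float) -> bool:
    """A firm exits when its operational cost draw exceeds the threshold."""
    return cost_draw > exit_threshold(firm_value, W)


# Two hypothetical firms with the same productivity but different customer
# bases, so the firm value v(q, b) differs only because b differs.
value_small_base = 2.0
value_large_base = 4.0
W = 5.0
cost_draw = 0.5

small_base_exits = exits(cost_draw, value_small_base, W)
large_base_exits = exits(cost_draw, value_large_base, W)
```

With these made-up numbers the same cost draw pushes out the firm with the smaller customer base but not the one with the larger base, which is the sense in which customer loss can trigger exit for a relatively productive firm.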

Q: What data sources are used and what are their key limitations? A: The three primary firm-level sources are the Census Bureau’s BDS (broad coverage, employment-focused), Compustat (rich financial data but limited to publicly traded firms and lacking direct customer-acquisition measures), and Capital IQ (sales-and-marketing expenses available from 1997, matched to 91% of the Compustat sample). To address Compustat’s non-representativeness, employment-based weights aligning Compustat and BDS firm-size distributions are applied when computing model moments against Compustat targets.

Firm-level market size effect: The mechanism by which higher customer retention raises a firm’s future profit base — because lower production costs from successful innovation apply to a larger market — thereby strengthening incentives to conduct R&D. This is the primary channel linking customer accumulation to innovation.

Customer base (b_j): The mass of household members consuming a firm’s product variety, which varies endogenously across firms. It enters demand directly (equation 4) and serves as a state variable in the firm’s value function alongside productivity.

Endogenous feedback loop: The bidirectional reinforcement between productivity growth and customer accumulation. More productive firms invest more in customers; a larger customer base raises the value of innovation; higher innovation raises productivity further.

Reallocation effect: The concentration of customers (market shares) toward high-productivity firms that arises endogenously from the feedback loop, contributing to aggregate growth because aggregate productivity is a customer-weighted average of firm-level productivity.

Customer-base depreciation rate (ζ): The exogenous rate at which a firm loses its existing customers each period, estimated at 0.375 in the paper’s calibration. It governs the baseline speed of customer attrition and is the key parameter for the firm-level market size effect.

Sales-and-marketing expenses: Expenditures on sales force, brand development, customer service, advertising, and customer data acquisition — measured from Capital IQ — that directly drive marketing-based customer accumulation (the dominant channel with estimated weight γ = 0.745).

Perfect scaling (Assumption 1): The benchmark restriction that R&D and marketing costs, and the sales-driven customer accumulation benefit, all scale one-for-one with a composite of firm productivity and customer base. This assumption enables closed-form solutions and is relaxed in the full model using estimated scaling parameters.

Cyberattacks on Small Banks and the Impact on Local Banking Markets

This paper studies what happens to local banking markets when a small bank suffers a successful cyberattack, using a stacked difference-in-differences design on 16 cyber incidents at small U.S. banks drawn from the Privacy Rights Clearinghouse database over 2005–2017. Attacked small banks experience a deposit growth rate roughly 22 percentage points lower than matched control banks in the two years following a breach, reflecting depositors’ loss of confidence in the targeted institution’s cybersecurity capacity. The deposit attrition is sharply stronger in counties with lower digital literacy, consistent with less-informed depositors placing disproportionate weight on a visible security failure. Deposit losses do not flow evenly to all competitors: positive spillovers accrue only to the dominant or largest banks in the local market, not to other small banks, concentrating market share toward large incumbents. Affected small banks subsequently attract riskier mortgage borrowers (proxied by higher loan-to-value ratios and lower FICO scores), suggesting that the funding pressure created by deposit losses induces yield-seeking behavior. The aggregate effect is a reduction in credit access for informationally opaque small borrowers that slows local small-establishment growth.

In depth

Q1. What is the identification strategy and what variation does it exploit?

The paper uses a stacked difference-in-differences design that stacks sub-experiments around each of the 16 cyberattack events, comparing the attacked small bank against a matched set of control banks in the same local market that did not experience a breach, with the event window centered on the quarter of the reported breach. The primary data source for cyber incidents is the Privacy Rights Clearinghouse (PRC) database, which records data breaches across industries; the paper restricts attention to incidents at U.S. commercial banks with total assets below a size threshold that classifies them as small. The stacking design allows each attack event to contribute its own two-by-two (pre/post, treated/control) comparison while controlling for time fixed effects across all events, which is important because cyberattacks cluster in certain periods. Identification relies on the parallel-trends assumption: absent the cyberattack, the deposit growth trajectory of the attacked bank would have evolved like that of matched local competitors. The paper validates this assumption with pre-trend tests and provides a battery of robustness checks including alternative matching procedures and excluding events that coincide with other bank-specific news.
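
The stacking logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's specification: the observations are toy numbers invented here, each event contributes one treated/control, pre/post comparison, and the pooled regression with fixed effects is replaced by an equal-weighted average of per-event two-by-two estimates.

```python
from collections import defaultdict
from statistics import mean

# Toy observations, invented for illustration:
# (event_id, treated, post, deposit_growth)
ROWS = [
    (1, True, False, 0.05), (1, True, True, -0.15),
    (1, False, False, 0.04), (1, False, True, 0.06),
    (2, True, False, 0.03), (2, True, True, -0.13),
    (2, False, False, 0.02), (2, False, True, 0.04),
]


def stacked_did(rows):
    """Return (average effect, per-event effects) from stacked 2x2 comparisons."""
    cells = defaultdict(list)
    for event, treated, post, growth in rows:
        cells[(event, treated, post)].append(growth)

    def cell_mean(event, treated, post):
        return mean(cells[(event, treated, post)])

    effects = {}
    for event in sorted({row[0] for row in rows}):
        treated_change = cell_mean(event, True, True) - cell_mean(event, True, False)
        control_change = cell_mean(event, False, True) - cell_mean(event, False, False)
        # Difference-in-differences for this sub-experiment only.
        effects[event] = treated_change - control_change
    return mean(list(effects.values())), effects


average_effect, per_event = stacked_did(ROWS)
```

Keeping each event's controls inside its own sub-experiment is the point of the design: a bank never serves as a control for an event in a different market or period, which a single pooled two-way comparison would not guarantee.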

Q2. How large is the deposit effect at attacked small banks and what is the direction of deposit flows?

Attacked small banks see deposit growth rates approximately 22 percentage points lower than control banks over the two years following a breach, a decline that is economically large relative to the unconditional mean deposit growth rate in the sample. The attacked bank’s local market share falls by roughly 1 percentage point. Crucially, the deposit outflows do not disperse evenly to all rivals: the paper finds positive and statistically significant deposit spillovers only at the dominant large bank (or banks) in the local market, with no measurable increase at competing small banks. This asymmetric spillover is consistent with depositors fleeing to scale, perceiving large banks as having the technological resources and regulatory scrutiny to maintain cybersecurity, rather than simply seeking any alternative.

Q3. What role does digital literacy play in moderating the deposit response?

The deposit effect is significantly stronger in counties with below-median digital literacy, measured using population-weighted indices of internet connectivity and self-reported computer use from the American Community Survey, suggesting that less-digitally-literate depositors rely more heavily on observable security signals — such as a publicized breach — when assessing bank safety. In high-digital-literacy counties, the average customer may already have some prior belief about cyber risk across institutions and may discount a single breach as less informative, dampening the flight-to-quality response. In low-digital-literacy counties, the breach is a more salient and credibility-destroying event. The heterogeneity is quantitatively meaningful and survives controlling for MSA-level income, education, and urbanization.

Q4. How does the competitive position of large banks in the local market moderate the spillover?

When a large bank holds a dominant market position prior to the attack (measured by a market share above the 75th percentile of the local deposit distribution), the positive deposit spillover to that large bank is more than 30 percentage points larger than in markets where large banks hold a weaker position, pointing to a flight-to-incumbency effect that operates on top of the flight-to-scale effect. This finding implies that cyberattacks on small banks increase concentration most in markets where large banks are already dominant: the attack accelerates an existing market-share gradient rather than creating a new one. The result has policy relevance for local banking market competition: communities that already have concentrated banking sectors are more exposed to structural concentration following cyber events.

Q5. Do deposit rates at attacked small banks rise or fall, and does this signal a funding-cost channel?

Deposit rate evidence in the working paper suggests that attacked small banks do not uniformly raise deposit rates to retain customers. This is consistent with the deposit outflows being driven by non-price concerns about security rather than competitive pricing, and it rules out a simple funding-cost-through-repricing mechanism. The absence of a strong deposit-rate increase at the attacked bank indicates that depositors are responding to a qualitative signal about the bank’s cybersecurity capacity rather than to relative deposit pricing. This matters for the economic interpretation: the mechanism is loss of depositor confidence rather than increased funding costs passed through from the attack’s direct remediation expenses.

Q6. What happens to the loan portfolio and borrower risk profile of attacked small banks after a breach?

In the post-attack period, affected small banks shift their mortgage originations toward riskier borrowers, with originations showing higher average loan-to-value ratios and lower average FICO scores relative to the pre-attack period and relative to control banks, consistent with yield-seeking behavior driven by the deposit-funding squeeze. This borrower-quality deterioration implies a second-order financial stability concern beyond the immediate deposit loss: attacked banks may take on more risk in the loan book at precisely the moment when their funding base is weakening. The evidence is thus consistent with a mechanism in which the cyberattack triggers a cascade — deposit loss → funding pressure → reach-for-yield → loan-quality deterioration.

Q7. What are the real-economy consequences at the local market level?

Counties that experience a cyberattack on a local small bank show lower subsequent small-establishment growth relative to control counties, measured using County Business Patterns data on establishments with fewer than 20 employees, consistent with reduced small-business credit availability as small banks contract lending. Large banks that absorb deposit inflows from the attacked institution do not offset this credit reduction: the deposit inflows do not translate into proportionate increases in small-business or small-mortgage lending, reflecting the well-documented diseconomy of scale in relationship lending by large institutions. The real-economy effect is concentrated in counties where small banks had a larger pre-attack share of local deposits and credit, consistent with the mechanism that the effect operates through credit-supply disruption rather than demand shocks.

Q8. How does this paper relate to the literature on bank runs and financial contagion?

Unlike classic bank-run models in which depositor withdrawals are self-fulfilling or triggered by sunspot-like coordination failures, this paper’s results suggest that cyberattacks constitute an information event that rationally updates depositors’ beliefs about the attacked bank’s technological competence, generating a fundamentals-based run on the specific institution rather than systemic panic. The results complement the emerging literature on cyber risk in financial institutions (e.g., Kashyap and Wetherilt 2019, Eisenbach et al. 2022) by documenting market-level spillovers and real effects beyond the attacked institution. The finding that large banks absorb deposits following attacks on small banks also connects to the “too-big-to-fail” literature by showing that size confers a competitive advantage in moments of localized financial stress.

Key Concepts

stacked difference-in-differences : an event-study design in which multiple treatment events are each assigned their own pre/post comparison window, the sub-experiments are then stacked into a single dataset, and pooled regressions with event-by-period fixed effects estimate the average treatment effect; used in this paper to exploit variation across 16 separate cyberattack events at small banks.

Privacy Rights Clearinghouse (PRC) database : a publicly available database of data-breach incidents across industries in the United States, which the paper uses as the primary source for identifying confirmed cyberattacks on commercial banks; the paper restricts to incidents classified as hacking or skimming rather than physical theft or accidental exposure.

deposit spillover : the increase in deposit inflows to competitor banks in the same local market following a cyberattack on a rival institution; in this paper, measured as the change in deposit growth at non-attacked banks relative to their own pre-attack trends.

flight-to-scale : the pattern in which depositors shift funds from smaller to larger banks following a cyber incident, driven by the belief that larger banks have superior cybersecurity resources; the paper documents that this flight benefits only the dominant large bank (or banks) in the local market, with no measurable gain at competing small banks.

digital literacy : a county-level index measuring residents’ familiarity with digital technologies, internet access, and computer use; used in the paper to test whether depositor reactions to cyberattacks are stronger where depositors have less prior information about cyber risk.

reach-for-yield : the tendency of a bank with a weakened funding base to shift its loan portfolio toward higher-yielding, riskier borrowers to maintain net interest margins; documented in this paper as a behavioral response of attacked small banks in the post-breach period.

De Gustibus and Disputes about Reference Dependence

This paper examines whether heterogeneity in individual gain-loss attitudes — the degree to which people weigh losses more or less severely than equivalent gains — contaminates prior tests of expectations-based reference dependence (EBRD). The central question is: do prior experiments that appear to yield mixed or null evidence against EBRD actually reflect a failure of the expectations-based reference point, or instead reflect a methodological flaw — the implicit assumption that all individuals are uniformly loss averse?

All prior tests of EBRD models (e.g., Kőszegi and Rabin 2006, 2007) have proceeded under what the authors call “universal loss aversion,” the assumption that every individual weighs losses more heavily than commensurate gains (λ > 1). The authors argue that this assumption — a form of the classic De Gustibus conjecture — is empirically incorrect and theoretically distorting: within EBRD designs, loss-averse and gain-seeking subjects are predicted to respond in opposite directions to expectations manipulations, so aggregating across them suppresses or reverses treatment effects.

The authors run two pre-registered laboratory experiments totaling 1,524 subjects. The labor supply experiment (N = 500, UC San Diego) uses a two-stage design. Stage 1 elicits each subject’s gain-loss attitude parameter λ_i from their effort responses to fixed versus uncertain piece rates in a real-effort transcription task, exploiting the prediction that loss-averse workers reduce effort under wage uncertainty while gain-seeking workers increase it. Stage 2 manipulates expectations by varying the probability of a high outside payment (p = 0.05 in Condition Low vs. p = 0.45 in Condition High), holding the piece-rate probability constant at 50%; under EBRD, this shifts the reference point and should change effort in a direction governed by λ_i.

The exchange experiment (N = 1,024, University of Bonn, with a pre-registered 2018 replication of N = 417) uses Stage 1 preference statements over randomly endowed objects to estimate λ_i, and Stage 2 manipulates expectations via a 0% vs. 50% probability of forced exchange. Under EBRD, loss-averse subjects should become more willing to exchange in the High condition; gain-seeking subjects should become less willing.

Both experiments document substantial heterogeneity in gain-loss attitudes. In the labor supply study, 70.6% of subjects exhibit loss aversion (λ̂ > 1) and 29.4% exhibit gain-seeking (λ̂ < 1), with an average structural estimate of λ̂ = 1.65 and median 1.66. In the exchange study, 76% are loss averse and 24% are gain-seeking, with mean λ̂ = 1.49 and median 1.34. Lottery-based elicitation in the labor supply experiment yields 28% gain-seeking, consistent with prior literature estimates of roughly 22% gain-seeking from Chapman et al. (2018).

Crucially, Stage 1 gain-loss attitudes are strongly predictive of Stage 2 treatment effects in both experiments. In the labor supply study, the aggregate treatment effect of approximately 26% greater effort in Condition High — reproducing Abeler et al. (2011) — masks strongly heterogeneous responses: higher λ̂ predicts larger positive treatment effects (raw correlation ρ = 0.18, p < 0.01), and controlling for heterogeneous gain-loss attitudes raises R² by more than a factor of 10. In the exchange study, the aggregate treatment effect is precisely zero (coefficient = 0.00, clustered s.e. = 0.03), a result that prior literature would interpret as contradicting EBRD; but once gain-loss heterogeneity is accounted for, treatment effects are strongly positive for loss-averse subjects and negative for gain-seeking subjects, again raising R² by more than a factor of 10.

Gain-seeking subjects exhibit negative treatment effects in the exchange study, consistent with EBRD predictions, but in the labor supply study the average treatment effect for gain-seeking subjects remains slightly positive, representing a partial deviation from the model’s quantitative predictions. The authors interpret this as evidence that expectations-based reference points are an important but likely incomplete determinant of behavior, with attention-based, status-quo-based, or anchoring-based reference points potentially playing supplementary roles.

Q: What is the central methodological problem with prior tests of expectations-based reference dependence?

A: All prior tests assumed universal loss aversion — that every individual has λ > 1, i.e., weighs losses more severely than equivalent gains. The authors show this is both empirically wrong (roughly 24–29% of subjects are gain-seeking across both studies) and theoretically distorting: within EBRD designs, gain-seeking individuals are predicted to respond in the opposite direction from loss-averse individuals, so averaging across heterogeneous types can suppress, zero out, or even reverse the true treatment effect. This makes standard aggregate tests of EBRD unreliable.

Q: How do the authors measure gain-loss attitudes in the labor supply experiment?

A: In Stage 1, subjects make 30 effort decisions across fixed piece rates and uncertain piece rates with the same mean. Under the Kőszegi-Rabin CPE model, a loss-averse individual reduces effort when the wage is uncertain (because outcomes can fall below the reference point), while a gain-seeking individual increases effort under uncertainty. The authors estimate individual-level parameters by regressing log(e_i + 10) on log(w) and Δw/w in a random-coefficients framework; the coefficient l̂_i on Δw/w is the reduced-form measure of gain-loss attitudes, with λ̂_i = 1 + 4·(l̂_i/ĝ_i) as the structural estimate. The correlation between the two measures is ρ = 0.85 (p < 0.01).

Q: How do the authors measure gain-loss attitudes in the exchange experiment?

A: In Stage 1, subjects are randomly endowed with one of two objects and provide three unincentivized preference statements (relative liking, relative wanting, and hypothetical choice) before any possibility of exchange is introduced. Under CPE, an individual endowed with object X will prefer X to the extent that (1 + λ_i) − 2(Y/X) > 0, so subjects with higher λ_i should more strongly favor their endowment. A principal components analysis reduces the three statements to one factor (capturing ~70% of variation), and residuals from regressing that factor on object assignment constitute the reduced-form measure l̂_i. The structural estimate λ̂_i is obtained via a mixed logit using a log-normal distribution for λ_i; the reduced form and structural measures are correlated at r = 0.95 (p < 0.01).
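
Rearranging the preference condition makes the role of λ_i explicit. The restatement below reads Y/X as the ratio of the two objects' intrinsic valuations, which is an interpretive assumption rather than something the summary confirms.

```latex
(1+\lambda_i) - 2\,\frac{Y}{X} > 0
\quad\Longleftrightarrow\quad
\lambda_i > 2\,\frac{Y}{X} - 1 .
```

Under that reading, when the two objects are valued equally (Y/X = 1) the threshold is λ_i > 1: a loss-averse subject leans toward the endowed object and a gain-seeking subject leans toward the alternative, which is why stated preferences over the endowment carry information about λ_i.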

Q: What does the distribution of gain-loss attitudes look like across the two experiments?

A: In the labor supply experiment (N = 453 estimable subjects), 70.6% are loss averse and 29.4% are gain-seeking, with mean λ̂ = 1.65 and median λ̂ = 1.66. In the exchange experiment (N = 1,024), 76% are loss averse and 24% are gain-seeking, with mean λ̂ = 1.49 and median λ̂ = 1.34. A separate lottery-based elicitation in the labor supply study finds 28% gain-seeking subjects. These proportions are consistent with the weighted average of 22% gain-seeking found by Chapman et al. (2018) across seven prior lottery-choice studies.

Q: What is the aggregate treatment effect in the labor supply experiment, and what does it look like once heterogeneity is accounted for?

A: Without accounting for gain-loss heterogeneity, Condition High is associated with roughly a 26% increase in effort relative to Condition Low (individual-clustered s.e. = 0.03, p < 0.01), reproducing the Abeler et al. (2011) result and consistent with EBRD under universal loss aversion. However, R² = 0.03. Once interactions of Condition High with l̂_i and λ̂_i are included, R² rises to 0.40 and 0.39 respectively — more than a tenfold increase. Higher λ̂_i predicts larger positive treatment effects (raw correlation ρ = 0.18, p < 0.01), and the interaction of Condition High with λ̂_i is highly significant (F(1,452) = 49.14, p < 0.01).

Q: What is the aggregate treatment effect in the exchange experiment, and what does it look like once heterogeneity is accounted for?

A: Without heterogeneity, the treatment effect of Condition High on the probability of exchanging is precisely 0.00 (clustered s.e. = 0.03), which prior literature would read as a failure of EBRD. Once heterogeneity is introduced via interactions with l̂_i and λ̂_i, the pattern changes markedly: loss-averse subjects show positive treatment effects (greater willingness to exchange in High), while gain-seeking subjects show negative treatment effects (less willingness to exchange in High), consistent with Predictions 4–6. R² again rises by more than a factor of 10. In Condition Low, 38% of subjects exchange, reflecting a significant endowment effect (F(1,1022) = 25.66, p < 0.01).

Q: Why does the aggregate treatment effect in the exchange experiment equal zero?

A: The authors show in Appendix B.4 that the relationship between λ_i and exchange probability treatment effects can be concave — negative effects for gain-seeking subjects can be of greater absolute magnitude than positive effects for loss-averse subjects. With roughly 24% gain-seeking and 76% loss-averse subjects, aggregation can yield a near-zero average even when heterogeneous effects are substantial and directionally consistent with EBRD. This aggregation problem, not a failure of the expectations-based reference point mechanism, explains the null aggregate result.
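
A small worked calculation shows how opposite-signed effects can cancel. The 24%/76% split comes from the text; the effect sizes are hypothetical values chosen only to illustrate the arithmetic.

```python
def aggregate_effect(share_gain_seeking: float,
                     effect_gain_seeking: float,
                     effect_loss_averse: float) -> float:
    """Population-average treatment effect across the two types."""
    return (share_gain_seeking * effect_gain_seeking
            + (1.0 - share_gain_seeking) * effect_loss_averse)


SHARE_GAIN_SEEKING = 0.24   # share reported for the exchange experiment
effect_loss_averse = 0.06   # hypothetical positive effect on exchange probability

# Gain-seeking effect that would exactly offset the loss-averse effect.
offsetting_effect = (-effect_loss_averse
                     * (1.0 - SHARE_GAIN_SEEKING) / SHARE_GAIN_SEEKING)

average = aggregate_effect(SHARE_GAIN_SEEKING, offsetting_effect, effect_loss_averse)
magnitude_ratio = abs(offsetting_effect) / effect_loss_averse
```

With these shares, the average is zero when the gain-seeking effect is a little more than three times the loss-averse effect in absolute value (0.76/0.24), so a null aggregate result requires the asymmetry in magnitudes that the concavity argument describes.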

Q: Do gain-loss attitudes measured in one domain predict behavior in another domain?

A: The lottery-based measure of gain-loss attitudes (from Multiple Price Lists administered after the real-effort task in the labor supply experiment) has mean λ̂ = 1.48 and median 1.42, with 28% gain-seeking subjects — proportions similar to the labor supply estimates. However, the correlation between the lottery-based and labor-supply-based structural estimates of λ̂ is only Pearson’s r = 0.091 (p = 0.03) and Spearman’s ρ = 0.084 (p = 0.075). Furthermore, the lottery measure has no predictive power for Stage 2 treatment effects. This suggests that while the prevalence of gain-seeking is similar across domains, gain-loss attitudes at the individual level are more domain-specific than prior work has appreciated.

Q: How do the authors address the “generated regressor problem” when using estimated λ̂_i as a regressor?

A: Since λ̂_i is itself estimated from Stage 1 data, using it directly as a regressor in Stage 2 regressions treats imprecise preference estimates as ideal data, which can distort inference (the Murphy-Topel problem). The authors address this by bootstrapping the entire pipeline — re-estimating gain-loss attitudes from Stage 1 in each of 500 bootstrap iterations and re-running the Stage 2 regressions — then reporting the average bootstrap coefficient and its standard deviation. The bootstrapped conclusions are qualitatively identical to the original regression results in both experiments.
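A minimal sketch of a full-pipeline bootstrap of this kind is given below. Everything except the 500 iterations is an assumption made for illustration: the Stage 1 estimator is a simple per-subject OLS slope, the resampling draws subjects and then trials within subjects, and the data are simulated. The authors' actual estimator and resampling scheme may differ.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def stage1_estimate(trials):
    """Hypothetical Stage 1 estimator: a per-subject slope across trials."""
    xs, ys = zip(*trials)
    return ols_slope(xs, ys)

def stage2_coefficient(subjects):
    """Regress the Stage 2 outcome on the estimated Stage 1 attitude."""
    lam_hat = [stage1_estimate(s["stage1"]) for s in subjects]
    outcome = [s["stage2"] for s in subjects]
    return ols_slope(lam_hat, outcome)

def bootstrap_pipeline(subjects, n_boot=500, seed=0):
    """Re-run Stage 1 and Stage 2 on each resample; return mean and s.d."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = []
        for s in rng.choices(subjects, k=len(subjects)):  # resample subjects
            trials = rng.choices(s["stage1"], k=len(s["stage1"]))  # resample trials
            sample.append({"stage1": trials, "stage2": s["stage2"]})
        draws.append(stage2_coefficient(sample))
    return statistics.fmean(draws), statistics.stdev(draws)

def simulate(n_subjects=60, n_trials=12, seed=1):
    """Simulated data: the true attitude drives both stages (assumed DGP)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        lam = rng.gauss(1.5, 0.5)
        stage1 = []
        for _ in range(n_trials):
            x = rng.uniform(0.0, 1.0)
            stage1.append((x, lam * x + rng.gauss(0.0, 0.3)))
        stage2 = 0.4 * lam + rng.gauss(0.0, 0.5)
        subjects.append({"stage1": stage1, "stage2": stage2})
    return subjects

subjects = simulate()
boot_mean, boot_sd = bootstrap_pipeline(subjects)
print(f"bootstrap coefficient: {boot_mean:.3f} (s.d. {boot_sd:.3f})")
```

Because the first stage is re-estimated inside every iteration, the reported standard deviation reflects the noise in the generated regressor as well as ordinary Stage 2 sampling error.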

Q: What limitations do the authors acknowledge in the EBRD model’s fit?

A: Even after accounting for heterogeneity, the EBRD model does not provide a complete quantitative account of behavior. In the labor supply experiment, gain-seeking subjects exhibit slightly positive average treatment effects (not negative as predicted), and loss-averse subjects’ empirical treatment effects fall short of theoretical predictions, despite a significant correlation between predicted and empirical treatment effects (ρ = 0.25, p < 0.01). The authors attribute these deviations to potential measurement error (which would attenuate estimated relationships), and to the possibility that reference points have multiple determinants — including status quo-based, attention-based, and anchoring-based factors — beyond expectations alone.

Q: What are the broader implications for other applications of gain-loss attitudes?

A: The paper’s findings have implications for any application that relies on universal loss aversion as a maintained assumption, including Rabin’s (2000) calibration argument for risk aversion at small and large stakes, insurance demand for small losses (Slovic et al., 1977), and preferences for bunched resolution of uncertainty (Kőszegi and Rabin, 2009). Admitting heterogeneity in gain-loss attitudes will require more nuanced predictions in each of these settings. The paper provides a methodology — measuring individual-level gain-loss attitudes within the experimental context of interest — for investigating and controlling for such heterogeneity.

Q: What design features prevent confounds between Stage 1 measurement and Stage 2 treatment in the exchange experiment?

A: Stage 1 uses a different pair of objects (USB stick and pens) than Stage 2 (picnic mat and thermos), or vice versa — each subject encounters each pair exactly once, with counterbalancing at the session level. Stage 1 preference statements are unincentivized and made before any possibility of exchange is introduced, so they do not contaminate the Stage 2 expectations manipulation. The random reassignment of objects at the end of Stage 1 generates exogenous variation in endowments, preventing mechanical confounds. The authors also verify that interpreting Stage 1 variation as reflecting heterogeneity in object valuations (rather than gain-loss attitudes) would predict zero heterogeneous treatment effects in Stage 2 — a prediction rejected by the data.

Expectations-Based Reference Dependence (EBRD): The formulation, due to Kőszegi and Rabin (2006, 2007), in which an individual’s reference point is the entire distribution of outcomes they rationally expected, rather than a fixed status quo. Behavior is governed by a Choice-Acclimating Personal Equilibrium (CPE) in which the chosen action is optimal given that the expectation of that action serves as the reference.

Gain-Loss Attitudes (λ_i): The individual-specific parameter governing how outcomes above versus below the reference point affect utility. Under piecewise-linear gain-loss utility, an outcome that falls short of the reference by z reduces utility by η·λ_i·z, while an outcome above it raises utility by η·z. Loss aversion is λ_i > 1; gain-seeking is λ_i < 1; loss neutrality is λ_i = 1. In this paper, λ_i is treated as heterogeneous across individuals rather than assumed uniform.

Universal Loss Aversion: The implicit homogeneity assumption maintained in all prior tests of EBRD — that every individual has λ > 1. The authors characterize this as a form of the De Gustibus Non Est Disputandum conjecture applied to gain-loss attitudes, and document that it fails empirically in both experimental settings.

Choice-Acclimating Personal Equilibrium (CPE): The rational expectations equilibrium concept from Kőszegi and Rabin (2006, 2007) used throughout the paper to derive comparative statics. A choice is a CPE if its expected utility, evaluated with its own expectation as the reference, is at least as high as the expected utility of any alternative evaluated with that alternative’s expectation as the reference.
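The two definitions above can be combined in a short sketch. It assumes consumption utility that is linear in the outcome and a discrete lottery, which are simplifications chosen here for illustration; the function names and the example options are hypothetical.

```python
def gain_loss(outcome, reference, eta, lam):
    """Piecewise-linear gain-loss utility relative to a reference level."""
    diff = outcome - reference
    return eta * diff if diff >= 0 else eta * lam * diff

def cpe_utility(lottery, eta, lam):
    """Expected utility of a lottery when the lottery itself is the reference.

    lottery is a list of (probability, outcome) pairs. Each realized outcome
    is compared with every outcome the agent expected, weighted by probability.
    Consumption utility is assumed linear for this sketch.
    """
    consumption = sum(p * x for p, x in lottery)
    comparison = sum(
        p * q * gain_loss(x, r, eta, lam)
        for p, x in lottery
        for q, r in lottery
    )
    return consumption + comparison

def cpe_choice(options, eta, lam):
    """Return the option whose own-reference expected utility is highest."""
    return max(options, key=lambda name: cpe_utility(options[name], eta, lam))

# Two hypothetical options with the same mean payoff.
options = {
    "sure": [(1.0, 10.0)],
    "risky": [(0.5, 0.0), (0.5, 20.0)],
}

print(cpe_choice(options, eta=1.0, lam=2.0))  # loss-averse agent
print(cpe_choice(options, eta=1.0, lam=0.5))  # gain-seeking agent
```

In this example the risky option's own-reference utility works out to 10 + 5η(1 − λ), so a loss-averse agent (λ > 1) prefers the sure payoff and a gain-seeking agent (λ < 1) prefers the lottery, which is the sign flip the heterogeneity analysis relies on.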

Reduced-Form Gain-Loss Measure (l̂_i): In the labor supply context, the individual-level OLS coefficient on Δw/w in a log-effort regression — capturing how strongly a subject reduces (or increases) effort under wage uncertainty relative to a fixed wage of equal mean. A positive l̂_i identifies loss aversion; negative identifies gain-seeking. In the exchange context, the analogous measure is the residual from regressing the first principal component of Stage 1 preference statements on object assignment.

Aggregation Problem: The paper’s central methodological contribution. When gain-loss attitudes are heterogeneous and the EBRD treatment effect is non-linear in λ_i, the average treatment effect across a heterogeneous population need not equal the treatment effect at the average λ. In the exchange experiment, the aggregate treatment effect is precisely zero even though loss-averse and gain-seeking subjects each respond in the theoretically predicted (opposite) direction: because the relationship between λ_i and the exchange probability treatment effect is concave, the larger negative effects of the gain-seeking minority offset the smaller positive effects of the loss-averse majority.

Debasements and Small Coins: An Untold Story of Commodity Money

Mon, 01 Jan 0001 00:00:00 +0000

This paper applies a multiple-denomination commodity money model, building on Lee, Wallace, and Zhu (2005), to coinage episodes in late medieval England and derives two main findings. First, shortages of small coins are severely inconvenient because halfpennies and farthings serve not merely as small change but as consumption-smoothing instruments: parameterized to 15th-century England (per-capita silver of approximately 35 grams, a penny of approximately 1 gram), the model shows that adding a halfpenny is highly welfare-improving for poor agents even at infrequent expenditure, and welfare-improving for all agents when monetary transactions occur at least twice weekly. Second, debasing the penny by 50 percent has approximately the same welfare effect as introducing a halfpenny and replicates the three stylized facts of the debasement puzzle (large minting volumes, cocirculation of old and new coins, and no additional mint inducement) as equilibrium outcomes rather than paradoxes. Debasement nevertheless relieves but does not solve the structural small-coin problem: full-bodiedness acts as a commitment device against over-issuance, but precious metals impose a practical lower bound on coin content, so full-bodied coins cannot be made arbitrarily small. This points to the historical necessity of a transition to fiat money.

In depth

Q1. What is the debasement puzzle and how does the paper resolve it?

The debasement puzzle, documented by Rolnick, Velde, and Weber, consists of three facts: following a debasement, minting volumes rose sharply; old and new coins cocirculated, sometimes by weight; and people still paid minting fees rather than receiving inducements. Together these are puzzling because the absence of an inducement suggests no straightforward arbitrage motive for bringing metal to the mint. The paper resolves the puzzle by modeling a debasement as equivalent to introducing a new denomination: agents come to the mint because it supplies the welfare-improving small denomination they wanted, not because of a price arbitrage. Cocirculation by weight emerges naturally along the equilibrium path because agents hold both old and new coins in optimal portfolios, and the counterfactual welfare calculation shows that the gain from eliminating the shortage is large, which explains why agents willingly pay minting fees to obtain the new coins.

Q2. How does the paper measure the inconvenience of a coin shortage?

The paper measures inconvenience as the welfare difference between the shortage equilibrium and a hypothetical scenario in which the mint suddenly eliminates the shortage — an unanticipated shock that adds the missing denomination to the coinage structure. This counterfactual is tractably computable in the model and directly mirrors the intuition of a historical agent who compares their constrained experience to the imagined experience of having access to the missing coins. Applied to the penny, the model shows that adding a halfpenny (debasing the penny by 50 percent) yields a welfare gain equivalent to the full shortage inconvenience; the result is large for poor agents even at once-monthly expenditure and extends to all agents when transactions are at least twice weekly.

Q3. Why can debasement not permanently solve the small-coin problem?

Full-bodied coinage — coins whose face value equals their precious-metal content — constrains the minimum viable coin size: very small coins are practically too easy to counterfeit and too difficult to handle, so debasement merely pushes the lower denomination boundary down without eliminating it. The model uses this practical indivisibility of precious metals as the structural constraint that prevents an infinite regress of smaller and smaller coins. This constraint points to why fiat money — which severs the link between value and metallic content — ultimately emerged as the only way to provide arbitrarily small denominations at negligible production cost. The paper frames this as the resolution to the historical “big problem of small change.”

Key concepts

debasement puzzle : the simultaneous occurrence of unusually large minting volumes and cocirculation of old and new coins following a debasement, without any additional mint inducement; resolved in this paper as the equilibrium response to supplying a welfare-improving small denomination.

full-bodiedness : the property of commodity coins whose face value equals their precious-metal content; acts as a commitment device against over-issuance in the model but creates a practical indivisibility constraint on the minimum coin size.

multiple-denomination model : the Lee-Wallace-Zhu framework extended in this paper; explains the social demand for multiple coin denominations via wide transaction-value heterogeneity and the burden of carrying many coins.

Debiasing and T-Tests for Synthetic Control Inference on Average Causal Effects

Mon, 01 Jan 0001 00:00:00 +0000

Chernozhukov, Wüthrich, and Zhu propose a debiased synthetic control (SC) estimator and an accompanying self-normalized t-test for making inferences on the average treatment effect on the treated (ATT) in aggregate panel data settings with one treated unit. The inferential target is the time-averaged treatment effect τ = (1/T1) Σ_{t=T0+1}^{T} (Y0t(1) − Y0t(0)), a one-number summary of the overall causal impact that admits standard-form confidence intervals, in contrast to per-period effects (which cannot be consistently estimated with one treated unit) and sharp null hypotheses (which do not inform effect magnitude).

The method addresses two structural challenges in SC inference. First, the canonical SC estimator τ_SC is biased because the weights are estimated from high-dimensional pre-treatment data, and the bias can be substantial under misspecification. Second, even if true weights were known, constructing standard errors requires estimating the long-run variance (LRV), for which classical estimators such as Newey-West are unreliable in the small samples typical of SC applications.

The debiasing procedure is a K-fold cross-fitting scheme applied to the pre-treatment period. The pre-treatment sample is split into K consecutive blocks. For each fold k, SC weights w_(k) are estimated on the leave-one-block-out pre-treatment data H_{(-k)}, and a component estimator τ_k is formed as the difference between the post-treatment SC residual (using w_(k)) and the in-block pre-treatment SC residual. The latter serves as an estimator of the bias, which under the model assumptions is stable across the pre- and post-treatment periods. The final estimator τ_hat is the average of τ_k across folds. A self-normalized t-statistic T_K = sqrt(K)(τ_hat − τ)/σ_τ is constructed using the cross-fold variance; its asymptotic distribution is t_{K-1}, so no LRV estimation is required and (1−α) confidence intervals take the textbook form τ_hat ± t_{K-1}(1−α/2) × σ_τ/sqrt(K).
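The cross-fitting steps can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the scinference implementation: the simplex-constrained weights are fitted by a simple projected gradient routine, each block is assumed to have length T0 // K, and the data are simulated with a hypothetical treatment effect of -2.

```python
import random

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def fit_sc_weights(X, y, iters=2000):
    """Simplex-constrained least squares by projected gradient descent."""
    n, N = len(X), len(X[0])
    w = [1.0 / N] * N
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    step = n / (2.0 * sum(x * x for row in X for x in row))
    for _ in range(iters):
        resid = [yt - sum(wj * xj for wj, xj in zip(w, row)) for row, yt in zip(X, y)]
        grad = [-2.0 / n * sum(r * row[j] for r, row in zip(resid, X)) for j in range(N)]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w

def mean_residual(X, y, w, periods):
    """Average SC residual y_t - x_t'w over the given periods."""
    return sum(y[t] - sum(wj * xj for wj, xj in zip(w, X[t])) for t in periods) / len(periods)

def debiased_sc(X, y, T0, K=3):
    """Return (tau_hat, component estimators); X[t] holds control outcomes at t."""
    r = T0 // K  # assumed block length
    post = range(T0, len(y))
    taus = []
    for k in range(K):
        block = range(k * r, (k + 1) * r)
        train = [t for t in range(T0) if t not in block]
        w = fit_sc_weights([X[t] for t in train], [y[t] for t in train])
        # Post-treatment residual minus the held-out pre-treatment residual (bias estimate).
        taus.append(mean_residual(X, y, w, post) - mean_residual(X, y, w, block))
    return sum(taus) / K, taus

# Simulated example: the treated unit is an exact mix of two of three controls.
rng = random.Random(0)
T0, T1 = 30, 16
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(T0 + T1)]
true_effect = -2.0  # hypothetical
y = [0.5 * row[0] + 0.5 * row[1] + (true_effect if t >= T0 else 0.0)
     for t, row in enumerate(X)]

tau_hat, taus = debiased_sc(X, y, T0)
print(f"tau_hat = {tau_hat:.3f}")
```

The held-out block never enters the weight estimation for its own fold, which is what lets its residual serve as an estimate of the bias.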

The t-test is proven valid with both stationary and non-stationary data. With stationary data (Theorem 2), it is valid under arbitrary misspecification. With non-stationary data, validity holds either when all units share a common nonstationarity (Theorem 3, also misspecification-robust) or when units deviate from a common nonstationarity under restrictions on the magnitude and heterogeneity of deviations but SC is correctly specified (Theorem 4). The latter covers heterogeneous deterministic time trends and certain cointegration structures. Researchers therefore need not pre-test for unit roots and select inference procedures accordingly.

A formal efficiency result (Section 3.3) shows that the asymptotic variance of the debiased SC estimator is no larger than that of difference-in-differences (DID): SC minimizes prediction error over the simplex, which contains the equal-weight DID vector, so w* predicts at least as well as w_DID. Separately, the relative asymptotic efficiency (RAE) of the t-test, measured against the K→∞ benchmark, rises with K: K=3 yields an RAE of 63.56%; K=5 yields 82.08%; K=10 yields 92.25%.
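One way to see where RAE figures of this size come from is to compare expected confidence interval lengths directly. The sketch below rests on assumptions that are not stated in the summary: a 90% confidence level, a Gaussian interval as the K→∞ benchmark, and a cross-fold standard deviation whose scaled square is chi-square with K−1 degrees of freedom. Under those assumptions the computed ratios come out close to the quoted 63.56%, 82.08%, and 92.25%.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF at x >= 0: one half plus Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection (assumes the answer is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def relative_efficiency(K, level=0.90):
    """Expected Gaussian-benchmark CI length divided by expected t_{K-1} CI length."""
    df = K - 1
    p = 1 - (1 - level) / 2
    z = NormalDist().inv_cdf(p)
    # E[s / sigma] when s^2 * df / sigma^2 is chi-square with df degrees of freedom.
    mean_scale = math.sqrt(2 / df) * math.gamma((df + 1) / 2) / math.gamma(df / 2)
    return z / (t_quantile(p, df) * mean_scale)

rae = {K: relative_efficiency(K) for K in (3, 5, 10)}
for K, value in rae.items():
    print(f"K={K}: RAE = {100 * value:.2f}%")
```

The efficiency loss at small K comes from two sources that the formula separates: the heavier-tailed critical value and the noise in a standard deviation estimated from only K component estimators.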

Simulations calibrated to Andersson’s (2019) Swedish carbon tax application — T0=30, T1=16, N=14, Gaussian AR(1) errors — show that the t-test at K=3 achieves coverage close to the nominal 90% level across correct-specification and misspecification DGPs, while Newey-West standard errors produce substantial undercoverage (coverage = 0.72–0.84) at moderate to high AR(1) coefficients. The method performs comparably to or better than subsampling (Li, 2020) and synthetic DID (Arkhangelsky et al., 2021), and avoids bandwidth selection.

In the empirical application, the debiased SC t-test (K=3) applied to annual CO2 emissions from transport across Sweden (treated, 1990) and 14 OECD control countries over 1960–2005 yields a negative and statistically significant ATT, with a 90% confidence interval lying entirely below zero, implying approximately an 11% average reduction in per capita CO2 emissions from transport attributable to the Swedish carbon tax over 1990–2005. The pre-treatment AR(1) coefficient of SC residuals is approximately 0.31, supporting K=3 as appropriate. These findings corroborate and extend Andersson’s (2019) permutation-based results by providing a confidence interval for the magnitude of the average effect. The method is implemented in the R package scinference.

Q: What is the primary inferential target and why is it preferred over per-period effects or sharp nulls? A: The target is the ATT τ = (1/T1) Σ_{t=T0+1}^{T} (Y0t(1)−Y0t(0)), the time-averaged treatment effect on the treated unit over the post-treatment period. Per-period effects cannot be consistently estimated when there is only one treated unit, yielding wide and uninformative confidence intervals. Sharp nulls (e.g., of no effect whatsoever) are useful starting points but do not inform policy decisions about effect magnitude. The ATT provides an interpretable one-number summary and admits standard-form confidence intervals.

Q: What are the two main inferential challenges that the paper addresses? A: First, the canonical SC estimator τ_SC is biased due to estimation error in the high-dimensional weights, even under correct specification, and the bias can be substantial under misspecification. Second, even with known true weights, standard error estimation requires the long-run variance (LRV), for which classical estimators such as Newey-West (1987) and Andrews (1991) are not sufficiently accurate in the small samples typical of SC applications.

Q: How does the K-fold cross-fitting procedure debias the SC estimator? A: The pre-treatment period is divided into K consecutive blocks H1,…,HK. For each fold k, SC weights w_(k) are estimated using leave-one-block-out pre-treatment data H_{(-k)}. The component estimator τ_k subtracts the in-block pre-treatment SC residual (an estimator of the bias in period Hk) from the post-treatment SC residual (using w_(k)). Because the bias is assumed stable across pre- and post-treatment periods, this subtraction removes it. The final estimator τ_hat averages τ_k across k=1,…,K.

Q: How does the self-normalized t-statistic avoid LRV estimation? A: The statistic T_K = sqrt(K)(τ_hat − τ)/σ_τ uses σ_τ = sqrt(1 + Kr/T1) × sqrt[(1/(K−1)) Σ_k (τ_k − τ_hat)^2], where r is the length of each pre-treatment block. This is the cross-fold standard deviation of the component estimators, scaled by a factor that reflects the length of the pre-treatment blocks relative to the post-treatment period. Under the asymptotic theory, T_K converges to a t_{K-1} distribution, which is pivotal and requires no bandwidth or kernel choice. The cross-fold structure acts as a self-normalizer analogous to the fixed-b approach in the LRV literature.
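The interval construction is short enough to write out. The sketch below takes the component estimators as given and applies the formulas above; the three τ_k values are hypothetical, r is taken to be the pre-treatment block length, and the critical value 2.920 is the 0.95 quantile of t with 2 degrees of freedom (for a 90% interval at K=3).

```python
import math

def self_normalized_ci(taus, r, T1, t_crit):
    """Point estimate, sigma_tau, and CI from K component estimators.

    taus   : component estimators tau_1, ..., tau_K
    r      : length of each pre-treatment block (assumed meaning of r)
    T1     : number of post-treatment periods
    t_crit : t_{K-1} critical value for the desired level
    """
    K = len(taus)
    tau_hat = sum(taus) / K
    cross_fold_var = sum((tk - tau_hat) ** 2 for tk in taus) / (K - 1)
    sigma = math.sqrt(1 + K * r / T1) * math.sqrt(cross_fold_var)
    half_width = t_crit * sigma / math.sqrt(K)
    return tau_hat, sigma, (tau_hat - half_width, tau_hat + half_width)

taus = [-0.9, -1.1, -1.0]   # hypothetical component estimators
T_CRIT_90_DF2 = 2.920       # 0.95 quantile of t with 2 degrees of freedom

tau_hat, sigma, ci = self_normalized_ci(taus, r=10, T1=16, t_crit=T_CRIT_90_DF2)
print(f"tau_hat = {tau_hat:.3f}, sigma = {sigma:.4f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

No kernel, bandwidth, or long-run variance estimate appears anywhere in the calculation; the only tuning choice is K.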

Q: What does the paper prove about validity with non-stationary data? A: Theorem 3 establishes that when all units share a common nonstationarity (Assumption 4: Yt(0) = Vt(0)+θt and Xt = Zt+1_N·θt where {Vt(0),Zt} is stationary and θt is unrestricted), T_K → t_{K-1} under arbitrary misspecification. Theorem 4 establishes validity when units deviate from common nonstationarity (Assumption 5) under restrictions on the magnitude and heterogeneity of deviations, but requires SC to be correctly specified. These results jointly imply that researchers need not pre-test for unit roots before applying the t-test.

Q: How does the paper formally show that debiased SC is more efficient than DID? A: The pseudo-true SC weights w* minimize mean squared prediction error over W_SC, and the equal-weight DID vector w_DID = (1/N,…,1/N)’ is itself an element of W_SC. The minimized residual variance therefore satisfies σ^2_* = E(Yt(0)−Xt’w*)^2 ≤ E(Yt(0)−Xt’w_DID)^2 = σ^2_DID. This inequality holds whether or not SC is correctly specified, so the debiased SC estimator is never asymptotically less efficient than DID. The t-test also remains valid when the parallel trends assumption underlying DID is violated, making it more robust.

Q: What is the trade-off in choosing K, and what does the paper recommend? A: A larger K produces shorter confidence intervals (higher RAE: 63.56% at K=3 versus 92.25% at K=10) but may reduce coverage accuracy in finite samples: each block becomes shorter as K grows, so the asymptotic approximation behind the t_{K-1} limit becomes less reliable. The paper recommends K=3 as a starting point for typical SC applications where T0 is small, based on simulation evidence showing excellent 90% coverage at K=3. When T0 is moderate or large, K can be increased without loss of coverage accuracy.

Q: What do the simulations show about the performance of Newey-West standard errors versus the t-test? A: In simulations calibrated to the Swedish carbon tax application (T0=30, T1=16, N=14, AR(1) errors), the t-test at K=3 achieves coverage close to the nominal 90% level across both correct-specification and misspecification DGPs. Newey-West standard errors produce coverage of only 0.72–0.84 when the AR(1) coefficient of the error process is moderate to high. DID achieves nominal coverage when parallel trends hold but is biased and has poor coverage under violations of parallel trends.

Q: How does the method compare with Li (2020) subsampling and synthetic DID (Arkhangelsky et al., 2021)? A: Compared with Li (2020), the t-test allows N to grow with (T0,T1) rather than treating N as fixed, directly corrects for SC estimation bias via cross-fitting, avoids the need to pre-process data for stationarity, and does not require a subsampling bandwidth choice. Compared with SDID (Arkhangelsky et al., 2021), the t-test is simpler, does not require homoskedasticity across units as SDID’s placebo variance estimator does, and is developed under a linear prediction model rather than a factor model. Simulations show the t-test performs comparably to or better than both alternatives in the application-calibrated DGP.

Q: What are the empirical findings for the Swedish carbon tax application? A: Using annual CO2 emissions from transport for Sweden and 14 OECD control countries over 1960–2005, with T0=30 (1960–1989) and T1=16 (1990–2005), the debiased SC t-test at K=3 yields a negative and statistically significant ATT. The 90% confidence interval lies entirely below zero. The estimated average effect is approximately an 11% reduction in per capita CO2 emissions from transport attributable to the carbon tax over 1990–2005. The pre-treatment SC residuals show an estimated AR(1) coefficient of approximately 0.31, confirming moderate persistence and supporting the use of K=3.

Q: When does the paper recommend against using the t-test? A: The paper advises against the t-test when T1 is very small (T1 < 8–10), as asymptotic approximations may be inaccurate; when there are structural breaks shortly after T0 (making the ATT ill-defined); and when SC fit is poor because the treated unit is very different from controls. The method requires T0, T1, N → ∞ for asymptotic validity, and T1 ≥ 10–15 is suggested for reliable finite-sample performance.

Q: How does the paper cover higher-order improvements in finite samples? A: Appendix D formally establishes that the coverage error of the confidence interval I_K(1−α) is O(1/T) rather than O(1/sqrt(T)), analogous to the fixed-b approach in the LRV literature. This provides a formal justification for the excellent finite-sample coverage observed in the simulations and distinguishes the t-test from Gaussian approximations whose coverage error is of larger order.

K-fold cross-fitting debiasing: A procedure that splits the pre-treatment period into K consecutive blocks, estimates SC weights on the leave-one-block-out pre-treatment data for each fold, and subtracts the in-block pre-treatment prediction error as an estimator of the bias. Under the model, the bias is assumed stable across pre- and post-treatment periods, so this subtraction removes it from the final estimator.

Self-normalized t-statistic: A scale-free test statistic T_K = sqrt(K)(τ_hat − τ)/σ_τ whose denominator is the cross-fold standard deviation of the K component estimators, scaled to account for the ratio of pre-treatment block length to post-treatment period length. The statistic converges to a t_{K-1} distribution without requiring any LRV estimation.

Average treatment effect on the treated (ATT): The target parameter τ = (1/T1) Σ_{t=T0+1}^{T} (Y0t(1)−Y0t(0)), representing the time-averaged causal effect of the treatment on the treated unit over the post-treatment period. It provides an interpretable one-number summary that admits standard-form confidence intervals, in contrast to per-period effects (not consistently estimable with one unit) and sharp null hypotheses (informative about presence but not magnitude of effect).

Common nonstationarity: The condition (Assumption 4) that all units share the same nonstationary component θt — formally, Yt(0) = Vt(0)+θt and Xt = Zt+1_N·θt with {Vt(0),Zt} stationary and θt unrestricted. Under this condition, the t-test is valid under arbitrary misspecification of SC weights, without requiring the researcher to specify or pre-test the type of nonstationarity.

Relative asymptotic efficiency (RAE): The ratio of the asymptotic expected confidence interval length under a benchmark (taken as K→∞) to that of the debiased SC t-test with finite K, quantifying the cost in interval length from using a finite K. At K=3, RAE = 63.56%; at K=5, RAE = 82.08%; at K=10, RAE = 92.25%.

Long-run variance (LRV): The quantity that governs the asymptotic variance of time-averaged quantities in settings with serially correlated data. The paper argues that classical LRV estimators (Newey-West, Andrews) are insufficiently accurate in the small samples typical of SC applications, motivating the self-normalization approach that avoids LRV estimation entirely.

Pseudo-true SC weights: The population minimizer w* = argmin_{w ∈ W_SC} E(Yt(0)−Xt’w)^2, defined as the best linear predictor of the treated unit’s counterfactual outcome within the SC simplex constraint. These weights exist and satisfy the efficiency bound even under model misspecification, providing the foundation for the efficiency comparison with DID.

Decision Theory for Treatment Choice Problems with Partial Identification

Mon, 01 Jan 0001 00:00:00 +0000

This paper applies classical statistical decision theory (Wald 1950) to treatment choice problems where the data only partially identify payoff-relevant parameters. The policy maker chooses an action a in [0,1] — interpreted as the share of the population assigned to a new policy — to maximize welfare that is linear in the action. The data are Gaussian, and the key departure from prior literature is that the mean function mapping parameters to data need not be injective, so even infinite data may not reveal the optimal action.

The paper evaluates decision rules under three classical criteria: admissibility, maximin welfare, and minimax regret (MMR).

Admissibility result (Theorem 1): Under nontrivial partial identification, every decision rule — however exotic — is welfare-admissible. No rule is dominated. This is a sharp reversal from point-identified settings, where admissibility meaningfully restricts the rule class: in the scalar point-identified case (n=1, m(theta)=theta), Karlin and Rubin’s (1956) result implies that any non-threshold rule is dominated. The proof exploits completeness of the Gaussian statistical model: if a dominating rule d’ existed, it would have to agree almost everywhere with d, yielding a contradiction. Theorem 5 generalizes this result beyond Gaussian likelihoods, tying it to bounded completeness of the statistical model.

Maximin welfare result (Theorem 2): The maximin criterion selects the no-data rule d(y) = 0 — preserve the status quo regardless of data — whenever the status quo welfare is the infimum over states with non-positive welfare contrast. In the running example, maximin welfare equals zero and is achieved by never assigning the new policy. This echoes critiques from Savage (1951) and Manski (2004) about ultra-pessimism.

Minimax regret result (Theorem 3): In point-identified problems, the MMR rule is essentially unique and nonrandomized (Canner 1970; Stoye 2009a; Tetenov 2012). Under partial identification, when the identified set is large enough — formally, when I(0) is large enough and there exists mu in the identified set with I(mu) > I(0) — there are infinitely many MMR optimal rules, and any symmetric, weakly increasing MMR rule depending only on the sufficient statistic (w*)^T Y must randomize for some data realizations. Moreover, if I(mu) is differentiable at zero, no linear threshold rule is MMR optimal.

Least randomizing MMR rule (Theorem 4): Because policy randomization is difficult to implement in practice, the authors uniquely characterize the MMR optimal rule that randomizes least frequently. Among all symmetric, weakly increasing, unimodal MMR optimal rules depending on (w*)^T Y, the rule d*_linear has the smallest randomization region — every other distinct such rule has a strictly wider randomization region. This rule can be profiled-regret dominant over the Stoye (2012a)/Yata (2023) MMR rule (Proposition 2), and the uniformly randomizing rule is inadmissible under profiled regret (Proposition 3). Under some conditions, d*_linear can also be obtained as the MMR rule within a class that penalizes randomized assignments equally (Proposition 4).

Three applications ground the theory. First, in Ishihara and Kitagawa’s (2021) evidence aggregation framework — extrapolating treatment effects from n source countries to a target country — the least randomizing rule randomizes only when estimated bounds on the target treatment effect straddle zero, linking decision rules directly to identified-set estimators. Second, in LATE extrapolation (Mogstad et al. 2018), all decision rules are admissible and IV-based threshold rules are not dominated. Third, in the omitted-variable-bias setting of Diegert et al. (2022), the decision-theoretic breakdown point — the largest confounding magnitude under which the seemingly better policy should be adopted without hedging — tolerates strictly more confounding than Diegert et al.’s breakdown point, where the threshold is k = sqrt(pi/2) * sigma.

Q: What is the central research question? A: The paper asks how classical statistical decision theory — admissibility, maximin welfare, minimax regret — applies when the data only partially identify the payoff-relevant parameters governing a binary treatment choice. Prior literature had developed these criteria for point-identified settings; this paper characterizes how partial identification fundamentally changes the answers.

Q: What is the formal framework? A: The policy maker chooses a in [0,1] (population share assigned to the new policy) with welfare W(a,theta) = a*W(1,theta) + (1-a)*W(0,theta), linear in a. The data are Y ~ N(m(theta), Sigma) with known m and Sigma. Partial identification arises when m is not injective, so distinct parameter values theta and theta’ with opposite-sign welfare contrasts U(theta) = W(1,theta) - W(0,theta) can produce the same data distribution.

Q: Why does admissibility lose all refinement power under partial identification? A: Theorem 1 shows that every decision rule is admissible when there is nontrivial partial identification. The mechanism is Gaussian completeness: if a dominating rule d’ existed, then for every data distribution in the model, d and d’ would have equal expected values, which by completeness implies d = d’ almost everywhere — a contradiction. This relies on the fact that nontrivial partial identification ensures that each data distribution is compatible with both positive and negative welfare contrasts, preventing the construction of a uniformly dominating rule.

Q: What is the contrast with point-identified settings? A: In the scalar point-identified case (n=1, m(theta)=theta, W(1,theta)=theta, W(0,theta)=0), Karlin and Rubin’s (1956) theorem implies any non-threshold rule is dominated; admissibility restricts attention to threshold rules. Partial identification completely eliminates this refinement: even randomized or otherwise arbitrary rules are admissible.

Q: What does the maximin welfare criterion recommend? A: Theorem 2 shows that when the status quo welfare equals the infimum of welfare over states with non-positive welfare contrast, the maximin optimal rule is d(y) = 0 for all y — preserve the status quo regardless of the data. In the running evidence-aggregation example, maximin welfare equals zero and is achieved by never assigning the new policy. The criterion ignores all data because the worst case is always achieved at states where the new policy performs no better than the status quo.

Q: What is the minimax regret criterion and why is it preferred? A: Expected regret at state theta is R(d,theta) = U(theta)*{1{U(theta)>=0} - E[d(Y)]} — the expected welfare loss relative to the oracle who knows theta. A rule is MMR optimal if it minimizes worst-case expected regret. Unlike maximin welfare, MMR uses data and balances risks across states. In point-identified settings it yields essentially unique, nonrandomized rules.
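The regret formula can be evaluated numerically in the simplest benchmark. The sketch below assumes the scalar point-identified case with U(θ) = θ and Y ~ N(θ, σ²), and evaluates the threshold rule d(y) = 1{y ≥ 0}; the grid search and function names are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def expected_regret(theta, prob_treat):
    """R(d, theta) = U(theta) * (1{U(theta) >= 0} - E[d(Y)]) with U(theta) = theta."""
    oracle_action = 1.0 if theta >= 0 else 0.0
    return theta * (oracle_action - prob_treat)

def threshold_rule_regret(theta, sigma=1.0):
    """Regret of d(y) = 1{y >= 0} when Y ~ N(theta, sigma^2)."""
    # The rule treats whenever Y >= 0, so E[d(Y)] = Phi(theta / sigma).
    return expected_regret(theta, PHI(theta / sigma))

# Search a grid of states for the worst case (sigma = 1).
grid = [i / 1000 for i in range(-4000, 4001)]
worst_theta = max(grid, key=threshold_rule_regret)
worst_regret = threshold_rule_regret(worst_theta)
print(f"worst-case regret {worst_regret:.3f} at |theta| = {abs(worst_theta):.3f}")
```

Regret vanishes both at θ = 0, where the choice does not matter, and for large |θ|, where the data almost always point the right way, so the worst case sits at an intermediate state (about 0.75σ in this benchmark, with regret of roughly 0.17σ).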

Q: How does partial identification change the MMR solution set? A: Theorem 3 shows that when the identified set is large enough — I(0) is sufficiently large and there exists mu with I(mu) > I(0) — there are infinitely many MMR optimal rules, and every symmetric, weakly increasing MMR rule depending on the sufficient statistic (w*)^T Y must randomize for some data realizations. If I(mu) is differentiable at zero, no linear threshold rule is MMR optimal. Different MMR rules can recommend different policies for the same data, creating a nontrivial multiplicity problem.

Q: How is the least randomizing MMR rule characterized? A: Theorem 4 shows that among all symmetric, weakly increasing, unimodal MMR optimal rules that depend on data only through (w*)^T Y, the rule d*_linear has the smallest randomization region: every other distinct rule in this class has a strictly wider randomization region, V(d*_linear) ⊆ V(F∘w*) with strict inclusion when F ≠ d*_linear. The rule is essentially unique within this class, and the result provides a pragmatic refinement of the MMR solution set.

Q: What is profiled regret and why is it used? A: Profiled regret reports worst-case expected regret at each fixed value of the point-identified parameters, rather than worst-case over all parameters jointly. Proposition 2 shows that the least randomizing rule d*_linear can profiled-regret dominate the Stoye (2012a)/Yata (2023) MMR rule in the running example. Proposition 3 shows that the uniformly randomizing rule is profiled-regret inadmissible when profiling over point-identified parameters. This concept provides an additional selection criterion within the MMR solution set.

Q: Can the least randomizing rule be derived from an explicit welfare penalty? A: Proposition 4 shows that, under some conditions, d*_linear is minimax regret optimal within the class of rules that penalize all randomized assignments equally. This connects the least randomizing criterion to a modified welfare function that treats randomization itself as costly, providing an interpretation for the refinement beyond mere pragmatics.

Q: What does the evidence aggregation application show? A: In the Ishihara-Kitagawa (2021) framework — extrapolating effects from n source countries to a target country using Lipschitz smoothness — the least randomizing rule randomizes only (though not always) when the estimated bounds on the target treatment effect contain both positive and negative values. When bounds are entirely positive or entirely negative, the rule recommends a deterministic action. This shows how identified-set estimators directly enter decision-theoretically optimal rules.

Q: What does the LATE extrapolation application show? A: In the Mogstad et al. (2018) setting with a binary instrument and no covariates, where the payoff-relevant parameter is a policy-relevant treatment effect corresponding to expanding the complier subpopulation, Theorem 1 applies: all decision rules are admissible. In particular, the IV threshold rule — implement the policy for large IV estimates — is not dominated, providing decision-theoretic grounding for a common empirical practice.

Q: What does the omitted variable bias application show? A: In the Diegert et al. (2022) setting where the identified set for the long regression coefficient given the medium regression coefficient is [beta_med - k, beta_med + k], the least randomizing MMR rule is d*_linear(beta_hat_med) when k > sqrt(pi/2) * sigma. The decision-theoretic breakdown point — the largest k under which the seemingly better policy should be adopted without randomization — is strictly larger than Diegert et al.’s sensitivity breakdown point, meaning the decision-theoretic approach tolerates more confounding before recommending hedging.
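
The bound on k quoted above is easy to evaluate numerically. The sketch below only computes sqrt(pi/2) * sigma and compares a given k against it. The function names are hypothetical, and the sketch does not implement d*_linear itself.

```python
import math

def randomization_threshold(sigma: float) -> float:
    """Return sqrt(pi/2) * sigma, the bound on k quoted in the text."""
    return math.sqrt(math.pi / 2.0) * sigma

def exceeds_threshold(k: float, sigma: float) -> bool:
    """True when k > sqrt(pi/2) * sigma."""
    return k > randomization_threshold(sigma)

# With sigma = 1 the threshold is roughly 1.25 standard errors.
print(round(randomization_threshold(1.0), 4))
```

The check takes k and sigma in the same units as the medium regression coefficient, so the threshold scales one-for-one with the standard error.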

Q: How does Theorem 5 generalize Theorem 1 beyond Gaussian likelihoods? A: Theorem 5 extends the admissibility result by connecting it to bounded completeness of the statistical model rather than Gaussian-specific completeness. This shows that the collapse of admissibility’s refinement power is not an artifact of normality but a general consequence of partial identification combined with a sufficiently rich statistical model.

Q: What is the paper’s broader implication for empirical practice? A: The results show that under partial identification, two of the three classical decision-theoretic criteria (admissibility and maximin welfare) provide no useful guidance — the former because everything passes, the latter because it ignores data entirely. MMR remains the operative criterion but yields infinitely many rules, all requiring some randomization. The least randomizing refinement provides a unique, practically implementable rule that connects to estimated identified sets and tolerates more ambiguity than purely statistical sensitivity analyses.

Partial identification: A setting where even infinite data cannot uniquely determine payoff-relevant parameters, because the mean function m mapping parameters to data distributions is not injective. Distinct parameter values with opposite-sign welfare contrasts may be observationally equivalent.

Welfare contrast U(theta): The difference W(1,theta) - W(0,theta) between the welfare under the new policy and under the status quo at parameter theta. The oracle optimal action is 1{U(theta) >= 0}.

Admissibility (welfare): A rule d is admissible if no rule d’ weakly dominates it in expected welfare at every theta with strict improvement at some theta. Under partial identification with Gaussian likelihood, every rule is admissible — admissibility has no refinement power.

Maximin welfare optimality: A rule is maximin optimal if it attains the highest worst-case expected welfare. Under partial identification, this criterion selects the no-data rule (always preserve status quo) whenever the status quo welfare equals the infimum over states with non-positive welfare contrast.

Minimax regret (MMR) optimality: A rule minimizes the worst-case expected welfare loss relative to the oracle action. Under severe enough partial identification, MMR optimal rules are non-unique and all require randomizing policy recommendations for some data realizations.
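
Written out with the regret expression from the Q&A above, the MMR criterion can be stated compactly. The label d^{MMR} and the subscript on the expectation are notational assumptions added here for readability.

```latex
R(d,\theta) = U(\theta)\,\bigl(\mathbf{1}\{U(\theta)\ge 0\} - \mathbb{E}_{\theta}[d(Y)]\bigr),
\qquad
d^{\mathrm{MMR}} \in \arg\min_{d}\; \sup_{\theta}\; R(d,\theta).
```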

Least randomizing MMR rule (d*_linear): The unique MMR optimal rule with the smallest randomization region among all symmetric, weakly increasing, unimodal MMR rules depending on the sufficient statistic. Characterized in Theorem 4; randomizes only when estimated identified set bounds straddle zero in the running example.

Profiled regret: The worst-case expected regret at each fixed value of the point-identified parameters, treating them as a parameter of interest and profiling out the partially identified parameters. Provides a finer ranking within the MMR solution set and renders the uniformly randomizing rule inadmissible.

Demand Analysis under Latent Choice Constraints

Agarwal and Somaini study demand estimation in markets where consumers face latent choice constraints — situations where a consumer’s effective choice set is determined not only by her preferences but also by supply-side rationing or information frictions that restrict which options are actually available to her. Standard discrete choice methods assume consumers pick freely from the full product set, but this assumption fails in school and college admissions, entry-level labor markets, healthcare with selective admissions, and consumer markets with incomplete consideration sets. The paper provides a unified non-parametric identification framework for this class of models, proves necessity of the identifying instruments, proposes a computationally tractable estimator, and applies the framework to the California kidney dialysis market.

The model combines a general random utility specification — accommodating multi-dimensional unobserved heterogeneity and product-level unobservables correlated with observed characteristics as in Berry (1994) and BLP (1995) — with a reduced-form acceptance policy function that governs which products accept which consumers. The consumer’s latent choice set is the set of products that accept her, and she picks her most preferred option within that set. Crucially, the acceptance decision may be arbitrarily correlated with consumer preferences, ruling out the independence assumptions common in the consideration-set literature.

Identification rests on two sets of instruments. The first is a preference shifter, a consumer-product observable that affects utility but is excluded from the acceptance policy — distance to facility in the application. The second is a choice-set shifter, an observable that affects the acceptance decision but is excluded from consumer utility — short-term deviation of a facility’s caseload from its estimated target in the application. The main result (Theorem 1) establishes non-parametric point identification of the joint distribution of indirect utilities and acceptance decisions given both instruments. Proposition 1 establishes that the model is not identified when the choice-set shifter is absent — even when the preference shifter has full support — making both instruments necessary rather than merely sufficient.

The application uses USRDS data on 41,913 new dialysis patients treated at 552 California facilities between 2015 and 2018. Most facilities are owned by Fresenius or DaVita. The choice-set shifter is the facility’s caseload deviation from target when a patient enters the market; facility and quarter fixed effects are included so that only short-term caseload variation drives identification. A reduced-form regression shows that higher caseload deviation significantly reduces the inflow of new patients to a facility, consistent with supply-side rationing. Patients also choose more distant facilities when nearby facilities have above-normal caseloads, providing further reduced-form evidence that rationing shapes allocations.

A Gibbs sampler with data augmentation — drawing alternately from the distribution of latent choice sets conditional on utilities and from utility parameters conditional on choice sets — circumvents the curse of dimensionality that makes direct likelihood maximization over all possible choice sets infeasible.

Estimation results show that the probability a patient is accepted at her first-choice facility is only 73.0%, with variation across facilities. Standard discrete choice models that ignore rationing misestimate facility quality, systematically assigning high desirability to low-caseload facilities in a manner that conflates easy access with genuine patient preference. A naive correction that includes the caseload measure in the utility function mischaracterizes the diversion pattern: rationed patients are marginal for the facility but strictly prefer it, so they divert differently from patients who voluntarily switch because of quality changes. Fresenius and DaVita facilities are estimated to be more selective than independent facilities, consistent with chain networks enabling coordinated patient-flow management across locations.

Q: What is the core empirical problem the paper addresses? A: Standard demand estimation inverts market shares to recover preference parameters under the assumption that consumers choose freely from the full product set. When choice sets are constrained by supply-side rationing or information frictions, the product with the largest market share need not be the most preferred one; it may simply be the one that accepts the most consumers. This makes the standard inversion inapplicable, and ignoring constraints yields biased preference estimates.

Q: What does the paper’s model consist of? A: The model has two components: (1) a random utility model for consumer preferences with rich observed and unobserved heterogeneity, allowing product-level unobservables correlated with observed characteristics; and (2) a reduced-form acceptance policy function sigma_jt taking values in {0,1} that determines whether product j accepts consumer i. The consumer’s latent choice set is the set of products that accept her; she picks her most preferred option within it. Utilities and acceptance decisions may be arbitrarily correlated.
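
The choice rule in this model can be sketched in a few lines: the consumer takes the highest-utility product among those whose acceptance indicator equals 1. The function name and the convention of returning None when no product accepts are assumptions made for illustration.

```python
from typing import Optional, Sequence

def chosen_product(utilities: Sequence[float],
                   accepts: Sequence[int]) -> Optional[int]:
    """Return the index of the most preferred product among those that accept.

    accepts[j] plays the role of the acceptance indicator in {0, 1}.
    Returns None when no product accepts (an outside option is assumed here).
    """
    available = [j for j, a in enumerate(accepts) if a == 1]
    if not available:
        return None
    return max(available, key=lambda j: utilities[j])

# Product 0 is most preferred but rejects, so the consumer lands at product 1.
print(chosen_product([3.0, 2.0, 1.0], [0, 1, 1]))  # prints 1
```

The example shows why observed choices alone can mislead: product 1 is chosen even though product 0 yields higher utility.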

Q: What examples of latent choice constraints are covered by the framework? A: The reduced form encompasses: selective admissions in healthcare (facility accepts patient if profitability exceeds a caseload-dependent threshold); two-sided matching markets where a pairwise stable allocation is described by cutoff scores (school admissions, entry-level labor markets); consideration set models where brand awareness advertising or inattention determines which products a consumer sees; fixed-sample consumer search; and product stock-outs. Each of these implies an acceptance policy function of the form specified in the paper’s reduced-form model.

Q: What are the two identifying instruments and the intuition behind each? A: The preference shifter y_ij is a consumer-product observable that affects the consumer’s indirect utility for product j but is excluded from that product’s acceptance decision. In the application this is distance: dialysis requires multiple weekly visits, so distance affects patient utility, but a facility’s decision to accept a patient does not depend on how far the patient lives. The choice-set shifter z_ij is an observable that affects the acceptance decision but is excluded from consumer preferences. In the application this is the deviation of facility caseload from its estimated target: short-term caseload swings affect whether a facility can take a new patient but, conditional on facility fixed effects, do not reflect facility quality as perceived by patients.

Q: What does Theorem 1 establish and under what conditions? A: Theorem 1 establishes non-parametric point identification of (i) the function gj mapping the preference shifter to its utility contribution, and (ii) the joint distribution of indirect utilities and acceptance indicators, for every consumer attribute vector and every value in the interior of the joint support of the instruments. Conditions required include: monotonicity of the acceptance policy in the choice-set shifter (higher z makes acceptance weakly less likely, with sigma=1 as z approaches negative infinity and sigma=0 as z approaches positive infinity); conditional independence of unobservables from the instruments given observed consumer attributes; and at least two products available.

Q: What does Proposition 1 establish about necessity of the choice-set shifter? A: Proposition 1 shows that if the choice-set shifter z has singleton support (no variation), then even when the preference shifter g has full support on R^|J|, the distribution of preferences is not identified wherever a choice set strictly smaller than the full product set has positive probability. The non-identification result applies on any open set where a constrained choice set has positive probability — it is not a knife-edge case. This makes the choice-set shifter a necessary condition for identification, not merely a convenient one.

Q: How does the paper handle endogeneity of product characteristics? A: Corollary 2 extends the baseline identification result to allow product-level unobservables that may be correlated with observed product characteristics, as in Berry (1994) and BLP (1995). Identification in this case requires an additional instrument that shifts product characteristics but is excluded from both preferences and choice sets — analogous to BLP supply-side instruments — alongside the two shifters already required. This extends Berry and Haile (2010) to settings with constrained choice sets.

Q: What is the Gibbs sampler estimator and why is it needed? A: With J products per market, the number of possible choice sets is 2^J, making direct likelihood computation infeasible for even moderate J. The Gibbs sampler uses data augmentation to alternate between: (a) drawing latent choice sets conditional on current utility parameters and observed choices; and (b) drawing utility parameters conditional on the augmented choice sets. Each conditional draw reduces to a standard problem, avoiding the curse of dimensionality. The Bernstein-von Mises theorem implies that the posterior mean of the sampling chain is asymptotically equivalent to the maximum likelihood estimator.
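
To see why direct likelihood computation scales poorly, the sketch below computes allocation probabilities by brute-force summation over all 2^J latent choice sets. It is not the Gibbs sampler. It also assumes acceptance is independent across products with known probabilities, a simplification the paper does not impose. All names and numbers are illustrative.

```python
from itertools import product

def allocation_probabilities(utilities, accept_probs):
    """Sum over all 2^J latent choice sets to get allocation probabilities.

    Assumption for illustration: acceptance is independent across products
    with known probabilities; the paper allows arbitrary correlation.
    Returns (probs, n_sets), where probs[j] is the chance of ending at j and
    probs[None] is the chance that no product accepts.
    """
    n_products = len(utilities)
    probs = {j: 0.0 for j in range(n_products)}
    probs[None] = 0.0
    n_sets = 0
    for accepts in product([0, 1], repeat=n_products):
        n_sets += 1
        # Probability of this particular latent choice set.
        weight = 1.0
        for a, p in zip(accepts, accept_probs):
            weight *= p if a == 1 else 1.0 - p
        available = [j for j in range(n_products) if accepts[j] == 1]
        choice = max(available, key=lambda j: utilities[j]) if available else None
        probs[choice] += weight
    return probs, n_sets

probs, n_sets = allocation_probabilities([3.0, 2.0, 1.0], [0.73, 0.9, 0.5])
print(n_sets)  # 8 choice sets for J = 3
```

The loop visits 2^J sets, so the work doubles with each added product. That growth is what the data-augmentation approach described above is meant to avoid.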

Q: What is the reduced-form evidence for supply-side rationing in dialysis? A: The regression of log(1 + new patient inflows to facility j in quarter q) on facility fixed effects, quarter fixed effects, and the caseload deviation z_jq yields a statistically significant negative coefficient on caseload deviation: above-target caseloads reduce new patient admissions even after controlling for facility-level and time-level averages. Additionally, patients whose nearest facilities have above-normal caseloads travel to more distant facilities, providing complementary evidence that rationing displaces patients geographically.
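
A regression of this form can be sketched with a two-way within transformation. The example below uses synthetic data only: 50 hypothetical facilities, 16 quarters, and an assumed true coefficient of -0.5. It estimates the slope on a balanced panel in pure Python. The outcome here stands in for log(1 + inflows), and none of the numbers come from the paper.

```python
import random

def two_way_fe_slope(y, x):
    """OLS slope of y on x with unit and period fixed effects (balanced panel).

    y and x are lists of lists indexed [unit][period]. For a balanced panel,
    the transformation v - unit mean - period mean + grand mean removes both
    sets of fixed effects.
    """
    n, t = len(y), len(y[0])

    def demean(v):
        unit_mean = [sum(row) / t for row in v]
        period_mean = [sum(v[i][s] for i in range(n)) / n for s in range(t)]
        grand = sum(unit_mean) / n
        return [[v[i][s] - unit_mean[i] - period_mean[s] + grand
                 for s in range(t)] for i in range(n)]

    yd, xd = demean(y), demean(x)
    sxy = sum(xd[i][s] * yd[i][s] for i in range(n) for s in range(t))
    sxx = sum(xd[i][s] ** 2 for i in range(n) for s in range(t))
    return sxy / sxx

# Synthetic data only: hypothetical facilities and quarters, true slope -0.5.
rng = random.Random(0)
n_fac, n_qtr, beta = 50, 16, -0.5
fac_fe = [rng.gauss(0, 1) for _ in range(n_fac)]
qtr_fe = [rng.gauss(0, 1) for _ in range(n_qtr)]
z = [[rng.gauss(0, 1) for _ in range(n_qtr)] for _ in range(n_fac)]
y = [[fac_fe[j] + qtr_fe[q] + beta * z[j][q] + rng.gauss(0, 0.1)
      for q in range(n_qtr)] for j in range(n_fac)]
beta_hat = two_way_fe_slope(y, z)
print(round(beta_hat, 2))
```

Because the fixed effects are removed before the slope is computed, only within-facility, within-quarter variation in the caseload deviation identifies the coefficient, as in the text.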

Q: What is the estimated probability of acceptance at a first-choice facility? A: The structural estimates imply that a patient is accepted at her first-choice facility with probability only 73.0%, with variation across facilities. The implied 27.0% rejection rate is economically substantial, meaning a large share of observed allocations do not reflect unconstrained patient preference.

Q: How do estimates from the constrained model differ from a standard discrete choice model? A: The standard model, which ignores selective admissions, assigns higher utility to facilities with lower caseloads — a bias that conflates easy access with genuine patient preference. The constrained model separately identifies the facility’s acceptance propensity from the patient’s underlying preference, yielding different facility quality rankings. The largest facilities are not necessarily the most desirable once selective admissions are accounted for.

Q: Why is the naive correction — including caseload in the utility function — insufficient? A: The naive correction treats caseload as a quality attribute, implying that a patient turned away because of high caseload and a patient who voluntarily avoids a high-caseload facility are pulled from the same margin. In the constrained model, a rationed patient is marginal for the facility but strictly prefers it, so she diverts to a different set of alternatives than a patient who voluntarily switches. Not capturing this distinction produces quantitatively different diversion ratios.

Q: What do the estimates say about chain versus independent facilities? A: Fresenius and DaVita facilities are estimated to be more selective in their admissions than independent facilities. The paper interprets this as consistent with large chains having better ability to coordinate patient flows across their network of facilities, potentially directing turned-away patients to other chain locations.

Q: What is the scope of the identification results? A: Identification is established within each market, for consumer attribute vectors in the interior of support, and for utility-acceptance pairs in the interior of the joint support of the instruments. The results are non-parametric in that they do not restrict the functional form of preferences or acceptance policies beyond monotonicity and support conditions, and they allow unobservables affecting choice sets to be arbitrarily correlated with preference unobservables. The empirical application implements a parametric version for tractability.

Latent choice constraint: A restriction on a consumer’s effective choice set arising from supply-side rationing or information frictions, such that the consumer can only choose among the products that accept her rather than freely among all products in the market. Distinct from price-based market clearing.

Acceptance policy function: A reduced-form function mapping consumer attributes, consumer unobservables, and the choice-set shifter to a binary accept/reject decision by product j. Indexed by product and market, allowing arbitrary variation in selectivity across products and time. The consumer’s latent choice set is defined as the set of products whose acceptance policy equals 1.

Choice-set shifter: A consumer-product observable that shifts the acceptance probability — making product j more or less likely to accept consumer i — while being excluded from consumer indirect utility. In the application: short-term deviation of facility caseload from its estimated target. Necessary (not merely sufficient) for non-parametric identification of the model.

Preference shifter: A consumer-product observable that shifts consumer utility for product j and is separable from consumer-specific unobservables, but is excluded from that product’s acceptance policy function. In the application: distance from patient’s residence to the facility. Also necessary for identification.

Curse of dimensionality in constrained choice: The computational problem that the number of possible latent choice sets grows as 2^J with the number of products J, making direct likelihood integration over choice sets infeasible for even moderate J. Resolved in this paper by a Gibbs sampler with data augmentation that conditions alternately on latent choice sets or utility parameters.

Diversion ratio under selective admissions: The share of patients lost by a facility who are captured by each alternative facility. In a model with selective admissions, rationed patients (marginal for the facility) divert differently from patients who voluntarily switch (marginal for the consumer), because rationed patients strictly prefer the rejecting facility. The naive correction conflates these two margins, yielding quantitatively different and biased diversion ratio estimates.

Non-parametric necessity of instruments: The property that both the preference shifter and the choice-set shifter are individually necessary conditions for point identification of the joint distribution of preferences and acceptance decisions, not merely convenient sufficient conditions. Absence of either instrument leaves the model non-identified on any open set where a constrained choice set has positive probability.

Demand Stimulus as Social Policy

This paper estimates the distributional and social consequences of Department of Defense (DOD) contract spending using a city-level (CBSA) panel dataset spanning 2005–2016. The research question is whether demand stimulus — specifically DOD spending, the largest category of U.S. discretionary government spending — has differential effects across demographic groups and whether it improves social outcomes typically targeted by dedicated government programs. A secondary question is whether these effects are specific to DOD spending or common to any demand shock.

The empirical strategy exploits variation in DOD contract spending from USAspending.gov, constructing a proxy for outlays over time using contract duration, and instrumenting with a Bartik-type shock (location’s average DOD share interacted with aggregate contract spending). The main specification is a two-year differenced panel regression with CBSA and time fixed effects. Social outcomes come primarily from the American Community Survey (ACS), covering 290 CBSAs; mortality data come from the CDC; crime data from the FBI/NACJD. For comparison, the authors construct a general demand shock series using the standard Bartik shift-share approach across two-digit industries, which is nearly uncorrelated with the DOD shock (correlation -0.07).
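
A Bartik-type shock of the kind described here can be sketched as the product of a location's average DOD share and the aggregate change in contract spending. The exact functional form below (share times aggregate growth rate) and the argument names are assumptions for illustration, not the paper's stated construction.

```python
def bartik_shock(avg_dod_share: float,
                 agg_spending_prev: float,
                 agg_spending_now: float) -> float:
    """Shift-share shock: local average DOD share times aggregate growth.

    avg_dod_share is the location's average share of DOD contract spending
    (the "share"); the aggregate growth rate is the common "shift".
    """
    agg_growth = (agg_spending_now - agg_spending_prev) / agg_spending_prev
    return avg_dod_share * agg_growth

# A location with a 2 percent share facing 10 percent aggregate growth.
print(bartik_shock(0.02, 100.0, 110.0))
```

The local share is an average rather than a contemporaneous value, so variation over time in the shock comes only from the aggregate series.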

Main findings on distributional effects: A 1 percent increase in DOD spending as a share of local earnings raises overall average ACS earnings by 0.43 percent but raises average earnings for households without a bachelor’s degree by 0.71 percent, and raises average earnings for Black households by a slightly larger amount, while Whites receive the majority of total income. The employment rate rises by 0.22 percentage points per percent increase in DOD spending. Labor force participation is largely unchanged in aggregate, but rises 0.08 percentage points for the middle-aged (41–61) and 0.14 percentage points for those with a bachelor’s degree.

On social outcomes: The poverty rate falls 0.08 percentage points, driven entirely by those without a bachelor’s degree. Food stamp (SNAP) receipt falls 0.08 percentage points. Self-reported disability rates fall, particularly among households without a bachelor’s degree. Occupational prestige rises by 0.024 points overall (0.037 for those without a bachelor’s degree). Travel time to work falls by 6.7 minutes per day, implying an annual benefit of approximately $558 per worker at a value of time of $10/hour. Marriage rates rise and divorce rates fall for some demographic groups. Homeownership increases significantly for some groups. Mortality falls, with 2.61 fewer deaths per 100,000 among those aged 45–65 and 8.49 fewer deaths per 100,000 among those over 65 per percent increase in DOD spending; health-related deaths account for the majority of the decline. Crime is largely unaffected, except for a statistically significant reduction in vehicle theft.

Comparing DOD to general demand shocks: Although both raise total earnings by similar amounts ($0.56 and $0.63 per dollar of shock, respectively), the general demand shock produces a much smaller employment rate response for households without a bachelor’s degree (14.3 vs. 24.5 percentage points, or roughly 58 percent as large), concentrates earnings gains among already-employed, higher-educated, and White households, produces weaker effects on disability and occupational prestige, increases mortality by approximately 100 deaths per 100,000, and increases crime (vehicle theft and aggravated assault). The differential mortality response is partly attributed to differential pollution effects: general demand shocks raise the median AQI substantially, while DOD shocks do not. The differential employment effects of DOD shocks are explained primarily by city and occupational composition rather than industry composition: DOD shocks are directed toward smaller, lower-earnings cities with lower employment rates and fewer college-educated residents, and toward construction, manufacturing, and production/maintenance occupations with high no-bachelor’s shares.

Scope conditions: Results are identified using CBSA-level variation over 2005–2016. DOD spending is treated as predominantly supply-side-driven and not directly entering household utility or local infrastructure. The social outcome results are local partial-equilibrium estimates and do not account for general equilibrium spillovers across CBSAs.

Q: What is the core identification strategy, and why is DOD spending considered a valid instrument for demand stimulus? A: DOD contract data from USAspending.gov are used to construct a proxy for outlays (distributing contract obligations over contract duration), and this measure is instrumented with a Bartik-type shock (location’s average DOD share times aggregate contract growth). The Bartik IV isolates the component of DOD contracts associated with new production, addressing endogeneity and the “anticipated contracts” problem. DOD spending is treated as predetermined relative to local business cycles and does not directly enter household utility or local infrastructure, isolating the aggregate demand channel.
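
The outlay proxy can be illustrated by spreading each contract's obligation over its duration and summing by year. Even spreading across calendar years is an assumption made here, since the source does not specify the allocation rule. The function names and contract tuples are hypothetical.

```python
def spread_obligation(amount: float, start_year: int, end_year: int) -> dict:
    """Spread a contract obligation evenly over the years of its duration.

    Even spreading by calendar year is an assumption made for illustration.
    """
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    n_years = end_year - start_year + 1
    return {year: amount / n_years for year in range(start_year, end_year + 1)}

def outlay_proxy(contracts):
    """Sum the spread obligations of many contracts into yearly outlays.

    contracts is an iterable of (amount, start_year, end_year) tuples.
    """
    totals = {}
    for amount, start, end in contracts:
        for year, value in spread_obligation(amount, start, end).items():
            totals[year] = totals.get(year, 0.0) + value
    return totals

print(outlay_proxy([(120.0, 2005, 2007), (50.0, 2007, 2007)]))
# {2005: 40.0, 2006: 40.0, 2007: 90.0}
```

Spreading obligations this way turns a lumpy series of award dates into a smoother series that is closer in timing to actual production.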

Q: Which demographic groups receive the most total income from DOD spending, and which see the largest relative gains? A: In absolute terms, the majority of wage and salary income from DOD spending accrues to Whites and to those without a bachelor’s degree. However, adjusting for existing income shares, Black households and households without a bachelor’s degree experience the largest proportional increases in average earnings: a 1 percent increase in DOD spending as a share of local earnings raises average earnings for no-bachelor’s households by 0.71 percent, compared to a 0.43 percent increase in overall average earnings.

Q: How does DOD spending affect employment at the extensive margin, and what does this imply about who benefits? A: A 1 percent increase in DOD spending as a share of local earnings raises the overall employment rate by 0.22 percentage points. The large employment response among those without a bachelor’s degree (24.5 percentage points in the comparative analysis) implies that DOD spending disproportionately benefits previously unemployed workers rather than simply raising wages for those already employed.

Q: Does DOD spending increase labor force participation? A: There is no detectable aggregate effect on labor force participation rates, suggesting limited effects of demand stimulus on the participation margin over short horizons. However, participation rises 0.08 percentage points for the middle-aged (41–61) and 0.14 percentage points for those with a bachelor’s degree. The population response is strongest for those without a bachelor’s degree, though the estimate is imprecise.

Q: What are the poverty and welfare effects of DOD spending? A: A 1 percent increase in DOD spending as a share of local earnings reduces the poverty rate by 0.08 percentage points, with the entire effect concentrated among households without a bachelor’s degree. SNAP (food stamp) receipt falls by 0.08 percentage points. Medicaid receipt falls significantly for young children, while children substitute into private health insurance, leaving overall child health insurance coverage unchanged.

Q: How does DOD spending affect disability rates? A: A 1 percent increase in DOD spending leads to a 0.001 percentage point reduction in self-reported disability rates among households without a bachelor’s degree. The effect is most apparent for this group, the middle-aged, and Whites. In the comparative analysis, the employment margin accounts for a change in disability of -0.051 for no-bachelor’s households, nearly half of the total change of -0.114 for that group.

Q: What are the occupational prestige and commute time effects? A: A 1 percent increase in DOD spending raises a city’s average occupational prestige score (Siegel score) by 0.024 points, with the effect concentrated among no-bachelor’s households (0.037). Commute time falls by 6.7 minutes per day; at a value of time of $10/hour, this implies an annual benefit of approximately $558 per worker.
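
The dollar value of the commute saving can be checked with simple arithmetic. The sketch below assumes 500 commute units per year (for example, two trips on each of 250 workdays). This multiplier is an assumption chosen because it reproduces the reported figure; the source does not state it.

```python
def annual_commute_value(minutes_saved: float,
                         value_per_hour: float,
                         units_per_year: int) -> float:
    """Dollar value of time saved: (minutes / 60) * hourly value * units."""
    return minutes_saved / 60.0 * value_per_hour * units_per_year

# Assumed multiplier of 500 units per year matches the reported $558.
print(round(annual_commute_value(6.7, 10.0, 500)))  # 558
# With 250 units per year the same inputs give about half that amount.
print(round(annual_commute_value(6.7, 10.0, 250)))  # 279
```

The comparison shows how sensitive the annual figure is to whether the 6.7 minutes applies per day or per trip.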

Q: How does DOD spending affect household formation outcomes? A: Marriage rates increase and the likelihood of single parenthood decreases for White households. Divorce rates decrease for middle-aged and Black households. White households become more likely to own homes and less likely to live in multi-family homes. Estimates for Black and Hispanic households are imprecise.

Q: What are the mortality effects of DOD spending, and how do they compare to general demand shocks? A: A 1 percent increase in DOD spending as a share of local income leads to 2.61 fewer deaths per 100,000 among those aged 45–65 and 8.49 fewer deaths per 100,000 among those over 65, with health-related deaths accounting for the majority of the decline. This implies the DOD must spend approximately $25 million to save the life of one person aged 45–65, which exceeds the typical value of a statistical life. By contrast, a general demand shock increases mortality by approximately 100 deaths per 100,000, consistent with Ruhm’s (2000) finding that mortality is procyclical; mortality increases from general shocks are also concentrated among those over 45.

Q: What explains the divergent mortality effects of DOD and general demand shocks? A: One mechanism explored is pollution: general demand shocks raise median AQI substantially while DOD shocks leave AQI largely unaffected, consistent with Ruhm’s (2000) emphasis on deteriorating health behaviors during expansions. The paper also points to differential occupational and geographic composition: DOD shocks flow to construction, manufacturing, and production/maintenance occupations rather than to higher-pollution or higher-accident-risk activities common in broad economic expansions.

Q: How do the crime effects differ between DOD and general demand shocks? A: DOD spending shocks are associated with a statistically significant reduction in vehicle theft but no significant change in other crime categories. General demand shocks, by contrast, appear to increase vehicle theft and aggravated assault. Voter turnout falls substantially in response to a general demand shock; both shock types reduce Democratic vote shares.

Q: What is the key mechanism explaining why DOD shocks have stronger social effects than general demand shocks? A: Despite similar average earnings effects for no-bachelor’s households (0.71 for DOD vs. 0.69 for general shocks), DOD shocks produce a much larger employment rate increase for that group (24.5 vs. 14.3 percentage points). The authors show that this employment margin accounts for large shares of the differential declines in poverty, food stamp receipt, disability, and improvements in marriage rates and occupational prestige.

Q: What accounts for the differential employment effects on no-bachelor’s households between DOD and general demand shocks? A: Of the 0.21 percentage point differential employment effect, roughly one quarter is associated with differences in the no-bachelor’s share across industries. Differences across cities and across occupations each account for much larger shares. DOD shocks are directed toward smaller, lower-income, lower-employment cities with fewer college-educated residents, while general demand shocks go to larger, richer cities with more elastic housing supply and higher education levels.

Q: Which industries and occupations drive DOD’s stronger employment effects for no-bachelor’s workers? A: Within industries, DOD-induced employment gains for no-bachelor’s workers are strongest in construction and manufacturing, with much milder effects from general demand shocks in these industries. The occupations benefiting most are military occupations (broadly defined) and Production and Maintenance occupations, which rank among the lowest in occupational prestige for no-bachelor’s workers.

Q: How does DOD spending compare to targeted social programs in achieving distributional goals? A: The paper argues that although DOD spending is not designed as social policy, its effects on earnings for households without a bachelor’s degree, poverty reduction, disability reduction, homeownership, and occupational upgrading mirror the stated objectives of many targeted programs (job training, housing subsidies, SNAP, Medicaid). At the same time, DOD-induced life savings cost approximately $25–45 million per life, exceeding the typical value of a statistical life, so the mortality benefits cannot alone justify the spending.

Local DOD earnings multiplier: The dollar amount of earnings for a demographic group produced by a dollar of local DOD spending over a two-year period, estimated using a two-year differenced panel regression with CBSA and time fixed effects, instrumented by a Bartik-type shock.

Bartik-type IV shock: An instrumental variable constructed as the product of a location’s average share of DOD contract spending and aggregate contract spending in a given period; used to isolate the component of DOD contracts associated with new production rather than anticipated or smoothed payments.

General demand shock: A Bartik shift-share shock constructed from local industry employment shares and national industry-level growth rates across all private-sector industries, used as a comparison series to evaluate whether DOD spending effects are generic or specific to defense contracts (correlation with DOD shock: -0.07).
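
The two shift-share constructions defined above can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the paper's code: the CBSA names, industry labels, and all values are hypothetical, and the DOD instrument simply follows the verbal definition (a location's average contract share times aggregate contract spending in the period).

```python
# Hypothetical inputs: every name and number below is made up for illustration.

# DOD Bartik-type instrument: average local contract share x aggregate spending.
avg_dod_share = {"cbsa_a": 0.02, "cbsa_b": 0.005}  # average share of national DOD contracts
aggregate_dod = {2010: 300.0, 2012: 330.0}         # national contract spending by period


def dod_bartik(share: float, aggregate: float) -> float:
    """Predicted local contract spending: average share times aggregate spending."""
    return share * aggregate


# General demand shock: local industry employment shares x national industry growth.
emp_shares = {
    "cbsa_a": {"manufacturing": 0.5, "services": 0.5},
    "cbsa_b": {"manufacturing": 0.2, "services": 0.8},
}
national_growth = {"manufacturing": -0.02, "services": 0.03}


def general_demand_shock(shares: dict, growth: dict) -> float:
    """Shift-share shock: sum over industries of local share times national growth."""
    return sum(shares[ind] * growth[ind] for ind in shares)


dod_iv = {
    (cbsa, period): dod_bartik(share, total)
    for cbsa, share in avg_dod_share.items()
    for period, total in aggregate_dod.items()
}
general = {cbsa: general_demand_shock(s, national_growth) for cbsa, s in emp_shares.items()}
```

In both cases the local variation comes only from fixed shares interacted with a national series, which is what lets the shock serve as an instrument for local spending or demand.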

Extensive margin of employment: The change in the employment rate (entry from unemployment or non-participation into employment) as distinct from hours or wage adjustments among the already-employed; identified in the paper as the primary mechanism linking DOD shocks to differential social outcomes for no-bachelor’s households.

Deaths of despair: Drug-and-alcohol-related deaths and deaths by suicide, following Case and Deaton (2020); examined here at higher frequency as an outcome of labor market earnings changes induced by aggregate demand stimulus.

Occupational prestige (Siegel prestige score): A summary measure of job quality based on survey-derived perceptions of occupational standing (Siegel 1971), aggregated to the CBSA level by demographic group; used as a measure of upward job-ladder mobility in response to demand stimulus.

Destabilizing Capital Flows amid Global Inflation

Layer 1: Overview

Research Question

Bengui and Coulibaly ask whether the pattern of capital flows observed during the 2021–2023 global monetary tightening cycle, in which capital flowed from low-inflation to high-inflation countries, was a stabilizing or destabilizing force for the global economy’s adjustment to cost-push shocks. Among the G7 and a broader sample of 26 jurisdictions, those with higher average CPI inflation (October 2021–March 2023) and larger cumulative interest rate hikes ran more negative current account balances over the same period. The slope of the cross-sectional relationship between cumulative hikes and the current account is −1.29, and the slope between average inflation and the current account is −0.99 (both significant at 1%). Over 75% of jurisdictions in the top two quartiles of hikers ran deficits, while over 75% of those in the bottom two quartiles ran surpluses.

Model and Methodology

The authors build a standard continuous-time two-country general equilibrium model with nominal rigidities (Calvo price-setting), internationally traded bonds, and cost-push shocks modeled as wage markup shocks that create an output-inflation trade-off. The baseline model features no home bias (equal weights on domestic and foreign goods) and two tradable goods. Extensions introduce (i) consumption home bias (parameter α ∈ [0, 1/2]) and (ii) non-tradable goods. Policy is analyzed under two regimes: (a) free capital mobility (no taxes on financial transactions) with optimal cooperative monetary policy, and (b) a managed capital flow regime in which a planner jointly optimizes both monetary policy and a tax wedge on the international bond (τ^D_t). A second-order approximation of household utility yields a loss function penalizing world and cross-country output gaps, PPI inflation differentials, and the demand imbalance term θ_t. The quantitative section replaces optimal monetary policy with standard Taylor rules (φ_π = 1.5, φ_y = 0.25) and calibrates a Home cost-push shock to generate a peak CPI inflation rate of about 7%, with an annual autocorrelation of 0.65.

Main Findings

The paper’s central theoretical result (Proposition 2, “Topsy-Turvy Capital Flows”) is that, under the Marshall-Lerner condition (trade elasticity η > 1), a free capital mobility regime channels capital into the country with the most acute inflationary pressures — the very country whose central bank is most aggressively tightening — while the constrained-efficient managed regime would channel capital in the opposite direction. The mechanism operates through the supply side: capital inflows raise domestic households’ wealth, reducing their labor supply and thereby raising real wages and firms’ marginal costs. In the presence of non-tradable goods, an additional channel operates through the real exchange rate — capital inflows appreciate the domestic real exchange rate and inflate tradable-sector firms’ marginal costs independently of labor supply. Both channels worsen the central bank’s output-inflation trade-off.

In the quantitative exercise (Taylor rule setting, home bias α = 0.25, trade elasticity χ = 3), following the calibrated inflationary cost-push shock in Home:

Under free capital mobility: Home inflation rises to 8% on impact; Home output gap reaches −8.4%; Foreign output gap reaches +2.4%; Home runs a trade deficit of 2.5% of GDP on impact; Home’s initial policy rate hike is nearly 10% while Foreign’s is less than 1%.
Under the managed capital flow regime (capital flows reversed to outflows from Home): Home inflation on impact falls to nearly 6% (a reduction of approximately 2 percentage points); Home output gap is −6.8% (improvement of about 1.5 percentage points); Foreign output gap is 0.8% (improvement of about 1.5 percentage points); Home runs a trade surplus of 0.6% of GDP; Home’s initial hike falls to approximately 8% (roughly 2 percentage points lower) while Foreign’s rises to approximately 2.5% (roughly 1.5 percentage points higher).
The managed regime delivers average welfare gains of 0.78% of current consumption (0.03% of permanent consumption). Welfare gains are increasing in the trade elasticity η: at η = 10 (consistent with Yi 2003’s bilateral trade flow estimates), gains reach approximately 0.08% of permanent consumption or 1.9% of current consumption.
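
The reported Home policy rate under free capital mobility can be checked against the Taylor rule coefficients given above. The sketch below assumes the rule takes the form i = φ_π·π + φ_y·y with no intercept and that it responds to the reported impact inflation and output gap; both are assumptions, since the summary does not spell out the rule's exact form or its inflation measure.

```python
PHI_PI = 1.5   # inflation coefficient reported in the summary
PHI_Y = 0.25   # output gap coefficient reported in the summary


def taylor_rate(inflation_pct: float, output_gap_pct: float) -> float:
    """Policy rate response (percent) under an assumed intercept-free Taylor rule."""
    return PHI_PI * inflation_pct + PHI_Y * output_gap_pct


# Reported impact values for Home under free capital mobility.
home_free = taylor_rate(inflation_pct=8.0, output_gap_pct=-8.4)
print(round(home_free, 2))  # 9.9
```

Under these assumptions the rule gives 9.9, in line with the "nearly 10%" initial hike reported for Home.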

Scope Conditions

The topsy-turvy result (free mobility channels capital in the wrong direction) holds conditional on the Marshall-Lerner condition (η > 1 in the baseline; equivalently, the trade elasticity χ > 1). With consumption home bias, the condition weakens to requiring that the trade elasticity exceed the degree of home bias (χ > 1 − 2α, which is weaker than Marshall-Lerner). When home bias is strong relative to the trade elasticity, a purchasing power effect may dominate the wealth effect, and free capital mobility may instead deliver too little capital flow toward the depressed country, which is the opposite inefficiency. The welfare analysis throughout assumes symmetric initial net foreign asset positions. The key insight is specific to environments in which monetary policy faces an output-inflation trade-off from cost-push shocks; it is directionally opposite to the aggregate demand externality prescription that arises in demand-shortage environments (e.g., currency unions with productivity shocks), where optimal policy instead calls for capital to flow toward the more depressed country.

In depth

Q1. What is the empirical motivation for the paper, and how is the stylized fact documented?

A1: During October 2021–March 2023, jurisdictions with higher average CPI inflation and larger cumulative policy rate hikes ran more negative current account balances. The cross-sectional slope between average inflation and the current account-to-GDP ratio is −0.99 (R² = 0.22, significant at 1%), while the slope between cumulative hikes and the current account is −1.29 (R² = 0.27, significant at 1%). Among the top two quartiles of cumulative hikers, over 75% of jurisdictions ran current account deficits, while among the bottom two quartiles over 75% ran surpluses. Data come from the BIS (inflation and policy rates) and the OECD Main Economic Indicators (quarterly current accounts), covering 26 jurisdictions excluding Argentina, Russia, and Turkey.

Q2. What is the core externality the paper identifies, and why do atomistic agents fail to internalize it?

A2: When a household in the high-inflation country borrows from abroad for consumption smoothing (as the domestic central bank tightens), it raises domestic consumption and thereby reduces labor supply through a wealth effect, pushing up real wages and firms’ marginal costs. The central bank must then tighten further to achieve the same inflation stabilization, or accept a worse inflation outcome. Because this effect operates through economy-wide wages and prices (general equilibrium), atomistic households do not internalize it when making individual borrowing decisions. The paper shows formally that a marginal increase in Home borrowing dθ_t raises welfare losses by an amount proportional to the product of the Phillips curve slope κ, the co-state variable φ^D_t (equal to the cross-country output gap differential y^D_t under optimal monetary policy), and the direct effect on cross-country marginal cost differences (1/2). When output is more depressed in Home (y^D_t < 0), additional borrowing by Home tightens the constraint and lowers welfare.

Q3. What does the optimal capital flow management targeting rule say, and what is its economic interpretation?

A3: Proposition 1 states that under jointly optimal monetary and capital flow management, the demand imbalance (relative consumption) should satisfy θ_t = 2y^D_t. This means the planner generates a demand imbalance in favor of the less depressed country, reallocating spending away from the country with the most acute inflationary pressure. This is counterintuitive from a pure output stabilization view: policy deliberately shifts demand away from the country with the most depressed output. The logic is that reducing the domestic wealth of the high-inflation country lowers real wages, reduces firms’ marginal costs, and thereby relaxes the output-inflation trade-off for that country’s central bank.

Q4. What is the “topsy-turvy” capital flows result (Proposition 2), and under what condition does it hold?

A4: Under free capital mobility, standard neoclassical consumption-smoothing motives lead capital to flow into the country with the most depressed output (the high-inflation country): the trade balance equals [(η−1)/η]·y^D_t, which is a deficit for the more depressed country (y^D_t < 0) when η > 1. Under managed capital flows, the optimal regime instead mandates a trade surplus for the most depressed country: the trade balance equals −(1/η)·y^D_t. Comparing signs, the direction of capital flows is literally reversed, hence “topsy-turvy.” The result holds whenever Assumption 1 (η > 1, the Marshall-Lerner condition in the baseline model) is satisfied, which the authors argue has compelling empirical support (trade elasticities estimated at 7–17 in the literature).
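
The sign reversal can be seen by evaluating the two trade balance expressions from A4 side by side. The sketch is illustrative only: the function names and the value y^D_t = −0.05 are hypothetical, and a negative y^D_t is taken to mean that Home output is the more depressed.

```python
def nx_free(eta: float, y_d: float) -> float:
    """Home trade balance under free capital mobility: [(eta - 1) / eta] * y^D."""
    return (eta - 1.0) / eta * y_d


def nx_managed(eta: float, y_d: float) -> float:
    """Home trade balance under managed capital flows: -(1 / eta) * y^D."""
    return -y_d / eta


y_d = -0.05  # hypothetical value: Home is the more depressed country
for eta in (1.0, 2.0, 10.0):
    print(eta, nx_free(eta, y_d), nx_managed(eta, y_d))
```

For any η > 1 and y^D_t < 0, the free-mobility balance is negative (a deficit) and the managed balance is positive (a surplus). At η = 1 the free-mobility balance is zero, and for every η the gap between the two expressions equals y^D_t.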

Q5. How does the presence of home bias in consumption affect the externality and the topsy-turvy result?

A5: With home bias (α < 1/2), capital inflows also appreciate the terms of trade, which lowers the relative price of imports in terms of domestic goods and reduces marginal costs for domestic tradable firms — a “purchasing power effect” that partially offsets the wealth effect. The optimal capital flow targeting rule becomes θ_t = [1 − (1−2α)/(2(1−α)η)]·2y^D_t. Under the condition that the trade elasticity exceeds the degree of home bias (χ > 1 − 2α, strictly weaker than Marshall-Lerner), the wealth effect dominates the purchasing power effect and the topsy-turvy result is preserved. Below a knife-edge curve in the (α, η) parameter space, the purchasing power effect dominates and free capital mobility results in too little rather than too much capital flowing toward the high-inflation country.
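
As a worked example, the home-bias targeting rule can be evaluated at the values used in the quantitative exercise (α = 0.25 and η = 2, so that χ = 3). This is arithmetic on the formulas stated above, not a result reported by the authors.

```latex
\chi = 2(1-\alpha)\eta = 2(0.75)(2) = 3 \;>\; 1 - 2\alpha = 0.5,
\qquad
\theta_t = \left[1 - \frac{1-2\alpha}{2(1-\alpha)\eta}\right] 2\,y^D_t
         = \left[1 - \frac{0.5}{3}\right] 2\,y^D_t
         = \tfrac{5}{3}\, y^D_t .
```

Setting α = 1/2 (no home bias) makes the bracket equal to one and recovers the baseline rule θ_t = 2y^D_t.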

Q6. Does the externality always imply excessive capital flow volatility?

A6: No — this is a novel contribution relative to the prior literature. In the limiting case of a unit intratemporal elasticity (η → 1, the Cole-Obstfeld case), trade is balanced at all times under free capital mobility. Under managed capital flows, however, capital should flow from the most depressed to the least depressed country. This means the externality can result in too little rather than too much capital flow. The standard normative literature (e.g., Bianchi 2011) has focused on excessive capital flow volatility; the supply-side channel identified here shows that market failures can sometimes lead to insufficient external imbalances.

Q7. How does the paper’s mechanism differ from aggregate demand externalities as in Farhi and Werning (2016)?

A7: Farhi and Werning (2016) study demand-shortage environments (fixed exchange rates or zero lower bound) where constraints on monetary policy mean output is demand-constrained. Their prescription is to channel capital toward the most depressed country to stimulate demand for undersupplied goods. In Bengui and Coulibaly, monetary policy is unconstrained but faces an output-inflation trade-off from cost-push shocks. Here, the depressed output reflects the central bank’s deliberate demand contraction to fight inflation, not an inability to stimulate. The optimal response is therefore to shift spending away from the high-inflation (most depressed) country to reduce supply pressure, which is the opposite direction. Formally, in the demand-shortage case with unit elasticity and home bias, the optimal trade balance targeting rule is nx_t = [(1−2α)/(4(1−α))]·ỹ^D_t (trade deficit for most depressed country), while in the supply pressure case it is nx_t = −[α/(1−α)]·y^D_t (trade surplus for most depressed country).
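
A quick consistency check, using only formulas quoted in this summary: at α = 1/2 (no home bias), the unit-elasticity supply-pressure rule should coincide with the baseline managed trade balance −(1/η)·y^D_t from A4 evaluated at η = 1.

```latex
\left. -\frac{\alpha}{1-\alpha}\, y^D_t \right|_{\alpha = 1/2} = -\,y^D_t
\qquad\text{and}\qquad
\left. -\frac{1}{\eta}\, y^D_t \right|_{\eta = 1} = -\,y^D_t .
```

The two expressions agree. At the same α, the demand-shortage coefficient (1−2α)/(4(1−α)) equals zero.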

Q8. What does the non-tradable goods extension add to the baseline mechanism?

A8: The baseline model (two tradable goods, no home bias) transmits the externality only through the wealth effect on labor supply: capital inflows raise consumption, reduce labor supply, and raise real wages and marginal costs. In the non-tradable goods extension, a second channel operates through the real exchange rate. Capital inflows raise demand for non-tradable goods, appreciating the domestic real exchange rate and inflating the price of the consumption basket relative to domestically produced tradable goods. This raises marginal costs for tradable-sector firms independently of any labor supply response, and is therefore unaffected by whether preferences exhibit a wealth effect on labor supply. The paper shows that the optimal policy problem in this extension is isomorphic to the baseline: the loss decomposition (equation 42) yields two additive terms proportional to the share of tradable goods (wealth effect on labor supply) and the share of non-tradable goods (wealth effect on demand for non-tradables), respectively.

Q9. What does the quantitative exercise show about cross-country policy rate dispersion?

A9: Under free capital mobility with Taylor rules, the initial policy rate hike in Home following the calibrated shock is nearly 10%, while in Foreign it is less than 1% — a cross-country dispersion of roughly 9 percentage points. Under managed capital flows, Home’s initial hike falls to approximately 8% and Foreign’s rises to approximately 2.5% — a dispersion of roughly 5.5 percentage points. The authors interpret this as evidence that free capital mobility leads high-inflation countries to tighten excessively and low-inflation countries to tighten too little, generating an inefficiently large cross-country dispersion in monetary policy.

Q10. How does the welfare gain from managed capital flows vary with the trade elasticity?

A10: Welfare gains are increasing in the elasticity of substitution between domestic and foreign goods (η). At the baseline calibration of η = 2 (trade elasticity χ = 3, near the lower bound of empirical estimates), the gain is 0.78% of current consumption (0.03% of permanent consumption). At η = 10 (consistent with Yi 2003’s estimate needed to match bilateral trade flows), the gain rises to approximately 1.9% of current consumption (0.08% of permanent consumption). The welfare gain is defined as the percentage increase in permanent consumption required by a household under free capital mobility to be as well off as under managed capital flows.

Q11. What is the role of Lemma 1 (irrelevance of capital flow regime for world variables)?

A11: Lemma 1 shows that under optimal cooperative monetary policy, the paths of world output gap and world inflation are independent of the capital flow regime (i.e., independent of the path of θ_t). This follows because the “world” block of the model can be solved independently of the “difference” block and the demand imbalance. As a result, the entire normative analysis of capital flows reduces to the behavior of cross-country difference variables (y^D_t, π^D_t, and θ_t), greatly simplifying the analysis. It also implies that switching capital flow regimes does not affect the global total of output or inflation, only its distribution across countries.
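
The world/difference split can be illustrated with the impact output gaps reported in the quantitative exercise. The sketch assumes that world variables are simple averages and difference variables are half-differences (by analogy with the tax wedge τ^D_t), which is an assumption about the exact scaling. Those numbers come from the Taylor-rule exercise, whereas Lemma 1 is stated for optimal cooperative policy, so this is an illustration rather than a test of the lemma.

```python
def world_and_difference(home: float, foreign: float) -> tuple:
    """Return (world average, half-difference) of a Home/Foreign pair."""
    return (home + foreign) / 2.0, (home - foreign) / 2.0


# Reported impact output gaps in percent, passed as (Home, Foreign).
free = world_and_difference(-8.4, 2.4)
managed = world_and_difference(-6.8, 0.8)

print([round(x, 2) for x in free])     # [-3.0, -5.4]
print([round(x, 2) for x in managed])  # [-3.0, -3.8]
```

Under these assumptions the world output gap is −3.0% in both regimes, while the difference gap narrows from −5.4% to −3.8%.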

Q12. What extensions do the authors suggest would enrich the analysis without invalidating the main insight?

A12: Three extensions are noted. First, additional monetary policy constraints — discretionary (non-commitment) policy, non-cooperative policy setting, or a currency union — would introduce extra stabilization constraints and generate additional terms in the capital flow management targeting rule but would not overturn the supply-side channel. Second, alternative goods pricing specifications (local currency pricing, deviations from the law of one price) would make additional variables like cross-country consumer price differentials relevant measures of policy tightness, again adding terms to the rule. Third, the insight is argued to apply more generally in heterogeneous-agent or multi-sector closed-economy models with nominal rigidities whenever private financial decisions affect the economy’s supply side through general equilibrium price effects.

Key Concepts

Cost-push shock (wage markup shock): In the paper’s model, a cost-push shock is a positive deviation of the wage markup (µ^w_t) from its steady-state value. It shifts the New Keynesian Phillips curve, creating an output-inflation trade-off: the central bank must accept either higher inflation or a larger negative output gap. It is not a demand shock; its policy implications are directionally opposite to demand shortage shocks.

Demand imbalance (θ_t): The log ratio of Home to Foreign consumption, defined as c_t − c^*_t = θ_t in the linearized model. Under free capital mobility and symmetric initial wealth, θ_t = 0 (consumption shares are equalized). Under managed capital flows, θ_t is the instrument of capital flow policy: setting θ_t > 0 shifts spending toward Home; θ_t < 0 shifts it toward Foreign. The loss function penalizes deviations of θ_t from zero as an independent inefficiency (cross-country consumption misallocation).

Topsy-turvy capital flows: The paper’s central finding that, following a cost-push shock, the direction of capital flows prescribed by constrained-efficient policy is opposite to the direction that free capital mobility generates. Under free mobility, capital flows into the high-inflation country (trade deficit there); under managed flows, capital should flow out of the high-inflation country (trade surplus there). The term is used to describe the directional reversal, not merely excessive magnitude.

Macroeconomic externality (supply-side): The failure of atomistic agents to internalize the general equilibrium effect of their borrowing decisions on domestic firms’ marginal costs (via real wages or the real exchange rate). This is the paper’s label for the source of inefficiency. It is classified as a supply-side externality to distinguish it from aggregate demand externalities (Farhi and Werning 2016), where the operative mechanism runs through demand for specific goods rather than through factor costs.

Trade elasticity (χ): In the baseline model, χ = η (elasticity of substitution between domestic and foreign tradable goods). With home bias, χ = 2(1−α)η. The trade elasticity plays the key role in determining whether the topsy-turvy result holds: the result requires χ > 1 (Marshall-Lerner in baseline) or, with home bias, χ > 1 − 2α (weaker condition). At χ = 1 (Cole-Obstfeld case), trade is balanced under free mobility, and managed flows call for capital to move from the most to the least depressed country — implying insufficient rather than excessive capital flows under free mobility.

Purchasing power effect: In the model with home bias, a capital inflow appreciates the terms of trade (the relative price of exports over imports), which raises the purchasing power of domestic firms and lowers their marginal costs. This effect partially offsets the wealth-effect-driven rise in marginal costs. Its strength is proportional to the degree of home bias (1−2α) relative to the trade elasticity 2(1−α)η. Under the paper’s weaker-than-Marshall-Lerner condition, the wealth effect dominates the purchasing power effect.

Managed capital flow regime: A policy regime in which the government imposes taxes on international financial transactions (τ_t for Home, τ^*_t for Foreign) to control the demand imbalance θ_t, subject to the targeting rule θ_t = 2y^D_t (or its home-bias-adjusted counterpart). This regime accounts for the macroeconomic externality and delivers a constrained-efficient allocation given the presence of nominal rigidities. The tax wedge τ^D_t = (τ_t − τ^*_t)/2 represents the gap in returns on the international bond faced by Home versus Foreign households.

World and difference formulation: Following Engel (2011) and Groll and Monacelli (2020), the model is decomposed into “world” variables (averages: y^W_t, π^W_t) and “difference” variables (cross-country gaps: y^D_t, π^D_t). The targeting rules and Phillips curves separate additively into world and difference blocks, and Lemma 1 establishes that the capital flow regime affects only the difference block. This decomposition is the analytical device that isolates the role of capital flows.

Devaluations, Deposit Dollarization, and Household Heterogeneity

Layer 1: Overview

Research Question

Ferrante and Gornemann study the aggregate and redistributive effects of currency devaluations in emerging market economies, focusing on a feature that prior open-economy HANK models had not jointly incorporated: households hold dollar-denominated deposits that are disproportionately concentrated among wealthier agents, and these deposits sit on the liability side of leveraged, agency-constrained banks. The paper asks how this combination of deposit dollarization and household wealth heterogeneity shapes the macroeconomic and distributional consequences of a currency depreciation, and what it implies for the optimal degree of exchange-rate smoothing by the central bank.

Data and Empirical Motivation

The model is calibrated to match cross-sectional micro-data from the 2013 Uruguayan Household Financial Survey, which records the currency denomination of household assets and liabilities. As documented by Drenik et al. [2018] and confirmed by the authors for Uruguay, the top quintile of the wealth distribution holds close to 70% of liquid savings in dollars, while households with zero or negative net wealth have essentially no direct foreign-currency exposure. The baseline calibration targets a deposit dollarization rate of 40% of aggregate bank deposits, in line with the cross-country average reported for Latin America. The spread between bank lending and deposit rates is calibrated at 8% annualized for household loans (consistent with Uruguayan bank data over the prior 15 years) and 2% for capital returns, implying a bank leverage ratio of approximately 6.

Model

The framework is a small open economy New Keynesian model with two non-standard elements layered on a Bewley-Huggett-Aiyagari incomplete-markets household sector. First, households face idiosyncratic labor productivity risk and a borrowing constraint, generating a non-degenerate wealth distribution in which, at the calibrated steady state, approximately 8% of households are constrained borrowers, 22% are unconstrained borrowers, 27% hold zero liquid wealth and behave hand-to-mouth (HtM), 52% are net savers, and 1% are capitalists. Second, financial intermediaries face a Gertler-Karadi [2011] agency problem that generates an endogenous, time-varying spread between lending and deposit rates. Households can save in local- or foreign-currency bank deposits and in foreign bonds, but can only borrow through domestic banks. The currency composition of household portfolios, which is a linear function of household wealth in the baseline, maps through market clearing into the banks’ currency mismatch, so that a wealthier-household preference for dollar deposits directly determines the bank’s foreign-currency liability share.

Main Findings with Quantitative Magnitudes

The paper’s central experiment is a 100 basis-point annualized increase in the foreign interest rate with persistence 0.85, which induces a currency depreciation.

Aggregate amplification: Combining a HANK household sector with leverage-constrained banks exposed to currency mismatch causes aggregate consumption to drop approximately twice as much as in a representative-agent New Keynesian (RANK) model with constrained banks, and output to decline more than 1% — roughly 30% larger than the 0.75% decline in the RANK model with financial frictions. In contrast, absent banking frictions, a bank-less HANK model would generate an output expansion because the standard expenditure switching channel dominates.
Channels: The paper decomposes the consumption decline into (a) a labor income channel, in which lower hours and wages caused by the financial accelerator contraction account for approximately two-thirds of the aggregate consumption decline, and (b) a borrowing rate channel, in which the endogenous rise in household lending spreads accounts for approximately one-third. In a counterfactual model in which the spread on household loans is held fixed, the decline in consumption and output is approximately 50% smaller than in the baseline, indicating that the borrowing rate channel, together with its general-equilibrium feedback onto wages and asset prices, is responsible for roughly half of the baseline output decline.
Distributional effects: Within the baseline model, unconstrained borrowers see their consumption fall on average by more than 3.5% on impact; constrained borrowers’ consumption falls by more than 5% in the second period as interest payments jump. Zero-wealth HtM agents cut consumption roughly one-for-one with the more-than-2% decline in real labor income. Wealthier savers and capitalists are partially insulated through their dollar holdings, which gain real value during the depreciation.
Portfolio composition and deposit dollarization: When the deposit dollarization rate is raised from the baseline 40% to 80% (to match high-dollarization countries such as Uruguay at the extreme), investment declines approximately 12% (versus 6% in the baseline) and aggregate consumption falls approximately 1.7% (versus 1% in the baseline), with the output decline more than twice as large as in the baseline. Wealthier households’ consumption path is actually higher in the high-dollarization calibration because of larger windfall gains on their dollar portfolios, while poorer households bear the amplified downturn through stronger labor income and borrowing rate channels. This produces a novel distributional result: stronger currency hedging by richer households deepens the aggregate recession and worsens outcomes for poorer agents.
Monetary policy: In the baseline 40% dollarization calibration, reacting to exchange rate changes by raising domestic interest rates is welfare-detrimental for most households: the gain from partially stabilizing banks’ balance sheets is more than offset by the contractionary effect of higher rates on aggregate demand and spreads. A modest response (κ_e ≈ 0.04 in the ex-ante welfare experiment) is preferred, conditional on aggregate dynamics. When dollarization is 80%, a small degree of exchange rate leaning (κ_e = 0.5) can improve welfare for most agents, as the benefit from protecting banks’ balance sheets becomes larger relative to the cost of tighter monetary conditions.
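
The foreign interest rate shock behind these findings (100 basis points annualized, persistence 0.85) can be traced out as a first-order autoregressive path. This is a minimal sketch: it assumes the persistence applies per model period and that the shock decays geometrically with no further innovations, and the function name is hypothetical.

```python
import math

SHOCK_BP = 100.0     # initial increase in the foreign rate, annualized basis points
PERSISTENCE = 0.85   # persistence reported in the summary, assumed per model period


def shock_path(periods: int) -> list:
    """Deviation of the foreign rate from steady state in each period (basis points)."""
    return [SHOCK_BP * PERSISTENCE ** t for t in range(periods)]


# Number of periods for the shock to fall to half its initial size.
half_life = math.log(0.5) / math.log(PERSISTENCE)

print([round(x, 1) for x in shock_path(5)])  # [100.0, 85.0, 72.2, 61.4, 52.2]
print(round(half_life, 2))                   # 4.27
```

With this persistence, roughly half of the initial increase is still in place four periods after impact.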

In depth

Q1. What three stylized facts about liability dollarization motivate the model, and how does the model’s structure capture each?

A1: The three facts are: (i) banks and firms borrow in foreign currency; (ii) foreign-currency bank debt is matched by dollar-denominated deposits from domestic households; (iii) those deposits are held predominantly by wealthier households. The model captures (i) and (ii) by having the bank hold a currency mismatch on its balance sheet — local-currency loans on the asset side, foreign-currency deposits on the liability side. Fact (iii) is captured by assuming a linear portfolio rule in which household dollar deposit share is an increasing function of wealth, calibrated to the slope observed in Uruguayan micro-data, with borrowers restricted to local-currency debt.
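
The linear portfolio rule behind fact (iii) can be sketched as follows. The intercept, slope, and household wealth levels are hypothetical placeholders rather than the calibrated values, and clipping the share to the unit interval is an added assumption.

```python
def dollar_share(wealth: float, intercept: float = 0.0, slope: float = 0.1) -> float:
    """Share of a household's deposits held in dollars, linear in liquid wealth.

    Households with zero or negative wealth hold no dollar deposits;
    the share is clipped to [0, 1] (an assumption of this sketch).
    """
    if wealth <= 0.0:
        return 0.0
    return min(1.0, max(0.0, intercept + slope * wealth))


def aggregate_dollarization(wealths: list) -> float:
    """Dollar deposits as a share of all deposits held by saver households."""
    deposits = [w for w in wealths if w > 0.0]
    dollars = sum(w * dollar_share(w) for w in deposits)
    return dollars / sum(deposits)


households = [-2.0, 0.0, 1.0, 3.0, 8.0]  # hypothetical liquid wealth levels
print(round(aggregate_dollarization(households), 3))  # 0.617
```

Because the share rises with wealth, the aggregate dollarization rate in this toy example is driven mostly by the wealthiest saver, which mirrors the link from the wealth distribution to the banks' foreign-currency liability share.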

Q2. Why does a bank-less HANK open-economy model produce an output expansion rather than a contraction following a foreign interest rate shock in the calibration used?

A2: Without banking frictions, the expenditure switching channel dominates. A rise in the foreign interest rate depreciates the real exchange rate by roughly 1%, making domestic goods cheaper and raising exports by approximately 2%. In the bank-less HANK, this export boost causes hours and real labor income to increase, and high-MPC households (HtM and constrained borrowers) raise consumption. There is no financial accelerator operating through the bank’s balance sheet to offset this stimulus, so output expands rather than contracts.

Q3. Through what exact mechanism does bank currency mismatch transform an exchange rate depreciation into a financial accelerator event?

A3: A weaker domestic currency raises the real cost of repaying foreign-currency deposits (R_Dt jumps on impact), directly eroding bank net worth (N_t). As net worth falls and leverage rises, the bank’s incentive constraint tightens, requiring spreads on both capital loans and household loans to increase jointly (per equation 21, the ratio of spreads moves one-for-one with the ratio of diversion parameters). Lower asset prices further reduce the return on capital, feeding back into net worth in the standard Gertler-Karadi financial accelerator loop. In the RANK with banks benchmark, investment declines approximately 6% compared to only 1% in the frictionless RANK.

Q4. What is the borrowing rate channel, and how is it distinct from the balance-sheet exposure channel studied in De Ferra et al. [2020]?

A4: The borrowing rate channel operates through the endogenous widening of bank lending spreads following a net worth erosion: when banks’ leverage constraint binds more tightly, both the spread on firm capital and the spread on household loans rise simultaneously (equation 21). This forces even households who borrow only in local currency — and thus have no direct exchange-rate exposure on their liabilities — to face sharply higher borrowing costs, causing their consumption to fall steeply. De Ferra et al. [2020] study a different channel in which households borrow in foreign currency and suffer a direct balance-sheet loss from depreciation; the borrowing rate channel in this paper is distinct because it operates through financial intermediary frictions rather than through direct currency exposure of household debt.

Q5. How much of the aggregate consumption decline is attributable to the borrowing rate channel versus the labor income channel, and how do the authors establish these shares?

A5: The decomposition exercise (Figure 6) simulates each household’s response to a single price path at a time while holding all other prices at steady state. The labor income channel (the decline in real wages and hours caused by the contraction in output) accounts for approximately two-thirds of the aggregate consumption decline, and the borrowing rate channel accounts for approximately one-third. Separately, a counterfactual model in which the household loan spread is held fixed produces consumption and output declines roughly 50% smaller than the baseline, showing that in general equilibrium the borrowing rate channel, together with its second-round effects on wages and asset prices, accounts for roughly half of the output decline.

Q6. How does the distribution of dollar deposits across the wealth distribution affect the severity of the downturn, and what is the novel redistribution result?

A6: Through market clearing for local-currency deposits (equation 44), a larger household demand for dollar deposits directly raises the bank’s foreign-currency liability share (x^D_bt), magnifying the bank’s currency mismatch. Raising the deposit dollarization rate from 40% to 80% causes bank net worth to decline twice as much as in the baseline, investment to fall roughly 12% versus 6%, and aggregate consumption to fall roughly 1.7% versus 1%, with output declining more than twice as much. The novel distributional result is that wealthier savers and capitalists are actually better off in the high-dollarization scenario because their windfall dollar gains are larger, while poorer households suffer a more severe recession through the labor income and borrowing rate channels. Hence, stronger currency hedging by the rich deepens the aggregate recession and worsens distributional outcomes for the poor.

Q7. What happens when borrowers are assumed to hold foreign-currency debt rather than local-currency debt, as in De Ferra et al. [2020]?

A7: In this alternative calibration, borrowers face a direct balance-sheet loss from depreciation, causing constrained borrowers’ consumption to drop more steeply on impact. However, since household loans represent only approximately 5% of annual GDP in the baseline, the boost to bank net worth from having dollar-denominated loan assets is modest compared to the reduction in the dollar deposit liability. As a result, the path for investment is very similar to the baseline, while on impact consumption drops about 20% more and output declines about 10% more than in the baseline model.

Q8. What welfare implications arise from removing dollar deposits entirely from savers’ portfolios?

A8: In a calibration where households hold only local-currency assets (with banks’ currency mismatch maintained through external dollar borrowing), savers lose their windfall dollar gains during depreciation. The consumption of savers drops about 25% more than in the baseline on impact, and capitalists experience even larger changes. Because of general equilibrium feedback through wages and prices, poorer households also cut consumption more, causing aggregate consumption to fall approximately 20% more than in the baseline and output to decline approximately 5% more on impact.

Q9. Under what dollarization conditions does exchange rate stabilization through monetary tightening improve welfare, and why?

A9: Under the baseline 40% dollarization, raising domestic interest rates in response to depreciation is welfare-detrimental for most households: higher rates depress asset prices, tighten the bank’s leverage constraint, and worsen both the borrowing rate channel and the labor income channel for low-net-worth agents, and these costs more than offset the benefit from partially stabilizing the bank’s balance sheet. Only a very modest response (κ_e ≈ 0.04) is preferred. When deposit dollarization is 80%, the benefit from protecting the bank’s balance sheet is proportionally larger; a moderate reaction (κ_e = 0.5) can improve welfare for most households, though further tightening (κ_e = 5) causes bank net worth to fall more than 20% and leads to a deeper recession, reversing the gains.

Q10. How does the quarterly average MPC in the model compare to external estimates, and why is the MPC distribution central to the paper’s mechanism?

A10: The quarterly average MPC in steady state is approximately 27%, which implies an annual MPC of approximately 71%, consistent with Hong [2020b]’s estimates for Peru. The MPC distribution is central because the amplification mechanisms — both the borrowing rate channel and the labor income channel — work by hitting high-MPC agents (HtM households and constrained borrowers) hardest. Without a sufficiently high mass of high-MPC agents, changes in spreads and labor income would have muted aggregate consumption effects. The presence of approximately 27% of households with zero liquid wealth at the borrowing spread is itself endogenously generated by the bank’s agency problem, which creates a wedge between saving and borrowing rates.
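
The quarterly-to-annual conversion quoted in A10 can be checked with a short sketch. It assumes (this is an illustrative assumption, not a formula stated in the summary) that a constant quarterly MPC m compounds over four quarters, so the annual MPC is 1 - (1 - m)^4. The function name is hypothetical.

```python
def annual_mpc(quarterly_mpc: float, periods: int = 4) -> float:
    """Annual MPC implied by a constant per-period MPC.

    Assumption (not from the paper): each period the household spends a
    fraction `quarterly_mpc` of whatever remains of a one-off windfall.
    """
    return 1.0 - (1.0 - quarterly_mpc) ** periods


# Quarterly MPC of 27% as reported in the summary.
print(f"{annual_mpc(0.27):.3f}")  # prints 0.716
```

Under this compounding assumption a 27% quarterly MPC maps to roughly 71 to 72% at an annual horizon, in line with the figure cited above.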

Q11. How does the HANK model without banks compare to the RANK model without banks in transmitting the foreign interest rate shock?

A11: Both HANK-without-banks and RANK-without-banks generate output expansions through the expenditure switching channel. However, in the bank-less HANK, aggregate consumption declines only half as much as in the frictionless RANK because high-MPC households amplify the positive real income effect from rising labor income. Some household groups (HtM agents and constrained borrowers) actually increase consumption on impact due to higher real labor income, the Fisher channel reducing the real value of domestic-currency debt, and portfolio gains for savers holding dollar assets.

Q12. What role does the monetary policy Taylor rule play during the baseline devaluation, and how does it interact with the financial accelerator?

A12: The standard Taylor rule (coefficient 1.5 on domestic inflation) causes the central bank to raise rates in response to the CPI inflation spike accompanying the depreciation. Higher domestic rates compress the real exchange rate depreciation and reduce the boost to exports, but also directly increase banks’ funding costs, contributing to the financial accelerator by compressing the return on capital. This interaction means that the baseline monetary policy passively amplifies the banking-sector contraction relative to a model with no monetary response.

Key Concepts

Deposit dollarization: The share of domestic bank deposits denominated in foreign currency, held by domestic households. In the paper’s calibration this is set at 40% of aggregate bank deposits (baseline) or 80% (high-dollarization alternative), reflecting the empirical range across Latin American countries. It determines the bank’s foreign-currency liability share and thus the severity of currency mismatch.

Currency mismatch (banks): The gap between the currency denomination of a bank’s assets (local-currency loans to households and firms) and its liabilities (foreign-currency deposits from households). In the model, when the domestic currency depreciates the real cost of dollar deposits rises, directly eroding bank net worth without any offsetting appreciation of loan assets.
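
A stylized balance-sheet calculation may help fix ideas. The sketch below is partial-equilibrium accounting under assumed numbers, not the paper's model: loans are entirely in local currency, a fraction of deposits is in dollars, and a depreciation scales the local-currency value of dollar deposits one-for-one. The balance-sheet sizes and the 10% depreciation are hypothetical.

```python
def net_worth_after_depreciation(loans, deposits, dollar_share, depreciation):
    """Bank net worth in local-currency terms after a one-off depreciation.

    Illustrative assumptions: all loans are local-currency assets, a fraction
    `dollar_share` of deposits is dollar-denominated, and a depreciation of
    size `depreciation` (0.10 = 10%) multiplies the local-currency value of
    dollar deposits by (1 + depreciation). Loan values do not adjust.
    """
    dollar_deposits = dollar_share * deposits
    local_deposits = (1.0 - dollar_share) * deposits
    return loans - local_deposits - dollar_deposits * (1.0 + depreciation)


loans, deposits = 100.0, 80.0  # hypothetical balance sheet, initial net worth 20
for share in (0.4, 0.8):
    before = net_worth_after_depreciation(loans, deposits, share, 0.0)
    after = net_worth_after_depreciation(loans, deposits, share, 0.10)
    print(share, round((after - before) / before, 3))
```

In this accounting the net worth loss is proportional to the dollar deposit share, so doubling the share from 40% to 80% doubles the loss. The paper's general-equilibrium results include further feedback through asset prices and spreads that this sketch omits.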

Borrowing rate channel: The mechanism by which a decline in bank net worth, caused by currency mismatch losses, tightens the bank’s incentive constraint and forces up the spread on household loans. This raises borrowing costs for households who have no direct foreign-currency exposure on their balance sheets, causing high-MPC borrowers to cut consumption sharply and thereby depressing aggregate demand and wages. This channel is distinct from the direct balance-sheet channel studied in De Ferra et al. [2020].

Labor income channel (in an open economy with banking frictions): The mechanism by which the financial accelerator (reduced credit supply and lower capital demand following bank net worth erosion) depresses output, hours, and wages, causing a decline in real labor income that hits high-MPC workers regardless of the currency composition of their asset portfolios. It accounts for approximately two-thirds of the aggregate consumption decline in the baseline experiment.

Hand-to-mouth (HtM) agents: In this paper’s setting, HtM behavior is not a permanent household state but arises endogenously for households who hold zero liquid wealth because the bank’s endogenous lending spread makes both saving and borrowing suboptimal for them in a given period. Their consumption moves approximately one-for-one with current labor income, making them a key amplifier of real income fluctuations.

Financial accelerator (with currency mismatch): The Gertler-Karadi [2011] mechanism as augmented by exchange-rate exposure: a currency depreciation erodes bank net worth through the dollar deposit liability, tightening the leverage constraint, raising spreads on capital and household loans simultaneously, lowering the price of capital, further reducing net worth, and feeding back to reduce credit supply. The currency mismatch channel and the asset-price channel interact to amplify the initial shock.

Portfolio dollarization rule: The assumption that each household’s share of savings held in foreign-currency deposits is a linear function of net wealth (x_i = λ_bar + λ·b_i, with λ > 0 and x_i = 0 for borrowers). This rule is calibrated to match the wealth-gradient of dollar holdings in the 2013 Uruguayan Household Financial Survey, and through market clearing it pins down the aggregate bank deposit dollarization rate and the distributional exposure of households to exchange rate shocks.
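
The linear rule and its aggregation can be sketched as follows. The parameter values and the wealth vector are hypothetical, and clipping the share to [0, 1] is an added assumption that the summary does not mention.

```python
def dollar_share(net_wealth, lam_bar, lam):
    """Dollar deposit share x_i = lam_bar + lam * b_i for savers, 0 for borrowers.

    Clipping to [0, 1] is an assumption added here so the share stays a
    valid fraction; the summary only states the linear rule.
    """
    if net_wealth <= 0:
        return 0.0
    return min(1.0, max(0.0, lam_bar + lam * net_wealth))


def aggregate_dollarization(wealths, lam_bar, lam):
    """Dollar deposits divided by total deposits, summed over savers."""
    savers = [b for b in wealths if b > 0]
    total = sum(savers)
    dollars = sum(dollar_share(b, lam_bar, lam) * b for b in savers)
    return dollars / total if total > 0 else 0.0


wealths = [-2.0, 0.0, 1.0, 4.0, 10.0]  # hypothetical net wealth levels
rate = aggregate_dollarization(wealths, lam_bar=0.1, lam=0.04)
print(round(rate, 3))  # prints 0.412
```

Because the share rises with wealth, the aggregate rate is dominated by the richest savers, which is why the slope λ matters for both the bank's mismatch and the distribution of exchange-rate exposure.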

Exchange rate stabilization trade-off: The central bank’s choice of how much to raise domestic interest rates in response to a depreciation (parameterized by κ_e in the augmented Taylor rule). A higher κ_e reduces the bank’s currency mismatch loss but simultaneously depresses asset prices and raises borrowing costs, potentially worsening the financial accelerator. The paper shows the net welfare effect depends critically on the level of deposit dollarization: at 40% dollarization aggressive leaning is harmful for most agents; at 80% dollarization a moderate response (κ_e = 0.5) can be welfare improving.
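
To make the role of κ_e concrete, the sketch below evaluates a simple augmented Taylor rule. The functional form (a linear rule in inflation and the rate of depreciation) is an assumption for illustration; the summary reports only the inflation coefficient of 1.5 and the κ_e values, not the exact rule. The inflation and depreciation inputs are hypothetical.

```python
def policy_rate(inflation, depreciation, kappa_e, phi_pi=1.5, steady_state_rate=0.0):
    """Nominal policy rate (deviation from steady state, in percent).

    Assumed form: i = i_ss + phi_pi * inflation + kappa_e * depreciation.
    phi_pi = 1.5 matches the coefficient quoted in the summary; the linear
    exchange-rate term is an illustrative assumption.
    """
    return steady_state_rate + phi_pi * inflation + kappa_e * depreciation


# Hypothetical impact responses: 1% inflation, 2% depreciation.
for kappa_e in (0.0, 0.04, 0.5, 5.0):
    print(kappa_e, round(policy_rate(1.0, 2.0, kappa_e), 2))
```

Moving κ_e from 0.04 to 5 raises the implied rate response by an order of magnitude for the same depreciation, which gives a sense of why aggressive leaning carries a large contractionary cost.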

Diversification, Market Entry, and the Global Internet Backbone

This paper investigates how buyer demand for supplier diversification shapes entry incentives and market structure, using the global undersea fiber-optic cable industry as the empirical setting. The research question has two parts: first, how much of observed cable entry and surplus generation is attributable to buyers’ diversification motives rather than standard price competition; and second, whether market forces produce too much or too little diversification relative to the social optimum.

The empirical setting spans 2005–2021 and covers the worldwide network of undersea cables that carries more than 98% of all international internet traffic. Cables fail frequently — hundreds of faults per year — and industry professionals confirm that “no customer would buy capacity on a single cable.” The median monthly price for a 10Gbps lease fell from $55,500 in 2005 to $2,200 in 2021, and the number of active cables roughly doubled over the sample period.

The authors use proprietary data from TeleGeography covering cable characteristics (construction costs, capacity, landing points, entry dates), quarterly bandwidth prices at the city-pair level, annual used bandwidth at the country-pair level, and 168 documented cable faults. Markets are defined as country-pairs in calendar quarters.

The theoretical model begins with a representative buyer who splits bandwidth purchases equally across n symmetric cable operators to minimize expected disruption costs. Because disruption shocks are i.i.d. across cables, adding suppliers reduces the variance of realized bandwidth delivery, lowering the required over-provisioning buffer. This generates a “market expansion” channel: entry increases aggregate demand holding prices fixed, not just through price competition. The aggregate demand equation takes log-linear form with cable count indicators alongside price and demand shifters.
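
The variance-reduction logic can be illustrated with a small calculation. The outage process below is an assumption made for the sketch (each cable either delivers its full allotment or nothing, independently with a common probability); the paper's disruption process may differ. All names and numbers are hypothetical.

```python
import math


def delivered_bandwidth_stats(total_purchased, n_cables, outage_prob):
    """Mean and standard deviation of delivered bandwidth.

    Assumptions: purchases are split equally across n_cables, and each cable
    independently suffers a full outage with probability outage_prob.
    """
    per_cable = total_purchased / n_cables
    mean = total_purchased * (1.0 - outage_prob)
    # Sum of n independent Bernoulli deliveries, each scaled by per_cable.
    variance = n_cables * per_cable ** 2 * outage_prob * (1.0 - outage_prob)
    return mean, math.sqrt(variance)


for n in (1, 2, 4, 8):
    mean, sd = delivered_bandwidth_stats(100.0, n, 0.1)
    print(n, round(mean, 1), round(sd, 2))
```

Expected delivery is unchanged by splitting, while the standard deviation falls with the square root of n. That is the sense in which extra suppliers shrink the buffer a buyer needs, and why each additional cable helps less than the previous one.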

The structural model adds a dynamic oligopoly game where firms make entry and exit decisions as a non-stationary Markov Perfect Equilibrium, with Cournot competition in each period. The three-step estimation procedure recovers: (1) price elasticities and diversification parameters from an IV demand regression using electricity generation cost shares as instruments; (2) marginal costs from firms’ first-order conditions; (3) entry and fixed costs from a nested pseudo-likelihood (NPL) estimator, supplemented by construction cost data to separately identify entry costs given the near-absence of observed exits.
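
Step (2) can be illustrated with the textbook Cournot first-order condition. This is a sketch under stated assumptions, not the authors' estimating equation: it assumes symmetric firms with equal market shares 1/n and a locally constant price elasticity ε, so that marginal cost equals P(1 + s/ε). The function name and the choice of n are hypothetical.

```python
def cournot_marginal_cost(price, elasticity, n_firms):
    """Marginal cost implied by the Cournot first-order condition.

    From P + q_i * dP/dQ = mc with market share s_i = q_i / Q:
        mc = P * (1 + s_i / elasticity),  elasticity < 0.
    Symmetry (s_i = 1 / n_firms) is an assumption of this sketch.
    """
    share = 1.0 / n_firms
    return price * (1.0 + share / elasticity)


# 2021 median monthly price from the summary, IV elasticity of -1.36.
for n in (1, 2, 8):
    print(n, round(cournot_marginal_cost(2200.0, -1.36, n), 2))
```

As n grows the implied markup shrinks and marginal cost approaches price, which is the conventional competition channel that the paper separates from the diversification channel.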

Key demand results: the IV price elasticity is −1.36. The market expansion effect is large and exhibits decreasing marginal returns — entry of a second cable expands demand by as much as a 28.3% price decrease; a third cable is equivalent to a 19.3% price decrease; an eighth cable is equivalent to a 7.5% price decrease. The demand model achieves R² = 95%.
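
The price-equivalent figures can be translated into implied shifts of log demand under the log-linear specification. The conversion below is a back-of-envelope assumption (shift = ε · ln(1 − d) for a price decrease d); the summary does not state whether the paper computes equivalents in logs or by linear approximation, and the resulting numbers are not reported coefficients.

```python
import math

PRICE_ELASTICITY = -1.36  # IV estimate reported in the summary


def implied_log_demand_shift(price_decrease, elasticity=PRICE_ELASTICITY):
    """Log-demand shift equivalent to a proportional price decrease.

    Assumes log Q = a + elasticity * log P + (cable-count effect), so a price
    cut of `price_decrease` (0.283 = 28.3%) moves log Q by
    elasticity * ln(1 - price_decrease).
    """
    return elasticity * math.log(1.0 - price_decrease)


# Price-decrease equivalents for the 2nd, 3rd, and 8th cable (from the summary).
equivalents = {2: 0.283, 3: 0.193, 8: 0.075}
shifts = {n: implied_log_demand_shift(d) for n, d in equivalents.items()}
for n, shift in shifts.items():
    print(n, round(shift, 3))
```

Under this reading the second cable shifts log demand by about 0.45 and the eighth by about 0.11, which makes the decreasing marginal returns explicit.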

The first counterfactual removes the diversification channel entirely (entry raises competition only). Without diversification, cable investment falls by 12%. The net present value of total surplus per market over the sample period averages $1.11 billion under the observed equilibrium; supplier diversification accounts for 11% of total surplus and 27% of consumer surplus.

The second counterfactual quantifies two opposing distortions relative to the social optimum. Business-stealing creates excessive entry (entrants reduce incumbents’ output), while diversity effects create insufficient entry (marginal entrants generate surplus through diversification they cannot fully capture). At end-of-sample (2021-Q4), diversity distortions in terms of number of entrants range from 54% to 125% of the business-stealing distortion. Business-stealing tends to dominate for most markets, producing moderately excessive entry. Relative to the market outcome, total surplus under the social planner’s solution is on average 10% higher: 53% of this welfare gap is attributable to diversity effects and 47% to business-stealing effects. These findings hold across market heterogeneity in entry costs, market size, and demand growth.

The paper concludes that profit-maximizing suppliers fail to fully internalize diversification-related social benefits, and that targeted entry subsidies would pass cost-benefit tests in settings where diversity distortions dominate.

Q: What is the core mechanism by which supplier diversification expands demand? A: When buyers split purchases across n cable operators whose disruption shocks are i.i.d., adding a supplier reduces the variance of realized delivered bandwidth. The buyer therefore needs to hold a smaller over-provisioning buffer to achieve the same expected level of used bandwidth B. This lowers the effective cost of a given quantity of used bandwidth, shifting the aggregate demand curve outward. As the number of suppliers grows to infinity, the expected disruption cost converges to zero.

Q: How large is the market-expansion effect of diversification empirically? A: The effect is large but exhibits decreasing marginal returns. Entry of a second cable expands demand by as much as a 28.3% price reduction holding prices fixed; the third cable is equivalent to a 19.3% price reduction; and the eighth cable is equivalent to a 7.5% price reduction. All cable-count coefficients are positive and statistically significant in the IV demand model.

Q: How is price endogeneity addressed in the demand estimation? A: Bandwidth prices are instrumented using the marginal cost of electricity generation: country-level electricity generation shares (coal, gas, oil) interacted with quarterly commodity price series for the corresponding fuels (Australian coal price, EU natural gas price, Brent crude). The first-stage results indicate electricity costs are strong predictors of bandwidth prices. Accounting for endogeneity raises the magnitude of the price elasticity relative to OLS, to an IV estimate of −1.36, consistent with the expected direction of OLS bias.

Q: What share of cable investment and surplus is attributable to diversification motives? A: In the counterfactual where the diversification channel is eliminated — entry raises competition and lowers prices but provides no diversification benefit — cable investment falls by 12%. Under the observed equilibrium, the net present value of total surplus per market over 2005–2021 averages $1.11 billion; supplier diversification accounts for 11% of this total surplus and 27% of consumer surplus.

Q: How are the two distortions — business-stealing and diversity — defined and separated? A: Business-stealing distortion arises because entrants reduce incumbents’ outputs and revenues, so private entry benefits exceed social benefits, leading to excessive entry. Diversity distortion arises because entrants create surplus for buyers through diversification but cannot fully capture it without perfect price discrimination (following Spence (1976) and Mankiw and Whinston (1986)), leading to insufficient entry. The authors disentangle these by comparing: (i) the social planner’s solution (eliminates both distortions), and (ii) a coordinated entry solution maximizing producer surplus (eliminates only business-stealing). The residual gap between the two identifies the diversity distortion.
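
The three-way comparison can be written as simple bookkeeping on entry counts. The sketch below follows the verbal definitions above; the function name, the sign conventions, and the example counts are hypothetical and are not taken from the paper.

```python
def entry_distortions(n_market, n_coordinated, n_planner):
    """Split the gap between market and planner entry into two distortions.

    n_market:      entrants in the observed equilibrium
    n_coordinated: entrants when entry maximizes joint producer surplus
                   (removes business-stealing only)
    n_planner:     entrants under the social planner (removes both)
    """
    business_stealing = n_market - n_coordinated  # excess entry
    diversity = n_planner - n_coordinated         # missing entry
    ratio = diversity / business_stealing if business_stealing else None
    return {
        "business_stealing": business_stealing,
        "diversity": diversity,
        "net_excess_entry": n_market - n_planner,
        "diversity_to_business_stealing": ratio,
    }


# Hypothetical market: 8 entrants observed, 4 under coordination, 7 under the planner.
print(entry_distortions(8, 4, 7))
```

A ratio below one means business-stealing dominates and the market has net excess entry; a ratio above one means the market has too few entrants.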

Q: What is the net direction and magnitude of distortion in equilibrium market structure? A: At 2021-Q4, for most markets, business-stealing dominates, leading to moderately excessive entry. Diversity distortions in number of entrants range from 54% to 125% of the business-stealing distortion across markets. Relative to the market outcome, the social planner’s solution yields average total surplus that is 10% higher. Of that welfare gap, 53% is attributable to diversity effects and 47% to business-stealing effects.

Q: How do market characteristics affect which distortion dominates? A: The paper analyzes cross-market heterogeneity and identifies market features — including the size of entry costs, market size, and the rate of demand growth over time — as determinants of whether insufficient diversification or excessive entry is the binding distortion. Markets with higher entry costs or slower demand growth are more likely to exhibit insufficient diversification.

Q: How are entry costs identified given the near-absence of cable exits in the data? A: Because exit events are rare in a nascent industry — only a handful of exits observed, mostly after 2020 — entry and fixed costs cannot be separated by exit decisions alone. The authors address this by using cable-level construction cost data from TeleGeography to estimate entry costs outside the dynamic model. With entry costs in hand, firms’ optimal entry decisions identify fixed costs. Scrap values are normalized to zero, consistent with industry reports that retired cables are typically abandoned on the seabed.

Q: What role does the non-stationarity of the market environment play in the model? A: The data covers the industry’s earliest growth phase, with demand growing by roughly three orders of magnitude (used bandwidth from 5 Tbps in 2005 to 2,886 Tbps in 2021) and prices falling by a factor of roughly 25. The authors use a non-stationary Markov Perfect Equilibrium concept in which strategies and transition functions are indexed by time, aligning with the treatment of high-tech commodities in Igami (2017).
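
The scale of these trends can be checked with a few lines of arithmetic. The inputs are the figures quoted in the summary (used bandwidth and the median 10Gbps lease price in 2005 and 2021); the constant-growth-rate calculation is an illustrative simplification.

```python
import math


def growth_summary(start_value, end_value, years):
    """Return (growth factor, orders of magnitude, compound annual growth rate)."""
    factor = end_value / start_value
    return factor, math.log10(factor), factor ** (1.0 / years) - 1.0


years = 2021 - 2005
bandwidth = growth_summary(5.0, 2886.0, years)      # Tbps of used bandwidth
price = growth_summary(55500.0, 2200.0, years)      # median monthly price, USD

print([round(x, 3) for x in bandwidth])
print([round(x, 3) for x in price])
```

Used bandwidth grows by a factor of about 577 (close to 2.8 orders of magnitude, or nearly 49% per year), while the price falls by a factor of about 25 (around 18% per year). Trends of this size are hard to reconcile with a stationary environment.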

Q: What are the policy implications of the findings? A: Because profit-maximizing suppliers do not fully internalize the diversification-related social benefits of entry, entry rates can be sub-optimal from a welfare perspective when diversity distortions dominate. The authors suggest targeted entry subsidies would pass cost-benefit tests in such cases. For antitrust analysis, regulators who ignore the demand-expansion effect of incremental suppliers may incorrectly judge a market as sufficiently competitive. In merger review, authorities must account for firms’ private incentives to provide diversification to reach accurate welfare conclusions.

Q: How does the paper verify that diversification demand is not a spurious empirical artifact? A: Several checks support the causal interpretation. The estimated demand parameters are consistent with the predictions of the consumer-level utility maximization problem derived analytically: decreasing marginal returns to diversification and a positive relationship between the number of suppliers and demand. The demand model achieves R² = 95%, suggesting limited unobserved confounders. Additionally, 78% of cable faults involve only a single cable, confirming that disruptions are geographically isolated and that cross-cable diversification provides genuine insurance value.

Q: What are the main data limitations acknowledged by the authors? A: The authors cannot observe cable-level revenue or market shares, nor contracts between buyers and sellers; only aggregate country-pair used bandwidth is observed. Price coverage is not comprehensive — TeleGeography collects prices on a voluntary basis from dozens of providers. The cable faults dataset (168 faults) represents only a subset of total faults, as collection focuses on publicly disclosed events. The demand model also does not explicitly account for substitution patterns across firms due to lack of firm-level market share data, though the high R² partly mitigates this concern.

Diversification (in this paper’s sense): Buyers’ practice of splitting bandwidth purchases across multiple cable operators to reduce exposure to idiosyncratic disruption risk. Diversification across n cables with i.i.d. disruption shocks reduces the variance of realized delivered bandwidth and lowers the required over-provisioning buffer, making the effective cost of a given usage level B a decreasing function of n.

Market Expansion Effect: The channel through which entry of additional cable suppliers raises aggregate demand holding prices fixed. This occurs because each additional supplier reduces disruption risk, allowing buyers to demand more used bandwidth for the same price. It is distinct from the conventional competition channel (entry lowering prices).

Diversity Distortion: The tendency toward insufficient entry arising because marginal entrants generate consumer surplus through diversification benefits but cannot fully capture this surplus absent price discrimination. Follows Spence (1976) and Mankiw and Whinston (1986).

Business-Stealing Distortion: The tendency toward excessive entry arising because entrants reduce incumbents’ output and revenues, creating a gap between private and social returns to entry.

Non-Stationary Markov Perfect Equilibrium: The equilibrium concept used for the dynamic entry game, in which strategies and equilibrium selection rules are indexed by calendar time to accommodate substantial secular trends in demand and costs — as opposed to a stationary MPE which assumes a stable long-run distribution.

Used Bandwidth vs. Purchased Bandwidth: Used bandwidth B is the amount the buyer is committed to delivering (to downstream customers or for internal use). Purchased bandwidth Q is what the buyer actually contracts for across all cables; Q > B because the buyer holds an over-provisioning buffer against disruption risk. The ratio B/Q is a decreasing function of the disruption cost parameter γ and an increasing function of the number of suppliers n.

Nested Pseudo-Likelihood (NPL) Algorithm: The baseline estimator for the dynamic game, following Aguirregabiria and Mira (2007). It iterates on the best-response mapping to impose equilibrium restrictions. The authors supplement NPL with two-step estimators (1-PML, 1-MD) and the spectral algorithm of Aguirregabiria and Marcoux (2021), which solves for the root of a nonlinear system using a quasi-Newton method and is robust to fixed-point instability.
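
The phrase "iterates on the best-response mapping" can be illustrated with a toy fixed-point problem. The sketch below is not the NPL estimator itself (it involves no data and no likelihood); it only iterates a logit best-response mapping for a hypothetical symmetric two-firm entry game until the entry probability is self-consistent. The payoff parameters are assumptions chosen so that the mapping is a contraction.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def best_response(p_rival, base_profit, competition_effect):
    """Entry probability given the rival's entry probability (logit shocks)."""
    return logistic(base_profit - competition_effect * p_rival)


def solve_symmetric_equilibrium(base_profit, competition_effect,
                                p0=0.5, tol=1e-12, max_iter=1000):
    """Iterate p <- best_response(p) until convergence."""
    p = p0
    for _ in range(max_iter):
        p_next = best_response(p, base_profit, competition_effect)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("best-response iteration did not converge")


p_star = solve_symmetric_equilibrium(base_profit=1.0, competition_effect=2.0)
print(round(p_star, 4))
```

Plain iteration of this kind can fail to converge when the mapping is not a contraction, which is the fixed-point instability that motivates the alternative estimators mentioned above.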

Do The Effects of Nudges Persist? Theory and Evidence from 38 Natural Field Experiments

This paper asks why the Home Energy Report (HER), a widely deployed social-comparison nudge that shows households how their electricity consumption compares to that of their neighbors, produces behavioral changes that persist long after the nudge is discontinued, while analogous nudges in other domains (charitable giving, financial savings, voter turnout, tax compliance) fade almost entirely within a year or two. The authors formalize a research design to decompose the HER’s long-run effectiveness into two channels: technology adoption (a change in the stock of energy-efficient capital in the home) and habit formation (a change in the stock of habits or skills in the resident).

The identifying strategy exploits the administrative rule that when the initial resident in an HER experiment moves out, HER mailings stop immediately — but electricity consumption in the home continues to be observed as new residents occupy it. Under three assumptions — (1) treatment assignment did not influence the initial resident’s decision to move; (2) treatment assignment did not influence the type of resident who moved in; and (3) energy-efficient technology adopted in response to the HER remained in the home after the move — the post-move HER effect identifies the fraction of the long-run treatment effect attributable to technology adoption (ATK), and the remainder identifies the fraction attributable to habit formation (ATH).

Data come from 38 natural field experiments administered by Opower between 2008 and 2013 across 21 U.S. residential energy providers, comprising 61,310,166 electricity bills for 1,810,096 homes. The mover sample, restricted to homes where the initial resident deactivated service at or after the receipt of their fourth HER, contains 5,890,855 bills for 139,908 homes. Treatment and control homes enter the mover sample at statistically indistinguishable rates and have similar baseline electricity consumption.

The main findings: the HER reduced electricity consumption by 2.1 percent in the long run (the pre-move ATE). After the initial resident moved and the HER was discontinued, a 1.1 percent reduction persisted in the home, which is attributable to technology. The habit channel accounts for the remaining 1.0 percentage point of the reduction. Normalizing by the ATE, 51.4 percent (s.e. = 13.1) of the long-run effectiveness is attributable to technology adoption and 48.6 percent to habit formation. The persistence of the post-move effect is robust across alternative specifications, different HER-receipt cutoffs, balanced panels, and exclusion of low-consumption move-period homes. A falsification test using rental homes, where tenants do not typically own appliances and the technology channel is therefore shut down, yields a null post-move effect, consistent with the balanced-habits assumption.

The authors use these results to explain a broader empirical pattern: one year after discontinuation, social comparison nudges targeting compliance, charitable giving, savings, and voter turnout retain on average only 4 percent of their initial effect, while nudges targeting energy and water conservation retain 65 percent. The paper argues this divergence reflects the relative abundance of enabling technologies in conservation contexts versus their absence in compliance or voting contexts. The findings also have cost-benefit implications: ignoring HER-induced technology adoption overstates net benefits by as much as 65 percent, depending on assumed technology cost per kWh saved (ranging from $0.03 per kWh saved per Gillingham et al. 2018 to $0.12 per kWh saved per Billingsley et al. 2014).

Scope conditions: results are specific to electricity-consumption nudges in the U.S. residential sector; the technology channel identification requires that adopted equipment stays in the home after a move; the decomposition rests on a linear production function for outcomes in habits and technology.

Q: What is the Home Energy Report and how was it administered in these experiments? A: The HER is a mailed social-comparison report that contrasts a household’s electricity consumption with that of similar neighbors. In each of the 38 waves, homes were observed for a 12-month baseline, then randomly assigned to treatment (receiving HERs) or control. HERs were mailed monthly, bimonthly, or quarterly; mailings ceased when the initial resident deactivated electricity service.

Q: What is the paper’s central identification strategy? A: The authors exploit a discontinuity created when the initial treated resident moves out: HER mailings stop, but the home’s electricity consumption continues to be measured as new residents move in. Under three assumptions about non-interference of treatment with moving decisions, balanced habits of subsequent residents, and stability of adopted technology, the post-move HER effect point-identifies the technology-adoption component (ATK) of the long-run average treatment effect (ATE). The habit-formation component (ATH) is then inferred as ATE minus ATK.
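
The logic of the design can be illustrated with a small simulation. Everything in it is an assumption for illustration: the data are synthetic, effects are additive in log consumption, the true technology and habit effects are set to -1.1% and -1.0%, and the three identifying assumptions hold by construction. It is a sketch of the idea, not the authors' estimator.

```python
import random
from statistics import mean


def simulate_homes(n_homes=20000, tech_effect=-0.011, habit_effect=-0.010,
                   noise_sd=0.05, seed=0):
    """Synthetic homes: (treated, pre-move outcome, post-move outcome).

    Pre-move, treated homes carry both technology and habit effects.
    Post-move, the habit effect leaves with the initial resident while the
    technology effect stays in the home.
    """
    rng = random.Random(seed)
    homes = []
    for _ in range(n_homes):
        treated = rng.random() < 0.5
        pre_move = (tech_effect + habit_effect) * treated + rng.gauss(0.0, noise_sd)
        post_move = tech_effect * treated + rng.gauss(0.0, noise_sd)
        homes.append((treated, pre_move, post_move))
    return homes


def difference_in_means(homes, column):
    treated = [h[column] for h in homes if h[0]]
    control = [h[column] for h in homes if not h[0]]
    return mean(treated) - mean(control)


homes = simulate_homes()
ate = difference_in_means(homes, 1)   # pre-move: technology + habits
atk = difference_in_means(homes, 2)   # post-move: technology only
ath = ate - atk                       # habits, by subtraction
print(round(ate, 4), round(atk, 4), round(ath, 4))
```

The post-move contrast recovers the technology component because randomization balances everything else about the homes and their later occupants; the habit component is never observed directly and is obtained only as a residual.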

Q: What are the three identifying assumptions and how are they tested? A: Assumption 1 (no effect of treatment on moving rates) and Assumption 2 (balanced habits of subsequent residents) are tested with the data; treatment and control homes enter the mover sample at statistically indistinguishable rates and have similar baseline consumption, supporting Assumption 1. The rental-home falsification test supports Assumption 2: rental homes show a null post-move effect, consistent with renters having balanced habits because the technology channel is inactive in rentals. Assumption 3 (stable technology after a move) is untestable from the data; the authors note that violation of this assumption would imply the post-move effect is a lower bound on ATK, making the technology-adoption estimate conservative.

Q: What are the main quantitative estimates of the decomposition? A: The pre-move (long-run) ATE is -2.1 percent of baseline electricity consumption. The post-move effect (ATK) is -1.1 percent, and the habit-formation component (ATH) is -1.0 percent. Normalizing by the ATE, 51.4 percent (s.e. = 13.1) is attributed to technology adoption and 48.6 percent to habits.
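
As a rough check on the arithmetic above, the following Python sketch recomputes the habit component and the channel shares from the rounded point estimates. The variable names are illustrative. Because the inputs are rounded, the technology share comes out near 52 percent rather than the reported 51.4 percent, which presumably rests on unrounded estimates.

```python
# Rounded point estimates quoted above (percent of baseline consumption).
ate = -2.1   # long-run (pre-move) average treatment effect
atk = -1.1   # post-move effect, interpreted as the technology component

# The habit component is the residual: ATH = ATE - ATK.
ath = ate - atk

# Shares of the long-run effect attributed to each channel.
tech_share = 100 * atk / ate
habit_share = 100 * ath / ate

print(f"ATH = {ath:.1f} percent")
print(f"technology share = {tech_share:.1f}%, habit share = {habit_share:.1f}%")
```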

Q: How large is the HER effect in absolute terms during the comparison period? A: During the comparison period, the HER reduced average daily electricity consumption by approximately 1.8 to 2.3 percent in the first year and 1.5 to 2.0 percent in the second year, with 95 percent confidence intervals excluding zero. In levels, these correspond to reductions of roughly 0.6 to 0.9 kWh per day, equivalent to using 2 to 4 sixty-watt incandescent bulbs for 5 fewer hours per day.

Q: How persistent is the HER effect during the move period? A: In the first year of the move period the HER continues to lower consumption, with estimated effects of -1.7 and -1.4 percent; more than a year after the initial resident’s departure the estimated effect is -1.2 percent. All move-period estimates are statistically significant at conventional levels.

Q: How does the paper explain variation in persistence across social-comparison nudge contexts? A: One year after discontinuation, nudges targeting compliance, charitable giving, savings, and voter turnout retain on average only 4 percent of their initial effect, while nudges targeting energy or water conservation retain 65 percent on average. The paper argues the divergence reflects the relative availability of enabling technologies: households can adopt long-lived, input-efficient technologies (appliances, fixtures) to reduce energy and water use, but analogous technologies to facilitate compliance, donations, or voting are largely unavailable.

Q: How does this paper’s finding about technology adoption compare to Allcott and Rogers (2014)? A: Allcott and Rogers (2014) used participation in utility-sponsored energy-efficiency programs as a proxy for technology adoption and found it explained no more than 2 percent of the HER’s long-run effectiveness. The authors reject this conclusion: their decomposition attributes 51.4 percent to technology, which is estimated precisely enough to statistically reject the 2 percent figure from Allcott and Rogers (2014). They attribute the discrepancy to the imperfect proxy used by Allcott and Rogers and low statistical power in analogous analyses.
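
The claim that the 2 percent figure is statistically rejected can be illustrated with a simple z-statistic built from the reported share and its standard error. This is only a sketch: the normal approximation and the two-sided test are assumptions made here, not necessarily the authors' procedure.

```python
import math

tech_share = 51.4   # percent of the long-run effect attributed to technology
std_error = 13.1    # reported standard error of that share
benchmark = 2.0     # upper bound from Allcott and Rogers (2014), in percent

# z-statistic for the null that the true share equals the 2 percent benchmark.
z = (tech_share - benchmark) / std_error

# Two-sided p-value under a standard normal approximation (an assumption).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_value:.5f}")
```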

Q: What are the cost-benefit implications of accounting for HER-induced technology adoption? A: Assuming monthly HERs for one year, a household electricity price of $0.10/kWh, and benefits accruing over two years, the baseline net benefit (ignoring technology costs) is $32.38 per household (electricity savings of $44.38 minus $12 administration cost). Using a technology cost of $0.03/kWh saved (Gillingham et al. 2018), net benefits fall to $27.14. Using $0.12/kWh saved (Billingsley et al. 2014), net benefits drop to $11.43 — a reduction of up to 65 percent from the baseline estimate. The HER still passes cost-benefit analysis but prior evaluations that ignore technology costs overstate net benefits substantially.
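
The dollar figures above can be checked with a few lines of Python. The sketch takes the reported net benefits as given and backs out the implied technology cost and the percentage reduction relative to the baseline; it does not attempt to reproduce the underlying kWh calculation, which the summary does not spell out.

```python
savings = 44.38      # electricity savings per household, dollars
admin_cost = 12.00   # administration cost of the HERs, dollars

# Baseline net benefit, ignoring technology costs.
baseline_net = savings - admin_cost

# Net benefits reported above, keyed by assumed technology cost ($/kWh saved).
reported_net = {0.03: 27.14, 0.12: 11.43}

# Percentage reduction in net benefits relative to the baseline.
drops = {cost: 100 * (baseline_net - net) / baseline_net
         for cost, net in reported_net.items()}

print(f"baseline net benefit: ${baseline_net:.2f}")
for cost, drop in drops.items():
    print(f"technology cost ${cost:.2f}/kWh: net benefit falls by {drop:.1f}%")
```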

Q: How robust are the decomposition results to alternative sample definitions and specifications? A: The qualitative findings are stable across: alternative sets of control variables (Table A1); mover samples defined by receiving as few as 1 or as many as 5 HERs before moving (Table A2, with pre-move effects of -2.08 and post-move effects of -0.93 to -1.04 across cutoffs); balanced panels requiring fixed observation windows in each period (Table A3); and exclusion of homes showing unusually low consumption in the move period (Table A4, post-move effects of -1.19 to -1.48).

Q: What policy implications does the paper draw for nudge design? A: Policymakers seeking persistent nudge effects should target behaviors that can be augmented by readily available technologies, or pair social-comparison nudges with opportunities to adopt new technologies. In voting contexts, combining social-comparison nudges with opt-in mail-in or online ballot defaults could produce more persistent effects. In savings and charitable giving, pairing social comparisons with automatic contribution-rate defaults (as in Madrian and Shea 2001; Thaler and Benartzi 2004) is predicted to produce longer-lived effects than the nudge alone.

Q: What methodological contribution does the paper offer beyond the HER application? A: The mover-based decomposition is a generalizable research design for separating human capital (habits, skills) from physical capital (technology, infrastructure) as channels of policy effectiveness. The authors suggest it can be applied using other natural separation events — such as student graduation or employee departure — to assess the extent to which nudges build human capital in both recipients and the organizations in which they are embedded.

Technology adoption channel (ATK): The component of the HER’s long-run average treatment effect attributable to increases in the stock of energy-efficient technologies in the home — identified empirically as the post-move HER effect that persists after the treated resident departs and the HER is discontinued.

Habit formation channel (ATH): The component of the HER’s long-run treatment effect attributable to changes in the habits or skills of the resident — inferred as the residual after netting the technology component (ATK) from the total long-run effect (ATE).

Post-move effect: The estimated difference in electricity consumption between treatment and control homes after the initial resident has moved out, the HER has been discontinued, and a new resident has taken occupancy; under the paper’s identifying assumptions this equals ATK.

Balanced-habits assumption: The identifying assumption that treatment assignment did not influence the characteristics or habits of residents who subsequently moved into homes in the experimental sample, so that the habits of incoming residents are comparable across treated and control homes.

Stable-technology assumption: The identifying assumption that energy-efficient technologies adopted in response to the HER remain in the home after the initial resident moves; relaxing this assumption implies the post-move effect is a lower bound on ATK.

Home Energy Report (HER): A mailed social-comparison report that contrasts a recipient household’s electricity consumption with that of similar neighboring households; the treatment studied across all 38 experiments in this paper.

Enabling technologies: Long-lived, input-efficient capital goods (appliances, lighting, insulation) that reduce the marginal cost of conservation and thereby lock in behavioral changes induced by a nudge; their relative abundance in energy and water conservation contexts — versus their absence in voting, giving, or compliance contexts — is the paper’s proposed explanation for cross-context variation in nudge persistence.

Does Deposit Insurance Promote Deposit Stability? Evidence from the Postal Savings System during the 1920s

Overview

Research question. Does deposit insurance promote financial depth by arresting the outflow of deposits from the banking system during periods of bank distress? The paper tests and quantifies the deposit-stabilizing effect of state-level deposit insurance schemes operating in the United States during the 1920s.

Setting and identification. Between 1908 and 1929, eight primarily Midwestern states adopted some form of deposit insurance. The paper exploits the discontinuity in deposit insurance coverage at state borders to identify the causal effect of insurance on depositor behavior. The identification strategy compares outcomes in contiguous city pairs straddling deposit-insurance (DI) and non-deposit-insurance (NDI) state borders — a quasi-experimental design that controls for observed and unobserved confounders by using narrow geographic areas where the only relevant policy difference is the presence or absence of deposit insurance.

Proxy for “mattress money.” The paper uses postal savings deposits as a proxy for money withdrawn from the banking system. The U.S. Postal Savings System (established 1911) was backed by the full faith and credit of the federal government, with a maximum individual account limit of $2,500, and was widely viewed as a far safer alternative to commercial bank deposits. The authors validate this proxy by demonstrating, via Johansen cointegration tests, that the nationwide ratio of postal savings balances to total bank deposits is cointegrated (rank 1) with the currency-deposit ratio — a well-established indicator of banking distress.

Data. The empirical analysis covers 1921–1929. The main postal savings dataset is drawn from Annual Reports of the Postmaster General. Bank suspension data are drawn from FDIC manuscript lists compiled in the 1930s by FDIC economist Clark Warburton, providing location, charter type, and suspension/reopening dates. The sample includes 74 city pairs across 14 states (7 DI: North Dakota, South Dakota, Nebraska, Kansas, Oklahoma, Texas, Mississippi; 7 NDI: Minnesota, Iowa, Missouri, Arkansas, Louisiana, Tennessee, Alabama), with an average distance between paired cities of approximately 18 miles.

Main findings — postal savings regressions (Table 4). Using OLS with city-pair and year fixed effects and standard errors clustered at the NDI city level, the paper finds that following a bank suspension within a 10-mile radius, postal savings deposits in NDI cities grew 16 percent more than deposits in the corresponding DI city. The effect is positive and statistically significant at the 20-mile radius but smaller — approximately 9 percent — and is statistically indistinguishable from zero at the 30-mile radius. The localized decay with distance is consistent with a geographically contained flight-to-safety response. Critically, when the same specification is estimated for periods after deposit insurance was discontinued, the effect at all radii is statistically nil, providing a falsification test ruling out omitted unobserved factors as the driver.

Persistence of effects (Table 5). Arellano-Bond GMM dynamic panel regressions confirm that the disintermediation effects are persistent. The lagged dependent variable enters with a negative and statistically significant coefficient (approximately −0.20 for the 10-mile regression), indicating mean reversion, but the bank suspension coefficients remain robust. Implied long-run effects for the 10-mile and 20-mile equations are approximately 0.151 and 0.100, respectively, suggesting sustained rather than transitory deposit diversion away from the banking system in the absence of deposit insurance.

Banking capacity (Table 6). Because the postal savings deposit limit constrained the intake of funds — particularly severely during distress episodes, as documented through narrative evidence from the 1915 Congressional Record — the postal savings regressions underestimate the true effect of deposit insurance. The paper therefore estimates an alternative specification at the county level, comparing deposits at state-chartered banks in paired DI and NDI border counties. The results indicate that deposit insurance is associated with approximately a 56 percent increase in county-level deposits at state-chartered banks (coefficient 0.574, significant at 5 percent, robust to inclusion or exclusion of year fixed effects). By contrast, the analogous coefficient for national banks — which were prohibited by the OCC from participating in state deposit insurance schemes — is positive but statistically insignificant, providing a placebo test consistent with the interpretation that deposit insurance, not unobserved county characteristics, drove the banking capacity difference.

Scope conditions. All effects are estimated for state-chartered bank deposits in predominantly agricultural, Midwestern border counties during 1921–1929, a period characterized by an average annual bank suspension rate of 2.22 percent (versus 0.3 percent during 1911–1920). The paper acknowledges that state deposit insurance schemes of this era generated moral hazard (as established by prior literature), and frames the contribution as quantifying the stability-enhancing component rather than the net welfare effect.

Policy implication. The 56 percent banking capacity differential implies that deposit runoffs in the absence of insurance are substantially higher than the 3–10 percent runoff rates assumed in the Basel III Liquidity Coverage Ratio (LCR) framework, and more consistent with the 25–50 percent runoffs observed in non-systemic institutions in Denmark following an exogenous reduction in deposit insurance limits (Iyer et al., 2016).

In depth

Q1. Why is the Postal Savings System a valid proxy for “mattress money,” and what evidence supports this?

The postal savings system was backed by the full faith and credit of the United States, making it categorically safer than commercial bank deposits, and was explicitly designed to attract savings hidden in mattresses. The authors validate the proxy empirically by showing that the nationwide ratio of postal savings balances to total bank deposits is cointegrated (Johansen test, rank 1) with the currency-deposit ratio — a series that rises during banking distress as depositors convert bank funds to currency. Contemporary narrative accounts from the 1915 Congressional Record further confirm that postal savings offices experienced sharp deposit inflows during local banking distress, with deposit intake frequently constrained by the $2,500 individual account cap.

Q2. What is the identification strategy, and why does it address endogeneity concerns?

The strategy exploits the discontinuity in deposit insurance at state borders by comparing relative postal savings deposit growth in contiguous city pairs — one city in a DI state, one in an adjacent NDI state — conditioning on bank suspensions within 10, 20, or 30 miles. The authors argue that deposit insurance legislation was a statewide political decision driven largely by partisan composition (Democrats favored it, Republicans opposed it), making it implausible that interests concentrated at border cities systematically determined which states adopted it. Six of the seven NDI control states introduced deposit insurance legislation but failed to pass it, underscoring that the policy variation was not determined by border-specific characteristics. A falsification test using the same city pairs after deposit insurance was discontinued shows zero effects, ruling out time-invariant unobserved heterogeneity as the driver.

Q3. What are the main quantitative results from the city-pair postal savings regressions?

Following a bank suspension within 10 miles, postal savings deposits in NDI cities grew 16 percent more than in DI cities (coefficient 0.162, significant at 5 percent). At the 20-mile radius the differential is approximately 9 percent (coefficient 0.0933, significant at 5 percent). At the 30-mile radius the coefficient is 0.0997 and statistically indistinguishable from zero. These results are estimated with OLS using city-pair and year fixed effects and standard errors clustered at the NDI city level, based on 524 observations for the 10- and 20-mile specifications and 66 observations for the post-discontinuation falsification regressions.
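
Because the dependent variable is a differenced log-ratio, the coefficients are in log points, and the percentages quoted above read them as approximate percent changes. The sketch below, which assumes that interpretation, shows the exact conversion; the helper name is illustrative.

```python
import math

# City-pair coefficients quoted above, in log points.
coefficients = {"10 miles": 0.162, "20 miles": 0.0933}

def log_points_to_percent(b):
    """Exact percent change implied by a log-point coefficient b."""
    return 100 * (math.exp(b) - 1)

for radius, b in coefficients.items():
    print(f"{radius}: {100 * b:.1f} log points, "
          f"exact change {log_points_to_percent(b):.1f}%")
```

For coefficients of this size the approximation and the exact figure differ by only one or two percentage points.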

Q4. How does the paper establish that distance matters for the flight-to-safety effect?

The decline in the estimated coefficient from 0.162 (10 miles) to 0.093 (20 miles), together with the loss of statistical significance at 30 miles (coefficient 0.100), indicates that the diversion of deposits into postal savings was geographically localized. The point estimate does not fall further between 20 and 30 miles, so the evidence for decay rests on the drop from 10 to 20 miles and on the imprecision of the 30-mile estimate. This pattern is consistent with depositors responding primarily to nearby bank failures rather than to distant ones, and it supports the interpretation that the effect is driven by local banking distress rather than by state-level or regional macroeconomic shocks that would affect all pairs symmetrically.

Q5. Are the disintermediation effects of bank suspensions temporary or persistent?

The Arellano-Bond GMM dynamic panel regressions (Table 5) show that the effects are persistent. The lagged dependent variable coefficient is approximately −0.205 (10-mile) and −0.188 to −0.201 (20-mile), indicating partial mean reversion but not full reversal. Year-1, Year-2, and implied long-run dynamic effects are all statistically significant and of similar magnitude (approximately 0.145–0.152 for the 10-mile equation and 0.096–0.100 for the 20-mile equation), indicating that once depositors shift funds to postal savings in response to bank suspensions, a substantial portion of the effect persists in subsequent years. This is consistent with prior literature showing that deposits leave the banking system quickly but return slowly.

Q6. Why are the postal savings coefficient estimates considered a lower bound on the true effect of deposit insurance?

Two institutional features constrained the postal savings system from fully capturing flight-to-safety deposits. First, individual accounts were capped at $2,500, and narrative evidence shows that this limit was severely binding during distress — depositors attempted to place far more than the ceiling allowed. Second, the re-depositing rate of postal savings funds back into local banks was not 100 percent: during 1921–1923 only 32–47 percent of postal savings deposits were re-deposited in banks, compared to 72–82 percent in calmer years. Because the postal savings system could not absorb unlimited deposits and did not fully recycle absorbed funds into local banking, its level understates the true flight of deposits from the banking system in NDI states.

Q7. How does the county-level banking capacity test address the censoring problem?

The paper estimates log-ratio regressions comparing county-level deposits at state-chartered banks in DI versus NDI border counties, using a “DI Active” indicator that switches on when deposit insurance is in effect in a given state-year and switches off when schemes are discontinued. Because different states discontinued their insurance at different times, there is sufficient within-county variation to identify the DI coefficient even with year fixed effects. The estimated coefficient of 0.574 (without year FE) and 0.557 (with year FE) translates to approximately a 56 percent higher deposit level in state-chartered bank counties with deposit insurance, with virtually identical estimates across specifications.

Q8. What is the placebo test for national banks, and what does it show?

National banks were prohibited by the Office of the Comptroller of the Currency from participating in state deposit insurance schemes. If deposit insurance — rather than unobserved county characteristics — is responsible for the 56 percent banking capacity premium, then county deposits at national banks in DI states should show no corresponding premium. The Table 6 results confirm this: the DI Active coefficient for national bank deposits is positive (0.165 to 0.267) but statistically insignificant, providing a falsification result consistent with the causal interpretation for state-chartered banks.

Q9. How does the paper situate deposit insurance’s stabilizing benefits relative to its moral hazard costs?

The paper explicitly frames its contribution as quantifying the stability-enhancing component of deposit insurance separately from the moral hazard component. It cites extensive prior literature (Calomiris 1992, 1993; Wheelock 1992, 1993; Wheelock and Wilson 1994) establishing that the 1910s–1920s state schemes generated moral hazard: insured banks reduced capital-to-asset ratios, relaxed lending standards, and increased risk exposure. The paper does not contest those findings but argues that the two effects are analytically separable and that the stabilization benefit had significant quantitative magnitude — a benefit that should be accounted for when assessing the net welfare effects of deposit insurance design.

Q10. What are the implications for the Basel III Liquidity Coverage Ratio framework?

The Basel III LCR formula assumes that during distress 3 percent of “stable deposits” and 10 percent of “less stable deposits” run off. The paper’s finding that deposit insurance is associated with a 56 percent increase in banking capacity implies that in the absence of insurance, deposit runoffs are far higher than these Basel assumptions — substantially larger than 10 percent and more consistent with the 25–50 percent runoffs observed for non-systemic banks in Denmark following an insurance limit reduction (Iyer et al. 2016). The authors argue their results suggest that empirical grounding for the LCR runoff assumptions remains insufficient, consistent with critiques by Allen (2014) and Diamond and Kashyap (2016).

Key Concepts

Postal Savings System (as “mattress money” proxy). The U.S. Postal Savings System (established 1911) accepted deposits up to $2,500 per individual, backed by the full faith and credit of the United States. In this paper, postal savings deposits are used as a quantitative proxy for money withdrawn from the banking system during distress (“money under the mattress”), validated by cointegration with the currency-deposit ratio.

Policy discontinuity / border-pair design. The identification strategy exploits the fact that deposit insurance was adopted at the state level, creating a sharp policy discontinuity at state borders. Contiguous city pairs straddling DI and NDI state borders are treated as quasi-experimental units, with the within-pair difference in postal savings deposit growth serving as the outcome, controlling for time-invariant city-level heterogeneity and common time effects.

Relative Postal Savings Deposit Growth (RPS). The dependent variable, defined as the log-ratio of postal savings deposits in the NDI city to postal savings deposits in the DI city within a pair, first-differenced over time. This construction removes time-invariant city-pair characteristics and isolates the differential response to bank suspensions.
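
A minimal sketch of how such a variable could be constructed for a single city pair follows. The deposit balances are made-up numbers for illustration only, and the variable names are not taken from the paper.

```python
import math

# Hypothetical postal savings balances (dollars) for one city pair.
years = [1921, 1922, 1923]
ndi_deposits = [10_000, 12_500, 13_000]   # city in the non-insured state
di_deposits = [10_000, 10_400, 10_600]    # paired city in the insured state

# Step 1: log-ratio of NDI to DI deposits in each year.
log_ratio = [math.log(n / d) for n, d in zip(ndi_deposits, di_deposits)]

# Step 2: the first difference over time gives relative deposit growth (RPS).
rps = {years[t]: log_ratio[t] - log_ratio[t - 1] for t in range(1, len(years))}

for year, value in rps.items():
    print(year, round(value, 4))
```

A positive value means postal savings grew faster in the NDI city than in its DI partner during that year.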

Bank suspension. In this paper’s context, a bank suspension is any closure of a bank (state-chartered or national) at a specific geographic location, as recorded in FDIC manuscript lists compiled by Clark Warburton during the 1930s. The variable used in regressions is the change in the number of suspensions within R miles (R = 10, 20, 30) of the paired postal savings offices.

Financial depth / local banking capacity. The paper uses county-level deposits at state-chartered banks as a measure of local banking market size. Deposit insurance is hypothesized to increase financial depth by preventing the diversion of funds out of the banking system during distress, and the 56 percent estimated premium is the paper’s primary measure of the insurance’s capacity-enhancing effect.

DI Active indicator. A time-varying binary variable equal to 1 when deposit insurance was legally in effect in a given state at a given time, and 0 otherwise (including after repeal). Because different states repealed their schemes at different times (Oklahoma 1923, Texas 1927, South Dakota 1927, North Dakota 1929, Kansas 1929, Nebraska 1930, Mississippi 1930), this variable provides within-county variation that identifies the banking capacity coefficient after controlling for county and year fixed effects.
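
The indicator could be coded along the following lines. Two assumptions are made here that the summary does not confirm: every listed scheme is treated as already in force at the start of the 1921–1929 sample, and the year of repeal itself is coded as inactive.

```python
# Repeal years as listed above. Coding the repeal year as inactive is an
# assumption about how partial years are treated.
REPEAL_YEAR = {
    "Oklahoma": 1923, "Texas": 1927, "South Dakota": 1927,
    "North Dakota": 1929, "Kansas": 1929, "Nebraska": 1930,
    "Mississippi": 1930,
}

def di_active(state, year):
    """Return 1 if the state's deposit insurance scheme is in effect, else 0."""
    repeal = REPEAL_YEAR.get(state)
    if repeal is None:          # state without a scheme in this sample
        return 0
    return 1 if year < repeal else 0

sample_years = range(1921, 1930)   # 1921 through 1929
panel = {s: [di_active(s, y) for y in sample_years] for s in REPEAL_YEAR}

for state, flags in panel.items():
    print(f"{state:<13}", flags)
```

The staggered switch-off dates are what give the indicator variation over time within a county.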

Moral hazard vs. stability-enhancing components. The paper distinguishes analytically between the moral hazard effect of deposit insurance (insured banks undertake riskier projects, reduce capital buffers, relax lending standards) and the stability-enhancing effect (depositors retain funds in the banking system, preventing runs). The paper’s contribution is to quantify the latter component in isolation, using a setting where the two effects can be separated by focusing on depositor — rather than banker — behavior.

Double Robustness of Local Projections and Some Unpleasant VARithmetic

This paper provides formal theoretical results on the relative robustness of local projection (LP) and vector autoregression (VAR) confidence intervals for impulse response inference when the data generating process (DGP) is locally misspecified. The research question is whether the widely held belief that LP estimators are more robust to misspecification than VARs is theoretically justified, and if so, precisely under what conditions and with what consequences for VAR inference.

The analytical framework models the DGP as a stationary structural VARMA(1, ∞) that is local to an SVAR(1), of the form y_t = Ay_{t-1} + H[I + T^{-ζ}α(L)]ε_t, where the MA component T^{-ζ}α(L)ε_t represents misspecification that vanishes at rate T^{-ζ} as sample size T grows. The key rate parameter is ζ ∈ (1/4, 1/2), which corresponds to misspecification large enough to be detected with probability approaching 1 by conventional Hausman-type specification tests, yet small enough that the bias-variance trade-off between LP and VAR remains non-trivial asymptotically. The framework encompasses under-specification of lag length, omitted variables, temporal aggregation, measurement error, and failure of shock invertibility — essentially all sources of dynamic misspecification relevant to linearized DSGE models.
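
To make the local-misspecification idea concrete, the sketch below simulates a scalar special case of the model above: an AR(1) with a single MA term whose coefficient shrinks at rate T^{-ζ}. The scalar setup, the parameter values (rho, alpha, zeta), and the function name are all assumptions chosen for illustration, not the paper's design.

```python
import random

def simulate_local_ar1(T, rho=0.9, alpha=0.5, zeta=0.4, seed=0):
    """Simulate y_t = rho*y_{t-1} + eps_t + T**(-zeta) * alpha * eps_{t-1}."""
    rng = random.Random(seed)
    ma_scale = T ** (-zeta) * alpha      # MA(1) coefficient, shrinking in T
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(T):
        eps = rng.gauss(0.0, 1.0)
        y_t = rho * y_prev + eps + ma_scale * eps_prev
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y, ma_scale

# The MA coefficient vanishes as the sample grows, but only slowly.
for T in (240, 2400, 24000):
    _, scale = simulate_local_ar1(T)
    print(T, round(scale, 4))
```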

The main finding on LP is a “double robustness” result: the conventional LP confidence interval achieves correct asymptotic coverage for all ζ > 1/4, even when misspecification is large enough to be detected with certainty. The mechanism is that the omitted-variable bias in the LP regression is of order T^{-2ζ} = o(T^{-1/2}) when ζ > 1/4, because both the direct effect of omitted lags on the outcome and the covariance of the residualized regressor with omitted lags are each of order T^{-ζ}, so their product is negligible relative to the T^{-1/2} standard deviation. This is formally analogous to double robustness in partially linear regression and debiased machine learning: LP is consistent if either the outcome-equation controls or the first-stage controls are correctly specified.

In stark contrast, the VAR estimator carries asymptotic bias of order T^{-ζ}, which is non-negligible relative to its T^{-1/2} standard deviation for ζ ≤ 1/2. This causes the conventional VAR confidence interval to severely undercover: for ζ ∈ (1/4, 1/2) the coverage converges to zero, and for ζ = 1/2 it converges to a level strictly below the nominal level.

The “no free lunch” result formalizes the trade-off. Setting ζ = 1/2 and bounding the noise-to-signal ratio at M²/T, the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1). This worst-case bias is small if and only if the VAR asymptotic variance is close to that of LP. When the VAR standard error is less than half that of LP — which is typical in applied practice — worst-case coverage falls below 48% even for M = 1. Moreover, the least favorable misspecification takes the form of exponentially decaying MA coefficients peaking at horizon h, a pattern consistent with standard economic theories of adjustment costs, learning, or overshooting, and is difficult to rule out on prior grounds. The Hausman test also provides weak protection: when M = 1, the odds of the test failing to reject are nearly 3-to-1 at the 10% significance level.

Simulations using the Smets and Wouters (2007) model with T = 240 observations confirm these results. With lag length selected by AIC (median selected p = 2), VAR confidence intervals materially undercover at all but very short horizons while LP achieves close to nominal coverage throughout. Increasing lag length to p = 4 or p = 8 ameliorates VAR undercoverage at short horizons but at the cost of making VAR confidence intervals essentially as wide as LP intervals, with substantial undercoverage persisting at longer horizons. For p = 4 the total misspecification measure is M ≈ 3.23; for p = 8, M ≈ 1.89.

Scope conditions: results are pointwise asymptotic in fixed model parameters and horizon; they abstract from order-T^{-1} small-sample biases from persistence or the nonlinearity of the impulse response transformation. The LP robustness result requires controlling for lags that are strong predictors of the outcome or impulse variables; omitting lags with small-to-moderate predictive power does not threaten coverage.

Q: What is the precise sense in which LP confidence intervals are “doubly robust”?

A: LP is doubly robust in the sense of partially linear regression: its bias from misspecified MA dynamics is the product of two errors, the estimation error in the outcome-equation lag controls γ̂ − γ_0 and the estimation error in the first-stage lag controls ν̂ − ν_0. In the local-to-SVAR model each error is of order T^{-ζ}, so their product is of order T^{-2ζ} = o(T^{-1/2}) whenever ζ > 1/4, making the omitted-variable bias negligible relative to the T^{-1/2} standard deviation. This means the asymptotic distribution of the LP estimator is completely invariant to the misspecification parameters α(L) and ζ.
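
The rate argument can be illustrated numerically. Ignoring constants, the ratio of bias to standard deviation is of order T^{1/2 − 2ζ} for LP and T^{1/2 − ζ} for VAR; the sketch below evaluates both for an illustrative ζ in (1/4, 1/2). The function name and the chosen ζ are assumptions.

```python
def bias_to_sd_ratio(T, zeta, estimator):
    """Order of |bias| / sd, ignoring constants.

    LP bias is of order T**(-2*zeta), VAR bias of order T**(-zeta),
    and both standard deviations are of order T**(-1/2).
    """
    bias_exponent = 2 * zeta if estimator == "LP" else zeta
    return T ** (0.5 - bias_exponent)

zeta = 0.3   # any value in (1/4, 1/2); chosen for illustration
for T in (10**2, 10**4, 10**6):
    lp = bias_to_sd_ratio(T, zeta, "LP")
    var = bias_to_sd_ratio(T, zeta, "VAR")
    print(f"T={T:>7}: LP ratio {lp:.3f}, VAR ratio {var:.1f}")
```

The LP ratio shrinks toward zero as T grows while the VAR ratio diverges, which is the asymmetry behind the coverage results.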

Q: How large does misspecification need to be before LP coverage is threatened?

A: The LP double robustness result holds for all ζ > 1/4 regardless of the magnitude parameter M of the MA misspecification. Misspecification with ζ ∈ (1/4, 1/2) can be detected with probability approaching 1 asymptotically by standard specification tests (in particular, the Hausman test is consistent for this range), yet LP coverage remains asymptotically correct. There is no threshold M above which LP fails; robustness is structural, not contingent on misspecification being small.

Q: Under what conditions does the VAR estimator have zero asymptotic bias?

A: The VAR asymptotic bias is zero if and only if the lagged shocks ε_{j*,t-ℓ} for ℓ = 1, …, h lie in the span of the lagged data used for estimation. Two sufficient conditions from Corollary 3.2 are: (i) the true model is SVAR(p_0) and the estimation lag length p satisfies h ≤ p − p_0, so the extra lags absorb the residual MA structure; or (ii) the shock of interest is directly observed and ordered first, and h ≤ p. In these cases the VAR estimator is asymptotically equivalent to LP, with equal variance.

Q: What is the “no free lunch” result for VARs?

A: For ζ = 1/2 and noise-to-signal ratio bounded by M²/T, the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1) (Proposition 4.1). This quantity is small if and only if aVar(δ̂_h) ≈ aVar(β̂_h), meaning the VAR has little efficiency advantage over LP. Put differently, the only way to guarantee robust VAR coverage is to include enough lags that the VAR confidence interval becomes as wide as the LP interval. There is no procedure that simultaneously offers narrower intervals than LP and reliable coverage.

Q: How severe is the worst-case undercoverage of conventional VAR confidence intervals?

A: From Corollary 4.3, even for M = 1 (a noise-to-signal ratio of just 1/T), worst-case VAR coverage falls below 48% whenever the VAR asymptotic standard deviation is less than half that of LP, a configuration typical in applied practice. For larger M the undercoverage is worse: the worst-case coverage 1 − r(M√(aVar(β̂_h)/aVar(δ̂_h) − 1); z_{1-α/2}) can approach zero. Furthermore, the worst-case probability that the VAR interval fails to cover and the Hausman test simultaneously fails to detect the misspecification exceeds 46% when the VAR standard deviation is less than half that of LP (Corollary 4.4).
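
The worst-case coverage formula can be evaluated directly. The sketch below treats the scaled bias as a shift in a standard normal and takes the nominal level to be 90 percent, which is an assumption made here; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def coverage(scaled_bias, level=0.90):
    """P(|Z + b| <= z) for Z ~ N(0, 1): coverage of a two-sided interval
    whose estimator carries bias b measured in standard deviations."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return STD_NORMAL.cdf(z - scaled_bias) - STD_NORMAL.cdf(-z - scaled_bias)

def worst_case_bias(M, sd_ratio):
    """M * sqrt(aVar(LP)/aVar(VAR) - 1), with sd_ratio = sd(LP)/sd(VAR)."""
    return M * sqrt(sd_ratio**2 - 1)

# VAR standard deviation exactly half that of LP (sd_ratio = 2).
for M in (0.5, 1.0, 2.0):
    b = worst_case_bias(M, sd_ratio=2.0)
    print(f"M={M}: worst-case coverage of a 90% interval = {coverage(b):.3f}")
```

With no bias the function returns the nominal level, and coverage deteriorates quickly as M or the LP-to-VAR standard deviation ratio grows.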

Q: Can the researcher detect the problematic misspecification using a Hausman test before it causes undercoverage?

A: Only weakly. When M = 1, the Hausman test fails to detect the misspecification with probability approximately 74% (odds of nearly 3-to-1) at the 10% significance level, since the rejection probability is r(1; z_{0.95}) = 26%. At the 5% level the odds of non-rejection are nearly 5-to-1, since r(1; z_{0.975}) = 17%. The least favorable misspecification also cannot be ruled out on economic-theory grounds: the least favorable MA polynomial has exponentially decaying coefficients peaking at horizon h, consistent with adjustment costs, learning, or overshooting.
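
The rejection probabilities and odds quoted above follow from the two-sided rejection probability of a unit-variance normal with mean shift b. A short sketch, with an illustrative function name:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rejection_prob(b, significance):
    """r(b; z) = P(|Z + b| > z) for Z ~ N(0, 1), z the two-sided critical value."""
    z = STD_NORMAL.inv_cdf(1 - significance / 2)
    return 1 - (STD_NORMAL.cdf(z - b) - STD_NORMAL.cdf(-z - b))

for significance in (0.10, 0.05):
    r = rejection_prob(1.0, significance)
    odds = (1 - r) / r
    print(f"{significance:.0%} level: rejects with probability {r:.0%}, "
          f"odds of non-rejection {odds:.1f}-to-1")
```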

Q: Does using a bias-aware critical value (Armstrong-Kolesár approach) resolve the VAR undercoverage problem?

A: The bias-aware VAR confidence interval CI_B(δ̂_h; M) achieves correct asymptotic coverage by inflating the critical value based on the known bound M on misspecification. However, the bias-aware VAR interval tends to be wider than the LP interval. Specifically, M must be quite small (apparently below 1) for the bias-aware VAR interval to be narrower than the LP interval regardless of DGP and horizon. For M ≥ 2 (a noise-to-signal bound of at least 4/T), the bias-aware VAR interval is at least as wide as the LP interval. The practical conclusion is that the simpler LP interval is preferable in most empirically relevant settings.

Q: What does the minimax model-averaging result say about optimal weighting of LP and VAR?

A: From Corollary 4.2, the minimax optimal weight on LP when estimating a convex combination of LP and VAR estimators is M²/(1 + M²). For M = 1 (equal noise-to-signal threshold), the optimal weight is 50% on each. For M = 2, the LP estimator receives 80% weight. In the Smets and Wouters simulations, M ≈ 3.23 for p = 4 lags, corresponding to an optimal LP weight of approximately 91%, and M ≈ 1.89 for p = 8 lags, giving an optimal LP weight of approximately 78%.
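The weights quoted above follow directly from the formula M²/(1 + M²). A minimal sketch (the function name is an illustrative assumption, not taken from the paper):

```python
def minimax_lp_weight(M):
    """Minimax-optimal weight on the LP estimator, M^2 / (1 + M^2)."""
    return M ** 2 / (1.0 + M ** 2)

# Values of M mentioned in the text, including the Smets-Wouters calibrations.
weights = {M: round(minimax_lp_weight(M), 2) for M in (1, 2, 3.23, 1.89)}
```

The weight on the VAR estimator is one minus this value, so it shrinks quickly as the misspecification bound M grows.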

Q: What do the Smets and Wouters simulations show about AIC-selected VARs?

A: In 5,000 simulated samples of T = 240 observations from the Smets and Wouters (2007) model, the AIC selects a median lag length of p = 2. At all but very short horizons, VAR confidence intervals materially undercover while LP confidence intervals throughout achieve close to nominal coverage. A bootstrap correction for VARs somewhat improves coverage but leaves large distortions. Increasing lag length to p = 4 or p = 8 moves coverage closer to nominal at short horizons (h ≤ p) but makes VAR confidence intervals essentially as wide as LP, and substantial VAR undercoverage persists at longer horizons.

Q: Is the no-free-lunch result specific to univariate impulse responses?

A: No. Proposition 4.2 extends the result to simultaneous inference on multiple impulse responses. For any matrix R forming k linear combinations of the impulse response vector, the worst-case squared bias is M² λ_max(R[aVar(β̂) − aVar(δ̂)]R’), where λ_max denotes the largest eigenvalue. Because VAR impulse response estimates are often highly correlated across horizons, undercoverage can be particularly severe in the multivariate (joint confidence ellipsoid) case. The no-free-lunch principle holds: the VAR ellipsoid suffers non-negligible worst-case bias as long as it offers any efficiency gain relative to LP for any linear combination of horizon-specific impulse responses.

Q: What is the practical recommendation for lag selection in LP and VAR?

A: The paper offers three practical guidelines. First, LP researchers should control for those lags of the data that are strong predictors of the outcome or impulse variables, using conventional information criteria (such as AIC) applied to a VAR in all variables to select the number of lags for LP control — omitting lags with small-to-moderate predictive power does not threaten coverage. Second, VAR researchers should increase the lag length until the VAR confidence interval is no longer substantially narrower than the corresponding LP interval. Third, conventional specification tests do not suffice to guard against VAR coverage distortions.

Local Projection (LP) Estimator: The LP estimator for the impulse response at horizon h is the OLS coefficient on the shock variable y_{j*,t} in a direct regression of y_{i*,t+h} on y_{j*,t}, the variables ordered before it, and lagged data. It is a “direct” estimator in that it does not iterate a one-step VAR forward.

Double Robustness: A property of LP whereby its asymptotic bias from MA misspecification equals the product of two estimation errors — in the outcome-equation lag controls and in the first-stage residualization controls — each of order T^{-ζ}, making their product of order T^{-2ζ} = o(T^{-1/2}) for ζ > 1/4. This is the LP analogue of the double robustness of partially linear regression estimators in debiased machine learning.

Local-to-SVAR Misspecification: A DGP of the form y_t = Ay_{t-1} + H[I + T^{-ζ}α(L)]ε_t in which the MA term T^{-ζ}α(L)ε_t represents misspecification that vanishes at rate T^{-ζ}. The rate parameter ζ governs the magnitude; ζ ∈ (1/4, 1/2) is the empirically relevant range where bias is detectable by specification tests yet the bias-variance trade-off between LP and VAR remains non-trivial.

No Free Lunch (for VARs): The result that the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1), implying that the VAR confidence interval has reliable (robust) coverage if and only if the VAR asymptotic variance is close to that of LP — i.e., there is no way to simultaneously have shorter confidence intervals than LP and guaranteed coverage robustness.

Noise-to-Signal Ratio: The quantity T^{-1}||α(L)||² = trace{Var(T^{-1/2}α(L)ε_t) Var(ε_t)^{-1}}, which measures the total magnitude of the MA misspecification relative to the variance of the shocks. The paper bounds this at M²/T and uses M as the sufficient statistic for worst-case bias and coverage.

Bias-Aware Critical Value: An inflated critical value cv_{1-α}(b) solving r(b; cv_{1-α}(b)) = α, used to construct a VAR confidence interval CI_B(δ̂_h; M) that achieves correct asymptotic coverage by accounting for the worst-case bias M√(aVar(β̂_h)/aVar(δ̂_h) − 1). The paper shows this approach typically produces intervals at least as wide as LP for M ≥ 2.

Asymptotic Bias of VAR (aBias): The scaled bias term T^{ζ}E[δ̂_h − θ_{h,T}] converging to aBias(δ̂_h) = trace{S^{-1}Ψ_h H Σ_{ℓ=1}^∞ α_ℓ D H’(A’)^{ℓ-1}} − e’_{i*,n} Σ_{ℓ=1}^h A^{h-ℓ} H α_ℓ e_{j*,m}. This term is structurally absent from the LP asymptotics due to the double robustness mechanism.

Dynamic Regulation with Firm Linkages: Evidence from Texas

This paper evaluates the efficiency of linked environmental regulation, a targeting mechanism whereby inspectors who discover violations at one plant can increase enforcement pressure on other plants sharing the same owner. The central research question is whether linking inspection decisions across co-owned plants adds value over unlinked, plant-level targeting and over random enforcement. The paper develops a new empirical framework of dynamic moral hazard under linked regulation, applies it to Texas environmental enforcement data, and uses the estimated model to evaluate counterfactual regulatory designs.

The empirical setting is the Texas Commission on Environmental Quality (TCEQ), which enforces the Resource Conservation and Recovery Act (RCRA, governing hazardous waste) and the Clean Water Act using a two-dimensional scoring system. A plant-level “site rating” score captures the individual plant’s compliance history, while a firm-wide “person rating” score aggregates the weighted average of plant scores across all plants under the same manager. Both scores feed into a multiplicative penalty escalation rule and a logit-form inspection probability function. The data are an unbalanced panel of 9,792 plants from 2012–2020, with detailed records of inspections, violations, penalties, scores, and ownership. The average plant is inspected with probability 0.289 per year and is linked with approximately 2 other plants through common ownership, though some firms own portfolios exceeding 50 plants.

The model features firms endowed with private types (abatement cost parameters) that may be affiliated within a firm’s portfolio, choosing continuous pollution actions to maximize discounted payoffs net of expected penalties. The regulator observes only scores and minimizes social costs subject to a binding inspection budget. A key computational innovation is “continuation value sufficiency”: because fully solving the portfolio optimization over large plant sets is infeasible due to the curse of dimensionality, each plant’s decision is approximated using three state variables — its own plant score, the firm-wide score, and a scalar summarizing other co-owned plants’ continuation values — governed by an AR(1) transition process. Estimation proceeds in three stages: OLS/logit for inspection and penalty parameters, simulated method of moments for type distribution and curvature parameters, and inversion of the regulator’s first-order conditions to recover sector-specific marginal social harms.

Descriptive evidence confirms three preconditions for linked regulation to add value: violations are positively correlated within firm portfolios, inspections are targeted toward higher-scoring plants on both dimensions, and higher inspection probabilities (instrumented by scores) are associated with fewer violations conditional on plant fixed effects. The coefficient on predicted inspection probability in the deterrence regression (specification 3, plant fixed effects, inspected years only) is −3.920, and an increase in log scores from 0 to 1.5 (roughly the interquartile range) reduces expected violations by approximately 0.5.

Structural estimates show that plant-level and firm-level type variance are similar (σ²_J = 0.209, σ²_F = 0.275), indicating moderate within-firm cost correlation. The curvature parameter y = 0.403 governs diminishing returns to negligence. In counterfactual experiments centered on a 30% budget increase (approximately 10 percentage point rise in per-plant inspection probability), unlinked plant-score-based escalations reduce social costs by 31.9% relative to random inspections. Linked firm-score-based escalations reduce social costs by 41.8% relative to random. The optimal mix — approximately 40% unlinked and 60% linked — reduces social costs by 42.2% relative to random. A back-of-the-envelope cost-benefit calculation calibrating utility-sector violation costs at $3,157 per violation and inspection costs at $740 finds a return of $11.77 in avoided social costs per additional dollar spent on inspections under the optimal mixed regime, versus $8.28 under random inspections.

The scope conditions are specific: the framework applies to RCRA and Clean Water Act plants in Texas, which typically cannot reallocate production across facilities (unlike Clean Air Act firms), so the pollution-substitution channel documented for multi-plant Clean Air Act firms is not modeled. The penalty schedule is taken as fixed; only inspection allocation is treated as a policy choice.

Q: What is linked regulation and why might it improve on unlinked enforcement? A: Linked regulation allows the regulator to increase inspection and penalty pressure on all plants owned by a firm when any one plant accumulates violations. It is efficient when compliance costs (types) are correlated within firms — e.g., due to managerial practices — because a violation at one plant is informative about likely violations at co-owned plants. This correlation means the regulator can target scarce inspection resources toward portfolios that are likely to harbor multiple bad actors, rather than inspecting each plant independently.

Q: How does Texas implement linked regulation in practice? A: Texas uses a two-dimensional scoring system. The plant score (“site rating”) summarizes the individual plant’s violation history over the past five years, normalized by complexity points. The firm score (“person rating”) is the complexity-weighted average of plant scores across all plants under the same manager. Penalties are then multiplied by escalation factors based on both scores: a firm in the “unsatisfactory performer” tier (firm score ≥ 55) faces a 1.1× firm escalation, while a “high performer” (firm score < 0.1) faces a 0.9× multiplier. Because the firm escalation applies to all plants in the portfolio simultaneously, even a small change in firm score can produce large aggregate deterrence effects across a large portfolio.
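The firm-level escalation tiers described above can be sketched as a small lookup. This is an illustration only: the two thresholds and multipliers come from the text, while the neutral 1.0 multiplier for scores in between, the function names, and the omission of the separate plant-level factor are assumptions made for the example.

```python
def firm_escalation(firm_score):
    """Firm-level penalty multiplier implied by the firm score ("person rating")."""
    if firm_score >= 55:   # "unsatisfactory performer" tier (from the text)
        return 1.1
    if firm_score < 0.1:   # "high performer" tier (from the text)
        return 0.9
    return 1.0             # ASSUMPTION: neutral multiplier for the middle tier

def portfolio_penalty(base_penalties, firm_score):
    """Total penalty across a firm's plants after firm-level escalation.

    base_penalties: per-plant penalties before the firm multiplier (hypothetical
    inputs; the plant-level escalation factor is left out of this sketch).
    """
    multiplier = firm_escalation(firm_score)
    return sum(p * multiplier for p in base_penalties)
```

Because the same multiplier applies to every plant, moving a 50-plant portfolio from the middle tier to the unsatisfactory tier raises the total penalty bill by 10% across all 50 plants at once, which is the aggregate deterrence channel the answer describes.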

Q: What descriptive evidence supports the preconditions for linked regulation to add value? A: Three pieces of evidence are presented. First, a scatterplot (Figure 1) shows a positive cross-sectional correlation between a plant’s average violations per inspection and the leave-one-out average violations per inspection of its co-owned plants, indicating within-firm cost correlation. Second, Table 2 logit regressions show that both plant score (coefficient 0.121) and firm score (coefficient 0.062) significantly predict inspection probability, conditional on year and NAICS fixed effects. Third, Table 3 shows that conditional on plant fixed effects, predicted inspection probability is negatively associated with violations (coefficient −3.246 in specification 2, rising to −3.920 in specification 3 restricted to inspected plant-years), confirming dynamic deterrence.

Q: What is the curse of dimensionality problem and how is it resolved? A: In a multi-plant firm, each plant’s optimal action depends on the scores of every other co-owned plant, producing a state space of dimension n_plants + 1. For firms with portfolios of 50+ plants this is computationally infeasible. The paper introduces “continuation value sufficiency”: each plant’s decision is reduced to three state variables: its own score s_j, the firm score s_f, and a scalar W_j aggregating other co-owned plants’ continuation values. Transitions are approximated by plant-specific AR(1) processes. This reduces the portfolio problem from one high-dimensional value function to n_plants separate three-dimensional value functions, each solved independently within an inner fixed-point loop.
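To see why the reduction matters, one can count grid points under each formulation. The sketch below assumes, purely for illustration, that every state variable is discretized on the same number of grid points; the grid size and function names are hypothetical and not taken from the paper.

```python
def full_state_points(n_plants, grid):
    """Grid points for one joint value function over n_plants + 1 state variables."""
    return grid ** (n_plants + 1)

def reduced_state_points(n_plants, grid):
    """Grid points for n_plants separate three-dimensional value functions."""
    return n_plants * grid ** 3

# Example: a 50-plant portfolio with 10 grid points per state variable.
full = full_state_points(50, 10)        # 10**51 points
reduced = reduced_state_points(50, 10)  # 50,000 points
```

The full problem grows exponentially in portfolio size, while the reduced problem grows only linearly, which is what makes portfolios of 50 or more plants tractable.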

Q: How are the type distribution parameters identified? A: The mean type for each NAICS sector θ̄_g is identified by average violations per inspection within that sector — a higher mean type implies more violations conditional on inspection. The plant-level type variance σ²_J is identified by the share of total violation variance occurring across plants within the same firm. The firm-level type variance σ²_F is identified by the share of total violation variance occurring across firms. The curvature parameter y is identified by the responsiveness of violations to changes in predicted inspection probability (the coefficient from specification 3 of Table 3, which equals −3.920 empirically and −6.095 in simulation moments).
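The two variance moments described above are a within-firm and between-firm split of total violation variance. A minimal sketch of that data moment follows. It does not reproduce the paper's simulated method of moments, and the input layout and toy numbers are assumptions made for the example.

```python
from statistics import fmean

def variance_shares(firms):
    """Split total variation into between-firm and within-firm shares.

    firms: dict mapping a firm id to a list of plant-level violation rates
    (hypothetical layout). Returns (between_share, within_share).
    """
    all_values = [v for plants in firms.values() for v in plants]
    grand_mean = fmean(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-firm part: firm means around the grand mean, weighted by firm size.
    between = sum(len(p) * (fmean(p) - grand_mean) ** 2 for p in firms.values())
    # Within-firm part: plants around their own firm's mean.
    within = sum((v - fmean(p)) ** 2 for p in firms.values() for v in p)
    return between / total, within / total

# Toy data: two firms with two plants each.
between_share, within_share = variance_shares({"A": [0.1, 0.3], "B": [0.5, 0.7]})
```

In this toy example most of the variation is across firms, the pattern that would point toward a larger firm-level type variance relative to the plant-level one.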

Q: What are the main counterfactual results? A: A 30% increase in the inspection budget (approximately +10 percentage points in per-plant inspection probability) is allocated under four regimes. Random inspections reduce violations per plant by 0.31 from a baseline of 0.98. Relative to random inspections, unlinked (plant-score) escalations reduce social costs by 31.9%, linked (firm-score) escalations by 41.8%, and the optimal mix (approximately 40% unlinked, 60% linked) by 42.2%. In detected violations, all three targeted regimes perform similarly (+0.7% detected violations versus random), meaning the social cost advantage of linked regulation comes through stronger deterrence, including of violations that would go undetected, rather than through higher detection rates.

Q: How does the decomposition into static, own-plant, and cross-plant effects clarify the mechanism? A: For unlinked escalations: the static effect accounts for −5.4% of social cost relative to random, own-plant dynamic deterrence accounts for −30.6%, and the cross-plant effect is +4.1% (slightly adverse, because unlinked escalations do not account for portfolio-level incentives). For linked escalations: the static effect is −2.4%, own-plant deterrence is −24.5% (smaller than unlinked because linked escalations are less precisely targeted to individual plant histories), and cross-plant deterrence is −14.9% (large and beneficial). The dominance of cross-plant deterrence under linked escalations is the key mechanism explaining why linking outperforms unlinked targeting.
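As a quick consistency check, the three components reported above add up to the headline social-cost reductions of 31.9% (unlinked) and 41.8% (linked). The dictionary layout below is an illustrative assumption; the numbers are those quoted in the text.

```python
# Social-cost changes relative to random inspections, in percent.
decomposition = {
    "unlinked": {"static": -5.4, "own_plant": -30.6, "cross_plant": 4.1},
    "linked": {"static": -2.4, "own_plant": -24.5, "cross_plant": -14.9},
}

# Sum the components for each regime.
totals = {
    regime: round(sum(parts.values()), 1)
    for regime, parts in decomposition.items()
}

# Share of the linked regime's total reduction that is due to cross-plant deterrence.
linked = decomposition["linked"]
cross_share_linked = linked["cross_plant"] / sum(linked.values())
```

Under linked escalations, cross-plant deterrence accounts for roughly a third of the total reduction, whereas under unlinked escalations the cross-plant term works slightly in the wrong direction.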

Q: What does the cost-benefit calculation find? A: Calibrating utility-sector violation social costs at $3,157 per violation (from Kang and Silveira 2021 for California water utilities post-2006) and inspection costs at $740, the paper finds a return of $11.77 in avoided social costs per additional dollar spent on inspections under the optimal linked/unlinked mix, versus $8.28 under random inspections. This suggests a large return to expanding enforcement budgets, with the gain amplified substantially by optimal targeting design.

Q: What are the scope conditions and limitations acknowledged? A: The framework applies to RCRA and Clean Water Act plants in Texas, where firms (e.g., gas station chains) typically cannot reallocate production across facilities, so the pollution-substitution channel documented by Gibson (2019) for Clean Air Act firms is not modeled. The penalty schedule is taken as fixed — only inspection allocation is treated as a policy choice — because Texas’s bylaws are prescriptive about how violations translate into penalties while leaving inspection targeting largely to regulator discretion. Social harm parameters h_g are identified only up to a scale normalization. The paper also does not model why types are correlated within firms (bad managers versus specialization), as the counterfactual results depend only on the degree of correlation, not its source.

Q: How well does the model fit the data? A: The model matches the targeted moments well (Table 5). Mean violations by NAICS sector are closely reproduced (e.g., utility: 0.201 empirical vs. 0.184 simulated; trade: 0.252 vs. 0.236). Responsiveness of violations to inspection probability matches closely (−6.398 empirical vs. −6.095 simulated). A non-targeted fit statistic — the correlation between a plant’s own violation rate and its co-owned plants’ violation rates — is 0.32 in simulation versus 0.26 in the data, which the authors characterize as a good out-of-sample fit given it was not directly targeted in estimation.

Q: How do heterogeneous effects shed light on the distributional consequences of regulation? A: The own-plant deterrence effect is positive for all plants including those with low types that are unlikely to be targeted, but is especially pronounced for high-type plants under unlinked escalations. Under linked escalations, high-type plants are deterred less to the extent they are co-owned with lower-type plants, because firm-score-based targeting aggregates across the portfolio. Cross-plant effects are predictably small under unlinked escalations and larger under linked escalations, especially for firms with high-type portfolios, since those are the firms whose firm scores respond most to individual violations.

Linked regulation: An enforcement mechanism in which the discovery of violations at one plant triggers increased inspection and penalty pressure on all other plants under the same owner. It exploits within-firm correlation in compliance costs to target scarce regulatory resources more efficiently than plant-by-plant escalation alone.

Escalation mechanism: A penalty and inspection design in which plants with worse compliance records — measured by accumulated compliance scores — face disproportionately greater scrutiny and higher penalties per additional violation. The TCEQ’s two-dimensional scoring system is an escalation mechanism operating simultaneously at the individual plant and firm portfolio level.

Plant score / firm score: The plant score (“site rating”) is a normalized index of a single facility’s violation history over the past five years, divided by investigation count and complexity points; the firm score (“person rating”) is the complexity-weighted average of all plant scores across the firm’s portfolio. Higher scores indicate worse compliance records and trigger both higher penalties and higher inspection probabilities.

Continuation value sufficiency: The paper’s solution to the curse of dimensionality in large plant portfolios. Rather than tracking the full joint score state across all co-owned plants, each plant’s optimal action is approximated using three variables — its own score, the aggregate firm score, and a scalar W_j summarizing co-owned plants’ continuation values — with state transitions governed by a plant-specific AR(1) process.

Dynamic moral hazard under linked regulation: The firm’s problem of choosing how much to invest in pollution mitigation at each plant over time, given that current actions affect future scores, future penalties, and — through the firm-wide score — future scrutiny of all co-owned plants. The moral hazard arises because abatement costs are private information not directly observable by the regulator.

Complexity points: A normalization factor in the TCEQ scoring system that adjusts raw violation counts for plant size and sector, enabling comparable compliance histories across heterogeneous facilities. They were introduced in 2012 specifically to prevent mechanically larger facilities from appearing riskier simply due to their scale.

Cross-plant deterrence effect: The reduction in pollution actions at co-owned plants induced by increases in the firm-wide score following a violation at one plant in the portfolio. In the counterfactual decomposition, this effect accounts for −14.9 percentage points of social cost reduction under linked escalations and is the primary mechanism by which linked regulation outperforms unlinked plant-level escalation.

Eliciting Multiple Prior Beliefs

Multiple prior decision models, in which beliefs are represented by a set of probability measures rather than a single measure and so generate a probability interval for each event, have become increasingly important in economics. Yet choice-based, incentive-compatible elicitation of probability intervals remains an open problem: existing scoring rules and matching-probability methods cannot recover probability intervals without assuming probabilistic sophistication, an assumption that is least warranted precisely in the settings where multiple priors are most relevant. This paper develops a preference-based identification of a subject’s probability interval for an event, and a method for eliciting it under weak decision-theoretic assumptions with no need for probabilistic sophistication. Three incentivized experiments on artificial and natural sources of uncertainty demonstrate that the elicited intervals are sensitive to the direction and amount of information, are typically consistent with objective probabilities where available, and exhibit a predominance of non-degenerate probability intervals that are wider when there is less information or predictability. On aggregate, the choice-based intervals are similar to stated probability intervals, providing behavioral foundations for the use of stated interval techniques in the field.

In depth

Q1. What is the key identification challenge for multiple prior elicitation?

The key challenge is that existing incentive-compatible elicitation methods—scoring rules and matching-probability approaches—confound a subject’s probability interval with their ambiguity attitude, so they cannot separately identify the probability interval without assuming probabilistic sophistication. Under the popular α-maxmin EU model, the matching probability of an event depends on both the subject’s probability interval and their ambiguity attitude parameter α; even eliciting both the event and its complement’s matching probabilities yields two equations in three unknowns. Probabilistic sophistication is least warranted precisely in settings with deep uncertainty where multiple priors are most relevant, making precision-laden methods unsuitable.
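The "two equations in three unknowns" point can be made concrete with a small numerical sketch. It assumes the standard α-maxmin form with payoffs normalized so that a winning bet is worth 1 and a losing bet 0, and with α as the weight on the worst-case probability. The function names and the two hypothetical subjects are illustrative and not taken from the paper.

```python
def matching_probability(p_low, p_high, alpha):
    """Alpha-maxmin value of a unit bet on an event with interval [p_low, p_high].

    ASSUMPTION: alpha is the weight on the worst-case (lowest) probability.
    """
    return alpha * p_low + (1.0 - alpha) * p_high

def matching_pair(p_low, p_high, alpha):
    """Matching probabilities of an event and of its complement."""
    m_event = matching_probability(p_low, p_high, alpha)
    # The complement's probability interval is [1 - p_high, 1 - p_low].
    m_complement = matching_probability(1.0 - p_high, 1.0 - p_low, alpha)
    return m_event, m_complement

# Two hypothetical subjects with different intervals and ambiguity attitudes.
subject_a = matching_pair(0.3, 0.7, 0.75)       # interval [0.3, 0.7]
subject_b = matching_pair(0.2, 0.8, 2.0 / 3.0)  # interval [0.2, 0.8]
```

Both subjects produce matching probabilities of 0.4 for the event and 0.4 for its complement, so the two observations cannot tell a narrower interval with stronger ambiguity aversion apart from a wider interval with milder aversion.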

Q2. What is the paper’s elicitation solution?

The paper develops a preference-based method that identifies a subject’s probability interval under weak decision-theoretic assumptions—with no need for probabilistic sophistication—using a series of incentivized choices, and demonstrates its feasibility in three laboratory experiments. The approach comprises two components: (i) a preference-based identification theorem establishing the conditions under which the probability interval can be recovered from observable choices; and (ii) a concrete elicitation procedure that is incentive compatible and does not impose the precision-laden assumption of probabilistic sophistication.

Q3. What do the experiments show?

Three incentivized experiments on artificial and natural sources of uncertainty demonstrate that probability intervals elicited by the method are sensitive to the direction and amount of information, are typically consistent with objective probabilities where available, and predominantly non-degenerate—with intervals wider when there is less information or predictability. The sensitivity to information and consistency with objective probabilities provide external validation that the elicited intervals capture real beliefs rather than noise or confusion. The predominance of non-degenerate intervals (rather than point probabilities) indicates that subjects genuinely hold imprecise beliefs in the relevant settings.

Q4. What is the relationship between choice-based and stated probability intervals?

On aggregate, probability intervals elicited with the choice-based method are similar to those stated by subjects, suggesting that the new method can provide behavioral foundations for the use of stated probability-interval techniques that are widely used in field surveys but previously lacked incentive-compatible grounding. This convergence is informative because stated intervals are cognitively simpler and can be collected at large scale in surveys, while the choice-based intervals are theoretically grounded; the consistency between them justifies the use of simpler stated methods in field applications.

Key concepts

multiple priors : a model of beliefs in which a decision maker’s uncertainty is represented by a set of probability measures rather than a single measure; associated with the Gilboa-Schmeidler (1989) maxmin expected utility model and its generalizations; generates a probability interval for each event.

probability interval : the interval [p̲(E), p̄(E)] of probability values a subject’s set of priors assigns to event E; non-degenerate (with width > 0) when the subject’s beliefs are genuinely imprecise.

incentive-compatible elicitation : an elicitation procedure in which subjects’ optimal strategy is to report their true beliefs; for Bayesian single-prior beliefs, achieved by scoring rules and matching-probability methods, but these fail for multiple priors.

probabilistic sophistication : the assumption that a multiple-prior agent’s set of priors is generated by precise probabilistic beliefs; existing methods require this assumption to disentangle the probability interval from ambiguity attitude, but the paper’s method does not.

Environmental Consequences of Hydrocarbon Infrastructure Policy

Covert and Kellogg study policies that aim to “keep carbon in the ground” by blocking fossil fuel infrastructure investment, with the Dakota Access Pipeline (DAPL) as their empirical application. DAPL moves more than 500,000 barrels per day of oil from the Bakken Shale of North Dakota to the U.S. Gulf Coast and was completed in June 2017 amid substantial opposition. The central research question is whether blocking pipeline construction actually keeps oil in the ground or merely shifts transport to alternative modes — specifically crude-by-rail — and what the net environmental and economic consequences are.

The paper develops a two-period model of crude oil production and transportation mode choice. In the model, oil shippers decide in period 1 whether to commit to pipeline capacity under ship-or-pay contracts, then in period 2 allocate flows between the committed pipeline and the more flexible but costlier railroad alternative. Pipeline construction is an irreversible sunk cost with zero ongoing marginal cost; rail involves no sunk cost but substantial ongoing marginal costs including quadratic adjustment costs that capture capital investment in rail cars and loading/unloading facilities. Equilibrium pipeline capacity is determined by a shippers’ indifference condition: expected per-barrel returns from pipeline access equal the FERC-regulated tariff.

The empirical model is estimated using monthly Bakken oil production and transportation data, price differentials across three coastal destinations (Gulf, East, West), and drilling productivity data. Crude-by-rail marginal costs are estimated via 2SLS, yielding static marginal cost intercepts of $9.49/bbl to the East Coast, $12.64/bbl to the Gulf Coast, and $8.69/bbl to the West Coast, plus a dynamic adjustment cost of $1.28/bbl per mbbl/d of flow change. The upstream supply model follows Anderson, Kellogg, and Salant (2018), with old-well production following exponential decline (estimated decay parameter β = 0.955) and new-well drilling responding to current and lagged prices with a total long-run elasticity of 1.32. Shippers’ beliefs about future oil prices are calibrated to an AR(1) process fit to historical price volatility (persistence φ₁ = 0.9925, volatility σ_G = 0.098). Model validation confirms a predicted expected return to pipeline commitment of $6.17/bbl against DAPL’s actual tariff of $5.50–$6.25/bbl.

The main counterfactual asks what would have happened had DAPL’s construction been enjoined. In expectation, blocking DAPL reduces pipeline flows by 306 mbbl/d. Expected crude-by-rail flows increase by 248 mbbl/d, offsetting 81% of the pipeline reduction. Bakken oil production falls by only 58 mbbl/d, a 4% reduction. The modal shift from pipeline to rail worsens local environmental outcomes: per-barrel local pollution damages from rail transport substantially exceed those from pipelines, dominated by locomotive NOx emissions in populated areas. Foreclosing DAPL increases net local pollution damages by $444,000 per day (the decrease in pipeline-related harm of $144,000/day is more than offset by the increase from rail of $588,000/day). The total cost of blocking DAPL is $45/tonne of CO2 abated — $28/tonne from lost producer surplus and $17/tonne from increased local pollution damages — a figure comparable to the contemporaneous U.S. government social cost of carbon estimate of $42/tonne.
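The headline figures in this paragraph fit together arithmetically, which the following sketch checks. Variable names are illustrative; all numbers are the ones quoted above.

```python
# Expected flow changes from blocking DAPL, in thousand barrels per day (mbbl/d).
pipeline_flow_drop = 306
rail_flow_increase = 248

production_drop = pipeline_flow_drop - rail_flow_increase    # oil kept in the ground
rail_offset_share = rail_flow_increase / pipeline_flow_drop  # share shifted to rail

# Local pollution damages, in dollars per day.
rail_damage_increase = 588_000
pipeline_damage_decrease = 144_000
net_local_damage_increase = rail_damage_increase - pipeline_damage_decrease

# Cost per tonne of CO2 abated: lost producer surplus plus local pollution damages.
cost_per_tonne = 28 + 17
```

The production drop is 58 mbbl/d, the rail offset share rounds to 81%, the net damage increase is $444,000 per day, and the two cost components sum to $45 per tonne.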

An upstream production tax achieving the same CO2 reduction costs only $1.01–$2.68/tonne CO2 abated, an order of magnitude less, because it does not induce the distortionary modal shift to rail. Two caveats apply: if 57% of Bakken production reductions leak to other basins, the cost of blocking DAPL rises from $45/tonne to $104/tonne; and if reductions represent production delays rather than permanent reductions, effective abatement is further diminished. The analysis is scoped to Bakken crude oil and land transportation alternatives. The finding that blocking infrastructure increases local pollution is atypical of CO2 abatement policies, which usually generate local pollution co-benefits.
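The leakage caveat can be approximated with a one-line adjustment. The sketch assumes that the total cost of blocking DAPL is unchanged while only the non-leaked share of the CO2 reduction counts as abatement; that scaling rule and the function name are assumptions made for illustration, not the paper's stated calculation.

```python
def cost_with_leakage(cost_per_tonne, leakage_share):
    """Cost per tonne of net abatement when a share of reductions leaks elsewhere.

    ASSUMPTION: total cost is fixed and net abatement is (1 - leakage_share)
    times gross abatement.
    """
    return cost_per_tonne / (1.0 - leakage_share)

blocking_cost = 45.0                  # $/tonne CO2, from the text
adjusted = cost_with_leakage(blocking_cost, 0.57)

# Ratio of the blocking cost to the upstream production tax range ($1.01 to $2.68).
ratio_low = blocking_cost / 2.68
ratio_high = blocking_cost / 1.01
```

With the rounded inputs this gives about $105 per tonne, close to the reported $104; the small gap presumably reflects rounding of the underlying figures. The ratios of roughly 17 to 45 are consistent with the order-of-magnitude comparison to the production tax.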

Q: What is the core economic mechanism by which blocking a pipeline can keep oil in the ground? A: When a pipeline is foreclosed, crude oil can still move by railroad, but rail transport involves substantial ongoing marginal costs. These costs create a wedge between upstream (Bakken) and downstream (Gulf Coast) prices that depresses upstream supply. Only when downstream prices are high enough to cover both rail marginal cost and this wedge will rail fully substitute for the pipeline; at lower prices, some production is uneconomical and stays in the ground. In the model, this price-depressing wedge is the mechanism that reduces production — but it operates only partially, since rail can substitute for much of the pipeline’s flow.

Q: How much of the blocked pipeline flow substitutes to rail versus stays in the ground? A: In expectation, blocking DAPL reduces pipeline flows by 306 mbbl/d. Expected crude-by-rail flows increase by 248 mbbl/d, offsetting 81% of the pipeline reduction. Bakken oil production falls by only 58 mbbl/d, or approximately 4%. In a specific simulated month (December 2019), 348 mbbl/d (67%) of the 520 mbbl/d of foregone pipeline flows would still move by rail.

Q: How are crude-by-rail costs estimated, and what is the role of adjustment costs? A: The authors estimate a 2SLS model of rail flows on price differentials, allowing for quadratic adjustment costs to capture investments and disinvestments in rail cars and loading facilities. Static marginal costs are $9.49/bbl (East Coast), $12.64/bbl (Gulf Coast), and $8.69/bbl (West Coast). The adjustment cost parameter γ is estimated at $1.28/bbl per mbbl/d, meaning a 10 mbbl/d monthly increase in rail flows raises marginal shipping cost by $12.76/bbl — a substantial share of total rail costs. Adjustment costs are necessary to reconcile the model with the sluggish observed response of rail flows to price differentials.
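
As a rough illustration of how the adjustment cost enters, the sketch below adds a term linear in the monthly flow change to the static marginal cost. It assumes the simplest static reading of the quadratic adjustment cost (no forward-looking terms), uses the rounded γ of 1.28 quoted above, and the function name is hypothetical.

```python
# Static marginal costs quoted in the summary, in $/bbl.
STATIC_COST = {"East Coast": 9.49, "Gulf Coast": 12.64, "West Coast": 8.69}
GAMMA = 1.28  # $/bbl per mbbl/d of monthly flow change (rounded estimate)


def rail_marginal_cost(destination, flow_change_mbbl_d, gamma=GAMMA):
    """Static marginal cost plus an adjustment term linear in the flow change.

    Assumption: the adjustment component of marginal cost is gamma times the
    month-over-month change in flows, which is what a quadratic adjustment
    cost implies when expected future adjustments are ignored.
    """
    return STATIC_COST[destination] + gamma * flow_change_mbbl_d


# A 10 mbbl/d monthly increase to the Gulf Coast.
print(round(rail_marginal_cost("Gulf Coast", 10.0), 2))
```

With the rounded γ the adjustment term is $12.80/bbl; the $12.76/bbl quoted above presumably reflects the unrounded estimate.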

Q: What is the structure of the upstream oil supply model and what are its key parameter estimates? A: The model distinguishes “old” production from pre-existing wells, which follows exponential decline with estimated decay parameter β = 0.955, and “new” production from newly drilled wells, which is price-responsive with a total long-run elasticity of 1.32 — comparable to the 1.1–1.2 estimated by Newell and Prest (2019) across major U.S. shale plays. This structure implies that total production is highly inelastic in the short run (dominated by old wells) but responds to persistent price shocks over the long run through changes in drilling rates.
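
To see what a decay parameter of 0.955 implies, the sketch below traces old-well output under pure exponential decline. The period length is not stated in this summary, so the code speaks only of "periods"; the function names are illustrative.

```python
import math

BETA = 0.955  # estimated per-period decay parameter for old-well production


def old_well_output(initial_output, periods, beta=BETA):
    """Output from pre-existing wells after a number of periods of decline."""
    return initial_output * beta ** periods


def half_life(beta=BETA):
    """Number of periods until old-well output falls to half its initial level."""
    return math.log(0.5) / math.log(beta)


print(f"half-life: {half_life():.1f} periods")
print(f"share remaining after 12 periods: {old_well_output(1.0, 12):.3f}")
```

Because this path does not depend on current prices, any short-run supply response has to come from the new-well component.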

Q: How do the local pollution damages of rail compare to those of pipeline transport? A: At a social cost of carbon of $100/tonne, local air pollution damages from rail transport to the Gulf Coast are $1.66/bbl (plus $0.73/bbl in spill/accident costs), versus only $0.35/bbl local pollution (plus $0.11/bbl spills) for pipelines. Locomotive NOx emissions are the dominant factor, both because locomotives have high NOx emission factors and because these emissions often occur in densely populated areas. CO2 damages at $100/tonne SCC are roughly similar across modes ($0.79–0.83/bbl), so local pollution is the key differentiator.

Q: What is the net welfare impact of foreclosing DAPL, and how is it decomposed? A: Foreclosing DAPL reduces producer surplus by $716,000/day, increases net local pollution damages by $444,000/day (the $588,000/day increase from rail more than offsets the $144,000/day decrease from pipeline), and reduces CO2 emissions by 25.2 thousand tonnes/day from the 58 mbbl/d production reduction. The cost per tonne of CO2 abated is $28/tonne from lost producer surplus and $17/tonne from increased local pollution damages, totaling $45/tonne, broadly comparable to the U.S. government’s contemporaneous SCC estimate of $42/tonne. This means the policy’s abatement cost is approximately equal to the social value of each tonne abated, leaving little or no net social gain even before accounting for leakage.

Q: How does the model validate against observed data and institutional parameters? A: The model predicts an expected return to committed DAPL pipeline shipment of $6.17/bbl, which closely matches the actual DAPL tariff for committed shippers of $5.50–$6.25/bbl. The authors also validate simulated crude-by-rail flows against actual flows across destinations. The close match on the tariff is particularly meaningful because it tests the model’s equilibrium condition for pipeline capacity investment rather than a within-sample fit.

Q: How does an upstream production tax compare to blocking DAPL as a policy instrument? A: A production tax normalized to achieve the same CO2 reduction requires only $3.68/bbl if imposed after shippers have committed to DAPL (holding capacity fixed), or $3.24/bbl if announced before commitments are made (reducing pipeline capacity to 443 mbbl/d). The production tax reduces combined producer surplus and government revenue by only $96,000–$109,000/day versus $716,000/day under the DAPL ban, and reduces local pollution damages by $82,000/day rather than increasing them. The resulting cost per tonne CO2 abated is $1.01–$2.68 — an order of magnitude smaller than the $44.63/tonne for blocking DAPL.

Q: What is the production leakage caveat and how large is its effect? A: If blocking DAPL causes Bakken production to fall, production from other U.S. or global oil basins may increase, partially or fully offsetting the CO2 reduction. Following Prest (2022) and Prest et al. (2023), the authors note that if 57% of the Bakken production reduction leaks to other basins, the cost of blocking DAPL rises from $45/tonne to $104/tonne. Leakage would increase the cost per tonne for the upstream tax as well, but the relative advantage of the tax over the pipeline ban is unaffected by this caveat.
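
The leakage adjustment can be approximated with one line of arithmetic. The sketch below assumes that leakage leaves the policy's costs unchanged and only shrinks net abatement by the leaked share; that simplification is an assumption made here, not a statement of the authors' exact calculation.

```python
def cost_per_tonne_with_leakage(cost_per_tonne, leakage_share):
    """Scale a cost per tonne abated for a given share of production leakage.

    Assumption: total cost is unchanged and net abatement is multiplied by
    (1 - leakage_share), so cost per net tonne is divided by the same factor.
    """
    if not 0.0 <= leakage_share < 1.0:
        raise ValueError("leakage_share must be in [0, 1)")
    return cost_per_tonne / (1.0 - leakage_share)


LEAKAGE = 0.57
ban = cost_per_tonne_with_leakage(44.63, LEAKAGE)
tax_low = cost_per_tonne_with_leakage(1.01, LEAKAGE)
tax_high = cost_per_tonne_with_leakage(2.68, LEAKAGE)

print(f"blocking DAPL: ${ban:.0f}/tonne")
print(f"production tax: ${tax_low:.2f} to ${tax_high:.2f}/tonne")
```

Both policies are scaled by the same factor, which is why the tax's relative advantage survives the caveat.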

Q: What is the production delay caveat? A: Even absent leakage, the paper cautions that production reductions from either policy may represent production delays rather than permanent reductions — oil not extracted today may be extracted later as prices rise or technology improves. To the extent that reductions are temporary, the effective carbon abatement is smaller than the authors compute, and the cost per tonne of CO2 abated is correspondingly higher. The paper does not quantify this effect but flags it as a material caveat.

Q: What institutional features drive pipeline capacity investment and risk allocation? A: Pipelines are irreversible investments subject to ex-post holdup, so construction financing requires firm ship-or-pay commitments from shippers before construction and before future prices are known, meaning oil price risk is borne primarily by shippers rather than the pipeline owner. Pipeline tariffs are regulated by FERC on a cost-of-service basis. In the DAPL case, shippers executed binding ten-year ship-or-pay contracts in June 2014, and shippers’ beliefs about future oil prices at that date — calibrated to historical price volatility using an AR(1) process with estimated persistence φ₁ = 0.9925 and volatility σ_G = 0.098 — determine equilibrium capacity investment.
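
The persistence and volatility estimates can be translated into more intuitive quantities using standard AR(1) formulas. This summary does not say whether the process is specified in levels or logs, or at what frequency, so the sketch below works in unspecified "periods" and units; the function names are illustrative.

```python
import math

PHI = 0.9925   # estimated AR(1) persistence
SIGMA = 0.098  # estimated innovation standard deviation


def shock_half_life(phi):
    """Periods until half of a price shock has decayed."""
    return math.log(0.5) / math.log(phi)


def forecast_sd(phi, sigma, horizon):
    """Standard deviation of the horizon-step-ahead forecast error."""
    return sigma * math.sqrt((1.0 - phi ** (2 * horizon)) / (1.0 - phi ** 2))


def stationary_sd(phi, sigma):
    """Unconditional standard deviation (limit of forecast_sd as horizon grows)."""
    return sigma / math.sqrt(1.0 - phi ** 2)


print(f"half-life of a shock: {shock_half_life(PHI):.0f} periods")
print(f"long-run standard deviation: {stationary_sd(PHI, SIGMA):.2f}")
```

With persistence this close to one, uncertainty keeps growing over long horizons, which is why the allocation of price risk under a ten-year contract matters.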

Q: How does the paper’s finding relate to the typical co-benefit structure of climate policies? A: Most CO2 abatement policies generate local pollution co-benefits (reduced NOx, SOx, particulates), so the abatement cost is partially offset by local pollution gains. Blocking DAPL reverses this: the pipeline-to-rail modal shift increases local pollution damages, making local pollution a cost rather than a co-benefit of the policy. The authors note this is atypical but not unprecedented — urban densification and post-combustion emissions controls in fossil fuel boilers also present CO2–local pollution trade-offs.

Infrastructure foreclosure policy: A “keep it in the ground” strategy that blocks construction of specialized fossil fuel transportation infrastructure (pipelines) with the aim of inhibiting production of the fuels that would have been transported, without requiring direct acquisition or buyout of mineral rights.
Ship-or-pay agreement: A firm, up-front capacity commitment in which a pipeline shipper agrees to pay for reserved pipeline capacity whether or not they ultimately use it, made before construction and before future prices are realized; the institutional mechanism by which oil price risk is transferred from pipeline owners to shippers.
Crude-by-rail adjustment costs: Costs that are quadratic in the period-to-period change in rail volumes to a given destination, so that marginal shipping cost rises linearly with that change; they capture capital investments and disinvestments in rail cars, loading facilities, and unloading terminals needed to expand or contract crude-by-rail capacity; estimated at $1.28/bbl per mbbl/d of monthly flow change.
Production leakage: The partial or full offset of production reductions in one oil basin (Bakken) by production increases in other U.S. or global basins in response to the same price signals; at 57% leakage, the cost of blocking DAPL rises from $45/tonne to $104/tonne of CO2 abated.
Old-well vs. new-well production dynamics: The distinction between production from pre-existing wells (which follows an exponential decline path insensitive to current prices, β = 0.955) and production from newly drilled wells (which responds to current and lagged upstream prices with long-run elasticity 1.32); this structure makes total short-run supply highly inelastic while allowing substantial long-run price responsiveness through drilling adjustments.
Local pollution damages from NOx: The dominant component of environmental harm from crude-by-rail transport, arising from locomotive NOx emissions that are both large in magnitude and concentrated in densely populated areas along rail corridors; at $100/tonne SCC, monetized local pollution damages from rail exceed CO2 damages for all three coastal destinations, whereas for pipelines CO2 damages exceed local pollution costs.
Cost per tonne of CO2 abated: The authors’ metric for comparing infrastructure foreclosure to alternative policies; computed as the sum of lost producer surplus and net change in local pollution damages divided by the quantity of CO2 emissions avoided from reduced oil production and consumption; equals $45/tonne for blocking DAPL versus $1.01–$2.68/tonne for an equivalent upstream production tax.

EU ETS Market Expectations and Rational Bubbles

Mon, 01 Jan 0001 00:00:00 +0000

This paper tests whether the sharp rise in EU Emissions Trading System (EU ETS) allowance prices from 2018 onward was driven by a rational bubble. The methodological contribution is to modify the Fama (1984) Predictive Regression (FPR) approach to remain valid for rational bubble testing when the risk premium is time-varying — potentially stationary, integrated of order one, or even explosive — and when the fundamental price process exhibits a unit root or mildly explosive behavior. Standard bubble tests (including the KPSS applied to the price-expectations differential, and the Phillips-Shi-Yu SADF/GSADF tests applied to price levels) lose size control when the risk premium follows a nonstationary process; the paper’s FPR approach combined with the IVX estimator of Kostakis, Magdalinos, and Stamatogiannis (2015) retains correct size under all risk premium specifications. Using weekly EU ETS spot and futures data from 2013 to 2023 (T = 563), the paper finds: (1) explosive behavior in both spot and futures price levels during the third and fourth trading phases (2018–2023), confirming a necessary condition for a bubble; (2) no evidence of a rational bubble in the FPR test — the IVX-AR Wald statistic fails to reject the null of no bubble (β₂,ₙ = 0) in full-sample and sub-sample analyses across delivery horizons of 4, 8, 12, and 16 weeks; (3) no evidence of explosiveness in the differential between future spot rates and futures rates; (4) no evidence of co-explosiveness between spot and futures prices within either the third or fourth trading period separately. The paper concludes that the EU ETS price surge reflects a shift in market expectations about future allowance scarcity — driven by policy tightening of the cap trajectory and reform of the Market Stability Reserve — rather than speculative excess.

In depth

Q1. What is the Fama Predictive Regression approach to testing rational bubbles, and what is its key limitation with a dynamic risk premium?

The Fama (1984) decomposition splits the futures price F_{n,t} into expected future spot price E_t[P_{t+n}] and a risk premium RP_{n,t}; from this, two predictive regressions (FPR 1 and FPR 2) have slope coefficients β₁,ₙ and β₂,ₙ that equal 1 and 0, respectively, in the absence of a rational bubble, and deviate from these values (β₁,ₙ < 1, β₂,ₙ > 0) when a bubble is present. The paper shows analytically (equations 27–33) that when a rational bubble is present, β₁,ₙ decreases monotonically as Var(B_t) increases and β₂,ₙ increases monotonically, with the direction of the bias confirmed under both zero and nonzero covariance between the bubble and the risk premium. The key limitation of standard OLS inference in this regression is that when the regressor (F_{n,t} − P_t) is highly persistent or mildly explosive, the OLS t-statistic has a non-standard distribution, and Stambaugh (1999) bias can lead to over-rejection of the no-bubble null. The paper addresses this by applying the IVX estimator, which replaces the persistent regressor with an instrument of controllable lower persistence, yielding a Wald statistic that converges to a standard chi-squared distribution regardless of the persistence or trending behavior of the risk premium.
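
The instrument at the heart of IVX can be sketched in a few lines. The code below follows the usual construction in Kostakis, Magdalinos, and Stamatogiannis (2015), in which first differences of the regressor are accumulated with a root slightly below one. The tuning values c_z = -1 and exponent 0.95 are commonly used defaults and are assumed here rather than taken from this paper, and the Wald statistic and its corrections are omitted.

```python
def ivx_root(sample_size, c_z=-1.0, exponent=0.95):
    """Autoregressive root of the instrument: 1 + c_z / T**exponent.

    Assumed defaults: c_z < 0 and exponent in (0, 1), so the root is below one.
    """
    return 1.0 + c_z / sample_size ** exponent


def ivx_instrument(x, rho_z):
    """Build z_t = rho_z * z_{t-1} + (x_t - x_{t-1}), starting from z_0 = 0."""
    z = [0.0]
    for t in range(1, len(x)):
        z.append(rho_z * z[-1] + (x[t] - x[t - 1]))
    return z


# Tiny deterministic illustration with a hand-picked root of 0.5.
print(ivx_instrument([1.0, 2.0, 4.0], 0.5))  # [0.0, 1.0, 2.5]
print(round(ivx_root(563), 4))
```

Because the instrument is built from differences and discounted by a root below one, it is less persistent than the regressor by construction, whatever the regressor's own persistence.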

Q2. Why does the KPSS test applied to the price-expectations differential fail in the presence of a nonstationary risk premium?

The KPSS test applied to P_{t+n} − F_{n,t} tests whether this differential is stationary; under the no-bubble null (equation 16 in the paper), the differential equals the negative risk premium RP_{n,t}, so the KPSS test has correct size when RP is stationary but incorrectly rejects the no-bubble null when RP is integrated of order one or explosive — because non-stationarity in the risk premium is incorrectly attributed to a bubble. The paper’s Monte Carlo simulations (Table 1) confirm that the KPSS test maintains nominal size of 5 percent only when the risk premium is stationary (λ ∈ {0, 0.5} under RP 1); when λ = 1 or λ = 1.01, the KPSS test rejects far more often than 5 percent under the null. The FPR approach with IVX inference, by contrast, maintains size close to 5 percent across all risk premium specifications including explosive ones. This is the central methodological motivation: prior EU ETS bubble tests that relied on KPSS may have detected non-stationarity in the risk premium rather than a genuine bubble component.
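
The identity behind this argument can be written out in one line. The forecast-error term u_{t+n} is notation introduced here for illustration; the answer above suppresses it.

```latex
F_{n,t} = E_t[P_{t+n}] + RP_{n,t}
\quad\Longrightarrow\quad
P_{t+n} - F_{n,t} = -RP_{n,t} + u_{t+n},
\qquad
u_{t+n} \equiv P_{t+n} - E_t[P_{t+n}].
```

If u_{t+n} is stationary, the time-series properties of the differential are inherited from RP_{n,t}, so a KPSS rejection may signal a nonstationary risk premium and not a bubble.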

Q3. What does the SADF/GSADF test find for EU ETS spot and futures prices, and what is its role in the paper’s empirical strategy?

The paper applies SADF and GSADF tests (Phillips, Shi, and Yu, 2015a,b) to weekly EU ETS spot and futures price levels from 2013 to 2023 and finds evidence of explosive behavior at the 5 percent significance level in both series, with consistent timing of explosive phases across spot and all four futures contracts. The explosive episodes are date-stamped using the BSADF sequence with wild-bootstrapped critical values (999 repetitions): explosive periods are identified during the end of the third trading period (2018–2020) and at the commencement of the fourth trading period (2021–2023). These results confirm that the necessary condition for a rational bubble (an explosive price component) is satisfied. However, the paper emphasizes that explosiveness in levels is not sufficient for a rational bubble: a mildly explosive fundamental or an explosive risk premium would produce the same SADF/GSADF result without any bubble component. The FPR-IVX test is designed to distinguish between these cases and constitutes the primary bubble test.

Q4. What do the FPR-IVX tests find for the presence of a rational bubble?

Across the full sample (January 2018 to October 2023, T = 302), the IVX-AR Wald statistic (denoted W̃_β, adjusting for serial correlation in FPR 2’s error term using the Yang, Long, Peng, and Cai (2020) procedure) fails to reject the null β₂,ₙ = 0 against β₂,ₙ ≠ 0 for all four delivery horizons n ∈ {4, 8, 12, 16} weeks. The Bayesian Information Criterion selects models with lagged error terms for both the full sample and sub-samples, confirming the need for the IVX-AR procedure over the standard IVX Wald statistic. The sub-sample analysis separates the third trading period (January 2018 to December 2020, T = 156) and the fourth trading period (January 2021 to October 2023, T = 146); in both sub-samples the null is not rejected for all delivery horizons. The conventional OLS t-statistic (|t_β|) sometimes provides marginal evidence against the null, but the paper interprets this as reflecting the Stambaugh bias problem and defers to the IVX-AR inference. These results contradict both the collapsing bubble hypothesis (which would require β₂,ₙ < 0) and the ongoing bubble hypothesis (which would require β₂,ₙ > 0).

Q5. What do the tests on the differential between future spot rates and futures rates find?

Applying the SADF/GSADF test to the differential P_{t+n} − F_{n,t} for n ∈ {4, 8, 12, 16} weeks reveals no evidence of explosiveness in this differential across all horizons (Table 9 in the paper). This is consistent with the absence of a rational bubble: under the FPR framework, a rational bubble would generate an explosive component in the futures basis (F_{n,t} − P_t), which would in turn produce explosiveness in the differential between actual future spot prices and futures prices. The absence of explosiveness in this differential therefore provides an additional check corroborating the FPR-IVX finding.

Q6. What does the co-explosiveness test find, and why does a structural break affect the full-sample result?

The co-explosiveness test of Evripidou, Harvey, Leybourne, and Sollis (2022) tests whether spot and futures prices share a common explosive trend (null: co-explosive, no bubble) versus the alternative that they diverge by an explosive component (rational bubble or explosive risk premium). In the full sample from January 2018 to October 2023, the test rejects the null for all n ∈ {4, 8, 12, 16}, apparently indicating a non-stationary component separating spot and futures prices. However, sub-sample analysis dividing the sample at December 2020 reveals that neither the third trading period (January 2018 to December 2020) nor the fourth trading period (January 2021 to October 2023) shows rejection of the null; the co-explosive null cannot be rejected in either period alone. The paper interprets the full-sample rejection as reflecting a structural break in the risk premium at the boundary between the two trading periods (a mean shift in the risk premium) rather than a bubble, consistent with the KPSS size problem and with the lack of any significant positive serial correlation between the phases. The sub-sample co-explosiveness results align with the FPR-IVX findings.

Q7. What explains the EU ETS price surge if not a rational bubble, and what are the policy implications?

The paper interprets the consistent co-movement of spot and futures prices in an explosive common trend — without any divergence between them — as evidence that the fundamental value of allowances itself became explosive, driven by a regime shift in market expectations about future allowance scarcity. The scarcity shift is traced to two policy changes: (a) the progressive tightening of the EU ETS cap trajectory under the European Green Deal and the Fit-for-55 legislation, which reduced the total number of allowances available over time; and (b) reform of the Market Stability Reserve, which removed surplus allowances from circulation, making the cap effectively more binding than its nominal level. When market participants updated their expectations about how scarce allowances would become, the fundamental value — the present discounted value of allowance scarcity rents — rose along an explosive path without any bubble component. For policy, this distinction matters: if the price surge reflected a rational bubble, regulatory intervention to deflate it could be efficiency-improving (bubbles misallocate resources and their bursting creates financial instability); if the surge reflects genuine scarcity expectations, intervention would undermine the price signal that guides firms’ decarbonization investment decisions. The paper concludes there is no basis from historical data to justify bubble-prevention intervention in the EU ETS architecture.

Key concepts

rational bubble (in the EU ETS context): a component of the allowance price that exceeds the present discounted value of future allowance scarcity rents and grows at the discount rate; theoretically possible because allowances are storable and have positive returns from banking across periods; the paper finds no evidence of this component in EU ETS prices during 2018–2023.

Fama Predictive Regression (FPR): a predictive regression that uses the basis (F_{n,t} − P_t) as the regressor for a subsequent price differential, used here to test rational bubbles; FPR 2 (the regression of P_{t+n} − P_t on F_{n,t} − P_t) has slope β₂,ₙ = 0 under no bubble and β₂,ₙ > 0 under an ongoing bubble, with the direction of β₂,ₙ identifying both the presence and type (ongoing vs. collapsing) of the bubble.

IVX estimator: the instrumental-variable estimator of Kostakis, Magdalinos, and Stamatogiannis (2015) that instruments a mildly explosive or highly persistent regressor with an instrument of controllable lower persistence; produces a Wald statistic with a standard chi-squared limiting distribution regardless of the persistence or trending behavior of the risk premium, enabling valid inference on bubble hypotheses in the FPR when the risk premium is nonstationary.

IVX-AR procedure: the extension of the IVX estimator by Yang, Long, Peng, and Cai (2020) that additionally accounts for serial correlation in the error term of the predictive regression; the paper applies this as its primary inference procedure because BIC selects models with lagged errors in both full-sample and sub-sample analyses.

allowance scarcity expectations: market participants’ beliefs about the future tightness of the EU ETS cap relative to aggregate emissions; the paper finds that the price surge since 2018 is consistent with a shift in these expectations driven by cap trajectory tightening and Market Stability Reserve reform, rather than with a speculative bubble.

mildly explosive process: a time series with autoregressive root θ = 1 + c·T^{−α} for c > 0, α ∈ (0,1), converging to unity as T → ∞; used in the paper’s Monte Carlo and theoretical analysis to model the fundamental price process and the risk premium under the alternative hypothesis of ongoing rational bubble behavior, following Phillips and Magdalinos (2007).
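
A short sketch makes the "mildly explosive" definition concrete: the root exceeds one in any finite sample but shrinks toward one as the sample grows. The values c = 1 and α = 0.5 are illustrative choices, not the paper's settings.

```python
def mildly_explosive_root(sample_size, c=1.0, alpha=0.5):
    """Return theta = 1 + c * T**(-alpha) for c > 0 and alpha in (0, 1)."""
    if c <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need c > 0 and 0 < alpha < 1")
    return 1.0 + c * sample_size ** (-alpha)


def simulate_path(shocks, y0=0.0, c=1.0, alpha=0.5):
    """Iterate y_t = theta * y_{t-1} + e_t, with T taken as len(shocks)."""
    theta = mildly_explosive_root(len(shocks), c, alpha)
    path, y = [], y0
    for e in shocks:
        y = theta * y + e
        path.append(y)
    return path


for T in (100, 563, 10_000):
    print(T, round(mildly_explosive_root(T), 4))
```

At the paper's full-sample size of T = 563 these illustrative settings give a root of about 1.04.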

Exchange Rates and Asset Prices in a Global Demand System

Mon, 01 Jan 0001 00:00:00 +0000

The paper develops an asset demand system to analyze, jointly and across all countries, how international portfolio holdings and flows, exchange rates, short-term rates, long-term yields, and equity prices are determined in equilibrium. The authors specify a nested logit model of asset demand (substitution across countries within an asset class, and across asset classes) and introduce a new instrumental-variables identification strategy based on the size distribution of countries and bilateral distances; estimating on portfolio-holdings data for 37 countries and three asset classes from 2003 to 2020, they find demand is relatively inelastic, with mean demand elasticities of 27.9 (s.e. 1.9) for short-term debt, 3.2 (0.4) for long-term debt, and 1.2 (1.1) for equity. A variance decomposition attributes 82% of exchange-rate variation, 86% of short-term-rate variation, and 60% of log market-to-book equity variation to ‘latent demand’ (the residual demand shifter), while portfolio flows (54%) and macro variables (43%) dominate long-term yields. Applying the framework to the European sovereign debt crisis, latent demand explains essentially all of the Italian long-term-yield variation and 74% of the Portuguese, whereas macro fundamentals are relatively more important for Greece (46% vs. 32% for latent demand), which the authors read as consistent with Greece being insolvent while Italy and Portugal were solvent but perceived as vulnerable. Estimating the convenience yield on US assets, they find, in units of expected annual returns, 1.41% on the US dollar, 2.71% on US long-term debt, and 0.50% on US equity. All estimates are specific to their sample, model, and identification assumptions.

In depth

Q1. What is a ‘global demand system’ and what does it explain?

The authors represent the equilibrium of an international macro model as an asset demand system and replace traditional optimal portfolios with estimated asset demand functions that match observed international portfolio holdings, so that portfolio flows and shifts in asset demand explain all movements in exchange rates and asset prices. This lets them reinterpret the exchange rate disconnect (Meese and Rogoff 1983) as the finding that shifts in asset demand through macro variables explain much less variation than portfolio flows and latent demand, and to identify which countries’ latent demand matters for exchange rates and asset prices.

Q2. What is the nested logit model of asset demand?

Asset demand follows a nested logit model with substitution across countries in the inner nest and across asset classes in the outer nest, where demand depends on expected returns (asset prices or yields and real exchange rates), macro variables (GDP, GDP per capita, inflation, equity volatility, sovereign rating), bilateral distance (the gravity effect), a domestic-ownership indicator (home bias), and latent demand. The nested structure gives more flexible substitution than the logit model of Koijen and Yogo (2019), while latent demand captures heterogeneous beliefs about risk exposure across investors and assets.
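
The two-level substitution pattern can be illustrated with the textbook nested logit share formula. This is a generic sketch, not the authors' specification: the mean utilities, the single nesting parameter sigma, and the absence of an outside asset are all simplifying assumptions, and the numbers are made up.

```python
import math


def nested_logit_shares(utilities, sigma):
    """Portfolio shares from a textbook nested logit.

    utilities: dict mapping asset class -> dict mapping country -> mean utility.
    sigma: within-nest correlation parameter in [0, 1); sigma = 0 is plain logit.
    """
    scale = 1.0 - sigma
    # Inclusive value of each nest (asset class).
    inclusive = {
        g: sum(math.exp(u / scale) for u in members.values())
        for g, members in utilities.items()
    }
    denom = sum(d ** scale for d in inclusive.values())
    shares = {}
    for g, members in utilities.items():
        nest_share = inclusive[g] ** scale / denom           # outer nest
        for country, u in members.items():
            within = math.exp(u / scale) / inclusive[g]       # inner nest
            shares[(g, country)] = nest_share * within
    return shares


# Hypothetical utilities: two equity markets and one long-term debt market.
example = {"equity": {"US": 0.0, "DE": 0.0}, "long_debt": {"US": 0.0}}
print(nested_logit_shares(example, sigma=0.0))
print(nested_logit_shares(example, sigma=0.5))
```

With sigma above zero the two equity markets become closer substitutes for each other than for debt, so the equity nest's combined share falls relative to the plain logit case.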

Q3. How are the demand elasticities identified?

The authors develop an instrumental-variables strategy in which an exogenous component of one investor group’s demand shifters generates variation in residual supply that identifies another group’s demand elasticity, isolating cross-sectional variation in residual supply from the size distribution of countries and the bilateral distances between them. Intuitively, smaller issuer countries in close proximity to larger investor countries have lower residual supply and thus higher asset prices and/or real exchange rates (the example contrasts Dutch with Australian long-term debt).

Q4. What are the estimated demand elasticities, and why do they matter?

Averaged across years and issuer countries, the mean demand elasticities are 27.9 (s.e. 1.9) for short-term debt, 3.2 (0.4) for long-term debt, and 1.2 (1.1) for equity — so, e.g., a country’s aggregate equity demand falls about 1.2% per 1% rise in its price. The authors present these as empirical targets for international macro models that rely on inelastic demand and demand shocks unrelated to fundamentals to resolve long-standing puzzles, and they note the estimates are broadly consistent with prior, more granular estimates for narrower sets of countries and asset classes once differences in aggregation and identification are accounted for.
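
The elasticities translate into quantity responses in the way the equity example suggests. The sketch below applies the same first-order rule to all three asset classes; it is a local approximation for small price changes, and the names are illustrative.

```python
# Mean demand elasticities quoted in the summary.
MEAN_ELASTICITY = {"short_term_debt": 27.9, "long_term_debt": 3.2, "equity": 1.2}


def demand_change_pct(asset_class, price_change_pct):
    """First-order percent change in aggregate demand for a percent price change."""
    return -MEAN_ELASTICITY[asset_class] * price_change_pct


for asset_class in MEAN_ELASTICITY:
    print(asset_class, demand_change_pct(asset_class, 1.0))
```

The spread across asset classes is wide: the same 1% price rise is associated with a far larger fall in demand for short-term debt than for equity.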

Q5. What does the variance decomposition reveal?

Latent demand is relatively more important for exchange rates, short-term rates, and equity prices — explaining 82% of exchange-rate variation (of which foreign-exchange reserves explain 10%), 86% of short-term-rate variation, and 60% of log market-to-book equity variation — whereas portfolio flows (54%) and macro variables (43%) are relatively more important for long-term yields (latent demand explains only about 3%). For equity, North American investors explain 13% and European investors 26% of the log market-to-book variation.

Q6. How does the framework interpret the European sovereign debt crisis?

Applied to extreme long-term-yield movements in Greece, Italy, and Portugal, the decomposition shows macro variables are relatively more important for Greece (46% vs. 32% for latent demand), while latent demand explains all of the Italian and 74% of the Portuguese yield variation, with European investors alone explaining 98% of the Italian and 65% of the Portuguese movements. The authors read this as consistent with the narrative that Greece was insolvent while Italy and Portugal were solvent but perceived as vulnerable.

Q7. What are the estimated convenience yields on US assets?

Computing counterfactual prices that remove the special demand for US assets, the authors estimate convenience yields, in units of expected annual returns, of 1.41% on the US dollar, 2.71% on US long-term debt, and 0.50% on US equity. In the absence of special status, a value-weighted US-dollar exchange rate would be 5.23% higher, the US long-term yield 0.73% higher, and US market-to-book equity 3.35% lower, consistent with the view that the dollar is the global reserve currency and US Treasury debt the global safe asset.

Q8. How does the framework connect to monetary policy?

The authors note in their conclusion that, because unconventional monetary policy fundamentally concerns changes in the supply of long-term debt and its impact on exchange rates and asset prices through substitution effects, the demand-system approach is suited to study the simultaneous and cumulative impact of conventional and unconventional monetary policy across many countries — and they flag this as a direction for future research rather than a result of the current paper. This scope condition matters: the present paper estimates the demand system and its decompositions, not the effects of monetary policy itself.

Key concepts

asset demand system / demand system asset pricing: an approach (introduced in Koijen and Yogo 2019 and here extended to international finance) that estimates asset demand functions on portfolio holdings data and analyzes the equilibrium relation between holdings/flows and prices, in place of traditional optimal portfolios.

nested logit asset demand: the specific functional form for demand, with substitution across countries in the inner nest and across asset classes in the outer nest, allowing flexible substitution patterns.

latent demand: the residual component of demand shifters (capturing heterogeneous beliefs about risk exposure) that, together with portfolio flows and macro variables, accounts for movements in exchange rates and asset prices; it is the dominant driver of exchange rates and short-term rates in the decomposition.

demand elasticity (inelastic markets): the percentage change in a country’s aggregate asset demand per 1% change in its price; the paper’s low estimates (especially 1.2 for equity) are offered as empirical targets for ‘inelastic markets’ macro-finance models.

convenience yield: the extra demand for (and hence lower expected return on) US assets owing to their special status as global reserve currency and safe asset; measured here as 1.41% (USD), 2.71% (US long-term debt), and 0.50% (US equity) in expected-annual-return units.

gravity effect and home bias: the empirical regularities that portfolio holdings decline with bilateral distance (gravity) and are tilted toward domestic assets (home bias), which the demand system captures via distance and a domestic-ownership indicator.

Financial Intermediation and Aggregate Demand: A Sufficient Statistics Approach

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops a sufficient statistics approach to measuring the aggregate demand effects of financial intermediation disturbances — shocks to the ability of financial intermediaries to supply credit. The central contribution is characterizing, in a general class of models with heterogeneous firms and financial frictions, the aggregate demand impact of a disruption to intermediary balance sheets as a function of a small set of sufficient statistics observable from data: the elasticity of investment to intermediary net worth, the share of investment financed through intermediaries, and the sensitivity of asset prices to intermediary capacity. The approach does not require full model estimation, allowing model-free measurement of the aggregate demand loss from identified intermediary distress episodes. Applied to the 2008–2009 financial crisis, the paper estimates that the shock to financial intermediary balance sheets generated an aggregate demand reduction of 3–4 percentage points of GDP — substantially larger than estimates from reduced-form regressions that do not account for general equilibrium propagation.

In depth

Q1. What are the key sufficient statistics?

The three sufficient statistics are: (1) the elasticity of investment to intermediary net worth — how much investment falls per dollar of balance sheet loss; (2) the share of investment financed through intermediaries — how broadly the balance sheet shock propagates; (3) the sensitivity of asset prices to intermediary capacity — how much collateral values fall when intermediaries are distressed. Together these three moments summarize the aggregate demand impact of a balance sheet shock without requiring the researcher to specify the full structural model.

Q2. Why does the sufficient statistics approach give larger estimates than reduced-form regressions?

Reduced-form regressions typically compare investment of firms exposed to distressed versus healthy intermediaries, capturing the partial equilibrium direct effect of credit supply reduction; the sufficient statistics approach accounts for the general equilibrium propagation — the fall in asset prices and investment that affects even firms not directly borrowing from distressed intermediaries. The 3–4 percentage point estimate includes these spillovers; the reduced-form estimate misses them.

Q3. What is the policy implication?

The larger aggregate demand estimate implies that recapitalizing intermediaries during financial crises generates larger macroeconomic benefits than direct-effect estimates would suggest, strengthening the case for bank bailouts, TARP-style capital injections, and central bank emergency lending as counter-recessionary tools. The sufficient statistics framework also provides a natural way to compare intervention magnitudes: a policy that restores $X of intermediary capital generates an aggregate demand boost proportional to the measured elasticity.

Key concepts

sufficient statistics for financial intermediation : the small set of model-free moments (investment elasticity to net worth, intermediary financing share, asset price sensitivity) that summarize the aggregate demand impact of intermediary distress, derived in this paper from a general class of heterogeneous-firm models.

general equilibrium propagation : the amplification of an intermediary balance sheet shock through asset price declines and economy-wide investment responses, which the sufficient statistics approach captures and reduced-form regressions miss; the source of the larger 3–4 pp GDP estimate relative to partial equilibrium benchmarks.

Firm Quality Dynamics and the Slippery Slope of Credit Intervention

Mon, 01 Jan 0001 00:00:00 +0000

Crises have cleansing effects—low-quality firms face greater financial shortfalls and invest less than high-quality firms—but public credit support dampens these effects by reducing financing cost differentials, distorting the firm quality distribution downward and reducing total productivity. This trade-off between preserving output capacity and distorting quality determines the optimal size of intervention. The distortionary effects are self-perpetuating: a downward bias in quality necessitates interventions of greater scale in future crises, implying further distortions—a “slippery slope.” The distortions are amplified by expectations: because low-quality firms expect underpriced government funding in future crises, their Tobin’s q is biased upward, leading them to overinvest even in normal times, while high-quality firms may underinvest. A low interest rate environment exacerbates the distortionary effects because the low yield on savings discourages firms from accumulating precautionary internal liquidity against crises.

In depth

Q1. What are the cleansing effects of crises and how does credit intervention dampen them?

Crises have cleansing effects because low-quality firms face tighter financial constraints and have lower Tobin’s q, causing them to invest less than high-quality firms; public credit support reduces this differential, preserving overall production capacity but distorting the quality distribution downward. The model follows the limited-commitment literature (Kehoe-Levine, Kiyotaki-Moore, Rampini-Viswanathan): firms differ in productive capital quality that also serves as collateral. Government intervention is valued because the government has superior enforcement ability compared to private investors, but its credit support cannot be perfectly priced by quality—due to informational limits or political constraints—so it pulls financing costs of high- and low-quality firms closer together, dampening the cleansing mechanism.

Q2. What is the “slippery slope” mechanism?

The slippery slope arises because the downward bias in the quality distribution induced by one intervention necessitates larger interventions in future crises, generating a ratchet toward ever-larger public credit support. After intervention, high-quality firms accumulate capital less rapidly than they would absent intervention, while low-quality firms’ capital shares remain higher than in the laissez-faire equilibrium. The resulting lower aggregate productivity means that future crises are more severe in terms of output loss, requiring a larger optimal intervention, which in turn further distorts the quality distribution.

Q3. How do expectations of future intervention amplify the distortions?

Because low-quality firms expect underpriced credit support in future crises, their Tobin’s q is biased upward, motivating them to overinvest even in normal times; simultaneously, high-quality firms may underinvest because their Tobin’s q may fall below the first-best level. The self-perpetuating distortion thus operates through both the crisis-time reallocation channel and the pre-crisis investment channel, amplifying the divergence from the efficient allocation relative to a setting with no anticipation effects.

Q4. Why does a low interest rate environment exacerbate the distortionary effects?

A low interest rate environment exacerbates the distortionary effects of credit intervention because the low yield on savings discourages high-quality firms from accumulating precautionary internal liquidity against crises, causing them to invest less in crises and requiring a greater scale of credit support. Low-quality firms, expecting underpriced government funding, have even less incentive to self-insure through savings when interest rates are low, further worsening the quality distribution. The paper’s findings echo cautions against ultra-low interest rates (Brunnermeier and Koby, 2018; Quadrini, 2020) by providing a distinct mechanism operating through firm quality dynamics.

Q5. Can intervention be welfare-improving despite the distortions?

The paper shows that when carefully designed, intervention can improve welfare even though it generates distortionary effects on the firm quality distribution—the trade-off between preserving production capacity and distorting quality determines the optimal size of intervention. This framing does not suggest intervention should be avoided, but that its optimal scale requires balancing the quantity-preserving benefit against the quality-distorting cost. The paper previously circulated as “The Distortionary Effects of Central Bank Direct Lending on Firm Quality Dynamics.”

Key concepts

cleansing effect of crises : the tendency for crises to reduce the investment of low-quality firms relative to high-quality firms through tighter financial constraints, reallocating capital toward higher-productivity uses; credit intervention dampens this by reducing the financing cost differential.

slippery slope of intervention : the self-perpetuating dynamic in which intervention-induced downward distortion of the quality distribution necessitates larger interventions in future crises, generating a ratchet toward ever-larger public credit support.

credit mispricing : the inability of public credit support to differentiate financing costs by firm quality, arising from informational limits or political constraints on discriminatory treatment; the proximate source of the quality-distribution distortion.

From Doubt to Devotion: Trials and Learning-Based Pricing

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies a dynamic mechanism design problem in which an informed seller sells an experience good to a skeptical buyer who learns about the product through consumption. The central question is: how does a seller leverage proprietary data about product-buyer match quality together with the buyer’s ability to learn, and what are the welfare implications in equilibrium?

The model features a seller who privately observes a binary match quality (theta in {H, L}) between their service and the buyer. The buyer does not observe match quality and has an initially unknown private value v for the good, drawn from a Myerson-regular distribution F with support [v_low, v_high] and normalized mean E[v] = 1. If the match is high, the buyer receives instantaneous utility rewards according to a Poisson process with flow rate lambda*I, where I in [0,1] is the seller-controlled access level. Upon receiving the first reward, the buyer perfectly learns both match quality theta and their own value v. The seller commits to a dynamic mechanism over the time horizon [0, T], specifying access and prices conditional on reported histories. Both parties are risk-neutral and there is no discounting in the baseline.

Two benchmark cases show that the first-best is attainable whenever either of the two key features is absent. If trade is static (prices set only at time 0) or if the seller is uninformed about theta, the seller achieves first-best revenue of lambda*mu_0*T by selling the entire service upfront. Proposition 1 establishes both cases; this implies that consumer data on theta is not required for maximizing social welfare, and it is weakly dominant for a seller to never collect consumer data in static environments.

The central result is that the combination of dynamic pricing and seller private information breaks the first-best. A high-type seller can deviate by offering a “Myersonian free trial”: provide full access up to time tM (defined as argmax_t {(1 - exp(-lambda*t))*(T - t)}), then offer the remaining service at post-trial price lambda*vM*(T - tM), where vM is the Myerson monopoly price. The buyer accepts the trial regardless of beliefs (participation is weakly dominant) and purchases the post-trial service if and only if v >= vM. This deviation yields payoff pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM). Proposition 2 states that the first-best cannot be implemented in any equilibrium if and only if pi_F > lambda*mu_0*T. Corollary 1 concerns sufficiently large T: both pi_F and the first-best revenue grow with T, so the condition depends on the limit of the ratio pi_F / (lambda*mu_0*T), which exceeds 1 for some parameter configurations and falls below 1 for others.

Theorem 1 (the main mechanism design result) characterizes the boundary of the IC-IR feasible payoff set: any mechanism on this boundary is outcome-uniquely implemented by a trial mechanism, defined by a triple (v0, t0, p0) consisting of a post-trial value threshold v0, a trial length t0, and a trial price p0. During [0, t0] uninformed buyers receive full access; after t0 only buyers who received a reward with v >= v0 continue at a premium. Trial length t0 is weakly increasing in the weight placed on the low-type seller and in the prior mu_0; post-trial threshold v0 is weakly decreasing in the same objects (Proposition 3).

Equilibrium payoffs (Proposition 5) are precisely the IC-IR feasible pairs satisfying pi_H >= pi_F, implemented by pooling trial mechanisms in which both seller types propose identical mechanisms and the buyer updates beliefs only through private consumption signals. Under the D1 refinement (Proposition 6), only mechanisms with trial length tM and post-trial threshold vM survive. These have the shortest trial and highest post-trial price of all equilibrium mechanisms, minimize social surplus, and may leave both seller types strictly worse off than in a world without private information — directly contrasting the static informed principal result of Koessler and Skreta (2016) where data always helps the seller.

When the seller can control service quality q in addition to access I (Section 6), the relevant equilibrium mechanisms become dynamic tiered pricing rather than binary trials: a low-quality, high-ad-load free tier provides learning opportunities while reducing information rents; convinced buyers upgrade to a premium ad-free tier. Counterintuitively, enriching the seller’s screening technology can reduce both revenue and social efficiency in equilibrium because additional instruments create additional signaling opportunities that distort outcomes further.

Q: What is the core tension that prevents the first-best from being an equilibrium?

A: When the seller is privately informed and pricing is dynamic, the high-type seller anticipates a greater likelihood of the buyer receiving a utility shock than the buyer’s own prior implies. This belief gap makes it profitable for the high-type seller to deviate from a proposed first-best mechanism by offering a free trial that “proves” high match quality and then extracting rent from convinced buyers. Because this deviation is profitable under some parameters (those with pi_F > lambda*mu_0*T), the first-best pooling contract unravels. The interaction of both ingredients (dynamic pricing and informed seller) is necessary: either ingredient alone is insufficient to break the first-best (Proposition 1).

Q: What exactly is the Myersonian free trial and why does the buyer always accept it?

A: The Myersonian free trial provides full service access up to time tM = argmax_t {(1 - exp(-lambda*t))*(T - t)} at (approximately) zero price, then offers the remaining service at price lambda*vM*(T - tM), where vM is the Myerson monopoly price. The buyer accepts the trial regardless of their prior belief about match quality because the trial itself is free and provides non-negative payoff. After the trial, the buyer purchases the post-trial service if and only if they received a reward with v >= vM; otherwise they exit. The deviation payoff is pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM).
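The deviation payoff can be evaluated numerically. The sketch below is a minimal illustration, not the paper's code: it assumes values are uniform on [1 - delta, 1 + delta] (the family used in Proposition 4), and all function names and parameter values are hypothetical. Because (1 - exp(-lambda*t))*(T - t) is concave in t, tM is located by bisection on its derivative.

```python
import math

def trial_length(lam, T, iters=200):
    """tM = argmax_t (1 - exp(-lam*t)) * (T - t), found by bisection on the derivative."""
    def slope(t):
        return lam * math.exp(-lam * t) * (T - t) - (1.0 - math.exp(-lam * t))
    lo, hi = 0.0, T  # slope(0) = lam*T > 0 and slope(T) < 0; the objective is concave
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def uniform_survival(v, delta):
    """1 - F(v) for v ~ Uniform[1 - delta, 1 + delta], with 0 < delta <= 1 (assumed)."""
    low, high = 1.0 - delta, 1.0 + delta
    return min(1.0, max(0.0, (high - v) / (high - low)))

def uniform_monopoly_price(delta):
    """vM maximizing v * (1 - F(v)); the interior optimum (1 + delta) / 2 is
    replaced by the lower support bound when it falls below the support."""
    return max((1.0 + delta) / 2.0, 1.0 - delta)

def free_trial_payoff(lam, T, delta):
    """pi_F = (1 - exp(-lam*tM)) * (1 - F(vM)) * lam * vM * (T - tM)."""
    tM = trial_length(lam, T)
    vM = uniform_monopoly_price(delta)
    learned = 1.0 - math.exp(-lam * tM)  # probability a high-match buyer is rewarded by tM
    return learned * uniform_survival(vM, delta) * lam * vM * (T - tM)

def first_best_fails(lam, T, delta, mu0):
    """Proposition 2 condition as summarized above: pi_F > lam * mu0 * T."""
    return free_trial_payoff(lam, T, delta) > lam * mu0 * T
```

For instance, `first_best_fails(1.0, 10.0, 0.5, 0.2)` compares the free-trial payoff against the first-best revenue for a pessimistic prior; raising `mu0` makes the first-best harder to break.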

Q: Under what parametric conditions can the first-best not be supported in equilibrium?

A: By Proposition 2, the first-best cannot be implemented if and only if pi_F > lambda*mu_0*T. Corollary 1 addresses sufficiently large T: as T grows, pi_F grows proportionally with T (the post-trial term (T - tM) dominates) because tM becomes small relative to T. More precisely, for large T, pi_F / (lambda*mu_0*T) is approximately (1 - exp(-lambda*tM)) * (1 - F(vM)) * vM / mu_0, which exceeds 1 under appropriate parameter configurations, and in those cases the first-best fails. Conversely, when mu_0 is high or the service horizon is short, the first-best may remain implementable.
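The objects in this condition can be written out explicitly. The following is a worked restatement derived only from the formulas for tM and pi_F given in this summary; it is not quoted from the paper.

```latex
\begin{aligned}
g(t) &= \bigl(1 - e^{-\lambda t}\bigr)(T - t), \qquad
g'(t) = \lambda e^{-\lambda t}(T - t) - \bigl(1 - e^{-\lambda t}\bigr), \\
g'(t_M) = 0 \;&\Longleftrightarrow\; \lambda\,(T - t_M) = e^{\lambda t_M} - 1
\;\Longrightarrow\; t_M \le \frac{\ln(1 + \lambda T)}{\lambda}, \\
\frac{\pi_F}{\lambda \mu_0 T}
&= \frac{\bigl(1 - e^{-\lambda t_M}\bigr)\bigl(1 - F(v_M)\bigr)\, v_M}{\mu_0}
\cdot \frac{T - t_M}{T}.
\end{aligned}
```

Since g is strictly concave, the first-order condition pins down a unique tM. The bound on tM shows that the factor (T - tM)/T tends to 1 as T grows, so for long horizons the comparison is governed by the first factor of the ratio.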

Q: What is a trial mechanism and how does Theorem 1 characterize it?

A: A trial mechanism is defined by a triple (v0, t0, p0): uninformed buyers receive full access on [0, t0] and no access thereafter; a buyer who reports a reward of value v >= v0 at time t receives full service for the remainder [t, T] at a price increment of lambda*v0*(T - t0); the trial itself is priced at p0. Theorem 1 states that any payoff pair on the boundary of the IC-IR feasible set is outcome-uniquely attained by such a trial mechanism with appropriately determined (v0, t0, p0). The proof uses a relaxed problem retaining only two key constraint families: local incentive constraints on value reporting (IC-V) and a global intertemporal constraint preventing buyers from hiding the arrival of rewards forever (IC-U).
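The allocation and payment rule of a trial mechanism can be sketched as a small data structure. This is one reading of the verbal description above, under stated assumptions: the class and method names are hypothetical, a buyer who reports a reward below v0 is assumed to keep trial access until t0, and `reported_value=None` stands for a buyer who has reported no reward.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TrialMechanism:
    v0: float   # post-trial value threshold
    t0: float   # trial length
    p0: float   # trial price
    lam: float  # Poisson arrival rate under full access
    T: float    # service horizon

    def qualifies(self, reported_value: Optional[float]) -> bool:
        """True if the buyer reported a reward with value at least v0."""
        return reported_value is not None and reported_value >= self.v0

    def access(self, t: float, reported_value: Optional[float] = None) -> float:
        """Access level at time t: full access during the trial for everyone,
        and after the trial only for buyers who qualified."""
        if not 0.0 <= t <= self.T:
            return 0.0
        if t <= self.t0 or self.qualifies(reported_value):
            return 1.0
        return 0.0

    def total_payment(self, reported_value: Optional[float] = None) -> float:
        """Trial price p0 plus the increment lam * v0 * (T - t0) for qualifying buyers."""
        increment = self.lam * self.v0 * (self.T - self.t0) if self.qualifies(reported_value) else 0.0
        return self.p0 + increment
```

With `TrialMechanism(v0=0.75, t0=2.0, p0=0.0, lam=1.0, T=10.0)`, an unconvinced buyer loses access after time 2 and pays nothing, while a buyer reporting a value of 0.8 keeps access and pays the increment.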

Q: How does the trial length respond to changes in prior belief mu_0 and distributional spread?

A: Proposition 3 states that t0 is weakly increasing in mu_0: as market belief becomes more optimistic, both seller types extract higher revenue from the trial, so the mechanism designer extends the trial. Proposition 4 adds that for a uniform distribution on [1-delta, 1+delta], trial length t0 is weakly increasing in delta (greater spread). The post-trial threshold v0 is weakly decreasing in mu_0, meaning that a more optimistic prior leads to a less exclusive post-trial cutoff.

Q: What are the equilibrium payoffs and how does the high-type seller’s free-trial option constrain them?

A: Proposition 5 states that (pi_L, pi_H) is an equilibrium payoff if and only if it lies in the IC-IR feasible set and pi_H >= pi_F. The lower bound pi_H >= pi_F reflects the high-type seller’s outside option: they can always deviate to the Myersonian free trial. Corollary 4 then shows that all “reasonable” equilibrium payoffs (those with pi_H >= pi_L, surviving a mild off-path refinement) are implemented by trial mechanisms with complete pooling — both seller types propose the same mechanism and the buyer updates beliefs only through private consumption signals, not the mechanism’s structure.

Q: What does the D1 refinement select and why does it lead to worse outcomes?

A: Proposition 6 shows that the only equilibrium trial mechanisms surviving the D1 criterion have trial length tM and post-trial threshold vM — the Myersonian free trial parameters. These have the shortest trial and highest post-trial price among all equilibrium mechanisms, resulting in the minimum social surplus. The intuition is that the high-type seller signals credibly by proposing mechanisms that generate high revenue from post-trial price discrimination (which the low type cannot profit from), pushing toward maximum learning-based discrimination. All D1-surviving payoffs are Pareto dominated by the point H (the unconstrained IC-IR optimum) for any prior mu_0, and Pareto dominated by point B when mu_0 is small.

Q: Can having consumer preference data hurt the seller, and under what conditions?

A: Yes. The distortion from signaling incentives can be so large that both seller types earn strictly less in the D1-surviving equilibrium than they would if neither possessed private information (where the first-best is attained). This result holds when the condition of Proposition 2 is satisfied, that is, when pi_F > lambda*mu_0*T. This contrasts sharply with the static result of Koessler and Skreta (2016), in which the ex-ante profit-maximizing mechanism is always supportable in equilibrium and data always (weakly) helps sellers.

Q: How do trial mechanisms differ from the prior literature on signaling through introductory prices?

A: The earlier literature (Milgrom and Roberts 1986; Bagwell 1987; Bagwell and Riordan 1991; Judd and Riordan 1994) uses two-period models with no seller commitment, so all pricing behavior is necessarily trial-like by model restriction. The present model instead allows the seller full flexibility to design any dynamic mechanism — including selling everything ex-ante, which would prevent buyers from gaining information rent. Trials emerge endogenously as the equilibrium outcome rather than being imposed by the model structure, and the paper provides new economic content on what determines trial length and price thresholds.

Q: What happens when the seller controls service quality in addition to access?

A: Section 6 extends the baseline by allowing the seller to choose (I, q) from a subset of [0,1]^2, where I governs the Poisson arrival rate and q scales the reward value (utility from a reward is v*q). Theorem 2 shows that the relevant equilibrium mechanisms now take the form of dynamic tiered pricing: a low-quality tier (interpreted as high ad load) provides learning opportunities while reducing information rents; once convinced, buyers upgrade to a premium high-quality tier. Enriching the screening technology in this way can reduce both revenue and social efficiency in equilibrium, because additional instruments create additional signaling opportunities that distort outcomes further from the revenue-maximizing benchmark.

Q: What are the two sources of welfare loss relative to the first-best in D1-surviving equilibria?

A: The welfare analysis in Appendix F identifies two sources. First, exclusion inefficiency: buyers with values v in [v_low, vM) who would generate positive surplus are excluded from post-trial service. Second, service truncation inefficiency: service access is cut off after trial length tM for buyers who were never convinced (theta = L type realizations and high-type buyers with v < vM), reducing total surplus below the first-best of mu_0 * lambda * T. Both losses are minimized (welfare is maximized) among trial mechanisms by longer trials and lower post-trial cutoffs, precisely the opposite of what D1 selects.

Q: Does the model extend to continuous seller types or multiple buyer types?

A: Appendix K outlines an extension to continuous seller types theta drawn from a distribution G on [theta_low, theta_high], where rewards arrive at rate lambda*I*theta. The main economic forces persist: higher seller types anticipate faster buyer learning and have stronger incentives to offer trials. The main results generalize: equilibrium mechanisms are trial mechanisms, and under D1, pooling equilibria with maximum post-trial discrimination are selected. Appendix G similarly notes that the multiple-buyer-type extension preserves complete pooling and the D1 selection result.

Q: What is the role of the “global intertemporal constraint” (IC-U) in the proof of Theorem 1?

A: The canonical approach to dynamic mechanism design (Eso and Szentes 2007; Pavan, Segal, and Toikka 2014) relaxes the problem to only local incentive constraints on the initial report. This fails here because the informed seller causes buyer and seller to disagree on the evolution of buyer beliefs, making the timing of trade matter and requiring tracking of incentive constraints at every point in time. The paper identifies two key binding constraints in the relaxed problem: (IC-V) the buyer does not misreport their reward value, and (IC-U) the buyer does not remain silent about the arrival of a reward forever. Retaining only these two constraint families yields a tractable bang-bang solution for the optimal access policy, which is then verified to satisfy all original IC-IR constraints.

Q: What are the implications for platform design and data collection strategy?

A: The results imply that the value of consumer data depends critically on market dynamics. In static markets, collecting data about consumer match quality is weakly beneficial for sellers (Proposition 1, first point). In dynamic markets with buyer learning and sufficiently long service horizons, the same data can strictly reduce seller revenue by enabling a deviation that unravels first-best pricing. This suggests platforms in dynamic digital markets should weigh whether possessing and acting on proprietary match data improves or worsens their equilibrium position, and that regulatory attention to consumer data collection in dynamic markets may have welfare-ambiguous effects.

Trial mechanism: A dynamic mechanism parameterized by (v0, t0, p0) in which the seller provides full service access during [0, t0] for uninformed buyers, offers continued service after t0 only to buyers who received a reward with value v >= v0, and charges a post-trial price of p0 + lambda*v0*(T - t0) for those who qualify. In the paper’s usage, this is the unique outcome-implementing mechanism on the boundary of the IC-IR feasible payoff set.

Myersonian free trial: The limiting trial mechanism as the trial price epsilon approaches zero, with trial length tM = argmax_t {(1 - exp(-lambda*t))*(T - t)} and post-trial threshold vM equal to the Myerson monopoly price. It yields payoff pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM) to the high-type seller, and constitutes the binding outside option constraining equilibrium payoffs.

Belief gap: The divergence between the seller’s and buyer’s beliefs about the rate at which the buyer will receive Poisson rewards. Because the high-type seller knows theta = H, they anticipate a higher probability of reward arrival than the buyer’s prior implies. This gap makes the buyer’s belief process non-martingale from the seller’s perspective, breaking the standard dynamic mechanism design approach and creating profitable deviation incentives.

IC-IR feasible payoff set: The set of seller payoff pairs (pi_L, pi_H) achievable by mechanisms satisfying both incentive compatibility (for seller type reports and buyer learning reports) and individual rationality (non-negative ex-ante payoffs for all parties). Theorem 1 establishes that the boundary of this set is uniquely implemented by trial mechanisms.

Dynamic tiered pricing: The equilibrium mechanism form that emerges when the seller controls both access I and service quality q. It features a low-quality tier (high ad load) providing learning opportunities at reduced information rent, and a premium tier offering full quality to buyers convinced of high match quality. This generalizes trial mechanisms to settings with richer screening technology.

Global intertemporal constraint (IC-U): The constraint requiring that, upon receiving a Poisson reward, the buyer finds it suboptimal to remain silent about its arrival forever. Together with the local value-reporting incentive constraint (IC-V), these two constraints constitute the binding restrictions in the paper’s relaxed mechanism design problem, replacing the full continuum of incentive constraints that would otherwise be intractable.

D1 criterion: A standard equilibrium refinement from signaling games applied here to the space of mechanism proposals. Among all pooling equilibrium trial mechanisms, D1 selects only those with parameters (tM, vM) — the shortest trial length and highest post-trial threshold — because the high-type seller has a strictly larger set of buyer responses for which deviation to a high-discrimination mechanism is profitable. These surviving mechanisms Pareto dominate no other equilibrium mechanism and minimize social surplus.

From Interaction to Business Fluctuations: How Credit Network Explains Cycles

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

This paper investigates how the endogenous structure of credit, deposit, and interbank networks shapes business cycle fluctuations and large financial crises in the U.S. economy. Ciola and Tedeschi build and estimate a microfounded heterogeneous-agents macroeconomic model in which households, firms, and banks interact through decentralized matching in three markets — deposits, credit, and interbank lending — with agents choosing partners based on both posted interest rates and the size of the counterpart, generating a preferential-attachment mechanism that endogenously concentrates the financial sector. The structural parameters governing network formation are estimated on U.S. quarterly interest rate and GDP growth data from 1947 to 2019 via an Extended Method of Simulated Moments (EMSM) procedure combined with a Bayesian Adaptive Random Walk Metropolis–Hastings sampler; the calibrated model reproduces the empirical autocorrelation structure of these series. The model’s key finding is that preferential attachment endogenously concentrates roughly three-quarters of deposits, credit, and interbank transactions into a single hub bank, whose dominance raises markups, suppresses deposit rates, and depresses aggregate capital accumulation relative to the initial symmetric state. Bank runs against this hub — rare but endogenously generated when households reallocate deposits simultaneously — collapse the interbank market completely and produce deep recessions that last multiple quarters, with recovery requiring approximately five years.

In depth

Q1. What is the model’s core structure and how do agents interact?

The model consists of a fixed number of households (N_H = 1,000), banks (N_I = 10), and firms (N_F = 1,000) who interact in deposit, credit, and interbank markets through a decentralized preferential-attachment matching mechanism in which agents assess both current interest rates and the size of potential counterparts. Households deposit savings in a single bank chosen based on a fitness index combining the bank’s promised deposit rate and its size (used as a proxy for long-run quality), and they search for a new partner each period with probability ζ_H. Firms borrow from one bank at a time, also choosing based on a fitness that weighs the promised profit share against bank size, and switch with probability ζ_F. Banks set interest rates in all three markets to maximize expected profits, exploiting their monopolistic power (higher when they are larger), subject to a balance sheet constraint that links deposits, credit extended to firms, and interbank borrowing. The interbank market exists specifically to cover unexpected deposit withdrawals: when a bank’s deposits fall below its outstanding credit, it borrows in the interbank market or closes credit lines.
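The partner-choice step can be illustrated with a small sketch. The summary does not give the fitness formula, so the functional form here is an assumption chosen for illustration only: fitness is a convex combination of the normalized posted rate and normalized size with weight psi, and choice probabilities are logit with intensity omega. The function name and this use of omega and psi are hypothetical and should not be read as the paper's specification.

```python
import math

def choice_probabilities(rates, sizes, omega, psi):
    """Probability of picking each bank under an ASSUMED fitness rule:
    fitness_i = (1 - psi) * rate_i / max(rates) + psi * size_i / max(sizes),
    with logit choice P_i proportional to exp(omega * fitness_i).
    Requires strictly positive maximum rate and size."""
    max_rate, max_size = max(rates), max(sizes)
    fitness = [(1.0 - psi) * r / max_rate + psi * s / max_size
               for r, s in zip(rates, sizes)]
    top = max(fitness)  # subtract the maximum for numerical stability
    weights = [math.exp(omega * (f - top)) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this rule, a bank that posts the same deposit rate as its rivals but is larger receives a higher choice probability whenever psi > 0, which is the size-based attraction described above; setting psi = 0 makes agents respond to rates alone.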

Q2. How does the estimation methodology work and what parameters does it identify?

The paper employs the Extended Method of Simulated Moments (EMSM) of Smith (1993) and Gourieroux et al. (1993), which minimizes the weighted distance between the coefficients of a VAR auxiliary model estimated on observed U.S. data and on H simulated time series generated from a given structural parameter vector, with the optimal weighting matrix set to the inverse of the Newey–West covariance of the auxiliary parameter estimates. Because gradients of the criterion function are not analytically available for this nonlinear agent-based model, the authors use a two-step approach: first, a Particle Swarm Optimization (PSO) algorithm explores the parameter space to locate a neighborhood of the global minimum; second, a Bayesian Adaptive Random Walk Metropolis–Hastings (ARWMH) algorithm generates posterior draws from the structural parameter distribution using the chi-square distributional properties of the EMSM criterion function. The estimated structural parameters include the nine network formation parameters {ω_X, ζ_X, ψ_X} for each of the three markets — governing competition intensity, switching probability, and the weight agents assign to counterpart size — while the production coefficient (α = 0.37) and household discount factor (β = 0.997) are calibrated directly to U.S. labor share and real interest rate data. Estimation uses 1947:Q1–2019:Q4 U.S. real GDP growth and real interest rate data; with three VAR lags and d = 9 structural parameters, the overidentification chi-square test can be assessed.
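The simulated-moments criterion can be sketched in a few lines. This is a toy illustration under loudly stated assumptions, not the authors' procedure: a one-parameter AR(1) process stands in for the agent-based model, a one-lag regression stands in for the VAR auxiliary model, the weighting matrix is the identity rather than the inverse Newey–West covariance, and a grid search replaces the PSO and ARWMH steps. All names are hypothetical.

```python
import random

def simulate(theta, n, seed):
    """Toy stand-in for the structural model: y_t = theta * y_{t-1} + e_t."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = theta * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path

def auxiliary_coefficients(path):
    """OLS intercept and slope from regressing y_t on y_{t-1} (one-lag auxiliary model)."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return [mean_y - slope * mean_x, slope]

def criterion(theta, observed, n_sims=5):
    """Squared distance between auxiliary coefficients on observed data and their
    average over n_sims simulated series (identity weighting, an assumption).
    Fixed seeds keep the random draws common across candidate theta values."""
    b_obs = auxiliary_coefficients(observed)
    sims = [auxiliary_coefficients(simulate(theta, len(observed), seed=1000 + h))
            for h in range(n_sims)]
    b_sim = [sum(col) / n_sims for col in zip(*sims)]
    return sum((o - s) ** 2 for o, s in zip(b_obs, b_sim))

def estimate(observed, grid):
    """Grid-search minimizer of the criterion (stand-in for the PSO + ARWMH steps)."""
    return min(grid, key=lambda theta: criterion(theta, observed))
```

Holding the simulation seeds fixed across parameter values keeps the criterion a deterministic function of theta, which is what allows any optimizer, including the swarm and sampling methods named above, to search over it.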

Q3. What are the long-run dynamics and how does the financial network concentrate?

Starting from an equal distribution of agents across banks, the model converges to a pseudo-steady-state in which a single hub bank intermediates approximately three-quarters of deposits, credit lines, and interbank transactions, because the preferential-attachment mechanism is self-reinforcing: larger banks attract more depositors (providing more stable funding), more firms (generating more profit), and more interbank counterparts, which further enlarges their size and attractiveness. This concentration has clear aggregate consequences: as the hub’s monopolistic power grows, it widens the markup over the perfect competition interest rate in the credit market and the markdown below it in the deposit market, reducing the deposit rate paid to households and thereby depressing household capital accumulation. Simulations across 1,000 independent replicas show that the aggregate production level in the pseudo-steady-state is below the initial competitive equilibrium, credit and interbank interest rates rise, and approximately 10% of total capital circulates through the interbank market as periphery banks rely on the hub for liquidity provision.
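The self-reinforcing logic can be reproduced in a deliberately stripped-down simulation. The sketch below is an assumption-laden toy, not the paper's model: it has only a deposit market, all banks post the same rate so that only size matters, and the choice rule (logit in relative size with intensity omega * psi) and every parameter value are hypothetical.

```python
import math
import random

def simulate_concentration(n_households=1000, n_banks=10, periods=400,
                           zeta=0.1, omega=10.0, psi=0.5, seed=0):
    """Return final bank sizes (number of depositors) after repeated re-matching.
    Each period, every household searches with probability zeta and picks a bank
    with probability proportional to exp(omega * psi * size / max_size)."""
    rng = random.Random(seed)
    bank_of = [h % n_banks for h in range(n_households)]  # equal initial split
    sizes = [bank_of.count(b) for b in range(n_banks)]
    banks = list(range(n_banks))
    for _ in range(periods):
        max_size = max(sizes)
        # Weights use start-of-period sizes, so all movers face the same odds.
        weights = [math.exp(omega * psi * (s / max_size - 1.0)) for s in sizes]
        for h in range(n_households):
            if rng.random() < zeta:
                new_bank = rng.choices(banks, weights=weights)[0]
                sizes[bank_of[h]] -= 1
                sizes[new_bank] += 1
                bank_of[h] = new_bank
    return sizes
```

Comparing `max(simulate_concentration())` with `max(simulate_concentration(psi=0.0))` shows the role of the size weight: with psi = 0 households reallocate uniformly and no hub forms, while a positive size weight lets small random size differences compound into a dominant bank. The hub share this toy produces is not calibrated and should not be compared with the paper's three-quarters figure.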

Q4. How do cyclical fluctuations and crises emerge endogenously?

Business cycles arise from the continuous reallocation of household deposits across banks, which generates endogenous liquidity shocks that do not require an exogenous crisis trigger: when a critical mass of households simultaneously reallocates away from the hub — a rare but endogenous event driven by the stochastic matching process — the hub faces a severe liquidity shortage, must close credit lines and interbank lending, and produces a systemic economic contraction. In a representative 100-year simulation, aggregate production fluctuates around a stable trend with mild recessions most of the time, but the model occasionally generates a catastrophic bank run against the hub. When this occurs, the hub’s weighted degree in all three markets collapses to near zero within one or two quarters, the interbank market freezes completely, and firm production stops because firms cannot immediately reallocate their credit demand to alternative banks. The impulse response to a sudden reduction in hub deposit centralization shows that aggregate production falls sharply in the short run (as credit contracts) and only surpasses its pre-run level after approximately five years (20 quarters).

Q5. What does the VAR impulse response analysis reveal about recovery dynamics?

A VAR estimated on all simulations, with aggregate production and the volume, centralization, and interest rates of each of the three markets as endogenous variables, shows that a negative shock to deposit market centralization (i.e., a bank run against the hub) has three immediate effects: a spike in deposit interest rates (as rival banks compete for the displaced funds), a contraction in credit and interbank supply (as periphery banks lack sufficient liquidity to expand), and a rise in credit interest rates (as the pool of surviving credit lines is concentrated in the most profitable projects). In the medium run, higher deposit rates promote household capital accumulation, which ultimately expands the aggregate supply of productive capital; at the same time, the dissolution of the old hub reduces the sector’s average monopolistic markup, permanently lowering credit market interest rates. This self-correcting mechanism underlies the five-year recovery window and also illustrates why prompt policy intervention during hub-collapse crises is particularly effective: early stabilization prevents the reinforcing deposit-withdrawal spiral that deepens the contraction.
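
For readers unfamiliar with how impulse responses are read off a VAR, the following is a minimal sketch for a first-order system. The two-variable setup and the coefficient matrix `A` are invented for illustration; they are not the paper's estimates, and the paper's VAR includes many more variables.

```python
def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def impulse_response(a, shock, horizons):
    """Responses of a VAR(1), y_t = A y_{t-1} + e_t, to a one-time shock e_0.

    The response at horizon h is A**h applied to the shock vector.
    """
    path = [list(shock)]
    for _ in range(horizons):
        path.append(mat_vec(a, path[-1]))
    return path


# Hypothetical ordering: [deposit centralization, aggregate production].
A = [[0.8, 0.0],
     [0.1, 0.9]]

# One-unit negative shock to deposit centralization, traced over 20 quarters.
irf = impulse_response(A, shock=[-1.0, 0.0], horizons=20)
for quarter in (0, 1, 4, 20):
    print(quarter, irf[quarter])
```

This toy system only shows the mechanics of propagating a shock; it does not reproduce the medium-run overshoot of production described in the summary.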

Q6. What is the paper’s contribution relative to existing macroeconomic network literature?

The paper makes three distinct contributions over prior agent-based macroeconomic network models: first, it treats households as active depositors whose reallocation choices generate endogenous liquidity shocks rather than as passive shock absorbers; second, it models banks as profit-maximizing agents that optimally set interest rates exploiting market power rather than assuming perfect competition or regulatory constraints; and third, it produces a Bayesian estimator of all structural parameters rather than relying on calibration to observed moments. Prior work in this tradition (Delli Gatti et al. 2010; Riccetti et al. 2013; Lenzu and Tedeschi 2012) typically either omits households from the deposit market or assumes exogenous mechanisms of crisis formation. By endogenizing all three sources of network dynamics (deposit, credit, and interbank) and estimating the model on U.S. data, the paper provides a framework in which large financial crises emerge as intrinsic system properties rather than imposed scenarios, and quantifies the structural parameters driving them.

Key Concepts

preferential attachment : a matching mechanism in which agents preferentially form links with larger counterparts; in this model it causes households and firms to favor large banks, endogenously concentrating the financial sector into a hub-and-spoke structure with a dominant hub bank.

hub bank : the single largest financial intermediary that endogenously emerges in the model’s long-run equilibrium, intermediating approximately three-quarters of deposits, credit lines, and interbank transactions; its size confers monopolistic power but makes it the systemic node whose failure triggers economy-wide crises.

Extended Method of Simulated Moments (EMSM) : the estimation strategy used to identify the nine network formation structural parameters; it minimizes the weighted distance between VAR coefficients estimated on observed U.S. data and on model-simulated data, with a Bayesian adaptive random-walk Metropolis-Hastings (ARWMH) sampler used to generate the posterior distribution given the chi-square-distributed criterion function.

endogenous bank run : the crisis mechanism in this model — a simultaneous reallocation of household deposits away from the hub, triggered by the stochastic matching process rather than an external shock, that freezes the interbank market and produces a deep recession lasting approximately five years (20 quarters) in impulse response analysis.
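
The EMSM entry above describes a weighted minimum-distance criterion. A generic form of such a criterion is sketched below; the notation is assumed for illustration and is not taken from the paper. Here $\theta$ stands for the structural parameters, $\hat{\beta}^{\text{data}}$ for the VAR coefficients estimated on observed data, $\bar{\beta}^{\text{sim}}(\theta)$ for the average VAR coefficients estimated on data simulated at $\theta$, and $W$ for a weighting matrix.

```latex
\hat{\theta}
  = \arg\min_{\theta}\;
    \bigl(\hat{\beta}^{\text{data}} - \bar{\beta}^{\text{sim}}(\theta)\bigr)^{\top}
    W\,
    \bigl(\hat{\beta}^{\text{data}} - \bar{\beta}^{\text{sim}}(\theta)\bigr)
```

In a Bayesian treatment of this kind, the criterion is typically mapped into a quasi-likelihood that a Metropolis-Hastings sampler explores; the exact mapping used in the paper is not stated in this summary.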

FX Interventions and Capital‐Constrained Banks: Evidence from USD/ILS Spot, Forward, and Option Markets

Mon, 01 Jan 0001 00:00:00 +0000

This paper uses confidential daily data on the Bank of Israel’s (BOI) foreign exchange purchase program in the USD/Israeli new shekel (ILS) spot market from 2013 to 2019 to study how FX interventions affect the spot exchange rate, the forward rate (through covered interest parity deviations), and the risk-neutral probability distribution of future exchange rates reflected in the options market. Interventions of USD 1 billion are found to be associated on average with a depreciation of the ILS by 0.82%–0.85%—at the upper bound of estimates in the existing literature—while the indirect effect on the forward rate is smaller because the BOI’s USD purchases widen the negative deviation from covered interest parity (CIP). The higher moments of the risk-neutral distribution—including crash risk—are found to be unaffected; USD purchases shift the entire distribution toward higher USD/ILS values without altering its shape. An additional finding is that the USD/ILS options market appears to anticipate intervention episodes and prices them in before they occur. This paper is the first academic study to empirically quantify the effect of FX interventions on CIP deviations. Note: this summary is based on Bundesbank DP 20/2022 “Foreign exchange interventions and their impact on expectations: Evidence from the USD/ILS options market,” an earlier version; the published JMCB paper title indicates expanded scope including capital-constrained banks and spot/forward/option markets.

In depth

Q1. What is the data and research design?

The paper uses confidential daily data on the BOI’s intervention program in the USD/ILS spot market from 2013 to 2019, together with USD/ILS option price data, to identify the effect of sterilized FX purchases on the spot rate, forward rate, and option-implied expectations. The authors note that results from older studies may not be representative because FX markets have changed substantially over the past decade and the sustained low-interest-rate environment of this period is historically exceptional, making updated empirical evidence important.

Q2. What is the estimated effect on the spot exchange rate?

Interventions of USD 1 billion are associated on average with a depreciation of the ILS by 0.82%–0.85%, which is at the upper bound of the estimated impact found in other studies. The direction is consistent with portfolio balance and signaling channels: BOI purchases of USD increase demand for dollars and supply of shekels, driving the spot USD/ILS rate higher.

Q3. How do interventions affect the forward rate and covered interest parity?

The indirect effect of BOI USD purchases on the forward rate is smaller than the spot effect because the purchases widen the negative deviation from covered interest parity—this paper is the first to empirically quantify the effect of FX interventions on CIP deviations. The CIP deviation widens because the spot rate moves more than the forward rate, creating a cross-currency basis that is not fully closed by the intervention.
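
One common way to write the CIP condition and its deviation is sketched below. The sign convention and symbols are assumptions for illustration, not taken from the paper: $S_t$ and $F_t$ are the spot and forward rates in ILS per USD, $s_t$ and $f_t$ their logs, and $i^{\text{ILS}}_t$, $i^{\text{USD}}_t$ the interest rates over the forward's horizon.

```latex
% CIP: the forward-spot ratio equals the gross interest rate ratio
\frac{F_t}{S_t} = \frac{1 + i^{\text{ILS}}_t}{1 + i^{\text{USD}}_t}

% Log approximation of the deviation (cross-currency basis)
x_t = i^{\text{USD}}_t - \bigl( i^{\text{ILS}}_t - (f_t - s_t) \bigr)
```

Under this convention, if an intervention raises $s_t$ by more than $f_t$ while interest rates are unchanged, then $f_t - s_t$ falls and $x_t$ declines, which is the sense in which an already negative deviation widens.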

Q4. How are the higher moments of the exchange rate distribution affected?

The higher moments of the risk-neutral probability distribution of future exchange rates—including crash risk—are found to be unaffected by BOI USD purchases; the purchases simply shift the entire distribution toward higher USD/ILS values without compressing its variance or altering its shape. This finding indicates that FX interventions move the level of expected future exchange rates but do not reduce tail risk or change the perceived skewness of the distribution from the market’s perspective.

Q5. Do options markets anticipate interventions?

The USD/ILS options market is found to anticipate intervention episodes and price them in before they occur. This anticipation is consistent with market participants forming rational expectations about the BOI’s reaction function based on observable exchange rate dynamics, and adjusting option prices accordingly ahead of actual intervention.

Key concepts

risk-neutral probability distribution (RND) : the probability distribution over future exchange rates recovered from observed option prices; reflects market forward-looking beliefs including higher moments such as crash risk and skewness, under risk-neutral pricing conventions.

covered interest parity (CIP) deviation (cross-currency basis) : the departure from the no-arbitrage relationship linking spot rates, forward rates, and interest rate differentials; a negative CIP deviation for the ILS means the forward USD premium exceeds the USD-ILS interest rate differential, implying the dollar is cheap in the forward market relative to the spot-and-roll strategy.

sterilized FX intervention : central bank foreign currency purchases or sales offset by domestic open market operations to prevent the domestic money supply from changing, isolating the exchange rate channel from monetary policy effects.

Gendered Spheres of Learning and Household Decision-Making over Fertility

Mon, 01 Jan 0001 00:00:00 +0000

This paper investigates whether information asymmetries within households about maternal health risk can explain persistent spousal disagreement over fertility in a high-fertility, high-maternal-mortality setting. The authors develop a theoretical model and conduct a randomized field experiment among approximately 500 couples in peri-urban Lusaka, Zambia, where the lifetime risk of maternal death is 1 in 59 women and the maternal mortality ratio is 398 deaths per 100,000 live births.

The central mechanism is a communication barrier that arises from conflicting fertility preferences between spouses. When husbands have higher desired fertility than wives (4.43 vs. 4.19 children on average in the study sample), wives who are better informed about maternal health risk lack the incentive to credibly transmit that information to their husbands. Strategic communication concerns — not a generically lower propensity of men to learn from women — drive this asymmetry. The model predicts a pooling equilibrium in which no informative communication flows from wives to husbands when preference divergence is sufficiently large.

The experiment randomized whether the maternal mortality information curriculum was delivered to the husband or the wife in each couple, with both spouses in all arms also receiving a family planning curriculum. This design isolates the incremental effect of the maternal mortality information and permits identification of direct versus spillover effects within the household.

Consistent with the model, treated husbands significantly update their beliefs about maternal health risk factors, and their wives also update: information flows from husbands to wives. By contrast, treated wives update their own beliefs, but their husbands do not update at all. The hypothesis that spillover effects are symmetric is rejected at the 10% level for the risk factors index (p-value = 0.097), and equality of direct and indirect effects on men is rejected (p-value < 0.001). The communication asymmetry is most pronounced among husbands who, at baseline, want a child as soon as possible, precisely the households with the greatest preference conflict.

Both treatment arms reduce fertility. Households in which the husband is treated experience a 43% reduction in the probability of having a child or being pregnant in the year following the intervention. The fertility reduction is strongest when the wife faces higher ex ante risk based on her birth history, consistent with the model’s prediction that treatment effects are concentrated among households with high maternal health costs.

The transfers evidence is the key differentiator between the two arms. When the wife is treated, fertility declines but is accompanied by a significant reduction in transfers from husband to wife, consistent with the wife updating her own beliefs without being able to convey them to her husband, who then reduces compensation. When the husband is treated, fertility declines without the same reduction in transfers — and treated husbands report higher communication with their spouse about family planning and higher relationship satisfaction. This combination is consistent with the husband treatment resolving the information gap directly, enabling efficient contracting, whereas the wife treatment leaves the information asymmetry in place.

The study is conducted in informal settlements of Lusaka, a prime-age urban sample in which the average woman is 28 years old with 2.6 children at baseline. Scope conditions: results apply to a setting with very high maternal mortality, large baseline spousal fertility gaps, and strong traditional beliefs (55.5% of men cite marital infidelity as a leading cause of maternal complications). Generalizability to lower-risk or lower-preference-gap settings is explicitly circumscribed by the model’s comparative statics.

Q: What is the baseline gender gap in knowledge of maternal health risk? A: Men are less likely than women to identify high parity (72.0% vs. 77.7%) and advanced maternal age (74.3% vs. 84.6%) as risk factors. In seven hypothetical scenarios rating complication likelihood on a 0–10 scale, men report lower scores than women in six out of seven cases. Despite Zambia’s 1-in-59 lifetime maternal mortality risk, only 27.6% of men (vs. 53.4% of women) report having attempted to discuss maternal health risk with their spouse.

Q: What drives the gender gap in knowledge? A: The authors argue the gap stems from “gendered spheres of direct and indirect knowledge accumulation of maternal labor and delivery outcomes.” Women are embedded in social networks where maternal mortality episodes are more salient: 11.0% of women report knowing a close friend who died giving birth, vs. 6.8% of men knowing a close friend whose wife died. The gap widens with social distance to the victim, suggesting women’s networks give them systematically more exposure to maternal mortality events.

Q: How does the model explain the failure of within-household communication? A: The model specifies each spouse’s preferences as minimizing the distance between realized fertility and his or her own net fertility optimum (ideal fertility minus weighted maternal health cost). When the husband’s ideal fertility is high enough, he makes transfers to induce the wife to bear more children than her private optimum. Given these incentives, a wife who is informed about high health costs has an interest in exaggerating the cost to extract larger transfers. Because the husband anticipates this, no informative communication occurs in equilibrium: the only equilibrium is a pooling equilibrium in which the wife’s message is uninformative regardless of her true cost realization.

Q: What is the specific asymmetry in belief updating observed in the experiment? A: Among treated husbands, both husbands and their wives update beliefs about maternal risk factors: information flows from husband to wife. Among treated wives, only the wife updates; her husband does not. A Wald test rejects equal direct and indirect effects on men at p < 0.001 and rejects symmetric spillovers for the risk factors index at p = 0.097, i.e., only at the 10% level. Women’s beliefs, by contrast, update in both arms.

Q: How large is the fertility effect and which arm drives it? A: Households in which the husband is treated experience a 43% reduction in the probability of having a child or being pregnant in the year following the intervention. This effect is described as of the same order of magnitude as other household-level interventions shown to reduce pregnancy (citing Ashraf, Field, and Lee 2014). The fertility reduction is strongest among households where the woman faces higher ex ante risk based on birth history, consistent with the model’s Prediction 5 that effects are concentrated where theta_j is high.

Q: How do transfers differ between the wife-treated and husband-treated arms? A: When the wife is treated, the fertility decline is accompanied by a significant reduction in transfers from husband to wife. When the husband is treated, the fertility decline is not accompanied by a similar reduction in transfers. The authors interpret this pattern as: wife treatment leaves the husband uninformed, so he reduces transfers when he observes her reducing fertility without understanding why; husband treatment resolves the information gap, allowing efficient renegotiation without penalizing the wife.

Q: Which husbands fail to update beliefs even when their wife is treated? A: Husbands who at baseline want a child “as soon as possible” do not update their beliefs in response to their wife’s treatment status. These men also reduce transfers to their wife more than other groups when she is treated. In the model, these are precisely the households with the highest conflict of interest (high alpha_H), where the pooling equilibrium prediction is sharpest.

Q: What is the role of traditional beliefs about maternal mortality? A: 55.5% of men and 42.0% of women report (without prompting) marital infidelity as a leading cause of maternal labor and delivery complications — greater weight than assigned to lack of healthcare and poor health status combined. This stigma directly reduces women’s willingness to raise concerns about birth complications with their spouse, reinforcing the communication barrier the model formalizes.

Q: What are the welfare implications of targeting men vs. women with information? A: The fertility reduction from husband treatment is not inferior to that from wife treatment, but husband treatment also produces improvements in marital surplus (treated husbands report higher communication with spouse about family planning, higher relationship satisfaction, and greater closeness), whereas wife treatment reduces transfers to the wife, indicating she bears a financial cost. The authors argue male-targeted information can reduce unmet need for family planning while easing rather than exacerbating household conflict.

Q: Does this paper provide field experimental evidence on strategic communication models? A: The authors claim this is the first field experimental evidence directly testing models of strategic communication (Crawford and Sobel 1982; Mailath 1987; Crawford 1998, 2019), wherein persistent preference differences and conflict of interest impede communication and belief updating. Prior tests of these models were conducted in the lab; this paper provides the first real-world behavioral test with consequential decisions (fertility) in a high-stakes setting.

Q: What is the unmet need for family planning in the study sample? A: Overall, 32% of women in the sample report not using modern contraceptives at baseline. Of the 33% of women who want no more children, 27% are not using any modern contraceptive (8% of the overall sample). Of the 52% of women who wish to delay giving birth by at least one year, 23% are not using any modern contraceptive (12% of the overall sample).

Q: How does the model characterize the husband’s partial internalization of maternal health costs? A: The husband’s utility function includes the maternal health cost theta_j scaled by delta (0 ≤ delta ≤ 1), capturing how much weight he places on his wife’s risk. When delta is sufficiently high and the husband’s ideal fertility (alpha_H) is sufficiently low, or when his disutility of transfers (gamma) is sufficiently low, informative communication can occur after the husband is treated. When delta is low, the husband discounts his wife’s risk and communication barriers are more severe regardless of treatment.
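
The preference structure described in the answers above can be written down in one illustrative form. This is a sketch under stated assumptions, not the paper's specification: the quadratic loss, the linear transfer term $t$, the symbol $\alpha_W$ for the wife's ideal fertility, and the unit weight on $\theta_j$ in the wife's utility are all assumed here; only $\alpha_H$, $\theta_j$, $\delta$, and $\gamma$ appear in the summary.

```latex
% Wife: loss around her net optimum, plus transfers received
U_W = -\bigl(n - (\alpha_W - \theta_j)\bigr)^{2} + t

% Husband: loss around his net optimum, minus the cost of transfers paid
U_H = -\bigl(n - (\alpha_H - \delta\,\theta_j)\bigr)^{2} - \gamma\, t,
\qquad 0 \le \delta \le 1
```

In this form, the gap between the two net optima is $(\alpha_H - \alpha_W) + (1 - \delta)\,\theta_j$, so a higher $\alpha_H$ or a lower $\delta$ widens the conflict of interest, matching the comparative statics the summary describes.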

Key concepts

Maternal health cost (theta): A random variable representing the welfare cost borne by the wife from childbearing, including mortality risk and morbidity. In Zambia, it is distributed with a higher mean than the worldwide distribution. It enters the wife’s utility directly and the husband’s utility only scaled by delta, his degree of internalization of her cost.

Gendered spheres of learning: The paper’s term for the systematic differential in experiential exposure to maternal mortality outcomes between men and women, arising from gender-segregated social networks. Women witness maternal mortality events more directly through closer social ties, while men’s networks provide systematically less exposure.

Communication barrier (pooling equilibrium): The equilibrium outcome in the model where no informative signal is transmitted from an informed wife to her uninformed husband about the true realization of maternal health cost. Arises because the wife’s incentives to misreport are independent of the true cost realization, making any message uninformative when preference conflict is sufficiently large.

Intra-household information spillover: The transmission of information learned by one spouse to the other as a consequence of the treated spouse’s belief update. The paper documents asymmetric spillovers: information flows from treated husbands to their wives, but not from treated wives to their husbands.

Husband’s demand for children (alpha_H): The husband’s ideal fertility level, which governs the degree of preference conflict within the household. Baseline husband desire for a child as soon as possible serves as the empirical proxy for high alpha_H and is the key moderator of spillover and transfer effects.

Degree of internalization (delta): The parameter in the husband’s utility function (0 ≤ delta ≤ 1) capturing how much weight he places on his wife’s maternal health cost. When delta is high and gamma (disutility of transfers) is low, communication can occur in equilibrium after the husband is treated.

Unmet need for family planning: Women who wish to space or limit births but are not using modern contraception. In the study sample, 32% of women report not using modern contraceptives at baseline, with substantial shares among both those wanting no more children and those wishing to delay.

Global Working Hours

Mon, 01 Jan 0001 00:00:00 +0000

Drawing on about 5,000 labor force and household surveys from 160 countries that cover 97% of the world’s population, this paper builds a new global database of hours worked. It shows that hours worked per adult decline only slightly with GDP per capita and are weakly correlated with economic development overall: the unconditional elasticity of hours with respect to GDP is about -0.04 across countries and -0.01 within countries over time, GDP explains roughly 5% of cross-country and under 1% of within-country historical variation in hours, and the implied reduction is 0-20% over the entire development spectrum. The strong age and gender gradients the authors document are, in their cross-country regressions, driven less by development itself than by institutions. Hours worked by the young (aged 15-19) and the elderly (aged 60+) fall with development almost entirely because of rising school attendance and public pension coverage, while prime-age (20-59) hours stay roughly flat but undergo what the authors call a “great gender reshuffling,” in which falling male hours per worker are quantitatively offset by rising female labor force participation. Across countries and over time, labor taxes are strongly negatively correlated with prime-age hours worked. Controlling for government transfers only partly reduces this link, which the authors read as ruling out income and substitution effects on labor supply as the only driver. Controlling for working-hours regulations and the size of the formal sector reduces the link much more sharply, suggesting to them that regulation, and not just the incentive effects of taxes, plays a large role in shortening intensive-margin hours in richer countries. The authors conclude that collective choices and social norms, often encoded in public policy (schooling, pensions, cultural norms about women’s work, and hours regulation), powerfully shape working hours over and above pure economic development. These are correlational cross-country and time-series patterns rather than identified causal effects, and hours are measured as weekly hours in all GDP-producing jobs (including unpaid agricultural work but excluding unpaid home services).

In depth

Q1. What new data does the paper assemble, and how does it improve on prior global hours databases?

The authors mobilize roughly 5,000 nationally representative household and labor force surveys to build a database of hours worked covering 160 countries and 97% of the world population in cross section, plus time series spanning over 20 years in 86 countries. They combine six groups of sources, principally the ILO’s Microdata Repository (about 1,800 surveys in 150 countries since 1990) and the World Bank’s I2D2 database, which include survey data not publicly disclosed by the countries that created them. This extends the most comprehensive prior effort, Bick, Fuchs-Schündeln, and Lagakos (2018), whose core database covered 49 countries (23% of world population) and whose extended database covered 80 countries (41%); large countries such as China and India (35% of world population) that were absent from that study are now included. The authors state they are publishing and plan to regularly update the underlying database at the country×year×age×gender level so that researchers can reproduce their results.

Q2. How seriously does the seasonality concern affect the estimates?

The authors investigate seasonality directly and conclude that monthly seasonality in hours worked is limited in developing countries—actually larger in richer countries because of summer holidays—which gives them confidence that surveys not fielded over the full year still provide reliable annual hours estimates. This matters because Bick, Fuchs-Schündeln, and Lagakos (2018) had restricted their core sample partly out of concern that surveys run in specific months (e.g., around seasonal agricultural work) could bias hours estimates. Resolving this concern is what lets the authors retain the far larger country coverage.

Q3. How much do hours worked actually vary with economic development?

Hours worked per adult slightly decline with GDP but are only weakly correlated with development overall, with an unconditional elasticity of about -0.04 in the cross section and -0.01 in panel data—implying a reduction in hours of 0-20% over the entire development spectrum. GDP explains around 5% of cross-country variation in hours worked and less than 1% of historical within-country variation. Decomposing the margins, employment rates are essentially uncorrelated with development, while hours per worker are bell-shaped: they rise at low levels of development because of structural change (hours in manufacturing and services are very high in middle-income countries, while agricultural hours are moderate and flat with GDP), then flatten. Globally, 59% of the adult population (aged 15+) is employed, working an average of 42 hours per week, which implies about 25 weekly hours per adult; hours are strongly bell-shaped with age, and women supply 35% of GDP-producing hours versus 65% for men, a gap driven mostly by the extensive employment-rate margin.
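
The arithmetic behind two of the figures above can be checked with a short calculation. This is a sketch: the constant-elasticity functional form and the 100-fold GDP ratio used to represent "the entire development spectrum" are assumptions made here, not values stated in the summary.

```python
# Hours per adult = employment rate x hours per worker.
employment_rate = 0.59   # share of adults (15+) employed, from the summary
hours_per_worker = 42.0  # average weekly hours of the employed, from the summary
hours_per_adult = employment_rate * hours_per_worker  # about 25 weekly hours


def implied_hours_change(elasticity, gdp_ratio):
    """Proportional change in hours when GDP per adult rises by gdp_ratio,
    assuming a constant elasticity (hours proportional to GDP ** elasticity)."""
    return gdp_ratio ** elasticity - 1.0


GDP_RATIO = 100.0  # hypothetical rich-to-poor GDP ratio, an assumption

cross_section = implied_hours_change(-0.04, GDP_RATIO)
panel = implied_hours_change(-0.01, GDP_RATIO)

print(f"hours per adult: {hours_per_adult:.2f}")
print(f"implied change, cross-section elasticity: {cross_section:.1%}")
print(f"implied change, panel elasticity: {panel:.1%}")
```

Under these assumptions both elasticities imply reductions inside the 0-20% range quoted in the summary.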

Q4. Why do hours worked by the young and the elderly fall with development?

In simple cross-country regressions, the decline in hours worked by the young (15-19) and the elderly (60+) as countries develop is entirely driven by rising school attendance for the young and rising public pension coverage for the elderly, in line with a broad body of prior work. In the time series the two margins diverge: the fall in youth work is particularly pronounced, whereas elderly work is stable rather than falling. The authors read this as consistent with developing countries expanding schooling faster, but rolling out elderly pensions more slowly, than frontier economies did historically.

Q5. What happens to prime-age hours, and what is the “great gender reshuffling”?

Prime-age (20-59) hours worked are flat, if not slightly increasing, with GDP per adult, but this stability masks a large compositional shift the authors term a “great gender reshuffling”: female hours rise with development while male hours decline, and the fall in male hours (driven by reduced hours per worker) is quantitatively offset by increases in female employment rates. The authors interpret this as development tending to equalize hours across genders—shortening the long hours of working men while allowing more women into GDP-generating employment. They emphasize considerable heterogeneity across countries and over time in this pattern.

Q6. What role do religion and political history play in female hours worked?

The authors report that Muslim- or Hindu-majority religion is associated with much lower female hours worked, while former communist status is associated with higher female hours. Grouping countries into former-communist, Muslim/Hindu-majority, and other categories, they show female hours rise with development on average but with large level differences across these groups, which they treat as evidence that cultural and institutional factors, not development alone, shape the gender allocation of work. These are descriptive cross-country associations, not causal estimates.

Q7. How do labor taxes relate to prime-age hours worked, and what drives the link?

Labor taxes are strongly negatively related to prime-age hours worked, both in international comparisons and within-country time series; once tax variables are controlled for, GDP per capita is only weakly positively correlated with hours, with an elasticity of around 0.1. The authors probe what drives the tax-hours link. Controlling for social spending (cash or quasi-cash transfers) attenuates it, consistent with income effects from transfers playing some role, but the attenuation is only partial, which the authors read as ruling out income and substitution effects on labor supply as the sole driver. Controlling instead for the share of formal workers and working-hours regulations reduces the link much more sharply. They therefore suggest labor taxes depress hours not mainly through income and substitution effects but rather because high labor taxes correlate with the development of a formal sector with regulated working hours.

Q8. Can a standard labor supply model rationalize these findings?

The authors note that a standard labor supply model with a low uncompensated but large compensated labor supply elasticity can rationalize the joint pattern of weak hours-GDP but strong hours-tax correlations. The logic they invoke from the macroeconomics literature is that economic growth raises the wage rate (an uncompensated labor supply effect, which is weak here) while labor taxes fund transfers (a compensated labor supply effect, which is stronger). The partial attenuation of the tax effect when social spending is controlled is consistent with this account, but the sharper attenuation from regulation and formal-sector controls leads the authors to give regulation a large role alongside—rather than instead of—these labor supply channels.
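
The distinction between the two elasticities can be made explicit with the textbook Slutsky decomposition for labor supply. The notation is assumed here, not taken from the paper: $h$ is hours, $w$ the wage, $m$ non-labor income, and $\varepsilon^{u}$, $\varepsilon^{c}$ the uncompensated and compensated wage elasticities of hours.

```latex
% Slutsky equation for hours worked
\frac{\partial h}{\partial w}
  = \left.\frac{\partial h}{\partial w}\right|_{\text{comp}}
  + h\,\frac{\partial h}{\partial m}

% In elasticity form (multiply through by w/h)
\varepsilon^{u} = \varepsilon^{c} + w\,\frac{\partial h}{\partial m}
```

When leisure is a normal good the income term $w\,\partial h/\partial m$ is negative, so a large $\varepsilon^{c}$ can coexist with a small $\varepsilon^{u}$ if the income effect is sizable, which is the combination the summary says is needed to reconcile weak hours-GDP with strong hours-tax correlations.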

Q9. What is the paper’s overall interpretation?

The authors conclude that collective choices and public policies—schooling and pension systems, cultural norms regarding women, and regulations on hours worked—have first-order effects on the level and allocation of working hours by age and gender, over and above economic development. They argue that while growth may help develop such institutions, many are only partially determined by it, which is why large cross-country variations in hours worked persist at all levels of development. The paper is framed as documenting and interpreting robust correlations across countries and over time, not as identifying causal policy effects.

Q10. What are the main scope conditions and caveats?

Throughout, hours worked follow international conventions: weekly hours in all jobs that contribute to GDP, including unpaid agricultural work but excluding unpaid home services such as cleaning, cooking, and care. Coverage is 97% of world population, with the missing 3% concentrated in parts of the Middle East and North Africa. The central results on taxes, transfers, regulations, religion, and communist history are correlational—drawn from cross-country regressions and within-country time series—and the authors repeatedly use calibrated language (“correlated,” “suggests,” “consistent with”) rather than claiming identified causal effects.

Key concepts

Hours worked (GDP-producing) : Weekly hours in all jobs that contribute to GDP, following international conventions. This includes unpaid agricultural work (which produces goods counted in GDP) but excludes unpaid home services such as cleaning, cooking, and caring for children or the elderly.

Great gender reshuffling : The paper’s term for the pattern in which, as countries develop, declining male hours per worker are quantitatively offset by rising female labor force participation, leaving prime-age (20-59) hours worked roughly stable while its gender composition shifts markedly.

Unconditional elasticity of hours with respect to GDP : The raw cross-country (about -0.04) or panel (about -0.01) elasticity of hours worked to GDP per adult before conditioning on taxes, transfers, or institutions; its small size is the paper’s headline evidence that development per se explains little hours variation.

Uncompensated vs. compensated labor supply elasticity : In the standard labor supply model the authors invoke, growth raises wages (an uncompensated effect, weak in their data) while labor taxes fund transfers (a compensated effect, stronger in their data); a low uncompensated and large compensated elasticity reconciles weak hours-GDP with strong hours-tax correlations.

Formal sector / working-hours regulations : Regulated wage employment in which statutory limits on hours bind; the authors emphasize that the expansion of this regulated formal sector with development, rather than the incentive effects of taxes alone, is the channel that most sharply accounts for shorter intensive-margin hours in richer countries.

How Bad Are Weather Disasters for Banks?

Using FEMA disaster declarations matched to SHELDUS property-damage estimates and Call Report data for 1995–2018, this paper finds that weather disasters — even at their most severe — have had modest effects on U.S. bank safety over the last quarter century. For single-county banks exposed to 95th-percentile disasters, Z-scores decline by roughly 9 percent at a five-year horizon under the panel estimates; reaching failure thresholds from sample mean Z-score levels would require a disaster approximately 6.7 standard deviations more destructive than a 95th-percentile event. Federal disaster aid does not appear to be the primary driver of this resilience, since banks exposed to weather events without FEMA declarations exhibit similar stability. Instead, the paper points to a loan demand channel — multi-county bank lending increases roughly 0.25 percentage points per standard deviation of damage at five years without an accompanying interest-rate increase — and to local banks’ apparent avoidance of mortgage lending in flood-prone areas beyond what official flood maps predict, consistent with local information about true flood risk limiting exposure before disasters strike.

In depth

Q1. How severe are weather disaster effects on bank safety?

The paper finds that weather disasters at any severity level produce small and often statistically insignificant effects on the key bank safety measures — charge-offs, capital ratios, return-on-assets volatility, and Z-scores — at single-county banks, with the largest measured effect being roughly a 9 percent decline in Z-scores at the 95th percentile of disaster damage at a five-year horizon. The regression framework uses bank and state-year fixed effects, with SHELDUS damage as the continuous severity measure and FEMA disaster declarations as a binary indicator. For multi-county banks, charge-offs increase by roughly 10 percent at five years, but net income also rises, suggesting disaster-area loan demand partially offsets credit losses. The paper’s calculation is that pushing a typical bank from its mean Z-score of 135.9 to the failure threshold would require a Z-score decline of 127.9 — far exceeding the estimated −9 percent impact of a 95th-percentile disaster, which would need to be approximately 6.7 standard deviations more destructive to close that gap.
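
The arithmetic behind this comparison can be sketched as follows. The sketch assumes that the 9 percent decline applies to the sample-mean Z-score level, and it backs out the implied failure threshold from the two figures quoted above; neither step is stated in this form in the summary.

```python
# Figures quoted in the summary above.
MEAN_Z = 135.9              # sample-mean Z-score
DECLINE_TO_FAILURE = 127.9  # Z-score decline needed to reach the failure threshold
DISASTER_EFFECT = 0.09      # roughly 9 percent decline after a 95th-percentile disaster

# Assumption: the threshold is the mean minus the required decline.
implied_threshold = MEAN_Z - DECLINE_TO_FAILURE

# Assumption: the 9 percent effect is applied to the mean Z-score level.
estimated_decline = DISASTER_EFFECT * MEAN_Z

# How many times larger the required decline is than the estimated one.
gap_ratio = DECLINE_TO_FAILURE / estimated_decline

print(f"implied failure threshold: {implied_threshold:.1f}")
print(f"estimated decline:         {estimated_decline:.1f} Z-score points")
print(f"required / estimated:      {gap_ratio:.1f}x")
```

Under these assumptions the estimated decline is about 12 Z-score points, so the decline needed to reach failure is roughly ten times larger than the effect of a 95th-percentile disaster.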

Q2. Is bank resilience an artifact of federal disaster aid?

The paper presents evidence that federal disaster aid is not the primary source of bank resilience, since banks exposed to weather events that did not receive FEMA disaster declarations exhibit similarly modest effects on bank safety measures. The test is designed to separate the insurance mechanism (FEMA aid replacing household income and debt service capacity) from intrinsic bank resilience. The fact that non-FEMA disasters produce comparable stability redirects attention to the demand-side and local-knowledge channels as the more fundamental explanations for the resilience finding.

Q3. What is the loan demand channel and how large is it?

Multi-county banks experience an increase in lending of roughly 0.25 percentage points per standard deviation of SHELDUS damage at a five-year horizon, and the authors find no accompanying increase in loan interest rates, which is consistent with a demand-side shift rather than a tightening of lending standards. The demand interpretation is that disasters create a wave of borrowing demand as households and firms repair or replace damaged assets, and the increased loan volume helps offset the increase in charge-offs. The pattern is found at multi-county banks — which can serve affected and unaffected areas simultaneously — but not at single-county banks, consistent with lending capacity mattering for capturing the demand increase.

Q4. What does “local knowledge” mean in this context?

Local banks originate approximately 6.4 percent fewer mortgage dollars per application (in log terms) in FEMA flood zones than official flood map classifications alone would predict. The gap widens to 7–8 percent in areas that have experienced more than five FEMA flood declarations, compared to areas with fewer than three. The finding is consistent with local banks having access to community-level information (observed flooding history, property-level characteristics, local drainage and elevation) that is not incorporated into official FEMA flood zone classifications. This pre-disaster selectivity limits mortgage accumulation in the highest-risk areas before disasters occur.

Q5. What are the implications for climate risk assessment?

The paper explicitly frames the historical resilience documented for 1995–2018 as informing rather than settling assessments of physical risk to banks from future climate change, since more frequent or more severe disasters could overwhelm the demand-offset and local-knowledge mechanisms that the paper identifies as sustaining bank performance. The key qualification is temporal scope: the demand-side recovery effect requires that affected areas have the income and economic capacity to service new loans, and the local-knowledge effect requires that banks have experienced enough repeated flooding to develop accurate private flood risk assessments. Both conditions could become less reliable as climate change alters the frequency, geography, and severity of weather events relative to the historical distribution.

Key concepts

Z-score : a bank-level distance-to-insolvency measure equal to (return on assets + capital ratio) divided by return-on-assets volatility; higher values indicate greater distance from failure; used here as the primary measure of disaster impact on bank safety.

SHELDUS : the Spatial Hazard Events and Losses Database for the United States, providing county-level property damage estimates for weather events; used in this paper as the continuous measure of disaster severity in panel regressions.

single-county bank : a bank whose entire depositor base is drawn from one county, making it fully exposed to local disaster effects with no geographic diversification across other counties.

loan demand channel : the mechanism by which disasters increase demand for credit from households and firms repairing or replacing damaged assets, generating new loan volume that partially offsets credit losses at banks serving affected areas.

local knowledge : the paper’s label for the informational advantage that local banks appear to have about true flood risk beyond what official FEMA flood zone classifications capture, inferred from lower mortgage originations in areas with a history of repeated flooding.

How Banks Create Gridlock in Payment Systems to Save Liquidity: The Case of Canada

This paper uses detailed transaction-level data from Canada’s new high-value payment system (HVPS) to show how participants save liquidity by strategically exploiting the gridlock resolution arrangement built into the system. Observed behaviors are found to be consistent with the equilibrium of a “gridlock game” that captures the key incentives participants face: by withholding outgoing payments to induce gridlock events, participants trigger the system’s bilateral netting algorithm, which settles stuck payment queues at lower liquidity cost than bilateral sequential settlement would require. The findings have implications for the design of high-value payment systems and shed light on financial institutions’ liquidity preference in payment system environments.

In depth

Q1. What is the gridlock resolution arrangement and why do banks exploit it?

Modern high-value payment systems (HVPSs) include a gridlock resolution mechanism that activates when a set of payments is mutually stuck in queues, each waiting for an incoming payment before it can be sent. The mechanism resolves them simultaneously via bilateral netting, which requires less settlement liquidity than sequential settlement. Banks strategically withhold outgoing payments to trigger these events and thereby save liquidity. The HVPS studied is Canada’s new high-value payment system, which replaced the older Large Value Transfer System (LVTS). The gridlock game captures the incentive structure: if a bank expects counterparties to send payments that would be netted against its own obligations in a gridlock, it is optimal to withhold and wait rather than settle bilaterally at higher liquidity cost.
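
A minimal two-bank sketch may help fix ideas about why netting needs less liquidity than gross settlement. The function names, balances, and payment amounts are hypothetical and chosen only for illustration; the actual resolution algorithm in the Canadian system is not described at this level of detail in the summary.

```python
def is_gridlocked(balance_a, balance_b, a_to_b, b_to_a):
    """True if neither bank can send its queued payment gross from its own balance."""
    return balance_a < a_to_b and balance_b < b_to_a


def settle_by_bilateral_netting(balance_a, balance_b, a_to_b, b_to_a):
    """Offset the two queued payments and settle only the net amount.

    Returns the new (balance_a, balance_b), or None if the net payer
    cannot cover the net obligation.
    """
    net = a_to_b - b_to_a  # positive: A is the net payer; negative: B is
    if net > balance_a or -net > balance_b:
        return None
    return balance_a - net, balance_b + net


# Hypothetical example: each bank holds 20; A owes B 100 and B owes A 90.
print(is_gridlocked(20, 20, 100, 90))                # neither payment can go first
print(settle_by_bilateral_netting(20, 20, 100, 90))  # only the net of 10 moves
```

In this example both queued payments settle with 10 units of liquidity, whereas sending the first payment gross would have required 100. That difference is the saving a bank can capture by waiting for the netting step instead of funding its payment up front.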

Q2. How is the gridlock game formalized?

The “gridlock game” is a formal game-theoretic model that captures the key incentives participants face in the HVPS: players choose whether and when to send payments, and the equilibrium characterizes the strategic withholding behavior as a rational response to the liquidity-saving opportunities created by the gridlock resolution mechanism. The equilibrium of this game is shown to be consistent with the actual patterns observed in the HVPS data: the timing, magnitude, and counterparty structure of strategic withholding are aligned with the game’s equilibrium predictions.

Q3. What are the implications for HVPS design?

The finding that participants strategically exploit the gridlock resolution mechanism has implications for HVPS design: while gridlock resolution was intended as an exception-handling mechanism for unintended payment queue build-ups, participants have adapted to use it as a routine liquidity management tool, changing the system’s effective operation in ways the designers may not have anticipated. System designers must account for the strategic response of sophisticated participants when evaluating the performance of gridlock resolution mechanisms, since the equilibrium behavior changes the frequency, timing, and magnitude of gridlock events relative to the non-strategic benchmark.

Q4. What does the evidence reveal about banks’ liquidity preferences?

The strategic gridlock behavior reveals that financial institutions place significant value on conserving payment system liquidity—enough to coordinate timing of payment submissions in ways that exploit system-level netting opportunities—consistent with liquidity being a scarce and valuable resource in modern payment systems. This preference for liquidity conservation is amplified in environments where central bank reserves are costly and where payment system participants face collateral or reserve constraints.

Key concepts

gridlock in high-value payment systems : a situation in which a set of payments are mutually stuck in queues, each waiting for incoming funds before the outgoing payment can be made, requiring the system’s bilateral netting algorithm to simultaneously settle them; exploited strategically by banks to save settlement liquidity.

gridlock game : the paper’s game-theoretic model of strategic payment submission timing in an HVPS; captures the incentive to withhold outgoing payments to trigger gridlock resolution events that settle payment queues at lower net liquidity cost.

bilateral netting in HVPS : the gridlock resolution mechanism that settles multiple mutually stuck payments by computing net obligations among participants and settling only the differences; requires less total settlement liquidity than sequential bilateral settlement and is the mechanism banks exploit in the gridlock game.

Identification and Estimation of Dynamic Random Coefficient Models

This paper studies linear panel data models where regression coefficients are individual-specific (random coefficients) and regressors may be predetermined — that is, sequentially exogenous rather than strictly exogenous, as occurs when a lagged dependent variable appears on the right-hand side. The canonical example is the AR(1) model Yit = gamma_i + beta_i * Yi,t-1 + epsilon_it, where both the intercept and the autoregressive coefficient vary across individuals. The setting is short panels (small T), which rules out learning about individual-level coefficient values.

The paper’s central finding, building on Chamberlain (1993, 2022), is that the mean of the coefficient distribution is not point-identified in this dynamic setting. Chamberlain established this for discrete regressors; the paper’s Proposition 1 extends the non-identification result to continuous regressors under stronger assumptions. The paper then characterizes finite lower and upper bounds for the mean, variance, and CDF of the random coefficient distribution. The identification strategy recasts the problem as an infinite-dimensional linear program and exploits the dual representation of that program (following Galichon and Henry (2009) and Schennach (2014)) to derive tractable closed-form bounds for the mean and optimization-based bounds for the variance and CDF.

For the mean parameter, the bounds take a closed-form expression involving the individual OLS estimator, the pooled OLS estimator, and cross-sectional moments of the data. The bounds remain finite even when the data are unbounded, provided certain moments of the data are finite. Tighter (refined) bounds are available when instrumental variables are brought in as additional unconditional moment restrictions. A numerical illustration shows how the outer identified set for E(beta_i) with a true value of 0.5 shrinks as T increases: at T=3 the outer set is approximately [0.216, 0.617]; at T=5 it narrows to approximately [0.306, 0.613]; the corresponding sharp identified sets (available for T=3 through T=5) range from [0.401, 0.593] at T=3 to [0.473, 0.532] at T=5.

The paper proposes computationally tractable inference procedures matched to each parameter. For mean parameters, the closed-form bounds permit a delta-method asymptotic approach augmented with Stoye’s (2020) smooth approximation to handle cases where the sample analog of the bound width can be negative (due to overidentification or mild misspecification). The resulting confidence intervals are valid and robust to overidentification. For the variance and CDF of the coefficient distribution, the paper uses the Andrews and Shi (2017) procedure for inference on a continuum of moment inequalities, which remains computationally feasible.

The empirical application estimates a generalization of Guvenen’s (2007, 2009) lifecycle earnings models using the Panel Study of Income Dynamics (PSID). Where Guvenen compared a restricted income profile (RIP, homogeneous persistence rho) against a heterogeneous income profile (HIP, heterogeneous time trend beta_i), this paper allows persistence rho itself to vary across households (rho_i). The key empirical findings are: (1) under both the RIP and HIP specifications, the estimated average earnings persistence E(rho_i) is significantly below 1; (2) the two specifications produce similar mean-persistence estimates once heterogeneity in rho_i is permitted, suggesting that misspecifying HIP as RIP or vice versa may not cause serious model misspecification when earnings persistence is allowed to vary; (3) the identified sets for the variance of rho_i provide evidence of genuine heterogeneity in earnings persistence across households, implying that households face different levels of earnings risk, which in turn contributes to heterogeneity in their consumption and savings behavior.

Q: Why is the mean of the random coefficient not point-identified in a short dynamic panel? A: Chamberlain (1993, 2022) first established this non-identification for discrete regressors. The paper’s Proposition 1 extends the result to continuous regressors under stronger assumptions. The fundamental obstacle is Lemma 1: E(beta_i) is point-identified if and only if there exists an unbiased estimator of beta_i in the individual time series, and no such estimator exists in short panels where T is small relative to the number of individual parameters.
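
The practical difficulty can be illustrated with a small simulation. It is an illustration, not a proof of Lemma 1. The sketch below draws a short AR(1) panel with heterogeneous coefficients and averages the unit-by-unit OLS slopes. The coefficient distribution, error distribution, and stationary initial condition are all assumptions made here for the example and are not taken from the paper.

```python
import random
import statistics


def simulate_unit(beta, gamma, n_periods, rng):
    """Simulate Y_t = gamma + beta * Y_{t-1} + eps_t with standard normal eps_t."""
    # Assumption: the first observation is drawn from the stationary distribution.
    y = gamma / (1 - beta) + rng.gauss(0, 1) / (1 - beta**2) ** 0.5
    path = [y]
    for _ in range(n_periods):
        y = gamma + beta * y + rng.gauss(0, 1)
        path.append(y)
    return path


def individual_ols_slope(path):
    """OLS slope from regressing Y_t on a constant and Y_{t-1} for one unit."""
    x, y = path[:-1], path[1:]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sxx


rng = random.Random(0)
N, T = 5000, 5  # many units, few periods per unit
betas = [rng.uniform(0.3, 0.7) for _ in range(N)]  # true mean of beta_i is 0.5
slopes = [
    individual_ols_slope(simulate_unit(b, rng.gauss(0, 1), T, rng)) for b in betas
]

print("mean of true beta_i:        ", round(statistics.fmean(betas), 3))
print("mean of individual OLS fits:", round(statistics.fmean(slopes), 3))
```

With only a handful of periods per unit, the average of the individual OLS slopes sits well below the true mean, and adding more units does not remove the gap. This is the familiar small-T bias of autoregressive estimates, and it is why the paper turns to bounds rather than a point estimate.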

Q: How does the paper characterize the identified set for the mean parameter? A: The identification problem is recast as an infinite-dimensional linear program. Using the dual representation (Galichon and Henry, 2009; Schennach, 2014), Theorem 1 yields a closed-form interval [L, U] = [BR - (1/2)sqrt(ERDR), BR + (1/2)sqrt(ERDR)], where BR is a weighted average of the individual OLS estimator and the pooled OLS estimator, ER is a non-negative term capturing cross-sectional variation in design matrices, and DR is a non-negative term related to residual variation. The bounds are finite whenever the relevant moments of the data are finite, even with unbounded data.
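
Once the three ingredients are in hand, the interval itself is a one-line computation. The sketch below takes BR, ER, and DR as given inputs. How they are constructed from the data is not reproduced here, and the numbers in the example are hypothetical.

```python
import math


def mean_bounds(b_r, e_r, d_r):
    """Interval [B_R - 0.5*sqrt(E_R*D_R), B_R + 0.5*sqrt(E_R*D_R)] from Theorem 1."""
    if e_r < 0 or d_r < 0:
        raise ValueError("E_R and D_R are described as non-negative terms")
    half_width = 0.5 * math.sqrt(e_r * d_r)
    return b_r - half_width, b_r + half_width


# Hypothetical inputs, for illustration only.
lower, upper = mean_bounds(b_r=0.5, e_r=0.04, d_r=1.0)
print(lower, upper)
```

The interval is centered at BR and collapses to a point when either ER or DR is zero.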

Q: How are the bounds tightened using instruments? A: Proposition 2 introduces refined bounds [LS, US] by incorporating additional unconditional moment restrictions from instruments Sit. The refined bounds use a larger set of restrictions and are weakly tighter than the baseline bounds. The empirical application employs up to 59 regressors with homogeneous coefficients (handled by Proposition 3), and instruments from lagged earnings levels and differences, substantially increasing the number of moment conditions.

Q: How are the variance and CDF of the coefficient distribution identified? A: Theorem 2 provides a general duality result for any parameter theta of the coefficient distribution. The lower bound is the maximum over Lagrange multipliers lambda of E[min_{b} {m(Wi,b) + sum_k lambda_k phi_k(Wi,b)}], and the upper bound is the minimum over lambda of E[max_{b} {m(Wi,b) + sum_k lambda_k phi_k(Wi,b)}]. Propositions 5 and 6 specialize this to the second moment (variance) of beta_i, with the upper bound requiring an eigenvalue assumption (Assumption 9) that the smallest eigenvalue of the individual design matrix R’R is bounded away from zero. Proposition 7 derives lower and upper bounds for the CDF P(e’Bi <= c) using a two-step optimization that separates the support into two regions.

Q: What guarantees computational tractability of the optimization problems? A: Proposition 4 establishes that GL(lambda, w) is globally concave in lambda for every w, and GU(lambda, w) is globally convex in lambda for every w. This means the optimization problems for the lower and upper bounds are concave maximization and convex minimization problems respectively, which can be solved with standard convex optimization methods.

Q: How does the inference procedure for mean parameters handle overidentification and misspecification? A: In finite samples, the sample analog of the bound-width term D_hat_S can be negative, which would make the estimated bounds degenerate. The paper adopts Stoye’s (2020) approach using the smooth approximation s(x,y) = sqrt((xy + sqrt((xy)^2 + r^2))/2). The (1-alpha)-level confidence interval combines a standard bound-based interval with an interval for a pseudo-true parameter mu*_e, ensuring validity under both correct specification and mild overidentification or misspecification.
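
The role of the smooth approximation can be seen numerically. The sketch below implements the function s(x,y) exactly as quoted above; the input values and the choices of the smoothing constant r are hypothetical.

```python
import math


def smooth_sqrt_product(x, y, r):
    """s(x, y) = sqrt((x*y + sqrt((x*y)**2 + r**2)) / 2), as quoted above."""
    xy = x * y
    return math.sqrt((xy + math.sqrt(xy * xy + r * r)) / 2)


# Hypothetical inputs, for illustration only.
print(smooth_sqrt_product(4.0, 9.0, r=1e-6))    # close to sqrt(36) = 6
print(smooth_sqrt_product(4.0, -0.01, r=1e-2))  # still defined although x*y < 0
```

When the product is positive and r is small, the function is close to the ordinary square root of the product. When the product is negative, as can happen with a negative sample width term, it returns a small non-negative number where a plain square root would fail.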

Q: How does this paper’s approach to inference on the variance and CDF differ from that for the mean? A: For the mean, closed-form bounds permit a straightforward delta-method asymptotic argument and explicit confidence intervals. For the variance and CDF, the paper uses the Andrews and Shi (2017) procedure for inference on a continuum of moment inequalities, constructing a test statistic TAS(theta) = sup_{lambda} max{sqrt(N)(mu_hat_GL - theta)/sigma_hat_GL, sqrt(N)(theta - mu_hat_GU)/sigma_hat_GU, 0}^2, with the confidence set being the set of theta values not rejected. This procedure is computationally more demanding but remains feasible.
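
A rough sketch of how such a statistic could be evaluated on a finite grid of lambda values follows. It reads the statistic as the supremum over lambda of the squared positive part of the larger standardized violation. The function name, the grid, and all input numbers are hypothetical, and the critical-value step of the Andrews and Shi procedure is omitted entirely.

```python
import math


def as_type_statistic(theta, n, moments):
    """Sup over a lambda grid of max{lower violation, upper violation, 0}**2.

    `moments` holds one (mu_gl, sigma_gl, mu_gu, sigma_gu) tuple per grid
    point for lambda; these inputs are hypothetical placeholders.
    """
    root_n = math.sqrt(n)
    worst = 0.0
    for mu_gl, sigma_gl, mu_gu, sigma_gu in moments:
        lower_violation = root_n * (mu_gl - theta) / sigma_gl  # theta below lower bound
        upper_violation = root_n * (theta - mu_gu) / sigma_gu  # theta above upper bound
        worst = max(worst, max(lower_violation, upper_violation, 0.0) ** 2)
    return worst


# Hypothetical grid with two lambda values.
grid = [(0.2, 1.0, 0.8, 1.0), (0.3, 1.0, 0.7, 1.0)]
print(as_type_statistic(0.5, 100, grid))  # theta inside every interval
print(as_type_statistic(0.1, 100, grid))  # theta below the tightest lower bound
```

A candidate theta that satisfies every inequality on the grid yields a statistic of zero. The statistic grows as theta moves outside the tightest bound, and a confidence set would collect the theta values whose statistic stays below a suitable critical value.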

Q: What are the main empirical findings from the PSID application? A: In both the RIP and HIP specifications extended to allow heterogeneous persistence rho_i, the estimated average earnings persistence E(rho_i) is significantly below 1. Both specifications produce similar mean-persistence estimates once rho_i heterogeneity is permitted, suggesting that the HIP vs. RIP misspecification debate may be less consequential when persistence itself varies across households. The identified sets for the variance of rho_i provide evidence of genuine unobserved heterogeneity in earnings persistence.

Q: What is the economic significance of heterogeneous earnings persistence? A: Heterogeneity in earnings persistence rho_i means households face different levels of earnings risk: a household with high rho_i experiences earnings shocks that are more persistent, reducing its ability to smooth consumption over time and strengthening its motive for precautionary savings. The paper argues this heterogeneity contributes directly to heterogeneity in consumption and savings behavior, making rho_i a first-order parameter in lifecycle consumption models such as those of Hall and Mishkin (1982), Blundell, Pistaferri, and Preston (2008), and Arellano, Blundell, and Bonhomme (2017).

Q: How does the paper situate itself relative to Guvenen (2007, 2009)? A: Guvenen showed that allowing for heterogeneity in the time trend of earnings (HIP: heterogeneous income profile) yields estimated persistence significantly below 1, whereas imposing no such heterogeneity (RIP: restricted income profile) yields persistence near 1. This paper generalizes both models by additionally allowing persistence itself to vary across households (rho_i). The finding that both HIP and RIP deliver similar E(rho_i) estimates significantly below 1 suggests that Guvenen’s contrast may be partly an artifact of restricting persistence to be homogeneous.

Q: What is the scope of the identification results? A: The results apply to short panels (small T, large N), accommodate discrete, continuous, and unbounded data, and require the idiosyncratic error epsilon_it to be mean-independent of the full history of strictly exogenous regressors and of the history of predetermined regressors up to the current period. The bounds for the mean are finite under finite moment conditions on the data. The bounds for the variance additionally require the eigenvalue assumption (Assumption 9). The paper notes that the results extend to probit and logit models with individual-specific coefficients, panel VAR models, and systems of panel data regressions, though these extensions are not developed in detail.

Dynamic random coefficient model: A linear panel data model in which both the intercept and slope coefficients are individual-specific (gamma_i, beta_i), the regressor is predetermined (sequentially exogenous rather than strictly exogenous), and T is small — so individual coefficient values cannot be estimated from the time series alone.

Partial identification: The property that a parameter of interest (such as E(beta_i)) cannot be consistently estimated from the data (it is not point-identified), but finite lower and upper bounds on its value can be characterized. The paper shows this is the generic situation for dynamic random coefficient models in short panels.

Dual representation of infinite-dimensional linear programs: The technique, following Galichon and Henry (2009) and Schennach (2014), of converting an infinite-dimensional linear programming problem (which arises when data or coefficients are continuous) into an equivalent dual problem that yields tractable closed-form or convex-optimization-based bounds.

Refined bounds (instrument-augmented bounds): Tighter identified sets for the mean parameter obtained by incorporating additional unconditional moment restrictions from instruments Sit, beyond the baseline moment conditions. These correspond to Proposition 2 and make the identification interval weakly narrower.

Sequential exogeneity (predetermined regressor): The assumption E(epsilon_it | gamma_i, beta_i, Zi1,…,ZiT, Xi1,…,Xit) = 0, which allows the regressor Xit (e.g., Yi,t-1) to be correlated with past errors but not with current or future errors. This is weaker than strict exogeneity and is what makes the model dynamic and identification challenging.

Heterogeneous income profile (HIP) vs. restricted income profile (RIP): In Guvenen’s framework, HIP allows the time trend of earnings to vary across individuals (heterogeneous beta_i), while RIP does not. The paper extends both by also allowing the AR(1) persistence parameter rho to vary across individuals (rho_i), yielding an empirically more general earnings process.

Earnings persistence (rho_i): The individual-specific autoregressive coefficient in the lifecycle earnings process. High rho_i means earnings shocks last longer, increasing earnings risk, reducing the household’s ability to smooth consumption, and strengthening precautionary savings motives. The paper finds evidence that rho_i varies meaningfully across U.S. households in the PSID.

Identification of Time-Inconsistent Models: The Case of Insecticide-Treated Nets

This paper addresses two related problems: the formal identification of time-inconsistent preferences in dynamic discrete choice models with unobserved heterogeneous types, and the structural estimation of those preferences using data from a health intervention in rural Orissa, India. The identification challenge is fundamental — even the standard exponential discount factor delta is generically not identified in dynamic choice models (Rust 1994; Magnac and Thesmar 2002), and this non-identification extends a fortiori to the hyperbolic (beta, delta) parameterization. The paper’s first contribution is constructing identification conditions that overcome these results through two exclusion restrictions: a variable z that affects utility only through the perceived value of future states (played in the application by elicited beliefs about state evolution), and a variable r that acts as an imperfect signal of agent type but is uninformative about choices conditional on type.

The general model accommodates a finite but unknown number of agent types — time-consistent (beta=1), time-inconsistent naive (beta<1, unaware of future present-bias), and time-inconsistent sophisticated (beta<1, aware of future present-bias) — as well as sub-types within each class. The paper proceeds in four identification steps when types are unobserved: identifying the total number of types (via the rank of an observable matrix), recovering type-specific choice probabilities, assigning type identities, and recovering preference parameters. For time-consistent and sophisticated agents, both beta and delta are point-identified. For naive agents, the parameters are set-identified in general, with point identification available under a monotonicity condition (Assumption 14) or by imposing a common exponential discount factor across types (Assumption 15).
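
The source of time-inconsistency in the (beta, delta) model can be shown with a few lines of arithmetic. The sketch below uses the standard quasi-hyperbolic weights in a three-period setting; the parameter values are hypothetical and are not the paper’s estimates.

```python
def discount_weight(periods_ahead, beta, delta):
    """Quasi-hyperbolic weight: 1 for today, beta * delta**k for k periods ahead."""
    return 1.0 if periods_ahead == 0 else beta * delta**periods_ahead


def relative_weight_period3_vs_period2(evaluated_in, beta, delta):
    """Weight on period 3 relative to period 2, as seen from period 1 or period 2."""
    w2 = discount_weight(2 - evaluated_in, beta, delta)
    w3 = discount_weight(3 - evaluated_in, beta, delta)
    return w3 / w2


BETA, DELTA = 0.5, 0.95  # hypothetical values, not estimates from the paper
planned = relative_weight_period3_vs_period2(1, BETA, DELTA)  # equals delta
actual = relative_weight_period3_vs_period2(2, BETA, DELTA)   # equals beta * delta
print(planned, actual)
```

Seen from period 1, the trade-off between periods 2 and 3 is governed by delta alone, but once period 2 arrives the same trade-off is governed by beta times delta. A naive agent plans in period 1 as if the first ratio will still apply in period 2, a sophisticated agent anticipates the second, and with beta = 1 the two coincide.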

The empirical application studies demand for insecticide-treated nets (ITNs) and their periodic retreatment — a health-protective technology with low up-front cost but substantial future benefits — among households in malarious areas of rural Orissa. A key design feature is that households were offered either a standard ITN contract (with the option to purchase retreatment later) or a commitment contract bundling two consecutive retreatments, allowing the commitment product choice to serve as a noisy type signal r. Elicited beliefs about future state variables serve as the excluded z variable.

The main empirical findings are: approximately 21% of the population is time-consistent, 49% are naive time-inconsistent, and 30% are sophisticated time-inconsistent — so time-inconsistent agents account for approximately 79% of the sample. The preferred estimates of the hyperbolic parameter beta are 0.16 for naive agents and 0.08 for sophisticated agents, indicating substantial present-bias in both groups. These estimates of the population type distribution and type-specific beta parameters are described as new to the literature.

A counterfactual exercise quantifies the welfare cost of present-bias: the median undiscounted additional expected total cost of malaria during the study period attributable to under-investment in ITNs exceeds the price of a treated net by a factor of approximately six. However, because time-inconsistent households heavily discount future malaria costs, the discounted total costs of malaria are low for many inconsistent agents relative to the ITN price, explaining low demand from the agents’ own subjective perspective. The paper also finds that commitment products are not disproportionately chosen by sophisticated agents — take-up of the commitment contract is actually higher among naive households — contradicting the deterministic mapping from commitment product purchase to sophistication that is commonly assumed in the literature. Finally, differences in per-period utilities across agent types exist but are not substantively important in explaining differential outcomes in the sample.

Q: What is the core identification problem the paper addresses, and why is it hard? A: Even the standard exponential discount factor delta is generically not identified in dynamic discrete choice models (Rust 1994; Magnac and Thesmar 2002). This non-identification extends a fortiori to both beta and delta in the hyperbolic (beta, delta) model. When agents are also heterogeneous in unobserved type, the additional problem of identifying the population distribution of types — itself a key policy parameter — must be solved jointly with preference identification.

Q: What two exclusion restrictions provide the key identifying variation? A: The first restriction is a variable z that enters only through the perceived value of future states and not through per-period utility (Assumption 3); in the application this is played by elicited subjective beliefs about future state evolution. The second is a variable r that predicts agent type but, conditional on type and observables, provides no additional information about choices (Assumption 16); in the application r includes elicited time-preference indicators and the choice of the commitment versus standard ITN contract.

Q: Why does the paper require at least three periods? A: Three periods are the minimum required to capture the notions of time-inconsistency studied here: with only two periods, no time-inconsistency problem would arise. Three periods allow the researcher to separately observe how an agent plans in period 1, how the agent actually behaves in period 2 (potentially deviating from the period-1 plan), and how the agent behaves in the terminal period 3 where the problem reduces to a static discrete choice.

Q: What is point-identified versus set-identified across agent types? A: For time-consistent agents, all per-period utilities and the (single) discount factor delta are point-identified. For sophisticated agents, both beta and delta are separately point-identified under the rank conditions in Assumptions 10-11. For naive agents, the parameters are in general only set-identified (Lemma 4 provides sharp bounds); point identification holds under either a monotonicity condition (Assumption 14) or the assumption that naive and sophisticated agents share the same exponential discount factor (Assumption 15).

Q: How does the paper identify the total number of types in the population? A: The number of types equals the rank of a directly identified matrix P formed from the joint distribution of actions and states in adjacent time periods (Proposition 1). The rank provides a lower bound in general and equals the true number of types when the state space is sufficiently rich and type-specific choice probabilities vary sufficiently across the state space (Assumptions 17 and 19).
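
The rank argument can be illustrated with a small numerical sketch. The matrix below is a hypothetical two-type mixture built from made-up choice probabilities, not the paper's matrix P, and `matrix_rank` and `mixture_matrix` are illustrative helper names. Exact fractions are used so that the rank is not affected by floating-point tolerance.

```python
from fractions import Fraction as F

def matrix_rank(rows):
    """Rank by Gaussian elimination on exact fractions (illustrative helper)."""
    m = [[F(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

def mixture_matrix(weights, left, right):
    """P[i][j] = sum over types of weight * left_type[i] * right_type[j]."""
    return [
        [sum(w * a[i] * b[j] for w, a, b in zip(weights, left, right))
         for j in range(len(right[0]))]
        for i in range(len(left[0]))
    ]

# Hypothetical type shares and type-specific probabilities in two adjacent periods.
weights = [F(1, 2), F(1, 2)]
left = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 10), F(3, 10), F(3, 5)]]
right = [[F(1, 5), F(2, 5), F(2, 5)], [F(7, 10), F(1, 5), F(1, 10)]]

two_types = mixture_matrix(weights, left, right)
# If both types behave identically, the rank understates the number of types.
identical = mixture_matrix(weights, [left[0], left[0]], [right[0], right[0]])
```

Here `matrix_rank(two_types)` recovers two types, while `matrix_rank(identical)` shows why the rank is only a lower bound when type-specific choice probabilities do not vary enough.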

Q: How does the paper distinguish naive from sophisticated agents among the identified type-specific choice probabilities? A: A key diagnostic is the function delta_hat_tau(x2,z2), which compares an agent’s period-1 view of the future against what would be expected given period 2-3 choices. For time-consistent and sophisticated agents, this function is constant across the state space (x2,z2); for naive agents it varies across the state space (Lemma 7, Proposition 2). This variation arises because naive agents incorrectly anticipate their future behavior in period 1, generating a wedge between planned and actual continuation values that shifts with the state.

Q: What fraction of the sample is time-inconsistent, and what are the estimated beta parameters? A: Approximately 79% of the sample is time-inconsistent: 49% are naive and 30% are sophisticated. The preferred estimates of the hyperbolic (present-bias) parameter beta are 0.16 for naive agents and 0.08 for sophisticated agents. Both estimates indicate substantial present-bias. The paper states that these estimates of the population type distribution and the type-specific beta values are new to the literature.

Q: What is the welfare cost of present-bias in terms of malaria risk? A: Present-bias leads to lower ITN purchases and fewer retreatments, which increases the likelihood of contracting malaria. The median undiscounted additional expected total cost of malaria during the study period attributable to under-investment in ITNs exceeds the price of a treated net by a factor of approximately six. However, because inconsistent agents heavily discount future health costs, the discounted total costs of malaria are low relative to the ITN price for many such agents, which explains low demand from the agents’ own subjective perspective despite large social costs.

Q: What does the paper find about commitment products and agent sophistication? A: The commitment contract — bundling two consecutive retreatments — was designed to appeal to sophisticated present-biased agents who anticipate their future self-control problems. Contrary to the deterministic mapping from commitment product purchase to agent sophistication commonly assumed in the literature, take-up of the commitment contract is actually higher among naive households than sophisticated ones. The paper argues this is possible because the model allows commitment product choice to only imperfectly predict type, enabling a richer analysis than prior work that rules out type heterogeneity by assumption.

Q: Are differences in per-period utilities across types an important alternative explanation for observed behavior? A: Per-period utilities do vary across agent types, but the paper finds they are not substantively important in explaining differential outcomes in the sample. This finding supports the interpretation that time-inconsistent preferences — rather than heterogeneity in static preferences over states — are the primary driver of the behavioral differences observed across agent types in this context.

Q: What is the role of elicited beliefs in the identification strategy? A: Elicited beliefs about the future evolution of state variables serve as the excluded variable z that shifts the forward-looking component of the value function while leaving per-period utility unchanged. The use of expectational data, as advocated by Manski (2004), provides a natural and interpretable source of identifying variation for the discount parameters. The paper argues that this plausible exclusion restriction contributes to the encouraging Monte Carlo simulation results relative to other work in the identification literature.

Q: What happens to identification under partial sophistication? A: When agents are partially sophisticated — aware of some but not all of their future present-bias, so that beta_tilde in [beta, 1] rather than exactly equal to beta or 1 — the three time-preference parameters (delta, beta, beta_tilde) are not point-identified in general (Proposition 4 provides a set identification result). Point identification requires that the exponential discount factor delta be identified separately. The paper shows that partial and complete sophistication can be distinguished from time-consistency by whether the function delta_hat varies across the state space, and partially sophisticated types can be distinguished from fully sophisticated types under an additional variability condition (Assumption 23, Proposition 3).

Hyperbolic (beta-delta) discounting: A model of time-inconsistent preferences in which future utility at time s discounted from time t carries the factor beta*delta^(s-t), where beta<1 introduces an additional present-bias relative to pure exponential discounting. The parameter beta governs the wedge between the discount rate applied to immediate versus purely future tradeoffs; delta governs the intertemporal rate of substitution between any two future periods.
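
A minimal sketch of the discount weights defined above, showing why the period-1 plan and the period-2 choice can disagree. The value of beta is the naive-agent estimate quoted in this summary; the value of delta is an assumption made only for illustration, and `discount_factor` is a hypothetical helper name.

```python
def discount_factor(t, s, beta, delta):
    """Weight the period-t self places on utility received in period s >= t."""
    if s < t:
        raise ValueError("s must not precede t")
    return 1.0 if s == t else beta * delta ** (s - t)

BETA = 0.16   # naive-agent estimate quoted in the summary
DELTA = 0.95  # assumed for illustration; not taken from the paper

# Weight on period 3 relative to period 2, as seen from period 1 (the plan)...
planned = discount_factor(1, 3, BETA, DELTA) / discount_factor(1, 2, BETA, DELTA)
# ...and as seen from period 2 itself (the actual choice).
actual = discount_factor(2, 3, BETA, DELTA) / discount_factor(2, 2, BETA, DELTA)
```

From period 1 the tradeoff between periods 2 and 3 is governed by delta alone, but once period 2 arrives the same tradeoff is scaled by beta as well, which is the source of the time inconsistency.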

Sophisticated vs. naive agents: Both types are time-inconsistent (beta<1) and both are aware of their current present-bias. Sophisticated agents (tau_S) also correctly anticipate the extent of their future present-bias (beta_tilde = beta), while naive agents (tau_N) incorrectly believe their future self will behave as if beta_tilde = 1. This difference in beliefs about future behavior drives distinct choice dynamics across the three periods, providing the key observable variation used to distinguish the two types.

Exclusion restriction (z variable): A state variable that enters the transition probabilities and thus the value of future states but does not enter the current per-period utility function (Assumption 3). Variation in z shifts the forward-looking component of the Bellman equation while holding current utility fixed, providing the identifying variation needed to separately recover discount parameters from per-period utility parameters.

Type indicator / type proxy (r): An observed variable that is informative about an agent’s time-preference type but, conditional on type and other observables, provides no additional information about choices (Assumption 16). In the application, r includes elicited time-preference indicators and whether the agent chose the commitment versus standard ITN contract. Critically, the mapping from r to type is imperfect, so r does not directly reveal type for each individual.

Conditional choice probability (CCP) inversion: Following Hotz and Miller (1993), the type-specific conditional choice probabilities P_tau(a_t|x_t, z_t) — directly identified from data given type — can be inverted to recover per-period utility differences and combinations of discount parameters without solving the full dynamic programming problem. This approach underpins the constructive identification arguments throughout the paper.
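
The inversion has a closed form in one common special case. The sketch below assumes a binary choice with type-I extreme value (logit) utility shocks, which is an assumption for illustration and not something this summary attributes to the paper; the function names are hypothetical.

```python
import math

def utility_difference(p_choose):
    """Invert a binary logit choice probability to the value difference v(1) - v(0)."""
    if not 0.0 < p_choose < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p_choose / (1.0 - p_choose))

def choice_probability(v_diff):
    """Binary logit choice probability implied by a value difference."""
    return 1.0 / (1.0 + math.exp(-v_diff))

# Round trip: a choice probability observed in the data maps to a value
# difference, which maps back to the same probability.
recovered = choice_probability(utility_difference(0.3))
```

In the terminal period, where the problem is static, the recovered difference would be a per-period utility difference; in earlier periods it also contains discounted continuation values.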

Commitment contract: A product design in which two consecutive ITN retreatments are bundled at purchase, intended to mitigate the time-inconsistency problem by removing the future self-control decision about retreatment. The commitment contract is theoretically predicted to be preferred by sophisticated present-biased agents; the paper finds this prediction fails empirically, with naive households showing higher take-up.

Present-bias welfare cost: The undiscounted additional expected total cost of malaria attributable to under-investment in ITNs driven by present-bias. The paper estimates this cost exceeds the price of a treated net by a factor of approximately six at the median, capturing the gap between the social planner’s valuation of ITN adoption and the discounted valuation of time-inconsistent agents.

Illiquid Lemon Markets and the Macroeconomy

The paper develops a quantitative capital-accumulation model in which capital trades in illiquid markets with asymmetric information — sellers know the quality of their capital but buyers do not. It combines this model with microdata on nonresidential capital units listed for trade to measure the degree of information asymmetry and quantify its macroeconomic effects.

Model: The economy features heterogeneous capital units characterized by observed quality ω (e.g., size, location, age — observable to both buyers and sellers) and unobserved quality a (known only to the seller). Capital trades in directed-search markets: sellers post a price and a target submarket; buyers direct their search; a matching function determines trade probabilities. Buyers observe announced quality and have an inspection technology that reveals true quality with probability ψ (“lemon detection probability”); with probability 1−ψ a low-quality unit goes undetected. In equilibrium, sellers of high-quality capital signal their type by listing at higher prices and accepting lower trading probabilities (the Guerrieri-Shimer-Wright 2010 competitive search separating equilibrium, adapted to the capital accumulation setting). The key model prediction is that the residual price — the component of a listed price orthogonal to observed characteristics — is positively correlated with duration on the market, with the slope increasing as the degree of asymmetric information (1−ψ) rises.

Data: Idealista, Spain’s largest online real estate platform, provides monthly listings for all nonresidential structures (retail, office, and industrial space) listed for sale from 2005 to 2018 — approximately 8.9 million property-month observations from over 1.15 million distinct capital units. The average listed price per square foot is $162 (2017 dollars); the average duration on the market is 10.5 months; each listing receives on average 800 views, 45 clicks, and 3 emails per month from prospective buyers.

Empirical facts (Section 4): Two cross-sectional regularities confirm the model’s predictions, and a third fact concerns how the slope varies over the cycle:

Predicted price (from a hedonic regression on observable characteristics) is negatively correlated with duration — units with better observable characteristics sell faster, consistent with full-information competitive search (higher buyer valuation → higher matching rate)
Residual price (orthogonal to observables) is positively correlated with duration — estimated slope coefficient ŷq ≈ 0.148 — consistent with asymmetric-information signaling (high-quality capital sellers post high residual prices to separate from low-quality sellers, accepting lower trading probabilities)
The residual-price/duration slope exhibits strong countercyclical variation, roughly doubling during the Euro crisis (peak slope ≈ 0.38, compared to baseline ≈ 0.148), consistent with asymmetric information worsening during downturns
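
The two cross-sectional regressions can be sketched on synthetic data. Everything below is an assumption for illustration: the data-generating process, the use of logs, the single observable characteristic, and the coefficients (the 0.15 is simply chosen near the reported slope). It is not the paper's data or estimation code.

```python
import random

def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(0)
n = 2000
observed = [random.gauss(0, 0.72) for _ in range(n)]    # observable quality (synthetic)
unobserved = [random.gauss(0, 0.58) for _ in range(n)]  # seller's private quality (synthetic)
log_price = [w + a for w, a in zip(observed, unobserved)]
# Assumed process: duration falls with observed quality, rises with signaled private quality.
log_duration = [2.0 - 0.10 * w + 0.15 * a + random.gauss(0, 0.3)
                for w, a in zip(observed, unobserved)]

# Step 1: hedonic regression of price on observables gives predicted and residual price.
intercept, coef = ols(observed, log_price)
predicted = [intercept + coef * w for w in observed]
residual = [p - fit for p, fit in zip(log_price, predicted)]

# Step 2: regress duration on each price component separately.
_, slope_predicted = ols(predicted, log_duration)
_, slope_residual = ols(residual, log_duration)
```

Under these assumptions `slope_predicted` comes out negative and `slope_residual` positive, matching the sign pattern described in the first two facts.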

Calibration (monthly frequency, Table 4 fixed; Table 5 fitted):

Fixed parameters: β = 0.9966 (annual rate of time preference 4%), α = 0.35 (capital share), δ = 0.0074/month (8.5% annual nonresidential depreciation), γ = 1.004 (1.6% annual TFP growth), γn = 1.0027 (1% annual population growth), ϕ = 0.0027 (3.2% annual firm exit rate), η = 0.8 (matching curvature), φ = 0.5 (seller bargaining power)
Fitted to four data moments (slope ŷq, SD of predicted prices, SD of residual prices, mean duration): ψ = 0.9795 (lemon-detection probability, so a lemon goes unnoticed with probability 1−ψ ≈ 2% per inspection); σω = 0.72 (SD of observed quality); σa = 0.58 (SD of unobserved quality); m̄ = 0.267 (matching efficiency)
Model-simulated moments match targets essentially exactly (Table 5); untargeted relationship between duration and predicted prices is also well-matched (Table 6)
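
As a quick consistency check, three of the monthly parameters above can be converted to the annual figures quoted next to them. The compounding convention used here is an assumption, since the summary does not state how the paper annualizes, and the function names are hypothetical.

```python
def annual_rate_of_time_preference(beta_monthly):
    """Annual rate implied by compounding a monthly discount factor over 12 months."""
    return beta_monthly ** -12 - 1.0

def annual_from_monthly_rate(rate_monthly):
    """Share lost over 12 months at a constant monthly depreciation or exit rate."""
    return 1.0 - (1.0 - rate_monthly) ** 12

time_preference = annual_rate_of_time_preference(0.9966)  # close to 4%
depreciation = annual_from_monthly_rate(0.0074)           # close to 8.5%
exit_rate = annual_from_monthly_rate(0.0027)              # close to 3.2%
lemon_miss = 1.0 - 0.9795                                 # close to 2% per inspection
```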

Steady-state output effects (Table 7, relative to full-information benchmark):

Total output: −1.22% in baseline (ψ = 0.9795)
Effective capital input: −2.55% (main driver of output loss)
Capital stock: −1.12% (32% of output effect — reduced returns to producing new capital)
Capital unemployment rate: +1.0 pp above full-information rate of 5% (25% contribution — high-quality capital remains listed longer)
Allocation channel: 16% contribution — information asymmetries disproportionately reduce trading of high-quality capital, lowering average quality of employed capital
Labor input: −0.5% (26% contribution — reduced capital input lowers labor demand)
Moving to full information (ψ → 1): output gain of +1.5% — modest at baseline, indicating the baseline economy is not far from full information
Moving to Euro-crisis level (ψ = 0.96): output decline of ~2% — large response because the economy’s output elasticity to ψ is high

Crisis experiment (Section 5.3): An unexpected 2 percentage-point decline in ψ (to 0.96, calibrated to match the observed increase in the residual-price/duration slope during the Euro crisis), lasting 3 years and reverting with persistence ρψ = 0.94:

Output contraction on impact: 2%
Time to recover half the output decline: more than 5 years (slow recovery driven by persistent capital underinvestment)
Primary mechanism: lower inspection accuracy → high-quality capital sellers reduce trading probability to signal quality → capital unemployment rate rises (especially for high-quality units) → expected return to producing new capital falls → investment contracts → capital input declines persistently
Secondary interaction: at higher steady-state asymmetric information (ψ = 0.96), other shocks (TFP, exit rate, discount factor) are amplified — e.g., the cumulative output response to an exit rate shock is 26% larger than in a full-information economy
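
The shock process in the crisis experiment can be sketched as a path for ψ. The timing below is one reading of the description (ψ held at the crisis level for 36 months, then the gap to steady state shrinking by the factor ρψ each month) and should be treated as an assumption; `psi_path` is a hypothetical helper.

```python
import math

PSI_SS = 0.9795      # baseline inspection accuracy
PSI_CRISIS = 0.96    # crisis level, about 2 percentage points lower
RHO = 0.94           # monthly persistence of the reversion
HOLD_MONTHS = 36     # assumed: the shock is held for three years before reverting

def psi_path(horizon):
    """Monthly path of psi: flat at the crisis level, then geometric reversion."""
    path = []
    for t in range(horizon):
        if t < HOLD_MONTHS:
            path.append(PSI_CRISIS)
        else:
            gap = (PSI_CRISIS - PSI_SS) * RHO ** (t - HOLD_MONTHS + 1)
            path.append(PSI_SS + gap)
    return path

# Months for the gap in psi itself to halve once reversion starts.
half_life_months = math.log(0.5) / math.log(RHO)
path = psi_path(240)
```

The half-life of the information shock is under a year on this reading, so the multi-year output recovery described above would come mainly from slow capital accumulation and not from the shock process alone.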

Scope conditions: The model abstracts from aggregate uncertainty (the baseline is steady-state analysis), financial intermediaries, and endogenous information technology. The dataset covers Spain’s nonresidential real estate market 2005–2018; the measurement of ψ from listed prices and duration assumes that residual prices fully reflect unobserved capital quality (Proposition 5’s small-search-cost approximation). The quantitative results are robust to alternative bargaining protocols (TIOLI), higher firm exit rates, inelastic labor supply, and narrower observable-characteristic sets.

In depth

Q1. Why does asymmetric information generate a positive correlation between residual prices and duration?

In the model’s separating equilibrium, sellers of high-quality capital choose prices and targeting strategies that prevent low-quality sellers from mimicking them; since low-quality sellers have a lower marginal cost of accepting lower trading probabilities (their capital is worth less to them in continued use), high-quality sellers can separate by listing at higher residual prices paired with lower market tightness and lower matching rates. The correlation between residual price and duration is therefore a direct measure of the degree of asymmetric information: the slope coefficient ŷq increases monotonically as ψ decreases (Proposition 5 and Figure 4), allowing the researcher to back out ψ from the micro data.

Q2. Why is the residual-price/duration slope countercyclical?

The data show that the slope roughly doubled during Spain’s 2008–2013 downturn and euro crisis, consistent with the model’s prediction that asymmetric information (1−ψ) worsens during economic contractions. The paper interprets this as evidence that buyers’ ability to evaluate capital quality deteriorates when economic uncertainty rises — for example, during crises it is harder to assess the profitability of retail or office space based on observable characteristics alone. This countercyclical pattern motivates the crisis experiment in Section 5.3, where a 2pp increase in 1−ψ (the degree of information asymmetry) replicates the observed slope dynamics.

Q3. Why is the 2% crisis output contraction slow to recover?

The sluggishness of recovery operates through the investment channel: when high-quality capital sellers reduce trading probabilities to signal their type, they slow the transfer of used capital from sellers (firms that exit) to buyers (firms that expand), reducing the effective capital input; this lower capital input reduces the expected marginal return to producing new capital, depressing investment; because capital accumulates gradually, the output recovery inherits the slow pace of investment recovery. The persistence parameter ρψ = 0.94 (monthly) adds further sluggishness from the slow normalization of the information environment itself.

Q4. Why are the steady-state output losses modest while the crisis response is large?

The economy features a moderate baseline degree of asymmetric information (ψ = 0.9795 — only 2% lemon-detection failure), so the steady-state distortion is small (−1.22% output relative to full information); however, the economy has a large elasticity of output to ψ, so even a small deterioration in information quality (2pp) generates large output effects (−2%). This high sensitivity arises because the effects of asymmetric information are highly nonlinear: at low levels of information frictions, small increases in the lemon probability generate proportionally large increases in the required signaling by high-quality sellers, sharply reducing their trading probabilities.

Q5. How does asymmetric information interact with other shocks?

At the baseline degree of asymmetric information (ψ = 0.9795), the aggregate responses to standard shocks (TFP, discount factor, exit rate) are similar to an economy with full information; however, at the Euro-crisis level (ψ = 0.96), the cumulative output response to an exit rate shock is 26% larger than under full information. The mechanism is that asymmetric information taxes the reallocation of capital: when more capital must be reallocated (due to higher firm exit), more of it passes through the illiquid, distorted lemon market, amplifying the output effect of the underlying shock.

Q6. What policies can reduce the distortions from asymmetric information?

The paper notes two broad policy directions: (1) policies that improve information transparency — making previously private capital characteristics public, e.g., mandatory disclosure or standardized quality certification — directly raise ψ and shift the economy toward full information, eliminating the signaling distortion; (2) policies that reduce the incentive for mimicking — for example, by allowing post-transaction renegotiation after quality is revealed (the TIOLI bargaining extension in Table 8) — have similar quantitative effects to the baseline. The paper leaves the welfare analysis of specific information-provision policies for future research.

Q7. What is the role of the data in identifying the model parameters?

The four targeted moments — slope of duration on residual prices, standard deviation of predicted prices, standard deviation of residual prices, and mean duration — jointly identify the four structural parameters {ψ, σω, σa, m̄} (Proposition 5); the key insight is that ψ and m̄ are separately identified because ŷq and mean duration respond differently to each: ψ and m̄ both affect ŷq positively, but m̄ reduces mean duration while ψ increases it, providing orthogonal variation. The calibration achieves an essentially exact match of the four targeted moments (Table 5) and also matches the untargeted negative slope between duration and predicted prices (Table 6), providing an overidentification check.

Key concepts

lemon market : a secondary market for heterogeneous assets in which sellers have private information about quality; following Akerlof (1970), lemons (low-quality assets) crowd out high-quality assets unless high-quality sellers can credibly signal their type; in the paper, signaling takes the form of higher listed prices paired with lower trading probabilities.

residual price : the component of a capital unit’s listed price orthogonal to its observable characteristics (the residual from a hedonic regression); the paper’s key empirical variable, theoretically shown to be positively correlated with unobserved capital quality and with duration under asymmetric information.

inspection technology : a buyer’s technology that reveals the true quality of a capital unit with probability ψ before (or after) purchase; the accuracy ψ governs the degree of asymmetric information in the economy — lower ψ implies worse information, requiring more costly signaling by high-quality sellers.

countercyclical asymmetric information : the empirical finding that the slope between residual prices and duration roughly doubles during the Euro crisis, interpreted as deterioration in buyers’ ability to evaluate capital quality during economic downturns; motivates the crisis experiment.

three channels of output loss : the three mechanisms through which asymmetric information reduces output: (i) lower capital stock (reduced investment incentives); (ii) higher capital unemployment rate (high-quality capital remains listed longer); (iii) adverse allocation effect (high-quality capital trades less frequently, lowering average quality of employed capital).

Income Inequality and Job Creation

The paper establishes a causal link from rising top income shares to reduced net job creation at small firms, working through a bank funding channel rooted in non-homothetic household portfolio allocation: because high-income households hold a smaller fraction of financial wealth in bank deposits (less than one-fifth for the top decile versus two-thirds for the bottom quintile, per the Survey of Consumer Finances), a redistribution of income toward top earners shifts aggregate saving away from deposits toward stocks and bonds. Banks must raise deposit rates to retain funding, which passes through to loan rates; since small, informationally opaque firms depend disproportionately on bank credit while large firms have direct capital-market access, higher loan rates compress small firms’ net job creation relative to large firms. Using U.S. state-level panel data from 1981 to 2015, a shift-share instrumental variable, and a quantitative general equilibrium model, the paper documents this channel and finds it accounts for 13% of the 4.97 percentage-point rise in large-firm employment share and between 7.5% and 15% of the decline in the labor share since 1980.

Motivating facts (Section 2):

The U.S. net job creation rate of small firms (1–499 employees) declined from roughly +4% in 1980 to near 0% by 2015 and co-moves strongly with the top 10% income share (Figure 1a), suggesting a systematic relationship
SCF data show that the deposit share of financial wealth falls monotonically with income: bottom quintile (Q1) ≈ 65–70%; middle quintile ≈ 45%; top decile < 20% (Figure 2a). Non-financial wealth and stocks/bonds rise sharply with income
FDIC data show deposits account for 93% of total liabilities for the average bank and 75% of total liabilities on aggregate (Figure 2b); average bank raises 98% of deposits in its headquarters state (capital-weighted: 89%), so local deposit supply directly constrains local bank credit

Empirical specification (Section 3): Panel regression at the state–firm-size–year level, 47 states, 1981–2015, 16,435 observations. Dependent variable: net job creation rate (JCR − JDR). Key regressor: interaction of the top 10% income share with a “small firm” dummy (firms 1–499 vs. 500+). Regression includes state–firm-size fixed effects and state–time fixed effects, the latter absorbing all time-varying unobservable state-level factors common to firms of different sizes (e.g., globalization, technology). Identification via a pre-determined share IV: each state’s top 10% income share in 1970 (ten years before the sample) interacted with the leave-one-out national trend in top income shares — exploiting cross-state variation in sensitivity to the aggregate national trend while isolating it from local cyclical conditions.
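
The construction of the instrument can be sketched as follows. The state labels and shares are made up, and the leave-one-out national trend is approximated by an unweighted mean over the other states, which is an assumption; the paper may weight states differently. Function names are hypothetical.

```python
def leave_one_out_mean(values_by_state, excluded):
    """Unweighted mean across all states except the excluded one."""
    others = [v for state, v in values_by_state.items() if state != excluded]
    return sum(others) / len(others)

def shift_share_instrument(share_1970, share_by_year):
    """Return {(state, year): 1970 share x leave-one-out national share in that year}."""
    return {
        (state, year): share_1970[state] * leave_one_out_mean(shares, state)
        for year, shares in share_by_year.items()
        for state in shares
    }

# Hypothetical inputs for three states.
share_1970 = {"A": 0.30, "B": 0.32, "C": 0.35}
share_by_year = {
    1981: {"A": 0.31, "B": 0.33, "C": 0.36},
    1982: {"A": 0.32, "B": 0.33, "C": 0.38},
}
instrument = shift_share_instrument(share_1970, share_by_year)
```

Because each state's own contemporaneous share never enters its instrument, local shocks in that state affect the instrument only through the fixed 1970 exposure.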

Empirical results (Table 1, Table 2):

IV estimate: a 10 percentage-point rise in the top 10% income share reduces the relative net job creation rate of small firms by 1.2 percentage points (Table 1, col. 3)
Extensive margin (entry, exit, private-to-public transitions): accounts for approximately 20% of the 1.2pp effect (Table 1, col. 4)
One standard deviation higher top income share (5.4pp) → 0.7pp lower small-firm net JCR (Figure 1b, binned scatter OLS preview)
Counterfactual: had the U.S. top 10% income share remained at its 1980 level (instead of rising ~16pp from 34.5% to 50.5%), small firms’ net job creation rate would be 1.9 percentage points higher — more than 50% above its 2015 level
Bank-level regressions (Table 2): rising top income shares in a bank’s headquarters state lead to higher deposit rates and lower total deposit volumes — consistent with banks raising rates to retain a declining deposit supply
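
The counterfactual figure follows from scaling the IV estimate linearly by the observed rise in the top 10% income share. The linear extrapolation is an assumption about how the number was obtained, and the variable names are illustrative.

```python
IV_EFFECT_PER_10PP = 1.2   # pp fall in small-firm relative net job creation per 10pp rise
SHARE_1980 = 34.5          # top 10% income share in 1980, percent
SHARE_2015 = 50.5          # top 10% income share in 2015, percent

rise_pp = SHARE_2015 - SHARE_1980
counterfactual_gain_pp = IV_EFFECT_PER_10PP * rise_pp / 10.0
```

This gives roughly 1.9 percentage points, in line with the counterfactual reported above.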

Model (Section 4): General equilibrium model with two types of households and two types of firms. Households differ by income group (high, H, and low, L), each endowed with heterogeneous productivities {si,χ}; households choose consumption, labor supply, and portfolio allocation between bank deposits (providing liquidity services captured by a CES deposit utility term ψd·η) and direct capital investment in public firms. Non-homotheticity: the deposit utility weight is calibrated so high-income households hold fewer deposits per unit of wealth. Firms are either public (large, direct capital-market access, production function with capital share θ and returns to scale γ) or private (small, bank-dependent; labor-only production with bank working capital constraint ϕ̃ governing the loan demand; entry/exit governed by stochastic fixed cost f̃ ~ U[0,f̃max] and a cost of going public κ ~ U[0,κ̃max]). Banks intermediate deposits into loans at a fixed cost, implying a zero-profit loan rate above the deposit rate.

Calibration (Table 3): Two panels:

Panel (a) externally fixed: capital depreciation rate (NIPA), mean US stock market return = 1.08, top 10% income share target = 34.6% (initial, Frank 2009 data), deposit rate = 4% (national average)
Panel (b) internally calibrated to BDS and SCF (early 1980s):
- Labor supply to public firms = 46.9%; private firms = 53.1% (BDS baseline)
- Labor demand to public firms = 46.9%; private firms = 53.1% (matched exactly)
- Deposit share of Q3 household = 0.45; top 10% deposit share = 0.22 (SCF)
- Household discount factor β = 0.9182; deposit utility scale ψd = 0.0632; deposit utility elasticity η = 2.6096
- Capital share in public firms θ; returns to scale γ set to match labor demand targets
- Firm productivity SD σz = 0.0315; bank dependence ϕ̃ and fixed cost bound f̃max matched to Table 1 empirical estimates (intensive and extensive margin); public-share cost bound κ̃max matched to share of firms >500 employees (BDS)

GE experiment (Section 6): Top 10% income share raised permanently from 34.5% to 50.5%, matching Frank (2009) data evolution, via lump-sum transfers from low- to high-income households (holding average income constant to isolate the portfolio reallocation channel). Key aggregate outcomes (Figure 3):

Aggregate deposits fall by more than 2%; savings flow into public firm capital, which rises 2% — the portfolio reallocation effect in levels
Deposit rate rises 0.4pp; loan rate rises 0.7pp; public firm capital return falls 0.14pp — consistent with bank-level empirical estimates
Private firm employment falls ~2%; public firm employment rises ~1%; aggregate employment falls modestly
Private firm employment share falls 0.64 percentage points — the channel explains 13% of the actual 4.97pp BDS decline in employment at firms below 500 employees (1980–2015)
Around one-fifth of the employment share decline comes from the extensive margin (private firm exit and transitions to public status), matching the empirical ratio
Labor share falls 0.3pp, explained by public firms growing relatively larger and being more capital-intensive; this accounts for 7.5% to 15% of the observed 2–4pp decline in the US labor share
Aggregate output falls 0.3%, driven by resource reallocation: private firms have marginal product of labor roughly one-sixth higher than public firms (consistent with the higher small-firm net JCR coefficient), so shifting employment to public firms suppresses aggregate productivity
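
The "share explained" figures in this list are simple ratios of model-implied changes to observed changes, as the following arithmetic sketch shows (variable names are illustrative).

```python
MODEL_EMPLOYMENT_SHARE_FALL = 0.64   # pp, model
ACTUAL_EMPLOYMENT_SHARE_FALL = 4.97  # pp, BDS 1980-2015
MODEL_LABOR_SHARE_FALL = 0.3         # pp, model
ACTUAL_LABOR_SHARE_FALL = (2.0, 4.0) # pp, range of observed declines

employment_explained = MODEL_EMPLOYMENT_SHARE_FALL / ACTUAL_EMPLOYMENT_SHARE_FALL
# Dividing by the larger observed decline gives the lower bound, and vice versa.
labor_explained_low = MODEL_LABOR_SHARE_FALL / ACTUAL_LABOR_SHARE_FALL[1]
labor_explained_high = MODEL_LABOR_SHARE_FALL / ACTUAL_LABOR_SHARE_FALL[0]
```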

Welfare effects (Section 6.2, Figure 4): The top 10% experience an increase in consumption-equivalent welfare; the bottom 90% experience a decrease. The full model amplifies both effects relative to a counterfactual model with fixed portfolio shares: portfolio reallocation raises top-earner welfare by an additional ~1% (consumption equivalent) relative to the fixed-share benchmark and lowers bottom-earner welfare by ~1%. The reason is that in the full model private firm wages fall (the loan rate rise reduces labor demand), while in the fixed-share benchmark private firm wages rise (top earners save more in deposits, lowering loan rates). Ignoring portfolio heterogeneity thus significantly understates the welfare consequences of income redistribution.

Scope conditions: The mechanism operates through portfolio reallocation only; the paper holds average income constant (lump-sum redistribution) to isolate the channel, abstracting from any direct effects of rising incomes on aggregate savings rates. The IV exploits state-level variation in top income shares; cross-state spillovers in bank credit markets would attenuate estimated coefficients. The model assumes banks cannot replace lost deposits one-for-one with non-deposit liabilities, consistent with institutional frictions documented in the banking literature (Stein, 1998; Hanson et al., 2015). The analysis covers pre-tax income shares; post-tax redistribution through the tax code would dampen the mechanism.

In depth

Q1. Why does the portfolio composition of saving matter more than the aggregate savings rate?

The key non-homotheticity is in the composition of saving, not the level: high-income households allocate less than one-fifth of financial wealth to bank deposits while low-income households allocate two-thirds; as income shifts to the top, total deposits decline even if aggregate saving rises modestly. Banks cannot substitute deposit funding with non-deposit liabilities without cost — deposits provide cheap, stable funding because of their unique liquidity and monitoring properties (Stein, 1998; Hanson et al., 2015). An increase in the deposit rate is thus the equilibrating mechanism: banks must bid deposits back from higher-return assets, and the higher funding cost passes through to loan rates.
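
A back-of-the-envelope sketch of the composition effect. It assumes financial wealth is proportional to income, applies the calibrated deposit shares quoted in this summary (0.22 for the top 10%, and the Q3 value of 0.45 for everyone else), and holds those shares fixed with no response of interest rates. These are simplifying assumptions for illustration and not the paper's model.

```python
def aggregate_deposit_share(top_income_share, top_deposit_share, rest_deposit_share):
    """Deposits as a fraction of total financial wealth under the stated assumptions."""
    return (top_income_share * top_deposit_share
            + (1.0 - top_income_share) * rest_deposit_share)

before = aggregate_deposit_share(0.345, 0.22, 0.45)
after = aggregate_deposit_share(0.505, 0.22, 0.45)
mechanical_change = after / before - 1.0
```

The mechanical fall in deposits here is considerably larger than the decline of a little over 2% reported for the general equilibrium experiment, where deposit rates rise and households re-optimize their portfolios.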

Q2. Why are small firms disproportionately harmed by higher loan rates?

Small, informationally-opaque firms rely on bank credit for external finance — 92% of small firms in the 1993 National Survey of Small Business Finances use bank loans — while large public firms can raise equity and bonds directly, bypassing banks entirely. When loan rates rise, small firms face a tighter credit constraint on their working capital and fixed costs of operation; the higher loan rate simultaneously reduces their demand for bank credit and raises the value of exiting or transitioning to public status (reducing the private-firm fixed cost burden). Large firms, by contrast, experience lower financing costs as the capital return falls and equity markets absorb more saving — amplifying the relative job creation gap.

The IV uses each state’s top 10% income share in 1970 — ten years before the sample begins, when income shares were flat nationally — interacted with the leave-one-out national trend; any factor driving both job creation outcomes and income inequality in a state would need to have affected firms of different sizes within that state in the same direction as the national trend, while also having had no such effect in all other states. The instrument’s validity rests on: (i) national income share trends after 1980 being driven by aggregate forces (technology, globalization) exogenous to any single state’s labor market; (ii) the pre-1980 period showing no systematic co-movement between state income shares and subsequent employment trends; and (iii) robustness to excluding industries that account for a large share of a state’s employment (Table OA4).
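The construction of the instrument can be sketched as follows. This is a minimal illustration on toy numbers, assuming the leave-one-out national trend is the unweighted mean of the other states' top 10% shares; the paper may weight states differently, and the function name and data are hypothetical.

```python
import numpy as np

def predetermined_share_iv(base_share, shares):
    """Pre-determined share instrument (illustrative sketch).

    base_share: (S,) each state's 1970 top-10% income share.
    shares:     (S, T) panel of state top-10% income shares.
    Returns an (S, T) array: base share times the leave-one-out trend.
    """
    base_share = np.asarray(base_share, dtype=float)
    shares = np.asarray(shares, dtype=float)
    n_states = shares.shape[0]
    total = shares.sum(axis=0, keepdims=True)      # (1, T) sum over all states
    loo_trend = (total - shares) / (n_states - 1)  # mean over the other states
    return base_share[:, None] * loo_trend

# Toy data: three states, two periods (made-up numbers).
base = [0.30, 0.35, 0.40]
panel = [[0.31, 0.35], [0.36, 0.40], [0.41, 0.45]]
iv = predetermined_share_iv(base, panel)
print(iv.round(4))
```

Excluding the state's own share from the trend keeps state-specific shocks out of the time-varying part of the instrument.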

Q4. What explains the aggregate output decline when private firms have higher marginal products?

The output decline of 0.3% arises because the reallocation from private (higher marginal product) to public (lower marginal product) firms outweighs the positive capital accumulation effect: as more saving flows into public firm equity/capital, output would rise, all else equal — but the capital stock increase is modest and aggregate savings rise only slightly, so the dominant effect is misallocation. The marginal product gap between private and public firms is not an assumption of the model but a calibration consequence: matching the empirical estimate that small firms’ net JCR responds more to loan rate changes (Table 1) requires their marginal product to be higher, generating the misallocation loss when resources shift toward large firms.

Q5. How does rising inequality amplify its own effect through welfare and further portfolio reallocation?

In the full model with heterogeneous portfolios, the redistribution from low- to high-income households directly reduces aggregate deposits (because the recipients hold fewer deposits per dollar), which raises deposit and loan rates, which lowers wages at private firms, which further reduces low-income households’ labor income. This GE feedback loop (portfolio composition → bank rates → wages → income distribution → portfolio composition) amplifies the initial redistribution effect by approximately 1 percentage point of consumption-equivalent welfare compared to a model in which households are forced to hold fixed portfolio shares. In the fixed-portfolio model, top earners invest more in deposits when they receive transfers, partially offsetting the deposit supply decline, and private firm wages rise, the opposite of what happens in the full model.

Q6. What fraction of US macroeconomic trends since 1980 can the channel explain?

The channel accounts for 13% of the 4.97pp rise in large-firm employment share, 7.5–15% of the 2–4pp fall in the aggregate labor share, and a 0.3% output loss from resource misallocation — meaningful but partial contributions to trends that are multi-causal. The partial contributions reflect that rising income inequality is one of several forces driving these trends (technology adoption, trade, market concentration, capital-skill complementarity); the paper explicitly abstracts from these other forces by using lump-sum transfers that hold average income constant, isolating the portfolio reallocation channel alone.

Q7. What happens to firm entry and exit under rising inequality?

A higher loan rate raises the effective cost of operating as a private firm (working capital is more expensive), raising the threshold productivity level below which private firms exit and lowering the threshold above which private firms find it worthwhile to incur the IPO-type cost of going public; both margins reduce the number of private firms in equilibrium, consistent with declining business dynamism. The model implies approximately one-fifth of the employment share decline at small firms comes from this extensive margin, closely matching the data decomposition from the BDS, and the public firm share rises by 0.003pp, consistent with the small but positive trend in the share of large-firm establishments observed in the data.

FDIC data show deposits represent 93% of average bank liabilities and 75% of aggregate bank liabilities; banks rely on their headquarters-state deposit base for the vast majority of funding because regulatory and institutional frictions constrain inter-state deposit gathering. Even the four largest US banks (JP Morgan, Citi, Wells Fargo, Bank of America) raise over 70% of deposits in their headquarters state. The literature (Stein, 1998; Jakab and Kumhof, 2015) establishes that deposits provide uniquely stable, cheap funding that cannot be replaced at equivalent cost by wholesale liabilities or interbank borrowing; any substitution requires paying a costly premium over the deposit rate, implying that attenuation bias, if anything, causes the estimates to understate the true causal effect on loan rates.

Key concepts

non-homothetic deposit preference : the empirical regularity that the share of financial wealth allocated to bank deposits declines with income — two-thirds for the bottom quintile, under one-fifth for the top decile; this non-homotheticity means that a mean-preserving income redistribution toward top earners reduces the aggregate deposit supply relative to total saving, the paper’s foundational portfolio channel.

pre-determined share IV : the paper’s instrumental variable for state-level top income shares: each state’s 1970 top 10% income share interacted with the leave-one-out national trend in top 10% shares; identifies causal effects by exploiting differential state sensitivity to national inequality trends, purged of local cyclical factors and large-firm wage premia.

private versus public firm : the model’s key firm heterogeneity; private firms are small, bank-dependent (working capital constrained), and pay fixed operating costs; public firms are large, equity-financed, and face no bank credit constraint. The intensive-margin effect of higher inequality (rising loan rates) and extensive-margin effect (higher exit rates, more IPO transitions) both compress the private firm employment share.

deposit rate pass-through : the mechanism by which a decline in aggregate deposit supply forces banks to raise deposit rates to retain funds; the higher deposit rate is passed through to loan rates via the bank’s zero-profit condition, raising the cost of credit for bank-dependent private firms by approximately twice the deposit rate increase (0.7pp loan rate rise for 0.4pp deposit rate rise in the model).

business dynamism channel : the extensive margin of the paper’s mechanism — rising top income shares increase loan rates, which increase private firm exit rates and the rate of private-to-public firm transitions, reducing firm entry and contributing to documented trends of falling startup rates and declining business dynamism in the US since 1980.

Income taxation across countries

The paper provides the most comprehensive cross-country empirical characterisation of effective income tax functions to date, estimating the two-parameter log-linear tax function, pioneered by Feldstein (1969) and applied in structural macroeconomics by Heathcote, Storesletten, and Violante (2017), for over thirty countries across approximately four decades using harmonized household microdata from the Luxembourg Income Study (LIS). The log-linear function fits income tax systems worldwide with median R² of 0.984 (mean 0.976), extending a finding previously known mainly for the United States to essentially all LIS countries. Five main facts emerge. First, income tax progressivity (τ) and average tax level (λ) are positively correlated across countries: Northern European countries with the highest average tax rates (Belgium, Netherlands, Germany, Finland) also have the highest progressivity; countries such as Brazil, Colombia, Peru, and the Republic of Korea exhibit effectively flat income taxes (τ near zero or negative) despite progressive statutory codes, because actual enforcement and effective coverage are limited. Second, progressivity increases with economic development: richer countries systematically operate more progressive income tax systems, consistent with greater institutional capacity to enforce income taxation. Third, progressivity differs significantly by family structure: married couples with children face the highest progressivity across countries, single households without children the lowest, reflecting child tax credits, joint filing rules, and other family-based provisions. Fourth, the United States ranks toward the lower end of progressivity among high-income countries, with τ ≈ 0.046 in 2010; Belgium, Finland, Germany, Iceland, Ireland, the Netherlands, and Spain are more than twice as progressive as the US. Fifth, transfers account for most redistribution: the combined tax-and-transfer system’s progressivity substantially exceeds that of income taxes alone, indicating that analyses focusing solely on income tax progressivity understate total redistributive effort.

In depth

What is the log-linear tax function and why does the paper adopt it for cross-country comparison?

The log-linear tax function expresses post-tax income as y − T(y) = λy^(1−τ), so that taxes paid are T(y) = y − λy^(1−τ); taking logs gives log(y − T(y)) = α + (1−τ)log(y) with α = log(λ), where τ measures progressivity (τ > 0: marginal rates rise with income; τ = 0: flat tax) and λ captures the average tax level. The function is attractive because it (a) is used widely in structural macro models, enabling direct calibration from these estimates; (b) can be estimated consistently from microdata with just two parameters; (c) permits clean cross-country and over-time comparisons. A richer functional form would sacrifice the comparability across 30+ countries and 40 years of data.
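Because the specification is linear in logs, the two parameters can be recovered by a simple OLS regression of log post-tax income on log pre-tax income. The sketch below illustrates this on synthetic data; the sample, the noise level, and the "true" parameter values are assumptions made for the example, not LIS data or the paper's estimation code.

```python
import numpy as np

def fit_log_linear_tax(pre_tax, post_tax):
    """OLS fit of log(post_tax) = alpha + (1 - tau) * log(pre_tax).

    Returns (tau, lam, r_squared) with lam = exp(alpha).
    """
    x = np.log(pre_tax)
    z = np.log(post_tax)
    slope, intercept = np.polyfit(x, z, 1)  # degree-1 fit: [slope, intercept]
    fitted = intercept + slope * x
    r_squared = 1.0 - np.sum((z - fitted) ** 2) / np.sum((z - z.mean()) ** 2)
    return 1.0 - slope, np.exp(intercept), r_squared

# Synthetic households (hypothetical): post-tax income follows the
# log-linear form with small multiplicative noise.
rng = np.random.default_rng(0)
pre_tax = rng.lognormal(mean=0.0, sigma=0.7, size=5000)
true_tau, true_lam = 0.10, 0.85
noise = np.exp(rng.normal(0.0, 0.02, size=5000))
post_tax = true_lam * pre_tax ** (1.0 - true_tau) * noise

tau_hat, lam_hat, r2 = fit_log_linear_tax(pre_tax, post_tax)
print(f"tau = {tau_hat:.3f}, lambda = {lam_hat:.3f}, R^2 = {r2:.3f}")
```

The regression requires strictly positive pre- and post-tax incomes, so in practice observations with zero or negative values would have to be handled before taking logs.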

How well does the log-linear function fit income tax systems across all countries in the sample?

Very well. Across all 200+ country-wave regressions, the median R² is 0.984 and the mean is 0.976. The fit is robust to different income definitions, imputation methods, and country-specific data sources. This extends the well-known finding for the United States (HSV 2017) to countries with very different income tax structures, suggesting the log-linear form is an adequate empirical approximation to real-world progressive tax schedules worldwide.

What is the cross-country pattern of progressivity in 2010?

Spain (τ ≈ 0.157), Belgium (τ ≈ 0.139), and the Netherlands (τ ≈ 0.127) have the most progressive income taxes in 2010. The Republic of Korea (τ ≈ −0.006) is slightly regressive in effective terms, while Peru (τ ≈ 0.013) and other low-income countries where income tax coverage is limited have close to flat effective income taxes. The United States has τ ≈ 0.046, placing it toward the lower end of progressivity among developed countries. In terms of the Progressivity Tax Wedge (PTW), which measures how much marginal tax rates rise between the average income earner and one at twice the average, Belgium, Finland, Germany, Iceland, Ireland, the Netherlands, and Spain are more than twice as progressive as the US.
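To see how the τ estimates map into a wedge between average income and twice average income, the sketch below uses the marginal rate implied by the log-linear function. The wedge formula, one minus the ratio of net-of-marginal-tax rates at the two income levels, is an assumed definition used here for illustration; the text describes the PTW only verbally. Under this assumed form λ cancels and the wedge equals 1 − 2^(−τ).

```python
def marginal_rate(y, tau, lam=1.0):
    """Marginal tax rate implied by post-tax income lam * y**(1 - tau)."""
    return 1.0 - lam * (1.0 - tau) * y ** (-tau)

def progressivity_wedge(tau, y_low=1.0, y_high=2.0, lam=1.0):
    """Assumed wedge: 1 - (1 - T'(y_high)) / (1 - T'(y_low))."""
    keep_high = 1.0 - marginal_rate(y_high, tau, lam)
    keep_low = 1.0 - marginal_rate(y_low, tau, lam)
    return 1.0 - keep_high / keep_low

# 2010 progressivity estimates quoted in the text.
taus = {"Spain": 0.157, "Belgium": 0.139,
        "Netherlands": 0.127, "United States": 0.046}
wedges = {country: progressivity_wedge(t) for country, t in taus.items()}
for country, w in wedges.items():
    print(f"{country}: {w:.3f} ({w / wedges['United States']:.1f}x US)")
```

With these τ values, the wedges for Spain, Belgium, and the Netherlands each exceed twice the US wedge, which is consistent with the ranking reported in the text.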

How does income tax progressivity relate to economic development?

The paper documents a systematic positive relationship: richer countries (measured by median income, mean income, or GDP per capita) have more progressive income tax systems. Low-income countries like Peru and Guatemala collect most revenue through goods and services taxes and exhibit low income tax progressivity; high-income Northern European countries have both high tax capacity (the institutional ability to enforce income taxation) and high progressivity. This complements the tax capacity literature and suggests that the development-progressivity link operates through institutional channels, not solely through political demand for redistribution.

How does family structure affect income tax progressivity?

Estimated separately for four household types — single without children, single with children, married without children, married with children — progressivity is consistently highest for married couples with children and lowest for single households without children. This pattern holds across countries and over time, reflecting child tax credits, joint filing rules, and other family-based tax provisions that steepen the effective marginal tax schedule. The paper quantifies this heterogeneity by family type, filling a gap in cross-country comparisons that typically focus on single households without children.

What do transfers add to the redistributive picture, and what is the implication for welfare analysis?

When estimating a combined tax-and-transfer function (post-tax-and-transfer income regressed on pre-tax income), the progressivity of the combined system substantially exceeds that of income taxes alone. Countries with high income tax progressivity also tend to have high transfer system progressivity, but the transfer channel dominates. Analyses that focus solely on the income tax progressivity parameter τ therefore understate the total redistributive effort of high-income countries and overstate the tax-side role. This has direct implications for welfare analyses and cross-country comparisons using the log-linear framework.

Inference Based on Time-Varying SVARs Identified with Sign Restrictions

Layer 1: Overview

Research Question. The paper asks how to conduct valid Bayesian inference in time-varying structural vector autoregressions (SVARs) identified with sign restrictions, a setting in which existing algorithms are shown to be theoretically flawed. As an empirical illustration, the authors use the new framework to examine three questions about the 2022–2023 Federal Reserve tightening cycle: (i) how did the Fed respond to the state of the economy; (ii) how would more dovish or hawkish stances have fared; and (iii) was the Fed behind the curve in 2021, and at what cost?

Methodology. The paper defines a class of rotation-invariant time-varying SVARs, building on Bognanni (2018). A model belongs to this class when its prior over sequences of structural parameters is invariant to orthogonal transformations of those sequences—i.e., it assigns equal prior density to all observationally equivalent structural parameter sequences (Proposition 1 establishes that observational equivalence corresponds exactly to orthogonal rotation of the sequence). The authors prove an if-and-only-if characterization (Proposition 2): a prior belongs to this class if and only if the induced prior over sequences of orthogonal matrices is uniform and independent of the time-varying reduced-form parameters.

A specific member of this class, the Random Correlations SVAR (RC-SVAR), is constructed by combining a prior over time-varying reduced-form parameters based on Archakov and Hansen’s (2021) parametrization of correlation matrices with a uniform prior over sequences of orthogonal matrices. The RC-SVAR is preferred over alternatives (Primiceri 2005’s decomposition, which is order-dependent; Bognanni’s 2018 discounted Wishart model, whose marginal likelihood significantly underperforms) because, for the type of empirical applications considered, it generally implies a higher log-predictive score than most orderings of the Primiceri (2005) model.

The authors introduce three algorithms. Algorithm 1 (simple acceptance sampling) is theoretically correct but computationally infeasible when sign restrictions span many periods because the probability of satisfying all restrictions simultaneously converges to zero as sample length T grows. Algorithm 2, the current approach in the literature (Baumeister and Peersman 2013; Bognanni 2018; Debortoli, Galí and Gambetti 2020), draws orthogonal matrices period-by-period from the sign-restriction-truncated uniform distribution; the authors show this does not draw from the correct target posterior because the resulting prior over orthogonal matrices is not independent of the reduced-form parameters and therefore the prior does not satisfy the rotation-invariance condition. Algorithm 3, the paper’s contribution, uses a Gibbs sampler that incorporates the Particle Gibbs with Ancestor Sampling (PGAS) method of Lindsten, Jordan and Schön (2014) to draw sequentially from the correct target posterior conditional on sign restrictions over an arbitrary number of periods.
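The infeasibility of simple acceptance sampling can be illustrated with a toy Monte Carlo. The sketch below is not the paper's Algorithm 1: it assumes a 2×2 system and a stylized restriction (both entries of the first column of the orthogonal matrix are positive) applied directly to independent uniform draws in each period, whereas the paper's restrictions apply to impulse responses and policy-rule coefficients. Under these assumptions each period is accepted with probability 1/4, so a full sequence of length T is accepted with probability (1/4)^T.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix uniformly (Haar measure) via QR."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))  # fix column signs so the draw is uniform

def acceptance_rate(n_periods, n=2, n_draws=5000, seed=0):
    """Share of sequences whose toy sign restriction holds in every period."""
    rng = np.random.default_rng(seed)
    accepted = 0
    for _ in range(n_draws):
        ok = True
        for _ in range(n_periods):
            q = haar_orthogonal(n, rng)
            if not np.all(q[:, 0] > 0):  # toy restriction fails this period
                ok = False
                break
        accepted += ok
    return accepted / n_draws

rates = {T: acceptance_rate(T) for T in (1, 2, 3, 4)}
for T, rate in rates.items():
    print(f"T = {T}: acceptance rate = {rate:.4f}")
```

Even in this toy setting the acceptance rate shrinks geometrically in T, which is why sampling a whole sequence at once becomes impractical for a sample of several hundred quarters.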

An important additional contribution is the allowance for time-varying sign restrictions—restrictions that are imposed only in selected periods—enabling researchers to tailor identification to institutional knowledge about when particular restrictions are economically appropriate.

Data and Empirical Application. The RC-SVAR is estimated at a quarterly frequency with five variables: output growth (log difference of real GDP), core inflation (log difference of core PCE price index), the federal funds rate, money growth (log difference of M2), and the Moody’s Baa corporate bond yield relative to the 10-year Treasury yield (credit spread). The sample runs from 1959:Q1 to 2023:Q2, with a constant and two lags (n=5, p=2, m=11). Four independent MCMC chains of 20,000 draws are used, keeping every tenth draw after discarding the first 2,500; 1,800 particles approximate the reduced-form posterior and 3,600 particles approximate the posterior of the orthogonal matrices.

Main Findings. Decomposing the unexpected change in the federal funds rate from 2022:Q2 to 2023:Q2 into contributions from the predictable component, the systematic monetary policy response to non-monetary-policy shocks, and pure monetary policy shocks, the authors find that the lion’s share of the unpredictable rate increase was a systematic response to non-monetary policy shocks. Monetary policy shocks contributed about 100 basis points to the unexpected change in the federal funds rate by 2023:Q2 (out of a cumulative change in the funds rate of roughly 4.99 percentage points).

In the Dovish Fed counterfactual—where the response of the federal funds rate to contemporaneous inflation is halved for the first quarter of 2022—the economy would have marginally overheated, with inflation running persistently above 5 percent. In the Hawkish Fed counterfactual—where the response to inflation is doubled—inflation would have quickly declined at a small output cost: focusing on posterior medians, real GDP in 2023:Q2 would have been about 0.7 percent lower than in the data, though the lower envelope of the 68 percent probability bands indicates the output cost could have been as large as 3.1 percent.

Regarding the “behind the curve” question, the model finds evidence that the Fed was accommodative in 2021 (expansionary monetary policy shocks in that period), consistent with Summers (2021b). However, monetary policy shocks contributed only about 0.6 percentage points to annualized core inflation during 2021:Q2–2021:Q4 on a cumulative basis; the larger and dominant source of the unexpected inflation surge was non-monetary policy shocks. A comparison of the RC-SVAR with a constant-parameter SVAR identified only by Restriction 1 (Uhlig 2005) shows substantively different conclusions: the constant-parameter model attributes the unexpected increase in the federal funds rate to shocks that affect money growth and credit spreads, without a clear connection to the real economy, whereas the RC-SVAR links the rate increases to shocks that made the economy run hotter.

In depth

Q1. What is the fundamental theoretical flaw in existing algorithms for time-varying SVARs identified with sign restrictions, and why does it matter?

Existing algorithms (e.g., Baumeister and Peersman 2013; Bognanni 2018; Debortoli, Galí and Gambetti 2020) draw orthogonal matrices period-by-period from the uniform distribution restricted to those matrices satisfying the sign restrictions at each t. This construction implicitly defines a marginal density for the orthogonal matrices conditional on the reduced-form parameters that is not uniform: it is proportional to the reciprocal of the volume of the sign-restriction-satisfying subset of the orthogonal group, which depends on the reduced-form parameters. Consequently, the prior over structural parameters implied by these algorithms does not assign equal density to observationally equivalent sequences of structural parameters, violating Proposition 2’s necessary and sufficient condition. The resulting draws therefore do not come from the desired posterior, and the distortion cannot be corrected by importance reweighting without prohibitive computation.

Q2. What does Proposition 1 establish, and how does it generalize the constant-parameter case?

Proposition 1 proves that two sequences of time-varying structural parameters are observationally equivalent if and only if there exists a sequence of orthogonal matrices such that one sequence is obtained from the other by post-multiplying each period’s structural parameters by the corresponding orthogonal matrix. This directly mirrors the constant-parameter result in Rubio-Ramírez, Waggoner and Zha (2010) and Uhlig (2005), where a single orthogonal matrix produces observational equivalence. The extension to sequences is non-trivial because the law of motion couples parameter draws across time, but the likelihood’s separability across periods preserves the period-by-period orthogonal rotation structure.
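The "if" direction of this result can be checked numerically in a few lines. The sketch below assumes the common SVAR convention y_t′A_t = x_t′F_t + ε_t′, under which the reduced-form parameters are B_t = F_t A_t⁻¹ and Σ_t = (A_t A_t′)⁻¹; the dimensions follow the empirical application (n = 5, m = 11), while the parameter values are random placeholders rather than estimates.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix uniformly (Haar measure) via QR."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))

def reduced_form(A, F):
    """Map structural (A, F) to reduced-form (B, Sigma) under the assumed convention."""
    B = F @ np.linalg.inv(A)
    Sigma = np.linalg.inv(A @ A.T)
    return B, Sigma

rng = np.random.default_rng(1)
n, m, n_periods = 5, 11, 3
all_equal = True
for _ in range(n_periods):
    A = np.eye(n) + 0.3 * rng.standard_normal((n, n))  # placeholder values
    F = rng.standard_normal((m, n))
    Q = haar_orthogonal(n, rng)                        # period-specific rotation
    B, Sigma = reduced_form(A, F)
    B_rot, Sigma_rot = reduced_form(A @ Q, F @ Q)
    all_equal &= np.allclose(B, B_rot) and np.allclose(Sigma, Sigma_rot)

print("reduced form unchanged by rotation in every period:", all_equal)
```

Because Q Q′ is the identity, the rotation drops out of both B_t and Σ_t period by period, so the rotated and unrotated sequences imply the same reduced-form parameters in every period.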

Q3. What is Proposition 2, and what is its practical implication for constructing valid priors?

Proposition 2 states that the prior over time-varying structural parameters satisfies the rotation-invariance condition (Equation 3) if and only if the induced prior density over the time-varying orthogonal reduced-form parameterization (Bt, Σt, Qt) does not depend on the sequence of orthogonal matrices; equivalently, the prior over (Qt) is uniform over the product of orthogonal groups and is independent of the reduced-form parameters (Bt, Σt). The practical implication is constructive: any prior over time-varying reduced-form parameters (Bt, Σt), combined with an independent uniform prior over sequences of orthogonal matrices, automatically produces a rotation-invariant SVAR. This means that widely-used priors for reduced-form time-varying VARs (Primiceri 2005, Bognanni 2018, the new RC prior) can all be adapted for structural analysis without modification, as long as the orthogonal matrices are drawn uniformly and independently of the reduced-form parameters.

Q4. Why do models with heteroskedastic structural shocks (identification via heteroskedasticity) not belong to the class of rotation-invariant SVARs?

In models identified through heteroskedasticity, the time-varying structural parameters take the form (A Ψt^{-1/2}, F Ψt^{-1/2}), where Ψt is a time-varying diagonal matrix. For any permissible sequence, post-multiplying by a non-diagonal orthogonal matrix at one period produces a sequence where the ratio of structural parameters across consecutive periods is not diagonal, which violates the permissibility constraint of those models. Thus, the class of rotation-invariant SVARs and models identified through heteroskedasticity are mutually exclusive when the heteroskedastic specification has constant impulse responses up to scale—a restriction that the authors note has been criticized as a potential weakness of the heteroskedasticity-based approach.

Q5. Why is the Random Correlations SVAR (RC-SVAR) chosen as the baseline, and how does it compare to alternatives?

The RC-SVAR uses the Archakov and Hansen (2021) parametrization of correlation matrices to define a prior over time-varying reduced-form parameters that is order-invariant (unlike Primiceri 2005, which produces n! different elements depending on variable ordering) and avoids the highly restrictive structure of Bognanni’s (2018) discounted Wishart model, which significantly underperforms in marginal likelihood. For the empirical applications considered, Arias, Rubio-Ramírez and Shin (2023) show the RC-SVAR generally achieves a higher log-predictive score than most orderings of the Primiceri (2005) model, motivating its use as the baseline. The theoretical results apply to any member of the rotation-invariant class, so the algorithm is not specific to the RC-SVAR.

Q6. Why are time-varying sign restrictions important, and how are they implemented in the monetary policy application?

Time-varying sign restrictions allow researchers to impose identification restrictions only in periods where those restrictions are economically appropriate, adhering to the principle “If you know it, impose it; if you do not know it, do not impose it” (Uhlig 2017). In the monetary policy application, Restriction 2 (which constrains the contemporaneous elasticities in the policy rule to plausible ranges, following Arias, Caldara and Rubio-Ramírez 2019) is not imposed during three exceptional periods: 1979:Q4–1982:Q4 (non-borrowed reserves targeting under Volcker), 2009:Q1–2015:Q3 (quantitative easing following the Great Recession), and 2020:Q2–2021:Q4 (QE and effective zero lower bound during COVID-19). Restriction 1 (sign restrictions on impulse responses to a monetary policy shock, following Uhlig 2005) is imposed throughout the entire sample.

Q7. What do the estimated contemporaneous elasticities reveal about how monetary policy has changed over time?

The model estimates show substantial time variation. The contemporaneous elasticity of the federal funds rate to output growth exhibits three peaks: during Arthur Burns’s chairmanship in 1974 (capturing the sharp rate cut during the 1974–1975 recession), during Volcker’s chairmanship in 1983–1984 (when annualized real GDP growth averaged 6.8 percent), and during Greenspan’s tenure in 2001 (when the federal funds rate fell from 6.4 percent in December 2000 to 1.8 percent by end-2001). Outside these peaks, the elasticity averaged about 0.1, implying a 0.1 percentage point rise in the annualized federal funds rate per 1 percentage point increase in annualized GDP growth. The elasticity to inflation averaged about 0.3 percentage points per 1 percentage point rise in annualized core inflation, ranging from above 0.5 in the early 1970s and early Volcker years down to about 0.15 during Yellen’s tenure. The elasticity to the credit spread moved from about −1.4 at the beginning of Burns’s tenure to −2.2 at the end of Nixon’s presidency, then declined from the mid-1970s through the Great Recession, and stood at about −1 by mid-2023.

Q8. What is the exact decomposition of the 2022–2023 tightening cycle into predictable, systematic non-monetary, and monetary policy shock components?

Table 1 from the paper shows the federal funds rate decomposition. In 2022:Q2, the predictable component was 0.27 percentage points, the unpredictable component due to systematic response to non-monetary shocks was 0.24 pp, and the unpredictable component due to monetary policy shocks was 0.26 pp, summing to 0.77 pp. By 2023:Q2, these were 1.70 pp (predictable), 2.25 pp (systematic/non-monetary), and 1.04 pp (MP shocks), totaling 4.99 pp. Thus, at the tightening cycle’s end in 2023:Q2, the systematic response to non-monetary shocks accounted for about two-thirds of the unpredictable component (2.25 / (2.25 + 1.04) ≈ 68 percent), consistent with the broader literature finding that most variation in policy instruments is driven by the systematic component of policy.
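The arithmetic in this decomposition can be reproduced directly from the figures quoted above. The snippet is a bookkeeping check only; the dictionary layout and function name are illustrative choices.

```python
# Federal funds rate decomposition in percentage points, as quoted in the text.
decomposition = {
    "2022:Q2": {"predictable": 0.27, "systematic": 0.24, "mp_shock": 0.26},
    "2023:Q2": {"predictable": 1.70, "systematic": 2.25, "mp_shock": 1.04},
}

def summarize(parts):
    """Return (total change, systematic share of the unpredictable component)."""
    total = sum(parts.values())
    unpredictable = parts["systematic"] + parts["mp_shock"]
    return round(total, 2), round(parts["systematic"] / unpredictable, 3)

summary = {quarter: summarize(parts) for quarter, parts in decomposition.items()}
for quarter, (total, share) in summary.items():
    print(f"{quarter}: total = {total} pp, systematic share = {share:.1%}")
```

The same calculation shows that in 2022:Q2 the systematic response and monetary policy shocks contributed almost equally to the unpredictable component (0.24 versus 0.26 pp), so the dominance of the systematic response emerges over the course of the cycle.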

Q9. How do the Hawkish and Dovish Fed counterfactuals work, and what do they imply?

The Hawkish (Dovish) counterfactual replaces the estimated contemporaneous response to inflation in the policy rule with one that is twice (half) as large as the estimated response for the first quarter of 2022, then simulates history forward from 2022:Q2 under the modified rule. Under the Dovish Fed, the economy would have marginally overheated with output rising above CBO potential GDP estimates, and inflation would have run persistently above 5 percent. Under the Hawkish Fed, posterior medians show inflation quickly declining at a cost of about 0.7 percent of real GDP in 2023:Q2 relative to the data; the lower envelope of the 68 percent probability bands shows the output cost could have been as large as 3.1 percent. A parallel set of counterfactuals, designed to be robust to the Lucas critique by working through one-time monetary policy shocks rather than changes to the reaction function, yields broadly similar results.

Q10. What does the comparison with Romer and Romer (2023a) reveal about the model’s monetary policy shock series?

Romer and Romer (2023a) identify a contractionary monetary policy shock in July 2022 (2022:Q3) using a narrative approach. The RC-SVAR’s estimated monetary policy shock series is broadly consistent with this finding: the model detects a contractionary shock in 2022:Q3 and, like Romer and Romer, also finds some evidence of a contractionary shock in 2022:Q2 (though they characterized it as “signs but not definitive evidence”). Beyond the Romer-Romer estimation window, the RC-SVAR additionally finds evidence of an expansionary monetary policy shock in 2023:Q1, when the Fed decelerated the pace of rate increases from 50 to 25 basis points.

Q11. How does the RC-SVAR’s inference on the 2022–2023 tightening cycle differ from that of a constant-parameter SVAR identified only with Restriction 1?

Two salient differences emerge. First, through the lens of the constant-parameter SVAR, monetary policy shocks contribute insignificantly to unexpected output growth between 2022:Q2 and 2023:Q2; in fact, the posterior median output response to a contractionary monetary policy shock is positive in that model (consistent with Uhlig 2005’s finding), implying that the positive monetary policy shocks needed to explain the rate increase would propel rather than reduce output. In the RC-SVAR, the posterior median output response to a contractionary shock is negative, so contractionary monetary policy shocks worked to decelerate output against a backdrop of non-monetary shocks that made the economy run hotter. Second, in the constant-parameter SVAR, non-monetary policy shocks that drive the unexpected increase in the federal funds rate do not propagate through output or inflation, whereas in the RC-SVAR they do—yielding a much more coherent macroeconomic narrative for the tightening cycle.

Q12. What does the model find about whether the Fed was behind the curve in 2021, and what were the consequences?

The model’s 2021:Q1 forecasts predicted the federal funds rate would reach about 0.6 percent by end-2021, consistent with a view that rate normalization was already warranted. The actual federal funds rate remained at its effective lower bound through 2021:Q4, and the shock decomposition shows that the cumulative unexpected change in the funds rate during 2021:Q2–2021:Q4 was driven by expansionary monetary policy shocks—supporting the view that monetary policy was accommodative and the FOMC fell behind the curve. However, monetary policy shocks contributed only about 0.6 percentage points (annualized) to the unexpected increase in core inflation during this period; the dominant and larger source of the inflation surge was non-monetary policy shocks. The model therefore finds that the delay in tightening was not the primary driver of the 2021 inflation surge.

Q13. Do time-varying sign restrictions materially affect inference, as demonstrated in Section 6.8?

Yes. Comparing the baseline identification scheme (Restrictions 1 and 2, with Restriction 2 not imposed during exceptional periods) against an alternative scheme that imposes both restrictions throughout the entire sample reveals differences in the estimated monetary policy shocks, particularly in 2021:Q4. Under the alternative scheme, there was an expansionary monetary policy shock in 2021:Q4, while the baseline finds the shock was nearly centered around zero. Additionally, for 2021:Q2, the alternative scheme implies the contemporaneous output response to an expansionary monetary policy shock is more likely to have been positive, whereas the baseline scheme yields a different posterior distribution for this response. These differences illustrate that imposing or omitting restrictions in specific periods affects inference about structural shocks and impulse responses at economically important junctures.

Key Concepts

Rotation-Invariant Time-Varying SVAR: A class of time-varying SVAR models whose prior over sequences of structural parameters satisfies: for every permissible sequence of structural parameters and every sequence of orthogonal matrices, the orthogonally-rotated sequence is also permissible and receives the same prior density. This ensures the prior does not break the observational equivalence among structural parameter sequences related by orthogonal rotation, so that identification comes solely from the imposed restrictions.

Observational Equivalence in Time-Varying SVARs: Two sequences of time-varying structural parameters are observationally equivalent if and only if there exists a sequence of orthogonal matrices such that one sequence equals the other sequence post-multiplied period-by-period by the corresponding orthogonal matrix. This definition extends Rothenberg’s (1971) concept to the time-varying setting and directly implies the rotation-invariance restriction.
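
The rotation argument can be checked numerically for a single period. The sketch below is illustrative only: it assumes the common convention that the reduced-form covariance implied by a structural matrix A is (A A')^{-1}, which the summary does not state, and the 2x2 matrix and rotation angle are hypothetical.

```python
import math

def matmul(x, y):
    """Multiply two small matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def implied_covariance(a_mat):
    # Assumed convention (not from the summary): Sigma = (A A')^{-1}
    return inv2(matmul(a_mat, transpose(a_mat)))

def rotation(theta):
    """A 2x2 orthogonal matrix."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

A = [[2.0, 0.5], [-0.3, 1.5]]      # hypothetical structural matrix for one period
A_rot = matmul(A, rotation(0.7))   # post-multiply by an orthogonal matrix

sigma = implied_covariance(A)
sigma_rot = implied_covariance(A_rot)
max_gap = max(abs(sigma[i][j] - sigma_rot[i][j])
              for i in range(2) for j in range(2))
print(max_gap)  # zero up to floating-point error
```

Because the two matrices imply the same covariance, the data alone cannot tell them apart; in a time-varying model the same check applies period by period.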

Random Correlations SVAR (RC-SVAR): A specific member of the rotation-invariant class constructed by using the Archakov and Hansen (2021) parametrization of correlation matrices to define the prior over time-varying reduced-form parameters, combined with a uniform prior over sequences of orthogonal matrices. The prior is order-invariant and, for the empirical applications considered, generally achieves higher log-predictive scores than the workhorse Primiceri (2005) model.

Time-Varying Sign Restrictions: Sign restrictions imposed only on selected time periods rather than uniformly across the sample, implemented by allowing the restriction function S_t(·) to differ across t (including the possibility that no restriction is imposed at some t). This allows researchers to tailor identification to periods in which the theoretical or institutional knowledge motivating the restriction is deemed applicable, for example by imposing policy-rule contemporaneous restrictions only when the federal funds rate is the primary policy instrument.
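
One way to picture a period-specific restriction function is as a list with one entry per period, where an empty entry means no restriction is imposed. The sketch below is a hypothetical illustration, not the paper's algorithm: the draw representation, the function names, and the example restriction are all assumptions.

```python
from typing import Callable, Dict, List, Optional

Draw = Dict[str, float]                 # hypothetical summary of one period's draw
Restriction = Callable[[Draw], bool]

def satisfies_all(draws: List[Draw],
                  restrictions: List[Optional[Restriction]]) -> bool:
    """True if each period's draw meets that period's restriction.

    A restriction of None means the period is left unrestricted.
    """
    return all(r is None or r(d) for d, r in zip(draws, restrictions))

def rate_rises_with_inflation(d: Draw) -> bool:
    # Hypothetical policy-rule restriction: non-negative response to inflation
    return d["psi_inflation"] >= 0.0

draws = [{"psi_inflation": 0.8}, {"psi_inflation": -0.2}, {"psi_inflation": 0.4}]

uniform = [rate_rises_with_inflation] * 3                    # imposed in every period
time_varying = [rate_rises_with_inflation, None, rate_rises_with_inflation]

accepted_uniform = satisfies_all(draws, uniform)             # False
accepted_time_varying = satisfies_all(draws, time_varying)   # True
```

The same sequence of draws is rejected when the restriction is imposed throughout and accepted when the middle period is exempt, which is why the choice of periods can change inference.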

Particle Gibbs with Ancestor Sampling (PGAS): The sequential Monte Carlo method (from Lindsten, Jordan, and Schön 2014) used in the paper’s Algorithm 3 to draw the sequence of structural parameters A_t from its conditional posterior given the sign restrictions. PGAS conditions on the previous Gibbs draw of the structural parameter sequence so that the target posterior remains the invariant distribution, which is the key property that makes the Gibbs sampler valid.

Systematic Component of Monetary Policy: In the paper’s structural monetary policy equation, the linear combination of contemporaneous endogenous variables (output growth, inflation, money growth, credit spread) that enters the federal funds rate equation, weighted by the contemporaneous elasticities ψ. It represents the portion of interest rate variation that is a predictable, rule-based response to economic conditions, as distinguished from the monetary policy shock (the residual).

Contemporaneous Elasticity: The coefficient ψ_{i,t} in the monetary policy equation measuring the response of the federal funds rate to a one-unit contemporaneous change in variable i at time t, defined directly in terms of the structural parameter matrix A_t. The paper’s time-varying framework allows these elasticities to evolve over the sample, revealing historically distinct episodes of how aggressively the Fed responded to output growth, inflation, money growth, and credit spreads.
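
As a hedged illustration of how such elasticities can be read off structural coefficients, suppose the policy equation is written as a_r * r + sum_i a_i * y_i = (lagged terms) + shock. Solving for the policy rate r gives ψ_i = -a_i / a_r. This normalization is an assumption made for the sketch, and the coefficient values are hypothetical.

```python
def contemporaneous_elasticities(coeffs, rate_var="ffr"):
    """Solve the policy equation for the rate: psi_i = -a_i / a_r.

    `coeffs` maps variable names to their contemporaneous coefficients
    in the (assumed) structural policy equation for one period.
    """
    a_r = coeffs[rate_var]
    if a_r == 0:
        raise ValueError("the policy rate must enter its own equation")
    return {name: -a / a_r for name, a in coeffs.items() if name != rate_var}

# Hypothetical coefficients for a single period t
coeffs_t = {
    "ffr": 2.0,
    "output_growth": -0.5,
    "inflation": -1.0,
    "money_growth": 0.0,
    "credit_spread": 0.4,
}

psi_t = contemporaneous_elasticities(coeffs_t)
# inflation elasticity 0.5, output growth 0.25, credit spread -0.2
```

Repeating the calculation for each period's coefficients traces out the time path of the elasticities.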

Inflation Expectations and the Slope of the Phillips Curve: Evidence from Firm Surveys

Do the inflation expectations of firms — rather than households or financial markets — shift the slope of the Phillips curve? Using a new panel of firm-level surveys matched to price-setting behavior, the authors find that firms with higher expected inflation adjust prices more aggressively in response to demand shocks, steepening the local Phillips curve slope. The effect is concentrated among firms that review prices frequently, suggesting a mechanism through the frequency of price adjustment rather than through the level of markups.

In depth

Q1. What is the main empirical finding on expectations and the Phillips curve slope?

Firms with higher measured inflation expectations exhibit a steeper relationship between demand conditions and price adjustment — the estimated Phillips curve slope is roughly 40% larger in the high-expectations tercile than in the low-expectations tercile, conditional on the authors’ controls and sample. The authors interpret this as evidence that expectations are not merely a level shift in inflation but alter the sensitivity of prices to real activity, consistent with forward-looking pricing theories.

Q2. What is the mechanism, and how do the authors identify it?

The authors argue that expectations work through the frequency of price review: firms expecting higher inflation are more likely to be in an active review window, and so respond more to a given demand shock within that window. Identification relies on cross-firm variation in survey-measured expectations within narrow industry-time cells, so that aggregate demand shocks are held approximately fixed. The authors acknowledge this strategy absorbs industry-specific inflation trends and may understate the full expectational effect.

Q3. What does this imply for monetary policy?

If the Phillips curve slope varies with expectations, then a credible disinflation — by lowering expected inflation — flattens the curve and makes the output cost of reducing inflation larger, not smaller. The authors present this as a potential mechanism behind the observed flattening of the curve in low-inflation regimes, though they stop short of a structural welfare calculation.

Key concepts

Phillips curve slope: The coefficient linking excess demand (or unemployment gap) to inflation in the short-run Phillips curve; a steeper slope means a given demand shortfall has a larger disinflationary effect.

Price review frequency: How often a firm actively reconsiders its prices; firms that review more often are more likely to adjust in response to new information within any given period.

Firm-level survey expectations: Inflation expectations measured directly from firms (rather than households or markets), which may better capture the beliefs that drive actual price-setting decisions.

Input Sourcing under Climate Risk: Evidence from U.S. Manufacturing Firms

Blaum, Esposito, and Heise study how supply chain risk — specifically, the risk of unexpected shipping delays caused by ocean weather conditions — affects U.S. manufacturing firms’ import sourcing decisions. The paper asks three related questions: Do weather-induced shipping delays harm firm performance? Do firms adapt their sourcing strategies ex ante in response to shipping time risk? And what are the aggregate welfare costs of heightened supply chain risk from climate change, geopolitical tensions, and port congestion?

The empirical foundation is the U.S. Census Bureau’s Longitudinal Firm Trade Transactions Database (LFTTD), covering the universe of U.S. import transactions from 1992 to 2016, merged with the Longitudinal Business Database and Annual Survey of Manufactures for firm-level outcomes. For ocean shipments, the authors reconstruct vessel routes using vessel names, foreign port stops, and U.S. ports of entry, then map those routes to hourly wave height and direction data from NOAA’s WaveWatch III model at 0.5-degree resolution across more than 40,000 distinct maritime routes (the weather data cover 2011–2016).

The identification strategy proceeds in two steps. First, observed shipping times are regressed on a rich set of fixed effects — supplier, product, route-month, vessel, buyer, relationship status — plus controls for shipping charges and weight, to strip out anticipated determinants of delivery time. Second, the residuals are projected onto realized wave height and direction along the vessel’s route to isolate the weather-induced, unexpected component of shipping time variation. The identifying assumption is that realized wave conditions along the entire multi-week ocean crossing are not predictable by importers at the time orders are placed, beyond seasonal patterns absorbed by route-month fixed effects. This assumption is supported by the literature on weather forecasting, which finds accuracy degrades sharply beyond seven days.

The paper’s first empirical result concerns the consequences of weather-induced delays. Defining an extreme delay as a weather-induced shipping time above the 95th percentile for a given product-route, the authors estimate that a one standard deviation increase in the share of input costs that are weather-delayed (2.66 percentage points) reduces firm sales by 6.5%, profits by 3.5%, and employment by 1.0% within the same year. These effects are estimated from panel regressions for 2011–2016, with importer, product, and year fixed effects. The magnitudes indicate that firms are typically unable to fully hedge supply chain disruptions through insurance or financial instruments.

The paper’s second empirical result concerns ex ante adaptation. Risk exposure is measured as the standard deviation of weather-induced shipping times over three-year rolling windows for each supplier-route-product combination, then aggregated to the importer-product-year level using pre-determined import shares as weights (Bartik shift-share). Moving from the 25th to the 75th percentile of this shipping risk distribution increases the number of routes used by 7.7% and the number of foreign suppliers by 4.9%, while reducing total import value by 5.1%, route concentration (HHI) by 4.6%, and supplier concentration (HHI) by 3.2%. The risk effect on imports is estimated conditional on average shipping time, indicating that uncertainty exerts an additional, independent negative effect on import demand beyond the level of delays.

To rationalize these findings, the authors build a quantitative general equilibrium model of importing with firm heterogeneity. Firms source domestic and foreign inputs; foreign input quality is reduced when delivery is late, and firms face uncertainty about shipping times when placing orders. Risk-neutral firms nonetheless face a concavity in expected revenues from monopolistic competition, so higher variance in input quality reduces expected profits. Firms can diversify by adding foreign suppliers (at a per-supplier fixed cost), and a key theoretical result is that a mean-preserving spread in supplier quality variance increases the optimal number of suppliers but, because the extensive-margin elasticity is less than one, total import value necessarily falls.

The calibrated model is used to evaluate three counterfactual scenarios. Ocean wave height volatility increased by 0.34% per year on average between 2011 and 2023; projecting this trend forward 50 years generates a climate change scenario. The Houthi attacks in the Red Sea caused rerouting that raised both the mean and variance of navigation time. Post-Covid port congestion (2021–2022) increased the variance of port waiting times. Across all three scenarios, U.S. real income falls by 0.4% to 1.33%, driven by firms substituting toward more expensive domestic inputs as they reduce exposure to risky foreign sourcing.

The sample scope is U.S. manufacturing importers using ocean shipping during 2011–2016 for the main empirical results (weather data period), with an extended robustness sample of 1992–2016 using residualized shipping time volatility. The study covers 43,080 origin-destination port pairs, 401,700 unique vessels, and approximately 35.8 million seaborne transactions.

Q: What is the paper’s core research question? A: The paper asks how supply chain risk — specifically, the risk of unexpected delays in ocean shipping caused by weather conditions — affects U.S. manufacturing firms’ import sourcing decisions and aggregate welfare. It examines both the disruption effects of realized delays and the ex ante adaptation of sourcing strategies to risk exposure, then quantifies aggregate costs through a calibrated general equilibrium model.

Q: What data sources underpin the empirical analysis? A: The primary dataset is the LFTTD, which covers the universe of U.S. import transactions from 1992 to 2016, recording importer and exporter identities, HS-10 product codes, values, quantities, shipping dates, vessel names, and port pairs. This is merged with the Longitudinal Business Database for employment and industry, and with the Census of Manufactures and Annual Survey of Manufactures for sales, material costs, and payroll. Weather data come from NOAA’s WaveWatch III model at hourly, 0.5-degree resolution for 2011–2016. Ocean routes are constructed using Eurostat’s SeaRoute program, covering over 40,000 distinct routes across approximately 10,500 route segments.

Q: How do the authors isolate the unexpected component of shipping time variation? A: They use a two-step residualization. In step one, observed log shipping times are regressed on supplier, product, route-month, vessel, buyer, and relationship-status fixed effects, plus controls for log shipping charges and log weight; the residuals capture variation not explained by anticipated factors. In step two, these residuals are projected onto realized average wave height and relative wave direction along the vessel’s route to extract the weather-induced component. The identifying assumption is that importers cannot forecast realized wave conditions beyond seasonal patterns when placing orders that initiate multi-week ocean crossings, consistent with evidence that weather forecasts lose accuracy beyond seven days and that ocean wave height is particularly hard to predict.
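
The two-step logic can be sketched on toy data. The example below is a deliberate simplification: a single route-month cell mean stands in for the full set of fixed effects and controls, wave height is the only weather regressor, and all numbers are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shipments: (route-month cell, log shipping time, avg wave height in metres)
shipments = [
    ("A-Jan", 3.00, 1.0),
    ("A-Jan", 3.10, 2.0),
    ("A-Jan", 3.20, 3.0),
    ("B-Jan", 2.50, 1.5),
    ("B-Jan", 2.70, 3.5),
]

# Step 1: strip the anticipated component (here, just the cell mean).
by_cell = defaultdict(list)
for cell, log_time, _ in shipments:
    by_cell[cell].append(log_time)
cell_mean = {cell: mean(times) for cell, times in by_cell.items()}
residuals = [log_time - cell_mean[cell] for cell, log_time, _ in shipments]

# Step 2: project the residuals on realized wave height (OLS with intercept).
waves = [wave for _, _, wave in shipments]
w_bar, r_bar = mean(waves), mean(residuals)
slope = (sum((w - w_bar) * (r - r_bar) for w, r in zip(waves, residuals))
         / sum((w - w_bar) ** 2 for w in waves))
intercept = r_bar - slope * w_bar

# Fitted values are the weather-induced part of unexpected shipping time.
weather_component = [intercept + slope * w for w in waves]
```

In this toy sample the slope is positive, so higher waves are associated with longer than anticipated shipping times; the fitted values play the role of the weather-induced component used in later steps.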

Q: What are the estimated effects of weather-induced shipping delays on firm performance? A: A one standard deviation increase in the share of input costs that are weather-delayed (2.66 percentage points) reduces firm sales by 6.5%, profits by 3.5%, and employment by 1.0% within the same year. Using a broader measure of residualized shipping time delays (not restricted to the weather-induced component) produces similar results: a one standard deviation increase reduces sales by 6%, profits by 3.2%, and employment by 0.9%. These effects are estimated from panel regressions for 2011–2016 with importer, product, and year fixed effects.

Q: How do firms adjust their sourcing strategies in response to higher shipping time risk? A: Moving from the 25th to the 75th percentile of the shipping risk distribution (a 61 log-point increase) raises the number of routes used by 7.7% and the number of foreign suppliers by 4.9%, while reducing route HHI by 4.6%, supplier HHI by 3.2%, and total import value by 5.1%. The margin of route diversification is larger than supplier diversification, consistent with shipping risk being determined primarily at the route level. Higher risk also increases the likelihood of switching to air freight by 1.0% over the same interquartile range.
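
For readers unfamiliar with the concentration measure, the Herfindahl-Hirschman index (HHI) is the sum of squared shares. The sketch below uses hypothetical import values and a 0 to 1 scale; the scale the authors use is not stated in this summary.

```python
def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares, between 1/n and 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return sum((v / total) ** 2 for v in values)

# Hypothetical import values by route for one importer-product
concentrated = hhi([90.0, 10.0])        # 0.82: one route dominates
diversified = hhi([40.0, 30.0, 30.0])   # 0.34: three routes, more even split
```

Adding a route and spreading imports more evenly lowers the index, which is the direction of the reported response to higher shipping risk.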

Q: Does the risk effect on imports operate independently of the level of shipping times? A: Yes. The regressions of total import demand on risk exposure control for average shipping time, and the coefficient on risk remains negative and significant after this control. This indicates that the variance of shipping times has an independent negative effect on import demand beyond the first-moment effect of longer average delays.

Q: What is the theoretical mechanism through which shipping time risk reduces import demand? A: In the model, firms are risk-neutral but face monopolistically competitive output markets, which introduces curvature in the revenue function. Higher variance in input quality (stemming from unpredictable shipping times) reduces expected revenues even for risk-neutral firms. Firms can diversify by adding foreign suppliers at a per-supplier fixed cost, which reduces variance in average input quality. However, the elasticity of the optimal number of suppliers with respect to quality variance is less than one, so total import expenditure necessarily falls as variance rises — diversification is incomplete and firms substitute toward domestic inputs.
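
The curvature argument can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes revenue is proportional to average input quality raised to a power below one, that each supplier's quality is either 1 - spread or 1 + spread with equal probability and independently of the others, and it ignores fixed costs.

```python
from math import comb

def expected_revenue(n_suppliers, spread, curvature=0.5):
    """E[(average quality) ** curvature] under a two-point quality distribution.

    Mean quality is 1 for every supplier, so raising `spread` is a
    mean-preserving spread. All functional forms are illustrative assumptions.
    """
    if not 0 <= spread < 1:
        raise ValueError("spread must lie in [0, 1)")
    total = 0.0
    for k in range(n_suppliers + 1):           # k suppliers draw high quality
        prob = comb(n_suppliers, k) / 2 ** n_suppliers
        avg_quality = 1.0 + spread * (2 * k - n_suppliers) / n_suppliers
        total += prob * avg_quality ** curvature
    return total

low_risk = expected_revenue(1, 0.2)
high_risk = expected_revenue(1, 0.6)
high_risk_diversified = expected_revenue(4, 0.6)
```

With one supplier, the wider spread lowers expected revenue even though mean quality is unchanged. Spreading purchases over four suppliers recovers part of the loss, but not all of it, which mirrors the incomplete diversification described above.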

Q: What does Proposition 1 state about the extensive margin response to risk? A: Proposition 1 establishes that, under the condition that shipping time risk is small relative to expected revenues, a mean-preserving spread in the variance of supplier quality increases the optimal number of foreign suppliers. However, the elasticity of the optimal number of suppliers with respect to quality variance is strictly less than one, which implies that total import value necessarily falls whenever quality variance increases, regardless of the extensive margin diversification response.

Q: How is the calibration structured and what moments does it target? A: The model features firm heterogeneity in both productivity and shipping time risk (variance of delivery times). The calibration targets three sets of moments: the estimated effect of shipping time risk on the extensive margin of importing (number of suppliers), the negative association between firm sales and average shipping times (which disciplines the timeliness elasticity parameter tau), and the joint distribution of firm size and risk observed in the data — specifically, the empirical finding that larger importers are matched with safer (lower-risk) foreign suppliers, with a correlation of -0.12. The calibrated model replicates the key moments of shipping time risk and import demand.

Q: What are the three counterfactual scenarios and their aggregate welfare costs? A: (1) Climate change: ocean wave height volatility increased by 0.34% per year on average between 2011 and 2023; the authors project this trend forward 50 years and pass the resulting increase in shipping time variance through the model. (2) Red Sea/Houthi attacks: rerouting vessels away from the Red Sea and the Suez Canal raises both the mean and variance of navigation time. (3) Post-Covid port congestion: port waiting times became more variable during 2021–2022. Across all three scenarios, U.S. real income falls by 0.4% to 1.33%, driven by firms substituting from cheaper foreign inputs toward more expensive domestic production to reduce risk exposure.
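
To give a sense of scale for the climate scenario, the arithmetic below extrapolates the 0.34% annual trend over 50 years. The summary does not say whether the authors compound the trend or extend it linearly, so both versions are shown, and the mapping from wave volatility to shipping time variance is not reproduced here.

```python
ANNUAL_GROWTH = 0.0034   # 0.34% per year, from the summary
HORIZON_YEARS = 50

# Two possible readings of "projecting this trend forward 50 years"
compounded = (1 + ANNUAL_GROWTH) ** HORIZON_YEARS - 1   # roughly 18.5%
linear = ANNUAL_GROWTH * HORIZON_YEARS                   # 17%

print(f"compounded: {compounded:.1%}, linear: {linear:.1%}")
```

Either way, the implied cumulative increase in wave height volatility is a little under one fifth.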

Q: What is the role of the shift-share (Bartik) instrument in the risk exposure measure? A: The exposure measure aggregates supplier-route-product level risk (standard deviation of weather-induced shipping times over three-year rolling windows) to the importer-product-year level using pre-determined import shares from the prior three years as weights. Using lagged shares rather than contemporaneous shares ensures that the weights are not endogenous to current sourcing decisions. This construction is standard in the Bartik shift-share literature and helps isolate variation in risk that is plausibly exogenous to the firm’s current sourcing choices.

Q: How do the authors handle the endogeneity concern that firms may select into riskier routes? A: The weather-induced component of shipping time variation is by construction driven by realized ocean conditions that are unpredictable at the time orders are placed. The residualization removes all fixed-effect variation associated with route, season, vessel, supplier, and buyer characteristics. Additionally, the shift-share construction uses pre-determined weights, so risk exposure does not mechanically reflect current sourcing decisions. The authors also show robustness using the longer 1992–2016 sample with residualized (rather than weather-specific) shipping time volatility, obtaining qualitatively and quantitatively similar results.

Q: What does the paper contribute relative to the literature on shipping times and trade? A: Prior work by Evans and Harrigan (2005) and Hummels and Schaur (2010, 2013) focused on the level of shipping times (the first moment) as a trade cost. This paper is the first to systematically study the variance of shipping times (the second moment) as an independent determinant of import demand and sourcing structure, both empirically and theoretically. The authors show that uncertainty around delivery times has negative effects on trade that are separate from the effects of longer average delays.

Q: What are the robustness checks reported for the main empirical results? A: For the effects of risk on sourcing behavior, the authors show that using residualized shipping time volatility over the longer 1992–2016 sample (rather than the weather-induced measure over 2011–2016) produces similar results: moving from the 25th to the 75th percentile increases the number of routes by 6.6% and the number of suppliers by 3.7%, decreases route HHI by 3.9% and supplier HHI by 2.5%, and reduces total imports by 10.5%. For the effects of delays on firm performance, applying the same specification with residualized (not weather-induced) delay shares yields coefficients on sales, profits, and employment that are very close to the baseline estimates.

Q: What are the welfare implications for firms that cannot hedge through financial markets? A: The large negative effects of weather-induced delays on sales, profits, and employment — and the finding that firms respond by ex ante restructuring their supply chains rather than relying on insurance — indicate that financial hedging instruments are largely unavailable or insufficient for managing input delivery risk. This motivates the model’s assumption that firms must manage risk through sourcing diversification, which is costly because of per-supplier fixed costs and because it ultimately requires substituting toward more expensive domestic inputs.

Weather-induced unexpected shipping time: The component of shipping time variation explained by realized ocean wave height and direction along the vessel’s route, after removing all variation attributable to anticipated factors (route, season, vessel, supplier, buyer characteristics, shipping charges, weight). Interpreted as unexpected because multi-week ocean crossings begin before accurate weather forecasts are available.

Shipping time risk: Measured as the standard deviation of weather-induced residualized shipping times over three-year rolling windows for each foreign supplier-route-product combination. This captures the second moment (variance) of delivery time uncertainty, distinct from the first moment (average shipping time level).

Shift-share risk exposure: An importer-product-year level risk measure constructed as a weighted average of supplier-route-product level risk, using pre-determined import shares from the prior three years as weights. This Bartik-style construction ensures exposure weights are not endogenous to current sourcing decisions.
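
The two definitions above combine as follows in a toy example. The cell labels, residuals, and shares are hypothetical, and the use of the sample standard deviation within a single three-year window is an assumption made for illustration.

```python
from statistics import stdev

# Hypothetical weather-induced shipping-time residuals (days) within one
# three-year window, by supplier-route-product cell
window_residuals = {
    "supplier1-routeA": [-1.0, 0.0, 1.0],
    "supplier2-routeB": [-3.0, 0.0, 3.0],
}

# Hypothetical pre-determined import shares from the prior three years
lagged_shares = {"supplier1-routeA": 0.75, "supplier2-routeB": 0.25}

# Shipping time risk: standard deviation within the window, cell by cell
cell_risk = {cell: stdev(vals) for cell, vals in window_residuals.items()}

# Shift-share exposure: lagged-share-weighted average of cell-level risk
exposure = sum(lagged_shares[cell] * cell_risk[cell] for cell in cell_risk)
# 0.75 * 1.0 + 0.25 * 3.0 = 1.5
```

Because the weights are fixed before the current period, a firm that shifts purchases toward the safer cell this year does not mechanically lower its measured exposure.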

Timeliness elasticity (tau): A structural parameter in the model governing how rapidly input quality degrades when delivery is later than expected. Specifically, when a shipment takes d_i days to arrive, quality is scaled by the factor exp(-tau*(d_i - E[d_i])), which falls below one when delivery is later than expected. Calibrated to match the observed negative association between firm sales and average shipping times in the data.
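
The quality factor in this definition can be evaluated directly. In the sketch below the value of tau and the shipping times are hypothetical; only the functional form comes from the definition above.

```python
import math

def quality_factor(days, expected_days, tau):
    """exp(-tau * (days - expected_days)), as quoted in the definition above."""
    return math.exp(-tau * (days - expected_days))

TAU = 0.05  # hypothetical value, not the calibrated one

on_time = quality_factor(20, 20, TAU)   # exactly 1.0
late = quality_factor(25, 20, TAU)      # exp(-0.25), below 1
early = quality_factor(18, 20, TAU)     # above 1 under this expression
```

A larger tau makes the factor fall faster for each extra day of delay, so the parameter governs how costly lateness is for the importer.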

Extensive margin diversification: The response of firms to higher shipping time risk by increasing the number of foreign suppliers and shipping routes used for a given product, rather than increasing the volume sourced from existing suppliers. In the model and data, this margin is the primary channel through which firms hedge delivery risk.

Mean-preserving spread condition: The theoretical condition (Proposition 1) under which higher variance in supplier quality increases the optimal number of foreign suppliers. The condition requires that shipping time risk be small relative to expected revenues, so that the diversification benefit of adding suppliers (reducing variance in average quality) dominates the revenue-reducing effect of higher variance.

Per-supplier fixed cost: A fixed cost in the model that must be paid for each foreign supplier relationship maintained. This cost limits the extent of diversification, ensuring that firms cannot fully eliminate shipping time risk by adding arbitrarily many suppliers, and that higher risk raises (rather than eliminates) per-unit sourcing costs.

Insurer Risk and Public Risk-Sharing: Quantifying the Value of Reinsurance

Kim and Li study how publicly provided reinsurance affects insurer behavior and market outcomes in health insurance markets where firms face substantial cost uncertainty. The central question is whether standard expected-profit models—which predict that reinsurance reducing only cost volatility (not expected cost) should leave prices unchanged—miss an important mechanism: insurers internalizing the implicit financial cost of bearing claims uncertainty through “risk charges.”

The paper develops a stylized monopoly-insurer model in which the insurer’s objective includes both expected claims cost and a risk charge term L(S), where S is a risk measure (e.g., standard deviation of total claims). This yields a first-order condition in which effective marginal cost includes both standard expected claims cost and a marginal risk charge. The model predicts that public reinsurance acts through two distinct channels: (1) a cost subsidy—reimbursing a share of high-cost claims reduces expected cost; and (2) risk protection—reducing the variance of claims lowers the risk charge and thus effective marginal cost. When both channels operate, the model predicts pass-through of public reinsurance to premiums can exceed unity, in contrast to the standard less-than-one pass-through under market power.

Empirically, the authors use three primary data sources for the U.S. individual health insurance exchange market. NAIC Schedule S filings (2014–2023) provide transaction-level private reinsurance contracts, including ceded premiums, realized claims, and financial solvency measures. CMS Public Use Files and MLR reports provide plan-level premiums, enrollment, and claims. The Colorado All Payer Claims Database (CO APCD, 2014–2022) and Connect for Health Colorado administrative records (2015–2021) provide individual-level claims and insurance choices for structural analysis.

Descriptive evidence establishes that 62% of exchange insurers purchase private reinsurance despite average reinsurance markups of 1.54 (reinsurance margin of 0.54), and that smaller, less financially solvent insurers are disproportionate buyers—consistent with risk charges driving demand for risk protection even at above-actuarially-fair prices.

An event study exploiting staggered adoption of state-level public reinsurance programs finds that public reinsurance reduces premiums by approximately 14.5% on average (27% in Colorado Tiers 1–2, 46% in Tier 3), with a pass-through rate of 1.3—significantly greater than one (p = 0.037 one-sided). Public reinsurance reduces the probability of purchasing private reinsurance by 26 percentage points (a 42% reduction from baseline) and per-member private reinsurance expenditures by $19.5 (a 68% reduction from baseline). Premium and private reinsurance effects are larger for financially constrained insurers (RBC ratio below 3). No significant effects are found on insurer entry/exit, total medical expenses (ruling out moral hazard), or private reinsurance markups.

The structural model, estimated on the Colorado exchange for 2017–2020, finds that the risk charge coefficient for regional insurers averages rho = 0.25, implying regional insurers face 9.8% higher effective costs than national insurers due to risk charges and private reinsurance expenses. Risk charges account for at least half the premium-cost wedge for small regional insurers. Counterfactual decomposition of Colorado’s program shows the direct cost subsidy accounts for approximately 75% of equilibrium price reductions; risk protection and competition effects together account for the remaining 25%. In a bang-for-buck comparison, public reinsurance dominates premium subsidies of equal government expenditure by approximately 20–30%, because reinsurance uniquely reduces risk charges and enhances competition by reducing smaller regional insurers’ cost disadvantage.

Q: What is the core theoretical innovation of the paper? A: The paper adds a risk charge term L(S) to the standard expected-profit objective, where S is a risk measure of the insurer’s cost distribution. This makes the insurer behave “as if risk averse,” with effective marginal cost including both expected claims cost and a marginal risk charge that decreases with insured pool size due to risk pooling. When rho = 0, the model collapses to the standard monopoly case; when rho > 0, cost uncertainty directly inflates prices and creates a novel role for reinsurance even when reinsurance is actuarially fairly priced.

Q: What are the two distinct mechanisms through which public reinsurance affects insurer pricing? A: The first is a cost subsidy: by reimbursing a portion of high-cost claims without requiring an actuarially fair premium upfront, public reinsurance lowers the insurer’s net expected cost. The second is risk protection: by providing ex-post payments for extreme health shocks, reinsurance reduces the variance of claims costs, lowering the risk charge component of effective marginal cost. Together, these channels can produce pass-through exceeding unity even under imperfect competition, where standard cost-subsidy pass-through is typically below one.

Q: What does Proposition 1 say about actuarially fair reinsurance (theta = 1)? A: Proposition 1(i) states that actuarially fair reinsurance—which does not alter net expected cost—still lowers the insurer’s price if and only if the insurer faces a risk charge (rho > 0). An insurer without risk charges is entirely unaffected by actuarially fair reinsurance. This result isolates the risk-protection channel as theoretically distinct from cost subsidization and establishes that pass-through exceeding one requires risk charges to be operative.
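
The logic of Proposition 1(i) can be reproduced in a toy monopoly problem. Everything specific in the sketch is an assumption: linear demand Q = 1000 - p, independent enrollee risks so that S = sigma * sqrt(Q), a linear risk charge L(S) = rho * S, and actuarially fair reinsurance modeled as a cut in sigma with expected cost per enrollee unchanged.

```python
import math

def optimal_price(rho, sigma, cost=400.0, intercept=1000.0, step=0.1):
    """Grid-search the profit-maximizing premium for a stylized monopoly insurer.

    Profit = (p - cost) * Q - rho * sigma * sqrt(Q), with Q = intercept - p.
    All functional forms and parameter values are illustrative assumptions.
    """
    best_price, best_profit = None, -math.inf
    n_steps = int(round((intercept - cost) / step))
    for k in range(n_steps + 1):
        price = cost + k * step
        enrollees = max(0.0, intercept - price)
        profit = (price - cost) * enrollees - rho * sigma * math.sqrt(enrollees)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price

# Fair reinsurance halves claims volatility but leaves expected cost at 400.
no_charge_before = optimal_price(rho=0.0, sigma=2000.0)
no_charge_after = optimal_price(rho=0.0, sigma=1000.0)
charge_before = optimal_price(rho=0.25, sigma=2000.0)
charge_after = optimal_price(rho=0.25, sigma=1000.0)
```

Without a risk charge the premium is the same before and after reinsurance. With rho = 0.25 the premium starts above the no-charge level and falls when volatility is reduced, which is the risk-protection channel in isolation.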

Q: Why would an insurer purchase costly private reinsurance (theta > 1)? A: Proposition 1(iii) shows that an insurer with no risk charge would never purchase private reinsurance with theta > 1, since it increases net expected cost with no offsetting benefit. An insurer facing a risk charge (rho > 0) may purchase private reinsurance because the risk-protection benefit—the reduction in cost variance and thus the risk charge—can outweigh the net cost increase. The paper documents that 62% of exchange insurers buy private reinsurance at an average markup of 1.54 (reinsurance margin 0.54), with smaller and financially weaker insurers more likely to purchase, consistent with this mechanism.

Q: How does the paper establish empirically that insurers face and internalize cost uncertainty? A: Three lines of evidence are presented. First, the CO APCD shows the claims distribution has a long right tail: the top 5% (1%) of consumers account for 68% (38%) of total expenses, and 2.5% of consumers exceed the $30,000 reinsurance threshold. Second, simulations show that with 1,000 enrollees, the probability that realized claims exceed expected costs by 25% is approximately 7%; even at 10,000 enrollees there is a 17% probability of exceeding expected costs by 5%. Third, in over 24% of insurer-year observations premium revenue falls short of realized claims costs, and the within-firm standard deviation of the claims-to-premium ratio is 0.15.
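
A simulation of this kind can be sketched as follows. It is not the authors' procedure: individual claims are assumed to be independent lognormal draws with a hypothetical shape parameter, so the resulting probabilities are illustrative and are not expected to match the figures quoted above.

```python
import math
import random

def exceedance_probability(n_enrollees, excess, n_sims=200, sigma=2.1, seed=0):
    """Share of simulated pools whose total claims exceed expected claims
    by more than `excess` (0.25 means 25%).

    Claims are i.i.d. lognormal(0, sigma); sigma is an illustrative choice
    that produces a long right tail.
    """
    rng = random.Random(seed)
    expected_claim = math.exp(sigma ** 2 / 2)        # lognormal mean with mu = 0
    threshold = (1 + excess) * expected_claim * n_enrollees
    hits = 0
    for _ in range(n_sims):
        total = sum(rng.lognormvariate(0.0, sigma) for _ in range(n_enrollees))
        if total > threshold:
            hits += 1
    return hits / n_sims

p_small_pool = exceedance_probability(1_000, 0.25)
p_large_pool = exceedance_probability(10_000, 0.25)
print(p_small_pool, p_large_pool)
```

The comparison of interest is qualitative: a large overrun is more likely in the smaller pool, because pooling more enrollees averages out individual shocks, yet a heavy tail keeps the risk from vanishing quickly as the pool grows.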

Q: What are the event study findings on premiums? A: Using staggered introduction of state-level public reinsurance programs, the event study finds premiums fell by 14.5% on average following program adoption. In Colorado specifically, Tiers 1 and 2 experienced 27% decreases and Tier 3 (highest reinsurance generosity) experienced a 46% decrease. The implied pass-through rate for 2020 is 1.3, meaning for every dollar the government spent on reinsurance, health insurance premiums fell by $1.30. A one-sided t-test rejects pass-through equal to one at p = 0.037.

Q: What are the event study findings on private reinsurance? A: Public reinsurance reduces the probability that an insurer purchases private reinsurance by 26 percentage points, a 42% decline from the pre-program baseline. Average per-member private reinsurance expenditures fall by $19.5, a 68% reduction from baseline. The substitution away from private reinsurance is consistent with the model prediction that public reinsurance displaces the demand for risk protection previously met by private markets, and reinforces the interpretation that risk management is a key driver of private reinsurance demand.

Q: Do financially constrained insurers respond differently to public reinsurance? A: Yes. The premium-reduction effect is significantly larger for insurers with RBC ratios below 3 (an additional interaction effect of -0.161 log points on top of the baseline -0.135). The reduction in per-member private reinsurance expenditures is also significantly larger for insurers with significant prior private reinsurance purchases (-$108.8 vs. baseline of -$19.5). This heterogeneity supports the hypothesis that the risk protection channel is more valuable for financially constrained insurers who face higher implicit costs of bearing risk.

Q: Does public reinsurance affect insurer entry/exit, moral hazard, or private reinsurance markups? A: The event study finds no statistically significant effect on market entry, total monthly medical expenses per enrollee, the probability that individual expenses exceed the reinsurance threshold (ruling out insurer moral hazard), or private reinsurance markups paid by primary insurers. These null results support the interpretation that premium reductions reflect reduced cost uncertainty rather than cost containment distortions, and that the competitive structure of the private reinsurance market is not directly altered by public programs.

Q: What are the structural estimates of risk charges? A: The estimated risk charge coefficient for regional insurers averages rho = 0.25. This implies that regional insurers incur, on average, 9.8% higher effective costs than national insurers (who are assumed not to face risk charges due to scale and diversification), stemming from both direct risk charges and private reinsurance expenses required to manage risk. Risk charges account for at least half the observed wedge between premiums and marginal claims costs for small regional insurers.

Q: How does the structural model decompose the impact of Colorado’s reinsurance program? A: Counterfactual analysis decomposes the equilibrium price reduction into three channels. The direct cost subsidy effect—reimbursing a share of high-cost claims between the $30,000 attachment point and $400,000 cap—accounts for approximately 75% of the price reduction. The risk protection effect (reduction in risk charges from lower portfolio variance) and the competition effect (smaller regional insurers facing lower cost disadvantages and competing more aggressively with national insurers) together account for the remaining 25% of the equilibrium price reduction.

Q: How does public reinsurance compare to premium subsidies in bang-for-buck terms? A: For equal government expenditure, public reinsurance is estimated to be approximately 20–30% more cost-effective than premium subsidies at reducing premiums. The advantage stems from two sources: reinsurance reduces risk charges, shifting down the marginal cost curve for regional insurers in a way demand-side premium subsidies do not; and reinsurance enhances competition by reducing the cost disadvantage of smaller regional insurers relative to national ones. The dominant effect is risk reduction rather than markup inflation, making reinsurance the more efficient instrument when the degree of financial risk is considerable.

Q: What is the role of market size in risk charges, and why does this create a competitive asymmetry? A: The model shows that the marginal risk charge decreases as the insured population grows (risk pooling), with marginal standard deviation equal to sigma_0 / (2*sqrt(q)), which vanishes as q approaches infinity. This implies that larger national insurers, covering very large populations, effectively face no risk charges, while smaller regional insurers face meaningful marginal risk charges. This size-asymmetry is the fundamental reason why public reinsurance disproportionately benefits smaller insurers—by reducing their risk charges, it narrows the cost gap with national insurers and intensifies competition.
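
The size asymmetry can be made concrete with a short sketch. It assumes the total risk charge is rho times the standard deviation of total claims and that this standard deviation equals sigma_0 * sqrt(q), which is the form whose derivative gives the marginal expression quoted above. The value of `sigma_0` is a hypothetical illustration; rho = 0.25 is the average estimate reported in this summary.

```python
import math


def total_risk_charge(q, rho, sigma_0):
    """Risk charge rho * SD(total claims), assuming SD = sigma_0 * sqrt(q)."""
    return rho * sigma_0 * math.sqrt(q)


def marginal_risk_charge(q, rho, sigma_0):
    """Derivative of the total risk charge with respect to enrollment q."""
    return rho * sigma_0 / (2.0 * math.sqrt(q))


rho = 0.25           # average estimate for regional insurers in this summary
sigma_0 = 20_000.0   # hypothetical per-enrollee claims SD in dollars

for q in (1_000, 10_000, 1_000_000):
    print(f"q = {q:>9,}: marginal risk charge = "
          f"{marginal_risk_charge(q, rho, sigma_0):.2f}")
```

Under these assumptions the marginal risk charge per additional enrollee falls with the square root of enrollment, so a very large insurer faces a marginal charge close to zero.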

Q: What scope conditions apply to the structural findings? A: The structural estimates are based on the Colorado individual health insurance exchange, covering years 2017–2020, chosen to avoid unsatisfactory early data quality and to net out systematic pandemic effects. The model assumes national insurers do not face risk charges in the baseline specification, and that aggregate (correlated) risk is not the primary driver during the sample period. Results are robust to staggered-treatment corrections (Callaway-Sant’Anna 2021; Borusyak et al. 2024), alternative outcome measures (benchmark premiums, Silver plan averages), alternative aggregation levels, and sensitivity analyses allowing for insurer entry/exit, correlated risks, moral hazard, and alternative risk charge functional forms.

Q: What are the broader policy implications of the framework? A: The framework applies to any market where firms face substantial cost uncertainty and internalize financial risk, including property and casualty insurance, flood insurance, wildfire insurance, and government loan guarantee programs. The analysis suggests that ignoring the risk protection channel causes policymakers to underestimate the effectiveness of public reinsurance relative to demand-side subsidies. Supply-side risk-sharing policies are particularly important for markets with small, financially constrained firms, where cost uncertainty most severely distorts pricing and competition, and where the competitive benefits of risk reduction are largest.

Risk Charge: An additional cost term in the insurer’s objective function representing the implicit financial cost of bearing claims uncertainty, formalized as L(S) where S is a risk measure of total cost. Risk charges make the insurer behave “as if risk averse,” raising effective marginal cost above expected claims cost. In the baseline model the risk charge equals rho times the standard deviation of total claims.

Risk Charge Coefficient (rho): The parameter governing the insurer’s marginal cost of financial risk, estimated structurally at an average of 0.25 for regional insurers in Colorado. It can be interpreted as either a direct risk-aversion parameter, the marginal cost of regulatory capital, or a reduced-form representation of financial and regulatory frictions that make bearing cost uncertainty costly.

Risk Protection Channel: The mechanism through which reinsurance (public or private) reduces claims cost variance and thereby lowers the insurer’s risk charge, distinct from the cost-subsidy channel. The risk protection channel is operative even for actuarially fair reinsurance (theta = 1) and is responsible for pass-through rates exceeding unity under public reinsurance programs.

Cost Subsidy Channel: The mechanism through which subsidized public reinsurance (theta less than 1) lowers the insurer’s net expected claims cost by reimbursing a share of high-cost claims without charging an actuarially fair premium. This channel operates regardless of whether the insurer faces risk charges and is the primary channel in standard models.

Pass-Through Rate: The ratio of premium reduction to government expenditure on reinsurance. In standard models with market power, pass-through of cost subsidies is typically below one; the paper documents a pass-through rate of 1.3 in Colorado (p = 0.037 for the null of pass-through equal to one), attributing the excess to the risk protection channel reducing both expected cost and cost uncertainty simultaneously.

Stop-Loss Reinsurance: A contract structure in which the reinsurer reimburses the primary insurer for individual claims costs exceeding a deductible (attachment point) kappa up to a cap. In Colorado’s program the attachment point is $30,000 and the cap is $400,000, with government coinsurance rates of 40–80% depending on county tier. More generous reinsurance corresponds to lower kappa; full reinsurance is kappa = 0.
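
A minimal sketch of the payout rule described in this entry follows. The function name and the default coinsurance rate of 50% are hypothetical (the rate is simply a value inside the 40–80% range); the attachment point and cap are the Colorado figures quoted above.

```python
def stop_loss_reimbursement(claim, attachment=30_000.0, cap=400_000.0,
                            coinsurance=0.5):
    """Reinsurer payment on one enrollee's annual claims.

    Pays `coinsurance` times the part of the claim that lies between the
    attachment point and the cap. The 0.5 default is an illustrative value,
    not a figure from the paper.
    """
    if claim < 0:
        raise ValueError("claim must be non-negative")
    covered_layer = max(0.0, min(claim, cap) - attachment)
    return coinsurance * covered_layer


for claim in (20_000, 100_000, 1_000_000):
    print(claim, stop_loss_reimbursement(claim))
```

A claim below the attachment point receives nothing, and the payment stops growing once the claim passes the cap, so the insurer retains the risk on both sides of the covered layer.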

Risk-Based Capital (RBC) Ratio: The ratio of capital surplus (assets minus liabilities) to required risk-based capital, used by NAIC as a measure of insurer solvency. NAIC scrutinizes companies with RBC ratios below 200%; the paper uses RBC ratio below 3 as a proxy for financial constraint in heterogeneity analysis, finding larger premium and private reinsurance responses among constrained insurers.

Tail-End Risk: The risk arising from the possibility that a small fraction of enrollees incurs extremely high medical costs, concentrated in the right tail of the claims distribution. In Colorado, the top 5% of consumers account for 68% of total expenses; tail-end risk is especially severe for small insurers with fewer than 10,000–100,000 enrollees and is the primary motivation for private reinsurance purchases even at above-actuarially-fair prices.

International Reserve Management Under Rollover Crises

The paper extends the Cole-Kehoe (2000) sovereign rollover crisis model to include international reserves and derives the joint optimal management of sovereign debt and reserves in a small open economy subject to potential creditor coordination failure. The central results are: (i) reserves are only valuable as a rollover-crisis defense when debt has sufficiently long maturity; (ii) the optimal exit path from the crisis zone requires holding zero reserves while gradually reducing debt, then jumping simultaneously to the optimal safe pair (a*, b*) by issuing new debt while accumulating reserves; (iii) this seemingly paradoxical debt-financed reserve accumulation lowers bond spreads because it moves the economy fully into the safe zone.

Environment: The government issues long-maturity bonds with Macaulay duration 1/δ (δ=1 is one-period debt; δ→0 is a consol). In each period, creditors decide whether to roll over. If the economy is in the crisis zone C (defined below), a sunspot ζ ∈ {0,1} with P(ζ=1) = λ determines whether a coordination failure occurs: if ζ=1 and the government is in C, creditors refuse to roll over, and the government must use reserves to service debt; if reserves are insufficient, the government defaults. The government also holds reserves a ≥ 0 earning the risk-free rate r.

Three-zone structure (Definition 1, Figure 1): the debt-reserve space (b,a) is partitioned into:

Safe zone S: b < b−(a) — government can meet its debt obligations even if the rollover crisis sunspot realizes (ζ=1); reserves are sufficient to cover the redemption shortfall
Crisis zone C: b−(a) ≤ b ≤ b+(a) — a rollover crisis is possible but not inevitable; if ζ=1, the government defaults unless reserves cover the gap; if ζ=0, the government refinances normally
Default zone D: b > b+(a) — the government defaults regardless of the sunspot because its debt burden exceeds any feasible repayment
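
The partition can be sketched as a small classifier. The linear boundary functions below are purely hypothetical placeholders: in the paper the boundaries b−(a) and b+(a) come out of the model's equilibrium. They are chosen only so that both rise with reserves and the safe/crisis boundary is the steeper one, and they are meant for small reserve levels (the two lines cross at a = 0.1).

```python
def classify_zone(b, a, b_minus, b_plus):
    """Return the zone of the debt-reserve pair (b, a).

    b_minus and b_plus are callables giving the safe/crisis and
    crisis/default boundaries as functions of reserves a.
    """
    lower, upper = b_minus(a), b_plus(a)
    if b < lower:
        return "safe"
    if b <= upper:
        return "crisis"
    return "default"


def b_minus(a):
    # Hypothetical safe/crisis boundary (illustration only).
    return 0.80 + 3.0 * a


def b_plus(a):
    # Hypothetical crisis/default boundary, flatter than b_minus.
    return 1.00 + 1.0 * a


print(classify_zone(0.93, 0.00, b_minus, b_plus))  # same debt, no reserves
print(classify_zone(0.93, 0.05, b_minus, b_plus))  # same debt, some reserves
print(classify_zone(1.20, 0.00, b_minus, b_plus))  # high debt, no reserves
```

With these illustrative boundaries, the same debt level moves from the crisis zone to the safe zone once reserves are added, which is the mechanism the zone definitions are meant to capture.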

Proposition 2 — Reserves expand the safe zone: Both boundaries b−(a) and b+(a) are increasing in reserves a. The slope of b−(a) with respect to a is steeper than the slope of b+(a), so as reserves rise: the safe zone expands, the crisis zone narrows, and the default zone shrinks. Reserves improve debt sustainability by shifting both zone boundaries to higher debt levels, but the benefit falls with debt because high-debt governments are closer to the default zone where reserves cannot compensate.

Proposition 3 — Positive reserves require long debt maturity: Optimal reserves a* > 0 requires that debt maturity is long enough (condition (18): δ < δ̄ for some threshold δ̄ < 1). The intuition is mechanical: if there is a rollover crisis with one-period debt (δ=1), the government must immediately repay the full face value b of all outstanding bonds; moderate reserve stocks a « b cannot cover this, making reserves useless. With long-maturity debt (δ<1), a rollover crisis only forces repayment of the near-term cash flow (δb plus coupon), which a much smaller reserve buffer a can cover. Hence reserves only provide value — and are only demanded — when debt has sufficient duration.

Proposition 4 — No reserves with one-period debt: When δ=1 (pure short-term debt), the optimal reserve level is zero: a* = 0. This follows directly from Proposition 3: with one-period debt, δ = 1 exceeds the threshold δ̄ < 1 (that is, maturity is too short), so no feasible reserve level can expand the safe zone.

Proposition 5 and Corollary 1 — Optimal exit strategy: The optimal exit path from the crisis zone is non-monotone in reserves:

While in the crisis zone, hold zero reserves (a=0) and reduce debt b through primary surpluses
Continue reducing debt until the government can reach the optimal safe pair (a*, b*) in a single period
In that final period, simultaneously issue new debt (increase b) AND accumulate reserves (increase a to a*), jumping directly from the crisis zone to (a*, b*)

The counterintuitive simultaneous debt issuance in step 3 lowers bond spreads immediately because the reserve accumulation moves the economy firmly into the safe zone, eliminating rollover risk for creditors who then demand a lower yield premium. The optimal path delays all reserve accumulation until this transition step — building reserves gradually while in the crisis zone is suboptimal because partial reserves still leave the economy vulnerable to sunspot crises while incurring the return cost of holding low-yield liquid assets.

Proposition 6 — One-period exit condition: If the government’s current net foreign asset position NFA = a − q·b exceeds the NFA at (a*, b*), the government can exit the crisis zone in a single period.
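
The exit condition amounts to a single comparison, sketched below. Which bond price enters each side is not spelled out in this summary, so the sketch takes the current price `q` and the safe-zone price `q_star` as separate inputs; both price values, and the current debt levels used in the example, are hypothetical. The safe pair (a*, b*) = (0.05, 0.93) is the calibrated pair reported in this summary.

```python
def net_foreign_assets(a, b, q):
    """NFA = a - q * b: reserves minus the market value of debt."""
    return a - q * b


def can_exit_in_one_period(a, b, q, a_star, b_star, q_star):
    """True if the current NFA exceeds the NFA at the safe pair (a*, b*)."""
    current = net_foreign_assets(a, b, q)
    target = net_foreign_assets(a_star, b_star, q_star)
    return current > target


a_star, b_star = 0.05, 0.93   # calibrated safe pair from this summary
q_now, q_safe = 0.90, 0.95    # hypothetical bond prices

print(can_exit_in_one_period(0.0, 1.00, q_now, a_star, b_star, q_safe))
print(can_exit_in_one_period(0.0, 0.90, q_now, a_star, b_star, q_safe))
```

In this illustration the higher-debt government fails the condition and must keep reducing debt, while the lower-debt government can make the jump immediately.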

Calibration (Italy 2012 sovereign debt crisis as the target economy):

Endowment: y = 1 (normalized); relative risk aversion: σ = 2; risk-free rate: r = 3% annually; discount factor: β = (1+r)^{−1}
Debt maturity: 1/δ = 7 years (corresponding to Italy’s average debt maturity in 2012)
Default cost: consumption floor c = 0.70 (government can guarantee 70% of normal consumption even in default, with the residual representing trade balance adjustment and output losses)
Rollover crisis probability: λ = 0.5% per quarter (calibrated to historical sovereign crisis frequency in the data)
Crisis zone midpoint parameter ϕ calibrated to set the midpoint of the crisis zone at a debt level of 90% of GDP (consistent with Italy’s 2012 position at the crisis zone boundary)
Optimal safe pair: a* = 0.05 (5% of GDP in reserves); b* = 0.93 (93% of GDP in debt)
With reserves a = a*: bond price at b = b* is higher than without reserves; the b+(a) boundary shifts outward, confirming reserves improve debt sustainability
Without reserves (a=0): for the same debt level b = b*, bond price is lower and rollover risk is higher — the counterfactual quantifies the reserves premium

Sensitivity analysis:

Shorter debt maturity (1/δ = 4 years): optimal reserves rise substantially, to approximately 30% of GDP, because shorter maturity means the government must cover a larger fraction of face value in a rollover crisis
Higher risk aversion (σ > 2): optimal reserves increase (the welfare cost of default is higher, raising demand for precautionary reserves)
Higher default cost (lower consumption floor c): optimal reserves decrease (default is so costly that the government keeps its debt stock small enough to remain in the safe zone even without reserves)

Policy implication: The standard IMF prescription to immediately accumulate reserves after a sovereign crisis is suboptimal for highly indebted governments. The paper prescribes the opposite sequence: first reduce debt through fiscal adjustment until the government can jump to (a*, b*) in a single step, then execute the jump by simultaneously issuing debt and accumulating reserves. Importantly, this jump increases both debt and reserves relative to the pre-jump position but is welfare-improving because it eliminates rollover risk — the yield reduction from entering the safe zone more than offsets the higher debt service.

Scope conditions: The model abstracts from: reserves serving exchange rate management or import coverage purposes (only rollover crisis defense is modeled); a domestic banking sector; capital controls; debt renegotiation after default (default is assumed final). The rollover crisis mechanism is purely self-fulfilling (no fundamental triggers); the calibration is specific to Italy’s 2012 maturity structure, output level, and crisis zone midpoint.

In depth

Q1. What are the three zones, and how do reserves shift their boundaries?

The safe zone S is the set of (b,a) pairs where the government can repay even under a rollover crisis sunspot (ζ=1), because reserves cover the financing shortfall; the crisis zone C is where self-fulfilling rollover crises are possible but not inevitable (government survives if ζ=0); the default zone D is where the government defaults regardless of the sunspot because debt exceeds any payable amount. Reserves shift both boundaries of the crisis zone to higher debt levels (Proposition 2), with the S/C boundary b−(a) rising more steeply than the C/D boundary b+(a), so the safe zone expands and the crisis zone narrows as reserves increase. This shift is the core channel through which reserves improve debt sustainability: at any given debt level b, a higher a makes it more likely that b < b−(a) (i.e., the economy is in the safe zone).

Q2. Why do reserves only matter for long-maturity debt?

With one-period debt, a rollover crisis forces immediate repayment of the full face value b, a total that any realistic reserve stock a « b cannot cover, so reserves provide zero marginal benefit against rollover risk. With long-maturity debt (duration 1/δ), a rollover crisis only requires repayment of the current-period obligation (δb + coupon), which scales with δ; as δ → 0 (near-perpetuity), this obligation becomes arbitrarily small and any positive reserve stock can cover it. Proposition 3 formalizes this by showing that a* > 0 requires δ < δ̄ (an upper bound on δ, equivalently a minimum maturity of 1/δ̄), and Proposition 4 confirms that δ=1 (one-period debt) implies a*=0 regardless of other parameters.

Q3. Why should a government in the crisis zone hold zero reserves?

Holding reserves while in the crisis zone is costly because reserves earn the risk-free rate r, which is lower than the sovereign’s borrowing rate (which includes a rollover risk premium); the cost of holding reserves is therefore the spread between the sovereign’s borrowing cost and the risk-free rate. The benefit of reserves while in the crisis zone is partial: positive reserves reduce the probability of default in a rollover crisis but do not eliminate rollover risk entirely (the economy remains in C for moderate a). The return on accumulating reserves jumps discontinuously when crossing from C into S — only in the safe zone do reserves entirely eliminate rollover risk. Hence the optimal strategy concentrates all reserve accumulation at the transition step when the economy crosses into the safe zone.

Q4. Why does the optimal exit involve simultaneously issuing debt and accumulating reserves?

The jump to (a*, b*) requires the government to reach a higher reserve level a* and a higher-than-current debt level b* simultaneously; b* > current b because (a*, b*) is inside the safe zone at a debt level the government can afford, not at the minimum possible debt level. The debt issuance at the moment of transition is financed at the safe-zone bond price (lower spread) rather than the crisis-zone price, making the gross financing cost of the extra debt affordable. More importantly, the simultaneous reserve accumulation moves the economy into the safe zone, raising the bond price immediately: creditors see that a = a* makes b = b* safe, and they lower the yield premium accordingly. This feedback means the jump is self-financing in terms of expected debt service: the yield reduction partially covers the cost of holding reserves.

Q5. Why is the IMF prescription of immediate reserve accumulation suboptimal?

The standard prescription is to begin accumulating reserves as soon as a crisis episode passes, which keeps the government in the crisis zone longer (because reserve accumulation diverts fiscal resources from debt reduction) while paying the spread cost on all reserves held at crisis-zone yields. The paper’s prescription is to instead prioritize debt reduction until the government can make the one-step exit (Proposition 6: NFA(current) > NFA(a*, b*)), then execute the jump. This path reaches the safe zone with total lower expected cost because: (i) time spent in the crisis zone is minimized; (ii) the carry cost of reserves (spread between borrowing rate and safe asset return) is paid only for the brief period of the transition, not throughout the exit path.

Q6. How do reserves affect bond prices and spreads?

Reserves reduce sovereign spreads through two channels: (i) a direct precautionary channel, in which, for a government already in the safe zone, reserves make the safety guarantee more credible and support the high bond price; (ii) a zone-transition channel, in which crossing from the crisis zone to the safe zone by accumulating reserves to a* eliminates the rollover risk premium that was embedded in crisis-zone yields. In the calibration, at Italy’s 2012 debt level (≈127% of GDP), zero reserves implies the government is in the crisis zone or default zone, so bonds trade at distressed prices. At the calibrated safe pair (a*=5%, b*=93%), bonds price at the risk-free rate plus a default risk premium that excludes rollover-crisis risk. The counterfactual (same b*, a=0) yields a lower bond price, quantifying the reserves’ contribution to debt sustainability.

Q7. What does the Italy 2012 calibration imply for actual Eurozone crisis management?

Italy’s 2012 debt-to-GDP ratio of approximately 127% places it well above the optimal target b*=93%, suggesting Italy was not in the safe zone even had it held substantial reserves; the primary prescription for Italy at that moment (debt reduction, not reserve accumulation) follows directly from the model’s exit strategy (Propositions 5-6). The model also implies that European bailout mechanisms (ESM, OMT) shifted the effective boundary of the safe zone by providing contingent external reserves, consistent with the empirical observation that ECB President Draghi’s “whatever it takes” announcement in July 2012 moved Italy’s bond yields toward safe-zone pricing without any actual reserve or debt movement.

Key concepts

rollover crisis : a self-fulfilling coordination failure in which creditors refuse to roll over maturing sovereign debt not because solvency fundamentals require default but because they expect other creditors to refuse; modeled by a sunspot ζ=1 with probability λ that triggers a crisis when the economy is in the crisis zone C.

safe zone : the set of (b,a) pairs where the government can service its debt even under the worst-case sunspot (ζ=1); defined by b < b−(a); entering the safe zone eliminates rollover risk entirely and immediately lowers bond yields to the risk-free rate plus a pure credit-risk premium.

crisis zone : the set of (b,a) pairs where rollover crises are possible but not certain; b−(a) ≤ b ≤ b+(a); the government survives if ζ=0 but defaults if ζ=1; bonds are priced to include a rollover risk premium while in this zone.

optimal exit strategy : Proposition 5 and Corollary 1 — the welfare-maximizing path out of the crisis zone; involves holding zero reserves while reducing debt, followed by a simultaneous jump to (a*, b*) that increases both reserves and debt, moving the economy immediately to the safe zone and eliminating rollover risk in a single step.

long-maturity debt advantage : the property (Proposition 3) that reserves only provide rollover-crisis protection when debt has sufficiently long maturity (δ < δ̄); with short-maturity debt, a rollover crisis forces repayment of the full face value, which no realistic reserve stock can cover; with long-maturity debt, only the near-term cash flow must be covered.

debt-financed reserve accumulation : the seemingly paradoxical simultaneous issuance of new long-maturity bonds and accumulation of reserves at the moment of exit (a=0→a*, b<b*→b*); welfare-improving because the jump moves the economy into the safe zone, lowering bond yields immediately and making the higher debt affordable.

Investing in Influence: Investors, Portfolio Firms, and Political Giving

This paper investigates whether institutional investors influence the political activities of their portfolio firms, using political action committee (PAC) giving as a window into the broader question of whether institutional investors can leverage their concentrated ownership to extract benefits from portfolio firms for their own interests rather than those of their clients.

The sample covers 574 institutional investors (those with at least $100 million in assets under management, i.e., 13-F filers) matched to 2,456 portfolio firms that had PACs, over the period 1980–2018. The primary source of variation is the first acquisition by an institutional investor of at least one percent of a portfolio firm’s outstanding shares, yielding 68,387 large acquisition events. PAC giving data come from FEC records matched by name to investor and firm entities. The main regression specification examines how the relationship between investor and firm PAC contributions to the same congressional district changes after such an acquisition, using a saturated set of fixed effects including firm × investor, firm × congressional district, firm × election cycle, investor × congressional district, investor × election cycle, and district × election cycle.

The central finding is that, following a large block purchase, a firm’s PAC giving mirrors more closely that of the acquiring investment management company. In the preferred specification (column 8 of Table 2), the probability that a portfolio firm gives to a politician supported by its investor’s PAC increases by 31 percent after an acquisition. Using a cosine similarity measure of investor-firm PAC giving, the mean similarity of 0.10 at the acquisition cycle rises by 0.02–0.03 (a 20–30 percent increase) by the fourth post-acquisition election cycle.
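
For readers unfamiliar with the similarity measure, the sketch below computes cosine similarity between two PAC-giving vectors, with one entry per congressional district. The contribution amounts are made up for illustration, and returning 0.0 when either party gives nothing is an assumed convention rather than the paper's.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two contribution vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumed convention when one side gives nothing
    return dot / (norm_x * norm_y)


# Hypothetical dollar contributions to four congressional districts.
investor = [5000, 0, 2500, 0]
firm_before = [0, 1000, 0, 3000]
firm_after = [1000, 1000, 500, 3000]

print(cosine_similarity(investor, firm_before))
print(cosine_similarity(investor, firm_after))
```

In this made-up example the firm initially supports none of the investor's districts, so similarity is zero; once it starts giving to two of them, similarity rises even though its other giving is unchanged.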

A key identification concern is that acquisitions may be driven by shared political preferences rather than representing a causal effect. To address this, the authors exploit stock index inclusions as exogenous shifters of institutional investor block purchases: when a firm is added to an index for the first time, passive indexers are compelled to rebalance toward that firm regardless of political alignment. Restricting to 5,601 index-inclusion acquisitions by passive investors, the authors find near-identical effect sizes (beta1 = 0.0132 in column 8 versus 0.0135 in the full sample), and an event study shows no pre-trend in giving convergence for the index subsample, in contrast to a slight pre-trend in the full sample. Divestment events exhibit the symmetric negative pattern: the coefficient on the interaction of post-divestment and investor PAC giving is negative, ranging from -0.074 to -0.058 across specifications.

The authors argue that investors drive the convergence, rather than portfolio firms reshaping investors’ giving. Around acquisition dates, firms exhibit a larger drop in between-election-cycle cosine similarity than investors do. In a difference-in-differences comparison of the acquisition period relative to the preceding period, the difference in stability between investors and firms is 0.075 (significant at the 1 percent level), indicating that firms shift their giving more than investors. An investor obtaining a board seat at the portfolio firm amplifies the effect: in the preferred specification, the board-seat interaction is more than twice as large as the acquisition-alone interaction.

Heterogeneity analysis provides evidence that the convergence reflects investors’ partisan tastes rather than coordinated profit-maximizing political strategy. Acquisitions by more partisan investors (those whose giving is more skewed toward one party) produce a convergence coefficient roughly twice as large (0.020) as less partisan investors (0.010). Private fund families show more than twice the convergence effect of publicly owned fund families. The partisan composition of firm giving also shifts: a firm acquired by an investor giving exclusively to Republicans sees its Republican share increase by 2.8 percentage points relative to a baseline of 47.4 percent (a 5.9 percent increase).

Finally, higher overall institutional ownership is associated with an increase in total PAC giving at the firm level, and this expanded giving does not go disproportionately to politicians on committees overseeing issues the firm actively lobbies — suggesting the ownership-driven increment in political spending is non-strategic from the firm’s profit standpoint and likely serves investors’ own interests.

Q: What is the central research question and why does it matter? The paper asks whether institutional investors influence the political giving of portfolio firms, motivated by the broader concern that the rise of institutional ownership — from 6 percent of U.S. public equities in 1950 to 65 percent in 2017 — concentrates not only economic but also political power in the hands of a small number of asset managers. This matters because if investors shape firms’ PAC giving to serve investors’ own preferences rather than firms’ profit interests, it represents a misuse of corporate resources and a potential amplification of a small group’s political voice.

Q: What data are used and how is the sample constructed? The analysis draws on 13-F filings (investors with at least $100M AUM) from Thomson-Reuters, matched to FEC PAC records via fuzzy and manual name matching. The resulting sample contains 574 investors with PACs and 2,456 portfolio firms with PACs, spanning 1980–2018. The Cartesian product of investor-firm pairs is restricted to those connected by at least one large acquisition event (defined as first acquisition of at least 1 percent of outstanding shares), yielding 68,387 such events. PAC contributions are measured at the investor- and firm-congressional-district-election-cycle level, linked to House of Representatives winners using MIT Election Data files.

Q: What is the baseline regression and what does it find? The baseline regression (equation 1) interacts Log Investor PAC with a Post indicator (equal to 1 after the first large acquisition and while the stake is maintained) at the investor-firm-congressional-district-election-cycle level, with a saturated set of fixed effects. The coefficient on the interaction (beta1) is positive and highly significant (p < 0.001) across all eight specifications, ranging from 0.013 to 0.032. In the preferred specification, the increase in giving similarity is 31 percent relative to the pre-acquisition baseline.
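
A hedged sketch of what such a specification could look like is given below. The summary does not reproduce equation 1, so the outcome variable Y (the firm's PAC giving to district d in cycle t), the lower-order terms, and the notation are assumptions; the interaction term and the six fixed effects follow the description in this summary, with i indexing investors, f firms, d congressional districts, and t election cycles.

```latex
\begin{equation*}
\begin{aligned}
Y_{i,f,d,t} ={}& \beta_1 \bigl(\mathrm{Post}_{i,f,t} \times \log \mathrm{InvestorPAC}_{i,d,t}\bigr)
  + \beta_2 \log \mathrm{InvestorPAC}_{i,d,t}
  + \beta_3 \,\mathrm{Post}_{i,f,t} \\
 &+ \alpha_{f \times i} + \alpha_{f \times d} + \alpha_{f \times t}
  + \alpha_{i \times d} + \alpha_{i \times t} + \alpha_{d \times t}
  + \varepsilon_{i,f,d,t}
\end{aligned}
\end{equation*}
```

Under this reading, beta1 measures how much more closely the firm's giving in a district tracks the investor's giving in that district after the acquisition than before it.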

Q: How do the authors establish causality and rule out endogenous acquisitions? The primary identification strategy uses first-time inclusions of firms in stock indices (approximately 1,000 indices tracked in the sample) as exogenous shifters: passive indexers must rebalance toward the included firm regardless of political alignment. This subsample of 5,601 index-inclusion acquisitions produces near-identical coefficient estimates (0.0132 versus 0.0135 in the full sample), and the event study for this subsample shows no pre-trend in giving convergence, unlike the slight pre-trend in the full sample. Equality of the two coefficients cannot be rejected at standard significance levels.

Q: What evidence shows it is firms adjusting to investors rather than the reverse? The authors compute between-election-cycle cosine similarity separately for investors and firms around acquisitions. On average, investors exhibit more stable giving than firms at acquisition dates: Cos(x_{i,t}, x_{i,t+1}) > Cos(x_{f,t}, x_{f,t+1}). The difference-in-differences estimate, which compares the acquisition period to the preceding period, is 0.075 (significant at 1 percent), indicating a relatively larger break in firm giving. Over a two-cycle window, the difference-in-differences estimate is 0.083, again indicating that convergence is driven by firms shifting toward investors rather than the reverse.

Q: What role does board representation play? In approximately 5 percent of acquisitions in the sample, the investor obtains a board seat. In specifications that include both the acquisition effect (Post × Log Investor PAC) and a board-membership interaction (Board × Log Investor PAC), both terms are positive and significant at the 1 percent level. In the preferred specification, the board-seat interaction is more than twice as large as the acquisition-alone interaction, indicating that a direct governance channel — board representation — substantially amplifies the convergence in political giving.

Q: What does the divestment analysis show? Symmetric to the acquisition results, divestment events (where an investor exits a stake of at least 1 percent held for at least one election cycle) are associated with a decline in investor-firm PAC giving correlation. Post-divestment interaction coefficients range from -0.074 to -0.058 across specifications, and an event study confirms the correlation falls sharply after the divestment cycle.

Q: Does investor partisanship affect the magnitude of influence? Yes. Classifying investors as “More Partisan” (above-mean absolute deviation from 50/50 party split) versus “Less Partisan,” the interaction coefficient for More Partisan investors (0.020) is roughly twice that of Less Partisan investors (0.010). After a large acquisition by a fully Republican-giving investor, the acquired firm’s giving to that politician increases by 23.5 percent; the comparable figure for a Less Partisan investor is 7.6 percent. This pattern holds in both the full sample and the index-inclusion subsample.

Q: How do private versus public fund families differ in their influence? Private fund families (e.g., Vanguard, Fidelity) show more than twice the convergence coefficient of publicly owned fund families (e.g., BlackRock, State Street, Invesco). The authors attribute this to private fund managers facing less outside scrutiny, allowing their giving to more readily reflect the preferences of owners and managers. Private investors also show greater partisan polarization: the 10th–90th percentile Republican-giving range for private investors is 6.3–100 percent, versus 21.7–88.3 percent for public investors.

Q: Does increased institutional ownership expand overall firm PAC spending? Yes. In firm-year level regressions, institutional ownership is a positive and significant predictor of total firm PAC giving (significant at the 5 percent level or better in both cross-sectional and firm-fixed-effects specifications). Total corporate political expenditure by sample firms increased by nearly a factor of six over 1980–2018. The authors note that while many factors contribute, increased institutional ownership may be at least partly responsible for this expansion.

Q: Does the additional giving driven by institutional ownership go to strategically important politicians for the firm? No. Regressions relating institutional ownership to giving to politicians on congressional committees overseeing issues the firm actively lobbies (a standard measure of politicians’ strategic importance to firms) yield near-zero and statistically weak point estimates. In the preferred firm-fixed-effects specification, the share of total PAC giving devoted to such strategically relevant politicians is negatively associated with institutional ownership at marginal significance (p < 0.10). This is consistent with the interpretation that ownership-driven incremental political spending is non-strategic from the firm’s own profit perspective and expands total giving rather than displacing strategic giving.

Q: What are the policy and legal implications? The authors flag three concerns: (i) the ownership-driven increment in political spending may represent a misuse of corporate resources that does not serve portfolio firm shareholders; (ii) it may constitute an illegal activity, since using a firm’s PAC to reimburse or proxy for an investor’s own political preferences can run afoul of campaign finance law; and (iii) it is a channel through which unequal resources amplify the political voice of a small number of fund managers at the expense of dispersed ultimate investors who are likely unaware of and do not sanction these contributions. The findings challenge the Supreme Court’s premise in Citizens United that corporate political speech reflects shareholder profit maximization.

PAC comovement (investor-firm giving similarity): The increase in the probability that a portfolio firm’s PAC donates to a politician also supported by an acquiring investor’s PAC, measured as the interaction coefficient between Log Investor PAC and a Post-acquisition indicator in the baseline regression. In the preferred specification this represents a 31 percent increase relative to the pre-acquisition baseline.

Cosine similarity (cross-time and cross-entity): A measure defined as the Euclidean dot product of two vectors of PAC giving divided by the product of their norms (the vectors are either the same entity across adjacent election cycles, or investor versus firm in the same cycle). Because giving amounts are non-negative, it takes values between 0 and 1, where 1 indicates identical giving patterns. Used both to confirm convergence post-acquisition and to attribute that convergence to firm rather than investor adjustment.
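
As a minimal sketch of this measure, the Python snippet below computes cosine similarity as the dot product divided by the product of the Euclidean norms. The function name, the zero-vector convention, and the dollar amounts are hypothetical illustrations, not the authors' code or data.

```python
import math


def cosine_similarity(x, y):
    """Dot product of two giving vectors divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Convention assumed here: an entity that gives nothing has similarity 0.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical PAC dollars given to the same four politicians.
investor = [5000, 0, 2500, 0]
firm_before = [0, 4000, 0, 1000]      # no overlap with the investor
firm_after = [4000, 0, 2000, 1000]    # mostly overlaps with the investor

sim_before = cosine_similarity(investor, firm_before)
sim_after = cosine_similarity(investor, firm_after)
print(sim_before, sim_after)
```

In this made-up example the firm's giving moves toward the investor's recipients, so the similarity rises from zero to a value close to one.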

Index-inclusion acquisition: A large block purchase that results from a firm being added for the first time to a stock index tracked by a passive institutional investor, used as an exogenous shifter of investor stakes that is orthogonal to investor-firm political alignment. There are 5,601 such events in the sample.

Partisanship (investor): Classified as “More Partisan” if an investor’s absolute deviation from a 50/50 party split in PAC donations is above the sample mean. More partisan investors produce roughly twice the convergence effect on portfolio firm giving compared to less partisan investors, used as evidence that personal political preferences rather than profit-maximizing business strategy drive the convergence.
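
The classification rule can be sketched in a few lines of Python. This assumes each investor is summarized by the Republican share of its PAC donations; the function name and the four example investors are hypothetical.

```python
def classify_partisanship(republican_share):
    """Label investors by whether |share - 0.5| exceeds the sample mean deviation."""
    deviation = {name: abs(share - 0.5) for name, share in republican_share.items()}
    mean_deviation = sum(deviation.values()) / len(deviation)
    return {
        name: "More Partisan" if dev > mean_deviation else "Less Partisan"
        for name, dev in deviation.items()
    }


# Hypothetical Republican shares of PAC giving (illustrative only).
shares = {"A": 1.00, "B": 0.55, "C": 0.10, "D": 0.48}
labels = classify_partisanship(shares)
print(labels)
```

Note that the rule is symmetric across parties: an investor giving almost entirely to Democrats is as partisan under this definition as one giving almost entirely to Republicans.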

Post indicator (Post_{ift}): A binary variable equal to 1 for all election cycles following an investor’s first acquisition of at least 1 percent of a portfolio firm’s outstanding shares, and remaining 1 as long as the investor holds any stake in the firm. The key source of temporal variation in the baseline regression.

Strategically important politicians: Members of Congress sitting on committees that oversee issues on which a firm actively lobbies, identified by crosswalking lobbying reports from the Senate Office of Public Records to relevant committee jurisdictions. Used to test whether ownership-driven political giving displaces or supplements firm-profit-motivated giving.

Board seat channel: The mechanism through which investor influence on firm political giving is amplified when the investor obtains representation on the portfolio firm’s board of directors (present in approximately 5 percent of acquisitions). The board interaction coefficient is more than twice the acquisition-alone coefficient in the preferred specification.

Jackknife Standard Errors for Clustered Regression

Mon, 01 Jan 0001 00:00:00 +0000

Hansen (2025) makes a theoretical case for replacing the conventional cluster-robust variance estimator (CRVE) and heteroskedasticity-consistent (HC) standard errors with a specific jackknife variance estimator, V5, in linear regression with heteroskedastic and/or cluster-dependent observations.

The paper identifies two fundamental problems with conventional CRVE1 and CRVE2 estimators. First, these estimators can be fully downward biased: Theorem 2 establishes that the infimum of E[v̂1²]/v² and E[v̂2²]/v² over all admissible regressor and covariance matrix configurations equals zero, meaning the expected variance estimate can be arbitrarily close to zero relative to the true variance. This pathology arises from extreme regressor leverage (specifically, when one cluster dominates the sample). For CRVE1 it holds even under homoskedasticity and clusterwise invertibility; for CRVE2 it additionally requires non-i.i.d. errors. Second, Theorem 5 shows that confidence intervals constructed from CRVE1 and CRVE2 standard errors have worst-case coverage probability equal to zero for any finite critical value c, making them unable to achieve any target coverage level uniformly over regression designs.

Crucially, Hansen shows that even the conventional jackknife estimators V3 and V4, which are already in use (e.g., via Stata’s vce(jackknife) option), share these pathologies when clusterwise noninvertibility is present. Clusterwise noninvertibility occurs when deleting a single cluster renders the regressor matrix singular — as in regressions with cluster-level fixed effects, a single treated cluster, or sparse dummy variables. Stata’s existing fix of simply dropping noninvertible clusters is shown to be insufficient: under clusterwise noninvertibility, the infimum of E[v̂3²]/v² and E[v̂4²]/v² over the broader model class equals zero (Theorem 2, equations 19–20), and the corresponding confidence intervals also achieve worst-case coverage of zero.

The proposed estimator V5 resolves these problems through three modifications to the conventional jackknife: (1) it uses a generalized (Moore-Penrose) inverse rather than dropping noninvertible clusters, ensuring all clusters are included; (2) it centers at the full-sample estimator β̂ rather than the mean of delete-one estimates; and (3) it omits the (G−1)/G degrees-of-freedom correction. Theorem 1 proves that E[V̂5] ≥ V in the positive semidefinite sense for all sample sizes, regressor matrices, and covariance structures, so the estimator is never downward biased. Theorem 3 then shows that, under normal errors, the jackknife-based confidence intervals C̃5(c) have coverage probability of at least P[|ζ| ≤ c], where ζ is a Cauchy random variable, for any c ≥ 1. With the conventional critical value c = 1.96, this guarantees finite-sample coverage of at least 70% and test size of at most 30%, regardless of regression design or error variance structure.

To improve upon the conservative Cauchy bound in practice, the paper proposes a Satterthwaite adjusted t approximation for the jackknife t-ratio. The adjustment derives degrees of freedom K and a scale factor a from the eigenvalue structure of a design-dependent matrix D. Theorem 7 shows that a → 1 and K → ∞ as n → ∞ under mild regularity conditions (no single cluster dominates). Simulation evidence across six regression designs — varying regressor distributions (Normal, LogNormal with cluster dependence, sparse Dummy) and error structures (clustered normal, heteroskedastic) — with G ∈ {6, 12, 40, 100} clusters confirms that the Satterthwaite jackknife interval achieves coverage rates uniformly above 93% at the nominal 95% level even with G = 6, while CRVE1 intervals fall as low as 57% coverage in the LogNormal/heteroskedastic design. The empirical application extends Meng, Qian, and Yared (2015) on Chinese TV access and redistribution preferences, finding that the jackknife standard error for the TV access coefficient exceeds the CRVE1 standard error and the Satterthwaite interval is wider, affecting conclusions about statistical significance.

The theory holds under Assumptions 1–4: correctly specified linear regression with zero conditional mean errors, full rank X, finite second moments, arbitrary cluster sizes and within-cluster covariance structure, and (for Theorem 3) normal errors. Results hold for fixed k and G, arbitrary n, and allow clusterwise noninvertibility subject to Assumption 3 (inference targets the well-identified regressors).

Q: What is the central claim of the paper? A: Conventional CRVE and HC variance estimators should be replaced by the jackknife estimator V5 in all linear regression contexts with heteroskedastic or clustered errors. V5 is never downward biased (its expectation weakly exceeds the true variance matrix), whereas CRVE1 and CRVE2 can be arbitrarily downward biased. The Satterthwaite-adjusted V5 confidence interval has excellent finite-sample coverage.

Q: What is the worst-case bias of CRVE1? A: The infimum of E[v̂1²]/v² over all admissible regressor matrices and covariance matrices equals zero (Theorem 2, equation 15). This means that for some data-generating process, the expected CRVE1 variance estimate is arbitrarily close to zero relative to the true variance — full downward bias. Importantly, this pathology holds even under homoskedasticity (Σ = Iₙ) and clusterwise invertibility; it is driven entirely by extreme regressor leverage.

Q: Why is CRVE2 also fully downward biased, and how does its failure differ from CRVE1’s? A: Theorem 2 (equation 16) shows that the infimum of E[v̂2²]/v² over F* also equals zero. The difference is that the proof for CRVE2 requires non-i.i.d. errors, meaning CRVE2’s failure requires manipulation of the covariance matrices in addition to extreme leverage, whereas CRVE1 can fail under i.i.d. errors from leverage alone.

Q: What is clusterwise noninvertibility and why does it matter? A: Clusterwise noninvertibility occurs when deleting a single cluster renders the regressor design matrix X’X − X_g’X_g singular. This happens in regressions with cluster-level fixed effects, with a cluster-level treatment indicator when only one cluster is treated, or with sparse dummy variables. The paper shows that the conventional jackknife estimators V3 and V4 become fully downward biased (infimum of expectation ratio equals zero) under clusterwise noninvertibility, even though Stata’s existing fix of dropping noninvertible clusters was explicitly designed to handle this case.

Q: What is the key innovation in V5 that makes it robust to clusterwise noninvertibility? A: V5 uses the Moore-Penrose generalized inverse in the delete-one-cluster estimator β̂₋g, ensuring all G clusters are included in the sum rather than discarding noninvertible clusters. It also centers at the full-sample β̂ rather than the mean β̄ of delete-one estimates, and omits the (G−1)/G degrees-of-freedom correction. The paper shows these three differences together imply V̂5 ≻ V̂4 ≻ V̂3 in the positive semidefinite ordering.

Q: What does Theorem 1 establish about V5? A: Theorem 1 proves E[V̂5] ≥ V in the positive semidefinite sense for all sample sizes, all regressor matrices, all covariance matrices, and under clusterwise noninvertibility. This conservative property holds without any assumption on cluster sizes, regressor leverage, within-cluster correlation, or heteroskedasticity beyond Assumption 1 (correct specification and finite second moments). The infimum of E[v̂5²]/v² equals 1 (equation 21), meaning the inequality is sharp.

Q: What does the Cauchy distribution bound say, and how useful is it in practice? A: Theorem 3 shows that for any c ≥ 1, the jackknife confidence interval C̃5(c) has coverage probability at least P[|ζ| ≤ c] where ζ is Cauchy. With c = 1.96, this guarantees coverage of at least 70% and test size of at most 30% uniformly over all regression designs and error structures (under normality). The bound is not tight in typical applications — actual coverage is much higher — but it provides the first generally applicable uniform guarantee for clustered/heteroskedastic regression. The Cauchy critical value at 5% is 12.7, far too large for practical use, so the bound is more useful as a theoretical guarantee than as a practical inference tool.
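
The two numbers quoted above can be checked directly from the standard Cauchy distribution function, for which P[|ζ| ≤ c] = (2/π)·arctan(c). The sketch below uses only the Python standard library; the function names are hypothetical.

```python
import math


def cauchy_coverage_floor(c):
    """P[|zeta| <= c] for a standard Cauchy zeta (the bound is stated for c >= 1)."""
    return (2.0 / math.pi) * math.atan(c)


def cauchy_critical_value(coverage):
    """Critical value c such that P[|zeta| <= c] equals the requested coverage."""
    return math.tan(coverage * math.pi / 2.0)


floor_at_196 = cauchy_coverage_floor(1.96)      # roughly 0.70
critical_95 = cauchy_critical_value(0.95)       # roughly 12.7
print(round(floor_at_196, 3), round(critical_95, 1))
```

The gap between 1.96 and roughly 12.7 is what motivates a sharper distributional approximation for routine use.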

Q: What does Theorem 5 establish about confidence intervals built from v̂1 through v̂4? A: Under normality, the worst-case coverage probability of confidence intervals constructed from any of the four estimators v̂1 through v̂4 (CRVE1, CRVE2, and the conventional jackknife estimators V3 and V4) equals zero for any finite critical value c (equations 26–29). For v̂1 and v̂2, this holds over the clusterwise-invertible model class F*; for v̂3 and v̂4 it holds over the broader class F allowing noninvertibility. Zero worst-case coverage cannot be fixed by enlarging c, since the result holds for all finite c. This is not an impossibility result in the Bahadur-Savage sense; it is a statement that specific commonly-used intervals fail, while V5-based intervals succeed.

Q: What is the Satterthwaite approximation and how is it implemented? A: The Satterthwaite adjustment replaces the jackknife t-ratio’s exact finite-sample distribution — a ratio of a normal to the square root of a weighted sum of chi-squares — with a scaled t distribution with K degrees of freedom, where K and a scale factor a are matched by moment conditions on the eigenvalues of a design matrix D. The confidence interval is θ̂ ± v̂5 · t^{1−α/2}_K / a, and the p-value uses a Student t or F distribution with the same K and scale. These quantities can be computed without explicit eigendecomposition using trace formulas (equations 38–39), which are preferred computationally when G > k.
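
The generic Satterthwaite idea can be illustrated with a short sketch. For a quadratic form Q = z'Dz with z standard normal and D symmetric positive semidefinite, E[Q] = tr(D) and Var[Q] = 2·tr(D²), so matching these two moments to a scaled chi-square s·χ²_K/K gives s = tr(D) and K = tr(D)²/tr(D²). This is the textbook moment match only: the paper's own a and K are defined by its equations (38)–(39) and may be normalized differently, and the function name and example matrix here are assumptions.

```python
def satterthwaite_match(D):
    """Match the mean and variance of z'Dz to a scaled chi-square using traces only.

    Returns (scale, dof) with scale = tr(D) and dof = tr(D)^2 / tr(D^2).
    D is a square symmetric matrix given as a list of lists.
    """
    n = len(D)
    trace_D = sum(D[i][i] for i in range(n))
    # tr(D @ D) = sum over i, j of D[i][j] * D[j][i]; no eigendecomposition needed.
    trace_D2 = sum(D[i][j] * D[j][i] for i in range(n) for j in range(n))
    return trace_D, trace_D ** 2 / trace_D2


# Hypothetical 2x2 matrix with eigenvalues 3 and 1.
D_example = [[2.0, 1.0], [1.0, 2.0]]
scale, dof = satterthwaite_match(D_example)
print(scale, dof)
```

When the eigenvalues are equal, the matched degrees of freedom equal the dimension of D; when one eigenvalue dominates, they fall toward one, which is the situation in which heavier-tailed critical values are needed.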

Q: What do the simulations show about coverage rates? A: Across six designs (three regressor types × two error types) and G ∈ {6, 12, 40, 100}, CRVE1 falls as low as 57% coverage in the LogNormal regressor/heteroskedastic error design with G = 6. CRVE2 has somewhat better but still substantially undercovering intervals. The conventional jackknife interval undercovers (as low as 85%) in leveraged/heteroskedastic designs. The Satterthwaite jackknife interval achieves coverage uniformly exceeding 93% across all designs, though it can be excessively conservative (100%) in some cases. All simulation estimates have standard errors less than 0.003 (20,000 replications).

Q: Does the Satterthwaite adjustment vanish in large balanced samples? A: Yes. Theorem 7 shows that if the design matrix is uniformly non-singular and no single cluster dominates (max_g ||X_g||² = o(n)), then a → 1 and K → ∞ as n → ∞. Consequently, the Satterthwaite interval converges to the standard normal interval in well-balanced large samples.

Q: How does V5 relate to the classical HC3 estimator? A: Under independent sampling (no clustering, n_g = 1), V5 reduces to the HC3 estimator of Andrews (1991) and Davidson and MacKinnon (1993), which uses the Moore-Penrose inverse. The conventional jackknife V3/V4 reduce to the HC3 of MacKinnon and White (1985). The paper’s results thus provide a formal theoretical basis for the longstanding recommendation (by Efron and Stein 1981, MacKinnon and White 1985, Andrews 1991, and others) to use HC3/jackknife standard errors.

Q: What is the practical recommendation for empirical researchers? A: Replace all CRVE1/CRVE2/HC standard errors with V5, computed via the Moore-Penrose generalized inverse including all clusters. Report V5-based standard errors (which are never downward biased) alongside Satterthwaite-adjusted confidence intervals and p-values using equations (30)–(31). The adjustment parameters a and K differ per coefficient and must be computed separately for each. The paper advises against reporting a/v̂5 as an “adjusted standard error” since that quantity loses the never-downward-biased property.

Q: What is the empirical application and what does it find? A: The paper extends Meng, Qian, and Yared (2015), which studies the effect of TV access on demand for redistribution in China using provincial household survey data (30 provinces, multiple years), and Canay, Santos, and Shaikh (2021), who found CRVE1 standard errors may be unreliable in that setting. Applying V5, the jackknife standard error for the TV access coefficient exceeds the CRVE1 standard error, the Satterthwaite interval is wider than the conventional interval, and conclusions about statistical significance are affected.

Q: What are the scope conditions and limitations? A: The bias results (Theorems 1–2) require only correct specification (zero conditional mean) and finite second moments. The Cauchy bound (Theorem 3) additionally requires normal errors; whether a similar bound holds without normality or in G → ∞ asymptotics is left open. The Satterthwaite adjustment applies only to inference on real-valued (scalar) parameters and does not extend to joint hypothesis tests. Assumption 3 limits inference to “well-identified” regressors (those whose leave-cluster-out coefficients are uniquely defined after partialling out controls).

V5 (jackknife variance estimator): The paper’s proposed estimator, defined in equation (10) as the sum over all G clusters of outer products of (β̂₋g − β̂), where β̂₋g uses the Moore-Penrose generalized inverse. Unlike conventional jackknife estimators, V5 includes all clusters (no dropping), centers at the full-sample β̂, and omits the (G−1)/G correction. Its key property is E[V̂5] ≥ V for all regression designs.
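
A minimal NumPy sketch of this definition follows. It implements only what the entry above states (delete-one-cluster estimates via the Moore-Penrose inverse, centered at the full-sample estimate, summed over all clusters with no rescaling). The function name, the simulated data, and the single-treated-cluster design are hypothetical and are not taken from the paper.

```python
import numpy as np


def jackknife_v5(X, y, cluster_ids):
    """Sum over clusters of (beta_minus_g - beta_hat)(beta_minus_g - beta_hat)'."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    XtX = X.T @ X
    Xty = X.T @ y
    beta_hat = np.linalg.pinv(XtX) @ Xty
    k = X.shape[1]
    V = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        mask = cluster_ids == g
        Xg, yg = X[mask], y[mask]
        # The pseudo-inverse keeps cluster g in the sum even when deleting it
        # makes X'X - Xg'Xg singular.
        beta_minus_g = np.linalg.pinv(XtX - Xg.T @ Xg) @ (Xty - Xg.T @ yg)
        diff = beta_minus_g - beta_hat
        V += np.outer(diff, diff)   # no (G-1)/G factor, no recentering at the mean
    return beta_hat, V


# Simulated example with a single treated cluster, so that deleting cluster 0
# produces a singular design (illustrative data only).
rng = np.random.default_rng(0)
G, n_g = 6, 5
cluster_ids = np.repeat(np.arange(G), n_g)
treated = (cluster_ids == 0).astype(float)
X = np.column_stack([np.ones(G * n_g), treated, rng.normal(size=G * n_g)])
y = X @ np.array([1.0, 0.5, 2.0]) + rng.normal(size=G * n_g)

beta_hat, V5 = jackknife_v5(X, y, cluster_ids)
std_errors = np.sqrt(np.diag(V5))
print(beta_hat, std_errors)
```

With one observation per cluster and an invertible design, this sum coincides with the HC3 sandwich formula that weights squared residuals by 1/(1 − h_i)², which offers a convenient check on any implementation.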

Never-downward-biased (conservative) estimator: A variance estimator whose expectation is weakly greater than the true variance in the positive semidefinite sense, for all admissible regressor matrices and covariance structures. V5 has this property; CRVE1, CRVE2, and conventional jackknife estimators do not.

Full downward bias: The worst-case property that the infimum of E[v̂²]/v² equals zero over the model class — meaning the expected variance estimate can be arbitrarily close to zero relative to the true variance. CRVE1 is fully downward biased under clusterwise invertibility alone; CRVE2 requires non-i.i.d. errors; conventional jackknife estimators become fully downward biased under clusterwise noninvertibility.

Clusterwise noninvertibility: The condition where deleting a single cluster g renders the matrix X’X − X_g’X_g singular, so the standard delete-one-cluster estimator β̂₋g is undefined. This occurs in regressions with cluster-level fixed effects, a single treated cluster, or sparse dummy variables. V5 handles this via the Moore-Penrose generalized inverse; Stata’s existing fix of dropping such clusters is shown to be non-robust.

Cauchy distribution bound: Theorem 3’s result that the jackknife confidence interval C̃5(c) has coverage probability at least P[|ζ| ≤ c] for all c ≥ 1, uniformly over all regression designs and error variances (under normality). With c = 1.96, this gives a guaranteed coverage floor of 70%. This is the first generally applicable uniform coverage guarantee for clustered/heteroskedastic regression.

Satterthwaite adjusted t approximation: A data-dependent distributional approximation for the jackknife t-ratio that approximates the denominator’s weighted chi-square distribution by a scaled chi-square with K degrees of freedom, where K and scale factor a are computed from trace formulas involving the design matrix. The resulting confidence interval θ̂ ± v̂5 · t^{1−α/2}_K / a converges to the standard normal interval in well-balanced large samples.

Regressor leverage: The degree to which variation in a regressor of interest is concentrated in a small number of clusters. High leverage (when one cluster dominates the regressor of interest) is the mechanism by which CRVE1 and CRVE2 reach worst-case downward bias: for CRVE1 even under homoskedasticity, and for CRVE2 in combination with non-i.i.d. errors.

Joined at the Hip: Monetary and Fiscal Policy in a Liquidity-Dependent World

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — What this paper finds and why it matters

Calvo and Velasco study an economy where both money and government bonds provide liquidity services, and they show that this shared role implies bond-financed fiscal expansions can be neutral or contractionary — not merely less effective than hoped. The mechanism turns on a fundamental asymmetry: the price of money in terms of goods is pinned down by sticky prices, whereas the price of long-term bonds is free to jump immediately in response to expected changes in bond supply. When the government announces a future bond-financed transfer to households, bond prices fall right away, compressing total liquidity before a single new bond is actually issued; the liquidity-in-advance constraint then forces aggregate demand and output down, producing a recession that precedes and is qualitatively separable from any subsequent boom. The paper maps four distinct timing cases — unanticipated permanent, anticipated permanent, unanticipated transitory flow, and unanticipated temporary stock — and shows each has a different (and sometimes opposite) short-run sign for output. To prevent these contractionary liquidity effects, the central bank must cut the interest rate on money and expand the money supply in ways that are precisely coordinated with the timing of the bond helicopter drop; in this sense fiscal and monetary authorities are, the authors conclude, joined at the hip. The paper also distinguishes this result from standard fiscal-dominance stories: the monetary authority is not compelled to finance the deficit but to stabilize bond prices in order to protect aggregate demand.

In depth

Q1. What is the central question and how does the paper differ from the standard New Keynesian framework?

The central question is whether bond-financed government transfers raise, lower, or leave unchanged aggregate demand and output when bonds provide liquidity services. Standard Keynesian and New Keynesian treatments focus on whether expansionary fiscal policy crowds out private investment through higher interest rates, or amplifies demand when the zero lower bound binds. Calvo and Velasco instead focus on the liquidity channel: because long-term bond prices are free to jump on news about future bond supply, increases in expected bond issuance can immediately reduce the market value of outstanding bonds, compressing total liquidity in private portfolios and thereby reducing consumption and output even before any new bond is issued. They call this a “non-standard” result and note that, by contrast, the price of money is insulated from such anticipatory jumps by sticky goods prices.

Q2. What is the model structure?

The paper uses a bare-bones, continuous-time, closed-economy model with a single infinitely lived household, one consumption good, and two assets in positive net supply: money (equated with central-bank reserves) and a long-term government bond (a perpetuity paying a coupon). The key friction is a liquidity-in-advance constraint — households must hold sufficient liquidity (a weighted combination of real money balances and the real market value of bonds) to consume. The supply side is a standard Calvo (1983) Phillips curve. Policy instruments are the nominal interest rate on money, the nominal money supply, the nominal bond supply, and the bond coupon; the price of long-term bonds is endogenous. Commercial banks are abstracted away: money is effectively a CBDC. The paper notes that all main results also go through under a money-in-the-utility-function specification, provided the elasticity of substitution between consumption and liquidity is sufficiently low.

Q3. What does “liquidity” mean in the paper’s own sense, and why does the bond price matter for it?

Liquidity is defined as a CES-weighted sum of real money holdings and the real market value of bond holdings, where the market value of bonds equals the bond price times the real quantity outstanding. Because the bond price is free to jump, the market value of bonds (and therefore total liquidity) can change instantaneously in response to news, even when neither the nominal money stock nor the nominal bond stock has yet changed. Money does not share this vulnerability: its “price” in terms of goods is fixed in the short run by nominal price stickiness. This asymmetry — sticky price of money, flexible price of bonds — is the paper’s central mechanism. The authors attribute the stickiness insight to Keynes’s General Theory (the “price theory of money” as labelled by Calvo 2012).

Q4. What happens when the bond supply rises unexpectedly and permanently?

An unanticipated and permanent step increase in the nominal (and, on impact, real) supply of long-term bonds is neutral: consumption and output are unchanged. Bond prices fall immediately so that the total market value of bonds outstanding — and therefore total liquidity — is the same as before. The analogy drawn is to an unanticipated permanent increase in the money supply under fully flexible prices, which also has no real effects. The coupon must rise proportionally so that the return on bonds remains at its steady-state level. The paper notes that neutrality may not hold if bond holdings are distributed non-uniformly (e.g., concentrated in financial intermediaries that use bonds as repo collateral), because the drop in bond prices could trigger runs on those institutions.

Q5. What happens when a permanent bond-supply increase is anticipated in advance?

An anticipated and permanent future step increase in nominal bond supply causes a recession during the announcement-to-implementation interval, before any new bond has been issued. Because arbitrage prevents an anticipated capital loss on bonds, the bond price cannot jump down at the implementation date T. Instead it must fall gradually starting at announcement date 0, reaching its new (lower) steady-state level exactly at T. This declining bond price reduces the market value of bonds and thereby compresses total liquidity throughout the interval [0, T), generating deflation and a negative output gap over that entire period. A naïve observer who notes an output boom just as the government begins to issue bonds at T would incorrectly conclude the policy is expansionary, when in fact the boom is the recovery from the pre-implementation recession.

Q6. What happens when the fiscal authority issues bonds at a constant rate for a finite period (transitory flow)?

An unanticipated, transitory, constant-rate bond issuance over an interval [0, T) is also recessionary, both on impact and during the issuance period. Bond prices fall faster than the nominal bond stock accumulates, so the total market value of bonds declines and liquidity is compressed. The Calvo-Phillips equation evaluated with negative and rising inflation implies a negative output gap throughout the early part of the episode. A boom follows after bond issuance ends, not because “confidence is restored” or fiscal sustainability has improved, but because the boom is mechanically part of the same liquidity-adjustment cycle as the earlier recession.

Q7. What happens under an unanticipated but temporary step increase in the bond stock?

An unanticipated but temporary step increase in bond supply (one that will be reversed at a known future date T) is expansionary on impact. Because arbitrage rules out an anticipated jump in the bond price at T, the bond price must rise from its impact level back to the initial steady state by T. On impact, the bond price falls but by less than the increase in nominal bond supply, so the market value of bonds rises and total liquidity increases, pushing aggregate demand and output above their natural rates. The initial boom is thus followed by a recession around the time bond supply is cut back, which the authors note could generate political pressure to extend the “expansionary” fiscal policy.

Q8. What is the common mechanism linking the contractionary cases?

In both contractionary cases (anticipated permanent and unanticipated transitory flow), the bond price falls more rapidly than the bond stock rises, so the total market value of bonds declines, compressing liquidity. From the model’s liquidity identity (equation 18 in the paper), total liquidity depends on real money balances (fixed on impact) plus a weight on the relative position of bonds to money. When that relative position (captured by the variable s_t in the model) falls, total liquidity falls. The liquidity-in-advance constraint then forces consumption and output down. Deflation is the only endogenous mechanism to rebuild real liquidity, but it works gradually and involves a protracted recession.
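
The race between the bond price and the bond stock can be made concrete with a small arithmetic sketch. The market value of bonds is the price times the quantity, so its proportional change is (1 + stock growth)·(1 + price change) − 1. The function name and all percentages below are hypothetical; they are not calibrated to the paper's model.

```python
def bond_value_change(stock_growth, price_change):
    """Proportional change in the market value (price times quantity) of bonds."""
    return (1.0 + stock_growth) * (1.0 + price_change) - 1.0


# Unanticipated permanent increase: the price falls exactly in proportion,
# so the market value (and hence liquidity) is unchanged.
neutral = bond_value_change(0.10, 1.0 / 1.10 - 1.0)

# Contractionary cases: the price falls by more than the stock has risen,
# including a price fall before any new bond is issued.
contraction = bond_value_change(0.10, -0.15)
anticipation = bond_value_change(0.00, -0.05)

# Temporary step increase: the price falls by less than the stock rises.
expansion = bond_value_change(0.10, -0.05)

print(neutral, contraction, anticipation, expansion)
```

The sign of the result is what feeds into the liquidity-in-advance constraint: a negative value compresses liquidity, a positive value expands it.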

Q9. What monetary policy does the paper prescribe to neutralize the contractionary effects?

To avoid the contractionary liquidity effects of anticipated bond helicopter drops, the central bank must cut the interest rate on money and expand the money supply in a manner whose precise time profile depends on the timing of the fiscal shock. For an anticipated permanent bond-supply increase, the required monetary response involves gradually expanding the nominal money supply between announcement and implementation, followed by a discrete step decrease in nominal (and real) money at exactly the moment bond supply jumps up. This coordinated monetary expansion offsets the bond-price-driven compression of liquidity. The paper confirms this formally in Section IV, concluding that avoiding unwanted contractionary effects requires coupling fiscal bond issuance with specific, coordinated monetary actions.

Q10. How does the paper relate to fiscal dominance — and how does it differ?

The paper identifies a novel form of fiscal dominance in which monetary policy is compelled not to monetize the fiscal deficit but to stabilize government bond prices in order to protect aggregate demand and inflation. Traditional fiscal dominance (common in emerging markets) forces the central bank to print money to finance the deficit. Here, the mechanism is different: expected bond issuance drives down bond prices and compresses liquidity, so the central bank must intervene in bond markets — effectively buying newly issued bonds — to prevent deflationary recessions. An outside observer could mistake this for traditional monetization. The paper frames the Federal Reserve’s $1 trillion Treasury purchase program from mid-March 2020 onward as consistent with this bond-price-stabilization logic, citing Vissing-Jorgensen (2021) on the causal role of Fed purchases in driving down yields through acute liquidity provision.

Q11. What is the scope of the non-standard results?

The non-standard (neutral or contractionary) results apply specifically to bond-financed increases in government transfers to the private sector; money-financed fiscal expansion and bond-financed government consumption changes are not the focus and do not share these properties in the model. The authors explicitly note this caveat. However, they argue the exercise is policy-relevant because much of the fiscal response to both the 2008 Global Financial Crisis and the Covid-19 crisis took the form of sharp increases in government transfers financed by bond issuance. The model also assumes lump-sum taxes, so in the absence of liquidity effects Ricardian equivalence would obtain; all non-neutralities are driven entirely by the liquidity channel.

Key concepts

Liquidity-in-advance constraint : An analog of a cash-in-advance constraint in which the household must hold a weighted sum of real money balances and the real market value of bonds sufficient to finance current consumption; it always binds in the model’s equilibrium, so liquidity directly pins down output.

Price theory of money : The proposition (attributed to Keynes and labelled by Calvo 2012) that money is highly liquid partly because the nominal goods-price level is sticky, fixing the price of money in terms of goods; this insulates the real value of money from the anticipatory jumps that affect bond prices.

Bond helicopter drop : A government transfer to households financed by issuing long-term bonds (perpetuities), with no change in taxes or money supply; the term “helicopter drop of bonds” is used by the authors to parallel Friedman’s helicopter money but with bonds as the instrument.

Bond-price stabilization (non-traditional fiscal dominance) : The authors’ term for a situation in which expected fiscal bond issuance compresses bond-market liquidity and forces the central bank to expand money supply and cut the interest rate on money in order to stabilize bond prices and prevent contractionary effects, even though the central bank is not formally required to finance the deficit.

s_t (bond-to-money relative position) : A model variable defined as the log-deviation from steady state of the ratio of the real market value of bonds to real money balances; it captures the relative contribution of bonds to total portfolio liquidity and is the key endogenous state variable linking bond-price dynamics to aggregate demand.

Calvo-Phillips curve : The standard Calvo (1983) staggered-pricing supply side, used here to generate the inflation-output gap trade-off; in the paper’s notation, inflation dynamics satisfy π̇_t = δπ_t − κ(y_t − ȳ), where output gaps are driven by liquidity shortfalls rather than standard demand shocks.
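Rearranging the Calvo-Phillips equation for the output gap shows why negative and rising inflation implies output below its natural level. The sketch below performs that rearrangement; the values of δ and κ and the inflation state are illustrative assumptions, not the paper's calibration.

```python
def output_gap(pi: float, pi_dot: float, delta: float, kappa: float) -> float:
    """Solve pi_dot = delta * pi - kappa * (y - y_bar) for the gap y - y_bar."""
    return (delta * pi - pi_dot) / kappa


# Hypothetical parameters (delta > 0, kappa > 0) and a state in which
# inflation is negative (pi < 0) but rising (pi_dot > 0).
gap = output_gap(pi=-0.02, pi_dot=0.01, delta=0.05, kappa=0.5)
print(f"output gap: {gap:.4f}")
```

With positive δ and κ, both terms in the numerator are negative whenever inflation is negative and rising, so the gap is negative regardless of the particular numbers chosen.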

Jumpstarting an International Currency

This paper asks how a currency achieves international status — moving from zero to positive cross-border use — and whether deliberate central bank policy can accelerate that transition. The authors focus on the People’s Bank of China (PBoC) swap lines signed between 2009 and 2018, which extended RMB-denominated lender-of-last-resort credit to foreign central banks for the stated purpose of supporting RMB-denominated trade finance and settlement.

The empirical analysis combines two datasets. The first covers every RMB swap line agreement the PBoC signed with a foreign central bank (38 countries by 2018), compiled from PBoC news releases and validated against counterparty communications; signing is treated as a staggered, binary, absorbing treatment. The second is monthly SWIFT data on cross-border payment message values (October 2010 – October 2018), disaggregated by currency and message type (payment orders MT103/MT202 and trade-finance messages MT400/MT700). The working sample, after excluding financial centre hubs, sanctioned countries, pre-sample treated countries, and small economies, covers 114 countries with 11,058 observations, of which 21 are treated during the sample period.

The main identification strategy is a staggered difference-in-differences design using the imputation estimator of Borusyak et al. (2024), with controls for bilateral trade with China, Chinese economic policy variables (RMB clearing bank presence, AIIB membership, infrastructure investment flows, UN voting alignment), and regional RMB adoption trends. The authors are explicit that conditional independence is not guaranteed and characterize results as documenting an association.

At the extensive margin, signing a swap line is associated with an approximately 14 percentage point increase in the probability that a country uses the RMB for international payments in a given month (11 percentage points in the baseline column, approximately 14 with controls, and approximately 20 when anticipation effects are accounted for by shifting treatment timing six months earlier). At the intensive margin, using ln(1 + RMB payments) and Poisson specifications, RMB usage is between 250% and 440% higher in treated countries following the policy. The effect concentrates within the first 12 months of signing and persists without reversion. It is present in payments not involving China as a counterparty, is not explained by Belt and Road Initiative membership, and does not extend to bilateral trade volumes with China.

Four mechanisms from the paper’s theoretical model are tested and supported. First, swap lines reduce offshore RMB borrowing costs by an estimated 115 basis points on average (rising to 205 basis points for emerging market currencies). Second, the 2015–16 RMB crisis — in which the PBoC drained offshore liquidity to defend the exchange rate peg, sharply raising private RMB borrowing costs — caused a significant decline in RMB use among countries without a swap line but not among those with one, consistent with the model’s prediction that swap lines cap the right tail of borrowing cost distributions. Third, effects are concentrated in trade-finance SWIFT messages, stronger in countries with above-median trade shares with China, and increasing in intermediate import intensity and working capital reliance. Fourth, the RMB gains displace existing international currencies — the USD share falls by approximately 8 percentage points and the EUR share by approximately 2.5 percentage points — rather than displacing local currencies, as the model predicts. There are also geographic spillovers: a neighboring country signing a swap line is associated with a 10% increase in RMB use even for countries that did not sign.

The theoretical framework models import-export firms that choose simultaneously the currency of trade finance and the currency of sales invoicing. Sticky prices create a complementarity between these two choices. A swap line truncates the right tail of the borrowing cost distribution (first-order stochastic dominance), which can push firms above a threshold into using the rising currency for both liabilities and invoicing. The model predicts threshold behavior — a currency either jumpstarts or does not — and explains why only a small number of currencies ever achieve international status.

Q: What are the PBoC swap lines and how do they mechanically affect firms? A: A PBoC swap line is a renewable 3-year agreement between the PBoC and a foreign central bank that allows the foreign central bank to borrow RMB and on-lend it domestically to support RMB-denominated trade finance. Like other central bank lending facilities, they place a ceiling on interest rates, thereby truncating the right tail of the distribution of RMB borrowing costs faced by commercial banks and their firm customers. The key insurance property holds even when lines are not actively drawn upon, because their existence caps tail risk.
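The tail-capping property described in this answer can be illustrated with a small simulation. This is a sketch under stated assumptions: the lognormal cost distribution, its parameters, and the ceiling value are all hypothetical and are not taken from the paper.

```python
import random
import statistics

random.seed(0)

# Hypothetical distribution of RMB borrowing costs (percent per year):
# right-skewed, so occasional draws are very high.
costs = [random.lognormvariate(1.0, 0.6) for _ in range(10_000)]

CEILING = 6.0  # hypothetical rate ceiling implied by the facility

# With the facility in place, no borrower pays more than the ceiling.
capped = [min(c, CEILING) for c in costs]


def quantile(values, q):
    """Empirical quantile by sorting (q between 0 and 1)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


print("mean, no ceiling / ceiling:", statistics.mean(costs), statistics.mean(capped))
print("99th pct, no ceiling / ceiling:", quantile(costs, 0.99), quantile(capped, 0.99))
```

Every capped draw is weakly below its uncapped counterpart, so the ceiling lowers the upper quantiles sharply while leaving draws below the ceiling unchanged. The cap matters even if the facility is never drawn, because it changes the distribution firms expect to face.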

Q: What is the extensive margin finding for swap lines and RMB payments? A: Signing a swap line is associated with an approximately 11 percentage point increase in the probability that a country uses the RMB for cross-border payments in a given month without controls, rising to approximately 14 percentage points with the full set of controls, and to approximately 20 percentage points when treatment timing is shifted six months earlier to account for anticipation effects. The event study shows the effect concentrates within 12 months of signing and does not revert.

Q: What is the intensive margin finding? A: Using ln(1 + RMB payments) and Poisson specifications — preferred because Mongolia is an outlier and payment value volatility is increasing in payment level — treated countries have RMB payment values between 250% and 440% higher than control countries after signing. The RMB share of payments rises by 0.13 percentage points on average, compounding to approximately 0.3 percentage points in years 3–4, or roughly one-fifth of the overall rise in RMB payments over the full sample period.
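A minimal sketch of how the two outcome variables can be built from monthly payment values follows. The payment figures are made up for illustration; the point is only that ln(1 + x) stays defined in months with zero RMB payments, which a plain logarithm would not.

```python
import math

# Hypothetical monthly RMB payment values for one country.
payments = [0.0, 0.0, 12.5, 40.0]

# Extensive margin: does the country use the RMB at all this month?
extensive = [1 if p > 0 else 0 for p in payments]

# Intensive margin: ln(1 + payments), defined even when payments are zero.
intensive = [math.log1p(p) for p in payments]

for p, e, i in zip(payments, extensive, intensive):
    print(f"payments={p:6.1f}  uses_rmb={e}  ln(1+payments)={i:.3f}")
```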

Q: How do the authors address the concern that swap lines are signed precisely when economic integration with China is deepening? A: They include a comprehensive set of controls: bilateral export and import values to/from China, the ratio of Chinese trade to GDP, China trade agreement status, RMB clearing bank presence, AIIB membership, infrastructure investment flows, and UN voting alignment. They also show separately that (i) the effect is present in RMB payments not involving China as a counterparty, (ii) Belt and Road Initiative membership does not account for the effect, and (iii) there is no increase in bilateral trade with China following swap line signing. The authors nonetheless characterize results as documenting an association, not establishing causation.

Q: Do swap lines actually reduce RMB borrowing costs as the model requires? A: Yes. Using the same staggered difference-in-differences methodology, signing a swap agreement is associated with a 115 basis point fall in offshore RMB borrowing rates on average. For emerging market currency comparators the effect rises to 205 basis points. The event study shows an immediate and sustained reduction with no detectable pre-trend.

Q: What does the 2015–16 RMB crisis reveal about the mechanism? A: In August 2015 the PBoC adjusted its RMB-USD central parity rate, triggering a 3% depreciation over two days and subsequent offshore liquidity drainage that raised both the level and volatility of offshore RMB borrowing costs until approximately April 2017. This shock was primarily financial rather than reflecting a Chinese economic slowdown. Countries without a swap line experienced a sharp decline in RMB payment usage in 2015Q4, while countries with a swap line — whose right-tail borrowing costs were capped — did not, consistent with the model’s prediction that the lines insulate against tail risk shocks.

Q: Are the effects concentrated in trade finance as the model predicts? A: Yes. Restricting the analysis to SWIFT trade-finance message types (MT400 and MT700), the coefficient estimates are similar in magnitude to those for all payments. Effects on the trade finance extensive margin are concentrated among countries with above-median trade shares with China. The effects are also increasing in countries’ intermediate import intensity and in the degree to which export industries rely on working capital.

Q: Which currencies does the RMB displace and which does it not displace? A: The swap line is associated with a 14 percentage point rise in the RMB share of payments to and from China. Decomposing this: the USD share falls by approximately 8 percentage points, the EUR share by approximately 2.5 percentage points, the combined GBP/JPY/CHF share by approximately 0.5 percentage points, and other currencies by approximately 3 percentage points. The local currency of the country receiving the swap line does not show a statistically significant decline, consistent with the model’s prediction that the RMB competes primarily with existing international vehicle currencies rather than with domestic currencies.
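The decomposition can be checked with simple arithmetic. The sketch below uses the approximate figures quoted in this answer and confirms that the displaced shares add up to the RMB gain; the variable names are illustrative.

```python
rmb_gain_pp = 14.0  # rise in the RMB share, percentage points

# Approximate declines in other currencies' shares, percentage points.
displaced_pp = {
    "USD": 8.0,
    "EUR": 2.5,
    "GBP/JPY/CHF": 0.5,
    "other currencies": 3.0,
}

total_displaced = sum(displaced_pp.values())
fraction_of_gain = {k: v / rmb_gain_pp for k, v in displaced_pp.items()}

print(f"total displaced: {total_displaced:.1f} pp")
for currency, frac in fraction_of_gain.items():
    print(f"{currency}: {frac:.0%} of the RMB gain")
```

On these figures the USD accounts for a little over half of the RMB's gain.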

Q: Are there geographic spillovers from swap lines? A: Yes. A neighboring country (defined as countries within 1,000 km, or the nearest five if fewer than five are within that distance) signing a swap line is associated with a 10% increase in RMB payments for the non-signatory neighbor. The authors attribute this to supply chain linkages: firms importing RMB-invoiced inputs from a swap-line country face an incentive to adopt RMB for their own downstream transactions.
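The neighbor rule quoted in parentheses can be written as a short function. This is a sketch only: the summary does not say how distances are measured or whether the 1,000 km bound is inclusive, so the inclusive comparison, the `neighbors` helper, and the country labels are assumptions.

```python
def neighbors(distances_km, radius_km=1000.0, min_count=5):
    """Return the neighbor set of a focal country.

    distances_km maps each other country to its distance from the focal
    country. All countries within radius_km are neighbors; if fewer than
    min_count qualify, the min_count nearest countries are used instead.
    """
    within = [c for c, d in distances_km.items() if d <= radius_km]
    if len(within) >= min_count:
        return sorted(within)
    nearest = sorted(distances_km, key=distances_km.get)[:min_count]
    return sorted(nearest)


# Hypothetical sparse region: only two countries lie within 1,000 km.
sparse = {"A": 300, "B": 800, "C": 1200, "D": 1500, "E": 2000, "F": 2500, "G": 4000}
# Hypothetical dense region: six countries lie within 1,000 km.
dense = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 500, "F": 900, "G": 3000}

print(neighbors(sparse))
print(neighbors(dense))
```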

Q: What does the model predict about which currencies can ever become international? A: The model identifies three thresholds a currency must pass. First, exchange rate variance must be sufficiently low; most currencies fail this condition. Second, the right tail of borrowing costs in that currency must not be too high; skewed distributions fail the threshold condition in Proposition 2. Third, the currency-issuing country must be large enough as an export market or intermediate input source to generate the complementarity factor Psi that makes adopting the currency worthwhile. Most currencies fail on multiple dimensions, explaining why so few achieve international status.

Q: How do sticky prices create the complementarity between trade finance currency and invoicing currency in the model? A: Firms set prices in advance before exchange rates and borrowing costs are realized. If a firm borrows in currency r to finance imported inputs but prices its exports in currency d, cost and revenue shocks are mismatched, creating profit volatility. Nominal price stickiness means firms cannot adjust prices ex post to maintain constant markups. This makes it optimal to align the currency of liabilities (trade finance) with the currency of export invoicing, creating a complementarity that amplifies the effect of a reduction in r-currency borrowing costs on invoicing currency choice.

Q: How do the authors handle the potential bias from heterogeneous treatment effects in the staggered difference-in-differences design? A: They use the imputation estimator of Borusyak et al. (2024), which is robust to heterogeneous treatment effects across cohorts, clustering standard errors at the country level and averaging treatment effects by cohort. They also verify results using the synthetic difference-in-differences estimator of Arkhangelsky et al. (2021), which reweights observations to equalize pre-treatment trends, and show results are robust across both two-way fixed effects and these more modern estimators.
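The logic of an imputation estimator can be sketched in three steps: fit unit and period effects on untreated observations only, impute untreated outcomes for treated observations, and average the gaps. The code below is a simplified illustration on a made-up, noise-free panel. It omits the controls and inference used in the paper, and `imputation_att` is a hypothetical helper rather than the authors' code or a library routine.

```python
def imputation_att(y, first_treated, n_iter=500):
    """Average effect on treated cells via two-way fixed-effect imputation.

    y[i][t] is the outcome of unit i in period t; first_treated[i] is the
    first treated period of unit i, or None if the unit is never treated.
    """
    n_units, n_periods = len(y), len(y[0])
    untreated = [
        [first_treated[i] is None or t < first_treated[i] for t in range(n_periods)]
        for i in range(n_units)
    ]
    unit_fe = [0.0] * n_units
    time_fe = [0.0] * n_periods
    # Step 1: fit unit and period effects on untreated cells only
    # (alternating means, i.e. backfitting for the two-way regression).
    for _ in range(n_iter):
        for i in range(n_units):
            vals = [y[i][t] - time_fe[t] for t in range(n_periods) if untreated[i][t]]
            unit_fe[i] = sum(vals) / len(vals)
        for t in range(n_periods):
            vals = [y[i][t] - unit_fe[i] for i in range(n_units) if untreated[i][t]]
            time_fe[t] = sum(vals) / len(vals)
    # Steps 2 and 3: impute untreated outcomes for treated cells, average the gaps.
    gaps = [
        y[i][t] - (unit_fe[i] + time_fe[t])
        for i in range(n_units)
        for t in range(n_periods)
        if not untreated[i][t]
    ]
    return sum(gaps) / len(gaps)


# Hypothetical panel: 4 units, 6 periods, two never-treated units, and
# cohort-specific effects (1.0 for unit 2, 3.0 for unit 3).
unit_effects = [0.5, 1.0, 1.5, 2.0]
time_effects = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
first_treated = [None, None, 3, 4]
cohort_effect = [0.0, 0.0, 1.0, 3.0]
y = [
    [
        unit_effects[i]
        + time_effects[t]
        + (cohort_effect[i] if first_treated[i] is not None and t >= first_treated[i] else 0.0)
        for t in range(6)
    ]
    for i in range(4)
]
att = imputation_att(y, first_treated)
print(round(att, 6))
```

Because already-treated units never serve as controls, effects that differ across cohorts do not contaminate the comparison. In this example the estimate equals the average of the true effects over the five treated cells, (3 × 1.0 + 2 × 3.0) / 5 = 1.8.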

Q: What historical parallel do the authors draw and what does it imply for the RMB’s future? A: The paper draws a parallel with the USD’s displacement of pound sterling in trade finance in the decade following the Federal Reserve’s creation in 1913 and the establishment of bankers’ acceptances. That transition was supported by World War I’s damage to the UK economy and rapid US economic growth. The authors conclude that RMB internationalization will require not only continued policy support but also favorable economic fundamentals including sound monetary policy and deeper capital markets.

Q: How does the PBoC’s swap line program differ from Federal Reserve and ECB swap lines? A: PBoC lines differ in four key respects: they have longer maturities (3-year renewable agreements vs. shorter-term Fed/ECB lines); they involve a large and diverse set of mostly developing countries rather than a handful of advanced economies; they target trade finance in a context of limited RMB cross-border banking rather than addressing foreign-bank dollar funding shortfalls caused by dollar dominance; and they were designed to initiate internationalization rather than to respond to an existing dominant currency’s liquidity stresses. The aggregate notional limit of approximately RMB 3 trillion is nonetheless comparable in scale to the USD 600 billion of peak drawings from Fed swap lines.

International currency jumpstart: The process by which a currency moves from zero to positive international use, as opposed to the better-studied phenomenon of a currency achieving dominance. The paper distinguishes jumpstart (initial adoption) from dominance (widespread adoption), arguing that different mechanisms govern each stage.

PBoC swap lines: Renewable 3-year agreements between the People’s Bank of China and foreign central banks enabling the latter to borrow RMB and on-lend it domestically for RMB-denominated trade finance. In the paper’s framework, they function as an extension of the lender of last resort function abroad, placing a ceiling on offshore RMB borrowing costs and truncating the right tail of the borrowing cost distribution.

Trade finance currency complementarity: The paper’s central mechanism — the alignment incentive between the currency of a firm’s liabilities (working capital / trade finance for imported inputs) and the currency of its export invoicing. Sticky prices create this complementarity because misaligned currency choices expose firms to uninsurable profit volatility.

Borrowing cost distribution truncation: The mechanism by which a swap line affects firm behavior: not by shifting the whole distribution of possible RMB borrowing rates but by capping its right tail. In the model, the pre-swap-line distribution of borrowing costs first-order stochastically dominates the post-swap-line distribution, so costs are weakly lower at every quantile once the line is in place.

Threshold condition for currency adoption: Derived from the model’s Proposition 2, the condition on the expected concave function of borrowing costs relative to an adjusted interest rate differential that must be satisfied for a firm to choose r-currency credit over d-currency credit. The complementarity factor Psi, which increases with the size of the rising-currency market, enters this threshold.

Extensive vs. intensive margin of currency use: The extensive margin refers to whether a country uses the RMB at all in a given month, measured by the indicator 1(Rpayment > 0); the intensive margin refers to the share of payments denominated in RMB or the log value of RMB payments. The paper finds the swap lines affect both margins, with the extensive margin effect appearing immediately and stabilizing after 12 months.

Vehicle currency displacement: The paper’s empirical finding that RMB adoption displaces existing international vehicle currencies (USD, EUR) rather than local currencies. This is a prediction of the model: firms adopting RMB for trade finance were previously using an existing international currency, not their domestic currency, for that purpose.

Latent Heterogeneity in the Marginal Propensity to Consume

Lewis, Melcangi, and Pilossoph estimate the unconditional distribution of the marginal propensity to consume (MPC) using the 2008 Economic Stimulus Act (ESA) rebate payments, deploying Gaussian mixture linear regression (GMLR) — a clustering regression approach — rather than the standard practice of interacting the rebate with observable household characteristics. The key methodological departure is that households are assigned to groups not by any presupposed observable, but by how well estimated group-specific MPCs describe each household’s actual consumption response; this allows recovery of the full unconditional MPC distribution, including heterogeneity driven by latent (unobservable) factors.

Data come from the 2008 Consumer Expenditure Survey (CEX), which contains household-level expenditure data and supplemental questions on ESA payments. Identification exploits the quasi-random timing of rebate receipt, determined by the last two digits of recipients’ Social Security Numbers, following the design of Parker, Souleles, Johnson, and McClelland (2013). The specification is updated following Borusyak et al. (2024) to avoid “forbidden comparisons” in staggered treatment settings. The number of groups G is selected by BIC, which selects G = 3 for total expenditures, confirmed by K-fold cross-validation.

The main finding is substantial MPC heterogeneity. For total expenditures, the three estimated group-level MPCs are 0.04, 0.23, and 1.33, with population shares of 30%, 48%, and 23% respectively. The implied aggregate (share-weighted average) MPC is 0.42, compared to 0.24 in the homogeneous Parker et al. (2013) specification estimated on the same data. Splitting by consumption category: for nondurables, two groups have MPCs of 0.09 and 0.18, with roughly equal population shares, and the lower bound of 0.09 is statistically distinguishable from zero, which is evidence against strict adherence to the Permanent Income Hypothesis even among the lowest-MPC group. For durables, the MPC distribution is strongly polarized: about 29% of households have a durable MPC statistically indistinguishable from zero and 21% have an MPC of 0.67, with the remaining 50% at 0.15. The cross-good correlation between household-level nondurable and durable predicted MPCs is only 0.13, which rules out substitution across categories and indicates only weak complementarity.

Turning to observable determinants, the paper finds that many household characteristics are individually correlated with estimated MPCs — including homeownership, mortgage status, income, and the average propensity to consume (APC) — despite the fact that the same dataset and similar identification strategies previously yielded insignificant relationships. Homeowners have significantly higher MPCs than renters; households with a mortgage have even higher MPCs than outright homeowners. In salary income, households in the top tercile spend 0.17 more per rebate dollar than the baseline group; households in the top tercile of non-salary income spend 0.19 more. However, in joint regressions, only two characteristics remain robustly and positively correlated with MPCs: total income (both salary and non-salary components) and the APC. The APC relationship is particularly notable: a one-percentage-point higher prior spending rate is associated with 0.19 additional cents spent per rebate dollar in the full multivariate specification.

The paper identifies three groups in the joint income-APC space: “poor savers” (low income, low APC, lowest MPCs), an intermediate group (high income or high APC but not both), and “rich spenders” (high income and high APC, highest MPCs). The “rich spender” group has received little prior attention in consumption-savings models.

Critically, observable characteristics jointly explain at most 8% of MPC variation (adjusted R-squared from a measurement-error correction). With 92% of MPC heterogeneity unexplained by standard observables, the authors conclude that a substantial share of variation reflects latent household traits — plausibly heterogeneity in discount rates or intertemporal elasticities of substitution. This finding also limits the practical scope for government targeting of fiscal transfers: because observable characteristics predict little MPC variation, any targeting strategy can exploit only a small fraction of the overall distribution.

Scope conditions: results apply to household expenditure responses (marginal propensities to spend, not to consume in the strict sense) within one quarter of rebate receipt. The income-MPC positive correlation is confined to households within the income range eligible for the 2008 ESA (phased out above $150,000 for joint filers). The sample excludes the top and bottom 1.5% of consumption changes as outliers.

Q: What is the core methodological innovation of this paper? A: The paper applies Gaussian mixture linear regression (GMLR) to the 2008 tax rebate setting, jointly estimating group-level MPCs and household group membership probabilities without imposing any prior restriction on which observable characteristics drive heterogeneity. Because groups are determined by how well group-specific MPCs explain consumption patterns rather than by presupposed observables, the method recovers the full unconditional distribution of MPCs, including latent heterogeneity. This contrasts with sample-splitting approaches that can only recover co-variation with chosen characteristics.

Q: What are the three group-level MPCs for total expenditures, and what shares of the population do they represent? A: The three estimated MPCs are 0.04 (30% of households), 0.23 (48%), and 1.33 (23%), all with precisely estimated group shares (standard errors of 0.01). The largest MPC of 1.33 is statistically significant at the 1% level. The lowest MPC of 0.04 is not statistically different from zero even under the more favorable conditional standard errors that treat group assignment as known.

Q: How does the average MPC implied by the GMLR distribution compare to the homogeneous specification? A: The share-weighted average MPC from the three-group GMLR is 0.42, compared to 0.24 from the homogeneous (G=1) specification on the same data and identification strategy. This gap arises partly because the homogeneous estimate averages across households with very heterogeneous responses, and partly because the distribution has a right-skewed tail with a meaningful mass at MPC above 1.
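The share-weighted average can be reproduced from the rounded group estimates. One caveat, stated as an assumption: the reported shares sum to 1.01, presumably because of rounding, so the sketch shows the weighted sum both as reported and after renormalizing the shares.

```python
group_mpc = [0.04, 0.23, 1.33]
group_share = [0.30, 0.48, 0.23]  # as reported; these sum to 1.01

weighted_sum = sum(m * s for m, s in zip(group_mpc, group_share))
normalized = weighted_sum / sum(group_share)

print(f"weighted sum with reported shares: {weighted_sum:.3f}")
print(f"after renormalizing the shares:    {normalized:.3f}")
```

Both figures sit close to the reported 0.42 and well above the homogeneous estimate of 0.24. The high-MPC group contributes about 0.31 of the total on its own (1.33 × 0.23), which is the right-tail mass described above.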

Q: What are the MPC distributions for nondurable and durable goods separately? A: For nondurables, BIC selects two groups with MPCs of 0.09 and 0.18 and roughly equal population shares (48% and 52%); crucially, the lower bound of 0.09 is statistically distinguishable from zero at the 5% level, providing evidence that no household strictly follows the Permanent Income Hypothesis for nondurables. For durables, BIC selects three groups: MPCs of 0.03 (not distinguishable from zero, 29% of households), 0.15 (50%), and 0.67 (21%), reflecting the discrete, lumpy nature of durable goods purchases.

Q: How correlated are nondurable and durable MPCs at the household level? A: The correlation between household-level posterior predicted MPCs for nondurables and durables is 0.13, statistically significant at the 1% level. This rules out substitution between goods categories, but the positive complementarity is quantitatively small. The authors interpret this as possibly reflecting a small share of “spender” types who adjust multiple consumption categories in response to transitory income shocks.

Q: Which observable characteristics are individually correlated with MPCs? A: Homeowners have significantly higher MPCs than renters; households with a mortgage display even greater MPCs than outright homeowners. Both salary and non-salary income are positively correlated: households in the top tercile of salary income have MPCs about 0.13 higher than the omitted group, and top-tercile non-salary income households have MPCs about 0.015 higher (though the latter is individually less precisely estimated). The average propensity to consume (APC) is significantly positively correlated with the MPC, with a coefficient of 0.075 in univariate regression and 0.166 in the full joint specification.

Q: Which observable characteristics remain significant in the joint (multivariate) regression? A: When all household characteristics are included jointly, only income (both salary and non-salary components) and the APC remain robustly and positively correlated with MPCs. Top-tercile salary income is associated with 0.112 higher MPCs and top-tercile non-salary income with 0.049 higher MPCs, while the APC coefficient rises to 0.166 (from 0.075 univariate). Homeownership, age, education, and most demographic controls become statistically insignificant in the joint specification.

Q: What fraction of MPC variation is explained by observable characteristics? A: The adjusted R-squared from the full multivariate regression of predicted MPCs on all observable characteristics is approximately 6%. After a measurement-error correction proposed in Supplement A.6 to account for noise in estimated posterior MPCs, the corrected R-squared rises to 8%. Either way, the vast majority — over 90% — of MPC heterogeneity is unexplained by standard observables, implicating latent household traits such as heterogeneous discount rates or intertemporal elasticities of substitution.

Q: How does the extent of MPC heterogeneity recovered by GMLR compare to sample-splitting on observables? A: Table 4 shows that splitting by age terciles yields MPC estimates ranging from 0.13 to 0.34; splitting by total income yields a range of 0.18 to 0.45; splitting by the APC yields 0.06 to 0.21. All of these ranges are far narrower than the GMLR-recovered range of 0.04 to 1.33. The authors argue that sample-splitting on individual observables, which are noisy and correlated with only a portion of MPC heterogeneity, systematically understates the true extent of heterogeneity.

Q: What is the “rich spender” finding and why is it theoretically notable? A: Households with both high total income and a high prior average propensity to consume have the largest MPCs. This “rich spender” group is poorly accommodated by standard consumption-savings models: the canonical one-asset incomplete markets model typically predicts a negative MPC-APC correlation conditional on income, and the two-asset Kaplan-Violante (2014) model can generate wealthy hand-to-mouth households with high income and high MPCs, but not necessarily high APCs. Preference heterogeneity — e.g., heterogeneous intertemporal elasticities of substitution as in Aguiar, Boar, and Bils (2019) — can rationalize the positive income-APC-MPC nexus.

Q: What explains the positive income-MPC correlation, and how does the paper relate it to the prior literature? A: The paper notes that this positive correlation is consistent with Kueng (2018), who finds higher spending propensities among high-income recipients of Alaska Permanent Fund payments, and rationalizes it via near-rationality or mental accounting: when a rebate is small relative to income, the perceived cost of deviating from consumption smoothing is low. The authors also note that low-income households still exhibit large absolute MPCs, suggesting sizable deviations from consumption smoothing at the bottom of the income distribution, even if relatively lower than for high-income households.

Q: What are the policy implications for targeting fiscal transfers? A: The paper finds that the 2008 ESA raised spending across all groups in partial equilibrium: the lowest group MPC is 0.04 for total expenditures (positive but not statistically distinguishable from zero) and 0.09 for nondurables (statistically positive). Among observable characteristics, targeting relatively higher-income households (including retirees and entrepreneurs via non-salary income) would maximize aggregate consumption effects. However, since observables explain only 8% of MPC variation, any targeting strategy can exploit only a small fraction of the overall heterogeneity; the government faces fundamental limits on feasible targeting. This also implies a tension between stimulus and distributional/insurance motives for transfer programs.

Q: How does the paper confirm that recovered heterogeneity is not spurious? A: The authors generate 250 Monte Carlo samples from the estimated homogeneous model, impose G=3, and re-run the GMLR and observable regressions; they find significant relationships with observable characteristics in virtually none of these samples. Additionally, when applied to the same homogeneous Monte Carlo samples, the BIC selects G=1 in all 250, confirming that the G=3 selected in the actual data reflects genuine heterogeneity rather than overfitting.

Q: How does GMLR compare to quantile regression for recovering the MPC distribution? A: Quantile regression (as used by Misra and Surico (2014) on the same data) recovers relationships at percentiles of the overall conditional distribution of consumption changes, so the ranking of households is driven by all sources of variation in consumption, not just the rebate response. If factors unrelated to the rebate dominate the conditional distribution, quantile regression will understate MPC heterogeneity. The authors illustrate this formally in Supplement B and note that Misra and Surico (2014) find a substantial share of MPCs at or below zero for nondurables, in contrast to the GMLR lower bound of 0.09 that is statistically positive.

Q: What do the longer-run (lagged) MPC estimates show? A: The specification includes up to two lags of rebate indicators, allowing measurement of spending responses in subsequent quarters after rebate receipt. The paper reports these results (Section 4.4) but the text provided does not fully detail them; the heterogeneous structure is maintained across horizons.

Gaussian Mixture Linear Regression (GMLR): A probabilistic clustering regression approach that jointly estimates group-specific regression coefficients (here, MPCs) and population group shares by maximizing an expected log-likelihood via the EM algorithm. Households receive continuous posterior weights (gamma_{jg}) reflecting uncertainty about their group membership rather than binary hard assignment, with identification from a Gaussianity assumption on within-group errors.
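
The posterior weights in this definition can be illustrated with a minimal Python sketch of a single E-step. It is not the authors' code: the function name, the two-group setup, and every number below are hypothetical, and the group coefficients, error standard deviations, and shares are taken as given rather than estimated.

```python
import numpy as np

def posterior_weights(y, X, beta, sigma, share):
    """E-step: gamma[j, g] = P(household j belongs to group g | data, current parameters)."""
    resid = y[:, None] - X @ beta.T                      # (n, G) residuals under each group's coefficients
    log_dens = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (resid / sigma) ** 2
    log_post = np.log(share) + log_dens                  # prior share times Gaussian density, in logs
    log_post -= log_post.max(axis=1, keepdims=True)      # stabilise before exponentiating
    gamma = np.exp(log_post)
    return gamma / gamma.sum(axis=1, keepdims=True)      # each row sums to one

# Hypothetical data: columns are (intercept, rebate indicator).
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
y = np.array([0.1, 0.9, 0.0])                            # made-up consumption changes
beta = np.array([[0.0, 0.1], [0.0, 0.9]])                # made-up group coefficients (low and high MPC)
sigma = np.array([0.2, 0.2])                             # made-up within-group error std. deviations
share = np.array([0.5, 0.5])                             # made-up population group shares

gamma = posterior_weights(y, X, beta, sigma, share)
print(gamma.round(3))
```

In this toy setup the first two households are assigned almost entirely to the low- and high-MPC groups respectively, while the third, which receives no rebate and fits both groups equally well, keeps weights equal to the prior shares. A full EM routine would alternate this step with weighted least-squares updates of the group parameters.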

Unconditional MPC Distribution: The full marginal distribution of MPCs across all households in the population, capturing heterogeneity from both observable and latent (unobservable) sources. Contrasted in the paper with the conditional distributions recovered by sample-splitting on observables, which by construction can only reflect co-variation with the chosen splitting variable.

Posterior Predicted MPC: For each household, the expectation of the group-specific MPC weighted by the household’s posterior group membership probabilities (lambda-tilde_{0,j} = sum_g gamma_{jg} lambda_{0g}). This object is the optimal (MSE-minimizing) individual-level MPC prediction and is the relevant input for targeted fiscal policy design.
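
As a small numerical illustration of the formula lambda-tilde_{0,j} = sum_g gamma_{jg} lambda_{0g}, the sketch below uses made-up posterior weights and made-up group MPCs (none of these values come from the paper).

```python
import numpy as np

# Hypothetical posterior weights gamma[j, g] for three households and G = 3 groups.
gamma = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.5, 0.3],
                  [0.0, 0.1, 0.9]])

# Hypothetical group-specific MPCs lambda_{0g}.
lambda_0 = np.array([0.1, 0.4, 1.2])

# Posterior predicted MPC: weight each group MPC by the household's membership probability.
mpc_hat = gamma @ lambda_0
print(mpc_hat)
```

Households with diffuse weights, such as the second row, receive a prediction between the group MPCs rather than being forced into a single group.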

Latent Heterogeneity: MPC variation that cannot be attributed to any observable household characteristic and is instead driven by unobserved traits — plausibly heterogeneous discount rates, intertemporal elasticities of substitution, or other preference parameters. Operationalized as the share of MPC variance unexplained by observable regressors (approximately 92% in this paper).

Rich Spenders: A group identified jointly in the APC-income space: households with both high total income and a high average propensity to consume, displaying the largest marginal propensities to consume out of the rebate. This group is not well-accommodated by standard one-asset or two-asset incomplete markets models under homogeneous preferences.

Average Propensity to Consume (APC): Defined empirically as average lagged consumption expenditures divided by total income, intended to capture persistent preference heterogeneity — a “spender type” — by measuring how much of income a household habitually spends before receiving the rebate. A one-percentage-point higher APC is associated with 0.19 additional cents spent per rebate dollar in the full multivariate specification.

Forbidden Comparisons: A bias identified by Borusyak et al. (2024) in event-study designs with staggered treatment, arising when newly treated units are compared to previously treated units rather than true controls. The paper addresses this by regressing consumption changes on rebate receipt indicators (iota_{jl}) directly rather than on rebate amounts, and including lagged rebate indicators to account for persistent effects.

Liquidity Traps, Prudential Policies, and International Spillovers

Mon, 01 Jan 0001 00:00:00 +0000

The paper develops a tractable open-economy New Keynesian model with nominal rigidities and an occasionally binding zero lower bound (ZLB) to study how monetary policy and macroprudential policy (modeled as a tax on capital flows) jointly transmit to output, capital flows, and the exchange rate, and what this implies for international spillovers and global welfare. An analytical decomposition identifies three transmission channels (intertemporal substitution, expenditure switching, and aggregate income), and the calibration finds that capital controls operate almost entirely through intertemporal substitution (about 95%), whereas expenditure switching accounts for roughly a quarter to a third of the effect of monetary policy. On the normative side, the authors show that, absent capital controls, monetary policy faces a tradeoff between stabilizing output today and curbing capital flows to lower the likelihood of a future liquidity trap, but that ‘leaning against the wind’ (pre-emptively raising rates) is not necessarily optimal and can be counterproductive when tradables and non-tradables are highly substitutable. Quantitatively, adding capital controls lowers the average unemployment rate conditional on a liquidity trap from about 6% to about 1.5% and cuts the unconditional welfare cost of liquidity traps from about 0.4% to about 0.1% of permanent consumption, with an average ex-ante tax on inflows of about 0.2% and an average ex-post tax on outflows of about -0.05%. Finally, contrary to ‘currency war’ concerns, the authors argue that capital controls are not beggar-thy-neighbor: a country can use them to insulate itself from adverse foreign-policy spillovers (which operate through the world real interest rate), and coordination is beneficial only during a liquidity trap and works by stimulating rather than restricting flows. All results hold within their small-open-economy model under its calibration.

In depth

Q1. What is the model, and which policies does it study?

The paper studies an infinite-horizon small open economy with nominal rigidities and an occasionally binding zero lower bound on the nominal interest rate, in which the government has two instruments — the nominal interest rate (monetary policy) and a tax on capital flows (macroprudential policy). The economy has a tradable final good and a non-tradable good with sticky prices, and features aggregate demand externalities. The authors use this setting to ask three questions: how interrelated are the transmission channels of the two policies; how should monetary policy be used jointly with macroprudential policy; and what happens to global welfare when many countries adopt prudential policies simultaneously.

Q2. What are the three transmission channels, and how much does each matter?

An analytical decomposition (extending Kaplan, Moll and Violante 2018 and Auclert 2019 to an open economy) identifies three channels — intertemporal substitution, expenditure switching, and aggregate income — and the calibration shows monetary policy and capital controls operate through very different channels. The intertemporal substitution channel accounts for about 95% of the effect of capital controls, while expenditure switching (operating through exchange-rate depreciation that shifts demand toward non-tradables) accounts for a substantial share of the effect of monetary policy — the paper states ‘about one-third’ in its introduction and ‘about one-quarter’ in its conclusion. The expenditure-switching channel and the role of the exchange rate are what distinguish the open-economy decomposition from its closed-economy antecedents.

Q3. Do open capital markets amplify or dampen monetary policy?

Capital flows may either amplify or attenuate the output effects of monetary policy, depending on the relative sizes of the elasticity of substitution over time and the elasticity across sectors. If the intertemporal elasticity exceeds the intratemporal one, an open capital account amplifies monetary policy (a monetary expansion raises total consumption more than output, so households borrow from abroad); the result reverses when the intratemporal elasticity is larger, in which case a closed capital account produces the larger output expansion.

Q4. Is ‘leaning against the wind’ the optimal prudential use of monetary policy?

Contrary to a widespread policy view, leaning against the wind is not necessarily optimal: when the elasticity of substitution across sectors is higher than across time, raising the interest rate ahead of a liquidity trap can be counterproductive. In that case a rate hike generates a large negative expenditure-switching effect and a sharp income drop while only modestly reducing consumption, so in general equilibrium it leads to capital inflows and more external debt — exacerbating the aggregate demand externality and making a future contraction more likely. The implication is that a prudential monetary policy may require lowering, not raising, the interest rate ahead of a liquidity trap.

Q5. How should monetary and macroprudential policy be combined, and how pre-emptively?

When capital controls are available, the central bank uses monetary policy to stabilize output and uses the capital-flow tax to manage flows, with the macroprudential tax on debt positive only if the ZLB is likely to bind next period; monetary policy, by contrast, must be used prudentially even when the ZLB binds only in some distant future. Because monetary policy is a blunter instrument, it has to be used more pre-emptively than capital controls. The authors also show the central bank may restrict outflows during a liquidity trap when that trap is either temporary or very severe.

Q6. What are the quantitative welfare and unemployment gains from capital controls?

Adding capital controls substantially improves macroeconomic stabilization: average unemployment conditional on a liquidity trap falls from about 6% to about 1.5%, and the unconditional welfare cost of liquidity traps falls from about 0.4% to about 0.1% of permanent consumption, roughly a fourfold reduction. The average ex-ante prudential tax on inflows is about 0.2% and the average ex-post tax on outflows is about -0.05%. The authors also note that, with capital controls, liquidity traps are less frequent and less severe but, perhaps surprisingly, tend to last longer.

Q7. Are capital controls beggar-thy-neighbor, and how do international spillovers work?

The authors argue that, contrary to emerging policy concerns, capital controls are not beggar-thy-neighbor and can enhance global macroeconomic stability; international spillovers operate through the world real interest rate, and a country can use capital controls to insulate itself from adverse foreign policies. In their multi-country extension, a country can remain insulated from negative spillovers of a change in the foreign monetary stance through capital controls, which can help prevent the outbreak of a currency war.

Q8. When is international policy coordination desirable?

The authors provide conditions under which a regime of uncoordinated capital controls can dominate laissez-faire, and they find that coordination is desirable only during a liquidity trap — where, notably, it calls for stimulating capital flows rather than preventing them. This stands against the view that uncoordinated capital-control policies necessarily produce a global paradox of thrift.

Q9. How do these results differ from prior open-economy liquidity-trap models?

The paper’s more benign view of spillovers contrasts with contributions such as Caballero, Farhi and Gourinchas (2021), Eggertsson et al. (2016), and Fornaro and Romei (2019), and the authors trace the difference to two features of their model: positive liquidity and the presence of ex-post capital controls. Because goods subject to nominal rigidities are consumed only domestically, foreign policies that favor savings (lowering the world interest rate) raise demand for domestic goods through asset markets and can be stabilizing at the ZLB; and ex-post controls let the central bank actively manage flows during a trap to offset adverse spillovers.

Key concepts

aggregate demand externality : the externality (as in Schmitt-Grohe and Uribe 2016 and Farhi and Werning 2016) by which an individual agent’s borrowing raises external debt and, given nominal rigidities and the ZLB, makes the economy more vulnerable to a future demand-driven contraction; it is the market failure that prudential policy targets in this model.

expenditure switching channel : the open-economy transmission channel through which an exchange-rate depreciation makes non-tradables relatively cheaper, shifting demand toward domestically produced goods; the paper finds it accounts for a substantial share (roughly a quarter to a third) of monetary policy’s effect.

intertemporal substitution channel : the channel through which a change in the intertemporal price shifts consumption between present and future; it accounts for about 95% of the effect of capital controls in the calibration.

liquidity trap / occasionally binding ZLB : a state in which the zero lower bound on the nominal interest rate binds, so conventional monetary policy cannot stabilize output; the risk of entering such a state in the future is what makes pre-emptive prudential policy valuable here.

capital controls (prudential tax on flows) : the macroprudential instrument in the model — a tax on capital inflows (ex ante) or outflows (ex post) — used to manage the level and timing of capital flows and to insulate the economy from foreign spillovers.

beggar-thy-neighbor : a policy that improves one country’s outcomes at others’ expense; the paper argues capital controls are, contrary to common concern, not beggar-thy-neighbor in its setting and can raise global stability.

Local Projection-Based Inference under General Conditions

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops a uniform asymptotic theory for local projection (LP) regression under general conditions, addressing a gap in the literature where existing results required restrictive assumptions about lag order, data persistence, and shock processes. The research question is: how can one conduct valid statistical inference on impulse responses from LP regressions when the true lag order is unknown (possibly infinite), data exhibit arbitrary persistence including unit roots and near-unit roots, horizons are allowed to grow with sample size, and shocks follow general conditionally heteroskedastic martingale difference sequences (MDS)?

The paper works within a VAR(infinity) data-generating process framework, where the vector autoregression may have an unknown and potentially infinite number of lags. The LP regression truncates this at a chosen model order p, with the truncation bias controlled by tail decay conditions on the VAR coefficients. The theoretical framework accommodates a class of VARMA models as a specific illustration, showing that Assumptions 1 and 2 hold for VARMA(q+1, r) processes when the model lag order p diverges at least as fast as log n.

The main theoretical result (Theorem 1) establishes uniform asymptotic normality of the LP estimator, simultaneously over: the coefficient parameter space A, model lag orders p in [p_low, p_high], horizons h in [1, h_bar], and configurations of the linear combination vector gamma (covering both individual and cumulated impulse responses). The convergence rate is pi_1(h; gamma)^{-1/2} n^{1/2}, which depends on persistence level and horizon. For an AR(1) process, the individual response rate is (sum_{i=0}^{h-1} a_1^{2i})^{-1/2} n^{1/2} and the cumulative response rate is h^{-3/2} n^{1/2}, which is slower.

The paper makes two principal contributions. First, LP is shown to be semiparametrically efficient when the controlled lag order diverges. Under classical assumptions (homoskedastic MDS shocks, stationarity, fixed horizon), the LP estimator achieves the same asymptotic distribution as the VAR-implied iterative estimator, and reaches the semiparametric efficiency bound of Chamberlain (1987) under the conditional moment restriction model. Under Gaussianity, LP is asymptotically Cramer-Rao efficient. This extends Plagborg-Moller and Wolf (2021) from distributional equivalence of estimands to equivalence of asymptotic distributions. The commonly held view that LP is inefficient relative to VAR-implied methods holds only under finite small-order VAR models; with a diverging lag order, the efficiency gain from the parsimonious VAR structure vanishes. The alternative LP estimator of Lusompa (2022), shown to be more efficient than standard LP under a known AR(1) model, is likewise shown (Proposition 2) to be asymptotically equivalent to standard LP when a sufficiently large lag order is used (p_u/sqrt(n) -> 0 and sqrt(n)(1-|rho|)^{p_u} -> 0).

Second, two new standard errors are proposed, neither involving HAR-type correction or bandwidth selection. SE_1 is a White-style heteroskedasticity-robust standard error applied after partialling out controls; it is uniformly consistent under a zero fourth cumulant condition on shocks (e.g., zero excess kurtosis with conditional homoskedasticity), but not for general MDS shocks. SE_2, the paper’s main methodological contribution, constructs the variance estimator using martingale-transformed scores: the LP residual Delta_t is projected onto forward residuals (Delta_{t+1}, …, Delta_{t+h-1}) to partial out serial dependence, recovering the true MDS error xi_{1t}(h; gamma) asymptotically. SE_2 is uniformly consistent for general MDS shocks (Proposition 4) and, under a finite-order VAR DGP, requires only p = p_true lags (rather than p >= p_true + 1 required by SE_1 and HAR-type methods).

Simulations using univariate ARMA(1,1) models with rho in {0, 0.5, 0.95, 1} and theta in {-0.5, 0, 0.5}, and bivariate VAR(1) models, confirm that SE_2-based 95% confidence intervals maintain coverage close to the nominal level across all cases including unit roots, while SE_1 shows degraded coverage under conditional heteroskedasticity (GARCH). Both outperform MOPM for cumulated responses at longer horizons.

Scope conditions: the framework accommodates data with unit roots and near-unit roots but not explosive roots or integration of order greater than one (for which differencing is prescribed before applying the LP). The growing-horizon rate condition p^2 h^2 / n -> 0 becomes binding as h grows, requiring h and p to grow at comparable rates or p more slowly. The results are for the VAR framework and do not directly apply to structural (SVAR) identification without additional assumptions.

Q: What is the central inferential problem that motivates this paper?

A: Applied macroeconomists estimating impulse responses via LP regressions face a trilemma: the true lag order is unknown and may be infinite, data may be highly persistent or integrated, and shocks may be conditionally heteroskedastic. Existing uniform validity results (chiefly Montiel Olea and Plagborg-Møller 2021) assume a finite and known model order and require mean-independent shocks, leaving inference potentially invalid when these conditions fail. The paper constructs a theory and inference procedures that remain valid simultaneously over all these dimensions.

Q: What is the VAR(infinity) data-generating process assumed, and what are the key restrictions on it?

A: The DGP is y_t = sum_{j=1}^{infinity} a_j y_{t-j} + u_t, where u_t is serially uncorrelated. Assumption 1 bounds the impulse responses uniformly over the parameter space (ruling out explosive roots and integration of order greater than one). Assumption 2 imposes that the tail coefficients a_j decay fast enough that the truncation bias is asymptotically negligible: the rate condition requires sqrt(n) * p * sum_{j=1}^{infinity} j |a_{p+j}| -> 0, implying p must diverge for infinite-order processes. For VARMA models, p need only diverge as slowly as log n.
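
To see why a logarithmic lag order can be enough for VARMA-type processes, the following sketch evaluates the truncation term sqrt(n) * p * sum_j j |a_{p+j}| for a univariate ARMA(1,1), y_t = rho*y_{t-1} + u_t + theta*u_{t-1}, whose AR(infinity) coefficients are a_j = (rho + theta) * (-theta)^(j-1) and therefore decay geometrically. The parameter values, the rule p = ceil(2 log n), and the finite cut-off `j_max` are illustrative assumptions, not choices made in the paper.

```python
import math

def truncation_term(n, p, rho, theta, j_max=200):
    """sqrt(n) * p * sum_{j>=1} j * |a_{p+j}| for the AR(inf) form of an ARMA(1,1).

    The infinite sum is cut at j_max, which is harmless here because the
    coefficients a_j = (rho + theta) * (-theta)**(j - 1) decay geometrically.
    """
    tail = sum(j * abs((rho + theta) * (-theta) ** (p + j - 1))
               for j in range(1, j_max + 1))
    return math.sqrt(n) * p * tail

# Illustrative lag rule p = ceil(2 * log n) with rho = theta = 0.5.
for n in (100, 1_000, 10_000):
    p = math.ceil(2 * math.log(n))
    print(n, p, truncation_term(n, p, rho=0.5, theta=0.5))
```

With these illustrative values the term shrinks as n grows even though p increases only logarithmically. The constant multiplying log n matters: it must be large enough relative to the decay rate |theta| for the term to vanish.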

Q: What does Theorem 1 establish, and what is the convergence rate?

A: Theorem 1 establishes uniform asymptotic normality of the LP estimator, with the supremum taken jointly over the coefficient space A, lag orders p in [p_low, p_high], horizons h in [1, h_bar], and the linear combination vector gamma. The convergence rate is pi_1(h; gamma)^{-1/2} n^{1/2}, where pi_1(h; gamma) = sum_{i=1}^{h} |phi_{1i}|^2 captures persistence and horizon effects. For an AR(1) process, the individual response rate is (sum_{i=0}^{h-1} a_1^{2i})^{-1/2} n^{1/2} and the cumulative response rate is the slower h^{-3/2} n^{1/2}.

Q: In what sense is LP semiparametrically efficient, and under what assumptions?

A: Under classical assumptions (homoskedastic MDS shocks, stationarity, and fixed horizon), when the controlled lag order p diverges at the appropriate rate, the LP estimator reaches the semiparametric efficiency bound of Chamberlain (1987) under the conditional moment restriction model E(y_t - sum_j a_j y_{t-j} | y_s, s <= t-1) = 0. Under these conditions it has the same asymptotic distribution as the VAR-implied iterative estimator (established by extending Lutkepohl 1990). Under Gaussianity, LP is asymptotically Cramer-Rao efficient.

Q: Why does the efficiency advantage of VAR-implied methods over LP vanish with a large lag order?

A: Under a finite, small-order VAR model, imposing the functional relationship between all impulse responses and a small set of VAR slope parameters — analogous to dimension reduction in a factor model — yields an efficiency gain for the iterative VAR-implied estimator. However, as the model lag order grows, the number of parameters to estimate grows correspondingly, eroding the dimension-reduction benefit. With a diverging lag order, the extraction of common parameters through a parsimonious model no longer tightens the asymptotic variance of the VAR-implied estimator relative to the direct LP estimator.

Q: How does SE_2 avoid the need for HAR (heteroskedasticity and autocorrelation robust) bandwidth selection?

A: The LP regression error Delta_t(h; gamma) is serially correlated for h >= 2 (it contains MA terms of order h-1), which would normally require HAR correction. SE_2 avoids this by constructing the variance estimator from the martingale-transformed score: the LP residual Delta_t is regressed on the forward residuals (Delta_{t+1}, …, Delta_{t+h-1}), and the residual from that regression, hat{xi}_{1t}, is used in place of Delta_t. Asymptotically, hat{xi}_{1t} recovers the true LP(infinity) error xi_{1t}(h; gamma) = sum_{i=1}^{h} phi'_{1i} u_{t+i}, which is an MDS with respect to {u_t, u_{t-1}, …}. Since MDS sums have a martingale structure, their variance can be estimated as a simple sum of squares without bandwidth selection.

Q: Under what condition is SE_1 uniformly consistent, and when does it fail?

A: SE_1 is the standard White heteroskedasticity-robust variance estimator applied to the partialled-out score. It is uniformly consistent under the zero fourth cumulant condition on shocks — that is, when u_t has zero excess kurtosis and is conditionally homoskedastic. This condition fails for general MDS shocks (e.g., GARCH-type shocks), because the cross-moment Cov((tau’w_0)^2, (tau’w_k)^2) does not vanish in general. Simulation results confirm that SE_1-based confidence intervals show degraded coverage under GARCH shocks, while SE_2 maintains coverage.

Q: What is the relationship between this paper and Montiel Olea and Plagborg-Møller (2021)?

A: Montiel Olea and Plagborg-Møller (2021) (MOPM) established uniform validity of LP inference under a finite-order, known VAR model and required mean-independent (not merely MDS) shocks. The current paper extends MOPM in five dimensions: it allows an unknown and potentially infinite true lag order; allows the controlled lag order to diverge; develops new asymptotic theory for general MDS shocks; proposes SE_2 whose consistency does not require mean-independent shocks; and unifies inference for both individual and cumulated impulse responses. The lag-augmented LP regression of MOPM (setting p = p_true + 1) is a special case of the framework here.

Q: What does the paper show about the alternative LP estimator of Lusompa (2022)?

A: Lusompa (2022) showed that, under a known AR(1) model with the true lag order, an alternative LP estimator that exploits the serial dependence structure of the LP error is asymptotically more efficient than standard LP across horizons. Proposition 2 of the current paper shows this efficiency gain does not survive when a sufficiently large lag order is used for the preliminary VAR used to compute the transformation. Specifically, when p_u/sqrt(n) -> 0 and sqrt(n)(1-|rho|)^{p_u} -> 0, the alternative and standard LP estimators are asymptotically equivalent: sqrt(n)[tilde{beta}_1(h) - beta_1(h)] - sqrt(n)[hat{beta}_1(h) - beta_1(h)] = o_p(1). The discrepancy arises from estimation errors in the preliminary residuals entering the asymptotic distribution.

Q: What are the rate conditions on the lag order p and horizon h, and how do they compare to VAR-implied methods?

A: Under a fixed horizon, the condition p^2/n -> 0 suffices for LP, which is weaker than the p^3/n -> 0 typically required for VAR-implied methods (the stricter condition arises because VAR-implied methods must estimate all p slope matrices jointly, while LP treats all but the first as nuisance). Under growing horizons (h -> infinity), the rate condition is p^2 h^2/n -> 0, and the analysis shows p = O(h) is sometimes optimal — p and h should grow at the same rate or p more slowly. By contrast, VAR-implied methods require p = o(n^{1/3}/h^{2/3}) under growing horizons.

Q: What is the lag order flexibility advantage of SE_2 under a finite-order VAR DGP?

A: When the true DGP is a finite-order VAR(p_true), SE_2 achieves consistent inference using exactly p = p_true lags — the exact order. In contrast, SE_1 and HAR-type standard errors require p >= p_true + 1 (at least one extra lag) because at p = p_true the LP residuals Delta_t(h; gamma) contain MA terms of order h-1 that create serial dependence. SE_2’s martingale transformation handles this serial dependence directly, without requiring the extra lag to purge it.

Q: What scope conditions limit the paper’s framework?

A: The framework rules out explosive roots (violating the uniform impulse response bound in Assumption 1) and integration of order two or higher (violating Assumption 1(iii)). For I(2) variables, the prescribed solution is to take differences before applying the LP, and then use the cumulated response (gamma = gamma_CIR) to recover original level responses. The growing-horizon results require the tension condition h_bar * p^2 / n -> 0 (for gamma with ||gamma||_1 = O(1)), implying a binding tradeoff between the range of allowed horizons and the range of allowed lag orders. Results do not directly extend to structural identification without additional assumptions.

Local Projection (LP) regression: A direct regression of the outcome h periods ahead on current and lagged endogenous variables, as in Jorda (2005). The LP estimator of the horizon-h impulse response is the OLS coefficient on the current endogenous variable in this regression, with p-1 lags included as controls. It estimates impulse responses directly for each horizon without imposing the recursive structure of a VAR model.
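
A minimal sketch of this regression, assuming a simulated univariate AR(1) so that the true horizon-h response is rho^h. The function name `lp_irf`, the inclusion of an intercept, and all parameter values are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def lp_irf(y, h, p):
    """OLS local projection of y_{t+h} on (1, y_t, y_{t-1}, ..., y_{t-p+1}).

    Returns the coefficient on y_t; the remaining p-1 lags act as controls.
    """
    n = len(y)
    t = np.arange(p - 1, n - h)                      # dates with p-1 lags behind and h leads ahead
    X = np.column_stack([np.ones(len(t))] + [y[t - k] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, y[t + h], rcond=None)
    return coef[1]                                   # column 0 is the intercept, column 1 is y_t

# Simulate an AR(1) with known persistence so the estimates can be checked.
rng = np.random.default_rng(0)
rho, n = 0.5, 20_000
u = rng.standard_normal(n)
y = np.zeros(n)
for s in range(1, n):
    y[s] = rho * y[s - 1] + u[s]

irf = [lp_irf(y, h, p=4) for h in range(1, 5)]
print([round(b, 3) for b in irf], [rho**h for h in range(1, 5)])
```

Each horizon is a separate OLS regression, which is what distinguishes LP from iterating on estimated VAR coefficients.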

Uniform asymptotic validity: A distributional approximation (here, standard normal) that holds simultaneously over a parameter space A, a range of model lag orders [p_low, p_high], a range of horizons [1, h_bar], and specifications of the linear combination vector gamma — not merely pointwise for fixed parameter values. Uniformity is the operative concept ensuring finite-sample reliability across empirically relevant configurations.

Semiparametric efficiency: In the paper’s usage, the LP estimator achieves the efficiency bound of Chamberlain (1987) for the semiparametric conditional moment restriction model E(y_t - sum_j a_j y_{t-j} | y_s, s <= t-1) = 0 when the controlled lag order diverges. Under Gaussianity, this coincides with Cramer-Rao efficiency. The key result is that the efficiency loss of LP relative to VAR-implied methods, well-documented under finite small-order VAR models, is asymptotically negligible once the lag order diverges.

Martingale difference sequence (MDS) shocks: The shock process u_t satisfying E(u_t | u_s, s <= t-1) = 0 almost surely, a condition weaker than the mean independence assumed by MOPM, which conditions on past and future shocks alike (E(u_t | u_s, s != t) = 0). MDS shocks include GARCH and stochastic volatility processes. The paper’s SE_2 is designed to be consistent for general MDS shocks, while SE_1 requires the zero fourth cumulant condition and MOPM require mean-independent shocks.

SE_2 (martingale-transformed standard error): The paper’s proposed standard error, constructed by first regressing LP residuals Delta_t on their forward values (Delta_{t+1}, …, Delta_{t+h-1}) to partial out serial dependence, then using the residual hat{xi}_{1t} in the variance estimator as a simple sum of squares. SE_2 is uniformly consistent for general MDS shocks and requires no bandwidth selection, because the residual hat{xi}_{1t} asymptotically recovers the MDS LP(infinity) error xi_{1t}(h; gamma).

VAR(infinity) model: A vector autoregression y_t = sum_{j=1}^{infinity} a_j y_{t-j} + u_t with potentially infinitely many lags. The paper’s framework treats the true lag order as unknown and possibly infinite, requiring the controlled lag order p in the LP regression to diverge (at a rate constrained by Assumption 2) so that truncation bias becomes asymptotically negligible. VARMA processes are a special case shown to satisfy the paper’s assumptions.

Cumulated impulse response: The linear combination beta_1(h; gamma_CIR) = sum_{j=1}^{h} beta_1(j), corresponding to gamma = (1, …, 1)’. Cumulated responses exhibit slower convergence rates than individual responses — h^{-3/2} n^{1/2} versus (sum_{i=0}^{h-1} a_1^{2i})^{-1/2} n^{1/2} for an AR(1) — and are especially relevant when the response variable is in differences and the researcher seeks level responses of the original variable.
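
The two rate expressions quoted above can be compared numerically. The sketch below simply evaluates them as stated in this summary; the function names and the values of n, h, and a_1 are illustrative assumptions.

```python
import math

def individual_rate(n, h, a1):
    """(sum_{i=0}^{h-1} a1^(2i))^(-1/2) * n^(1/2), the individual-response rate for an AR(1)."""
    return math.sqrt(n / sum(a1 ** (2 * i) for i in range(h)))

def cumulated_rate(n, h):
    """h^(-3/2) * n^(1/2), the cumulated-response rate quoted in the summary."""
    return math.sqrt(n) * h ** -1.5

n, h = 400, 4
print(individual_rate(n, h, a1=0.5))   # stationary case: the sum stays bounded as h grows
print(individual_rate(n, h, a1=1.0))   # unit root: the sum equals h, so the rate is sqrt(n / h)
print(cumulated_rate(n, h))            # smaller than both, i.e. slower convergence
```

A smaller value means the estimator concentrates more slowly, so cumulated responses need larger samples for the same precision at a given horizon.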

Long-Term Debt and Short-Term Rates: Fixed-Rate Mortgages and Monetary Transmission

Mon, 01 Jan 0001 00:00:00 +0000

This paper uses instrumental-variable local projections (IV-LP) on an unbalanced panel of up to 35 countries over approximately two decades to establish two interconnected findings about fixed-rate mortgages (FRMs) and monetary policy. First, monetary policy affects mortgage type selection: a 100 basis point tightening increases the share of adjustable-rate mortgages (ARMs) in new originations by approximately 10 percentage points after one year, while easing generates the reverse shift toward FRMs. The mechanism is budget constraints: ARM rates move nearly one-for-one with policy rates while FRM rates respond by only about 0.5 percentage points per 100 bps, so after tightening the FRM-ARM spread narrows but both products become more expensive — households facing tighter budgets select the cheaper ARM option, irrespective of spread comparisons. Second, the prevailing stock composition of outstanding ARMs determines how strongly monetary policy transmits to real activity: for every additional percentage point of household debt held as ARMs, the same 100 bps policy change produces approximately 0.05 percentage points more impact on real private consumption at six quarters ahead, controlling for the level of household debt-to-GDP. A back-of-the-envelope calculation implies that the same 100 bps change induces a consumption response approximately 5 percentage points stronger in an economy with 100 percent ARMs versus one with only FRMs. These two findings jointly imply that FRMs create both path-dependency (past easing cycles populate the stock with FRMs, weakening future transmission) and state-dependency (current FRM prevalence determines how much a given rate change moves consumption and GDP) in monetary policy.
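
The arithmetic behind these magnitudes can be written out in a few lines. The inputs are the rounded figures quoted in this summary; treating the ARM pass-through as exactly one-for-one, and the variable names themselves, are simplifying assumptions.

```python
# Rate pass-through of a 100 bps tightening, in percentage points.
arm_rate_change_pp = 1.0     # "nearly one-for-one", taken as exactly 1.0 here (assumption)
frm_rate_change_pp = 0.5     # about 0.5 pp per 100 bps
spread_change_pp = frm_rate_change_pp - arm_rate_change_pp   # change in the FRM-ARM spread

# Back-of-the-envelope state dependence of the consumption response.
extra_effect_per_arm_pp = 0.05   # additional consumption impact per pp of debt held as ARMs
arm_share_gap_pp = 100           # all-ARM economy versus all-FRM economy
consumption_gap_pp = extra_effect_per_arm_pp * arm_share_gap_pp

print(spread_change_pp, consumption_gap_pp)
```

The spread narrows by about half a percentage point, and the linear extrapolation from 0 to 100 percent ARMs reproduces the roughly 5 percentage point difference in the consumption response.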

In depth

The paper draws on two data sources: flow data covering new mortgage originations in 27 countries and stock data on the outstanding mortgage composition in 35 countries, spanning approximately two decades of quarterly observations. A mortgage is classified as fixed-rate (FRM) if the contractual interest rate is fixed for 12 months or more from origination; below that threshold it is classified as adjustable-rate (ARM). This definition aligns with ECB and Eurostat conventions and is consistent across the panel, although some “fixed-rate” mortgages in the sample are hybrid products whose initial fixed period eventually gives way to repricing. The FRM share in new flows (used in the path-dependency analysis, equation 2) captures how the composition of new originations responds to monetary policy. The ARM share in the outstanding stock, expressed as a proportion of household debt-to-GDP (ARMdebt), is the state variable in the state-dependency analysis (equation 3). Countries’ time series for both measures display the expected patterns: in the long period of ultra-low rates following the GFC, the FRM share in the stock increased substantially across the sample.

Q2. How are monetary policy shocks identified and why are information effects excluded?

Monetary policy shocks are constructed from Bloomberg high-frequency financial market surprises around central bank announcement windows, then orthogonalized with respect to the central bank’s private information component using the Bauer and Swanson (2023) procedure. The Bauer-Swanson orthogonalization removes the portion of policy surprises that is correlated with the central bank’s assessment of the economic outlook — the “Fed information effect” identified by Nakamura and Steinsson (2018). Without this purification, a policy surprise that partly reflects the central bank’s private negative news about growth would confound the identification: the estimated consumption response would reflect both the direct policy-rate effect and the information revelation, making it impossible to isolate the transmission mechanism through mortgage types. The first-stage Kleibergen-Paap Wald F statistics are 34 or above for the path-dependency regression (equation 2) and 12.9 or above for the state-dependency interaction regression (equation 3), satisfying standard relevance thresholds.

Q3. What is the path-dependency mechanism and what does Figure 3 show?

Figure 3 plots impulse responses of FRM rates, ARM rates, 10-year and 1-year government bond yields, the FRM-ARM spread, and the ARM share in new flows to a one percentage point policy rate change instrumented with the Bauer-Swanson-cleaned shocks. FRM rates respond by approximately 0.5 percentage points per 100 bps of policy change, similar to the response of 10-year government bond yields, with full reversion after about 4–6 quarters. ARM rates respond approximately one-for-one, similar to 1-year yields, also reverting after 4–6 quarters. Since ARM rates respond more than FRM rates, the FRM-ARM spread narrows by about 0.5 percentage points after a 100 bps tightening, which makes ARMs relatively more expensive compared to FRMs. Despite this narrowing of the spread (which should theoretically discourage ARM selection), the paper finds that the ARM share in new flows increases significantly: a 100 bps tightening raises the ARM share by approximately 10 percentage points after one year, a large effect corresponding to about two thirds of a within-country standard deviation. The paper attributes this to budget constraints: even though the FRM-ARM spread narrows, both products become more expensive in absolute terms, and cash-constrained borrowers choose the cheaper option (ARM) to minimize initial monthly payments, rather than comparing relative spreads. The converse holds during loosening: as borrowing costs decline and budget constraints ease, borrowers show a revealed preference for the interest rate risk protection of FRMs, consistent with a general preference for payment certainty when affordability is not binding.
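The spread arithmetic behind Figure 3 can be sketched in a few lines. This is only an illustration using the approximate pass-through figures quoted above (one-for-one for ARMs, about 0.5 for FRMs); the function name and its arguments are hypothetical and not taken from the paper.

```python
# Approximate pass-through figures quoted in the summary (not re-estimated here).
ARM_PASS_THROUGH = 1.0   # ARM rates move about one-for-one with the policy rate
FRM_PASS_THROUGH = 0.5   # FRM rates move about 0.5 pp per 100 bps


def rate_responses(policy_change_pp, arm_pt=ARM_PASS_THROUGH, frm_pt=FRM_PASS_THROUGH):
    """Return changes in the ARM rate, FRM rate, and FRM-ARM spread, all in pp."""
    d_arm = arm_pt * policy_change_pp
    d_frm = frm_pt * policy_change_pp
    return {"d_arm": d_arm, "d_frm": d_frm, "d_spread": d_frm - d_arm}


# A 100 bps tightening: both rates rise, but the FRM-ARM spread narrows.
resp = rate_responses(1.0)
print(resp)
```

Both rates rise while the spread falls, which is the configuration in which relative pricing alone would favor FRMs and the budget-constraint channel favors ARMs.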

Q4. How does the mortgage stock composition affect monetary policy transmission (state-dependency)?

The state-dependency analysis (equation 3, Figure 4) regresses macroeconomic outcomes on the interaction of a policy rate change and the ex-ante ARM debt share (ARMs as a proportion of household debt-to-GDP), using country and quarter fixed effects with Driscoll-Kraay standard errors and IV identification. The left column of Figure 4 shows that the marginal effect of a 100 bps policy change on real private consumption increases by approximately 0.05 percentage points for each additional percentage point of ARMs in outstanding stock, a differential that becomes noticeable after about six quarters. The differential response for durables consumption appears earlier (around two quarters), while the real GDP differential is roughly half the consumption differential (about 0.02 percent per percentage point of ARM debt). The right column of Figure 4 separates the state variable into the pure ARM share and household debt-to-GDP by including both interaction terms in a horse-race specification. The paper finds that the ARM share (not the debt level) drives the transmission differences for real GDP and both measures of consumption, consistent with a cash-flow channel interpretation: it is interest rate resets on existing ARM contracts that affect disposable income flows and spending, not the debt level per se. Household debt-to-GDP is relevant for durables consumption, potentially reflecting wealth and collateral effects on credit-intensive spending categories. The 100 percent ARM versus 0 percent ARM back-of-the-envelope calculation implies a 5 percentage point consumption difference per 100 bps, corresponding exactly to one standard deviation in cumulative real private consumption changes at 6 quarters in this sample.
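The back-of-the-envelope calculation is a linear extrapolation of the interaction coefficient. A minimal sketch, assuming the differential of about 0.05 percentage points per percentage point of ARM share stays constant across the whole 0 to 100 percent range (an assumption of the extrapolation, and the function name is hypothetical):

```python
# Interaction estimate quoted in the summary: extra consumption response (pp)
# per additional pp of ARM share, for a 100 bps policy change at six quarters.
DIFF_PER_PP_ARM = 0.05


def differential_response(arm_share_high, arm_share_low, slope=DIFF_PER_PP_ARM):
    """Difference in the consumption response between two ARM shares (in pp)."""
    return slope * (arm_share_high - arm_share_low)


# All-ARM economy versus all-FRM economy.
full_range = differential_response(100.0, 0.0)
# A more modest comparison: 60 percent versus 40 percent ARM share.
partial = differential_response(60.0, 40.0)
print(full_range, partial)
```

The same function shows that realistic cross-country gaps in the ARM share imply much smaller differentials than the polar 100-versus-0 comparison.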

Q5. Why is the shift toward ARMs after tightening paradoxical given the standard relative pricing model, and what channels can explain it?

The standard framework predicts that borrowers choose FRMs when the FRM-ARM spread is low (ARMs relatively less attractive) and ARMs when the spread is high; a tightening that narrows the spread should therefore shift borrowers toward FRMs, not ARMs. The paper finds the opposite and offers two channels. First, a budget constraint channel: after tightening, both FRM and ARM rates rise in absolute terms, but ARMs remain cheaper at origination because they carry lower initial payments; liquidity-constrained borrowers facing higher total borrowing costs choose the cheaper option regardless of the spread direction, consistent with evidence in Andersen et al. (2023) that ARM adoption is more prevalent among liquidity-constrained borrowers. Second, a cost-minimization channel with short-run focus: some borrowers choose the product that minimizes current-period mortgage payments, not lifetime payments; after tightening, ARMs minimize the monthly payment even though they expose borrowers to future rate risk. The paper notes that the converse — FRM adoption after loosening despite rising FRM-ARM spreads — cannot be explained by short-run cost minimization and suggests a preference for rate certainty when affordability is non-binding.

Q6. Is the state-dependency effect asymmetric between tightening and loosening cycles?

The paper tests an asymmetric specification and finds that FRMs are a greater impairment to monetary transmission during tightening relative to loosening cycles, especially when free prepayment options are available. During tightening, a high FRM share means few borrowers face rate resets on their existing debt, so the cash-flow channel is weak; simultaneously, prepayment refinancing into new mortgages is unattractive (locking in a higher rate) so the existing FRM stock remains insulated. During loosening, a high FRM share means borrowers can refinance into lower FRM rates or into ARMs at lower cost, partially restoring the transmission channel. This asymmetry is consistent with findings in Berger, Milbradt, Tourre, and Vavra (2021) on mortgage prepayment and path-dependent monetary policy effects in the US, and suggests that the FRM-induced weakening of transmission is particularly binding precisely during contractionary cycles when central banks most need the transmission mechanism to be operative.

Q7. What are the implications for central bank transmission assessment and policy?

The two findings together imply that monetary policy transmission capacity is endogenous to the history of the policy cycle. A prolonged loosening phase (such as the post-GFC decade of ultra-low rates) shifts new originations toward FRMs, which accumulate in the outstanding stock; the resulting high FRM share means that subsequent tightening operates through a weakened transmission channel. The central bank’s policy instrument affects the transmission mechanism’s own strength. This endogeneity has at least two practical implications. First, central banks that have conditioned borrowers into expecting prolonged low rates may face amplified instrument-calibration uncertainty: the same 100 bps tightening has systematically weaker real effects in economies where prior easing locked in high FRM shares, requiring larger policy moves to achieve the same macroeconomic stabilization. Second, cross-country heterogeneity in the FRM-ARM mix — itself partly endogenous to the history of monetary policy — explains a significant portion of the observed heterogeneity in monetary policy transmission strength across countries, complementing structural explanations based on financial market depth, indebtedness levels, and household balance sheet composition.

Key concepts

fixed-rate mortgage (FRM): a mortgage with a contractual interest rate fixed for 12 months or more; holders are contractually insulated from subsequent policy rate changes, reducing the pass-through of monetary policy to household debt service costs through the cash-flow channel; in the paper’s framework, FRM prevalence is both a consequence of past policy (path-dependency) and a determinant of current transmission strength (state-dependency).

adjustable-rate mortgage (ARM): a mortgage where the interest rate resets with market rates (at intervals shorter than 12 months for the paper’s classification); holders feel policy rate changes immediately in their monthly payments, amplifying the cash-flow channel; the paper finds ARM share in new flows rises after monetary tightening due to budget constraint effects.

path-dependency: the property that the current effectiveness of monetary policy depends on the accumulated history of prior policy rate changes, through their effect on the outstanding mortgage stock composition; specifically, prolonged easing cycles generate high FRM shares that reduce future transmission potency.

state-dependency: the variation in monetary policy transmission strength with the prevailing share of ARMs in outstanding mortgage debt; the same policy rate change produces a consumption response approximately 5 percentage points larger in a 100 percent ARM economy than in a 100 percent FRM economy (per 100 bps), controlling for debt-to-GDP.

cash-flow channel of monetary policy: the mechanism by which changes in policy rates affect households’ disposable income through resets in the interest payments on their existing variable-rate debt; the dominant channel in the paper’s state-dependency results — ARM share (not debt level) drives transmission differences for consumption and GDP, consistent with income flow effects on spending propensity.

IV local projections (IV-LP): the estimation framework combining Jordà (2005) local projections — a flexible, model-free method for estimating impulse responses at multiple horizons — with instrumental variable identification using Bauer-Swanson-cleaned monetary policy shocks; used for both the path-dependency regressions (equation 2, ARM flow response) and the state-dependency regressions (equation 3, interaction with ARM stock).

Bauer-Swanson (2023) information effect correction: the procedure for removing the component of high-frequency monetary policy surprises that is correlated with the central bank’s private information about economic conditions; applied here to prevent the estimated transmission effects from conflating pure rate changes with information revelation about the macroeconomic outlook.

Loose Monetary Policy and Financial Instability

Mon, 01 Jan 0001 00:00:00 +0000

This paper provides the first long-run causal evidence that a persistently loose stance of monetary policy, defined as extended periods of low interest rates relative to the neutral rate, significantly raises the probability of a financial crisis several years later. Using a long historical panel of 18 advanced economies (approximately 1870–2020, excluding world wars), the paper estimates local projection (LP) regressions in which the stance is measured as the 5-year backward moving average of (r − r*), with r* from the Del Negro–Giannone–Giannoni–Tambalotti (DGGT) factor model. The OLS baseline finds that a 1 percentage-point (pp) looser average stance over a 5-year window raises the 3-year financial crisis probability by 2.2pp at a 5–7 year horizon and 3.3pp at a 7–9 year horizon, against an unconditional base of 10.5%. To address the endogeneity of monetary policy to pre-existing economic conditions, the authors construct an instrumental variable based on the international trilemma of open-economy finance: for countries pegging their exchange rate, changes in the base-country interest rate orthogonal to domestic economic conditions provide exogenous variation in domestic rates, weighted by a capital mobility index. IV estimates are substantially larger: a 1pp looser average stance raises crisis probability by 5.5pp at 5–7 years and 15.5pp at 7–9 years, indicating that OLS understates the causal effect because accommodative policy is endogenously adopted during recessions when crisis risk is already low. The same loose-policy stance significantly raises the probability of entering R-zones (periods of credit market overheating identified by Greenwood, Hanson, Shleifer, and Sørensen (2022) as harbingers of financial crisis) and, with a lag of 6–9 years, raises the probability of historically low GDP growth (below the 20th percentile of the cross-country distribution). The evidence supports a growth-risk tradeoff: loose policy may deliver short-term stimulus, but at a meaningful cost in medium-term financial fragility and real tail risk.

Data and sample (Section 2): 18 advanced economies, long historical panel from the 1870s to 2020, excluding the two world wars (so the sample covers the pre-1914, interwar, and post-1945 periods), yielding an unbalanced panel of roughly 1,500 country-year observations. Financial crisis dates come from the Jordà–Schularick–Taylor (2017) Macrofinancial History Database. The stance measure is r_{i,t} − r*_{i,t}, where r*_{i,t} is country-specific and time-varying, estimated from a factor model (DGGT); the 5-year backward moving average smooths over cyclical fluctuations and captures the sustained character of monetary accommodation that theory associates with financial fragility buildup. The unconditional 3-year financial crisis probability in the post-WWII sample is 10.5%.
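The stance construction can be sketched as follows. This is an illustration only: the rate-gap series is hypothetical, and the choice to include the current year in the 5-year window is an assumption, since the summary does not state the exact window alignment.

```python
import numpy as np


def backward_moving_average(gap, window=5):
    """Backward moving average of the rate gap r - r*.

    Returns NaN until `window` observations are available. The window is
    assumed to cover years t-window+1 through t (current year included).
    """
    gap = np.asarray(gap, dtype=float)
    out = np.full(gap.shape, np.nan)
    for t in range(window - 1, len(gap)):
        out[t] = gap[t - window + 1 : t + 1].mean()
    return out


# Hypothetical annual rate gaps (pp) for one country, not data from the paper.
gap = [0.5, 0.0, -1.0, -2.0, -2.5, -1.5, -1.0]
stance = backward_moving_average(gap)
print(stance)
```

A single year of low rates barely moves the averaged stance, whereas a run of negative gaps pulls it down persistently, which is the property the paper relies on.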

Empirical methodology (Section 3): Local projections (Jordà 2005) with financial crisis indicator B_{i,t} as the outcome and 5-year backward MA of stance as the key regressor, estimated at horizons h = 0 to 12 years:

B_{i,t+h} = α_{i} + β_{h} · stance_{i,t} + γ_{h} · X_{i,t} + ε_{i,t+h}

Controls X_{i,t} include: lagged B (crisis history), lagged stance, lagged log GDP growth, lagged credit-to-GDP growth, lagged inflation, and lagged short-term rate, plus global controls (cross-country averages) to absorb common factors. Country fixed effects α_{i} and Driscoll–Kraay (1998) standard errors with h lags account for serial correlation and cross-sectional dependence. Because β_{h} measures the effect of a 1pp tighter (higher) stance, results are reported as −100β_{h}, the change in the 3-year crisis probability (in percentage points) per 1pp looser stance; a positive −100β_{h} therefore means that a looser stance raises crisis probability.
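The mechanics of the local projection can be sketched with a stripped-down estimator. This is not the paper's code: it drops the controls X_{i,t} and the Driscoll–Kraay standard errors, uses a balanced synthetic panel with a continuous outcome, and all names are hypothetical. It only shows how the horizon-h slope with country fixed effects is obtained and how −100β_{h} is read.

```python
import numpy as np


def lp_coefficient(y, x, h):
    """Within-country OLS slope of y_{i,t+h} on x_{i,t} (no controls).

    y, x: arrays of shape (n_countries, n_years).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n_years = y.shape[1]
    lead = y[:, h:]                 # outcome h years ahead
    reg = x[:, : n_years - h]       # stance observed today
    # Demeaning within country absorbs the fixed effect alpha_i.
    lead_dm = lead - lead.mean(axis=1, keepdims=True)
    reg_dm = reg - reg.mean(axis=1, keepdims=True)
    return float((reg_dm * lead_dm).sum() / (reg_dm ** 2).sum())


# Synthetic, noiseless panel: the outcome responds to stance six years earlier
# with slope -0.03, plus a country-specific intercept.
rng = np.random.default_rng(0)
x = rng.normal(size=(18, 60))
alpha = rng.normal(size=(18, 1))
y = np.zeros_like(x)
y[:, 6:] = alpha - 0.03 * x[:, :-6]

beta_6 = lp_coefficient(y, x, h=6)
print(-100 * beta_6)  # effect of a 1pp looser stance, in percentage points
```

Running the same function over h = 0, …, 12 would trace out the full response path, one regression per horizon.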

OLS baseline results (Section 4.1): The baseline LP-OLS model (Figure 3, panel (a)) finds no significant association between stance and crisis probability in the first 4 years after the policy window; that is, loose monetary policy does not immediately raise crisis risk. Crisis probability rises meaningfully from horizon 5 onward:

5–7 year horizon: +2.2pp crisis probability per 1pp lower average stance
7–9 year horizon: +3.3pp crisis probability per 1pp lower average stance
Very loose indicator (stance at the 20th percentile, approximately −2.5%): +13pp at the peak horizon; when stance = −1%, crisis probability is approximately 16% (vs unconditional 10.5%)
Alternative chronology (Baron–Verner–Xiong 2021, bank equity crash events): +5.3pp at the 8-year horizon per 1pp lower stance — broadly consistent with the baseline

R-zone analysis (Section 4.2): Greenwood, Hanson, Shleifer, and Sørensen (2022) define R-zones as periods when household or business credit grows anomalously fast — a pre-crisis credit overheating indicator. LP-OLS estimates show:

1pp lower average stance → +3.2pp household R-zone probability within 5 years; +1.8pp business R-zone probability
Very-loose binary indicator (bottom quintile of stance) → +9.6 to 10.8pp R-zone probability

These magnitudes confirm that the financial instability buildup operates through the canonical credit channel: loose monetary policy inflates credit volumes first, with financial crises following several years later.

Eurozone periphery illustration (Section 4.2): The pre-2008 divergence between the ECB’s common stance and country-specific neutral rates is shown in Figure 10. Core eurozone countries (Belgium, Denmark, France, Germany, Netherlands) experienced tight-to-neutral effective stances during 2003–2008, while periphery countries (Ireland, Italy, Portugal, Spain) faced loose stances of up to approximately −10pp. The periphery’s credit boom — in total credit, household credit, mortgage credit, and house prices — far exceeded the core’s over 2002–2008, consistent with the LP-OLS estimates. This pattern motivates the IV strategy.

IV construction (Section 4.3): The instrument follows Jordà, Schularick, and Taylor (2020) and uses the international monetary trilemma. For countries pegging their exchange rate (identified by exchange rate stability), the domestic interest rate is mechanically tied to the base country’s rate; the instrument is:

z_{i,t} = k_{i,t} × (ΔR_{b(i,t),t} − ΔR̂_{b(i,t),t})

where k_{i,t} is a Chinn–Ito capital mobility index, b(i,t) is the base country for country i in year t, ΔR_{b,t} is the actual change in the base country’s interest rate, and ΔR̂_{b,t} is the predicted change obtained from a first-stage regression of base-country rates on base-country economic conditions. The residual captures shifts in the base country’s rate that are orthogonal to economic fundamentals and are transmitted to pegged countries via the exchange rate commitment, so they are exogenous from the perspective of the pegged country. Ten lags of z are used as instruments for the 5-year moving average of stance. The Kleibergen–Paap (2006) weak-instrument statistic exceeds 10 across all first-stage regressions.
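The instrument formula above can be sketched for a single pegged country. This is a simplified illustration under stated assumptions: the prediction ΔR̂ comes from a plain OLS regression on whatever base-country fundamentals are supplied, the explicit `pegged` dummy is an assumption about how the peg condition enters, and all names and data are hypothetical.

```python
import numpy as np


def trilemma_instrument(d_base_rate, base_fundamentals, capital_mobility, pegged):
    """z_t = k_t * peg_t * (dR_t - dR_hat_t) for one country's time series.

    d_base_rate: changes in the base country's interest rate.
    base_fundamentals: 1-D or 2-D array of base-country macro conditions.
    capital_mobility: capital mobility index k_t.
    pegged: 1 if the country pegs to the base in year t, else 0 (assumed form).
    """
    d_base_rate = np.asarray(d_base_rate, dtype=float)
    X = np.column_stack(
        [np.ones(len(d_base_rate)), np.asarray(base_fundamentals, dtype=float)]
    )
    coef, *_ = np.linalg.lstsq(X, d_base_rate, rcond=None)
    residual = d_base_rate - X @ coef   # part of dR not explained by fundamentals
    k = np.asarray(capital_mobility, dtype=float)
    peg = np.asarray(pegged, dtype=float)
    return k * peg * residual


# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
f = rng.normal(size=50)
dR = 0.4 * f + rng.normal(scale=0.2, size=50)
z = trilemma_instrument(dR, f, capital_mobility=np.full(50, 0.8), pegged=np.ones(50))
print(z[:5])
```

By construction the residual is uncorrelated with the supplied fundamentals, and the instrument is switched off in years without a peg.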

IV second-stage results (Figure 11): The IV estimates are substantially larger than OLS throughout the horizon:

5–7 year horizon: +5.5pp crisis probability per 1pp lower average stance (vs +2.2pp OLS)
7–9 year horizon: +15.5pp per 1pp lower average stance (vs +3.3pp OLS)
With stance = −1%, the IV-implied crisis probability is 16% at 5–7 years; at 7–9 years, medium-term crisis risk more than doubles from the unconditional 10.5% to over 20%
These IV estimates are 2.5× to 5× the OLS, implying substantial attenuation bias in OLS: monetary policy is endogenously loosened during downturns when crisis risk is already low, so reverse causality compresses the OLS coefficient toward zero
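The ratios and implied probabilities in the bullets above follow from simple arithmetic on the reported coefficients. The sketch below reproduces it, assuming the effect scales linearly with the size of the loosening (an assumption of the extrapolation, not a claim from the paper); the dictionary and function names are hypothetical.

```python
BASE_PROB = 10.5                      # unconditional 3-year crisis probability, %
OLS = {"5-7y": 2.2, "7-9y": 3.3}      # pp per 1pp looser stance
IV = {"5-7y": 5.5, "7-9y": 15.5}      # pp per 1pp looser stance


def summarize(loosening_pp=1.0):
    """IV/OLS ratio and IV-implied crisis probability for each horizon."""
    rows = {}
    for horizon in OLS:
        rows[horizon] = {
            "iv_over_ols": IV[horizon] / OLS[horizon],
            "iv_implied_prob": BASE_PROB + IV[horizon] * loosening_pp,
        }
    return rows


summary = summarize()
for horizon, row in summary.items():
    print(horizon, round(row["iv_over_ols"], 2), round(row["iv_implied_prob"], 1))
```

The 5–7 year ratio is 2.5 and the 7–9 year ratio is a little under 5, which is the "2.5× to 5×" range cited in the text.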

IV R-zones (Figure 13): LP-IV estimates for household and business R-zones confirm the LP-OLS direction — loose monetary policy raises the likelihood of entering credit market overheating as defined by Greenwood et al. (2022), at economically relevant magnitudes in the post-WWII period.

Growth-risk tradeoff (Section 5): To close the circle between monetary policy, financial fragility, and real activity, the paper estimates LP models with tail real growth indicators as outcomes. Define Low-Output-Growth_{i,t} = 1{Δ₃(log Y_{i,t}) < 20th percentile} — an indicator for historically low 3-year real GDP per capita growth. The 20th percentile in the sample corresponds to positive growth of 1.32%. Results (Figure 14a):

No significant relationship between stance and Low-Output-Growth probability in the first 4–5 years — consistent with the idea that short-term stimulus benefits materialize before financial fragility builds
At horizons 6–9 years: when stance is 1pp looser, the probability that Low-Output-Growth turns on rises by 2pp (at 8 years) and 3pp (at 9 years), significant at the 32% (5%) level at h=8 (h=9)
For Barro–Ursua (2008) disaster events (peak-to-trough falls in real GDP per capita of ≥10%, 3.2% of sample observations): the disaster probability follows a similar hump — slightly lower disaster risk in the short term under loose policy (the stimulus dividend), followed by materially higher disaster risk at 7–9 years (Figure 14b)
Conclusion: loose monetary policy produces a growth-risk tradeoff, where short-run stimulus gains are offset by elevated medium-term tail risk in financial and real activity
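The Low-Output-Growth indicator defined above can be sketched as follows. This is an illustration under assumptions: the 20th percentile is taken over the pooled country-year distribution of 3-year log growth, the panel is balanced, and the data are simulated rather than taken from the paper.

```python
import numpy as np


def low_output_growth(log_y, horizon=3, pct=20):
    """Indicator for 3-year log growth below the pooled `pct`-th percentile.

    log_y: array of shape (n_countries, n_years) with log real GDP per capita.
    Returns (indicator, cutoff); column j of the indicator refers to year
    j + horizon of the input.
    """
    log_y = np.asarray(log_y, dtype=float)
    growth = log_y[:, horizon:] - log_y[:, :-horizon]
    cutoff = np.percentile(growth, pct)
    return (growth < cutoff).astype(int), cutoff


# Simulated log GDP paths: random walks with drift (hypothetical numbers).
rng = np.random.default_rng(2)
log_y = np.cumsum(0.02 + 0.03 * rng.normal(size=(18, 40)), axis=1)
indicator, cutoff = low_output_growth(log_y)
print(indicator.mean(), cutoff)
```

Because the cutoff is a percentile of the sample itself, it can be positive, as with the 1.32% threshold reported in the paper: the indicator flags relatively weak growth, not necessarily contractions.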

Scope conditions: The paper documents empirical regularities from long historical data; it does not build or estimate a structural model, so it cannot formally decompose the mechanisms driving the reduced-form effects (risk-taking channel, credit-boom channel, or asset-price inflation). The stance measure (r − r*) depends on estimates of the time-varying neutral rate, which carries its own uncertainty; robustness using alternative r* measures is presented. The IV relies on countries pegging their exchange rate, which varies across time and countries; results may not generalize to monetary unions or fully flexible exchange rate regimes where the trilemma applies differently. The sample of 18 advanced economies may not be representative of emerging market contexts. The analysis is positive, not normative: it does not compute welfare-optimal monetary policy rules that account for the intertemporal tradeoff.

In depth

Q1. Why does the paper measure stance as a 5-year backward moving average rather than the contemporaneous rate gap?

The 5-year moving average captures the sustained character of loose monetary policy that theory associates with financial fragility accumulation; a single year of low rates does not meaningfully alter bank balance sheets or credit market dynamics, but several years of below-neutral rates allow risk appetite to build up gradually through reach-for-yield behavior, leveraging, and lending standard erosion. The backward average also corresponds more naturally to the length of a typical financial cycle (Borio 2014), over which excessive credit and asset price growth gradually accumulates before a crisis materializes. Using the contemporaneous rate gap would miss the cumulative nature of the stance and would likely attenuate the estimated effect toward zero because any individual year’s rate is highly endogenous to the current cyclical position.

Q2. Why are the IV estimates so much larger than the OLS estimates, and what does this imply about the direction of endogeneity bias?

The IV estimates (5.5pp at 5–7 years, 15.5pp at 7–9 years) are roughly 2.5× to 5× the OLS estimates (2.2pp and 3.3pp), implying that OLS is severely attenuated by reverse causality: central banks endogenously loosen policy during recessions and financial downturns — precisely the states in which crisis risk is temporarily depressed — so the OLS coefficient conflates the true causal effect (loose policy raises crisis risk) with an offsetting correlation (loose policy coincides with post-crisis low-risk states). The trilemma IV isolates the exogenous component of the stance — changes transmitted to pegged countries by the base-country’s monetary decisions that are orthogonal to the pegged country’s own economic conditions — and strips away this endogeneity, revealing that the true causal effect on crisis risk is substantially larger than OLS suggests. This finding matters for policy: it implies that the textbook concerns about risk-taking and financial cycle effects of low rates are not only statistically detectable but quantitatively much more important than naive correlations suggest.
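The direction of the bias can be illustrated with a small simulation. This is a stylized sketch, not the paper's specification: `u` stands for an unobserved downturn state that makes policy looser and temporarily lowers crisis risk, `z` is a valid instrument, and all coefficients are made up for illustration.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])


def tsls_slope(y, x, z):
    """Two-stage least squares: regress x on z, then y on the fitted x."""
    Z = np.column_stack([np.ones(len(z)), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return ols_slope(y, x_hat)


rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                       # exogenous shifter of policy
u = rng.normal(size=n)                       # downturn state (unobserved)
x = z + u                                    # looseness of policy
y = 2.0 * x - 3.0 * u + rng.normal(size=n)   # true effect of looseness is 2.0

b_ols = ols_slope(y, x)
b_iv = tsls_slope(y, x, z)
print(round(b_ols, 2), round(b_iv, 2))
```

OLS is pulled toward zero because looseness is high exactly when the omitted state lowers risk, while the instrumented estimate recovers a value close to the true coefficient. The qualitative pattern (IV above OLS) matches the paper's results; the magnitudes here are arbitrary.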

Q3. How does the trilemma instrument achieve exogenous variation in domestic monetary conditions?

For countries pegging their exchange rate, the trilemma forces domestic interest rates to shadow the base country’s rate (usually the US, Germany, or the UK); when the base country cuts rates for reasons driven by its own domestic conditions — unrelated to the pegged country’s economic state — the pegged country inherits looser monetary conditions through the exchange rate commitment. The instrument refines this logic by: (i) using the residual of the base-country rate change after partialling out the base country’s own macro fundamentals, eliminating the component of the base-country cut that might be correlated globally with crisis risk; and (ii) weighting by the capital mobility index k_{i,t}, so that the instrument is strongest when capital flows freely and the trilemma constraint is tightest. The exclusion restriction requires that these exogenous shifts in the base-country rate affect the pegged country’s financial crisis probability only through the channel of domestic monetary conditions, not through other international spillovers (e.g., trade or capital flow channels).

Q4. What is the timing pattern of crisis risk accumulation and what explains the absence of an effect in the first four years?

Crisis risk does not rise in the first 4 years after a period of loose monetary policy, rises sharply at 5–7 years (5.5pp IV), and peaks at 7–9 years (15.5pp IV) — the “slow burn” pattern reflects the lag between credit market overheating and realized financial crises. The mechanism links stance to crisis through the intermediary of credit booms: the paper shows (Figure 13) that R-zones (credit overheating) build within 5 years of loose policy, and the literature (Schularick–Taylor 2012; Jordà–Schularick–Taylor 2015) has established that credit booms predict financial crises with similar multi-year lags. The short-term absence of elevated crisis risk is consistent with — and not in tension with — the Barro–Ursua disaster results, which show lower disaster probability in the short term under loose policy, capturing the genuine stimulus dividend before the financial fragility materializes.

Q5. What are R-zones and what role do they play in the paper’s chain of evidence?

R-zones (Greenwood, Hanson, Shleifer, and Sørensen 2022) are periods when household or business credit grows anomalously fast relative to historical norms, identified as leading indicators of subsequent financial distress; the paper uses them to establish a link in the causal chain (loose monetary policy → credit overheating → financial crisis), providing a mechanism-level bridge that underpins the reduced-form IV results. The R-zone regressions show that loose policy raises the household R-zone probability by 3.2pp and the business R-zone probability by 1.8pp within 5 years (OLS; LP-IV confirms the direction), implying that the credit channel is active within the financial cycle window before the eventual crisis materializes. This is important because it distinguishes the paper’s finding from a pure statistical correlation between stance and crisis: the financial system’s credit overheating is a detectable intermediate state that connects loose policy to the eventual fragility outcome.

Q6. What does the growth-risk tradeoff finding imply for the welfare calculus of monetary accommodation?

The short-term benefits of loose policy (higher output, lower unemployment in the first 4–5 years) are offset in expectation by a materially elevated probability of historically severe output collapses at 6–9 year horizons; the Barro–Ursua disaster evidence further suggests a slight reduction in disaster risk in the short term followed by a large increase at medium horizons, which is exactly the intertemporal tradeoff that makes evaluating accommodative policy difficult in real time. The growth-risk tradeoff does not by itself deliver an optimal policy prescription — the tradeoff between near-term stimulus and medium-term tail risk depends on the discount rate, the size of the respective effects, and the welfare cost of financial crises — but it establishes that any evaluation of prolonged accommodative policy that considers only its near-term benefits is incomplete. The finding is consistent with the Growth-at-Risk literature (Adrian et al. 2019, 2022) and with the BIS’s documented concerns about financial cycle risks during the 2010s low-rate environment.

Q7. Why is the endogeneity of monetary policy to financial conditions particularly important for this paper’s identification?

A central objection to any empirical relationship between low rates and subsequent financial crises is that central banks loosen policy in response to financial stress and economic weakness, states in which crisis risk is already shifted up or down by pre-existing conditions; the OLS coefficient would then reflect the reverse-causal channel (crisis risk → loose policy) as much as the forward-causal channel (loose policy → crisis risk), making it impossible to infer causation. The trilemma IV directly addresses this by exploiting variation in monetary conditions that is determined by a different country’s central bank for that country’s domestic reasons, making it extremely implausible that the pegged country’s crisis risk influenced the base country’s rate decision in ways that would invalidate the instrument. The result that IV exceeds OLS by 2.5–5× implies the endogeneity was strongly attenuating (loose policy coincides with low-risk states, biasing OLS downward), and the true causal effect of sustained accommodation on crisis risk is considerably larger than the raw correlations would suggest.

Q8. How does the paper relate to and distinguish itself from the theoretical risk-taking channel literature?

The paper is entirely empirical and does not propose a structural model; it complements the theoretical risk-taking channel literature (Borio–Zhu 2012; Dell’Ariccia–Laeven–Marquez 2014; Bekaert–Hoerova–Lo Duca 2013) by providing the first long-run causal evidence that the reduced-form prediction of that literature (loose policy raises systemic financial fragility) holds in the historical data. Existing empirical work had focused on high-frequency or cross-sectional responses of individual bank risk metrics to monetary policy surprises; the paper’s long-run LP approach is better suited to capturing the slow financial cycle dynamics that theory predicts and that cannot be identified in event-study windows. The IV strategy resolves the identification problem that had stymied prior cross-country empirical work, where reverse causality confounded the relationship.

Key concepts

monetary policy stance : in this paper, the 5-year backward moving average of the policy rate gap (r_{i,t} − r*_{i,t}), where r* is the time-varying natural rate from the DGGT factor model; the sustained character of the measure captures the cumulative accommodation relevant for financial cycle dynamics, as opposed to short-lived rate cuts that do not materially affect bank portfolio decisions or credit standards.

trilemma IV : the paper’s instrumental variable for monetary stance, constructed for exchange-rate pegging countries as the capital-mobility-weighted residual of base-country interest rate changes (orthogonal to the base country’s own macro conditions); exploits the international monetary trilemma — a country pegging its exchange rate surrenders monetary autonomy and must match the base country’s rate regardless of its own economic conditions — to generate exogenous variation in the domestic stance.

local projections (LP) : the empirical methodology (Jordà 2005) estimating a separate OLS regression for each horizon h = 0,…,12, with the future crisis indicator (or R-zone, or low growth indicator) at horizon h as the outcome and the current stance measure as the key regressor; provides flexible impulse response functions without imposing the dynamic restrictions of a VAR, and allows the timing of crisis risk buildup to emerge directly from the data.
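
The horizon-by-horizon logic of local projections can be sketched in a few lines. The version below is a deliberately simplified illustration: a single time series, one regressor plus an intercept, and no controls, fixed effects, or instruments, all of which the paper's panel specification would include. Function names and the toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def local_projection(stance, outcome, max_h=12):
    """One separate regression per horizon h: outcome[t + h] on stance[t]."""
    irf = {}
    for h in range(max_h + 1):
        x = stance[: len(stance) - h]   # stance observed at t
        y = outcome[h:]                 # outcome observed at t + h
        irf[h] = ols_slope(x, y)
    return irf


# Toy data: the outcome responds to the stance with a two-period delay.
stance = [((7 * t) % 11) / 10 for t in range(40)]
outcome = [0.0, 0.0] + [0.5 * s for s in stance[:-2]]
irf = local_projection(stance, outcome)
```

Because each horizon has its own regression, the response at h = 2 is recovered without restricting the coefficients at other horizons, which is the flexibility the entry above refers to.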

R-zones : periods of credit market overheating as defined by Greenwood, Hanson, Shleifer, and Sørensen (2022) in which household or business credit grows anomalously fast; used in this paper as an intermediate-state indicator that links loose monetary policy (identified 1–4 years earlier) to subsequent financial crisis (materializing 5–9 years later), supporting the credit-channel interpretation of the reduced-form IV results.

growth-risk tradeoff : the paper’s characterization of the intertemporal welfare consequences of sustained monetary accommodation; loose policy delivers short-term output gains (visible as slightly lower disaster probability at short horizons) but raises the probability of historically low real GDP growth at 8–9 year horizons by 2–3pp and elevates medium-term financial crisis risk by up to 15.5pp per 1pp looser average stance, implying that assessments of accommodative policy based only on near-term stimulus benefits substantially understate the medium-term costs.

Making the Invisible Hand Visible: Managers and Worker Allocation

This paper asks why managers matter for firm performance, and specifically whether managers improve productivity by matching workers to better-suited jobs inside firms rather than through supervision, motivation, or selection out of the firm. The setting is the internal labor market of a large private consumer goods multinational enterprise (MNE) operating in more than 100 countries, with annual turnover exceeding EUR 50 billion. The data cover the universe of white-collar workers and managers at the firm — 200,000 workers and 30,000 managers observed monthly over 11 years (January 2011 to December 2021) — linked to payroll, performance ratings, organizational chart, digital platform activity, employee surveys, and an independent sales productivity series for field sales workers in 15 countries.

The paper confronts two identification challenges. First, the author constructs a measure of manager quality — “high flyers” — defined as managers who were promoted to the first managerial work level (WL2) by age 30. This threshold yields 26.2% of managers classified as high flyers. The measure is defined entirely ex ante, before the manager ever supervises the worker under study, which addresses reverse causality. It is validated against ex post performance metrics including future salary growth, probability of promotion to WL3, performance ratings, and anonymous subordinate feedback. Second, to identify causal effects of manager quality on workers, the author exploits the firm’s long-standing policy of rotating WL2 managers laterally across teams as part of their career development, a practice implemented for several decades. Using an event-study design centered on the worker’s first manager transition, the author compares workers who transition from a low-flyer to a high-flyer manager (LtoH) against workers who transition from one low-flyer to a different low-flyer (LtoL), netting out the effect of the transition itself. Pre-event parallel trends are confirmed empirically.

The main findings are as follows. Gaining a high-flyer manager causes substantial reallocation of workers within the firm through lateral job transfers: seven years after the manager transition event, cumulative lateral moves are 40% higher for workers who gained a high-flyer manager relative to those who gained another low-flyer. These lateral moves are not confined to a single organizational margin (transfers rise within-team, across teams in the same function, and across functions), and they involve meaningfully larger shifts in task content, as measured by angular separation across O*NET cognitive, routine, and social task intensity dimensions, with cumulative task distance becoming statistically distinguishable from zero approximately seven quarters post-transition. These gains in lateral mobility translate into persistent wage growth: seven years after the manager transition, workers supervised by a high-flyer earn salaries 13% higher than the comparison group, with divergence beginning only after the transition date. Independent sales bonus data show that, three years after gaining a high-flyer manager, workers’ sales productivity is 0.347 standard deviations higher, which rules out the interpretation that wage gains merely reflect manager favoritism rather than genuine productivity improvement. Establishment-level data further show that sites with a higher share of workers under high-flyer managers display higher output per worker and lower operational costs per unit.

Effects are asymmetric: gaining a good manager has large positive effects, but losing one (comparing HtoL with HtoH transitions) produces no corresponding negative effects, implying that a single exposure to a high-flyer manager generates durable benefits that survive a subsequent downgrade in manager quality. A mediation analysis finds that 64% of the salary gain is explained by lateral job changes, though the author notes this understates the full allocation channel because it excludes vertical transfers and the gains from remaining well-matched in the current role. These findings hold under multiple robustness checks including restricting to new hires, using the Sun and Abraham (2021) interaction-weighted estimator, varying the age threshold for high-flyer classification, using a tenure-based alternative, and placebo tests with randomly assigned manager types.

The scope conditions are specific to white-collar workers at a large, organizationally homogeneous consumer goods multinational. All workers hold college degrees, mean firm tenure is 8.5 years, team sizes average five workers, and the firm has the same organizational structure across all countries, functions, and years.

Q: How does the paper define “high flyer” managers and what share of managers receive this classification? A: High flyers are managers who achieved the first managerial work level (WL2) by age 30, a threshold derived from continuous age estimates constructed from 10-year age bands in the personnel records. This definition yields 26.2% of managers classified as high flyers. The measure is time-invariant and defined ex ante relative to any interaction with the workers whose outcomes are studied.

Q: What validates the high-flyer measure as capturing genuine managerial ability rather than noise? A: The high-flyer classification is significantly positively correlated with multiple ex post performance metrics recorded after the manager’s own promotion: future salary growth, probability of subsequent promotion to WL3 (director level), annual performance ratings, and anonymous upward feedback scores from subordinates on leadership. High flyers are also 14.5 percentage points less likely to be mid-career recruits, suggesting they are internally developed talent rather than external hires.

Q: What is the source of identifying variation and how does the event-study design address endogeneity? A: The firm has operated a decades-long policy of rotating WL2 managers laterally across teams to broaden their experience and to screen candidates for promotion to WL3. These rotations are asserted by firm executives and HR representatives to be orthogonal to worker and team characteristics. The author verifies this empirically by showing that a wide range of team characteristics measured over the two years before a transition — including team performance, inequality, transfer rates, and team diversity — cannot predict the type of incoming manager. The event-study design compares workers who receive a high-flyer replacement (LtoH) against workers who receive another low-flyer replacement (LtoL), netting out any generic effect of a managerial change, and confirms parallel pre-trends.

Q: What is the effect of gaining a high-flyer manager on lateral job mobility? A: Seven years after the manager transition, workers assigned to a high-flyer manager have made 40% more cumulative lateral moves than workers assigned to another low-flyer. These lateral moves occur across all organizational margins: within the same team, across teams within the same function (the largest contributor), and across functions. Beyond frequency, lateral moves under high-flyer managers also involve larger task-content shifts, with cumulative task distance (measured using O*NET cognitive, routine, and social task dimensions via angular separation) becoming statistically distinguishable from zero approximately seven quarters after the transition.

Q: What is the wage effect of gaining a high-flyer manager and when does it materialize? A: Workers who transition from a low-flyer to a high-flyer manager earn a salary 13% higher than workers who transition to another low-flyer, measured seven years after the transition event. The divergence begins only after the transition date, consistent with the pre-event parallel trends assumption, and accumulates gradually rather than appearing as an immediate jump.

Q: Does the wage gain reflect genuine productivity improvement or simply managerial favoritism in pay decisions? A: The author uses an independent sales bonus series — based on monthly targets set by supply chain demand planning teams, not by managers — for 5,604 field sales workers in 15 countries from 2018 to 2021. Three years after gaining a high-flyer manager, workers’ sales productivity increases by 0.347 standard deviations. This confirms that pay gains correspond to actual productivity improvement rather than inflated ratings for unchanged performance.

Q: How much of the wage gain is attributable to the lateral reallocation channel specifically? A: A mediation analysis attributes 64% of the 13% salary gain to lateral job changes. The author cautions that this is a lower bound on the allocation channel for two reasons: the mediation excludes vertical transfers (which mechanically raise salary), and it does not capture gains for workers who stay in their current job because that job is already a good match and no reallocation is needed.

Q: Are the effects symmetric — does losing a high-flyer manager reverse the gains? A: No. Comparing workers who transition from a high-flyer to a low-flyer manager (HtoL) against workers who transition from a high-flyer to another high-flyer (HtoH) reveals no corresponding negative effects. The gains from a single prior exposure to a high-flyer manager are persistent and are not undone by a subsequent low-quality manager. The author interprets this as evidence that a good match, once created, endures independently of the manager who created it.

Q: Does gaining a high-flyer manager raise the rate of worker exit from the firm? A: No. There is no statistically detectable effect on either voluntary exits (quits) or involuntary exits (layoffs), with null results that are not masked by heterogeneity across high- and low-performing workers. This rules out the interpretation that high-flyer managers improve measured outcomes of retained workers by selecting out underperformers.

Q: Do workers move into roles connected to their high-flyer manager’s prior network or follow their manager when the manager moves? A: No. There is no evidence that workers move into roles connected to the high-flyer manager’s prior colleagues; if anything, subordinates of high-flyer managers are less likely to make such moves. Workers also do not follow their high-flyer managers when those managers subsequently rotate to a different team. These findings rule out favoritism, social network access, and information-advantage explanations as primary drivers.

Q: How does the paper rule out on-the-job teaching (human capital transmission) as the primary mechanism? A: If high-flyer managers improved worker outcomes primarily by teaching workers to be more productive in their current job, the prediction would be reduced lateral mobility (workers become too productive to leave their current role). The observed pattern — substantially higher rates of lateral reallocation under high-flyer managers — is the opposite of this prediction, making teaching as the dominant channel unlikely.

Q: What does the manager behavior evidence show about how high flyers spend their time? A: Time-use data from a random sample of approximately 600 WL2 managers in 2019 show that high-flyer managers spend 19% more time in one-on-one meetings with subordinates and engage more in communication and multitasking activities relative to low-flyer managers. Their skill profiles also differ: high flyers are more likely to have strengths in strategy and talent management rather than project management, consistent with a more coordination-intensive and people-development-oriented style.

Q: What heterogeneity is there in who benefits from high-flyer managers? A: Effects are larger when managers and workers are in the same physical office (proximity facilitates talent assessment), when the organizational unit has a more diverse set of job roles (more matching opportunities), and for younger workers who are still discovering their comparative advantages. Critically, benefits are not concentrated among high-baseline performers: workers with low initial pay growth experience gains comparable to those of high performers, suggesting high-flyer managers uncover and deploy hidden talent broadly rather than accelerating only already-visible stars.

Q: Does high-flyer management aggregate to establishment-level productivity? A: Yes. Establishments where a higher share of workers are supervised by high-flyer managers show higher output per worker (tons per FTE) and lower operational costs per unit of output (operational costs per ton), measured using establishment-year data across approximately 150 sites globally over 2019–2021. This is consistent with the individual-level allocation mechanism producing aggregate productivity gains.

Q: What are the organizational design implications of the asymmetric effects? A: Because the gains from a single exposure to a high-flyer manager persist even after a subsequent manager downgrade, firms do not need each worker to be continuously supervised by a high-flyer. It is sufficient to rotate high-flyer managers across teams so that each worker receives at least one exposure. This makes the allocation mechanism resource-neutral relative to hiring, firing, or formal training programs.

High flyer (paper’s definition): A manager who achieved the first managerial work level (WL2) at the firm by age 30 — a time-invariant, ex ante classification representing the firm’s revealed-preference assessment of leadership potential, validated against subsequent salary growth, promotion probability, performance ratings, and subordinate feedback. Constitutes 26.2% of managers in the sample.

Internal labor market (paper’s usage): The system within the firm through which workers are allocated to jobs via lateral transfers and vertical promotions, mediated by managers rather than by external price mechanisms; the institutional context within which manager-worker matching produces wage growth and productivity gains.

Lateral transfer (paper’s usage): A horizontal reallocation of a worker to a different job title, team, subfunction, or function at the same work level, as distinct from a vertical promotion. Captured monthly in personnel records; operationalized as moves involving changes in task content measured by O*NET task distances.

Task distance (paper’s usage): The angular separation between origin and destination occupations across three O*NET task dimensions (cognitive, routine, and social intensity), ranging from zero (identical task profiles) to one (completely distinct profiles), used to characterize the substantive scope of lateral moves induced by high-flyer managers.
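
A minimal sketch of an angular task distance, assuming each occupation is described by a non-negative vector of (cognitive, routine, social) intensities and that the angle is divided by π/2 so the measure lies between zero and one. The normalization, the function name, and the example vectors are assumptions for illustration, not the paper's exact construction.

```python
import math


def task_distance(origin, destination):
    """Normalized angle between two non-negative task-intensity vectors."""
    dot = sum(a * b for a, b in zip(origin, destination))
    norm_o = math.sqrt(sum(a * a for a in origin))
    norm_d = math.sqrt(sum(b * b for b in destination))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_d)))
    return math.acos(cosine) / (math.pi / 2)


# Hypothetical (cognitive, routine, social) profiles.
same_profile = task_distance((0.6, 0.3, 0.1), (0.6, 0.3, 0.1))
orthogonal = task_distance((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
partial_shift = task_distance((0.6, 0.3, 0.1), (0.2, 0.3, 0.5))
```

Because the measure depends only on the direction of the vectors, a move between two jobs with the same task mix but different overall intensity registers as zero distance.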

Manager rotation (paper’s usage): The firm’s longstanding policy of reassigning WL2 managers laterally across teams within a subfunction, designed to broaden managerial experience and screen for promotion to WL3; treated in the empirical strategy as generating plausibly exogenous variation in the manager type each worker encounters.

Allocation mechanism (paper’s usage): The process by which managers discover workers’ specific skills and match them to specialized jobs inside the firm, operating through lateral reallocation rather than through hiring, firing, or on-the-job training; identified in the paper as the primary channel through which high-flyer managers generate persistent wage and productivity gains.

Asymmetric persistence (paper’s usage): The empirical pattern in which the gains from gaining a high-flyer manager are large and durable, while losing a high-flyer manager (transitioning to a low-flyer) produces no corresponding negative effects on the outcomes of previously well-matched workers, implying that good matches, once formed, survive a change in manager quality.

Manager Pay Inequality and Market Power

This paper asks whether managers are paid for market power. Bao, De Loecker, and Eeckhout build a general equilibrium model in which firms compete oligopolistically in goods markets (following Atkeson and Burstein 2008) while managers are allocated to firms through a competitive matching market (following Gabaix and Landier 2008 and Tervio 2008). The model identifies two distinct channels through which market power and firm size jointly determine executive compensation: a market power channel, whereby a more productive firm charges a higher markup given its output level, and a firm size channel, whereby higher total factor productivity expands output given markups. Because manager ability and firm type are complementary inputs into TFP, assortative matching arises: high-ability managers sort into high-type firms, amplifying both productivity dispersion and markup dispersion across firms.

The authors estimate the model year-by-year using Simulated Method of Moments on Compustat data covering 1994 to 2019, targeting ten moments including the average salary share, markup distribution, employment, and manager compensation levels. Firm-level markups are estimated using the production approach of De Loecker, Eeckhout, and Unger (2020). The ExecuComp variable TDC1 — encompassing salary, bonus, restricted stock grants, and option grant values — measures manager pay. Finance, insurance, and real estate sectors (SIC 6000–6799) are excluded.

Main findings: market power accounts for on average 45.8% of total manager pay over the sample period, rising from 38.0% in 1994 to 48.8% in 2019. Over the full period, average CEO compensation (net of reservation utility) roughly doubled, from approximately $2.94 million to $6.43 million. Of the $3.49 million cumulative increase, $2.02 million (57.8%) is attributed to rising market power, with the remainder ($1.47 million) due to the firm size channel. The market power channel’s dominance is concentrated among top managers: for the highest-ranked managers in 2019, 80.3% of pay is attributable to market power, and nearly all of their pay growth since 1994 stems from the market power channel. For lower-ranked managers, pay is determined primarily by the firm size channel and has been roughly flat over the period.
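
The decomposition arithmetic in this paragraph can be checked from the rounded figures quoted in the text. The sketch below uses only those numbers; the variable names are illustrative. With rounded inputs the market power share comes out near 57.9%, marginally different from the reported 57.8%, which presumably reflects unrounded underlying values.

```python
# Figures in millions of USD, as quoted in the summary (rounded).
pay_1994 = 2.94
pay_2019 = 6.43
market_power_growth = 2.02

total_growth = pay_2019 - pay_1994                      # about 3.49
firm_size_growth = total_growth - market_power_growth   # about 1.47
market_power_share = market_power_growth / total_growth

print(f"total growth: {total_growth:.2f}")
print(f"firm size channel: {firm_size_growth:.2f}")
print(f"market power share of growth: {market_power_share:.1%}")
```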

Within the market power channel, changes in technology — specifically increasing dispersion in firm-level TFP — are the dominant factor, contributing $1.33 million (65.9% of total market power channel growth). The increasing importance of manager ability (rising parameter alpha) contributes an additional $1.14 million through the market power channel. Within the firm size channel, TFP change accounts for 70.1% ($1.03 million) of growth, but the large effects from rising alpha and rising complementarity (gamma) are substantially offset by increasing dispersion in firm type. Structural estimates confirm that the average number of firms per market declines from 4.40 to 3.15, and firm-type dispersion (sigma_z) rises from 0.51 to 0.77, both consistent with rising market power over the period.

A counterfactual economy with no market power — firms priced at marginal cost — would yield a social welfare gain of 58.4% on average. The welfare cost of market power in 1994 could be offset by a 33.8% TFP increase; by 2019 the required TFP offset had risen to 51.7%. Without any market power, even the most talented managers would earn only their reservation utility, because firms earn zero profits regardless of productivity, eliminating the complementarity-driven matching surplus that makes top managers valuable. This confirms that superstar manager pay is intrinsically tied to the existence of market power in goods markets, not solely to firm size.

Scope conditions: the model applies to publicly listed US firms covered by Compustat and ExecuComp. The mechanism relies on Cournot competition within oligopolistic markets, assortative matching between managers and firms, and complementarity between manager ability and firm type (the CES substitution parameter gamma is estimated to be negative throughout the sample). The findings on market power share apply to CEOs specifically; the authors argue the same logic extends to all managerial positions with span-of-control over other workers, which encompasses roughly one-fifth of the workforce.

Q: What are the two channels through which manager pay is determined in the model, and how do they differ mechanically? A: The market power channel captures how a given level of TFP translates into higher markups — more productive firms charge more above marginal cost — thereby increasing profits per unit of output. The firm size channel captures how higher TFP expands the quantity of output a firm produces, increasing total profits through scale rather than through price-cost margin. Both channels raise profits and thus the marginal product of managers, but they operate through distinct economic mechanisms: one through pricing power and the other through productive scale.

Q: What is the empirical magnitude of the market power channel’s contribution to manager pay levels and growth? A: Market power accounts for an average of 45.8% of total manager pay over 1994–2019, rising monotonically from 38.0% in 1994 to 48.8% in 2019. For the total pay increase of $3.49 million over the period, $2.02 million (57.8%) is due to the increase in market power, with the remaining $1.47 million attributable to the firm size channel.

Q: How does the market power channel’s importance vary across the manager ability distribution? A: For the highest-ranked managers, 80.3% of total pay in 2019 is attributable to market power, and nearly all of their pay growth since 1994 runs through the market power channel. For the lowest-ranked managers, pay is almost entirely explained by the firm size channel and has been approximately flat over the period. This heterogeneity arises because top managers sort into high-markup firms through assortative matching, making their compensation disproportionately dependent on those firms’ market power.

Q: How does the model generate assortative matching between manager ability and firm type? A: Manager ability and firm type are complementary inputs into TFP (a CES aggregator with substitution parameter gamma less than one), which makes the matching output supermodular. In a frictionless matching market with transferable utility, supermodularity guarantees that high-ability managers match with high-type firms in equilibrium (Proposition 1). This positive assortative matching then amplifies productivity and markup dispersion, since the most productive firms become even more productive and gain larger market shares.
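
To illustrate why a substitution parameter below one delivers supermodularity, the sketch below assumes the textbook CES form A(x, z) = (alpha·x^gamma + (1 − alpha)·z^gamma)^(1/gamma), with x the manager ability and z the firm type. The functional form, the grid points, and alpha = 0.1 are assumptions for illustration; gamma = -2.22 is one of the estimates quoted elsewhere in this summary.

```python
def ces_tfp(x, z, alpha, gamma):
    """Assumed CES aggregator of manager ability x and firm type z."""
    return (alpha * x**gamma + (1 - alpha) * z**gamma) ** (1 / gamma)


def cross_difference(alpha, gamma, lo=1.0, hi=2.0):
    """A(hi, hi) + A(lo, lo) - A(hi, lo) - A(lo, hi).

    Positive: matching like with like yields more total output (supermodular).
    Zero: the pairing does not matter. Negative: submodular.
    """
    return (
        ces_tfp(hi, hi, alpha, gamma)
        + ces_tfp(lo, lo, alpha, gamma)
        - ces_tfp(hi, lo, alpha, gamma)
        - ces_tfp(lo, hi, alpha, gamma)
    )


alpha = 0.1  # illustrative weight on manager ability
complements = cross_difference(alpha, gamma=-2.22)
perfect_substitutes = cross_difference(alpha, gamma=1.0)
beyond_linear = cross_difference(alpha, gamma=2.0)
```

Under this assumed form the cross difference is positive for gamma = -2.22, zero at gamma = 1, and negative for gamma = 2, which is the sense in which gamma below one is needed for positive assortative matching.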

Q: What structural changes drive the rising importance of market power in manager pay over time? A: The dominant factor within the market power channel is changes in technology, specifically increasing firm-type dispersion (sigma_z rising from 0.51 to 0.77), which contributes $1.33 million or 65.9% of market power channel growth. The rising importance of manager ability (alpha, the weight on manager ability relative to firm type in the TFP aggregator) contributes another $1.14 million. The number of firms per market declines from an average of 4.40 to 3.15, further reducing competitive pressure and amplifying the markup premium for high-productivity firms.

Q: What does the counterfactual with no market power (first-best pricing) imply for manager pay and social welfare? A: Without market power, firms price at marginal cost and earn zero profits regardless of productivity, which eliminates the surplus from manager-firm matching. All managers would earn only their reservation utility, which is negligible relative to actual compensation. Social welfare would increase by 58.4% on average. The efficiency cost of market power — measured as the TFP increase needed to offset welfare losses — rose from 33.8% in 1994 to 51.7% in 2019, indicating a worsening welfare distortion over the period.

Q: How are markups measured, and what is their trend in the data? A: Markups are not directly observable and are estimated using the production approach of De Loecker, Eeckhout, and Unger (2020), which recovers firm-level price-cost margins from production data without requiring price data. Average markups in the Compustat sample rose from 1.53 in 1994 to 1.78 in 2019. The reduced-form elasticity of manager pay with respect to markups (controlling for firm characteristics, year, and firm fixed effects) increased substantially: in 2019 a one-percent increase in firm-level markup raises manager pay by 0.41 percent, which is 70.1% larger than the effect estimated in 1994.

Q: How does the paper handle the identification challenges inherent in regressing manager pay on markups? A: The reduced-form regression (with firm fixed effects, year effects, and interactions of year dummies with markups) documents a robust positive correlation but cannot establish causality due to reverse causality and omitted-variable bias. The paper addresses this by embedding the markup-manager pay relationship in a structural model where both are jointly determined by primitives — technology, market structure, and manager ability — and estimating those primitives via Simulated Method of Moments. The quantitative decomposition into market power and firm size channels derives from the model structure rather than from identifying variation in an instrumental variables sense.

Q: What do the matching model estimates reveal about manager-firm complementarity over time? A: The estimated CES substitution parameter between manager ability and firm type (gamma) is negative throughout the sample, confirming complementarity. Gamma was relatively stable before declining sharply from -2.22 in 2014 to -3.55 in 2019, indicating that manager ability and firm type became substantially more complementary in the latter part of the sample. The importance-of-manager parameter alpha is small (consistent with Gabaix and Landier 2008) but generally increasing, suggesting managers play an expanding role in determining firm-level TFP over time.

Q: What are the broader macroeconomic and distributional implications of the findings? A: Because approximately one-fifth of workers supervise other workers, the market-power-driven premium in managerial pay has implications beyond CEO compensation for the shape of the earnings distribution. The rise in top-1-percent income is identified as an efficiency concern, not just an equity concern: the best managers are hired by high-markup firms where they generate profits for shareholders but disproportionately little additional social value. Assortative matching between top managers and top firms widens the productivity gap between competitors, increasing market power and deadweight loss — the social return to managerial talent is therefore below the private return in equilibrium.

Market Power Channel: The component of manager pay attributable to how a firm’s TFP raises its markup — the ratio of output price to marginal cost — given the level of output. Distinct from the firm size channel; operates through pricing power rather than scale.

Firm Size Channel: The component of manager pay attributable to how a firm’s TFP expands output quantity given markups. Increasing output scale raises total profits and thus the marginal product of the manager even absent any change in price-cost margins.

Assortative Matching: The equilibrium allocation of high-ability managers to high-type firms, arising because manager ability and firm type are complementary inputs into TFP (supermodular matching output). Matching is determined in a frictionless market with transferable utility.

Markup: The ratio of output price to marginal cost. Under the nested CES preference structure it equals ε/(ε − 1), where ε is the firm’s price elasticity of demand; equivalently, the Lerner index (price minus marginal cost, divided by price) equals the inverse of that elasticity. Endogenously determined by the firm’s sales share within its oligopolistic market and the elasticities of substitution within markets (eta) and across markets (theta).
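
As a sketch of how the sales share maps into the markup, the code below uses the standard Cournot expression from the Atkeson and Burstein (2008) framework that the model follows: the inverse demand elasticity is a share-weighted average, 1/ε = s/theta + (1 − s)/eta, and the markup is ε/(ε − 1). Whether the paper uses exactly this expression is an assumption here, and the parameter values are purely illustrative.

```python
def cournot_markup(share, eta, theta):
    """Markup implied by a firm's sales share under Cournot competition
    with nested CES demand.

    eta:   elasticity of substitution within a market
    theta: elasticity of substitution across markets
    Assumes eta > theta > 1 so the markup is finite and rises with share.
    """
    inverse_elasticity = share / theta + (1.0 - share) / eta
    elasticity = 1.0 / inverse_elasticity
    return elasticity / (elasticity - 1.0)


eta, theta = 6.0, 1.5  # illustrative values, not the paper's estimates
shares = (0.0, 0.25, 0.5, 1.0)
markups = [cournot_markup(s, eta, theta) for s in shares]
```

An atomistic firm (share near zero) faces the within-market elasticity and charges eta/(eta − 1), while a monopolist in its market faces the across-market elasticity and charges theta/(theta − 1); fewer firms per market push shares, and hence markups, upward.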

Manager-Firm Complementarity: The property that manager ability and firm type are imperfect substitutes in the TFP aggregator, with CES substitution parameter gamma less than one. Complementarity is the necessary condition for positive assortative matching and for the supermodularity of matching surplus.

Span of Control (Lucas 1978): The mechanism by which a manager raises the productivity of all workers under supervision, so that a more able manager generates a proportionally larger productivity gain the larger the firm. Provides the microfoundation for why firm size amplifies the value of manager ability.

Market Structure: The number of firms in each oligopolistic sub-market (Ij), which varies across markets and over time. Together with the distribution of firm-level TFP within a market, market structure determines how much competitive pressure limits markup extraction. Average firms per market declines from 4.40 to 3.15 over 1994–2019.

Marginal Propensity to Consume and Personal Characteristics: Evidence from Bank Transaction Data and Survey

Layer 1: Overview

Research Question. This paper asks whether heterogeneity in the marginal propensity to consume (MPC) stems from temporary circumstances (e.g., transient wealth shocks that tighten liquidity) or persistent personal characteristics (e.g., high time discount rates or strong risk aversion that permanently shape saving behavior). Because liquidity constraints are endogenous — they can reflect either bad luck or impatient preferences — disentangling these two sources requires independently measured individual characteristics, which are not available in standard transaction datasets.

Data and Setting. The study combines two data sources drawn from Mizuho Bank, one of Japan’s three largest banks (approximately 24 million individual accounts). The first is weekly bank account transaction data for January 2019 to November 2022, covering all outflows (ATM withdrawals, credit card debits, utility payments, interbank transfers) for the 5,282 survey respondents. The second is a bespoke survey conducted in November–December 2022 among 400,000 randomly selected salary-receiving account holders (response rate 1.32%, yielding 5,282 usable observations). The survey elicits the Arrow–Pratt measure of absolute risk aversion, quantitative time discount rates for one-week, one-year, and ten-year horizons, self-reported liquidity constraints, homeownership, education, age, and gender, among other variables.

Three Income Shocks. MPC is estimated against three distinct income events: (1) the Japanese government’s Special Cash Payments (SCP) — a 100,000 JPY (approximately 800 USD) per-person lump-sum transfer during COVID-19, likely transitory, unexpected, and nearly randomly timed across municipalities due to administrative bottlenecks; (2) regular salary receipts (recurring, expected in both timing and amount); and (3) semi-annual bonus payments (received twice yearly, with timing known in advance but amount largely unknown — intermediate between SCP and salary in terms of expectedness).

Estimation Strategy. A two-way fixed effects regression with event-study leads and lags (windows of five weeks before and after each income event) is used to estimate consumption responses. Individual and week fixed effects absorb time-invariant heterogeneity and aggregate shocks (including COVID-19 emergency declarations). Standard errors are clustered at the individual level. For heterogeneity analysis, the income shock variable is interacted with individual characteristics from the survey (treated as proxies for persistent characteristics) and with time-varying log wealth and a liquidity constraint dummy (wealth below one-twelfth of annual income, proxying temporary circumstances).

Main Findings — Average MPC. Across all three income types, the on-impact MPC (week of receipt) is approximately 0.2: specifically γ₀ = 0.23 for the SCP (significant at 5%), 0.20 for salary, and 0.22 for bonus. When estimated jointly in a single regression, coefficients are γ_SCP = 0.21, γ_salary = 0.19, and γ_bonus = 0.21. This uniformity holds despite the sharply different properties of these shocks (transitory-unexpected vs. regular-expected vs. semi-known).

Main Findings — Heterogeneity. Significant heterogeneity in MPC is found primarily in the bonus subsample, where statistical power is greatest. The following cross-term coefficients are significant at the 5% level in the multivariate specification: (a) liquidity constraint dummy — positive and significant, indicating that individuals temporarily below one month’s income in deposits spend a larger fraction of their bonus, with a one standard deviation increase raising MPC by 0.094 (9.4 percentage points); (b) time discount rate (quantitative measure) — positive and significant, with a one standard deviation increase in impatience raising MPC by 0.084; (c) risk aversion (quantitative Arrow–Pratt measure) — positive and significant, conditional on controlling for wealth and liquidity, with a one standard deviation increase raising MPC by 0.031; (d) education — negative and significant irrespective of wealth/liquidity controls, with a one standard deviation increase in education reducing MPC by 0.041.

These magnitude estimates are sizable relative to the baseline MPC of approximately 0.2. For SCP and salary shocks, cross-term coefficients are uniformly insignificant at the 5% level, which the author attributes partly to smaller sample sizes and shorter observation windows for the SCP subsample.

Scope Conditions. The sample consists of Mizuho Bank account holders who receive salary payments directly into their Mizuho account, overrepresenting metropolitan areas and salaried workers relative to the national census. Wealth at Mizuho captures only deposits at that institution and excludes securities accounts, postal savings, and intra-household transfers. Age and gender do not yield significant cross-term coefficients in any specification; the self-reported survey measure of liquidity constraints (ability to cover one month’s income by drawing on savings, assets, or borrowing) is also insignificant, in contrast to the transaction-based liquidity constraint dummy.

In depth

Q1. Why is separating temporary circumstances from persistent characteristics important for MPC estimation?

Liquidity constraints — the standard proximate predictor of high MPC — are endogenous. An individual may be liquidity-constrained because of a temporary adverse income shock (bad luck) or because of persistently high impatience (high time discount rate) that leads to chronically low saving. If policy evaluation treats all constrained households symmetrically, it conflates these two very different channels. The paper follows Jappelli and Pistaferri (2020), Gelman (2021), and Aguiar, Bils, and Boar (2021) in arguing that both channels matter and that their relative contributions need empirical separation.

Q2. Why are Japanese bonuses particularly well-suited to identifying MPC heterogeneity?

Bonuses are paid semi-annually to most regular employees in Japan (accounting for roughly 15–30% of annual income), with timing known in advance but amount largely unknown until receipt. This intermediate nature — partially anticipated in timing but uncertain in magnitude — provides meaningful variation in consumption responses across individuals while maintaining a clean event-study design. The bonus subsample (3,722 individuals who received a bonus at least once) is also large enough to detect cross-term effects that are statistically insignificant in the SCP subsample (2,446 individuals) and in the salary analysis, likely due to greater statistical power.

Q3. How is the Arrow–Pratt measure of risk aversion constructed from the survey?

Respondents are asked whether they would purchase a lottery ticket at prize value Z = 100,000 JPY and price p = 10,000 JPY for varying winning probabilities α. The threshold α at which a respondent switches from accepting to rejecting identifies their risk attitude. The absolute risk aversion σ = −U’’/U’ is then calculated, from a second-order approximation of the indifference condition, as σ = 2(αZ − p) / (αZ² − 2αZp + p²). This yields σ ranging from −4.5 (when α = 0.01, i.e., risk-loving) to 0.891 (when α = 1, i.e., refusing to buy even at a 90% win probability). Risk neutrality corresponds to σ = 0 (at α = 0.1, where the expected prize αZ equals the price p).
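The link between the switching probability and σ can be sketched numerically. The snippet below is a minimal illustration, not the paper's code: it assumes the standard second-order Taylor expansion of the indifference condition αU(w + Z − p) + (1 − α)U(w − p) = U(w), which gives σ ≈ 2(αZ − p) / (αZ² − 2αZp + p²). The function name and the choice of monetary unit (10,000 JPY) are assumptions; because σ scales with the unit of account, the printed values need not match the range quoted above.

```python
def arrow_pratt_sigma(alpha: float, prize: float, price: float) -> float:
    """Second-order approximation to absolute risk aversion (-U''/U')
    at the switching probability alpha."""
    # First-order term: expected net gain from buying the ticket.
    expected_gain = alpha * prize - price
    # Second-order term: alpha*(Z - p)^2 + (1 - alpha)*p^2, expanded.
    second_moment = alpha * prize**2 - 2 * alpha * prize * price + price**2
    return 2 * expected_gain / second_moment


# Assumed units: amounts in 10,000 JPY, so Z = 10 and p = 1.
Z, p = 10.0, 1.0
for alpha in (0.05, 0.1, 0.5):
    print(alpha, round(arrow_pratt_sigma(alpha, Z, p), 4))
```

A respondent who switches below α = p/Z accepts an actuarially unfair ticket and gets a negative σ (risk-loving); one who switches above it gets a positive σ (risk-averse).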

Q4. How are time discount rates measured, and what is the range?

Respondents are asked the minimum amount X they would require to wait one week, one year, or ten years to receive a payment instead of receiving 100,000 JPY one week from now (using a one-week anchor to address hyperbolic discounting). The discount rate is calculated as r = X/100,000. The range is 0.01 (X = 100 JPY) to 100 (X = 10,000,000 JPY, i.e., would not wait even for 1,100,000 JPY in ten years). The unweighted average across one-week, one-year, and ten-year horizons is used as the composite discount rate in the multivariate specifications.

Q5. What is the transaction-based liquidity constraint dummy, and how does it differ from the survey-based measure?

The transaction-based dummy equals one if end-of-month deposits at Mizuho Bank (the previous month) are below one-twelfth of the individual’s annual income — i.e., if the individual holds less than one month’s equivalent income in liquid deposits. This is a time-varying measure. The survey-based measure asks respondents to self-report whether they could cover one month’s income by drawing on savings, selling assets, or borrowing. The transaction-based measure is significant at the 5% level in the bonus and salary heterogeneity regressions, while the survey-based measure is insignificant, indicating that the precise definition and data source of the liquidity constraint measure matters materially for detecting its effect on MPC.
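The transaction-based definition amounts to a one-line rule. The sketch below is illustrative only; the function name and the example balances are hypothetical and not taken from the paper's data.

```python
def liquidity_constrained(prev_month_deposits: float, annual_income: float) -> int:
    """Return 1 if last month's end-of-month deposits are below one month
    of income (annual income / 12), else 0."""
    return int(prev_month_deposits < annual_income / 12)


# Hypothetical account: annual income 4.8 million JPY, so one month = 400,000 JPY.
print(liquidity_constrained(350_000, 4_800_000))  # deposits below one month of income
print(liquidity_constrained(900_000, 4_800_000))  # deposits above one month of income
```

Because the balance is re-evaluated every month, the same individual can move in and out of the constrained state, which is what makes this a measure of temporary circumstances rather than a fixed trait.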

Q6. What are the estimated on-impact MPC values for each income shock, and how stable are they across robustness checks?

The point estimates from the event-study regression (γ₀) are: 0.23 for SCP in the baseline sample (SCP recipients in 2020, N = 2,446 individuals), 0.20 for salary (all 5,282 survey respondents), and 0.22 for bonus (3,722 bonus recipients). In a robustness specification restricting to only year-2020 data for the SCP, γ₀ = 0.235; using cash withdrawals from ATMs as a proxy for consumption instead of total outflows, γ₀ = 0.162 for SCP. In a joint regression including all three income types simultaneously, γ_SCP = 0.21, γ_salary = 0.19, and γ_bonus = 0.21. The SCP MPC for the smaller second-wave subsample (200 individuals, 2021–22) is 0.104 and insignificant, consistent with insufficient statistical power rather than a structural difference.

Q7. Why is the similarity in MPC across the three shock types potentially surprising, and what does the paper say about it?

Standard theory predicts divergent MPCs: under the permanent income hypothesis, an unexpected transitory windfall (SCP) should elicit a larger spending response on receipt than regular, fully anticipated salary payments, while Ricardian equivalence might reduce the MPC out of fiscal transfers like the SCP if households anticipate future tax increases. The paper finds the MPCs are approximately equal (around 0.2 across all three types), and if anything the SCP MPC is slightly higher than the salary MPC. The paper acknowledges this uniformity without offering a structural explanation, using it primarily as a robustness check on the baseline estimate rather than a substantive puzzle to resolve.

Q8. Which personal characteristics are significantly associated with higher MPC, and in which income shock samples?

In the multivariate heterogeneity regression, significant cross-term coefficients at the 5% level are found exclusively in the bonus subsample (columns 5–6 of Table 6): the quantitative risk aversion measure (positive, coefficient 0.042–0.049), the quantitative discount rate (positive, coefficient 0.004), and education (negative, coefficient −0.034 to −0.037). The liquidity constraint dummy (transaction-based) is also positive and significant for bonuses. In the univariate robustness regressions (Table 7), the own-house dummy is negative and significant at 5% for bonuses (controlled and uncontrolled); discount rates for one-week and ten-year horizons are positive and significant at 5% for bonuses; risk aversion A (direct self-report) is negative and significant at 5% for SCPs in the uncontrolled specification.

Q9. Do age and gender matter for MPC heterogeneity?

No. In all specifications across all three income shock types, the cross-term coefficients on age and the male dummy are uniformly insignificant at the 5% level. The paper flags this lack of significance as notable, since age and gender are commonly used demographic proxies in heterogeneous agent models that assume they reflect economically meaningful differences in consumption behavior.

Q10. How does the paper quantify the economic magnitude of each significant heterogeneity factor?

Table 8 reports the product of each cross-term coefficient and the standard deviation of the corresponding variable. For the bonus subsample: a one standard deviation increase in the liquidity constraint dummy raises MPC by 0.094 (9.4 percentage points); a one standard deviation increase in the discount rate raises MPC by 0.084; a one standard deviation increase in risk aversion raises MPC by 0.031; and a one standard deviation increase in education reduces MPC by 0.041. All four magnitudes are described as sizable relative to the baseline MPC of approximately 0.2 (20%).
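The Table 8 calculation is a simple product, and it can be inverted to back out the standard deviation implied by a reported coefficient and magnitude. The sketch below uses the discount-rate figures quoted in this summary for the bonus subsample (coefficient 0.004, one-standard-deviation effect 0.084); the function names are hypothetical, and because the coefficient is rounded the implied standard deviation is only a rough back-of-envelope figure, not a number reported by the paper.

```python
def one_sd_effect(coef: float, sd: float) -> float:
    """Change in MPC from a one-standard-deviation increase in a characteristic."""
    return coef * sd


def implied_sd(effect: float, coef: float) -> float:
    """Standard deviation implied by a reported one-SD effect and coefficient."""
    return effect / coef


# Discount rate, bonus subsample: effect 0.084, coefficient 0.004 (both rounded).
sd_discount = implied_sd(0.084, 0.004)
print(round(sd_discount, 1))

# Share of the baseline MPC (about 0.2) that a one-SD change represents.
print(round(one_sd_effect(0.004, sd_discount) / 0.2, 2))
```

Relative to a baseline MPC of about 0.2, a one-standard-deviation change in the discount rate moves the MPC by roughly 40% of its level, which is the sense in which the magnitudes are described as sizable.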

Q11. Why does the paper focus on bonuses for the heterogeneity analysis rather than the SCP?

The SCP events provide cleaner identification of transitory, exogenous income shocks (near-random timing due to municipal administrative bottlenecks, as documented by Kubota, Onishi, and Toyama 2021), but the subsample of SCP recipients is smaller (2,446 in 2020, 200 in the second wave), reducing statistical power for detecting heterogeneity in cross-term coefficients. The salary sample is large (5,282 individuals) but salaries are expected, recurring, and may partially update permanent income, complicating interpretation of cross-term estimates. Bonuses offer a balance: a relatively large subsample (3,722) and a partially unexpected income component, making them the most informative sample for heterogeneity analysis.

Q12. What are the main caveats and limitations the paper identifies?

Four caveats are noted. First, the personal characteristics from the survey — including time discount rates and risk aversion — are treated as exogenous, but they may themselves be endogenous to economic circumstances or short-term conditions at the time of the survey. Second, only Mizuho Bank deposits are observed; financial assets at other institutions (securities, postal savings) are missing, meaning the liquidity constraint measure understates true wealth for some respondents. Third, the sample is tilted toward metropolitan salaried workers and toward wealthier individuals compared to the full Mizuho customer base (median log wealth of 7.4 vs. 5.9 in Kubota et al. 2021). Fourth, the multiple-testing problem is acknowledged: with many cross-term tests conducted, some rejections of the null at the 5% level may be spurious.

Key Concepts

Marginal Propensity to Consume (MPC, on-impact). In this paper, MPC is operationalized as the coefficient γ₀ from the two-way fixed effects event-study regression — specifically, the fraction of an income shock spent during the same week the shock is received, estimated from total bank account outflows. This is a weekly, within-account measure, not a lifetime or annual consumption response.

Arrow–Pratt Absolute Risk Aversion (σ). A quantitative measure of risk preferences computed from the paper’s survey by eliciting the probability threshold α at which a respondent is indifferent between buying and not buying a lottery with prize Z = 100,000 JPY and price p = 10,000 JPY. Calculated as σ = 2(αZ − p) / (αZ² − 2αZp + p²). Ranges from −4.5 to 0.891 in the sample, with σ = 0 indicating risk neutrality.

Time Discount Rate (r). Measured by asking respondents the minimum additional amount X (beyond 100,000 JPY) they would require to delay receipt by one week, one year, or ten years, with r = X/100,000. The paper uses the unweighted average of three horizon-specific rates as a composite measure. Ranges from 0.01 to 100 in the sample. Used as a proxy for impatience or myopia — a persistent personal characteristic.

Liquidity Constraint Dummy (transaction-based). A time-varying binary indicator that equals one if individual i’s end-of-month Mizuho Bank deposit balance in month t−1 is below one-twelfth of annual income at t−1 — i.e., less than one month’s equivalent income in liquid deposits. Distinguished in the paper from a survey-based self-report of liquidity constraints, which is found to be insignificant.

Special Cash Payment (SCP). The Japanese government’s COVID-19 pandemic transfer program, providing 100,000 JPY (approximately 800 USD) per person in 2020 (universal) and 100,000 JPY per child in 2021–22 (restricted to households with children under 18 and income below 9.6 million JPY annually). Used in this paper as a transitory, salient, and largely unexpected income shock because municipal administrative bottlenecks made the exact timing unpredictable and nearly random across households.

Two-Way Fixed Effects Event-Study Regression. The paper’s primary estimator, which includes individual fixed effects (controlling for time-invariant person-level heterogeneity) and week fixed effects (absorbing aggregate shocks such as COVID-19 emergency declarations and seasonal patterns). Event-study leads and lags (k = −5 to +5 weeks around each income receipt) allow pre-trend testing and tracing of the dynamic consumption response. Normalized to γ_{−1} = 0.

MPC Heterogeneity Cross-Term. A regression augmentation (equation 3 in the paper) in which the contemporaneous income shock X⁰_{it} is interacted with individual characteristic Z_{it}. The coefficient δ on this cross-term identifies how the MPC varies with Z — the marginal effect of characteristic Z on the MPC. Persistent characteristics (e.g., risk aversion, discount rate, education from the survey) and temporary circumstances (e.g., log wealth, liquidity constraint dummy from transaction data) are included as separate Z variables.
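For orientation, the two regressions described in these entries can be written out as follows. This is a reconstruction from the verbal description, not a transcription of the paper's equations: the symbols C_{it} (weekly outflows of individual i in week t), α_i and τ_t (individual and week fixed effects), and ε_{it} are assumed notation, and the paper's specification may include additional terms such as the level of Z_{it}.

```latex
% Event-study specification with leads and lags, normalized so that \gamma_{-1} = 0
C_{it} = \alpha_i + \tau_t
       + \sum_{\substack{k=-5 \\ k \neq -1}}^{5} \gamma_k X^{k}_{it}
       + \varepsilon_{it}

% Heterogeneity specification: the contemporaneous shock interacted with Z_{it}
C_{it} = \alpha_i + \tau_t
       + \sum_{\substack{k=-5 \\ k \neq -1}}^{5} \gamma_k X^{k}_{it}
       + \delta \left( X^{0}_{it} \times Z_{it} \right)
       + \varepsilon_{it}
```

In this notation the on-impact MPC is γ₀, and the MPC of an individual with characteristic Z_{it} is γ₀ + δ Z_{it}.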

Market Regulation, Cycles, and Growth Dynamics in a Monetary Union

This paper develops a two-country currency union DSGE model with endogenous TFP growth and product and labor market frictions to assess how cross-country differences in market regulation affect long-run growth and business cycle dynamics. The central insight is that with endogenous growth, there is no reason to expect real income convergence within a monetary union: large shocks can lead to permanent changes in output and the real exchange rate through their effect on endogenous TFP, lifting the standard dichotomy between cycles and growth. Less regulated economies tend to have higher trend growth and recover faster from negative shocks because their institutional environment is more conducive to innovation and reallocation. Applied to the euro area financial and sovereign debt crisis, the model is consistent with the observed divergence of output and TFP paths between Northern and Southern member states, with the less reform-friendly Southern members experiencing higher inflation, lower employment, and disappointing TFP growth.

In depth

Q1. Why does endogenous growth break the convergence prediction?

With endogenous TFP growth, there is no reason to expect real income convergence within a monetary union because TFP growth depends on the institutional environment—including product and labor market regulations—which differs persistently across countries. In standard neo-classical models, capital flows toward lower-capital countries and convergence follows. But when TFP is endogenous and depends on regulations and innovation, countries with higher regulations face permanently lower TFP growth rates, and the absence of an exchange rate instrument prevents the usual adjustment mechanism from operating. The model thus provides a structural account of the non-convergence documented empirically for the euro area since 1999.

Q2. How do product and labor market regulations affect growth and cycle dynamics?

Product and labor market regulations affect both long-run trend growth (through their effect on steady-state innovation and TFP) and short-run dynamics (through their effect on how quickly economies adjust to shocks via factor reallocation). The paper documents empirically that less regulated euro area economies have higher R&D intensity and TFP growth rates. In the model, higher product market regulation reduces the incentive for firms to innovate and enter, while higher labor market regulation slows the reallocation of workers from declining to expanding sectors following a shock.

Q3. How do temporary shocks produce permanent output effects?

Temporary shocks—such as the risk premium shocks experienced by euro area countries during the financial and sovereign debt crisis—can lead to permanent reductions in the level of output and TFP through their effect on endogenous innovation and capital accumulation, producing hysteresis without any permanent shock to fundamentals. This mechanism lifts the standard dichotomy between cycles and growth: temporary financial disruptions that reduce investment and employment also reduce R&D and innovation, which lowers TFP permanently. The model thus provides a structural account of the ‘secular stagnation’ concerns following the euro area crisis.

Q4. What does the application to the euro area crisis show?

Applied to the euro area financial and sovereign debt crisis, the model is consistent with the observed divergence between Northern and Southern member states. The asymmetric risk premium shock affects the two groups differently because of their institutional environments: less regulated Northern economies recover faster, while in more regulated Southern economies output and TFP appear permanently lower. In the model, the divergence in output and TFP paths between Germany/France (back to pre-crisis trend) and Spain/Italy (on permanently lower paths) arises because product and labor market regulation mediates shock propagation, a channel that complements the exchange rate inflexibility channel in standard currency union analyses.

Key concepts

endogenous TFP growth: TFP growth that depends on the institutional environment (product and labor market regulations) and on innovation decisions; key departure from standard DSGE models; breaks the cycle-growth dichotomy by allowing temporary shocks to permanently affect TFP levels.

product market regulation (PMR): regulations governing market entry, competition, and firm behavior in the product market; modeled here as affecting the incentive to innovate and enter new markets, thereby shaping steady-state TFP growth.

labor market regulation (LMR): regulations governing hiring, firing, and wage determination; modeled here as affecting the speed of labor reallocation following shocks, thereby shaping business cycle dynamics and recovery speed in the currency union.

hysteresis: the persistence of shock effects on the long-run level of output or TFP beyond the duration of the shock itself; arises here through the effect of temporary demand contractions on endogenous innovation and TFP accumulation.

Market Segmentation through Information

This paper asks what market outcomes an information designer — modeled as an internet platform that knows consumers’ preferences — can achieve by choosing what information to disclose to competing oligopolistic firms who then make personalized price offers. The model features n firms each producing a single differentiated product at zero cost, a continuum of consumers with unit demand and multidimensional valuations (one per product), and a designer who commits to a mapping from consumer types to joint distributions over messages sent to firms before they play a simultaneous pricing game. The designer’s objective spans the full range from maximizing producer surplus to maximizing consumer surplus.

The paper establishes two main results. First, under a necessary and sufficient condition called Aggregate Incentive Compatibility (AIC), the designer can implement full surplus extraction by firms — the producer-optimal outcome — in which every consumer buys her most preferred product at a price exactly equal to her valuation for it, capturing 100% of available surplus for producers. The AIC condition requires, for each firm i and each candidate deviation price p_hat_i, that the infra-marginal losses firm i would bear on its natural customers (those in Ei who value i most) from lowering price to p_hat_i must be weakly greater than the maximum business-stealing profit available from consumers who prefer other products but have valuation for i above p_hat_i. The condition is easier to satisfy when consumer preferences are more polarized, i.e., when consumers have stronger relative preferences for their most-preferred product. When firms offer homogeneous products the condition fails everywhere and no information structure can generate any producer surplus — Bertrand competition drives all profits to zero under any signal structure.

Second, the paper characterizes the consumer-optimal information structure, which achieves the maximum possible consumer surplus across all equilibria induced by any information structure. The upper bound on consumer surplus is CS* = (total surplus) minus sum_i Pi*_i, where Pi*_i is the profit firm i can guarantee itself by ignoring the designer’s signal and setting the best uniform price assuming all rivals price at zero. This bound is tight: the designer can implement it by publicly partitioning consumers into groups by most-preferred product, inducing rival firms to price at marginal cost (zero) for consumers who prefer another firm’s product, and then applying the Bergemann-Brooks-Morris (2015) extremal segmentation within each firm’s natural customer set to preserve each firm’s guarantee profit while achieving efficiency.

The illustrative two-firm example shows the quantitative stakes concretely. With no information disclosure, firms charge 4/5 and total producer surplus is about 76% of total surplus S*, consumer surplus is just under 10% of S*, and some consumers are excluded. With full disclosure, producer surplus rises to about 81% of S* and consumer surplus to 19%. The producer-optimal information structure (Case 3) achieves 100% of S* as producer surplus by pooling consumers who prefer different products into the same message submarket, giving each firm an incentive to price for its highest-valuing customers and ignore the others. The consumer-optimal information structure (Case 4) brings producer surplus down to about 57% of S* — its guaranteed lower bound — and delivers roughly 43% of S* to consumers, an outcome unattainable by full disclosure alone.

Both producer-optimal and consumer-optimal outcomes are efficient: all consumers buy their most-preferred product in both cases. The paper further characterizes the full efficient frontier between consumer- and producer-optimal outcomes, showing that mixing the consumer-optimal and full-information structures (or consumer-optimal, full-information, and producer-optimal structures when the latter is implementable) spans every point on the frontier.

The model assumes firms will price-discriminate if they can, that the designer has full knowledge of consumer types, and that the game is played once. The core results extend to continuous type distributions as shown in Online Appendix B.2. The analysis is restricted to a monopoly platform; competition among platforms is left for future work.

Q: What is the central research question and why does the two-benchmark comparison used by antitrust authorities miss important possibilities?

A: The paper asks what market outcomes — combinations of consumer and producer surplus — an information designer (a platform) can achieve by choosing among all possible information structures, not just the two benchmarks of no-information and full-information. Antitrust analysis that compares only those two cases misses a vast middle ground: an intermediary can package information in ways that, for instance, implement perfect collusion (extracting all surplus as producer surplus) while appearing to use privacy-protective technologies, or can intensify competition well beyond the full-information benchmark to benefit consumers.

Q: What is the producer-optimal information structure and when does it exist?

A: A producer-optimal information structure is one that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her valuation — full surplus extraction. It exists if and only if, for every firm i and every candidate deviation price p_hat_i, the Aggregate Incentive Compatibility (AIC) condition holds: the aggregate infra-marginal losses firm i would suffer on its natural customers Ei from lowering price to p_hat_i must be at least as large as the maximum business-stealing profit from consumers outside Ei who have valuation for i weakly above p_hat_i. This is a condition on the distribution of consumer valuations, not on the information structure per se.

Q: What is the economic mechanism behind the producer-optimal structure — how does pooling consumers implement full surplus extraction?

A: The designer assigns consumers who prefer product A to the same message submarket as consumers who prefer another product but have a lower valuation for A. Firm A is then recommended a price equal to its highest-valuing customers’ willingness to pay. The presence of the “outside” consumers in the same message makes it unprofitable for firm A to deviate downward to capture them, because the infra-marginal loss on the natural customers exceeds the additional revenue. Simultaneously, the rival firm cannot identify and undercut for A’s natural customers because the messages do not allow it to distinguish them. The result is that each firm plays a niche strategy, setting price equal to the valuation of its highest-type natural customers and excluding the others from its offer.

Q: When does polarization of consumer preferences help achieve the producer-optimal outcome?

A: Proposition 1 states that if a producer-optimal information structure exists under distribution f, it also exists under any distribution f_tilde that is more polarized than f — where more polarized means the mass of consumers who prefer i and have valuation above any threshold for i increases, and the mass of consumers who prefer j but have valuation above that threshold for i decreases. Intuitively, polarization slackens the Firm IC constraints because it reduces the business-stealing temptation: fewer consumers with high cross-product valuations are available for firm i to capture by undercutting. Concrete continuous-distribution examples include: uniform over the unit square (producer-optimal always exists), Hotelling anti-correlated values (exists everywhere), and truncated normal with mean 1/2 — producer-optimal is feasible for all standard deviations sigma > 0.15.

Q: Why does the producer-optimal outcome fail entirely when products are homogeneous?

A: Proposition 2 states that when all consumer types have equal valuations across products (the support of f lies on the diagonal of V^n), then for any information structure and any induced equilibrium, every consumer buys at price zero and all firms earn zero profit. The logic extends the standard Bertrand undercutting argument: with homogeneous products, any positive price a firm charges is undercut by a rival who can always profitably steal demand, and this applies to any posterior distribution induced by any signal realization. Even private signals cannot prevent this outcome because no signal realization can give a firm a non-contestable position.

Q: How is the consumer-optimal information structure constructed, and what is its key economic logic?

A: Theorem 2 shows the consumer-optimal structure has three layers. First, consumers are partitioned into n groups by most-preferred product (Ei). Second, firms j not equal to i are induced — by publicly revealing which group a consumer belongs to — to set price zero for consumers outside their group, because competing for those consumers is hopeless when their preferred firm is identified. Third, within each Ei, consumers are further partitioned into submarkets using the Bergemann-Brooks-Morris (2015) extremal segmentation applied to residual valuations (theta_i minus the maximum of competing valuations), ensuring firm i earns exactly its guarantee profit Pi*_i. By holding each firm down to its guarantee profit, the residual goes to consumers, maximizing CS.

Q: What is the guarantee profit Pi*_i and how does it bound consumer surplus?

A: Pi*_i is the maximum profit firm i can achieve by ignoring all designer signals and setting a single uniform price to all consumers, against the worst-case scenario in which all other firms price at zero. Formally, Pi*_i = max_{p_i} sum_{theta in Ei: theta_i - p_i >= max_{j not equal i} theta_j} p_i * f(theta). Since firm i can always achieve Pi*_i regardless of the information structure (by simply ignoring signals), no information structure can push firm i’s profit below Pi*_i. The sum of these guarantee profits across all firms provides a lower bound on total producer surplus, and therefore an upper bound on consumer surplus, achievable by any information structure.
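The guarantee profit and the resulting bound on consumer surplus can be computed directly for a discrete type distribution. The sketch below is a minimal illustration under stated assumptions: the four-type, two-firm distribution is hypothetical (it is not the paper's numerical example), the function names are invented, and total surplus is taken to be the efficient surplus with zero costs, that is, each consumer's highest valuation weighted by its mass.

```python
from fractions import Fraction as F


def guarantee_profit(f, i):
    """Best uniform-price profit for firm i when every rival prices at zero."""
    # Residual valuation of each type: theta_i minus the best rival valuation.
    residuals = [
        (theta[i] - max(v for j, v in enumerate(theta) if j != i), mass)
        for theta, mass in f.items()
    ]
    # The optimal uniform price is always one of the positive residual valuations.
    candidates = {r for r, _ in residuals if r > 0}
    best = F(0)
    for price in candidates:
        # A type buys from firm i when theta_i - price >= best rival valuation.
        demand = sum(mass for r, mass in residuals if r >= price)
        best = max(best, price * demand)
    return best


def consumer_surplus_bound(f):
    """Efficient total surplus minus the sum of guarantee profits."""
    n_firms = len(next(iter(f)))
    total_surplus = sum(max(theta) * mass for theta, mass in f.items())
    return total_surplus - sum(guarantee_profit(f, i) for i in range(n_firms))


# Hypothetical distribution: type (theta_1, theta_2) -> probability mass.
f = {
    (F(1), F(1, 5)): F(1, 4),
    (F(3, 5), F(2, 5)): F(1, 4),
    (F(1, 5), F(1)): F(1, 4),
    (F(2, 5), F(3, 5)): F(1, 4),
}
print(guarantee_profit(f, 0), consumer_surplus_bound(f))
```

Exact fractions are used so that ties in the purchase condition are resolved as in the formula rather than by floating-point rounding.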

Q: In the two-firm numerical example, what is the quantitative comparison across the four cases?

A: Total available surplus S* = 0.84. Under no information (Case 1): producer surplus approximately 76% of S*, consumer surplus just under 10% of S*, and consumers of types (3/5, 2/5) and (2/5, 3/5) do not trade. Under full disclosure (Case 2): producer surplus approximately 81% of S*, consumer surplus 19% of S*, efficient. Under the producer-optimal structure (Case 3): producer surplus = 100% of S* (all surplus extracted), consumer surplus = 0%, efficient. Under the consumer-optimal structure (Case 4): producer surplus approximately 57% of S*, consumer surplus approximately 43% of S*, efficient. All cases except Case 1 are efficient; the no-information case excludes some consumers from trading.

Q: Is the full-information disclosure structure consumer-optimal?

A: Not in general. Proposition 3 states that full information is consumer-optimal if and only if all consumers in Ei have identical residual valuations (theta_i minus their second-best alternative) — a condition that generically fails. When residual valuations within Ei are heterogeneous, the designer can do strictly better for consumers by applying the extremal segmentation within each Ei rather than revealing full information, which would allow firms to price-discriminate on individual residual valuations and extract more surplus.

Q: Can the designer trace out the entire efficient frontier between consumer- and producer-optimal outcomes?

A: Yes, under two conditions. First, by mixing the consumer-optimal structure (point A) with the full-information structure (point B) using fractions lambda and 1-lambda respectively, the designer can implement any point on the efficient frontier between A and B. Second, when the producer-optimal outcome (point C) is also implementable, mixing the full-information structure with the producer-optimal structure by applying them to fractions lambda and 1-lambda of the consumer population respectively spans every point between B and C. The key insight is that the AIC condition, if it holds for f, also holds for any rescaled sub-distribution of f (it is scale-invariant), so the producer-optimal sub-problem remains feasible.
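As a rough numerical illustration of the mixing argument, the sketch below takes the approximate surplus shares reported for the two-firm example (43/57 for the consumer-optimal structure, 19/81 for full information, 0/100 for the producer-optimal structure) and forms population-weighted mixtures. The helper names `mix` and `to_levels` are hypothetical, and the shares are the rounded figures quoted in the example rather than exact values.

```python
S_STAR = 0.84  # total available surplus in the two-firm example

# Approximate (consumer share, producer share) of S* for each structure.
A = (0.43, 0.57)  # consumer-optimal
B = (0.19, 0.81)  # full information
C = (0.00, 1.00)  # producer-optimal


def mix(p, q, lam):
    """Shares when a fraction lam of consumers faces structure p and 1 - lam faces q."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(p, q))


def to_levels(shares):
    """Convert shares of S* into surplus levels."""
    return tuple(S_STAR * s for s in shares)


halfway_ab = mix(A, B, 0.5)   # a point on the frontier between A and B
halfway_bc = mix(B, C, 0.5)   # a point on the frontier between B and C
print(halfway_ab, to_levels(halfway_ab))
print(halfway_bc, to_levels(halfway_bc))
```

Every mixture keeps the two shares summing to one, which is the sense in which the mixed outcomes stay on the efficient frontier.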

Q: What are the regulatory implications of the analysis?

A: The paper identifies a fundamental tension: banning information use sacrifices efficiency (some consumers excluded, wrong products purchased), but unrestricted use permits platforms to implement perfect collusion through information design. Critically, the paper shows that privacy-enhancing technologies that pool consumers into cohorts — like Google’s Privacy Sandbox — are equally consistent with the producer-optimal (collusive) and consumer-optimal (competitive) structures; the two differ only in the principle by which consumers are grouped. The paper suggests regulators could mandate that consumers in the same cohort share the same most-preferred product and that information be disclosed symmetrically across firms — the defining features of the consumer-optimal structure. This would block the producer-optimal grouping (which mixes consumers with different most-preferred products) while preserving efficiency.

Q: How does this paper relate to and extend Bergemann, Brooks, and Morris (2015)?

A: Bergemann, Brooks, and Morris (2015) characterize achievable consumer and producer surplus outcomes when a designer discloses information to a single monopolist who can price-discriminate. The present paper extends this to oligopoly, where competition between firms creates both additional constraints (firms may undercut each other) and additional instruments (the designer can play firms against each other). The consumer-optimal construction directly applies the BBM (2015) extremal segmentation within each firm’s natural customer set Ei, but the outer layer — using public revelation of group membership to induce rival firms to price at zero — is new and arises specifically from the oligopoly setting.

Information designer: An entity (modeled as a platform) that observes the full joint distribution of consumer valuations over all products and commits, before firms price, to a mapping from consumer types to joint distributions over messages sent to competing firms; the designer can be interpreted as an internet intermediary choosing how to package and share consumer data.

Aggregate Incentive Compatibility (AIC): The necessary and sufficient condition on the distribution of consumer valuations for the existence of a producer-optimal information structure; for each firm i and each candidate deviation price p_hat_i, the aggregate infra-marginal losses firm i would incur on its natural customers by lowering price to p_hat_i must weakly exceed the maximum revenue firm i could gain by attracting consumers who prefer rival products but have valuation for i above p_hat_i.

Producer-optimal information structure: An information structure that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her full valuation for it, extracting 100% of available surplus as producer surplus — the outcome equivalent to the firms’ fully collusive joint surplus maximum.

Consumer-optimal information structure: An information structure that achieves the maximum consumer surplus attainable across all equilibria induced by any information structure, holding each firm to its guarantee profit Pi*_i (the best uniform-price profit the firm can secure by ignoring all signals) and allocating all residual surplus to consumers while maintaining allocative efficiency.

Guarantee profit (Pi*_i): The maximum profit firm i can secure unilaterally by ignoring the designer’s signal and setting an optimal uniform price, computed against the worst case in which all rival firms price at zero; it equals the maximum over p_i of p_i times the sum of f(theta) over all types in Ei for which theta_i minus p_i is at least as large as every rival valuation.

Polarization of preferences: A stochastic dominance condition under which, relative to a baseline distribution, the mass of consumers who prefer product i and have high valuations for it increases while the mass of consumers who prefer rival products but have high valuations for i decreases; higher polarization weakens the Firm IC constraints and makes the producer-optimal outcome easier to implement (Proposition 1).

Separation and Consistency: Two structural properties any producer-optimal information structure must satisfy: Separation requires that the messages firm i sends to different consumers in Ei who have distinct valuations for i are disjoint in support; Consistency requires that every message firm i can send to any consumer type is contained in the union of messages firm i sends to consumers in Ei, preventing firm i from ever inferring that a consumer prefers a rival’s product.

Markov-Perfect Equilibria in Differential Games—With an Application to Climate Policy

Mon, 01 Jan 0001 00:00:00 +0000

This paper by Jaakkola and Wagener addresses a long-standing open problem in the theory of differential games: how to make Markov-perfect equilibria (MPE) well-defined when best-response policy functions are generically discontinuous in the state variable. The paper’s primary contribution is methodological — it introduces discontinuous Markovian strategies into differential games and proves that, under this extension, (i) payoffs can always be computed and (ii) unique best responses exist for almost all strategy profiles of opponents. The authors then apply this framework to derive the entire set of symmetric MPE in a canonical non-cooperative climate mitigation model (van der Ploeg and de Zeeuw, 1992), finding welfare results that are quantitatively large and policy-relevant.

The technical difficulty the paper resolves is that discontinuous policy functions can cause the ordinary differential equation governing state dynamics to lack classical solutions, making payoffs undefined. Prior literature responded either by restricting strategies to continuous functions — which rules out many natural best responses and imposes an unjustified constraint on the strategy space — or by allowing discontinuities only in “admissible” profiles, which makes each player’s strategy set depend on opponents’ choices and thus violates the basic structure of non-cooperative game theory. The authors’ solution is to adopt Filippov solutions (differential inclusions that convexify dynamics at discontinuities), so that a well-defined state trajectory and payoff exist for every strategy profile, not just admissible ones.

The paper’s three main theorems cover existence (Theorem 1), characterization (Theorem 2), and symmetric equilibrium conditions (Theorem 3). Theorem 1 establishes that, given any fixed set of potential jump points, the best-response correspondence maps almost all opponent strategy profiles to a unique Markovian best response — “almost all” in the sense of prevalence on infinite-dimensional function spaces. Theorem 2 provides necessary and sufficient conditions for a strategy to be a best response: it must satisfy the maximum principle where the value function is differentiable, value discontinuities may only occur at jump points of opponents’ strategies where the player cannot unilaterally push the state back to the low-stock side, and the value at any such interface must exceed the static optimum. Theorem 3 translates these into conditions for symmetric Nash equilibrium.

Applied to the van der Ploeg–de Zeeuw climate model (N symmetric countries choosing emissions a_i, with carbon stock x evolving as x-dot = sum(a_i) - delta*x, and flow utility u(x, a_i) = a_i - (1/2)a_i^2 - dx), the paper characterizes the complete set of symmetric MPE. The unique continuous globally defined equilibrium (the linear MPE, previously established by Rowat 2007) is shown to be weakly Pareto-dominated by every other MPE with a continuous value function. The best equilibria feature discontinuous strategies that act like stock-conditioned trigger strategies: when the carbon stock falls below a target steady state x*, players respond with a discrete upward jump in emissions to rapidly return the economy to x*; when carbon rises above x*, players increase emissions only gradually, creating a threat of drifting to a higher-pollution steady state that disciplines deviations. In a calibrated example with N=10, delta=0.02, rho=0.02, and damage parameter d=0.5, the linear equilibrium steady state is approximately 2.5 times the first-best level, while the best continuous-value MPE steady state is approximately 1.2 times the first-best level. Choosing the best equilibrium rather than the linear equilibrium closes between 50 and 100 percent of the welfare gap to the first-best outcome, depending on initial conditions. The paper also identifies particularly bad equilibria involving value-function discontinuities (coordination failures in which no single country can unilaterally stop the carbon stock from rising past a threshold) that can yield welfare outcomes worse than the linear equilibrium at high carbon levels.

The scope of the methodological results covers differential games with a single state variable and strategies that are real-analytic except at finitely many points. Extension to multiple state variables is left for future work. The climate application is restricted to the symmetric linear-quadratic van der Ploeg–de Zeeuw framework, chosen to facilitate comparison with prior literature.

Q: What is the fundamental technical problem with MPE in differential games that this paper resolves?

A: In differential games with Markovian strategies, best-response policy functions are generically discontinuous in the state variable. Discontinuous right-hand sides in the state dynamics ODE can prevent existence or uniqueness of classical solutions, making payoffs undefined for some strategy profiles. Prior literature either restricted attention to continuous strategies (causing non-existence of best responses to many profiles) or defined “admissible” strategy sets that depend on opponents’ choices (violating non-cooperative game theory structure). This paper resolves both problems for the single-state-variable case.

Q: How does the paper make payoffs well-defined under discontinuous strategies?

A: The paper adopts Filippov solutions — differential inclusions that replace the dynamics at a discontinuity point with a convex hull of the left and right limits. At a “push-push” discontinuity (where dynamics push the state toward the jump point from both sides), the Filippov solution remains at the jump point and flow payoffs are a weighted average of left and right actions. This ensures a well-defined trajectory and payoff for every strategy profile, not just “admissible” ones, restoring the standard non-cooperative game-theoretic structure.
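The weights in the Filippov convex combination at a push-push point are pinned down by requiring zero net drift, so that the state stays put. A minimal sketch of that calculation for the stock dynamics x-dot = sum(a_i) - delta*x follows; the function name `filippov_weight`, the jump location, and the emission levels on each side are hypothetical illustrations, not values from the paper.

```python
def filippov_weight(f_left, f_right):
    """Weight on the left-limit dynamics that holds the state at a push-push point.

    f_left > 0 is the drift just below the jump point, f_right < 0 the drift just above.
    """
    if not (f_left > 0 > f_right):
        raise ValueError("not a push-push discontinuity")
    # Solve w * f_left + (1 - w) * f_right = 0 for w.
    return -f_right / (f_left - f_right)


# Hypothetical illustration with symmetric players.
N, delta = 10, 0.02
x_jump = 50.0
a_left, a_right = 0.15, 0.05  # per-player emissions just below / just above x_jump

f_left = N * a_left - delta * x_jump    # positive: pushes the stock up toward x_jump
f_right = N * a_right - delta * x_jump  # negative: pushes the stock down toward x_jump

w = filippov_weight(f_left, f_right)
a_avg = w * a_left + (1 - w) * a_right  # effective per-player emissions while at x_jump
print(w, a_avg, N * a_avg - delta * x_jump)
```

With these numbers the two sides receive roughly equal weight, and the averaged emissions exactly offset natural decay at the jump point. The same weights would be applied to the left and right flow payoffs.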

Q: What does Theorem 1 establish, and what does “almost all” mean in this context?

A: Theorem 1 establishes that, for any fixed collection of jump points, each player has a unique Markovian best response to almost every profile of opponents’ strategies. “Almost all” is in the sense of prevalence on infinite-dimensional function spaces (following Hunt, Sauer, and Yorke 1992): the set of profiles for which a unique best response fails to exist is shy (measure-zero analog in infinite dimensions) and nowhere dense. This resolves the long-standing open problem of making MPE well-founded in differential games.

Q: What are the necessary and sufficient conditions for a best response given by Theorem 2?

A: A strategy phi_i is the best response to opponents’ profile if and only if: (i) at all points where the value function is differentiable, the strategy satisfies the maximum principle; (ii) the value function is decreasing in the state (monotonicity); (iii) value discontinuities may occur only at opponents’ jump points where player i cannot unilaterally move the state back to the low-stock region; (iv) at any such interface, the value must be at least as large as the static optimum u(x, a_i)/rho; and (v) the value is differentiable at push-push steady states. These conditions extend the standard maximum principle with local requirements that restrict which discontinuities are possible.

Q: What is the van der Ploeg–de Zeeuw model and why is it used here?

A: The van der Ploeg–de Zeeuw (1992) model has N symmetric countries choosing emissions a_i, with carbon stock evolving as x-dot = sum(a_i) - delta*x, and flow utility u(x, a_i) = a_i - (1/2)a_i^2 - dx. It is linear-quadratic, so a linear MPE exists and is analytically tractable, and prior literature (Dockner and Long 1993; Rowat 2007; Dockner and Wagener 2014) has studied it extensively. The paper uses it as a benchmark to demonstrate that the new methods yield novel and economically important results for even well-understood models.

Q: What is the linear equilibrium and why does it produce poor welfare outcomes?

A: The linear equilibrium phi_L(x) = alpha + beta*x, with beta negative, is the unique continuous globally defined MPE (Rowat 2007). In it, emissions decrease with the carbon stock because each player anticipates that opponents will also reduce emissions when carbon is high. This strategic substitutability creates adverse dynamic free-riding: players try to exploit the fact that high carbon stock will cause opponents to cut back, so each has an incentive to emit more when carbon is low. In the calibrated example, the linear equilibrium steady state is approximately 2.5 times the first-best level.
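For any linear symmetric strategy phi(x) = alpha + beta*x, the stock dynamics x-dot = N*phi(x) - delta*x imply a steady state at x = N*alpha / (delta - N*beta). The sketch below checks this against a simple forward-Euler simulation. N and delta follow the calibrated example, but alpha and beta are hypothetical placeholders, not the equilibrium coefficients from the paper, and the function names are illustrative.

```python
def linear_steady_state(N, delta, alpha, beta):
    """Stock at which N * (alpha + beta * x) - delta * x = 0."""
    return N * alpha / (delta - N * beta)


def simulate(x0, N, delta, alpha, beta, dt=0.1, steps=5000):
    """Forward-Euler path of x-dot = N * phi(x) - delta * x with phi(x) = alpha + beta * x."""
    x = x0
    for _ in range(steps):
        x += dt * (N * (alpha + beta * x) - delta * x)
    return x


N, delta = 10, 0.02        # from the calibrated example
alpha, beta = 0.1, -0.01   # hypothetical strategy coefficients (beta negative)

x_ss = linear_steady_state(N, delta, alpha, beta)
x_end = simulate(0.0, N, delta, alpha, beta)
print(x_ss, x_end)
```

Because beta is negative, the denominator delta - N*beta is positive and the simulated path converges to the steady state from any starting stock.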

Q: What do the best equilibria look like, and why do they achieve high welfare?

A: The best equilibria feature a target steady state x* near the first-best level and a discontinuous upward jump in emissions when carbon falls slightly below x*. This threat rapidly returns any carbon reduction back to x*, eliminating the strategic incentive to free-ride on others’ reductions. When carbon rises above x*, emissions increase only slightly, causing the economy to drift slowly toward a higher-pollution steady state — the threat of this bad outcome disciplines overshooting. This mechanism is analogous to a trigger strategy but is conditioned on the stock level rather than on past actions, making it compatible with Markovian strategies.

Q: How large are the welfare gains from the best equilibrium relative to the linear equilibrium?

A: In the calibrated example with N=10, delta=0.02, rho=0.02, and d=0.5, the best continuous-value MPE steady state is approximately 1.2 times the first-best level, compared to 2.5 times for the linear equilibrium. Choosing the best equilibrium closes between 50 and 100 percent of the welfare gap between the linear equilibrium and the first-best outcome, depending on initial conditions. The paper characterizes this as a quantitatively large, first-order welfare improvement.

Q: What are “coordination failure” equilibria and when do they arise?

A: Coordination failure equilibria feature discontinuities not only in the strategy (emission rate) but also in the value function itself. They arise when no single country can unilaterally prevent the carbon stock from rising past a threshold: formally, when the combined emissions of the other N-1 countries at the discontinuity point exceed natural decay delta * x, so that x-dot remains positive even if the remaining country emits nothing. In such cases, if opponents are emitting heavily, no individual country can stop atmospheric carbon from rising, making heavy emission a best response. All players following this logic simultaneously produce a self-fulfilling collapse to high emissions. At high carbon levels these equilibria can yield welfare outcomes worse than the linear equilibrium.

Q: What is the paper’s main policy implication for climate negotiations?

A: The paper argues that international climate negotiations should be understood as a coordination problem over which of many MPE is played, rather than as bargaining over a limited cooperative surplus in a dynamic prisoners’ dilemma. Since the best equilibria are self-enforcing (they are Nash equilibria, not cooperative solutions), they do not require external enforcement. The paper suggests effective agreements may involve threshold-based commitments — sharp decarbonisation if a carbon target is met, but acceptance of a substantially higher stabilisation target (e.g., 2.5 degrees C rather than 2 degrees C) if the first target is missed — to create the discontinuous strategic incentives that support good equilibria.

Q: How does the paper handle the previously identified “local MPE” that could not be extended to the entire state space?

A: Prior work (Dockner and Long 1993; Rubio and Casino 2002; Dockner and Wagener 2014) constructed nonlinear equilibria that were only locally defined, and the validity of such equilibria was questioned (Rowat 2007; Bernhard 2024) because they were undefined on the full state space. The present paper’s framework allows discontinuous strategies, so these locally defined equilibria can be extended into globally defined, discontinuous MPE. Most previously discovered equilibria are shown to be nested within the larger set of all symmetric MPE identified here.

Q: What mathematical tools are used to prove the main results?

A: The proofs rely on the theory of viscosity solutions to Hamilton-Jacobi-Bellman equations (Bardi and Capuzzo-Dolcetta 2008), building on and extending results of Barles, Briani, and Chasseigne (2013, 2014) on optimal control with discontinuous dynamics. A key departure from Barles et al. is that the paper cannot assume controllability of the dynamics near discontinuities without imposing undue restrictions on opponents’ strategies. The application of these results to a fixed-point condition of the best-response correspondence to construct MPE conditions is described as entirely novel.

Q: What are the scope conditions and limitations of the methodological results?

A: The main results (Theorems 1–3) apply to differential games with a single state variable and strategies that are real-analytic except at finitely many points with one-sided derivatives everywhere. The climate application is further restricted to the symmetric linear-quadratic van der Ploeg–de Zeeuw framework. Extension to multiple state variables is acknowledged as future work. The welfare calibration results are specific to the parameter values N=10, delta=0.02, rho=0.02, d=0.5.

Markov-perfect equilibrium (MPE): A Nash equilibrium in Markovian strategies, where each player’s strategy conditions only on the current state variable and not on the history of play. The paper makes this concept well-founded in differential games by allowing discontinuous strategies, ensuring payoffs can be computed for all strategy profiles and unique best responses exist for almost all profiles of opponents’ strategies.

Filippov solution: A solution concept for ordinary differential equations with discontinuous right-hand sides, which replaces the dynamics at a discontinuity point with a convex hull of the left and right limits. Used in this paper to define well-specified state trajectories and payoffs even when players’ strategies have jumps, eliminating the need to restrict strategy sets to “admissible” profiles.

Discontinuous Markovian strategy: A policy function phi: X -> A that maps the state to an action and is real-analytic except at finitely many points, with well-defined one-sided derivatives everywhere. This is the key innovation of the paper: allowing such strategies makes differential games well-behaved as standard non-cooperative games while capturing the generically discontinuous nature of optimal policy functions.

Push-push steady state: A steady state at a discontinuity point of a strategy where the dynamics push the state toward that point from both sides. Under Filippov solutions the state remains at such a point, with flow payoffs being a weighted average of left and right actions. Theorem 2 requires the value function to be differentiable at these points in equilibrium.

Coordination failure equilibrium: An MPE featuring discontinuities in both the strategy and the value function, arising when no single player can unilaterally move the state across a threshold. At high carbon levels, if opponents emit heavily, individual emission cuts are ineffective; heavy emission becomes a best response for all, sustaining a self-fulfilling high-emission outcome. These equilibria can yield welfare outcomes worse than the linear equilibrium.

Linear equilibrium: The unique continuous globally defined symmetric MPE in the van der Ploeg–de Zeeuw model, characterized by emissions decreasing linearly in the carbon stock. It involves adverse strategic substitutability — each player reduces emissions in response to high carbon because opponents do likewise — and is weakly Pareto-dominated by every MPE with a continuous value function.

Skiba point: A state at which the optimal policy is discontinuous because the value function has distinct left and right derivatives, corresponding to the boundary between two basins of attraction with different long-run outcomes. In this paper, the steady state of a best equilibrium is a Skiba-type point: below it, emissions jump up to return rapidly to the target; above it, emissions increase only gradually.

Markups Across Space and Time

Mon, 01 Jan 0001 00:00:00 +0000

Anderson, Rebelo, and Wong study the behavior of markups in the retail sector across regions and over time, using a combination of firm-level Compustat data and product-level scanner data from two large retailers — one operating over 100 stores across U.S. states (quarterly data from 2006 Q1 to 2009 Q3, covering roughly 3.6 million SKU-store pairs across 79 product categories) and one operating hundreds of stores across Canadian provinces (quarterly data from 2016 Q1 to 2018 Q4, covering 15.6 million item-store pairs across 41 product groups). Markups are measured using gross margins — sales minus cost of goods sold as a fraction of sales — computed at the product level using the replacement cost for every item. This measurement approach is appropriate for retail because cost of goods sold accounts for over 80 percent of total retail firm costs, making it a reliable proxy for marginal cost. The replacement cost data, available at the store level, is the cost used by managers in actual pricing decisions, distinguishing these datasets from typical scanner data that contain only average costs.

The paper documents five main facts. First, markups are remarkably stable over time and display a mild procyclical pattern. At the aggregate level, gross margins are roughly acyclical or mildly procyclical while sales and cost of goods sold are highly procyclical. The elasticity of gross margins with respect to real GDP is statistically insignificant at both the aggregate and firm level. The conditional response of gross margins to high-frequency monetary policy shocks and oil price shocks is also statistically insignificant, while net operating profit margins fall significantly in response to both shocks. Operating profit margins are 3.4 times more volatile than gross margins at a quarterly frequency, and sales and costs are roughly 2.6 times more volatile.

Second, there is large regional dispersion in gross margins. A variance decomposition shows that the regional variance of gross margins (0.103) is substantially larger than the time-series variance (0.013), with a near-zero covariance between the two components. Third, regions with higher incomes and more expensive houses have higher markups — gross margins are positively correlated with log household income and log median house value in both the U.S. and Canadian data.

Fourth, these higher regional markups do not result from less intense competition or regional differences in marginal costs. Gross margins are uncorrelated with the Herfindahl index (a measure of market concentration, used here to gauge competition) and with a rural dummy (a proxy for higher transportation costs). Moreover, markups are acyclical or mildly procyclical regardless of whether the underlying product costs are themselves acyclical, procyclical, or countercyclical.

Fifth, and most distinctively, regional variation in markups arises from differences in assortment composition across regions rather than from deviations from uniform pricing. A decomposition of regional gross margin variance confirms that the dominant component is the term capturing differences in product assortment across markets; the term capturing differences in gross margins for the same item, which would be nonzero under geographic price discrimination, accounts for very little of the regional variation. When the same item is available in different regions, the retailer charges a uniform price, consistent with DellaVigna and Gentzkow (2019).

To rationalize these five facts, the authors propose a model with non-homothetic, quadratic preferences (following Melitz and Ottaviano 2008). In the model, higher-productivity regions choose higher-quality goods, which have less elastic demand and therefore higher markups. The markup is procyclical with respect to productivity shocks (A) but acyclical with respect to labor supply shocks (N), so a mixture of both types of shocks produces mildly procyclical markups. The model generates uniform pricing across regions for the homogeneous good, with regional markup differences arising through quality and assortment selection rather than price discrimination.

Q: How do the authors measure markups, and why is this approach appropriate for retail?

A: Markups are measured as gross margins, defined as (sales minus cost of goods sold) divided by sales, computed at the product level using the replacement cost for every item. This is appropriate for retail because cost of goods sold is the predominant variable cost, accounting for over 80 percent of total retail firm costs. The replacement cost is the marginal cost concept used by managers in pricing decisions and is available at the store level rather than as a national average.
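The gross-margin measure maps one-for-one into a price-to-cost markup: if gm = (p - c)/p, then p/c = 1/(1 - gm). A minimal sketch of both calculations at the item level follows, treating replacement cost as marginal cost as the text does. The function names and the price and cost figures are hypothetical.

```python
def gross_margin(price, replacement_cost):
    """Gross margin: (price - replacement cost) as a fraction of price."""
    return (price - replacement_cost) / price


def markup_from_margin(gm):
    """Price-to-cost ratio implied by a gross margin: p / c = 1 / (1 - gm)."""
    return 1.0 / (1.0 - gm)


# Hypothetical item: sold at 10.00 with a replacement cost of 7.00.
gm = gross_margin(10.0, 7.0)
mu = markup_from_margin(gm)
print(gm, mu)
```

Here a gross margin of 30 percent corresponds to a price about 43 percent above replacement cost.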

Q: What is the cyclical behavior of gross margins at the aggregate retail level?

A: Gross margins are roughly acyclical or mildly procyclical. Sales and cost of goods sold are highly procyclical, suggesting that the business cycle primarily affects quantities sold rather than markups. Operating profit margins are 3.4 times more volatile than gross margins at a quarterly frequency, while sales and costs are roughly 2.6 times more volatile.

Q: What is the conditional response of gross margins to monetary policy and oil price shocks?

A: The response of gross margins to both high-frequency monetary policy shocks (identified from Federal Funds futures data) and oil price shocks (identified via the Ramey-Vine 2010 VAR approach) is statistically insignificant. In contrast, net operating profit margins fall in a statistically significant manner in response to both types of shocks, indicating that fixed cost absorption rather than markup adjustment drives profit volatility.

Q: How large is the regional dispersion in gross margins relative to their time-series variation?

A: The variance decomposition shows that the regional variance of gross margins is 0.103, compared to a time-series variance of only 0.013, with a covariance term close to zero. The vast majority of gross margin variation is therefore cross-sectional rather than time-series.
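To make "vast majority" concrete, the reported variances can be turned into shares. This is back-of-the-envelope arithmetic on the two figures quoted above, under the assumption (labeled in the code) that the covariance term is exactly zero rather than merely close to it.

```python
regional_var = 0.103  # regional variance of gross margins (reported)
time_var = 0.013      # time-series variance of gross margins (reported)

# Assumption: the covariance term is treated as exactly zero.
total_var = regional_var + time_var

regional_share = regional_var / total_var
ratio = regional_var / time_var
print(round(regional_share, 3), round(ratio, 1))
```

Under that assumption the regional component accounts for close to nine tenths of the total, and is roughly eight times the time-series component.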

Q: What variables explain the regional variation in gross margins?

A: In the U.S. data, gross margins are positively correlated with log household income and log median house value. Gross margins are uncorrelated with the Herfindahl index (a competition measure) and with the rural county dummy (a transportation cost proxy). Canadian data confirms the positive correlation between gross margins and both log household income and log median house value.

Q: What is the mechanism through which higher-income regions have higher markups?

A: Regional markup differences are driven by assortment composition differences, not price discrimination. When the same item is sold in multiple regions, it sells at a uniform price. Higher-income regions carry different (higher-quality, higher-margin) products. The correlation between unique items sold and regional household income is 0.42 for the Canadian retailer and 0.17 for the U.S. retailer.

Q: How is the variance of regional gross margins decomposed into assortment versus pricing components?

A: The variance decomposition separates total regional gross margin variance into: (1) a term for differences in gross margins for the same item across regions (would be nonzero with geographic price discrimination), (2) a term for differences in assortment composition holding gross margins fixed, and (3) an interaction term plus covariance terms. The dominant term is the assortment composition component; the same-item price difference term accounts for very little of the regional variation.

Q: Does the acyclicality of gross margins hold for products with procyclical costs?

A: Yes. The authors divide products into those with acyclical, procyclical, and countercyclical costs and show (Table 7) that gross margins are acyclical or mildly procyclical for all three groups in both the U.S. and Canadian data. This implies that retailer pricing behavior contributes to price inertia even for products whose wholesale costs move with the cycle.

Q: What fraction of gross margin changes are active versus passive?

A: In the U.S. data, 91 percent of margin changes are active (resulting from price changes, regardless of whether replacement cost has changed); 9 percent are passive (replacement cost changes with no price change). In the Canadian data, 93 percent of changes are active. Both the probability of active margin changes and the size of margin changes are acyclical with respect to unemployment and local house prices.
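The active/passive distinction can be written as a simple classification rule on consecutive observations of an item's price and replacement cost. The sketch below is one reading of the definitions above, not the authors' code. The function names and the sample observations are hypothetical.

```python
def classify_margin_change(p_prev, p_now, c_prev, c_now):
    """Label a period-to-period margin change for one item-store pair."""
    if p_now != p_prev:
        return "active"   # price changed, whether or not replacement cost changed
    if c_now != c_prev:
        return "passive"  # replacement cost changed while price stayed fixed
    return "none"         # neither moved, so the margin is unchanged


def active_share(observations):
    """Share of margin changes that are active, ignoring periods with no change."""
    labels = [classify_margin_change(*obs) for obs in observations]
    changes = [label for label in labels if label != "none"]
    return changes.count("active") / len(changes)


# Hypothetical observations: (p_prev, p_now, c_prev, c_now).
obs = [
    (10.0, 11.0, 7.0, 7.0),  # active: price up, cost flat
    (10.0, 10.0, 7.0, 7.5),  # passive: cost up, price flat
    (10.0, 9.5, 7.0, 6.8),   # active: both moved
    (10.0, 10.0, 7.0, 7.0),  # no change
]
print(active_share(obs))
```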

Q: How does the Hall approach compare to gross-margin-based markup estimates?

A: When the Hall approach is implemented using output elasticities (deflating sales by a product-level price deflator to obtain quantity), the resulting markup estimates are very close to those from gross margins: the ratio is 1.014 for the U.S. firm and 0.991 for the Canadian firm. However, when revenue elasticities are used instead of output elasticities (the common practice in the literature due to data limitations), the implied markup is 14 percent lower for the U.S. firm and 13 percent lower for the Canadian firm, confirming the bias documented by Bond et al. (2020).

Q: What are the key features of the theoretical model and what facts does it explain?

A: The model uses non-homothetic quadratic preferences (Melitz-Ottaviano form) in which demand elasticity falls as consumption quality rises. Higher-productivity regions optimally consume higher-quality varieties, which face less elastic demand and hence carry higher markups. The markup is procyclical in productivity (A) with an elasticity less than one (incomplete cost passthrough) and acyclical in labor supply (N), so a mixture of shocks generates mild procyclicality. Uniform pricing across regions for the homogeneous good holds by construction, and regional markup differences arise through quality-assortment selection.

Q: Which existing macroeconomic models are consistent with the time-series evidence, and which are not? A: The evidence is inconsistent with models featuring countercyclical markups (Rotemberg-Woodford 1992 imperfect competition, Ravn-Schmitt-Grohe-Uribe deep habits, Jaimovich-Floetotto entry-exit, and standard New Keynesian models with sticky prices and procyclical marginal costs). The time-series evidence is consistent with models featuring sticky retail prices and acyclical marginal costs (Nakamura-Steinsson 2010, Coibion-Gorodnichenko-Hong 2015) and models with price and wage rigidities at the manufacturing level (Erceg-Henderson-Levin 2000, Christiano-Eichenbaum-Evans 2005). Search models that generate procyclical markups (Alessandria 2009) are also consistent with the evidence, provided the implied procyclicality is mild.

Q: Which existing trade and regional models are consistent or inconsistent with the regional evidence? A: The spatial price discrimination models of Greenhut-Greenhut (1975) and Thisse-Vives (1988), which predict higher markups in less competitive regions, are inconsistent with the data. The Bertoletti-Etro (2017) non-homothetic model predicts that regional markup variation is driven by deviations from uniform pricing, which is also inconsistent with the data. The Fajgelbaum-Grossman-Helpman (2011) model predicts countercyclical markups when costs are procyclical, contradicting the time-series results. Most existing macroeconomic models rely on homothetic preferences and therefore predict markups that are independent of regional income, which is inconsistent with the regional facts.

Q: What are the scope conditions on the measurement approach? A: Gross margins are valid proxies for markups only in the retail sector, where cost of goods sold is the dominant variable cost (over 80 percent of total costs). In manufacturing, where labor and other costs represent a larger fraction of total variable costs, gross margins would not be a reliable markup measure. The product-level scanner data cover the 2006-2009 period for the U.S. and 2016-2018 for Canada; the U.S. sample includes a recession while the Canadian sample covers a moderate expansion.

Gross margin as markup proxy: The ratio of (sales minus cost of goods sold) to sales, computed at the product level using the replacement cost for each item at each store and time period. Used as a proxy for the price-cost markup because cost of goods sold is the dominant variable cost in retail (over 80 percent of total costs), and the replacement cost is the marginal cost concept managers use in pricing decisions.
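
As a minimal sketch of the definition above, the snippet below computes a gross margin and the price-over-cost ratio it implies. The conversion 1 / (1 - margin) is an accounting identity that holds only if cost of goods sold is treated as the sole marginal cost; the function names and the item values are hypothetical and not taken from the paper.

```python
def gross_margin(sales: float, cogs: float) -> float:
    """Gross margin = (sales - cost of goods sold) / sales."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    return (sales - cogs) / sales


def implied_markup(margin: float) -> float:
    """Price-over-cost ratio implied by a gross margin: 1 / (1 - margin).

    Assumes cost of goods sold is the only marginal cost (an assumption
    made for this illustration).
    """
    if margin >= 1:
        raise ValueError("margin must be below 1")
    return 1.0 / (1.0 - margin)


# Hypothetical item: sold for 10.00 with a replacement cost of 7.50.
margin = gross_margin(10.0, 7.5)
markup = implied_markup(margin)
print(f"gross margin = {margin:.3f}, implied price/cost = {markup:.3f}")
```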

Replacement cost: The cost at which the retailer would replenish a unit of inventory at current prices, available at the store level in the scanner datasets. Distinct from average historical cost and used here as a direct proxy for marginal cost, eliminating one of the main sources of markup mismeasurement in prior empirical work.

Assortment composition: The set of products stocked and the expenditure weights of those products within a region. The paper’s central mechanism for regional markup variation — higher-income regions carry different (higher-quality, higher-margin) goods rather than charging different prices for the same goods.

Uniform pricing: The practice of charging identical prices for the same item across different geographic regions. Confirmed empirically in both the U.S. and Canadian scanner datasets, and embedded structurally in the theoretical model for the homogeneous good.

Active versus passive margin changes: A decomposition of gross margin changes into active changes (arising from retailer price decisions, irrespective of cost changes) and passive changes (arising when replacement cost changes but the retailer holds price fixed). Ninety-one percent of U.S. margin changes and 93 percent of Canadian changes are active.
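
The decomposition can be sketched as a simple classification rule over consecutive observations of the same item. This is an illustrative reading of the definition above, not the authors' code: the `Obs` record, the tolerance, and the example prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    price: float  # posted retail price
    cost: float   # replacement cost


def margin(o: Obs) -> float:
    return (o.price - o.cost) / o.price


def classify_margin_change(prev: Obs, curr: Obs, tol: float = 1e-9) -> str:
    """Label a period-to-period margin change as active, passive, or none."""
    if abs(margin(curr) - margin(prev)) <= tol:
        return "none"      # margin did not move
    if abs(curr.price - prev.price) > tol:
        return "active"    # price moved, whether or not cost moved
    return "passive"       # cost moved while the price was held fixed


# Hypothetical observations for one item at one store.
base = Obs(price=10.0, cost=7.5)
print(classify_margin_change(base, Obs(11.0, 7.5)))  # price change
print(classify_margin_change(base, Obs(10.0, 8.0)))  # cost change only
```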

Non-homothetic quadratic preferences: The utility specification (following Melitz and Ottaviano 2008) in which the absolute value of the own-price demand elasticity falls as quality consumption rises. This property implies that higher-quality goods carry higher markups and that richer regions, which demand higher quality, have higher average markups — the key mechanism linking income to markups in the model.

Hall approach to markup estimation: A production-function-based method in which the markup equals the output elasticity with respect to a variable input divided by that input’s cost share in revenue. The paper shows this yields estimates close to gross-margin estimates when implemented with true output quantities, but produces markups roughly 13-14 percent lower when revenue is substituted for output (a common approximation), confirming the Bond et al. 2020 bias.
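
The ratio described above is easy to write down. The sketch below is illustrative only: the elasticities, cost, and revenue figures are hypothetical, and the second elasticity is chosen purely to show how a lower revenue-based elasticity mechanically lowers the implied markup.

```python
def hall_markup(elasticity: float, input_cost: float, revenue: float) -> float:
    """Markup = elasticity w.r.t. a variable input / (input cost / revenue)."""
    if input_cost <= 0 or revenue <= 0:
        raise ValueError("input_cost and revenue must be positive")
    cost_share = input_cost / revenue
    return elasticity / cost_share


# Hypothetical firm: variable input costs are 68 percent of revenue.
output_based = hall_markup(0.85, input_cost=68.0, revenue=100.0)
# Hypothetical lower elasticity, standing in for a revenue-based estimate.
revenue_based = hall_markup(0.731, input_cost=68.0, revenue=100.0)

print(f"output-elasticity markup:  {output_based:.3f}")
print(f"revenue-elasticity markup: {revenue_based:.3f}")
print(f"ratio: {revenue_based / output_based:.3f}")
```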

Markups: A Search-Theoretic Perspective

What this paper finds — and why it matters

Across macroeconomics, market power is almost always modelled with the Dixit–Stiglitz (1977) monopolistic-competition framework, in which a seller’s markup is pinned down by how substitutable buyers perceive its variety to be. This paper instead derives a closed-form formula for the equilibrium distribution of markups in the search-theoretic model of imperfect competition of Butters (1977), Varian (1980) and Burdett–Judd (1983), where a seller has market power not because its good lacks substitutes but because search and information frictions leave some buyers unable to reach the cheapest seller. In this model markups are strictly positive even though all sellers’ varieties are perfect substitutes, are dispersed even when all sellers operate the same technology, and — once sellers differ in marginal cost — can be increasing, decreasing, or constant in a seller’s size; yet the equilibrium is efficient. Menzio proves an “anything-goes” result: any twice-differentiable markup function can arise as an equilibrium for an appropriate choice of parameters, so a Dixit–Stiglitz model can always reproduce the search model’s markups — but only with reduced-form buyer preferences that depend on the search model’s deep parameters and are therefore unstable to policy changes (a Lucas-critique problem), and that would (incorrectly) read those markups as symptoms of inefficiency and a case for corrective subsidies. The paper’s central and deliberately modest claim is a cautionary one for macroeconomics: because two well-established models can both match observed markups yet imply opposite conclusions about welfare, optimal policy, and counterfactuals, markup data alone cannot identify the macroeconomic consequences of market power — one also needs evidence on the origin of that market power. 
The results are theoretical (unit demand, constant returns to scale, a Poisson contact process); the sharp comparative statics are derived for a log-uniform cost distribution, and the same logic extends to labor-market markdowns in the Burdett–Mortensen (1998) model.

In depth

Q1. What two theories of market power does the paper compare, and how do they differ at root?

The paper contrasts the Dixit–Stiglitz (1977) monopolistic-competition framework, in which market power comes from product differentiation, with the search-theoretic framework of Butters (1977), Varian (1980) and Burdett–Judd (1983), in which market power comes from buyers’ limited choice sets. In Dixit–Stiglitz, “every seller is a monopolist of its own product variety,” and the size of markups “is determined by the substitutability of different varieties in the buyers’ utility function.” In the search-theoretic framework, by contrast, “a seller has market power not because it carries a good that has no perfect substitutes, but because (some) buyers do not have every seller in their choice set due to informational frictions … or physical frictions,” so markups are instead “determined by the distribution of the size of buyers’ choice sets.” Menzio motivates the second view with retail examples (e.g., the same bottle of Heinz ketchup sold at many stores at different markups), where it strains credulity that buyers see one store’s bottle as a poor substitute for the identical bottle elsewhere.

Q2. What is the equilibrium markup formula when all sellers are identical?

With homogeneous sellers, a seller at quantile x of the price distribution charges a gross markup μ(x) = 1 + (u/c − 1)·e^(−λ(1−x)), the product of a monopoly markup and a rank-dependent discount factor. Here u is the buyer’s valuation, c the common marginal cost, and λ the Poisson coefficient for the number of sellers a buyer contacts — “the average number of sellers with which a buyer is in contact, and, in this sense, … a measure of the extent of competition in the market.” The term u/c − 1 is “the net markup for a monopolist.” The discount factor e^(−λ(1−x)) “is equal to 1 for the seller at the top of the price distribution” (no discounting) and falls to its minimum e^(−λ) for the seller at the bottom; a higher λ makes markups decline more steeply down the price ranking. The equilibrium price distribution and its support are derived in closed form (F(p) and the lowest price p_ℓ = c + e^(−λ)(u − c)), and the equilibrium is shown to exist, be unique, and be efficient (Proposition 1).
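
The formula can be evaluated directly. The sketch below codes μ(x) and the lowest price p_ℓ exactly as stated above; the parameter values (u = 2, c = 1, λ = 3) are hypothetical and chosen only for illustration.

```python
import math


def gross_markup(x: float, u: float, c: float, lam: float) -> float:
    """Gross markup at price-rank quantile x in [0, 1], identical sellers."""
    return 1.0 + (u / c - 1.0) * math.exp(-lam * (1.0 - x))


def lowest_price(u: float, c: float, lam: float) -> float:
    """Lower bound of the price support: c + exp(-lam) * (u - c)."""
    return c + math.exp(-lam) * (u - c)


# Hypothetical parameter values.
u, c, lam = 2.0, 1.0, 3.0
for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f}  markup = {gross_markup(x, u, c, lam):.4f}")
print(f"lowest price = {lowest_price(u, c, lam):.4f}")
```

At x = 1 the discount factor equals 1 and the markup is the monopoly markup u/c; at x = 0 the price c·μ(0) coincides with p_ℓ, so the two expressions in the text agree with each other.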

Q3. Why are markups positive and dispersed even when goods are perfect substitutes and technology is identical?

Markups are positive because search frictions leave some buyers “captive” — in contact with only one seller — which forces equilibrium profits, and hence prices, strictly above marginal cost; markups are dispersed for the same reason there is price dispersion in these models — non-captive buyers prevent any mass point in the price distribution. As Menzio puts it, “sellers meet a positive measure of buyers that are captive, in the sense that these buyers cannot purchase from any other seller,” so “prices must be strictly above marginal cost”; simultaneously, the positive measure of non-captive buyers “implies that the price distribution cannot have any mass points above marginal cost.” The two facts together require sellers to post different prices and therefore charge different markups, despite identical goods and identical technology.
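
To get a feel for how many buyers are captive, one can use the Poisson contact process directly. The sketch below computes the share of buyers with exactly one contact among those with at least one contact; conditioning on at least one contact is a framing chosen here for illustration, and the λ values are hypothetical.

```python
import math


def captive_share(lam: float) -> float:
    """Share of contacted buyers who are in touch with exactly one seller.

    With a Poisson(lam) number of contacts:
      P(exactly one contact)  = lam * exp(-lam)
      P(at least one contact) = 1 - exp(-lam)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return p_one / p_at_least_one


for lam in (0.5, 1.0, 3.0, 6.0):  # hypothetical contact intensities
    print(f"lam = {lam:.1f}  captive share = {captive_share(lam):.3f}")
```

The share is strictly between 0 and 1 for any finite positive λ, which is the situation in which both captive and non-captive buyers have positive measure.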

Q4. In the homogeneous-seller case, how do markups relate to a seller’s price and size?

With identical sellers, markups are increasing in a seller’s price and decreasing in a seller’s size. Because μ(x) and the posted price p(x) both rise with rank x while quantity sold q(x) = bλ·e^(−λx) falls with x, “markups are increasing in the seller’s price” and “decreasing in the seller’s size.” Menzio notes this is the opposite of “Marshall’s second law of demand,” and that it implies larger sellers face a higher elasticity of demand. He stresses this counterfactual pattern (empirically, larger firms tend to charge higher markups) is exactly why the paper goes on to add cost heterogeneity.
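
A short numerical check makes the size relationship concrete. The sketch takes the quantity expression to be b·λ·exp(−λx) and the markup to be the homogeneous-seller formula from Q2; all parameter values are hypothetical.

```python
import math


def markup(x: float, u: float, c: float, lam: float) -> float:
    """Homogeneous-seller gross markup at price rank x."""
    return 1.0 + (u / c - 1.0) * math.exp(-lam * (1.0 - x))


def quantity(x: float, b: float, lam: float) -> float:
    """Quantity sold by the seller at price rank x."""
    return b * lam * math.exp(-lam * x)


# Hypothetical parameter values.
u, c, lam, b = 2.0, 1.0, 3.0, 1.0
ranks = [i / 10 for i in range(11)]

# Pair each seller's size with its markup, then order from smallest to largest.
rows_by_size = sorted((quantity(x, b, lam), markup(x, u, c, lam)) for x in ranks)
for size, mu in rows_by_size:
    print(f"size = {size:.4f}  markup = {mu:.4f}")
```

Reading the printed table from top to bottom, size rises while the markup falls, which is the pattern the text contrasts with the empirical evidence on large firms.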

Q5. What changes when sellers differ in marginal cost?

With heterogeneous marginal costs, the markup formula gains an extra term reflecting that higher-ranked (higher-cost) firms put less competitive pressure on a seller, and equilibrium markups need no longer be decreasing in size — they can be increasing, decreasing, or hump-shaped. A seller’s price is a strictly increasing function of its cost (Lemma 3), so its rank in the price distribution equals its rank in the cost distribution. The generalized markup (eq. 3.22) adds, to the monopoly-times-discount term, “the additional markup that the seller can charge because the firms ranked above it in the price distribution produce at higher marginal cost,” with the excess cost of nearer-ranked firms weighted more heavily. Using a phase-diagram (nullcline) analysis, Menzio shows the markup function μ(x) can be strictly increasing, strictly decreasing, or hump-shaped in rank depending on parameters. The heterogeneous-cost equilibrium is again shown to exist, be unique, and be efficient (Proposition 2).

Q6. What is the “anything-goes” theorem, and why does it matter?

Menzio proves (Theorem 3) that any twice-continuously-differentiable markup function μ(x) > 1 can be generated as an equilibrium of the search-theoretic model, given an appropriate contact intensity λ and cost distribution c(x). Concretely, for any target markup schedule there is a λ and a quantile cost function c(x) (given in closed form) that deliver it as the equilibrium outcome. The consequence is sharp: “the search-theoretic model of market power can rationalize any pattern of markups observed in the data,” so “markup data cannot be used to reject the search-theoretic model.” Combined with the fact that the Dixit–Stiglitz model can reproduce the same markups, both theories are consistent with any markup evidence, which is the crux of the paper’s identification argument.

Q7. Can a Dixit–Stiglitz model reproduce these markups, and at what cost?

Yes — a Dixit–Stiglitz model can always reproduce the search model’s markups, but only with reduced-form buyer preferences that depend on the search model’s deep parameters (λ, u, c, b) and are therefore unstable. Menzio constructs the buyer utility function v(q) (its marginal utility solves a differential equation, eq. 2.24) that makes a Dixit–Stiglitz seller choose the same markups and quantities as in the search model. That reduced-form utility has v’(q) decreasing (so varieties look like imperfect substitutes, rationalizing positive markups) and an elasticity of demand that rises with q (rationalizing markups that fall with size). Critically, “the reduced-form utility function depends on the parameters of the search-theoretic model” and so “is unstable, in the sense that changes in the environment and counterfactual experiments lead to changes in the reduced-form utility function” — meaning any policy or counterfactual exercise that holds these preferences fixed “would not produce valid predictions,” i.e., is subject to the Lucas critique.

Q8. Why would reading these markups through the Dixit–Stiglitz lens give the wrong welfare and policy conclusions?

Because in Dixit–Stiglitz positive and heterogeneous markups signal inefficiency and call for subsidies, whereas the search-theoretic equilibrium that generated those very markups is efficient. Through the Dixit–Stiglitz lens, positive net markups imply “sellers produce an inefficiently small quantity,” and heterogeneous markups imply misallocation across sellers, leading an analyst to “recommend the introduction of consumption subsidies” and “finely-tuned production subsidies that reallocate inputs and consumption from low to high-markup sellers.” “None of these welfare and policy implications are, however, correct, since the equilibrium of the search-theoretic model … is efficient.” The root of the error is the demand curve’s interpretation: the quantity q(p) − q(c) a seller does not sell is, in Dixit–Stiglitz, lost gains from trade (an inefficiency), but in the search model it is “equally valuable trades that the buyers make with other sellers,” and so is not an inefficiency.

Q9. What determines the level and shape of the markup distribution?

For a log-uniform cost distribution (Theorem 4), markups decrease with the extent of competition λ, increase with the buyers’ valuation u, decrease with the highest marginal cost c_h, and increase with the rate κ at which marginal costs decline across sellers; the sign of the markup–size relationship flips at parameter thresholds. Specifically, the markup function is strictly decreasing in rank x (markups rising with size) when competition is weak (λ below a cutoff λ*), constant when λ = λ*, and strictly increasing in x (markups falling with size) when λ > λ*; analogous thresholds u* and κ* govern the slope’s sign as u and κ vary. The intuition: when λ is low, sellers rarely compete for the same buyers and low-cost sellers face little pressure, so markups are high and higher for low-cost (large) sellers; when λ is high, low-cost sellers are pushed toward marginal-cost pricing while high-cost sellers — facing no pressure from above — retain markups near u/c_h. Menzio notes the monotone-level results (markups decreasing in λ and c_h, increasing in u and in κ(x) = c’(x)/c(x)) generalize beyond the log-uniform family to arbitrary cost distributions, while the slope-sign results are stated for the log-uniform case.

Q10. What is the bottom-line claim for macroeconomics?

Markup data alone are insufficient to draw conclusions about the welfare, policy, and counterfactual consequences of market power; identifying those consequences requires evidence on the source of market power — product differentiation versus search/information frictions. The paper frames this as “a cautionary note to the macroeconomic literature that uses the Dixit–Stiglitz framework to model market power and markups” — a literature spanning monetary policy (e.g., Blanchard–Kiyotaki 1985; Christiano, Eichenbaum and Evans 2005; Golosov and Lucas 2007), misallocation and aggregate TFP (Hsieh and Klenow 2009), and the gains from trade (Krugman; Melitz 2003). In Dixit–Stiglitz estimations, markup heterogeneity is “quantitatively important” for the welfare cost of inflation in sticky-price models (Galí 1995), the gains from trade (Dhingra and Morrow 2019), and the cost of market power (Boar and Midrigan 2024); Menzio’s point is that “neither the level nor the dispersion of markups observed in the data are necessarily symptomatic of any inefficiency.”

Q11. Does the paper claim the search-theoretic model is the correct one?

No — the paper explicitly does not argue that the search-theoretic model is closer to the truth than monopolistic competition; it makes the “more modest, but not unimportant” claim that two sensible, well-established models fit the same markup data yet imply very different welfare, policy, and counterfactual conclusions. Menzio notes both theories “are likely to be overly simplified descriptions of the world,” and that the existence of still other models generating the same markups “only strengthens” the point. The constructive takeaway he poses is an empirical identification question: “How much of the downward sloping demand curve facing a seller is due to the heterogeneity in buyer’s outside options and how much is it due to preferences?”

Q12. Does the argument extend beyond product markets?

Yes — the same logic applies to the labor market: in the Burdett–Mortensen (1998) search model one can derive a closed-form formula for equilibrium markdowns that are positive even when employers are perfect substitutes to workers, are heterogeneous even with identical technology, and may be increasing, decreasing, or constant in firm size, with the equilibrium again efficient. Menzio concludes that “the same caution that I recommend using when interpreting markups should be applied to the interpretation of markdown data.”

Q13. What are the scope conditions, and what does the paper not do?

The results are theoretical, derived under unit buyer demand, constant returns to scale, and a Poisson process for the number of sellers each buyer contacts; the closed-form comparative statics of Theorem 4 assume a log-uniform marginal-cost distribution; and the paper offers no empirical calibration or estimation. Menzio notes the efficiency result depends on the model’s assumptions — relaxing unit demand or adding externalities could make the equilibrium inefficient — but argues this does not weaken the core identification point. A companion paper (Menzio 2024b, NBER WP 33253) shows the efficiency of the search-theoretic equilibrium extends to a general-equilibrium setting with endogenous firm entry. The paper’s contribution is an analytical characterization and a cautionary/identification argument, not a quantitative welfare estimate.

Key concepts

Search-theoretic model of imperfect competition : The Butters (1977)/Varian (1980)/Burdett–Judd (1983) framework in which sellers carry identical (perfectly substitutable) goods, and market power arises because buyers contact only a random subset of sellers — so some buyers are “captive” to a single seller. Markups are determined by the distribution of buyers’ choice-set sizes, not by preferences over differentiated varieties.

Dixit–Stiglitz monopolistic competition : Any model in which each seller is the sole producer (monopolist) of its own variety, sets its price, and is too small to affect the aggregate; the size of markups is governed by the substitutability of varieties in buyers’ utility (CES, VES, translog, or Kimball preferences all qualify in the paper’s usage).

Gross / net markup : The gross markup μ is the ratio of a seller’s posted price to its marginal cost (p/c); the net markup is μ − 1.

Captive vs. non-captive buyers : A captive buyer is in contact with only one seller and so cannot shop around (the source of strictly positive markups); a non-captive buyer is in contact with several sellers and buys from the cheapest (the source of price dispersion and the absence of mass points in the price distribution).

λ (extent of competition) : The coefficient of the Poisson distribution governing how many sellers a buyer contacts — equivalently the average number of contacts per buyer; higher λ means more competition and lower markups.

Reduced-form preferences / Lucas critique : The buyer utility function a Dixit–Stiglitz modeller would infer to rationalize the search model’s markups; because it depends on the search model’s deep parameters (λ, u, c, b), it shifts whenever the environment or policy changes, so counterfactuals computed holding it fixed are invalid.

Efficiency (of the equilibrium) : The search-theoretic equilibrium maximizes the sum of buyer and seller payoffs — every contacted buyer buys (since valuation u exceeds cost c) and, with heterogeneous costs, buys from the lowest-cost contacted seller — so the positive, dispersed markups are not symptoms of any inefficiency.

Markdown : The labor-market analogue of a markup — the gap between a worker’s marginal product and the wage — which in the Burdett–Mortensen (1998) search model has the same qualitative properties (positive, heterogeneous, size-dependent, efficient) as product-market markups here.

Mis(sed) Diagnosis: Physician Decision Making and ADHD

This paper develops and estimates a structural model of ADHD diagnosis to decompose the mechanisms driving the observed 2.3:1 male-to-female diagnostic difference in the United States. The research question is: to what extent does the large gender gap in ADHD diagnosis reflect true differences in symptom prevalence, versus patient-side utilization costs, versus physician decision-making under uncertainty? The setting is particularly well-suited to this question because DSM-V diagnostic guidelines for ADHD are explicitly gender-neutral, making any gender difference in physician thresholds a detectable deviation from uniform clinical rules.

The data come from de-identified electronic health records from a large Arizona healthcare system covering January 2014 through September 2017. The sample encompasses 36,193 unique encounters for approximately 11,070 pediatric patients. The raw male-to-female diagnostic ratio in the data is 2.32:1 (7.2% of males vs. 3.1% of females receive a clinical ADHD diagnosis). This gap persists after controlling for demographics, general healthcare utilization, and mental health utilization in reduced-form regressions, motivating the structural approach.

Two key variables are not directly recorded in the EHR: whether a patient received a behavioral assessment (Qi) and the ADHD match signal observed by the physician (xi). The author constructs both from the text of physicians’ clinical notes. A random forest classifier trained on labeled appointments predicts behavioral assessment take-up for unlabeled encounters; approximately 20.8% of children are predicted to have received a behavioral assessment (23.2% of males vs. 18.3% of females). The ADHD match signal is constructed via an adjusted Bag-of-Words cosine similarity measure comparing each patient’s aggregated note text to the DSM-V symptom list, rescaled to [0,1]. The average signal is 0.319 overall, with males averaging 0.326 and females 0.311.

The structural model has three stages. First, patients/caregivers decide whether to schedule a behavioral assessment, a function of underlying latent ADHD risk (vi) and mental healthcare utilization costs (ci). Second, conditional on assessment, the physician receives a noisy signal of vi and updates beliefs via Bayesian learning; signal quality ρ governs diagnostic uncertainty. Third, the physician diagnoses ADHD if posterior risk exceeds a gender-specific diagnostic threshold τ. Population mean ADHD risk (μ) is identified using regression-adjusted initial primary care provider referral rates as a quasi-exogenous cost-shifter — patients of high-referral-rate providers select into assessment less selectively, so their observed signals approach population mean risk. This extrapolation approach follows Arnold et al. (2022).

The structural parameter estimates reveal that male and female children have broadly similar mean ADHD risk, with boys modestly higher (μm = 0.290 vs. μf = 0.262), and similar mean utilization costs (cm = 0.116 vs. cf = 0.109). The most striking differences are in physician parameters: signal quality is lower for male patients (ρm = 0.479 vs. ρf = 0.552), indicating higher diagnostic uncertainty for boys; and diagnostic thresholds are substantially lower for male patients (τm = 0.257 vs. τf = 0.312), meaning physicians are willing to diagnose ADHD in boys with lower posterior risk.
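
To see how signal quality and the threshold interact, the sketch below plugs the reported estimates into a deliberately simple updating rule. The linear form, posterior = ρ·x + (1 − ρ)·μ, is an assumption made here for illustration; the summary does not state the paper's exact posterior formula, so the resulting cutoffs should not be read as results from the paper.

```python
def posterior_risk(signal: float, prior_mean: float, rho: float) -> float:
    """ASSUMED linear updating: weight rho on the signal, 1 - rho on the prior."""
    return rho * signal + (1.0 - rho) * prior_mean


def diagnose(signal: float, prior_mean: float, rho: float, tau: float) -> bool:
    """Diagnose when posterior risk exceeds the threshold tau."""
    return posterior_risk(signal, prior_mean, rho) > tau


def cutoff_signal(prior_mean: float, rho: float, tau: float) -> float:
    """Signal at which the assumed posterior exactly equals tau."""
    return (tau - (1.0 - rho) * prior_mean) / rho


# Reported estimates (mu, rho, tau) by gender.
male = dict(prior_mean=0.290, rho=0.479, tau=0.257)
female = dict(prior_mean=0.262, rho=0.552, tau=0.312)

print(f"male cutoff signal:   {cutoff_signal(**male):.3f}")
print(f"female cutoff signal: {cutoff_signal(**female):.3f}")
```

Under this assumed rule the signal needed for a diagnosis is roughly 0.22 for boys and 0.35 for girls, so the same observed signal can lead to a diagnosis for a boy and not for a girl.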

Counterfactual decomposition simulations attribute approximately 20–25% of the 2.32:1 diagnostic gap to underlying differences in ADHD risk, approximately 20% to differences in selection into behavioral assessments, and the remaining majority — approximately 55–60% — to physician decision-making. Within physician decision-making, differences in diagnostic thresholds alone account for roughly two-thirds of the overall diagnostic gap.

The paper offers economic rationales for why gender-specific thresholds may be consistent with physician rationality despite uniform guidelines: higher diagnostic uncertainty for boys justifies lower thresholds under Bayesian updating; hyperactive/impulsive symptoms predominant in boys impose larger classroom externalities (Aizer, 2008); and female patients show higher rates of internalizing co-morbidities (anxiety, depression) that may reduce the marginal benefit of an additional ADHD diagnosis. A type-specific threshold extension finds that for male patients the threshold for hyperactive/impulsive symptoms is significantly lower than for inattentive symptoms, consistent with salience of externally disruptive behaviors. These rationalizations do not vindicate the gap as fully guideline-consistent, but suggest physicians may be responding to real heterogeneity in external costs and co-morbidity patterns.

Q: What is the main research question and why is ADHD a useful setting? A: The paper asks what mechanisms produce the 2.3:1 male-to-female ADHD diagnostic difference: true symptom prevalence, patient utilization costs, or physician decision-making. ADHD is well-suited because (1) clinical guidelines (DSM-V) are explicitly gender-neutral and require the same symptom count threshold regardless of sex; (2) diagnosis is based on subjective behavioral assessment rather than objective testing, creating substantial physician discretion; and (3) both missed and excess diagnosis carry meaningful costs — missed diagnosis limits educational accommodations; excess diagnosis exposes children to Schedule II controlled substances.

Q: What data does the paper use and what are the key descriptive facts? A: The data are de-identified electronic health records from a large Arizona healthcare system, 2014–2017, covering 36,193 encounters for 11,070 pediatric patients aged 5 and above. Overall ADHD diagnosis rate is 5.2%, with males at 7.2% and females at 3.1%, a 2.32:1 ratio that matches national levels. Approximately 49.5% of the sample is Hispanic, which the author notes contributes to a below-national-average overall diagnosis rate. The gender diagnostic gap persists even after controlling for demographics, general healthcare utilization, and mental health utilization in reduced-form regressions.

Q: How does the paper construct the behavioral assessment indicator (Qi) and the ADHD match signal (xi)? A: Qi is constructed using a random forest classifier trained on doctor notes from appointments where assessment status is known with near-certainty (ADHD diagnosis or DSM-V comorbid diagnosis = positive; non-mental-health diagnosis code for patients with no mental health history = negative). The classifier uses 41 features including note length and top-20 word frequencies for each label class. xi is constructed via an adjusted Bag-of-Words cosine similarity between each patient’s combined behavioral assessment notes and the DSM-V symptom list, separately for inattentive and hyperactive/impulsive sub-types, taking xi = max{xi1, xi2}. The average xi is 0.319 (males 0.326, females 0.311) in the behavioral assessment subsample.
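
The similarity step can be sketched with plain word counts. This is ordinary Bag-of-Words cosine similarity with the max over the two sub-types; the paper's "adjusted" variant and its rescaling are not described in this summary and are not reproduced. The text snippets are made up for illustration and are neither DSM text nor patient notes.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_signal(notes: str, inattentive: str, hyperactive: str) -> float:
    """x = max{x1, x2}: the larger of the two sub-type similarities."""
    n = bag_of_words(notes)
    x1 = cosine_similarity(n, bag_of_words(inattentive))
    x2 = cosine_similarity(n, bag_of_words(hyperactive))
    return max(x1, x2)


# Hypothetical, made-up text.
inattentive = "easily distracted forgetful trouble sustaining attention"
hyperactive = "fidgets interrupts talks excessively difficulty waiting"
notes = "parent reports child is easily distracted and forgetful at school"

signal = match_signal(notes, inattentive, hyperactive)
print(f"match signal = {signal:.3f}")
```

Because word counts are non-negative, the cosine already lies in [0, 1] before any further rescaling.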

Q: What is the identification strategy for recovering population mean ADHD risk (μ)? A: Because xi is observed only for endogenously selected patients, the observed sample mean overestimates population mean risk. The author uses regression-adjusted referral rates of each patient’s initial primary care provider (IPCP) as a quasi-exogenous cost-shifter satisfying (a) relevance — IPCP referral intensity lowers patient scheduling costs — and (b) independence from patient ADHD risk vi, since IPCPs are typically chosen before behavioral symptoms develop and only 28% of IPCPs in the sample ever diagnose ADHD themselves. Population mean risk is then recovered by extrapolating the relationship between IPCP referral propensity and average observed xi to propensity = 1, following Arnold et al. (2022). The maximum observed IPCP referral propensity is only about 0.75, so the estimate requires extrapolation beyond the observed support.
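
The extrapolation logic can be illustrated with a straight-line fit. Everything specific here is an assumption for the sake of the example: the linear functional form, the helper name, and the data points are hypothetical and do not come from the paper, which may use a different extrapolation.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: referral propensity of the initial provider and the
# average observed signal among that provider's assessed patients.
# Observed propensities stop at 0.75, as in the description above.
propensity = [0.15, 0.30, 0.45, 0.60, 0.75]
mean_signal = [0.382, 0.364, 0.346, 0.328, 0.310]

intercept, slope = fit_line(propensity, mean_signal)
mu_hat = intercept + slope * 1.0  # extrapolate to propensity = 1

print(f"slope = {slope:.3f}, extrapolated mean risk = {mu_hat:.3f}")
```

The negative slope reflects less selective take-up at high-referral providers, and the value at propensity = 1 lies outside the observed range, which is why the estimate depends on the assumed functional form.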

Q: What are the estimated structural parameters and what do they imply? A: Mean ADHD risk is μm = 0.290 vs. μf = 0.262 — males have modestly higher underlying risk. Mean utilization costs are cm = 0.116 vs. cf = 0.109 — nearly identical across genders. Signal quality (diagnostic certainty) is lower for males: ρm = 0.479 vs. ρf = 0.552, indicating physicians face more diagnostic uncertainty when assessing boys. Most importantly, diagnostic thresholds are lower for males: τm = 0.257 vs. τf = 0.312, meaning physicians diagnose ADHD in boys at a lower required posterior risk level, consistent with viewing missed diagnosis as relatively more costly for male patients.

Q: How much of the 2.32:1 diagnostic gap can be attributed to each mechanism? A: Counterfactual simulations decompose the gap as follows: differences in underlying ADHD risk distribution account for approximately 20–25% of the diagnostic difference; differences in selection into behavioral assessments (utilization costs operating through assessment rates) account for approximately 20%; and physician decision-making differences account for the remaining majority, approximately 55–60%. Within physician factors, differences in diagnostic thresholds (τm < τf) are the single largest contributor, explaining roughly two-thirds of the overall male/female diagnostic gap.

Q: What do the type-specific threshold estimates reveal? A: When the baseline model is extended to allow separate diagnostic thresholds for inattentive vs. hyperactive/impulsive symptom sub-types, male patients show significantly lower thresholds for hyperactive/impulsive symptoms relative to inattentive symptoms (τ^HI_m < τ^Inatt_m). This is consistent with the hypothesis that more externally salient and disruptive symptoms carry larger classroom externalities, which physicians may implicitly factor into diagnosis decisions (following Aizer, 2008). For female patients, the threshold differences across symptom types are smaller and less statistically significant.

Q: What economic rationales does the paper offer for gender-specific diagnostic thresholds despite uniform guidelines? A: Three mechanisms are identified. First, higher diagnostic uncertainty for males (lower ρm) implies that under symmetric costs, Bayesian-rational physicians should set lower thresholds when the signal is noisier — this alone partially rationalizes the threshold gap. Second, hyperactive/impulsive symptoms predominant in boys impose greater externalities on classroom peers (Aizer, 2008), increasing the social benefit of diagnosis for boys on the margin. Third, females show substantially higher rates of co-morbid internalizing conditions (anxiety, depression) whose treatment may mitigate ADHD-related behaviors or whose interaction with stimulant medication makes the marginal ADHD diagnosis less beneficial for girls (Currie et al., 2014). These factors together suggest physicians may be responding to genuine heterogeneity in net diagnosis benefits, even if their behavior deviates from gender-neutral clinical guidelines.

Q: What share of the 2.3:1 national diagnostic gap is consistent with genuine symptom prevalence differences? A: Simulations indicate that only about 20–25% of the 2.32:1 male/female diagnostic difference can be explained by the underlying difference in ADHD risk distributions. The majority — roughly 75–80% — reflects factors beyond true prevalence: selection into care and, most substantially, physician decision-making differences including both signal quality and diagnostic thresholds.

Q: What are the policy implications? A: The findings suggest that targeted interventions in physician awareness and clinical training are likely more effective than generic awareness campaigns, since the dominant driver of the diagnostic gap is physician threshold-setting rather than symptom prevalence. Structured decision support tools or updated training that make physicians aware of gender-specific diagnostic patterns could reduce medically unwarranted diagnostic differences. Policies targeting patient-side access barriers (the ~20% explained by selection) remain relevant but secondary. The roughly 20–25% of the gap attributable to genuine symptom prevalence differences is, by construction, guideline-consistent and should not be targeted for elimination.

Q: What are the methodological contributions? A: The paper makes three methodological contributions. First, it develops a structural model of mental health diagnosis that explicitly incorporates endogenous patient selection, a feature that is absent from standard physician decision-making models and is shown to be empirically important. Second, it applies machine learning and NLP to clinical doctor note text to construct key unobserved clinical variables (behavioral assessment indicator and ADHD match signal) that are unavailable as structured data in EHRs. Third, the identification of population mean health risk uses quasi-exogenous variation (IPCP referral rates), analogous to the method of Arnold et al. (2022) for measuring racial discrimination in bail decisions, adapted here to a continuous health risk setting with endogenous selection.

Diagnostic threshold (τ_θ): The gender-specific posterior ADHD risk level above which a physician chooses to diagnose ADHD. Set ex-ante, it reflects the physician’s perceived tradeoff between the costs of over-diagnosis (misdiagnosis) and under-diagnosis (missed diagnosis). A lower threshold implies the physician views missed diagnosis as relatively more costly for that patient group. By construction, uniform clinical guidelines imply a single threshold independent of patient gender.

ADHD match signal (x_i): A physician-observed, noisy signal of a patient’s true latent ADHD risk (v_i), observed only conditional on the patient receiving a behavioral assessment. In estimation, it is proxied via a cosine similarity measure between the patient’s aggregated clinical doctor note text and the DSM-V symptom list, constructed separately for inattentive and hyperactive/impulsive sub-types.

Signal quality / diagnostic uncertainty (ρ_θ): The correlation between the physician’s observed ADHD match signal and the patient’s true ADHD risk. Higher ρ means the physician’s signal is more informative and diagnostic uncertainty is lower. In the Bayesian updating framework, higher ρ implies the physician places more weight on the observed signal relative to the prior.

Mental healthcare utilization cost (c_i): The composite of all patient/caregiver factors that affect the decision to schedule a behavioral assessment net of child symptom level. Includes non-monetary barriers such as time constraints, distance, stigma, and information from primary care providers during wellness visits; does not include monetary out-of-pocket costs since insurance typically covers behavioral assessments.

Initial Primary Care Provider (IPCP) referral rate: The regression-adjusted share of a given PCP’s patients who ultimately receive a behavioral assessment at some point in the sample. Used as a quasi-exogenous cost-shifter that influences patient scheduling costs without being correlated with patient ADHD risk, enabling identification of population mean ADHD risk via extrapolation.

Latent ADHD risk (v_i): An unobserved continuous measure of a child’s underlying ADHD-related behavioral symptoms, drawn from a gender-specific normal distribution N(μ_θ, σ²_θ). A child’s true ADHD status is S_i = 1(v_i > v̄), where 1(·) is the indicator function and v̄ is the DSM-V minimum symptom threshold, defined identically for boys and girls.

Adjusted Bag-of-Words (BOW) cosine similarity: The NLP method used to construct the ADHD match signal proxy. Patient notes are tokenized into uni-grams and bi-grams after preprocessing (spell check, abbreviation replacement, part-of-speech tagging, synonym replacement), and tf-idf weighted. The cosine similarity between the resulting document vector and the DSM-V symptom text vector is computed separately for each ADHD sub-type and rescaled to [0,1].
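
The similarity step described in the last entry can be sketched as follows. This is a minimal standard-library illustration, not the paper's pipeline: the preprocessing stages (spell check, abbreviation and synonym replacement, part-of-speech tagging) are omitted, the smoothed idf formula is an assumption, and the symptom text and notes are invented examples rather than DSM wording or real clinical notes.

```python
import math
from collections import Counter


def ngrams(text):
    """Lower-cased uni-grams plus bi-grams from whitespace tokens."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (dicts) for a list of documents."""
    counts = [Counter(ngrams(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed idf (an assumed variant) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in doc_freq}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented texts for illustration only.
symptoms = "easily distracted fails to pay attention forgetful in daily activities"
note_a = "child is easily distracted and fails to pay attention during class"
note_b = "routine wellness visit with no concerns reported"

vec_symptoms, vec_a, vec_b = tfidf_vectors([symptoms, note_a, note_b])
sim_a = cosine(vec_symptoms, vec_a)
sim_b = cosine(vec_symptoms, vec_b)
print(sim_a, sim_b)
```

Because tf-idf weights are non-negative, the cosine already lies in [0, 1] in this sketch; any further rescaling used in the paper is not reproduced.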

Misspecified Expectations among Professional Forecasters

Analyzing panel data from the U.S. Survey of Professional Forecasters (SPF, 1992Q1–2019Q4, 77 forecasters, 1,520 forecaster-quarter observations), Julio Ortiz finds that a “misspecified expectations” model — in which forecasters perceive an AR(2) data-generating process to be an AR(1), causing them to misperceive its underlying persistence — tends to outperform a noisy-information rational benchmark and two leading non-FIRE alternatives (overconfident and diagnostic expectations) when fit to forecast errors and revisions. The models are estimated by maximum likelihood and ranked using forecast-encompassing weights; for the baseline real GDP growth case, misspecified expectations earns the largest encompassing weight (0.539 vs. 0.462 for diagnostic, ~0 for rational and overconfident) and the highest log-likelihood. Across 14 macroeconomic variables, misspecified expectations provides the best fit for most series both in-sample and out-of-sample, though diagnostic expectations fits better for some (e.g., GDP deflator, industrial production, real residential investment) and rational expectations fits the unemployment rate best. The author argues misspecified expectations succeeds in part because its bias enters both the prediction and updating equations, producing overreaction to new information plus overextrapolation across horizons, which makes forecast errors longer-lived; he concludes it can serve as a “suitable approach” / useful benchmark to model professional-forecaster expectation formation, while emphasizing the results are specific to the context of professional forecasting and may not carry over to household or firm expectations.

In depth

Q1. What question does the paper address?

The paper undertakes a formal comparison of competing non-FIRE theories of expectation formation to move toward establishing a benchmark non-FIRE model in the context of professional forecasting. Ortiz motivates this with the observation that survey forecast errors are predictably correlated with real-time information — a violation of full-information rational expectations (FIRE) — but that, as noted in Reis (2020), the literature “has not yet settled on a benchmark non-FIRE model.” The paper offers “a partial answer to this question.”

Q2. What models are compared?

Four models are estimated: a noisy-information rational expectations baseline plus three biased non-FIRE models — overconfident expectations (Daniel et al., 1998), diagnostic expectations (Bordalo et al., 2020), and misspecified expectations (in the spirit of Fuster et al., 2010). All are embedded in a common noisy-information environment where the latent variable is unobservable and forecasters update via a Kalman filter from a noisy private signal. Overconfidence has forecasters misperceive their signal noise as smaller than it is; diagnostic expectations introduces a representativeness distortion ϕ > 0 generating overreaction to recent news; misspecified expectations has forecasters treat an AR(2) process as an AR(1).
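
To see how the three updating rules differ mechanically, the sketch below computes steady-state Kalman gains in a scalar setting. It is a simplified illustration under stated assumptions: the state is treated as an AR(1) (the second lag is ignored), the overconfident forecaster runs the filter with perceived noise α_v σ_v, and the diagnostic forecaster scales the rational gain by (1 + ϕ) in the style of Bordalo et al. (2020). Parameter values are the real GDP growth estimates quoted in Q6, with the reported overconfidence estimate read as α_v; function names are hypothetical.

```python
def steady_state_gain(rho, sigma_w, sigma_v, iterations=500):
    """Steady-state Kalman gain for x_t = rho*x_{t-1} + w_t, s_t = x_t + v_t."""
    q = sigma_w ** 2  # state innovation variance
    r = sigma_v ** 2  # signal noise variance
    prior_var = q
    for _ in range(iterations):
        gain = prior_var / (prior_var + r)
        posterior_var = (1.0 - gain) * prior_var
        prior_var = rho ** 2 * posterior_var + q
    return prior_var / (prior_var + r)


def update(prior_forecast, signal, gain):
    """Move the prior forecast toward the signal by the given gain."""
    return prior_forecast + gain * (signal - prior_forecast)


RHO = 0.434                # first-order autocorrelation (second lag ignored here)
SIGMA_V = 1.0              # normalisation (assumption)
SIGMA_W = 1.09 * SIGMA_V   # signal-to-noise ratio of about 1.09
ALPHA_V = 0.72             # overconfidence, read as alpha_v (assumption)
PHI = 0.23                 # diagnosticity

gain_rational = steady_state_gain(RHO, SIGMA_W, SIGMA_V)
gain_overconfident = steady_state_gain(RHO, SIGMA_W, ALPHA_V * SIGMA_V)
gain_diagnostic = (1.0 + PHI) * gain_rational

for name, gain in [("rational", gain_rational),
                   ("overconfident", gain_overconfident),
                   ("diagnostic", gain_diagnostic)]:
    print(name, round(update(0.0, 1.0, gain), 3))
```

Both biased rules put more weight on the new signal than the rational filter does, which is the sense in which they generate overreaction to news.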

Q3. What exactly is “misspecified expectations” in this paper?

Misspecified expectations is a model in which the underlying state follows an AR(2) process but forecasters treat it as an AR(1), so they misperceive the true persistence of the data-generating process. The author notes this version is “closest to natural expectations as modeled in Fuster et al. (2010),” with forecasters neglecting longer lags. Importantly, forecasters still understand the information structure. If the perceived persistence loads excessively onto the first lag, forecasters overextrapolate. The author flags three technical differences from Fuster et al. (2010): he does not model an AR(2) in levels with AR(1)-in-growth-rates forecasting; the perceived persistence is estimated from the data rather than defined as a function of the true autocorrelation parameters; and he does not define expectations as a weighted average of rational and naive AR(1) expectations.

Q4. What data and sample are used?

The estimation uses U.S. SPF panel data from 1992Q1 to 2019Q4, yielding 77 unique forecasters and 1,520 forecaster-quarter observations for the baseline. The 1992 start is chosen to avoid spanning different regimes and because the survey redefined output from GNP to GDP in 1992. The procedure requires unbroken observation sequences, so only each forecaster’s longest spell is kept, with a minimum spell length of eight quarters (because entry/exit may be non-random, per Engelberg et al., 2011). Real GDP growth is the baseline variable; 13 other macroeconomic variables are also estimated. Real-time forecast errors (not errors based on revised figures) are used, following the literature.

Q5. How are the models estimated and compared?

The models are estimated via a three-step maximum likelihood procedure, and their relative fit is compared using forecast-encompassing weights (West, 2001; Harvey et al., 1998; West, 2006), supplemented by AIC and a Vuong (1989) non-nested likelihood-ratio test. Step 1 estimates the fundamental process parameters (ρ₁, ρ₂, σ_w) from the macro time series and fixes them across models; step 2 estimates the signal-noise dispersion σ_v from the rational model and calibrates it across the other three; step 3 estimates each bias parameter (α_v, ϕ, ρ̂) by MLE on SPF data. This keeps fundamental and information parameters consistent across biased models so they are evaluated solely on the biases they generate, and makes identification transparent (notably, σ_v and α_v cannot be jointly identified in the overconfidence model). Encompassing weights are obtained from a constrained linear regression of realizations on model-based one-quarter-ahead forecasts, with weights summing to 1.
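
The encompassing-weight regression can be sketched for the two-model case, where the sum-to-one constraint reduces the problem to a single coefficient. This is a minimal illustration, not the paper's four-model implementation: restricting the weight to [0, 1] is an assumption, and the function name and data are hypothetical.

```python
def encompassing_weight(realized, forecast_a, forecast_b):
    """Weight on model A in y = w*f_a + (1 - w)*f_b + e, with w in [0, 1].

    Substituting the constraint gives y - f_b = w*(f_a - f_b) + e, a
    one-regressor least-squares problem without an intercept.
    """
    diffs = [a - b for a, b in zip(forecast_a, forecast_b)]
    targets = [y - b for y, b in zip(realized, forecast_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0.0:
        raise ValueError("forecasts are identical; the weight is not identified")
    weight = sum(d * t for d, t in zip(diffs, targets)) / denom
    # Clipping gives the constrained optimum of a one-dimensional quadratic.
    return min(1.0, max(0.0, weight))


# Hypothetical one-quarter-ahead forecasts and realizations.
f_a = [1.0, 2.0, 0.5, -1.0]
f_b = [0.0, 1.0, 1.5, 0.5]
y = [0.7 * a + 0.3 * b for a, b in zip(f_a, f_b)]

w_a = encompassing_weight(y, f_a, f_b)
print(round(w_a, 3), round(1.0 - w_a, 3))
```

With realizations built as an exact 0.7/0.3 blend of the two forecasts, the regression recovers a weight of 0.7 on model A.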

Q6. What are the baseline real GDP growth results?

For real GDP growth, the misspecified expectations model produces the highest log-likelihood and the largest encompassing weight, 0.539, versus 0.462 for diagnostic expectations and approximately 0.000 for both rational and overconfident expectations. The fundamental process estimates imply relatively low persistence (first-order autocorrelation ρ₁ ≈ 0.434, second-order ρ₂ ≈ −0.006). The estimated bias parameters are: overconfidence ≈ 0.72, diagnosticity ≈ 0.23, and perceived persistence ρ̂ ≈ 0.564. Because ρ̂ ≈ 0.56 exceeds the estimated ρ₁ ≈ 0.43, the misspecified model implies forecasters overestimate the first-order autocorrelation and neglect the partial reversal in the second lag, generating overreactions. The signal-to-noise ratio implied by the estimated private noise dispersion is σ_w/σ_v ≈ 1.09. AIC rankings (and BIC) do not change the ordering relative to the maximized likelihoods.
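
The overextrapolation implied by these estimates can be illustrated by comparing multi-horizon forecasts from the true AR(2) with those from the perceived AR(1). The sketch assumes the state is observed without noise, which abstracts from the signal-extraction problem in the paper; the starting values and function names are hypothetical.

```python
def ar2_forecasts(x_t, x_tm1, rho1, rho2, horizons):
    """Iterated forecasts from x_t = rho1*x_{t-1} + rho2*x_{t-2} + w_t."""
    path = []
    previous, current = x_tm1, x_t
    for _ in range(horizons):
        nxt = rho1 * current + rho2 * previous
        path.append(nxt)
        previous, current = current, nxt
    return path


def ar1_forecasts(x_t, rho_hat, horizons):
    """Forecasts from the perceived AR(1): rho_hat**h * x_t."""
    return [rho_hat ** h * x_t for h in range(1, horizons + 1)]


RHO1, RHO2 = 0.434, -0.006  # estimated fundamental process
RHO_HAT = 0.564             # estimated perceived persistence

true_path = ar2_forecasts(1.0, 0.0, RHO1, RHO2, horizons=4)
perceived_path = ar1_forecasts(1.0, RHO_HAT, horizons=4)

for h, (t, p) in enumerate(zip(true_path, perceived_path), start=1):
    print(h, round(t, 4), round(p, 4))
```

Starting from a unit value of the state, the perceived AR(1) forecast lies above the AR(2) forecast at every horizon shown, so the bias carries across horizons rather than affecting only the one-step-ahead forecast.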

Q7. Does the result hold across other macroeconomic variables?

Across the 14 SPF macroeconomic variables, misspecified expectations provides the best in-sample fit for most series, but not all. Diagnostic expectations registers larger encompassing weights for certain series — the GDP deflator (0.771), industrial production (1.000), and real residential investment (0.624). Rational expectations provides the best fit for the unemployment rate (0.745) and housing starts (in-sample). For the bulk of the remaining variables (e.g., CPI 0.859, payroll employment 1.000, real consumption 0.777, real federal spending 1.000, real GDP 0.539, real nonresidential investment 1.000, real state/local spending 1.000, 3-month Treasury bill 0.713, 10-year bond 0.746), misspecified expectations carries the largest weight. Overconfident expectations “does not yield particularly large encompassing weights for any variable.”

Q8. Why does misspecified expectations fit better, and for which variables especially?

The author finds that, among variables exhibiting overreactions, misspecified expectations tends to offer a better fit for less persistent series, because the scope for it to generate overreaction (ρ̂ − ρ₁) is greater when ρ₁ is low. Unlike the alternatives, the persistence bias ρ̂ − ρ₁ can be positive or negative, allowing the model to account for both overreacting and underreacting variables; the alternative models cannot generate forecaster-level underreaction. Figure 2 plots the encompassing weight on misspecified expectations against the sum of autoregressive coefficients and suggests (with some exceptions) that less persistent variables have higher weight on misspecified expectations.

Q9. Does the model perform out of sample?

The misspecified expectations model also provides a better out-of-sample fit for more of the variables when the models are estimated on 1992Q1–2005Q4 and evaluated on the latter half of the sample. Out of sample, however, diagnostic expectations now outperforms for the GDP deflator (0.987), industrial production (0.959), payroll employment (0.813), and real federal government expenditures (0.591); overconfident expectations outperforms for the 10-year government bond (0.653); and rational expectations outperforms for housing starts (0.502) and the unemployment rate (1.000). The author cautions that these results do not imply forecasters could improve their forecasts in real time, because the MLE observations include contemporaneous individual and consensus forecast errors that are not known to forecasters when they issue forecasts; for the same reason, the results are “not inconsistent with” Eva and Winkler (2023) on the poor out-of-sample performance of error-predictability regressions.

Q10. Could the apparent advantage of misspecified expectations just reflect learning?

The author argues that learning about the data-generating process does not appear to drive the relative model rankings in favor of misspecified expectations, based on two exercises. First, using the full pre-COVID sample (1968Q4–2019Q4) over 25-year rolling windows (three-year roll), the misspecified model outperforms diagnostic expectations in six of ten sub-samples and all models in five of ten, while diagnostic expectations wins four of ten — patterns that “do not indicate that learning over time favors misspecified expectations.” Second, splitting forecasters by “age”/tenure (a proxy for experience), misspecified expectations outperforms the others among experienced (above-median age) forecasters (encompassing weight 0.766, with overconfidence 0.234) and is dominant among inexperienced ones (1.000). The author concedes learning “is likely reflected in professional forecasts” but does not appear to drive the rankings.

Q11. What additional moments does misspecified expectations match?

Beyond overall fit, the author shows in the appendix that misspecified expectations matches five features of the data — overreaction, underreaction, overshooting, persistent disagreement, and updating behavior — and is the only model generating delayed overshooting. All three non-rational models generate individual-level overreaction (Bordalo et al., 2020 errors-on-revisions regression) and aggregate underreaction (Coibion-Gorodnichenko, 2015 consensus regression). But when simulating impulse responses, “only the misspecified expectations model generates a sign switch in the forecast error,” indicating delayed overshooting (Angeletos et al., 2020). The author reports “stronger evidence” favoring misspecified expectations on two further moments: it better generates persistent disagreement across horizons, and it better matches the relative weights forecasters place on priors versus news — because its bias also enters the prediction equation (not just the update equation), producing longer-lived errors.

Q12. What are the scope conditions and limitations the author stresses?

The author emphasizes that the results are specific to the context of professional forecasting and that the relative model rankings “may be different” for household or firm expectations, or for micro-level expectations rather than aggregate forecasts. He notes professional forecasters are arguably the most well-informed agents, so the literature has treated their predictions as informative about a lower bound on economy-wide information frictions and biases. The paper abstracts away from learning in the model setup and from theories that generate only underreaction. Models excluded from the comparison (e.g., imperfect memory, multi-frequency forecasting, asymmetric attention, learning) are set aside mainly because they cannot be flexibly nested into the common setting and would introduce additional parameters posing identification challenges.

Ortiz concludes that misspecified expectations “can serve as a suitable approach” / useful benchmark to model expectation formation among professional forecasters for a variety of macroeconomic aggregates, while framing this as only “a partial answer” to the search for a non-FIRE benchmark. He highlights a practical advantage: embedding this form of misspecified expectations into a quantitative model “only requires introducing two parameters into an otherwise standard model.” He also notes misspecification can arise either from a behavioral bias or because adopting parsimonious forecasting models is optimal (Branch and Evans, 2006; Pfajfar, 2013). A promising avenue for future research is whether evidence favors misspecified expectations in other settings.

Key concepts

Full-information rational expectations (FIRE): The benchmark in which forecast errors are uncorrelated with any information in the forecaster’s time-t information set; the orthogonality conditions it implies “tend to be violated in the data,” motivating non-FIRE models.
Misspecified expectations: The paper’s focal bias — the true state follows an AR(2) process, xₜ = ρ₁xₜ₋₁ + ρ₂xₜ₋₂ + wₜ, but forecasters treat it as an AR(1), xₜ = ρ̂xₜ₋₁ + uₜ, misperceiving its persistence; forecasters retain the correct information structure. The bias enters both the predict and update equations.
Persistence bias (ρ̂ − ρ₁): The gap between perceived AR(1) persistence and true first-order autocorrelation; positive values generate overextrapolation/overreaction, negative values generate underreaction, and its overreaction scope is larger when ρ₁ is low.
Overconfident expectations: Forecasters misperceive their private signal noise as smaller (σ̃_v = α_v σ_v, α_v ∈ [0,1]) than it truly is, placing excessive weight on new private information.
Diagnostic expectations: A representativeness-based distortion (Bordalo et al., 2020; Gennaioli-Shleifer, 2010) in which, with diagnosticity ϕ > 0, forecasters overweight outcomes representative relative to a “no news” reference scenario, generating overreaction to recent news.
Encompassing weight: The model-comparison metric — a weight wₖ from a constrained linear regression of realized one-quarter-ahead values on competing models’ forecasts, with weights summing to one; a larger weight indicates a better-fitting model.
Delayed overshooting: The Angeletos et al. (2020) pattern of initial underreaction followed by later overreaction to a shock; in this paper, only misspecified expectations produces the sign switch in the forecast-error impulse response that signals it.
Overreaction vs. underreaction: Individual-level overreaction is measured via the Bordalo et al. (2020) errors-on-revisions regression; aggregate/consensus-level underreaction via the Coibion-Gorodnichenko (2015) regression — the data exhibit both, and a successful non-FIRE model must reproduce both.
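
The sign convention behind the two regressions in the last entry can be sketched with a plain OLS slope. This is a simplified illustration: it pools observations without fixed effects or standard errors, the data are invented, and the function names are hypothetical. A negative slope of forecast errors on forecast revisions indicates overreaction, and a positive slope indicates underreaction.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def reaction_type(slope, tol=1e-12):
    """Classify the errors-on-revisions slope by its sign."""
    if slope < -tol:
        return "overreaction"
    if slope > tol:
        return "underreaction"
    return "neither"


# Invented forecast revisions and errors (realization minus forecast).
revisions = [0.5, -0.2, 0.1, -0.4, 0.3]
errors = [-0.3 * r for r in revisions]

slope = ols_slope(revisions, errors)
print(round(slope, 3), reaction_type(slope))
```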

Mixing It Up: Inflation at Risk

This paper introduces a Bayesian Gaussian mixture density regression framework that estimates the complete forecast distribution of inflation — not just selected quantiles — and decomposes the entire risk outlook into contributions from individual economic predictors. The methodology accommodates multimodality, skewness, and fat tails without parametric restrictions, and allows construction of risk measures calibrated to the central bank’s own loss function rather than generic percentile-based measures. Applied to the recent U.S. inflation surge, the framework finds that post-pandemic inflation risk was primarily driven by the recovery of the U.S. business cycle and surging commodity prices, while adjustments in monetary policy contributed negatively — partially mitigating the increase in right-tail inflation risk — and credit spreads also offset some risk. The Gaussian mixture structure enables fast MCMC estimation and produces well-calibrated density forecasts across a range of macroeconomic variables.

In depth

Q1. What is the key methodological contribution relative to existing inflation-at-risk approaches?

Existing approaches to macroeconomic at-risk measures focus on specific quantiles of the forecast distribution — typically the 5th or 25th percentile — discarding information contained in the rest of the distribution; this paper redirects attention to the full forecast distribution while retaining the nonparametric flexibility of quantile regression. The Gaussian mixture density regression estimates a conditional distribution that is a weighted mixture of Gaussians, capturing multimodality, asymmetry, and fat tails simultaneously. The key innovation is decomposability: each predictor’s contribution to any region of the forecast distribution can be quantified, enabling a driver-level accounting of what generates tail risk in any given period.
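
A small sketch shows why a mixture of Gaussians can carry tail risk that a single Gaussian would miss. The component weights, means, and standard deviations are hypothetical and held fixed, whereas in the paper they depend on the predictors; the function names are likewise illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Distribution function of N(mean, sd^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_pdf(x, weights, means, sds):
    """Density of a weighted mixture of Gaussians."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def mixture_cdf(x, weights, means, sds):
    """Distribution function of a weighted mixture of Gaussians."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def tail_probability(threshold, weights, means, sds):
    """Probability that the outcome exceeds the threshold."""
    return 1.0 - mixture_cdf(threshold, weights, means, sds)


# Hypothetical two-regime forecast: a "normal" and a "high-inflation" component.
WEIGHTS = [0.7, 0.3]
MEANS = [2.0, 5.0]
SDS = [0.8, 1.5]

tail_mixture = tail_probability(4.0, WEIGHTS, MEANS, SDS)
tail_single = tail_probability(4.0, [1.0], [2.0], [0.8])
print(round(tail_mixture, 4), round(tail_single, 4))
```

The second component puts visible mass above 4 percent, so the mixture's right-tail probability is far larger than that of the first component alone.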

Q2. What does the U.S. application reveal about the inflation surge?

The framework attributes the increase in right-tail U.S. inflation risk during 2021–2023 primarily to surging commodity prices and the recovery of the domestic business cycle, while monetary policy tightening contributed negatively — its effect partially offset the upward pressure from commodity and cycle drivers. Credit spreads also partially mitigated the risk. The decomposition implies that the dominant drivers of inflation risk were supply-side and aggregate-demand factors, and that monetary policy, when it tightened, reduced the right-tail risk as intended — providing quantitative support for the interpretation that policy was reactive but directionally correct.

Q3. How does the framework construct policy-relevant risk measures?

The framework allows weighting probability mass over the forecast distribution by any user-specified loss function, including asymmetric central bank preferences, yielding risk measures that integrate the full distributional information in proportion to the policymaker’s actual valuation of different inflation outcomes. A central bank that penalizes above-target inflation more heavily than below-target inflation (consistent with empirical evidence on CB loss functions) would weight the upper tail more, producing a risk statistic that is higher than a symmetric measure for the same distribution. This policy-preference-aligned risk measure could have provided a more accurate signal of the urgency of the 2021–2023 inflation risk than standard percentile measures.
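
The idea of weighting the forecast distribution by a loss function can be sketched numerically. The asymmetric quadratic loss, the penalty factor, the target, and the mixture components below are all hypothetical choices made for illustration; the paper's loss specification is not reproduced here.

```python
import math


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given (weight, mean, sd) triples."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in components
    )


def asymmetric_loss(pi, target, upside_penalty):
    """Quadratic loss, scaled up when inflation is above target."""
    gap = pi - target
    scale = upside_penalty if gap > 0 else 1.0
    return scale * gap * gap


def expected_loss(components, target, upside_penalty,
                  lower=-20.0, upper=30.0, steps=50000):
    """Midpoint-rule integral of loss times density over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += asymmetric_loss(x, target, upside_penalty) * mixture_pdf(x, components)
    return total * width


# Hypothetical forecast distribution and a 2 percent target.
COMPONENTS = [(0.7, 2.0, 0.8), (0.3, 5.0, 1.5)]
TARGET = 2.0

risk_symmetric = expected_loss(COMPONENTS, TARGET, upside_penalty=1.0)
risk_asymmetric = expected_loss(COMPONENTS, TARGET, upside_penalty=2.0)
print(round(risk_symmetric, 3), round(risk_asymmetric, 3))
```

For the same distribution, penalizing above-target outcomes more heavily raises the risk statistic, which is the property the paragraph above describes.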

Key concepts

inflation at risk : the quantile-based or distribution-based characterization of future inflation uncertainty; extended in this paper from a single quantile to the complete forecast distribution and its risk decomposition by driver.

density regression : a regression model in which the conditional distribution of the outcome — not just its mean or a specific quantile — is the object of estimation; the paper uses a Gaussian mixture density regression to capture non-standard distributional shapes.

risk decomposition : the attribution of shifts in the full forecast distribution to individual predictor variables; the paper’s key tool for identifying which economic factors drive right-tail inflation risk in any period.

CB-preference-aligned risk measure : a summary statistic constructed by weighting probability mass over the forecast distribution by the central bank’s loss function; captures asymmetric preferences and goes beyond standard percentile measures.

Monetary and Macroprudential Policy and Welfare in an Estimated Four‐Agent New Keynesian Model

This paper introduces a four-agent estimated New Keynesian DSGE model (comprising banked simple households, underbanked simple households, firm owners, and bank owners) to examine agent-specific and social welfare effects of monetary and macroprudential policy, estimated on U.S. quarterly data (1985Q1–2016Q4) via Bayesian methods. The model features two layers of endogenous default probability (for borrowers and banks); nominal, real, and financial frictions; and trend inflation and stochastic growth. The optimal bank capital requirement ratio (CRR) is estimated at 12.6%, which is 2.1 percentage points above Basel III’s 10.5%; increasing the CRR up to approximately 12.2% raises welfare for all four agent types, though with smaller gains for credit-reliant simple households and firm owners. Countercyclical capital buffers benefit firm owners and bank owners, with smaller gains for simple households. Coordinated monetary and macroprudential policy yields higher social welfare than non-coordinated policies.

In depth

Q1. Why does the paper use four agent types instead of the usual borrower-saver distinction?

The standard borrower-saver split lumps together all interest-earning agents—including both simple deposit-holding households and wealthy bank owners—so that macroprudential policies that shift surplus from borrowers to savers appear to benefit the simple household and the banker equally; the four-agent framework separates these groups and allows for heterogeneous welfare effects. Population shares are calibrated using Compustat and the Survey of Consumer Finances (firm owners and bank owners as shareholders of non-financial and financial firms) and the National Survey of Unbanked and Underbanked Households (underbanked simple households with very limited access to banking services).

Q2. What is the optimal CRR and how does it compare to existing benchmarks?

The optimal social CRR is estimated at 12.6%, which is 2.1 percentage points higher than Basel III’s 10.5%, 4.6 percentage points higher than Basel II’s 8%, and 3.6 percentage points higher than the 9% optimal CRR of Mendicino et al. (2019), who use a borrower-saver welfare framework. Increasing the CRR up to approximately 12.2% improves welfare for all four agent types, though unequally: simple households and firm owners who rely on credit see smaller gains. Above 12.2%, a stricter CRR harms firm owners and simple households (tighter credit reduces activity), while bank owners continue to gain via a higher capital income share until the CRR exceeds 25.9%, above which even bank owners are harmed as loans fall dramatically.

Q3. How do countercyclical capital buffers and loan loss provisions affect welfare by agent type?

Countercyclical capital buffers support firm owners and bank owners with smaller gains for the two simple household types; countercyclical loan loss provisions improve social welfare only for specific shocks and benefit underbanked simple households and firm owners at the expense of bank owners and banked simple households. The asymmetry reflects the different income streams: bank owners’ income derives primarily from loan returns and capital gains on bank equity, while underbanked simple households are most sensitive to credit availability. Loan loss provisions affect the timing of income recognition and loss absorption, generating distributional trade-offs that differ from those of capital requirements.

Q4. What are the gains from coordinating monetary and macroprudential policy?

Coordinating monetary and macroprudential policy yields higher social welfare than assigning each policy to an independent authority targeting its own objective, demonstrating that the interaction between interest rate policy and bank capital regulation matters for welfare outcomes. Investment shocks (27.41% of GDP growth variance) and financial risk shocks (~20%) are quantitatively important in this interaction. The model’s rich friction structure means that optimal monetary policy must account for how macroprudential policy changes the credit supply environment, and vice versa; failing to coordinate creates inefficiencies that coordinated policy avoids.

Key concepts

four-agent model : the model’s typology distinguishing banked simple households, underbanked simple households, firm owners, and bank owners; enables agent-specific welfare analysis of macroprudential policy with heterogeneous income streams and credit access.

optimal capital requirement ratio (CRR) : the bank capital-to-assets ratio that maximizes social welfare; estimated at 12.6% in this model, 2.1 percentage points above Basel III’s current 10.5% requirement.

countercyclical capital buffer (CCyB) : a macroprudential tool requiring banks to hold additional capital during economic expansions to be released in downturns; shown here to benefit firm owners and bank owners with smaller gains for simple households.

dynamic loan loss provisions : a macroprudential tool requiring banks to build provisions against future expected losses during expansions; shown here to have welfare effects that depend on the source of the shock and to benefit different agent types than capital requirements.

Monetary Policy and Endogenous Financial Crises

This paper asks whether a central bank should deviate from strict inflation targeting (SIT) to promote financial stability, studying the question in a textbook New Keynesian model augmented with capital accumulation and microfounded endogenous credit-market crises. The model embeds two financial frictions — limited contract enforcement and asymmetric information about firm productivity — that together generate fragile credit markets in which, when productive firms’ marginal return on capital falls below a threshold, the credit market collapses (a “financial crisis”). The calibrated model matches the empirical regularity that economies spend roughly 8% of time in financial crises. The central finding is threefold: (1) monetary policy affects crisis probability both in the short run (via output and markups) and in the medium run (via capital accumulation dynamics); (2) a Taylor-type rule that responds to output fluctuations — rather than SIT — reduces crisis incidence and raises welfare, with TR93 (φ_y = 0.125) generating a 0.016% permanent consumption equivalent gain over SIT; (3) prolonged unexpected monetary easing followed by abrupt tightening is itself a mechanism that can trigger financial crises. These findings imply a genuine price-versus-financial-stability tradeoff and challenge the “divine coincidence” view that SIT is sufficient in the presence of financial frictions.

Layer 1: Overview

Boissay, Collard, Galí, and Manea build a New Keynesian model with capital accumulation and endogenous credit-market crises to study whether central banks should deviate from inflation targeting to promote financial stability. The model departs from the textbook three-equation NK framework in four ways: capital accumulation that allows persistent booms, firm heterogeneity in productivity that generates a credit market, financial frictions (limited enforcement and asymmetric information) that make the credit market fragile, and global (nonlinear) solution methods that can capture the boom-bust dynamics. A financial crisis — credit-market collapse — occurs when productive firms’ marginal return on capital falls below the minimum loan rate that unproductive firms require to willingly lend. The model is calibrated so that the economy spends 8% of time in crisis (consistent with cross-country evidence from Reinhart and Rogoff, Laeven and Valencia, and Baron et al.) and the additional parameter governing financial frictions (the proportion μ = 2.42% of unproductive firms) is chosen to match this target. Three main findings emerge: monetary policy operates through short-run aggregate demand channels and a medium-run capital accumulation channel; a Taylor-type rule that responds to output improves welfare over SIT, with TR93 raising permanent consumption by 0.016% relative to SIT; and discretionary loosening followed by abrupt tightening can itself generate crises.

In depth

Q1. How do financial crises arise in the model, and what is the triggering condition?

A financial crisis in the model is a collapse of the credit market to autarky: unproductive firms stop lending because the loan rate they can credibly demand falls below the return on holding idle capital. The fragility comes from a combination of limited contract enforcement (firms that borrow to purchase capital can abscond with sale proceeds) and asymmetric information about idiosyncratic productivity. Together, these frictions imply that productive firms cannot borrow beyond an incentive-compatible leverage cap, and that the minimum loan rate required to induce unproductive firms to lend is a positive threshold $\bar{r}^k = \mu/(1-\mu) - \delta$. A crisis occurs if and only if productive firms’ marginal return on capital $r_t^k$ falls below this threshold, which happens at the end of a protracted boom when the economy has accumulated excess capital, driving down marginal productivity. The average simulated crisis is triggered by a roughly three-standard-deviation negative TFP shock (around 1.5% below steady state) hitting an economy where the capital stock has been elevated by a long sequence of positive shocks. The same shock would not trigger a crisis at lower capital stocks: the capital overhang is a necessary precondition.
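
The capital-overhang logic can be illustrated with a minimal sketch. It is not the paper's model: it assumes a Cobb-Douglas marginal product of capital with labor normalized to one and no markup wedge, and the threshold, capital stocks, and capital share are hypothetical numbers picked only to make the mechanism visible. Only the size of the TFP shock (1.5% below steady state) comes from the text.

```python
def marginal_return(tfp: float, capital: float, alpha: float = 0.36) -> float:
    """Stylized marginal product of capital: alpha * A * K**(alpha - 1).

    ASSUMPTION: Cobb-Douglas technology with labor normalized to one and no
    markup wedge; the paper's full expression is not reproduced here.
    """
    return alpha * tfp * capital ** (alpha - 1.0)


def is_crisis(tfp: float, capital: float, threshold: float) -> bool:
    """Crisis iff the marginal return on capital falls below the threshold."""
    return marginal_return(tfp, capital) < threshold


# Hypothetical numbers chosen only for illustration.
THRESHOLD = 0.0292                  # stand-in for the minimum loan rate
SHOCKED_TFP = 0.985                 # TFP 1.5% below its steady-state value of 1.0
NORMAL_K, OVERHANG_K = 30.0, 50.0   # moderate vs. elevated capital stock

for capital in (NORMAL_K, OVERHANG_K):
    for tfp in (1.0, SHOCKED_TFP):
        crisis = is_crisis(tfp, capital, THRESHOLD)
        print(f"K={capital:4.0f}  A={tfp:.3f}  crisis={crisis}")
```

Under these illustrative numbers, the negative shock pushes the return below the threshold only when the capital stock is already elevated; the same shock at the lower capital stock leaves the credit market intact.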

Q2. Through what channels does monetary policy affect financial stability, and how do short-run and medium-run channels differ?

The paper identifies three channels: a Y-channel (output), an M-channel (markups), and a K-channel (capital accumulation), with the K-channel operating only in the medium run through expectations about the policy rule. In the short run, a rate hike that compresses output and raises markups reduces the marginal return on capital, pushing the economy closer to a crisis — a destabilizing short-run effect. In the medium run, however, a commitment to lean against output booms (high φ_y) slows capital accumulation during expansions through two mechanisms: (i) it reduces investors’ expected returns from expansion, dampening incentives to accumulate capital; and (ii) it provides households with implicit insurance against aggregate shocks, reducing precautionary savings. Because capital accumulation is slow, these medium-run effects only materialize over multiple years and require that the central bank pre-commit to the rule. Expectations of the rule thus shape the boom dynamics before any crisis.

Q3. What does the welfare comparison across Taylor rules reveal about the price-versus-financial-stability tradeoff?

Responding to output raises welfare in the presence of financial frictions, even though it reduces welfare in the frictionless benchmark, generating a genuine price-versus-financial-stability tradeoff. Under strict inflation targeting, the welfare loss relative to the first best is 0.11% in consumption equivalent variation, entirely attributable to financial crises (since SIT eliminates price distortions). Responding more aggressively to output (higher φ_y) reduces crisis incidence from 9.85% of time (under SIT) to as low as 0.45% (under φ_y = 0.75), but raises inflation volatility. The welfare gain is non-monotone in φ_y: under the baseline φ_π = 1.5, welfare is highest around φ_y ≈ 0.5–0.6, and declines for higher φ_y as markup volatility (M-channel) more than offsets the financial stability gain. TR93 (φ_y = 0.125) already delivers 0.016% higher permanent consumption than SIT.
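
To make the role of φ_y concrete, the sketch below evaluates a linear Taylor-type rule at the coefficient values quoted above. The linear form, with inflation and output expressed as deviations from their targets, is an assumption made for illustration; the paper's exact rule specification is not reproduced here, and the function name is hypothetical.

```python
def taylor_rate(inflation: float, output_gap: float,
                phi_pi: float = 1.5, phi_y: float = 0.125) -> float:
    """Policy-rate deviation implied by a linear Taylor-type rule.

    ASSUMPTION: rate = phi_pi * inflation + phi_y * output_gap, with all
    variables measured as deviations from target (in percentage points).
    Defaults correspond to the TR93 coefficients quoted in the text.
    """
    return phi_pi * inflation + phi_y * output_gap


# A boom with output 2 points above target and inflation on target.
boom = {"inflation": 0.0, "output_gap": 2.0}
for label, phi_y in [("TR93 (phi_y = 0.125)", 0.125), ("phi_y = 0.75", 0.75)]:
    print(f"{label}: rate deviation = {taylor_rate(phi_y=phi_y, **boom):.3f}")
```

The stronger output response tightens policy six times as much in the same boom, which is the sense in which a higher φ_y leans against capital accumulation at the cost of more variable inflation and markups.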

Q4. What is the role of monetary policy discretion in generating financial crises?

The model shows that sustained discretionary loosening followed by abrupt tightening can itself trigger a crisis, formalizing the “rates too low for too long” narrative of the 2007-08 Global Financial Crisis. Using only monetary policy shocks (either AR(1) with ρ = 0.5, σ = 0.25% or i.i.d.) as the source of aggregate uncertainty, the average simulated crisis follows a long period of unexpectedly accommodative policy that feeds an investment boom, with the crisis triggered by three consecutive unexpected rate hikes (persistent shock case) or a single 60-basis-point jolt (i.i.d. case) at the end of the boom. This is consistent with empirical evidence (Schularick, Ter Steege, and Ward 2021) that unanticipated rate hikes at the end of a boom are more likely to trigger crises than prevent them.
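
The persistent-shock case can be sketched as a simple AR(1) simulation using the ρ = 0.5 and σ = 0.25% quoted above. This only generates the exogenous policy-shock process; it does not solve the model or detect crises, and the Gaussian innovations, function names, and seed are assumptions made for illustration.

```python
import math
import random


def simulate_ar1(rho: float, sigma: float, periods: int, seed: int = 0) -> list[float]:
    """Simulate e_t = rho * e_{t-1} + u_t with u_t ~ N(0, sigma^2), e_0 = 0.

    ASSUMPTION: Gaussian innovations; sigma is in percentage points.
    """
    rng = random.Random(seed)
    shock, path = 0.0, []
    for _ in range(periods):
        shock = rho * shock + rng.gauss(0.0, sigma)
        path.append(shock)
    return path


def unconditional_std(rho: float, sigma: float) -> float:
    """Stationary standard deviation of an AR(1): sigma / sqrt(1 - rho^2)."""
    return sigma / math.sqrt(1.0 - rho ** 2)


path = simulate_ar1(rho=0.5, sigma=0.25, periods=200)
print(f"stationary std of the policy shock: {unconditional_std(0.5, 0.25):.4f} pp")
print(f"longest run of negative (easing) shocks: "
      f"{max(len(run) for run in ''.join('-' if e < 0 else '+' for e in path).split('+'))}")
```

Setting `rho=0.0` gives the i.i.d. case, in which the stationary standard deviation equals `sigma`.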

Q5. How much additional welfare gain is available from a “backstop” commitment that forestalls crises entirely?

A nonlinear backstop rule, under which the central bank deviates from its normal rule just enough to prevent a crisis whenever one would otherwise occur, nearly eliminates the welfare cost of financial crises while requiring only modest policy deviations. Under SIT, the backstop improves welfare by 0.11% in consumption equivalent variation (the full cost of crises), leaving a residual welfare loss of only 0.0013% relative to the first best. The backstop requires rate cuts of on average 20 basis points below TR93, or tolerance of 0.6 percentage points of extra inflation above the SIT target, in the periods when a crisis would otherwise emerge. The tradeoff is that backstopping increases the frequency with which the central bank must intervene, because the expectation that the central bank will step in can increase the financial sector’s risk-taking and hence its fragility.
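
The "just enough" logic of the backstop can be sketched as a piecewise rule. The example assumes, purely for illustration, that the marginal return on capital falls linearly in the policy rate; the function name, the linear mapping, and all numbers are hypothetical and are not taken from the paper.

```python
def backstop_rate(normal_rate: float, threshold: float,
                  return_at_zero: float, sensitivity: float) -> float:
    """Policy rate under a stylized backstop rule.

    ASSUMPTION: the marginal return on capital is linear in the policy rate,
    r_k(i) = return_at_zero - sensitivity * i, with sensitivity > 0.
    The rule follows normal_rate unless that would push r_k below the
    threshold, in which case it sets the rate at which r_k equals the threshold.
    """
    r_k_normal = return_at_zero - sensitivity * normal_rate
    if r_k_normal >= threshold:
        return normal_rate  # no crisis looming: follow the normal rule
    return (return_at_zero - threshold) / sensitivity  # cut just enough


# Hypothetical numbers: a calm period and a period where the rule binds.
print(backstop_rate(normal_rate=1.0, threshold=2.0, return_at_zero=3.5, sensitivity=1.0))
print(backstop_rate(normal_rate=2.0, threshold=2.0, return_at_zero=3.5, sensitivity=1.0))
```

In the second call the normal rule would leave the return below the threshold, so the sketch lowers the rate only to the point where the crisis condition no longer holds.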

Q6. How does the paper’s approach to microfounding crises compare to reduced-form alternatives?

Unlike Woodford (2012) and Gourio, Kashyap, and Sim (2018), who use reduced-form functions linking credit or leverage gaps to crisis probability, this paper derives crisis probability and severity endogenously from first principles, with implications for the policy prescriptions. Because crises and their depth are both endogenous to policy, the model can determine not only how policy affects the probability of a crisis but also how it affects the size of the output loss conditional on a crisis. This distinction matters: the model shows that not all credit booms are equally dangerous — a boom accompanied by genuine productivity gains carries lower crisis risk than an equivalent capital accumulation driven by precautionary saving externalities. The endogenous crisis mechanism also implies that some forms of leaning that superficially appear to reduce crisis probability may actually increase it by raising markup volatility, an effect absent from reduced-form models.

Key Concepts

divine coincidence : the standard New Keynesian result that strict inflation targeting (SIT) simultaneously eliminates output gap fluctuations and is welfare-optimal in the absence of financial frictions; the paper shows this coincidence breaks down when the credit market is fragile, because SIT does not internalize the externalities driving capital overhang and crisis risk.

financial crisis (in the model) : the autarkic equilibrium of the credit market, in which productive firms’ marginal return on capital falls below the minimum loan rate required for unproductive firms to willingly lend; characterized by credit-market collapse, capital misallocation (unproductive firms retain idle capital), severe output loss, and inflationary pressure.

K-channel of monetary policy on financial stability : the medium-run mechanism by which a commitment to respond strongly to output fluctuations dampens capital accumulation during booms, reducing the likelihood of the excess capital overhang that triggers crises; operates through expectations and requires multi-year lead times, distinguishing it from the short-run output (Y) and markup (M) channels.

savings glut externality : the tendency of households to over-accumulate capital relative to the socially efficient level in anticipation of a crisis, because individual households do not internalize the aggregate effect of their precautionary saving on the economy’s distance from the credit-market collapse threshold; identified by Boissay, Collard, and Smets (2016) and present in this model as a driver of endogenous boom-bust dynamics.

backstop rule : a nonlinear monetary policy rule in which the central bank follows a standard Taylor or SIT rule in normal times but commits to deviating just enough from that rule to forestall a financial crisis whenever one would otherwise emerge; shown to nearly eliminate the welfare cost of crises at the cost of modest and infrequent policy deviations, with the side effect of increasing the frequency of needed interventions.