Published | Macro Paper Warehouse

Adverse Selection and Small Business Finances

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks why small firms hold large quantities of liquid assets — cash and cash equivalents that earn low or negative real returns — even when external credit is available. The conventional answer is a precautionary motive: liquidity buffers the risk of being shut out of credit markets. Liang proposes a second, complementary motive: a signaling motive, whereby firms hold liquid assets specifically to pledge as collateral and credibly signal their repayment ability to lenders, thereby obtaining better loan terms. The empirical backdrop is striking: about 28% of small business assets are cash and cash equivalents (Kauffman Firm Survey 2011 wave); about 7% of commercial business loans are secured by liquid collateral (SSBF 2003); and 43% of small firms sought a commercial business loan in 2020.

The theoretical framework embeds directed search (Guerrieri, Shimer, and Wright 2010, hereafter GSW) and asymmetric information inside a Lagos-Wright general equilibrium monetary model. There are two types of entrepreneurs — low types (success probability δ_L) and high types (δ_H > δ_L) — who privately know their own type. Bankers post loan contracts specifying a down payment d, loan amount ℓ, and repayment R, and then entrepreneurs direct their search to contracts. Investment opportunities arrive stochastically. Entrepreneurs who fail to match with a banker self-finance from their liquid holdings; this endogenous outside option gives liquidity value and generates a precautionary demand for it. The opportunity cost of holding liquidity equals the policy rate i (equivalently, the inflation rate π).

The main equilibrium characterization (Proposition 2) shows that as the policy rate rises, the economy passes through four regimes: (1) no participation in the credit market; (2) only high types borrow, no screening needed; (3) both types borrow, bankers screen using down payment only; (4) both types borrow, bankers screen using both down payment and loan approval rate (market tightness). The key distortion is in the extensive margin: under adverse selection with binding incentive constraints, high-type borrowers must pledge more liquid assets (d_H = z_H > z*_H) and face a tighter loan market (θ_H < θ*_H) than under complete information, but the loan size is undistorted (ℓ_H = ℓ*_H, Proposition 3). Low-type borrowers’ allocations are never distorted by adverse selection.

The interest rate pass-through from the policy rate to the real lending rate on high-type loans can be negative (Proposition, Section 4 and Figure 5). With an urn-ball matching function, γ_H (the real lending rate for high types) falls in i when screening is active, even as the aggregate lending rate rises monotonically. With a Cobb-Douglas matching function, lending rates always increase in i. Whether negative pass-through obtains therefore depends on the matching technology.

Screening intensity — the degree to which high-type borrowers must hold excess liquidity and accept lower loan approval odds — is non-monotone in the low types’ success probability δ_L (Proposition 4). When δ_L is very small or very close to δ_H, a small down payment suffices. Distortions are largest for intermediate values of δ_L, where the low types have large incentives to misreport but the cost of mimicry is neither trivially high nor trivially low.

Without the self-finance channel — the endogenous outside option — both the precautionary and signaling motives vanish entirely, and liquid assets become redundant (Proposition 5). Bankers then use only market tightness to screen, which is less costly than using both down payment and approval rate. This result cleanly isolates why self-finance is the structural ingredient making liquidity essential.

On policy, the competitive equilibrium is generically constrained inefficient when both screening tools are used, because bankers in one submarket do not internalize the externality they impose on the other submarket through the binding incentive constraint. A utilitarian social planner who faces the same information and search frictions can restore the complete information allocation by taxing high types and subsidizing low types, under a sufficient condition (Proposition 6): the high types’ surplus from borrowing relative to self-finance exceeds the low types’ net gain from misreporting, scaled by the population ratio and inverse success probability ratio. This condition is more likely to hold when i is large, when there are few low types (small ν_L), or when the low types’ net gain from misreporting is small. Conversely (Proposition 7), the competitive equilibrium is constrained efficient — and no transfers are needed — if δ_L/δ_H + ν_H/ν_L < 1, which obtains when the low types are very risky (low δ_L) or very numerous (high ν_L), making subsidization costly.
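The Proposition 7 inequality can be checked directly for given parameters. The following is a minimal sketch: the function name and the illustrative parameter values are assumptions made here, not taken from the paper, and the function only tests the stated sufficient condition.

```python
def prop7_condition_holds(delta_L, delta_H, nu_L, nu_H):
    """Check the sufficient condition quoted above: delta_L/delta_H + nu_H/nu_L < 1.

    delta_L, delta_H: success probabilities of low and high types.
    nu_L, nu_H: population measures of low and high types.
    """
    if not (0.0 < delta_L < delta_H <= 1.0):
        raise ValueError("expected 0 < delta_L < delta_H <= 1")
    if nu_L <= 0.0 or nu_H < 0.0:
        raise ValueError("expected nu_L > 0 and nu_H >= 0")
    return delta_L / delta_H + nu_H / nu_L < 1.0


# Hypothetical economies, chosen only to illustrate the comparative statics.
risky_and_numerous = prop7_condition_holds(0.2, 0.8, nu_L=0.7, nu_H=0.3)
safe_and_few = prop7_condition_holds(0.6, 0.8, nu_L=0.3, nu_H=0.7)
print(risky_and_numerous, safe_and_few)
```

With risky, numerous low types the condition holds; with safer, scarcer low types it fails, in line with the comparative statics described above.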

Empirically, Liang estimates a dynamic panel model of liquidity-to-assets ratios using the Kauffman Firm Survey (KFS), a longitudinal survey of 4,928 new U.S. firms from 2004-2011 (660 in the balanced panel after cleaning). Using a first-difference transformation with Anderson-Hsiao IV (instrumenting lagged differenced liquidity-to-assets with its second lag and differenced liquid collateral with its own lag), the preferred estimate (column 5) shows that firms holding liquid collateral to obtain loans hold on average 19.83% more liquid assets as a share of total assets before the loan application than do comparable firms that pledge illiquid or no collateral. This is treated as evidence for the signaling motive. The precautionary motive is confirmed: firms reporting credit difficulties hold an additional 9.93% of total assets in liquid form, and a one-percentage-point increase in R&D-to-assets (proxy for growth opportunities) is associated with 0.09% higher liquidity-to-assets. The transaction motive is confirmed: a one-percentage-point increase in total assets is associated with 0.09% lower liquidity-to-assets. The tax and agency motives are not statistically significant for small firms.

A moral hazard extension (Appendix E) relaxes the assumption that banknotes can only be used to purchase capital. When entrepreneurs can divert loan proceeds to consumption (at cost), a third screening tool is added — loan size — and equilibria are more distorted and more likely to be distorted (Propositions 8-10). The threshold i above which two-tool screening kicks in falls, and loan amounts are reduced below the complete information optimum, which does not occur in the baseline.

In depth

Q1. What is the paper’s core identification challenge in the empirical section, and how does it address it?

The main challenge is that the decision to pledge liquid collateral is endogenous to unobserved firm characteristics that also affect liquidity holdings. OLS suffers from omitted variable bias (the lagged liquidity-to-assets ratio is correlated with the error). Fixed effects corrects for firm heterogeneity but introduces Nickell (1981) downward bias in the lagged dependent variable. The first-difference transformation removes fixed effects but creates a mechanical correlation between the differenced lagged liquidity variable and the differenced error. The Anderson-Hsiao IV strategy instruments the differenced lagged liquidity-to-assets with its second lag in levels (column 4) and additionally instruments differenced future liquid collateral with its own lagged difference (column 5), addressing the endogeneity of the collateral-pledging decision. The Cragg-Donald Wald F-statistic is 62.056, exceeding the Stock-Yogo weak instrument threshold of 7.03, supporting instrument relevance.
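The logic of the Anderson-Hsiao step can be illustrated on simulated data. The sketch below is not the paper's specification: it uses a bare AR(1) panel with firm fixed effects and no covariates, and every name and parameter value is an assumption made for illustration. It shows that OLS on first differences is biased because the differenced lag is correlated with the differenced error, while instrumenting the differenced lag with the second lag in levels recovers the persistence parameter.

```python
import random


def simulate_panel(n_firms, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + eps_it (illustrative model)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        alpha = rng.gauss(0.0, 1.0)      # unobserved firm fixed effect
        y = alpha / (1.0 - rho)          # start at the firm's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def fd_ols_and_anderson_hsiao(panel):
    """Return (first-difference OLS, Anderson-Hsiao IV) estimates of rho."""
    s_xx = s_xy = s_zx = s_zy = 0.0
    for y in panel:
        for t in range(2, len(y)):
            d_y = y[t] - y[t - 1]        # differenced outcome
            d_lag = y[t - 1] - y[t - 2]  # differenced lag (endogenous)
            z = y[t - 2]                 # second lag in levels (instrument)
            s_xx += d_lag * d_lag
            s_xy += d_lag * d_y
            s_zx += z * d_lag
            s_zy += z * d_y
    return s_xy / s_xx, s_zy / s_zx


TRUE_RHO = 0.3
panel = simulate_panel(n_firms=5000, n_periods=8, rho=TRUE_RHO)
fd_ols, iv = fd_ols_and_anderson_hsiao(panel)
print(f"first-difference OLS: {fd_ols:.3f}, Anderson-Hsiao IV: {iv:.3f}")
```

In this simulated design the first-difference OLS estimate is pushed well below the true value, while the IV estimate lands close to it.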

Q2. What is the signaling mechanism in precise terms, and how does it differ from Leland-Pyle (1977)?

In the model, high-type entrepreneurs hold excess liquid assets (beyond what precaution alone requires) and pledge them as down payments on bank loans. Because the precautionary marginal benefit of holding liquid assets is higher for high types (they have better investment projects and thus more to gain from self-financing), the cost of holding the additional liquidity required by a high-type loan contract is lower for high types than for low types. This makes the down-payment requirement a credible separating device: low types will not mimic high types by holding the required level of liquidity because the cost of doing so outweighs the savings on repayment. The marginal benefit of liquidity thus includes both a precautionary term (gain when unmatched) and a signaling term (relaxes the incentive compatibility constraint on low types). Leland-Pyle (1977) also features signaling through self-finance, but obtains a continuum of signaling equilibria. The present model has a unique separating equilibrium because directed search imposes bilateral matching and a capacity constraint on bankers, eliminating the equilibrium multiplicity.

Q3. How are the four equilibrium regimes generated and what determines which one prevails?

The regime depends on the opportunity cost of holding liquidity i (equivalently, the policy rate) relative to three cutoffs i < i-bar < i-double-bar. At low i, both types prefer self-finance (high net return on liquidity, so the gain from a bank loan is small). As i rises, high types enter the credit market first because they have a larger surplus from obtaining a bank loan; low types follow at a higher cutoff. Once both types are in the market, the incentive compatibility constraint for low types (IC-LH) may or may not bind. When IC-LH is slack, only a small down payment is needed, and the allocation is undistorted (regime 3). When IC-LH binds — at yet higher i because holding large amounts of liquidity becomes even more attractive to misreporting low types as the precautionary value of liquidity falls — bankers must use both down payment and market tightness, distorting the allocation (regime 4). The policy rate thus operates on the outside option, reshaping the credit market structure endogenously.

Q4. Why is the loan size (intensive margin) undistorted even when the extensive margin (market tightness and down payment) is distorted?

Once bankers successfully screen out low types using down payment and market tightness, they have no further incentive to distort the loan amount issued upon matching. The first-order condition for loan size in the high-type contract remains δ_H f’(ℓ_H) = 1 (Equation 8), which is the complete information optimum. The logic is that down payment and market tightness are the instruments that affect the incentive compatibility constraint, and once these are set at levels that prevent mimicry, the loan size can be set efficiently to maximize surplus from the match. This is a standard feature of competitive screening equilibria in the GSW framework and contrasts with the moral hazard extension, where the loan size is distorted because diversion of funds is possible.
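As a worked instance of the first-order condition δ_H f’(ℓ_H) = 1, the sketch below assumes a power production function f(ℓ) = ℓ^a with 0 < a < 1. That functional form and the numbers are assumptions made here for illustration; the summary does not state the paper's f.

```python
def efficient_loan_size(delta_H, a):
    """Solve delta_H * f'(l) = 1 for the assumed f(l) = l**a, 0 < a < 1.

    f'(l) = a * l**(a - 1), so l = (a * delta_H) ** (1 / (1 - a)).
    """
    if not (0.0 < a < 1.0):
        raise ValueError("expected 0 < a < 1")
    return (a * delta_H) ** (1.0 / (1.0 - a))


def foc_residual(loan, delta_H, a):
    """delta_H * f'(loan) - 1; zero at the complete-information optimum."""
    return delta_H * a * loan ** (a - 1.0) - 1.0


loan_H = efficient_loan_size(delta_H=0.8, a=0.5)
print(loan_H, foc_residual(loan_H, 0.8, 0.5))
```

Because neither the down payment nor market tightness enters this condition, the same loan size solves it whether or not the incentive constraint binds, which is the sense in which the intensive margin is undistorted.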

Q5. What is the key externality that makes the competitive equilibrium constrained inefficient, and how does the planner correct it?

Bankers in the high-type submarket post contracts taking the payoff of low-type entrepreneurs (in the low-type submarket) as given. But the low-type payoff enters their incentive compatibility constraint (IC-LH), which governs how much down payment and rationing they must impose. When the planner raises the low-type payoff (by subsidizing low types), the IC-LH constraint relaxes: the low types are already better off and have less incentive to mimic. This allows bankers to offer high types smaller down payments and more loan supply, increasing high-type welfare. If the benefit to high types (lower screening cost) exceeds the tax cost, a Pareto improvement is possible. The planner implements this through type-contingent transfers: taxing bankers who serve high types, subsidizing bankers who serve low types. The planner can internalize the cross-submarket externality because it controls both submarkets simultaneously, whereas competitive bankers each maximize their own submarket’s contracts taking the other as given.

Q6. What is the non-monotonicity of screening intensity in δ_L, and what is the intuition?

Proposition 4 shows that the equilibrium high-type liquidity holding z_H and market tightness θ_H are non-monotone in δ_L (the low type success probability), with a cutoff δ-bar_L. For low δ_L: either the low types are not in the loan market at all, or they would not want to mimic the high types even if the down payment is small, because the precautionary value of holding so much liquidity outside the loan market is very low for low types with poor prospects. As δ_L rises (low types become moderately good), they want to mimic high types more aggressively (higher repayment savings) while the cost of mimicry remains moderate, so down payment and rationing must both be higher. At very high δ_L (low types nearly as good as high types), the types are similar and a small amount of screening suffices again. Distortions peak at intermediate δ_L where the benefit-cost ratio of misreporting for low types is maximized.

Q7. How does the moral hazard extension change the results compared with the baseline?

In the baseline, banknotes can only purchase capital (observable investment). In the extension (Appendix E), banknotes can also buy consumption goods at unit cost C(χ), introducing dual deviation: a low-type entrepreneur who misreports can both obtain a high-type loan and divert some of the proceeds to consumption. This raises the low types’ payoff from misreporting (U^mh_LH > U_LH), tightening the incentive constraint. As a result: (i) a third screening tool is deployed — bankers reduce the loan size below the complete information optimum (ℓ^mh_H < ℓ*_H); (ii) the threshold i above which multi-tool screening kicks in is lower (i-double-bar^mh ≤ i-double-bar), so distorted equilibria occur over a larger parameter space; (iii) in the distorted region, allocations are more distorted along all three margins (loan size, liquidity, market tightness). When χ ≤ δ_L/δ_H (the cost of diverting banknotes to consumption is high enough that low types prefer to invest all proceeds), the extension coincides exactly with the baseline.

Q8. How does this paper relate to Guerrieri, Shimer, and Wright (2010) and what does it add?

GSW show that directed search with adverse selection generates a unique separating equilibrium in which market tightness (loan approval rate) is the dominant screening device, while down payment (liquidity) is not used when the self-finance option is absent. In GSW’s setup applied to credit markets, liquid assets are redundant — without an endogenous outside option, there is no precautionary demand and no signaling demand for liquidity (Proposition 5 of this paper). Liang’s contribution is to introduce the self-finance channel as an endogenous outside option to the GSW framework. This makes liquidity valuable both outside the credit market (precautionary motive) and inside it (signaling/screening device). The result is that both down payment and market tightness are used as screening instruments in the fully distorted regime, whereas GSW uses only market tightness. This also changes the constrained efficiency analysis: Liang shows that the planner can fully undo adverse selection under certain conditions, a result that does not arise in the vanilla GSW model.

Q9. What robustness and consistency checks are run in the empirical section?

The empirical section runs OLS (column 1), one-way fixed effects (column 2), first-difference transformation OLS (column 3), Anderson-Hsiao IV with one instrument (column 4), and Anderson-Hsiao IV with two instruments (column 5, the preferred specification). The consistency of the lagged liquidity estimator is checked against the Nickell bounds: Bond (2002) recommends the consistent estimate should lie between the OLS and FE estimates (0.4920 and -0.1833); the preferred IV estimate (0.2766) satisfies this. Instrument strength is verified with the Cragg-Donald Wald F-statistic (62.056 vs. threshold 7.03). The paper acknowledges that the liquid collateral coefficient may be biased in either direction: upward if firms that plan to pledge liquid collateral but fail to obtain loans are misclassified as non-signalers, or downward if ineligible firms (with insufficient liquid assets to pledge) are misclassified as non-signalers. The direction of bias is ambiguous, which limits the paper’s ability to bound the true signaling motive magnitude.
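The two diagnostic checks reduce to simple comparisons of the reported numbers. A minimal sketch, using only the figures quoted above (variable names are chosen here):

```python
# Coefficients on lagged liquidity-to-assets, as quoted in the text.
ols_estimate = 0.4920
fe_estimate = -0.1833
iv_estimate = 0.2766

# Bond (2002) check: a consistent estimate should lie between FE and OLS.
lower, upper = sorted((fe_estimate, ols_estimate))
within_bounds = lower < iv_estimate < upper

# Weak-instrument check: Cragg-Donald Wald F against the quoted threshold.
cragg_donald_f = 62.056
stock_yogo_threshold = 7.03
instruments_strong = cragg_donald_f > stock_yogo_threshold

print(within_bounds, instruments_strong)
```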

Q10. What are the policy implications and their scope conditions?

First, the paper recommends cross-subsidization — taxing high-type borrowers and subsidizing low-type borrowers — to restore the complete information allocation when the equilibrium is distorted. This is implementable through type-contingent tax policies on bank loans. The scope condition (Proposition 6) is that the high types’ net surplus from borrowing must exceed the low types’ scaled gain from misreporting (Equation 11); this is more likely to hold when i is large (high policy rate), ν_L is small (few low types), or δ_L/δ_H is very small or very close to 1 (extreme types). Second, and more restrictively, if δ_L/δ_H + ν_H/ν_L < 1 (low types are very risky or very numerous), the competitive equilibrium is already constrained efficient and no transfers are needed. Third, on monetary policy: a rise in the policy rate can trigger a transition from an undistorted to a distorted equilibrium, causing welfare to fall. The paper interprets this as a caution against using high policy rates when credit market adverse selection is a concern. The paper also connects to loan guarantee programs (analogous to low-type subsidies), citing Chilean evidence (Cowan et al. 2015) showing that guarantees increase both guaranteed and non-guaranteed credit supply, consistent with the model’s cross-submarket externality mechanism.

Q11. What are the main data limitations acknowledged in the empirical analysis?

The KFS records the type of debt collateral only in the last three years of the survey (2009-2011), severely limiting the time dimension for liquid collateral analysis. This prevents the use of GMM estimators (Arellano-Bond 1991) that require different lag instruments across periods. The KFS does not record ex post loan outcomes (interest rates, default rates), so the paper cannot directly test the model’s prediction that loans with liquid collateral carry lower interest rates and lower default rates (unlike Berger et al. 2016 using Bolivian data). Loan application outcomes are also not available, preventing a sample restriction to successful applicants, which would resolve one direction of bias in the signaling motive estimator. The liquid collateral variable encompasses all debt types (business loans, credit cards, lines of credit), not only commercial bank loans, which is the model’s focus.

Key Concepts

Signaling motive for liquidity: In the paper’s sense: small firms hold liquid assets specifically to satisfy bank down payment requirements, thereby credibly signaling their investment quality (high success probability) to lenders who cannot observe borrower type. This is distinct from the textbook corporate finance definition of signaling; here the signal operates through costly liquid collateral pledged inside the credit contract, not through equity stakes or dividends.

Self-finance channel: In the paper’s sense: the outside option to bank borrowing, in which an entrepreneur uses accumulated liquid holdings to directly purchase capital and invest when she either fails to match with a banker or prefers not to. The channel is endogenous — its value depends on the entrepreneur’s liquidity holdings z and investment success probability δ_j — and is the structural ingredient that makes liquidity valuable both inside and outside the credit market.

Market tightness (θ) as a screening device: In the paper’s sense: bankers deliberately make high-type loan contracts scarce (low θ_H, i.e., few bankers per entrepreneur in the high-type submarket), reducing the loan approval probability µ(θ_H). Because low types have a lower surplus from obtaining a high-type loan than high types do, they are disproportionately discouraged by a low approval probability. Market tightness is the extensive-margin screening instrument in the GSW framework; this paper adds down payment as a second instrument.

Down payment (d) as inside collateral: In the paper’s sense: liquid assets pledged at the time of loan application, paid from the entrepreneur’s own liquid holdings z. Called ‘inside collateral’ because the pledged assets (liquidity) are used in financing the project, as opposed to ‘outside collateral’ (equipment, inventory) not used in the financed project. The down payment is the second screening instrument, used alongside market tightness; high types pledge d_H = z_H, their full liquid holdings.

Constrained efficiency with adverse selection: In the paper’s sense: the best allocation achievable by a social planner who faces the same information asymmetry (types are private) and the same search frictions as agents, and who maximizes a welfare-weighted sum of entrepreneur payoffs subject to incentive compatibility, participation, and budget balance constraints. The paper shows the competitive equilibrium may fail constrained efficiency due to a cross-submarket externality not internalized by individual bankers.

Dual deviation (moral hazard extension): In the paper’s sense (Appendix E): when loan proceeds (banknotes) can be used to purchase consumption goods as well as capital, a low-type entrepreneur who misreports her type faces two deviation margins — misreporting her type (adverse selection) and diverting loan proceeds to consumption rather than investment (moral hazard). Dual deviation raises the low types’ payoff from mimicry and forces bankers to add loan size as a third screening tool, at the cost of an inefficiently small loan.

Opportunity cost of liquidity (i) and regime transitions: In the paper’s sense: i = 1/(β(1+r_z)) − 1, the per-period cost of holding one unit of liquid assets, which equals the inflation rate π in steady state. As i increases, holding liquidity becomes costlier, which lowers the value of the self-finance outside option and also changes the low types’ incentive to mimic high types, triggering discrete transitions between four equilibrium regimes, from no credit market participation through increasingly distorted screening configurations.
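The definition of i can be evaluated directly. In the sketch below, β is read as the discount factor and r_z as the real return on liquid assets; that reading and the numbers are assumptions made here, since the summary gives only the formula.

```python
def opportunity_cost_of_liquidity(beta, r_z):
    """i = 1 / (beta * (1 + r_z)) - 1, as defined in the text."""
    if beta <= 0.0 or r_z <= -1.0:
        raise ValueError("expected beta > 0 and r_z > -1")
    return 1.0 / (beta * (1.0 + r_z)) - 1.0


# Hypothetical values: liquid assets lose 2% per period in real terms.
i_example = opportunity_cost_of_liquidity(beta=0.96, r_z=-0.02)

# If liquidity earned exactly the rate of time preference, holding it is free.
i_zero = opportunity_cost_of_liquidity(beta=0.96, r_z=1.0 / 0.96 - 1.0)
print(round(i_example, 4), round(i_zero, 12))
```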

An Analytical Model of Behavior and Policy in an Epidemic

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper builds a tractable, fully analytical version of the workhorse macro-epidemiology (“econ-epi”) model and uses it to characterize how susceptible individuals behave during a deadly epidemic, how a social planner would have them behave, and the externality that separates the two. The motivation is that prior macro-SIR results came almost entirely from numerical simulation; a closed-form treatment can expose general insights those simulations missed and provide a transparent benchmark for any future epidemic. The model augments the standard Kermack-McKendrick SIR system (susceptible S, infected I, recovered R, deceased D, with transmission rate β, recovery rate γr, death rate γd, and γ := γr + γd) with forward-looking agents who choose an activity level λ ∈ [0,1] that scales transmission via β = βa·λ + βo. The single key modeling departure is LINEAR (rather than convex) costs of mitigation, microfounded by indivisible activity choices in the spirit of Rogerson (1988); this makes the optimal control bang-bang or singular and yields closed-form solutions. Three constants organize the analysis: the herd immunity threshold S̄ := γ/β, the basic reproduction number R0 := 1/S̄, and the infection fatality rate IFR := γd/γ. A central composite statistic is the cost-benefit ratio of mitigation κ := (uW − uL)/(βa·IFR·VSL), where VSL := uW/ρ is the value of statistical life in utility terms.

Main results. (1) Decentralized equilibrium (Proposition 1): there is no mitigation at the very start and the very end of the epidemic; mitigation occurs only over an interval [t0, t1). Susceptibles begin mitigating just below full susceptibility, the infection rate peaks exactly at t0 (when precautions are greatest), and from then on the effective reproduction number sits slightly below one, producing a gently declining infection path, a pattern the author notes is broadly consistent with first-wave Covid-19 data. The equilibrium infection trajectory is approximated by the simple ray I(t) ≈ (S(t)/S̄)·κ, and the equilibrium steady-state susceptibility is S∞ ≈ S̄ − S̄·√(2κR0). A higher κ and lower S̄ both reduce mitigation and raise infections (a “fatalism effect”). (2) Socially optimal behavior (Propositions 2-3): optimal policy is bang-bang (λ* ∈ {0,1}), with no mitigation at start and end and full mitigation in a single intermediate interval. The planner “holds fire,” lets infections climb high, then imposes maximal restrictions late, driving the system quickly to herd immunity. The optimal long-run susceptibility is S∞* ≈ S̄ − S̄·2κR0/(κR0 − 1)². (3) The externality: contrary to the conventional view, susceptibles’ privately optimal behavior is EXCESSIVELY cautious (the equilibrium infection rate lies below the optimal infection rate for any S above herd immunity), yet cumulative deaths are HIGHER in equilibrium than under the planner. Mitigation by susceptibles mostly substitutes infection risk intertemporally (“flattening the curve also makes it fatter”); beyond eliminating epidemic overshoot it cannot prevent the inevitable share 1 − S̄ from being infected. The planner’s late-strong-short lockdown comes close to implementing a lottery that randomly selects who gets sick.

Implications. Because the externality runs in the opposite direction to standard intuition, optimal policy can call for the government to INCREASE interaction (the paper cites the UK’s 2020 “Eat Out To Help Out” subsidy as an analogue). Results are framed as technical/foundational insights, not direct prescriptions: the benchmark abstracts from reinfection, variants, vaccines/cures, healthcare capacity limits, and endogenous IFR, all of which can shift specific recommendations while leaving the underlying forces intact.
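The three organizing constants follow mechanically from the epidemiological rates. A minimal sketch, assuming that the β in S̄ := γ/β is the full-activity transmission rate βa + βo (λ = 1) and using illustrative parameter values that are not taken from the paper:

```python
def epidemic_constants(beta_a, beta_o, gamma_r, gamma_d):
    """Return the herd immunity threshold, R0 and IFR defined in the text."""
    beta = beta_a + beta_o        # assumption: transmission at full activity
    gamma = gamma_r + gamma_d     # total exit rate from the infected state
    s_bar = gamma / beta          # herd immunity threshold
    return {"S_bar": s_bar, "R0": 1.0 / s_bar, "IFR": gamma_d / gamma}


# Hypothetical rates, chosen only so the arithmetic is easy to follow.
constants = epidemic_constants(beta_a=0.4, beta_o=0.1,
                               gamma_r=0.198, gamma_d=0.002)
print(constants)
```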

In depth

Q1. What is the ‘identification’ or solution strategy, and what makes the analytical characterization possible?

This is a theory paper, so the relevant strategy is solving the dynamic optimization analytically rather than empirically. The enabling assumption is LINEAR costs of mitigation (instantaneous utility u = λ·uW + (1−λ)·uL), microfounded by indivisible activity choices as in Rogerson (1988), where λ is the probability of being active in a mixed-strategy equilibrium. Linearity makes the current-value Hamiltonian linear in the control λ, so the optimal control is bang-bang or singular with switching function ψ(t) := uW − uL − (ηs(t) − ηi)·βa·I(t). This permits closed-form characterization of switching points and trajectories. The main ‘threat’ the author addresses is generality: does linearity drive the conclusions? Section VI shows numerically that convex costs (U = uL + λ^(1−α)·(uW − uL), with α the convexity degree) merely smooth out the kinks and corners without changing qualitative features, passing what the author calls the ‘Solow test.’
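The way a linear Hamiltonian produces a bang-bang or singular control can be sketched from the switching function alone. The sign convention below (ψ > 0 means full activity) is an assumption made here, read off from ψ being the coefficient on λ; the costate values are hypothetical placeholders rather than solutions of the model.

```python
def switching_function(u_W, u_L, eta_s, eta_i, beta_a, infected):
    """psi = uW - uL - (eta_s - eta_i) * beta_a * I, as defined in the text."""
    return (u_W - u_L) - (eta_s - eta_i) * beta_a * infected


def activity_choice(psi, tol=1e-12):
    """Map the sign of psi to the control lambda.

    psi > 0: the gain from activity exceeds the infection cost, so lambda = 1.
    psi < 0: full mitigation, lambda = 0.
    psi = 0: singular arc; lambda is pinned down by keeping psi at zero over
             time, not by its sign, so None is returned here.
    """
    if psi > tol:
        return 1.0
    if psi < -tol:
        return 0.0
    return None


# Hypothetical shadow values: eta_s = 20, eta_i = 5.
params = dict(u_W=1.0, u_L=0.5, eta_s=20.0, eta_i=5.0, beta_a=0.4)
early = activity_choice(switching_function(infected=0.001, **params))
peak = activity_choice(switching_function(infected=0.2, **params))
singular = activity_choice(switching_function(infected=0.5 / 6.0, **params))
print(early, peak, singular)
```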

Q2. What is the core economic mechanism behind ‘excessive caution,’ and what are the two ways the paper frames the externality?

In equilibrium, the singular-control optimality condition equates a constant marginal cost of mitigation (uW − uL) to a marginal benefit (ηs(t) − ηi)·βa·I(t). The shadow value of being susceptible ηs(t) rises over time (cumulative future infection risk and cumulative future mitigation effort both decline as the epidemic progresses), while ηi is constant. To keep the equation balanced, βa·I(t) must fall, so agents become more cautious over time. First framing of the externality: the planner recognizes that at least 1 − S̄ of the population must eventually be infected (and a share IFR of those die); individuals recognize this too (perfect foresight) but each wants to avoid being in the infected group, so they over-mitigate, merely delaying rather than preventing infections. Second framing: stronger mitigation today lowers near-term infections but raises later infections — ‘flattening the curve also makes it fatter’ — so beyond removing overshoot, mitigation only substitutes infection risk intertemporally. The planner internalizes the whole time path; individuals take the aggregate infection rate as given.

Q3. Why is the optimal lockdown ‘late, strong, and short’ rather than gradual?

From the planner’s law of motion, the velocity Ṡ/S is proportional to I. An interior λ would lower instantaneous costs proportionately but increase the duration of mitigation more than proportionately (since both λ and I are lower), so gradualism is dominated. This makes optimal policy bang-bang with a single interval of maximal restriction. The planner therefore holds fire, lets I climb high (where the system moves fast), then imposes λ=0 to drive the trajectory quickly to herd immunity — minimizing cumulative deaths at minimum cost rather than flattening the curve.

Q4. How do equilibrium and optimal cumulative deaths compare, and why does the more cautious equilibrium produce MORE deaths?

Cumulative deaths equal IFR·(1 − S∞). The equilibrium steady-state susceptibility S∞ ≈ S̄ − S̄·√(2κR0) lies below the planner’s S∞* ≈ S̄ − S̄·2κR0/(κR0 − 1)², meaning the equilibrium overshoots herd immunity by more, so 1 − S∞ (cumulative infections) and hence deaths are higher in equilibrium. The equilibrium’s caution lowers the infection rate at each S above herd immunity and stretches the epidemic out (raising economic cost), but does not prevent the inevitable infections and in fact allows more overshoot than the planner’s quick-to-herd-immunity strategy. Cumulative death toll is increasing in R0 and in κ.
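The comparison can be made concrete by evaluating both approximations. The sketch below uses hypothetical values with κR0 small, where these approximations are presumably intended to apply; the numbers are not a calibration from the paper.

```python
import math


def equilibrium_S_inf(s_bar, kappa):
    """S_inf ~ S_bar - S_bar * sqrt(2 * kappa * R0), with R0 = 1 / S_bar."""
    r0 = 1.0 / s_bar
    return s_bar - s_bar * math.sqrt(2.0 * kappa * r0)


def planner_S_inf(s_bar, kappa):
    """S_inf* ~ S_bar - S_bar * 2 * kappa * R0 / (kappa * R0 - 1) ** 2."""
    x = kappa / s_bar             # kappa * R0
    return s_bar - s_bar * 2.0 * x / (x - 1.0) ** 2


def cumulative_deaths(ifr, s_inf):
    """Deaths as a population share: IFR * (1 - S_inf)."""
    return ifr * (1.0 - s_inf)


S_BAR, KAPPA, IFR = 0.4, 0.004, 0.01   # hypothetical: kappa * R0 = 0.01
deaths_eq = cumulative_deaths(IFR, equilibrium_S_inf(S_BAR, KAPPA))
deaths_opt = cumulative_deaths(IFR, planner_S_inf(S_BAR, KAPPA))
print(f"equilibrium: {deaths_eq:.5f}, planner: {deaths_opt:.5f}")
```

For these values the equilibrium overshoots herd immunity by more than the planner does, so the equilibrium death share is the larger of the two.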

Q5. What is the role of the cost-benefit ratio κ and the ‘fatalism effect’?

κ := (uW − uL)/(βa·IFR·VSL) combines preferences, epidemiology, and policy effectiveness: the numerator is the utility cost of mitigation; the denominator is the benefit (lower activity reduces transmission by βa, preventing deaths by IFR, each life worth VSL = uW/ρ). A higher κ lowers mitigation and raises the equilibrium infection rate, starts mitigation later (lower S(t0)), and raises cumulative deaths. The ‘fatalism effect’ has two parts: a lower S̄ (greater lifetime chance of falling ill) dissuades mitigation today; and the high expected cumulative future mitigation effort at the epidemic’s start lowers the value of staying alive, further tempering precaution. The simple approximation I(t) ≈ (S(t)/S̄)·κ captures the first part but omits the second.
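The composite κ is a one-line calculation once its ingredients are fixed. A minimal sketch with hypothetical inputs (none of these values come from the paper):

```python
def cost_benefit_ratio(u_W, u_L, beta_a, ifr, rho):
    """kappa = (uW - uL) / (beta_a * IFR * VSL), with VSL = uW / rho."""
    vsl = u_W / rho               # value of statistical life in utility terms
    return (u_W - u_L) / (beta_a * ifr * vsl)


# Hypothetical inputs: mitigation halves flow utility, IFR is 1%.
kappa = cost_benefit_ratio(u_W=1.0, u_L=0.5, beta_a=0.4, ifr=0.01, rho=0.0001)

# A costlier mitigation option (lower uL) raises kappa.
kappa_costlier = cost_benefit_ratio(u_W=1.0, u_L=0.25, beta_a=0.4,
                                    ifr=0.01, rho=0.0001)
print(kappa, kappa_costlier)
```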

Q6. What is the practical ‘back-of-the-envelope’ contribution?

The paper provides a recipe to trace the equilibrium epidemic path without solving the full dynamic model: (1) compute the thresholds S(t0) ≈ 1 − κ/(√(2κR0)·(1−S̄))·S̄(1−S̄), S(t1) ≈ S̄ − ρ/(βo + βa), and S∞ ≈ S̄ − S̄·√(2κR0); (2) plot the ray I = (S/S̄)·κ between the thresholds; (3) splice it on both sides with the no-mitigation (λ=1) trajectory I = −S + S̄·log S + C0. This rivals running the naive SIR model in simplicity but is grounded in optimizing behavior, giving a more plausible benchmark for human populations. The author intends it for forecasting any future epidemic.
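Steps (2) and (3) of the recipe can be sketched in a few lines. The code below defines the equilibrium ray and checks, with a simple Euler simulation of the no-mitigation SIR system, that I + S − S̄·log S stays constant along the path, which is what the trajectory I = −S + S̄·log S + C0 asserts. Function names, step size and parameter values are assumptions made for illustration; the threshold formulas from step (1) are not reproduced here.

```python
import math


def equilibrium_ray(S, s_bar, kappa):
    """Approximate equilibrium infection rate I = (S / S_bar) * kappa."""
    return (S / s_bar) * kappa


def no_mitigation_constant(S, I, s_bar):
    """The constant C0 implied by I = -S + S_bar * log(S) + C0."""
    return I + S - s_bar * math.log(S)


def simulate_no_mitigation(beta, gamma, S0, I0, dt=0.001, horizon=100.0):
    """Euler steps of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I."""
    S, I = S0, I0
    path = [(S, I)]
    for _ in range(int(horizon / dt)):
        new_infections = beta * S * I * dt
        S, I = S - new_infections, I + new_infections - gamma * I * dt
        path.append((S, I))
    return path


BETA, GAMMA = 0.5, 0.2            # hypothetical rates, so S_bar = 0.4
S_BAR = GAMMA / BETA
path = simulate_no_mitigation(BETA, GAMMA, S0=0.999, I0=0.001)
c0_values = [no_mitigation_constant(S, I, S_BAR) for S, I in path]
drift = max(c0_values) - min(c0_values)
print(f"drift in C0 along the simulated path: {drift:.2e}")
```

The small residual drift comes from the Euler discretization, not from the formula itself.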

Q7. How do the results relate to and differ from prior numerical econ-epi work?

The equilibrium characterization is qualitatively consistent with Farboodi et al. (2021) — little mitigation at the start, then a jump keeping the effective reproduction number just below 1 — the only difference being their path is smoother due to convex costs. Eichenbaum-Rebelo-Trabandt (2021) get a qualitatively different, still hump-shaped equilibrium infection path because in their calibration mitigation is too weak to push the effective reproduction number below 1 (so βo is not ‘sufficiently low’). For the planner, the paper’s late-strong-short lockdown differs from work finding early/strong responses (Farboodi et al.) or intermediate restrictions (Alvarez et al. 2021; Eichenbaum et al. 2021), for two reasons: (1) this model rules out suppression/vaccine arrival as a feasible endgame, whereas papers allowing vaccine arrival find early strong suppression optimal; (2) the planner here controls only susceptibles’ behavior with linear costs, whereas broader instruments and convex costs make intermediate restrictions more attractive. The paper is, to the author’s knowledge, the first to derive equilibrium and optimal behavior fully analytically and to show the susceptibles’ externality makes the infection rate too LOW socially.

Q8. What do the costate (shadow-value) dynamics reveal?

The private value of infection ηi = (uI + (γr/ρ)·uW)/(ρ+γ) is time-invariant (payoffs while ill/recovered/dead don’t depend on timing). The social value of an infected person, η*i, is time-varying because the planner internalizes onward transmission via a (η*i − η*s)(βaλ + βo)S* term. η*i is deeply negative at the epidemic’s start (diverging as I→0, because an infinitesimal seed inflicts unboundedly large relative damage), rises sharply and roughly tracks the private value during the bulk of the epidemic (e.g. when S ∈ [0.5, 0.9]), and typically settles just above zero in the long run. It can even be negative in the long run when γd is high, because the value of that person’s life is then below the welfare loss from the infections they spread. The social value of a susceptible, η*s, is always below the private value ηs (except in the long run, where it converges to uW/ρ), reflecting unpriced future contagion.

Q9. What robustness/extension checks does the paper run?

Section VI: (1) Convex costs (numerical, α=0.3) smooth kinks but preserve qualitative features. (2) Broader planner instruments — controlling susceptibles AND infected (without distinguishing them), or restricting everyone identically — are ‘double-edged’: more costly (especially late when many are recovered) but more effective because they also restrict the infected; effectiveness gains peak at intermediate restrictions (around λ=1/2) due to the quadratic contact function, which makes intermediate restrictions and earlier/longer lockdowns more attractive, moving results toward Alvarez et al. (2021). Section VII discusses healthcare/ICU capacity constraints (optimal to hold infections at the capacity level until near herd immunity; endogenous IFR brings equilibrium and optimal paths closer but doesn’t change the externality’s nature), feasible suppression (optimal policy becomes a discrete choice between herd-immunity and best suppression strategy; equilibrium behavior is largely insensitive to suppression feasibility), and temporary immunity/endemicity (strengthens the fatalism effect, raising equilibrium infections; optimal policy still rushes to steady state, now also to avoid costly multiple waves).

Q10. What is the calibration used for the figures, and is it meant to be quantitatively serious?

The calibration resembles Covid-19 but is explicitly illustrative, not a serious quantitative calibration. A model period is a week. Epidemiological parameters: βo = 0.7, βa = 1.24, γr = 0.77, γd = 0.0078, implying R0 = 2.5, S̄ = 0.4, IFR = 1%, and average disease duration of 9 days; under full mitigation (λ=0) R0 falls to 0.9. Annual discount rate is 4% (weekly ρ = 0.96^(−1/52) − 1). Utility is logarithmic; weekly consumption is $60,000/52 ≈ $1,250 so uW = log(1250) ≈ 7; full lockdown cuts consumption 20%, giving uL = uW + log(0.8) ≈ 6.78 and (uW − uL)/uW ≈ 3.2%. With VSL = $10 million, κ = 0.002 (0.2%).
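As a rough check, the implied statistics can be recomputed from the quoted parameters. The sketch below makes two assumptions that the summary does not state explicitly: the total exit rate is γ = γr + γd, and the value of a statistical life in utility units is VSL = uW/ρ (the expression used in the definition of κ). Variable names are illustrative.

```python
import math

# Weekly epidemiological parameters quoted above.
beta_o, beta_a = 0.7, 1.24
gamma_r, gamma_d = 0.77, 0.0078

gamma = gamma_r + gamma_d          # assumption: total exit rate from infection
r0 = (beta_o + beta_a) / gamma     # basic reproduction number
s_bar = gamma / (beta_o + beta_a)  # herd immunity threshold, S_bar = gamma / beta
ifr = gamma_d / gamma              # infection fatality rate
duration_days = 7.0 / gamma        # average disease duration in days
r0_lockdown = beta_o / gamma       # reproduction number under lambda = 0

rho = 0.96 ** (-1.0 / 52.0) - 1.0  # weekly discount rate from 4% per year

u_w = math.log(1250.0)             # log utility at normal weekly consumption
u_l = math.log(0.8 * 1250.0)       # log utility after a 20% consumption cut
vsl_utils = u_w / rho              # assumption: VSL in utils, VSL = u_W / rho

kappa = (u_w - u_l) / (beta_a * ifr * vsl_utils)

print(f"R0 = {r0:.2f}, S_bar = {s_bar:.2f}, IFR = {ifr:.3%}")
print(f"duration = {duration_days:.1f} days, R0 under lockdown = {r0_lockdown:.2f}")
print(f"u_W - u_L = {u_w - u_l:.3f}, kappa = {kappa:.4f}")
```

Under these assumptions the utility cost of lockdown is log(1/0.8) regardless of the consumption level, so κ depends on consumption only through uW in the VSL term.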

Q11. What are the key caveats and the scope of the policy implications?

The author stresses the model is a stripped-down BENCHMARK: no reinfection, no variants, constant IFR, no cure or vaccine (so herd immunity pins down minimum feasible deaths). Specific results are ‘technical contributions, not direct normative prescriptions.’ The striking implication that a planner might subsidize interaction (forcing susceptibles to interact, since optimal activity sometimes exceeds equilibrium activity) faces an implementability problem: restricting activity is easier than increasing it. The herd-immunity-quick strategy ceases to be optimal once suppression is feasible (vaccine/cure expected), ICU constraints bind with endogenous IFR, or immunity is only temporary; but the underlying forces (the susceptibles’ intertemporal infection-substitution externality) continue to operate in all these richer settings.

Key Concepts

Herd immunity threshold (S̄): S̄ := γ/β, the level of susceptibility below which the infected pool shrinks; in this model, because there is no cure or vaccine, it pins down the minimum feasible deaths and is the endgame both equilibrium and planner converge toward.

Cost-benefit ratio of mitigation (κ): κ := (uW − uL)/(βa·IFR·VSL), a composite statistic combining preferences, epidemiology, and policy effectiveness; the numerator is the utility cost of mitigation and the denominator the benefit (transmission reduction βa times deaths averted IFR times value of statistical life). Higher κ means less mitigation and more infections.

Excessive caution / susceptibles’ externality: The paper’s central finding that privately optimal mitigation by susceptibles is too cautious socially — the equilibrium infection rate lies below the optimal rate for any S above herd immunity — because each individual wants to avoid being in the inevitable infected share, merely substituting infection risk intertemporally rather than preventing it; the conventional one-way infected-spreader externality view is therefore incomplete.

Linear costs of mitigation / singular control: The assumption (microfounded by indivisible activity choices à la Rogerson 1988) that utility is linear in activity λ, making the Hamiltonian linear in the control so the optimum is bang-bang or singular; this delivers sharp closed-form solutions whose intuitions survive under convex costs (the ‘Solow test’).

Late-strong-short lockdown: The socially optimal policy in this benchmark: hold fire while infections climb high, then impose maximal restrictions (λ=0) in a single intermediate interval that quickly drives the system to herd immunity — minimizing cumulative deaths at minimum cost rather than flattening the curve.

Costates (ηs, ηi): Shadow values of being in the susceptible and infected states. ηi (private) is constant since the payoffs of being ill are timing-independent; the planner’s η*i is time-varying because it internalizes onward transmission and can even be negative in the long run when the death rate is high.

Central Banks as Dollar Lenders of Last Resort: Implications for Regulation and Reserve Holdings

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper investigates why non-U.S. central banks accumulate large holdings of dollar-denominated foreign exchange reserves, focusing on a previously under-emphasized motive: the currency mismatch of private-sector non-financial firms. When domestic firms borrow heavily in dollars despite having predominantly local operating revenues, the central bank faces potential liability as a dollar lender of last resort (DOLLR) in the event of a banking crisis coinciding with a dollar appreciation. The paper combines motivating empirical evidence with a formal theoretical model to analyze the optimal policy mix between ex ante financial regulation (bank capital requirements) and ex post reserve accumulation, and then extends the model to characterize global externalities arising from decentralized reserve-holding decisions.

The empirical work uses an unbalanced panel of 52 non-U.S., non-Eurozone countries (excluding Hong Kong as an extreme outlier) with 357 observations covering 2013-2020. The sample includes 12 advanced economies, 29 emerging economies, and 11 developing economies. The key dependent variable is central bank dollar reserves as a share of GDP; the key right-hand-side variable is cross-border dollar-denominated bank loans to non-financial corporations (NFC), also as a share of GDP, drawn from BIS Locational Banking Statistics. Because banks tightly offset their own currency exposures (dollar assets and liabilities correlate at 0.965 in the panel), the relevant mismatch resides on NFC balance sheets, not bank balance sheets. Cross-border NFC dollar lending proxies for total NFC dollar lending, with correlations of 0.66 overall, 0.89 for advanced economies, and 0.73 for emerging economies in the 21-country subsample where total data are available.

In the full 53-country univariate regression including Hong Kong, the R-squared is 0.53 and the slope coefficient is 5.3 (t-statistic 7.6): a one-percentage-point increase in NFC dollar loans to GDP is associated with a 5.3-percentage-point increase in dollar reserves to GDP. Excluding Hong Kong, the R-squared falls to 0.083 and the slope to 1.3 (t-statistic 2.5). Splitting by income group, the relationship holds for advanced economies (coefficient 3.7, t-statistic 2.2, R-squared 0.31) and emerging economies (coefficient 2.4, t-statistic 2.5, R-squared 0.18) but is absent and wrongly signed for developing economies. Panel regressions with standard reserve-accumulation controls (M2/GDP, financial openness, bilateral trade with the U.S., GDP per capita, log population) and country fixed effects leave the key coefficient broadly stable and significant at the 5% level for both advanced and emerging economies.

The theoretical framework models a two-period small open economy in which households have an exogenous preference for dollar-denominated safe assets (capturing the dollar’s special status), banks intermediate between these households and a fixed investment project, and banking crises occur with probability q. When the home currency depreciates, currency-mismatched NFC borrowers incur liquidity costs that are quadratic in the share of dollar funding; these costs flow through to the banking system. The central bank can respond with two instruments: (i) accumulate dollar reserves R$ at a carrying cost equal to the dollar-domestic interest rate spread S; (ii) impose capital requirements, which crowd out home-currency deposits but cannot directly control dollar deposits (since mismatch resides off the bank balance sheet in the NFC sector). The optimal level of dollar reserves is decreasing in S and increasing in the fraction of failing banks’ dollar liabilities (pB$). When banking crises and exchange rate depreciations are correlated — as is empirically documented — dollar reserves serve an additional hedging function, because the central bank is more likely to need dollar liquidity precisely when the dollar is strong.

The paper’s primary normative contribution is to show that decentralized central banks over-accumulate reserves relative to a global planner’s optimum. Each central bank, acting as a price-taker in the market for safe dollar assets, ignores that its own reserve hoarding reduces the global supply of dollar-denominated safe assets, driving down the dollar interest rate. A lower dollar rate, in turn, widens the dollar-domestic rate spread S and makes dollar borrowing more attractive to NFCs, amplifying the very mismatch the reserves are supposed to hedge. A global planner internalizes this feedback and therefore prefers lower reserve accumulation combined with tighter capital requirements. This result (Proposition 1) holds for all values of the households’ discount factor beta above a threshold that is shown to be below zero under the natural condition that reserve holdings do not exceed the supply of safe dollar assets — meaning the proposition holds robustly for any realistic calibration, including in extensive numerical experimentation where the threshold never exceeds 0.5. In the paper’s global numerical example, the global planner’s equilibrium has dollar reserves fall from 54.62 to 27.99, capital requirements rise from K=7.61 to K=23.77, dollar borrowing B$ fall from 59.99 to 42.98, and the interest-rate spread S narrow by approximately one percentage point, relative to the decentralized outcome. The welfare decomposition shows that bank profits decline but are more than offset by gains in household utility from dollar deposits and reductions in carrying costs, taxation deadweight costs, and liquidity costs from mismatch.

A further extension examines global risk-sharing. When banking crises are imperfectly correlated across countries, a supranational pooling of reserves (e.g., through the IMF) allows reserves to be reallocated ex post to countries in crisis, reducing total required reserve holdings. This risk-sharing motive reinforces the case for international coordination but raises additional institutional challenges around moral hazard and monitoring. The paper concludes that, analogously to the Basel process for capital regulation, an international coordination mechanism for reserve holdings would be globally welfare-improving, but this potential benefit is less widely recognized.

In depth

Q1. What is the paper’s core empirical identification strategy and what are the main limitations?

The empirical strategy is correlational: the paper regresses central bank dollar reserves (as a share of GDP) on cross-border NFC dollar loans (as a share of GDP) in a panel of 52 countries over 2013-2020, progressively adding controls (M2/GDP, financial openness, bilateral trade with the U.S., GDP per capita, log population, nominal exchange rate) and country fixed effects. The authors are explicit that the regressions cannot establish causality and should be interpreted as suggestive motivating patterns rather than tight causal tests. The main data limitation is that the BIS only provides complete cross-border NFC dollar lending data, not total (cross-border plus local) NFC dollar lending; total data are available for only 21 countries (10 advanced, 11 emerging), and the correlation between the two measures is 0.66 overall (0.89 advanced, 0.73 emerging). Additionally, dollar-denominated bond-market borrowing by NFCs is excluded. The paper also cannot cleanly separate dollar borrowing by exporters (who are naturally hedged) from dollar borrowing by purely domestic non-tradable firms (who are genuinely mismatched).

Q2. What is the mechanism through which reserve accumulation creates a global externality?

Central banks collectively purchase large quantities of dollar-denominated safe assets (e.g., U.S. Treasuries). Each individual central bank takes the dollar interest rate as given (price-taking assumption) and does not account for the effect of its own purchases on the aggregate supply of dollar safe assets in global markets. In the global equilibrium, however, central bank reserve accumulation reduces the net supply of dollar safe assets available to private households, pushing up dollar asset prices and lowering the dollar interest rate. A lower dollar interest rate widens the dollar-domestic rate spread S, making dollar borrowing cheaper for NFCs relative to home-currency borrowing, and therefore encouraging greater currency mismatch of private-sector liabilities. This increased mismatch is the very risk that motivated reserve accumulation in the first place, creating a self-defeating dynamic: decentralized reserve hoarding amplifies the aggregate fragility it seeks to hedge. The global planner internalizes this feedback and prefers less reserve accumulation to let the dollar interest rate remain higher, which discourages NFC dollar borrowing even without direct regulatory control over the NFC funding mix.

Q3. What roles do capital requirements and funding-mix regulation play in the model, and how do they differ?

Capital requirements (equity capital mandates) act by crowding out home-currency bank deposits; they do not directly affect dollar deposits because the interior optimum for dollar borrowing by banks is independent of total deposit funding in the baseline model without crisis-exchange rate correlation. Thus in the baseline model, capital requirements do not change dollar borrowing and do not change optimal reserve holdings. When banking crises and exchange rate depreciations are positively correlated, however, capital requirements that reduce total deposits (both home-currency and dollar) do reduce optimal reserve holdings, because holding dollar reserves hedges the need to bail out both types of deposits when crises concentrate in strong-dollar states. Funding-mix regulation (direct control over the proportion of dollar versus home-currency deposits) more directly reduces dollar mismatch and allows the central bank to cut reserves substantially further. In the numerical example with capital-only regulation, reserves fall from 56.9 to 54.6; with both capital and funding-mix regulation, reserves fall to 38.5. The paper notes, however, that funding-mix regulation is unlikely to be empirically relevant because currency mismatch resides predominantly on NFC balance sheets outside the regulatory perimeter, not on bank balance sheets.

Q4. Under what conditions does the global planner prefer more reserves than the decentralized outcome (the ‘wrong-way’ effect)?

There is one channel through which a global planner might want more reserves than individual central banks: by holding more reserves, the planner would depress the dollar interest rate and thereby increase bank profitability (banks can borrow cheaply in dollars and earn the spread). This ‘wrong-way’ bank-profit effect is captured by the term (Q$ - beta) in the global planner’s first-order condition and grows when the spread between the cost of equity capital and the dollar deposit rate is large — i.e., when beta (the discount factor, or equivalently the inverse of the gross cost of equity) is very low. Proposition 1 establishes that the global planner prefers fewer reserves than the decentralized outcome for all beta above a threshold beta-hat. Under the natural constraint that reserves cannot exceed the total supply of dollar Treasury securities, beta-hat is shown to be negative, meaning the global-planner-prefers-fewer-reserves result holds for all positive values of beta. In extensive numerical experimentation, the threshold was never found to exceed 0.5, implying that the wrong-way effect would only dominate if the cost of equity capital exceeded 100% — an implausible calibration.

Q5. How does the paper handle the correlation between banking crises and exchange rate depreciations?

The baseline model assumes crisis probability is independent of the exchange rate. The paper then extends to allow a positive correlation: the probability of a banking crisis rises to (q + h) when the home currency depreciates (dollar strengthens) and falls to (q - h) when it appreciates. This setup nests the baseline as h = 0. With h > 0, two new effects arise. First, dollar borrowing by banks increases because their effective cost of dollar debt is reduced by the implicit put option they have when the dollar appreciates: they default more in the appreciation state, and dollar depositors bear losses. Second, the central bank’s optimal reserve holdings increase substantially, because holding dollars hedges not only future dollar-denominated bailout costs but also home-currency-denominated bailout costs (since crises cluster in dollar-appreciation states where home-currency deposits are worth less in dollars). The formula for optimal reserves gains an additional term proportional to (ph/qz)(Bh + B$) — meaning total bank deposits, not just dollar deposits, now motivate reserve holdings. In this richer environment, any capital regulation that reduces total bank deposits will also reduce optimal reserve holdings, which was not true in the baseline.

Q6. What does the global risk-sharing extension show?

Section 5 asks what happens when banking crises are imperfectly correlated across countries, creating scope for risk-pooling. The paper reverts to h = 0 (no exchange rate-crisis correlation) and a perfectly elastic dollar safe asset supply (theta_$2 = 0, so the dollar rate does not respond to reserve holdings) to isolate the risk-sharing effect. If a mass q of countries experience crises independently each period, and a supranational institution (like the IMF) can hold a common pool of reserves and allocate them to countries in crisis, then each dollar of pooled reserves provides 1/q times the crisis coverage of a dollar held at the individual-country level. This multiplier means the total required pool of reserves is dramatically smaller: optimal pooled reserves scale with pqB$ rather than pB$. However, the carrying-cost term in the FOC is also reduced by q^2, which partly offsets the coverage multiplier. For empirically relevant small values of the interest-rate spread S, the coverage effect dominates and pooled reserves are substantially lower than individual-country reserves. The extension reinforces the paper’s main message that international coordination reduces required reserve holdings, but it also highlights additional institutional challenges: pooling requires the supranational institution to be able to reallocate reserves away from countries not currently in crisis, raising serious moral hazard and monitoring issues.
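The coverage multiplier alone can be illustrated with a short sketch. It deliberately ignores the carrying-cost offset in the first-order condition, so it shows only the 1/q accounting, not the paper's optimal reserve formula; the function names and the numbers are hypothetical.

```python
import math


def pooled_coverage(pool_per_country: float, q: float) -> float:
    """Reserves available to each crisis country when a pool holding
    `pool_per_country` per member is shared among the fraction q in crisis."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return pool_per_country / q


def pool_needed_for_same_coverage(individual_reserves: float, q: float) -> float:
    """Pool size per member that matches the crisis coverage of
    `individual_reserves` held separately by each country."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return q * individual_reserves


# Hypothetical numbers: 10% of countries in crisis at any time.
q = 0.1
print(pooled_coverage(5.0, q))                 # coverage per crisis country
print(pool_needed_for_same_coverage(50.0, q))  # pool per member for equal coverage
```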

Q7. How does this paper relate to Bocola and Lorenzoni (2020), and what is the key theoretical distinction?

Bocola and Lorenzoni (2020) is the closest antecedent: it also models reserve accumulation as driven by currency mismatch in the private sector and the central bank’s role as a dollar lender of last resort. The current paper’s key additions are: (i) it explicitly introduces financial regulation (capital requirements, and hypothetically funding-mix regulation) as an alternative or complementary tool to reserve accumulation, showing how the optimal mix depends on the carrying cost of reserves relative to the welfare cost of stringent regulation; (ii) it develops the global externality argument — that decentralized reserve accumulation depresses the dollar rate and thereby endogenously exacerbates the mismatch the reserves are intended to hedge — and shows that a global planner prefers a different mix (more regulation, fewer reserves); and (iii) it provides explicit cross-country empirical evidence linking central bank dollar reserve holdings to NFC dollar borrowing to motivate the mechanism.

Q8. How does this paper relate to the literature on ‘mercantilist’ versus ‘precautionary’ motives for reserve accumulation?

The paper classifies its motive as falling within the broad ‘precautionary’ view, alongside the sudden-stops literature and the banking-system flight-to-dollar-assets literature (Obstfeld, Shambaugh and Taylor 2010, who use M2/GDP as their key proxy). The paper differs from M2-based frameworks by focusing specifically on corporate-sector dollar mismatch rather than the risk of domestic depositor flight. The paper distinguishes itself from the mercantilist view (Dooley et al. 2003; Aizenman and Lee 2010; Benigno and Fornaro 2012), which attributes reserve accumulation to exchange rate management and trade surplus recycling. The normative contribution also relates to Fanelli and Straub (2021), who find that individual countries over-accumulate reserves relative to a global planner; however, that paper’s mechanism is mercantilist (exchange rate stabilization) whereas this paper’s is precautionary (dollar LOLR).

Q9. How does this paper connect to the literature on international coordination of financial regulation?

The paper shares with Clayton and Schaab (2022) the conclusion that countries acting individually impose insufficiently stringent capital requirements relative to the global optimum, motivating the Basel Process of international regulatory cooperation. However, the paper argues that even if capital regulation is fully coordinated internationally, this is not sufficient to achieve the global optimum — there additionally needs to be a separate mechanism to restrain reserve accumulation, because excess reserve holding depresses the dollar interest rate and exacerbates corporate dollar mismatch through a general-equilibrium channel that capital regulation alone cannot offset. The paper thus identifies reserve coordination as a distinct policy dimension that has received less policy attention than capital coordination.

Q10. Why are Eurozone countries excluded from the empirical sample?

Eurozone member countries benefit from either explicit or implicit ECB support in dollar markets. Measuring dollar reserve holdings at the individual country level (e.g., on the Bank of Italy’s balance sheet) and relating them to that country’s corporate-sector dollar borrowing would be conceptually misleading, because the relevant backstop is the ECB at the union level rather than the national central bank. The relevant LOLR function is pooled across Eurozone members. Including them would therefore introduce a systematic bias in the proxy for the dollar LOLR motive.

Q11. What are the scope conditions on the empirical results?

The significant positive association between NFC dollar borrowing and central bank dollar reserve holdings holds for advanced economies (coefficient 3.7, t-statistic 2.2) and emerging economies (coefficient 2.4, t-statistic 2.5) but is absent for developing economies, where the coefficient is wrongly (negatively) signed and insignificant. The authors note that for advanced economies, the result for the subsample is sensitive to removing both Hong Kong (already excluded from the baseline) and Switzerland, given the small number of countries. The results are presented as suggestive correlations rather than causal estimates; missing data on total (cross-border plus local) NFC dollar lending (available for only 21 countries) and on dollar bond-market borrowing are acknowledged as limitations. The theoretical results apply most cleanly when the interest-rate spread S is not too large (so that the small-S configuration is empirically relevant) and when the discount factor beta is above a threshold that is never found to exceed 0.5 in calibrations.

Q12. What is the model’s treatment of the dollar interest rate and safe asset scarcity?

In the small open economy version, the dollar interest rate (equivalently, the price of dollar safe assets Q$) is exogenously given, consistent with the small-country price-taking assumption. In the global model, Q$ is endogenized: households have a quadratic extra utility from holding dollar safe assets, so Q$ = beta + theta_d + theta_$1 - theta_$2 * D$, where theta_$2 governs the sensitivity of the dollar rate to the total supply of dollar assets (D$). The spread S = Q$/Q_h - 1 becomes endogenous and rises when central banks absorb dollar assets (reserves R$), since this reduces the net supply available to private households and so raises Q$. The externality is zero when theta_$2 = 0 (perfectly elastic supply), and increasing in theta_$2. The paper thus situates the externality squarely in the ‘global safe asset scarcity’ framework originating with Caballero, Farhi and Gourinchas (2008) and Bernanke (2005).
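The pricing relation can be traced numerically with a minimal sketch. All parameter values are hypothetical, as is the assumption that private holdings equal a fixed gross supply minus central bank reserves; only the two formulas for Q$ and S come from the text.

```python
def dollar_asset_price(beta: float, theta_d: float, theta_1: float,
                       theta_2: float, d_dollar: float) -> float:
    """Q$ = beta + theta_d + theta_$1 - theta_$2 * D$."""
    return beta + theta_d + theta_1 - theta_2 * d_dollar


def spread(q_dollar: float, q_home: float) -> float:
    """S = Q$ / Q_h - 1."""
    return q_dollar / q_home - 1.0


# Hypothetical parameters chosen only for illustration.
BETA, THETA_D, THETA_1, THETA_2 = 0.90, 0.02, 0.05, 0.0005
Q_HOME = 0.93          # price of the home-currency safe asset, taken as given
GROSS_SUPPLY = 100.0   # gross supply of dollar safe assets


def spread_given_reserves(reserves: float, theta_2: float = THETA_2) -> float:
    """Spread when central banks hold `reserves` out of the gross supply.
    Assumption: private holdings D$ = GROSS_SUPPLY - reserves."""
    d_private = GROSS_SUPPLY - reserves
    q_dollar = dollar_asset_price(BETA, THETA_D, THETA_1, theta_2, d_private)
    return spread(q_dollar, Q_HOME)


for r in (20.0, 50.0):
    print(f"reserves = {r:5.1f}  spread S = {spread_given_reserves(r):.4f}")
```

With theta_$2 set to zero the spread no longer depends on reserves, which is the case in which the externality vanishes.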

Q13. What is the welfare decomposition from the global numerical example?

Table 5 normalizes total welfare in the no-regulation, no-reserve benchmark to 100. Moving from no-regulation to the local-planner outcome (with capital requirements and reserves) raises total welfare from 100 to 113.4, driven largely by a reduction in the deadweight costs of taxation (from -131.9 to -70.7) as reserves substitute for costly fiscal bailouts, despite increased carrying costs of reserves (-18.6) and higher liquidity costs due to unchanged dollar borrowing. Moving from the local-planner to the global-planner outcome raises welfare further to 120.4. This additional gain comes from: a large reduction in carrying costs of reserves (from -18.6 to -5.8), reduced deadweight taxation costs (from -70.7 to -61.3), reduced liquidity costs from mismatch (from -13.8 to -7.1), and increased household utility from dollar deposits (55.8 vs. 43.9) — all more than offsetting a decline in bank profits (138.8 vs. 172.6).
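The component changes listed above can be added up to check that they account for the welfare gain from the local-planner to the global-planner outcome. The sketch uses only the five components named in this summary; the dictionary keys are illustrative labels, and any components of Table 5 not mentioned here are assumed unchanged.

```python
# Welfare components under the local planner and the global planner,
# as quoted in the summary (total welfare: 113.4 vs. 120.4).
local_planner = {
    "carrying_cost_of_reserves": -18.6,
    "deadweight_cost_of_taxation": -70.7,
    "liquidity_cost_of_mismatch": -13.8,
    "utility_from_dollar_deposits": 43.9,
    "bank_profits": 172.6,
}
global_planner = {
    "carrying_cost_of_reserves": -5.8,
    "deadweight_cost_of_taxation": -61.3,
    "liquidity_cost_of_mismatch": -7.1,
    "utility_from_dollar_deposits": 55.8,
    "bank_profits": 138.8,
}

# Change in each component when moving to the global-planner outcome.
changes = {k: global_planner[k] - local_planner[k] for k in local_planner}
total_change = sum(changes.values())

for name, delta in changes.items():
    print(f"{name:32s} {delta:+6.1f}")
print(f"{'sum of listed changes':32s} {total_change:+6.1f}")
print(f"{'reported welfare gain':32s} {120.4 - 113.4:+6.1f}")
```

The listed changes sum to about 7.0, matching the reported move from 113.4 to 120.4: the four gains total 40.8 and the fall in bank profits is 33.8.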

Q14. What policy implications does the paper draw, and how are they scoped?

First, international coordination of reserve holdings — analogous to the Basel Process for capital regulation — would improve global welfare by internalizing the safe-asset-scarcity externality. The paper frames itself as initiating a conversation about what such a coordination process might look like; it does not propose a specific mechanism. Second, tighter capital regulation combined with reduced reserve accumulation is the globally optimal policy mix, but individual central banks will not choose this combination unilaterally because they do not internalize the general-equilibrium impact of their reserve holdings on global dollar rates. Third, the risk-sharing extension implies that pooled supranational reserve management (e.g., through the IMF) could substantially reduce the total quantity of reserves needed globally, but this requires the supranational institution to have significant powers to reallocate reserves across countries mid-crisis, raising governance challenges around moral hazard and monitoring. Fourth, the paper does not advocate for coordinating away all reserve holdings — it acknowledges other legitimate reserve motives (sudden stops, domestic bank runs, exchange rate management) not modeled here.

Key Concepts

Dollar lender of last resort (DOLLR): A central bank that stands ready to supply dollar liquidity to its domestic banking system during a crisis in which currency-mismatched borrowers face distress because the home currency has depreciated against the dollar. The DOLLR role motivates holding dollar reserves in advance.

Currency mismatch: A situation in which non-financial corporations (and, by extension, the banking sector that lends to them) have liabilities denominated in dollars while their revenues and assets are predominantly in home currency, creating exposure to losses when the home currency depreciates. In this paper’s framework, mismatch is measured by the ratio of cross-border NFC dollar bank borrowing to GDP.

Carrying cost of reserves: The expected negative return earned by the central bank on its dollar reserve holdings, equal to the spread S between the domestic interest rate (what the central bank pays on the government bonds it issues to finance reserve purchases) and the dollar interest rate (what the reserves earn). A higher S makes reserves more costly to hold and tilts the optimal policy toward financial regulation.

Safe dollar asset scarcity externality: The general-equilibrium feedback by which individual central banks’ reserve accumulation reduces the net supply of dollar-denominated safe assets available to private households, lowers the dollar interest rate, and thereby makes dollar borrowing cheaper for NFCs — amplifying the currency mismatch that motivated reserve accumulation in the first place. Individual price-taking central banks do not internalize this externality.

Decentralized vs. global-planner equilibrium: The decentralized equilibrium is one where each country’s central bank sets capital requirements and reserve holdings to maximize own-country welfare, taking the dollar interest rate as given. The global-planner equilibrium internalizes the impact of aggregate reserve accumulation on the endogenous dollar interest rate. The paper establishes (Proposition 1) that the global planner chooses strictly fewer dollar reserves and strictly higher capital requirements than the decentralized equilibrium, for all empirically plausible parameter values.

Precautionary reserve motive: The class of explanations for foreign exchange reserve holdings based on self-insurance against adverse future shocks, including sudden stops, domestic depositor flight, and (in this paper) the need to serve as dollar lender of last resort when corporate currency mismatch generates systemic banking distress. Contrasted with the ‘mercantilist’ motive based on exchange rate management and trade surplus recycling.

Risk-sharing (pooled reserves): The efficiency gain achievable when banking crises are imperfectly correlated across countries and a supranational institution holds reserves centrally and redistributes them to countries experiencing crises. Each dollar of pooled reserves provides 1/q times the crisis coverage of a dollar held by an individual country, where q is the fraction of countries in crisis at any given time, enabling total reserve requirements to be substantially smaller.

Deciphering Federal Reserve Communication via Text Analysis of Alternative FOMC Statements

Thu, 01 Jan 2026 00:00:00 +0000

This paper proposes a text-based measure of monetary policy stance by modelling FOMC post-meeting statements as convex combinations of the staff-drafted dovish (“alternative A”) and hawkish (“alternative C/D”) versions that accompany each meeting, providing a transparent and adaptive reference spectrum. The authors fine-tune the Universal Sentence Encoder—a pre-trained language model—using synthetic examples that mirror numerical information in policy actions, enabling the model to capture both semantic tone and quantitative precision. Stance is defined as the product of tone (alignment with the dovish/hawkish alternatives) and novelty (semantic shift from the previous statement), and is decomposed into expected and surprise components using intraday financial data. Surprises arise from shifts in tone relative to market expectations or from statement novelty. The resulting surprise measure aligns closely (correlations of 70–80%) with established high-frequency measures (Swanson 2017, Nakamura-Steinsson 2018, Bauer-Swanson 2023), and the framework enables counterfactual analysis of how alternative communication could have moved markets.

In depth

Q1. What are the alternative FOMC statements and how are they used?

For each FOMC meeting, staff draft multiple versions of the policy statement—typically a more dovish “alternative A,” a baseline “alternative B,” and a more hawkish “alternative C” or “D”—and the paper uses these pre-structured alternatives as a reference spectrum against which to position the released statement. This institutional feature provides a transparent, adaptive measure of tone that evolves with the policy environment and internal deliberations, avoiding the rigidity of pre-fixed tone definitions. The released statement’s embedding in the language model space is compared to the dovish and hawkish alternatives to determine its location on the policy spectrum.

Q2. How is the policy stance measure constructed?

Stance is defined as the product of novelty and tone: novelty captures semantic shifts from previous statements, and tone reflects the alignment of the released statement with the dovish or hawkish alternatives; the observed stance reflects both the content of each position and the relative positioning of the Committee along the policy spectrum. A second, structural interpretation models the released statement as the outcome of internal deliberation—a weighted average of dovish and hawkish stances—linking textual variation to shifts in the internal balance of influence within the Committee.
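To make the "product of tone and novelty" definition concrete, here is a minimal sketch on toy embedding vectors. Everything specific in it is an assumption made for illustration, not the paper's implementation: the dovish weight is obtained by projecting the released statement onto the line between the two alternatives, novelty is taken as cosine distance from the previous statement, and tone is signed so that +1 means fully hawkish. The function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dovish_weight(released, dovish, hawkish):
    """Weight w in released ~ w * dovish + (1 - w) * hawkish.

    Assumed method: project onto the hawkish-to-dovish axis, clip to [0, 1].
    """
    axis = [d - h for d, h in zip(dovish, hawkish)]
    offset = [r - h for r, h in zip(released, hawkish)]
    w = dot(offset, axis) / dot(axis, axis)
    return min(1.0, max(0.0, w))


def novelty(released, previous):
    """Assumed novelty measure: cosine distance from the previous statement."""
    cos = dot(released, previous) / math.sqrt(
        dot(released, released) * dot(previous, previous)
    )
    return 1.0 - cos


def stance(released, previous, dovish, hawkish):
    # Assumed sign convention: tone lies in [-1, 1], with +1 fully hawkish.
    tone = 1.0 - 2.0 * dovish_weight(released, dovish, hawkish)
    return tone * novelty(released, previous)


# Toy 3-dimensional "embeddings" (hypothetical values).
alt_a = [1.0, 0.0, 0.0]      # dovish alternative
alt_c = [0.0, 1.0, 0.0]      # hawkish alternative
prev = [0.5, 0.5, 0.0]       # previous released statement
new = [0.25, 0.75, 0.0]      # current released statement

w = dovish_weight(new, alt_a, alt_c)
s = stance(new, prev, alt_a, alt_c)
```

In this toy case the released statement sits three quarters of the way toward the hawkish draft, so the stance is positive; a statement identical to its predecessor has zero novelty and therefore zero stance regardless of tone.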

Q3. How is the stance decomposed into expected and surprise components?

The decomposition into expected and surprise components uses intraday bond price movements to recover the market-expected dovish weight of the released statement, then defines the surprise as the deviation between the realized stance and the market-expected stance. Surprises arise from two sources: deviations in tone relative to expectations, and statement novelty. This framework shows that monetary policy surprises are not just about what the Fed did but also about how it communicated—capturing interpretable surprises that reveal shifts in the Committee’s internal balance.

Q4. How is the measure validated and what are its macroeconomic effects?

The surprise measure aligns closely with established high-frequency measures (correlations of 70–80% with Swanson 2017, Nakamura-Steinsson 2018, and Bauer-Swanson 2023); surprise tightenings reduce stock prices, raise short-term Treasury yields, dampen real activity and inflation, and raise credit risk premia. Local projection estimates corroborate that surprise contractionary shocks have the expected macroeconomic effects, providing a validation that the text-based measure captures meaningful monetary policy information beyond what is already priced in.

Q5. What counterfactual analysis does the framework enable?

The framework enables counterfactual analysis of how alternative FOMC communication could have moved markets—for example, estimating what asset price movements would have occurred had the Committee released the more dovish or hawkish alternative statement rather than the actual release. This counterfactual capability stems from the explicit modelling of stance as a position on a spectrum defined by the staff-drafted alternatives, so the market impact of any point on that spectrum can be estimated.

Key concepts

alternative FOMC statements: staff-drafted dovish (“alternative A”) and hawkish (“alternative C/D”) versions of the FOMC post-meeting statement prepared for each meeting; used as the reference spectrum for measuring the tone and position of the released statement.

monetary policy stance: as defined in this paper, the product of tone (alignment with the dovish/hawkish alternatives) and novelty (semantic shift from the previous statement); captures both the direction and the information content of the released statement.

tone: the alignment of a released FOMC statement with the dovish or hawkish alternative drafts in the Universal Sentence Encoder embedding space; reflects the direction of the Committee’s communication along the policy spectrum.

novelty: the semantic distance of the released FOMC statement from the previous statement in the embedding space; captures how much new information or emphasis the statement introduces.

Universal Sentence Encoder (USE): the pre-trained language model applied by the paper; fine-tuned on synthetic examples that mirror numerical information in policy actions (e.g., rate-hike sizes) to capture both semantic tone and quantitative policy precision.

Did the US Really Grow Out of Its World War II Debt?

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Research question and motivation. The fall in the US federal debt-held-by-the-public/GDP ratio from a postwar peak of 106% in fiscal year 1946 to a trough of 23% in 1974 is widely cited (Elmendorf-Mankiw, Krugman) as evidence that an economy “grows out of” debt because the GDP growth rate exceeds the interest rate on government debt (r < g). That narrative underpins the modern view (Blanchard 2019; Furman-Summers 2020) that high public debt “may have no fiscal cost.” Acalin and Ball ask how much of the postwar debt decline was genuinely due to growth exceeding undistorted real interest rates, versus three other factors: primary budget surpluses, the Fed’s 1942-1951 interest-rate peg before the Fed-Treasury Accord, and surprise inflation.

Method and data. The authors simulate counterfactual debt/GDP paths from the standard debt-dynamics identity D_t = (1+i_t)D_{t-1} - P_t, starting from the actual 1946 debt level and holding nominal GDP fixed at its historical path. They build three counterfactuals: (i) “primary balance” (set primary surplus to zero each year); (ii) “adjusted interest rate” (remove distortions from both the peg and surprise inflation); and (iii) “combined” (both), whose path is driven purely by r* - g, the undistorted real rate minus growth. A key innovation is measuring the “reverse maturity structure” — the fractions of currently outstanding debt issued in each past year — using Hall-Payne-Sargent (2018) data for 1942-1960 and CRSP thereafter. They construct a term structure of inflation expectations from one-year (Livingston, SPF) and ten-year (FRB/US) survey data, and estimate undistorted peg-era real rates from ex-ante real rates on securities issued in 1952-1961. T-bills and TIPS are assumed unaffected by inflation surprises (conservative). Debt is par value, held by the public, by fiscal year.

Main quantitative findings. In the combined counterfactual, debt/GDP falls only to 74% in 1974 (vs. 23% actual); the individual counterfactuals give 40% (primary balance) and 51% (adjusted rate) in 1974. Of the actual 83-point fall (106 to 23), 51 points are explained by surpluses plus rate distortions, decomposed as 17 points from surpluses alone, 28 from rate distortions alone, and 6 from their interaction; only 32 points (the fall to 74%) reflect growth net of undistorted rates. Extending to the present, the combined counterfactual ratio bottoms out at 70% in 1979 and rises from 1980 onward, reaching 84% in 2022, only 22 points below the 1946 level of 106. Over the full 76 years, undistorted growth alone would therefore have cut debt/GDP by just 22 points. The post-1979 reversal reflects a sign change in r* - g: average r* rose from 2.3% (1947-1979) to 2.8% (1980-2022) while average g fell from 3.5% to 2.6%. The estimated undistorted real-rate term structure is 1.7% (1yr), 2.2% (5yr), 2.5% (10yr), 2.7% (30yr).
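The 1974 decomposition follows mechanically from the four scenario endpoints quoted in this paragraph. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Debt/GDP (percent) in 1946 and in 1974 under each scenario, as quoted above.
START_1946 = 106
END_1974 = {"actual": 23, "primary_balance": 40, "adjusted_rate": 51, "combined": 74}

actual_fall = START_1946 - END_1974["actual"]
# Effect of each factor alone: counterfactual endpoint minus actual endpoint.
surpluses_alone = END_1974["primary_balance"] - END_1974["actual"]
distortions_alone = END_1974["adjusted_rate"] - END_1974["actual"]
# Effect of removing both factors together.
joint = END_1974["combined"] - END_1974["actual"]
# Whatever the joint effect adds beyond the two separate effects.
interaction = joint - surpluses_alone - distortions_alone
# The remainder of the fall is growth net of undistorted rates.
growth_only = START_1946 - END_1974["combined"]
```

The pieces add up: 17 + 28 + 6 = 51 points from surpluses and distortions, plus 32 points from growth, gives the 83-point actual decline.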

Mechanisms and implications. Primary surpluses averaged 1.1% of GDP over 1947-1974 (peaking at 6.3% in 1948), then turned to persistent deficits. The peg (caps of 0.375% on bills to 2.5% on 30-year bonds) combined with post-1946 inflation surges (CPI averaging 7.1% in FY1947-1951) produced deeply negative ex-post real rates; the aggregate interest-rate adjustment x_t reached 13 points in 1947 and 8 points in 1951. Policy implication: the distortions are unlikely to recur (no peg/price controls planned, Fed committed to low inflation, shorter average maturity — down from 4.4 years in 1951 to 2.2 years in 2022 — blunts inflation’s effect), so substantially reducing today’s 97% (FY2022) ratio will likely require primary surpluses, which CBO projections suggest are not forthcoming.

In depth

Q1. What is the identification/counterfactual strategy and what are its main threats?

There is no causal identification in the econometric sense; the strategy is an accounting simulation of the debt-dynamics identity under counterfactual interest rates and primary balances, holding nominal GDP (and real GDP and undistorted real rates) fixed at historical values. Threats: (1) the undistorted peg-era real rates are unobserved and must be inferred from 1952-1961 ex-ante real rates; (2) the reverse maturity structure (weights w) is held at historical levels even though higher counterfactual debt would alter issuance; (3) general-equilibrium feedback is ignored: higher counterfactual debt would raise real rates and crowd out capital, lowering GDP, both of which would push debt/GDP even higher, so the authors interpret their paths as LOWER BOUNDS; (4) pre-1943 debt is not adjusted for surprise inflation because long-term expectations data do not exist before 1943, which the authors argue biases against finding a large inflation role.
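The accounting simulation can be sketched in a few lines. Dividing the identity D_t = (1+i_t)D_{t-1} - P_t by GDP gives d_t = (1+i_t)/(1+g_t) d_{t-1} - p_t, where g_t is GDP growth measured on the same basis (nominal or real) as the interest rate. The example is a rough illustration under stated assumptions: it uses the period-average r* and g from the overview, held constant within each period, rather than the paper's year-by-year series, and the constant 1.1%-of-GDP surplus is a stylized stand-in for the 1947-1974 average.

```python
def simulate_debt_ratio(d0, interest, growth, primary_surplus):
    """Iterate d_t = (1 + i_t) / (1 + g_t) * d_{t-1} - p_t.

    d0: initial debt/GDP ratio
    interest, growth: per-year rates as decimals (both nominal or both real)
    primary_surplus: per-year primary surplus as a share of GDP
    Returns the list [d_0, d_1, ..., d_T].
    """
    path = [d0]
    for i, g, p in zip(interest, growth, primary_surplus):
        path.append((1 + i) / (1 + g) * path[-1] - p)
    return path


YEARS_1 = 33  # 1947-1979
YEARS_2 = 43  # 1980-2022
rates = [0.023] * YEARS_1 + [0.028] * YEARS_2    # average undistorted real rate
growth = [0.035] * YEARS_1 + [0.026] * YEARS_2   # average real GDP growth

# "Combined"-style path: undistorted rates and zero primary balance throughout.
zero_balance = [0.0] * (YEARS_1 + YEARS_2)
combined_like = simulate_debt_ratio(1.06, rates, growth, zero_balance)

# Same rates, but with a stylized 1.1%-of-GDP surplus for the first 28 years.
surplus = [0.011] * 28 + [0.0] * (YEARS_1 + YEARS_2 - 28)
with_surplus = simulate_debt_ratio(1.06, rates, growth, surplus)
```

Because it relies on averages, the sketch reproduces the qualitative pattern (a decline while r* is below g, then a rise once the sign flips) rather than the paper's exact 70% and 84% figures. Comparing the two paths also shows why surpluses and the r - g term interact: a lower debt path leaves less debt for r - g to act on.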

Q2. How are the effects of the peg and surprise inflation distinguished, and can they be separated?

The adjusted-interest-rate scenario removes both jointly. The authors state it would be difficult to separate them cleanly because that requires measures of expected inflation during the peg period (1942-1951), and there are no data on long-term inflation expectations before 1951 or short-term expectations before 1947 (start of Livingston). For post-1952 debt, the surprise-inflation adjustment is pi_t minus the expectation formed when the security was issued; for peg-era debt the adjustment is the gap between the ex-post real rate and the assumed undistorted real rate.

Q3. What is the decomposition relative to Hall and Sargent (2011)?

Hall-Sargent decompose the 1946-1974 debt/GDP change into r-g and primary surpluses but do not ask how interest-rate distortions shape r-g. Replicating their approach (Table 2A), the authors attribute -48.1 points to r-g and -29.6 points to primary surpluses (the terms sum to -78 points, less than the actual -82.9 because of the debt-dynamics residual). The paper’s extension (Table 2B) splits the -48.1 r-g contribution into only -11.7 points from r*-g (undistorted) and -36.3 points from the distortion r-r*, with surpluses still -29.6. So most of the apparent ‘growth out of debt’ was actually interest-rate distortion.

Q4. Why do the Table 2 surplus contributions differ from the Table 1 scenario differences?

In Table 2 surpluses contribute -29.6 points, larger than the 17-point effect implied by the Table 1 difference between actual 1974 debt/GDP and the primary-balance scenario. The reason is an interaction: eliminating surpluses raises the debt path d_{t-1}, which magnifies the r-g term, so additional debt is partly eroded by r-g. The authors call the Figure 7 / Table 1 scenario paths the more precise representation.

Q5. How do the findings reconcile with Blanchard’s (2019) claim that r < g since 1979?

The authors find r > g on average since 1979 (even in the primary-balance counterfactual with actual ex-post rates), so debt/GDP would rise. The difference from Blanchard is purely measurement: (1) they use the government’s interest payments on outstanding debt — the rates set at issuance — whereas Blanchard uses current market yields (a weighted average of 1- and 10-year Treasury rates), which since 1979 have been lower because rates trended down; (2) the authors use pre-tax rates while Blanchard uses after-tax rates. Figure A.11 confirms: with the authors’ measure debt/GDP rises 1979-2022; with Blanchard’s pre-tax market yields it rises then falls back near its 1979 level; with his after-tax rates it falls significantly. The authors argue the rate paid by the government is the relevant one for the debt-dynamics identity, and that a natural baseline assumes debt has no net effect on tax revenue (so pre-tax rates apply).
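The gap between the rate the government actually pays and the current market yield can be illustrated with a weighted average over issuance years. The weights and rates below are hypothetical, chosen only to mimic a period of falling yields; the function name is illustrative.

```python
def effective_rate(issue_weights, issue_rates):
    """Average rate paid on outstanding debt.

    issue_weights[k]: share of today's debt that was issued k years ago
    issue_rates[k]: rate locked in on debt issued k years ago
    """
    if abs(sum(issue_weights) - 1.0) > 1e-9:
        raise ValueError("issuance shares must sum to 1")
    return sum(w * r for w, r in zip(issue_weights, issue_rates))


# Hypothetical case: yields have fallen one point per year for three years.
weights = [0.4, 0.3, 0.2, 0.1]             # issued 0, 1, 2, 3 years ago
rates_at_issue = [0.03, 0.04, 0.05, 0.06]  # rate fixed at each issuance date
paid = effective_rate(weights, rates_at_issue)
current_market_yield = rates_at_issue[0]
```

When yields trend down, older debt still carries the higher rates set at issuance, so the rate paid exceeds the current market yield. This is the direction of the measurement difference described above.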

Q6. What is a notable nuance about the post-1979 period in the primary-balance counterfactual?

The post-1979 rise in debt/GDP is LARGER in the primary-balance counterfactual (19 points, from 34% to 53%) than in the combined counterfactual (14 points). This is because inflation surprises since 1979 have on average been negative (post-Volcker disinflation, actual below expected), raising ex-post real rates and thus debt/GDP. It confirms that actual r has exceeded g since 1979.

Q7. What robustness checks are run?

(1) Undistorted peg-era real rates shifted by +/-0.5% and +/-1% across the whole term structure: 1974 combined debt/GDP ranges from 67% (-1%) to 81% (+1%) around the 74% baseline; 2022 ranges from 78% to 91% around 84% (Table A.2). (2) Pre-1962 interest measured by net interest times 1.1; using net interest directly gives 73% in 1974 and 83% in 2022 vs. 74% and 84% baseline. (3) The debt-dynamics residual epsilon (mainly Treasury cash balances) is held at historical values; setting it to zero gives a combined counterfactual of 78% in 1974 and 77% in 2022, showing the residual contributed -0.19% GDP/year on average over 1947-1974 and +0.25% over 1975-2022. (4) Term-structure shape assumptions and the GDP-deflator-vs-CPI expectation-error approximation are checked in the Appendix as reasonable.

Q8. What heterogeneity across the debt structure matters?

The reverse maturity structure is central: the share of debt with reverse maturities above five years peaked at 48% in 1951 (long-term WWII bonds), then fell, fluctuating between 10% and 25% from 1975-2022; average reverse maturity fell from 4.4 years in 1951 to 2.2 years in 2022. Shorter maturity means inflation surprises erode less debt — a reason later inflation surprises had smaller effects than the 1940s-1970s ones. T-bills (assumed unaffected by surprise inflation since rolled over at adjusting rates) and TIPS (post-1997, indexed) are excluded from the inflation-surprise adjustment. Non-marketable debt fell from 23% of total in 1960 to 3% in 2022; its reverse maturity structure is assumed constant after 1960.

Q9. What are the timing/measurement complications?

Unit is fiscal year (July-June before FY1977, October-September after), creating a ‘Transitional Quarter’ in Q3 1976 requiring special handling. Inflation is GDP-deflator growth. Pre-1970 deflator expectations are proxied from Livingston CPI forecasts assuming equal expectation errors for CPI and deflator. Ten-year expectations before 1968 are fitted from one-year expectations via a regression (1968-1997) with a negative coefficient (-1.549) on the change in smoothed one-year expectations, capturing long-term expectations lagging short-term moves.

Q10. What are the policy implications and their scope conditions?

Because the postwar debt reduction came largely from one-off distortions (the peg with price controls, and surprise inflation) unlikely to recur — and the Fed is committed to low inflation while shorter average maturity weakens inflation’s erosive power — economic growth alone is unlikely to resolve the current ~97% (FY2022) ratio. Substantial reduction will probably require primary surpluses, which CBO projects will not occur under current policy (large primary deficits forecast for three decades). Scope conditions: results are lower bounds (GE crowding-out omitted); they depend on the assumed undistorted real-rate term structure; the 2021-2022 inflation surge is again temporarily reducing debt/GDP.

Key Concepts

Expecting Floods: Firm Entry, Employment, and Aggregate Implications

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper studies how the expectation of rising flood risk — distinct from realized flood events — reshapes where firms locate, where workers live and how much they work, and what this implies for U.S. aggregate output. The motivation is climate-driven: roughly 6 million Americans lived within a 100-year flood zone in 1998, rising to 13 million by 2018, and FEMA floodplains are projected to grow about 45% by century’s end. Prior work largely studied actual floods or housing-price effects; this is among the first to examine firm entry and employment responses to anticipated risk.

Data and design: The authors digitize FEMA Special Flood Hazard Zone maps (historic Q3 maps tied to 1998 Flood Insurance Rate Maps, and 2018 National Flood Hazard Layer), measuring flood risk as the share of land area within flood zones at the county and ZIP-code (ZCTA) level. Average flood-zone share rose 1.5 percentage points from 1998 to 2018, with a 20-pp increase at the 90th percentile of ZIP-level changes. Firm entry/exit, employment, population and county real GDP come from Census Business Dynamics Statistics, ZIP Codes Business Patterns, and BEA; actual flood events come from the Dartmouth Flood Observatory. The baseline specification is a two-period (1998, 2018) fixed-effects regression with county (or ZCTA) fixed effects, state-by-year fixed effects, demographic/economic controls (female labor share, manufacturing share, population density, China import-penetration change), and a control for actual flooded area.

Main reduced-form findings: A one-standard-deviation (7-percentage-point) increase in flood risk over 1998-2018 reduced firm entry by 1.2%, employment by 1.2%, population by 0.8% (smaller than employment, implying both relocation and labor-supply margins), and real GDP by 2.4%. Firm exits also declined with higher risk (smaller magnitude), reflecting reduced business dynamism. A county at the 90th percentile of risk increase saw a 3.3% drop in firm entry. ZIP-level estimates are similar. An IV using the interaction of rest-of-state risk change with local geo-climatic conditions (rainfall, temperature, evaporation) yields comparable magnitudes (entry -1.2%, employment -1.4%, GDP -2.2%); a placebo (1990-1998 outcomes) test is insignificant. In sharp contrast, actual flood events had negligible effects on entry, exit, employment and population, but a one-SD (0.4) increase in flooded-area share lowered real GDP by 0.2% in the same year, driven by current-year shocks (lagged effects negligible).

Model and quantification: The authors build a spatial-equilibrium model (McFadden 1978 location choice, Krugman 1980 monopolistic competition) with M = 2,772 counties (96% of 2018 GDP), σ = 5, exit rate κ = 0.08. Flood risk operates through three channels: direct damage, an employment channel (relocation + endogenous labor supply), and a love-of-variety channel (fewer firms). Damage parameters are disciplined by reduced-form evidence (δ = 0.005, δκ = 0.003) and Barrage (2020) (η = 0.002); labor-supply elasticities φL = 1.55, φM = 0.83 are set by indirect inference targeting employment and population responses. Non-targeted moments (output, entry, exit) match the data.

Counterfactuals: Eliminating 2018 flood risk shows it reduced aggregate output by 0.52% (employment -0.31%, firm entry -0.30%, welfare -0.51%). Decomposition: direct damage -0.11% (21%), labor relocation 0%, labor supply -0.33% (63%), variety -0.08% (15%) — so about 80% of the loss is expectation-driven and 20% direct damage. Effects are highly unequal: top-5% and top-1% counties (by output loss) lost 7.9% and 13.9% of output. A projected 4.5% rise in at-risk properties (2020-2050) would cut output 0.12%. Extensions (entry costs in goods, interregional trade, capital and land) yield somewhat larger losses (0.57%, 0.62%, 0.67%). Policy implication: counting only direct damages badly understates disaster costs and the social cost of carbon, because firms and workers rationally adjust to anticipated risk.
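The percentage shares in the decomposition can be recomputed directly from the channel contributions quoted above; the snippet below is just that arithmetic.

```python
# Percentage-point contributions to the 2018 output loss, as quoted above.
channels = {
    "direct_damage": -0.11,
    "labor_relocation": 0.00,
    "labor_supply": -0.33,
    "variety": -0.08,
}

total = sum(channels.values())
# Each channel's share of the total loss.
shares = {name: value / total for name, value in channels.items()}
# Everything other than direct damage is an expectation-driven adjustment.
expectation_share = 1.0 - shares["direct_damage"]
```

The channels sum to the 0.52% total, and the expectation-driven share comes out at roughly 79%, which the text rounds to about 80%.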

In depth

Q1. What is the identification strategy, and what are the main threats to it?

The core design is a two-period (1998 and 2018) fixed-effects regression of log outcomes (firm entry, exit, employment, population, real GDP) on the share of land in FEMA flood zones, absorbing locality fixed effects (time-invariant characteristics like industry composition), state-by-year fixed effects (statewide growth/business cycles), demographic/economic controls, and a control for actual flooded area. The main threat is measurement error in FEMA risk maps: some underlying data are outdated, and political-economy incentives lead politicians and homeowners to resist map updates to avoid higher insurance premiums, so designations may reflect politics rather than true risk. A second threat is omitted local economic trends correlated with both risk and outcomes. The authors address measurement error with a Bartik-type IV (rest-of-state average risk change interacted with own geo-climatic features — satellite temperature, cumulative rainfall, evaporation), controlling for cumulative past flooded area. IV estimates are close to the fixed-effects ones (entry -1.2%, employment -1.4%, GDP -2.2%), with first-stage KP F-statistics around 63-66. A placebo/pre-trend test (regressing 1990-1998 changes on 1998-2018 risk changes, following Goldsmith-Pinkham et al. 2020) yields small, insignificant coefficients, arguing against omitted-trend confounding.
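With only two periods, a regression with locality fixed effects and state-by-year fixed effects is equivalent to regressing the 1998-to-2018 change in the outcome on the change in flood-zone share after demeaning both within state. The sketch below shows this on synthetic data. It omits the controls and the IV step, and the coefficient of -0.17 is a hypothetical value chosen so that a 7-percentage-point increase lowers the outcome by about 1.2%; it is not an estimate taken from the paper.

```python
import random
from collections import defaultdict


def within_state_slope(d_outcome, d_risk, state):
    """OLS slope of the outcome change on the risk change, demeaned by state."""
    groups = defaultdict(list)
    for idx, s in enumerate(state):
        groups[s].append(idx)
    num = den = 0.0
    for members in groups.values():
        mean_y = sum(d_outcome[i] for i in members) / len(members)
        mean_x = sum(d_risk[i] for i in members) / len(members)
        for i in members:
            num += (d_risk[i] - mean_x) * (d_outcome[i] - mean_y)
            den += (d_risk[i] - mean_x) ** 2
    return num / den


random.seed(0)
TRUE_BETA = -0.17  # hypothetical semi-elasticity used to generate the data
state, d_risk, d_outcome = [], [], []
for s in range(10):
    state_trend = random.gauss(0.2, 0.1)  # absorbed by state-by-year effects
    for _ in range(50):
        dx = random.gauss(0.015, 0.07)    # change in flood-zone share
        noise = random.gauss(0.0, 0.01)
        state.append(s)
        d_risk.append(dx)
        # County fixed effects drop out of the 1998-to-2018 difference.
        d_outcome.append(state_trend + TRUE_BETA * dx + noise)

beta_hat = within_state_slope(d_outcome, d_risk, state)
```

The estimated slope recovers the data-generating coefficient closely even though state-level trends differ, because those trends are removed by the within-state demeaning.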

Q2. What are the main mechanisms, and how are they distinguished empirically and in the model?

Three channels: (1) direct damage: realized floods lower firm productivity and firm survival; (2) employment channel: anticipated risk lowers real wages/amenities, prompting out-migration and reduced labor supply per household; (3) love-of-variety: fewer firms enter, reducing the variety component of welfare/output. Empirically, the authors distinguish flood risk (long-run anticipation) from flood events (short-run realization) by estimating both: risk strongly affects entry, employment, and population while events do not; events lower current-year GDP through productivity, whereas risk lowers GDP mainly through these adjustment margins. In the model, direct damages are calibrated from the actual-flood GDP and exit responses (δ, δκ); the employment and variety channels are separated in the counterfactual by sequentially allowing population shares, then labor supply, then variety to respond. The decomposition attributes -0.11% to direct damage, ~0% to labor relocation (offsetting in- and out-migration), -0.33% to labor supply, and -0.08% to variety.

Q3. Why does population fall less than employment, and why do firm exits decline?

Employment falls 1.2% while population falls only 0.8% for a one-SD risk increase, implying the response is not purely relocation — remaining households also reduce labor supply. This motivates introducing a positive labor-supply elasticity φL alongside migration elasticity φM, capturing ‘immobile labor’ (as in Autor et al. 2013) where some workers cut hours rather than move. Firm exits decline with higher risk even though floods mechanically raise closures, because higher risk deters entry so much that the stock of firms shrinks, lowering the base of firms that can exit — reflecting reduced business dynamism rather than greater firm survival.

Q4. What heterogeneity is documented?

Large regional dispersion. While national output fell 0.52%, the top-5% and top-1% counties by output loss lost 7.9% and 13.9% of output respectively (the abstract describes top-5% losses of 7-14%). The hardest-hit counties — coastal and riverine areas in southern and eastern regions (e.g., Cape May NJ, Marion County FL, Sharkey County MS) — lost population, labor supply per household, and firms (top-1% counties: -6.1% population, -4.7% labor supply per household, -10.8% firms). Conversely, mildly affected counties (some Midwestern) were ‘winners,’ gaining in-migration, more firm entry, and higher labor supply per worker. For the 2020-2050 projection, direct damages play a smaller relative role (12% vs 21% for 2018) because projected risk increases are more positively correlated with regional productivity, amplifying aggregate adjustment effects.

Q5. What robustness checks are run?

(1) Controlling vs. not controlling for actual flooded area leaves risk estimates stable. (2) ZIP-code-level regressions exploiting finer spatial variation give similar magnitudes (establishments -0.233, employment -0.240, payroll -0.221). (3) Restricting to counties with available Q3 (1998) FEMA maps gives qualitatively similar, slightly larger estimates (Appendix Table A.2); the authors conservatively use baseline estimates for calibration. (4) IV estimation and (5) placebo pre-trend tests as above. (6) Lagged flood shocks (Appendix A.4) have negligible effects, confirming floods act through current-year productivity. (7) Model non-targeted moments (output, entry, exit) match data, and model-data correlations of regional GDP, population, emp-to-pop ratio, and firm count are near unity. (8) The implied regional-population-to-real-wage elasticity φM(1+φL) ≈ 2.1 lies within the 1.1-2.5 range from Fajgelbaum et al. (2018).
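The elasticity in check (8) is a one-line calculation from the calibrated parameters; the snippet below simply verifies it against the quoted range.

```python
PHI_L = 1.55  # labor-supply elasticity (calibrated value quoted in the summary)
PHI_M = 0.83  # migration elasticity (calibrated value quoted in the summary)

# Implied elasticity of regional population to the real wage.
implied = PHI_M * (1 + PHI_L)
# Range attributed to Fajgelbaum et al. (2018) in the text above.
in_range = 1.1 <= implied <= 2.5
```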

Q6. What model extensions are explored and how do results change?

Three extensions, all yielding somewhat larger output losses than the 0.52% baseline: (1) entry costs paid partly/fully in final goods rather than labor: with α=1 the loss is 0.57%, because final-goods prices respond more to risk than wages; (2) interregional trade with traded/nontraded sectors: this requires a larger labor-supply elasticity (φL=1.72) to match data, giving a 0.62% loss; (3) capital (mobile, rented at constant global rate) and land (fixed, congestion force) in production: a 0.67% loss, since risk also lowers the capital-to-labor ratio (by 0.34%) as capital becomes relatively more expensive, outweighing land congestion (small land share). The authors read the modest size of these differences as evidence the simplified baseline captures the key forces.

Q7. How does the paper relate to the existing literature?

The paper contributes to climate-spatial-economics work (Costinot et al. 2016, Desmet et al. 2021, Alvarez & Rossi-Hansberg 2021, Rudik et al. 2021). Closest are three flood-aggregate studies: Desmet et al. (2021) on coastal-flooding costs via migration and local technology investment; Balboni (2019) on infrastructure misallocation under sea-level risk; Lin et al. (2021) on coastal housing construction. Differences: prior work focuses mainly on coastal land inundation from sea-level rise, whereas this paper uses historic flood-zone designation maps capturing overall flood risk and studies production damage rather than land loss; and it reconciles structural estimates with reduced-form evidence showing firm/worker responses to risk differ from responses to actual floods. Relative to Kocornik-Mina et al. (2020) (satellite-nightlight evidence that floods reduce output transiently), this paper confirms the short-run finding but shows risk has larger, longer-run effects via behavioral adjustment. It relates to Hino & Burke (2020) (same risk data; floods cut property values 1-2%), interpreting housing-price effects as amenity changes; their estimate implies a 0.3-0.6% utility loss, comparable to the paper’s calibrated amenity loss of 0.2%.

Q8. What are the policy implications and their scope conditions?

The central implication is that evaluations counting only direct flood damages substantially understate true costs, since about 80% of the 0.52% 2018 output loss comes from expectation-driven adjustments (labor supply, migration, fewer firms) rather than the 20% direct damage. Direct damages (-0.11%) match FEMA’s ~$17B/year (~0.1% of GDP) estimate, validating the model’s lower bound. Policies addressing climate damage — and estimates of the social cost of carbon — should incorporate firms’ and workers’ long-run general-equilibrium adjustments. Scope conditions: the analysis is U.S.-specific (chosen for systematic flood-risk data), uses establishments as ‘firms,’ abstracts from flood insurance (justified by near-actuarially-fair pricing evidence) and from explicit housing, treats unmapped areas as zero-risk, and assumes observed FEMA designations are the risk signal agents act on despite measurement error. The authors note the approach generalizes to other natural disasters.

Q9. What are notable caveats or limitations?

GDP data do not capture variety/welfare changes, so the love-of-variety channel matters for welfare but is invisible in GDP-based estimates. The amenity parameter η is not directly estimated but imported from Barrage (2020) (output-to-utility damage ratio ~3); the authors note η has little effect on national productivity impact because amenity mostly drives offsetting migration. Labor supply is assumed fixed before shocks (micro-founded by job-search frictions). Flood insurance and housing are not modeled explicitly. Risk is measured by flood-zone land share, which is converted to flood probabilities {rm} via a regression of 2015-2019 actual flooded shares on 2018 zone shares. The two-period long-run design limits dynamics, and counties without FEMA maps are assigned zero risk.

Key Concepts

Flood risk vs. flood events: The paper sharply separates anticipated flood risk (the share of local land in FEMA Special Flood Hazard Zones, a long-run signal firms/workers observe and act on) from realized flood events (the share of area actually flooded in a given year, from Dartmouth data). Risk drives firm-entry and employment relocation; events drive transient productivity/GDP losses.

Expectation effects (vs. direct damages): Output losses arising because firms and workers rationally adjust location, entry, and labor supply in anticipation of flood risk — comprising the employment and variety channels. In 2018 these accounted for about 80% (the employment channel 0.33% plus variety 0.08% of the 0.52% loss), four times the 20% from direct physical damage.

Employment channel: In the model, the mechanism by which higher flood risk lowers real wages and amenities, inducing both out-migration (relocation, ~0% net aggregate effect due to offsetting regions) and reduced labor supply per household (the dominant -0.33% component), governed by elasticities φM (migration) and φL (labor supply).

Love-of-variety channel: The output/welfare loss from fewer firms entering under higher risk, operating through the CES variety term (agglomeration force 1/(σ-1)). It reduced 2018 output by 0.08% and matters for welfare but is not captured in GDP data.

Direct damage channel: The component of flood losses from realized floods lowering firm productivity (parameter δ=0.005) and destroying a fraction of firms (δκ=0.003) plus amenity loss (η=0.002), calibrated from the short-run actual-flood reduced-form estimates; it caused a 0.11% output decline in 2018 (21% of the total).

Indirect inference calibration: The simulated-method-of-moments procedure (Gouriéroux & Monfort 1996) used to set labor-supply elasticities φL=1.55 and φM=0.83: running the same 1998-vs-2018 panel regressions on model-generated data and choosing elasticities so model employment and population responses to flood risk match the empirical coefficients.

Immobile labor: Following Autor et al. (2013), the model feature that some households respond to local flood risk by reducing labor supply rather than relocating, which is why employment falls more (1.2%) than population (0.8%) and motivates a positive labor-supply elasticity φL.

Firm dynamics, monopsony, and aggregate productivity differences

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Research question and motivation. Firms are larger and grow faster over the life cycle in high-income countries, while labor markets in poorer countries are less competitive (employers hold more wage-setting power). The paper asks how important employer labor market power (monopsony) is for explaining cross-country differences in firm dynamics and aggregate productivity. The novelty is that beyond the standard static misallocation-of-workers channel, monopsony also distorts selection into entrepreneurship and productivity-enhancing technology adoption, potentially making the losses larger than prior static estimates suggest.

Data and setup. Stylized facts come from the World Bank Enterprise Surveys (WBES), an establishment-level survey of non-agricultural, non-financial private firms with at least 5 full-time permanent employees, covering more than 90 countries from 2006 to 2021, merged with World Development Indicators GDP per capita (2017 constant USD). The estimation sample restricts to countries that ever had GDP per capita above 25,000 USD and to manufacturing firms with non-missing sales/workers/material/capital data, yielding 37,096 firm-year observations across 31 middle- and high-income countries (poorest: Kazakhstan, 19,615 USD in 2009; richest: Ireland, 91,791 USD in 2020). Local labor markets are defined as location-industry (2-digit ISIC v3.1) pairs. The model is a dynamic general-equilibrium neoclassical-monopsony model with occupational choice (entrepreneur vs. wage worker), endogenous productivity investment, and Card-et-al.-style taste-for-employer (amenity) differentiation that gives firms wage-setting power. It is calibrated to the Netherlands (GDP per capita 54,275 USD; median wage markdown 1.301, implying firm-level labor supply elasticity 3.318) via method of simulated moments.

Main quantitative findings. Empirically, moving from poorer to richer countries in the sample, average firm age triples from 11 to nearly 30 years; annualized firm growth rises ~1.6 percentage points per year per doubling of GDP per capita; the share of firms doing R&D more than doubles (from ~15% to >40%); product innovation rises from 20% to 80% and process innovation from 20% to 50%; and median wage markdowns fall (from ~2.25 at 25,000 USD GDP per capita — workers paid ~55% below marginal product — to ~1.25 at 60,000 USD — paid 20-25% below). The calibrated model matches a right-skewed firm-size distribution, life-cycle growth, employer turnover, age distribution, and R&D share (sum of squared deviations between empirical and simulated moments = 1.7%). In counterfactuals raising the markdown from 1.2 to 3, average firm growth shrinks by more than half (from ~150% to ~50%), average firm size falls from ~60 to ~45 employees, the innovating share halves (from ~40% to ~25%), and average firm productivity is ~20% higher in competitive markets. Differences in wage markdown alone account for 25% of observed cross-country TFP variation (model TFP std dev 0.051 vs. data 0.201), and no less than 11% across robustness checks. In a Netherlands-vs-Greece decomposition, about 85% of the model-implied TFP gap is attributable to lower technology adoption, ~9% to distorted selection into entrepreneurship, and ~6% to static employment reallocation.

Mechanisms and implications. Labor market competition acts as a “skill-biased” force favoring high-productivity firms through three channels: (i) static labor reallocation toward high-productivity, low-amenity firms; (ii) improved selection into entrepreneurship (low-productivity, high-amenity agents can no longer profitably attract workers as the firm-level labor supply elasticity ϵL rises); and (iii) higher returns to innovation. The policy implication is that raising labor market competition in less-developed economies could yield substantial productivity gains, and that prior static studies understate the cost of monopsony because they omit the dynamic investment/selection channels.

In depth

Q1. What is the identification/calibration strategy and what are the main threats to it?

The model is calibrated to the Netherlands using a mix of externally set and internally estimated (method-of-simulated-moments) parameters. Externally: model period = 1 year; σν (Gumbel scale) normalized to 1; β = 0.961 (4% annual rate); δw = 0.025 (40-year working life); revenue elasticity of labor ξ = 0.333 (estimated via control function in Section 2); labor supply elasticity ϵL = 3.318 backed out from median markdown 1.301 via ϵL = 1/(µ−1). Six parameters {c_f, c_x, p_i, p_n, σ_z, σ_a} are estimated by MSM. The markdown itself is a key input and is estimated as the ratio of marginal revenue product of labor to wage, with revenue elasticity ξ from a standard control-function approach. Threats: the markdown estimate drives the whole quantitative exercise; the WBES sample is truncated at firms with ≥5 employees (biasing toward larger firms), addressed by re-estimating with imputed moments; and the cross-country counterfactual attributes all variation in ϵL to labor market power while holding all other parameters at Netherlands values, so other cross-country differences are not separately identified.
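The mapping from markdown to elasticity described above can be sketched in a few lines. This is a minimal illustration, assuming only the two relations stated in the text (markdown µ = marginal revenue product of labor / wage, and ϵL = 1/(µ−1)); the function names are hypothetical and not from the paper. Small differences from the reported 3.318 presumably reflect rounding of the published markdown.

```python
def labor_supply_elasticity(markdown: float) -> float:
    """Firm-level labor supply elasticity implied by a markdown mu = MRPL / wage."""
    if markdown <= 1.0:
        raise ValueError("markdown must exceed 1 for a finite, positive elasticity")
    return 1.0 / (markdown - 1.0)


def wage_shortfall(markdown: float) -> float:
    """Share by which the wage falls short of marginal revenue product: 1 - 1/mu."""
    if markdown <= 0.0:
        raise ValueError("markdown must be positive")
    return 1.0 - 1.0 / markdown


# Markdowns quoted in the summary (Netherlands baseline, Greece counterfactual).
for label, mu in [("Netherlands", 1.301), ("Greece", 2.623)]:
    eps = labor_supply_elasticity(mu)
    gap = wage_shortfall(mu)
    print(f"{label}: elasticity {eps:.3f}, wage {100 * gap:.1f}% below MRPL")
```

The same `wage_shortfall` helper reproduces the overview's statement that a markdown of about 2.25 corresponds to workers being paid roughly 55% below marginal product.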

Q2. What are the three mechanisms and how are they distinguished quantitatively?

(1) Static labor allocation: lower competition raises marginal factor cost only for sufficiently high-productivity firms, reallocating employment toward less-productive, lower-paying employers. (2) Selection into entrepreneurship: when ϵL is low, amenities matter more for profits, letting low-productivity high-amenity agents profitably self-select into entrepreneurship. (3) Technology adoption: returns to innovation increase with ϵL, so weak competition lowers the share of firms investing. They are distinguished via a decomposition that sequentially fixes policy functions at benchmark levels: ~6% of the TFP loss is from employment allocation alone, ~85% from the distortion to innovation policy, and ~9% from distorted selection into entrepreneurship.

Q3. What heterogeneity across firms is documented?

Firms differ in entrepreneurial productivity z and amenity a. The average revenue product of labor (APL) rises with productivity and falls with amenities, and both gradients are much steeper under weak competition: the elasticity of APL with respect to productivity is 0.31 in the baseline (Netherlands) vs 0.79 in the counterfactual (Greece), and with respect to amenities -0.28 vs -0.81. High-productivity, low-amenity firms face the biggest barriers in less-competitive markets and stay inefficiently small; low-productivity, high-amenity firms are propped up. Innovation distortion is concentrated among high-productivity firms.

Q4. What robustness checks are run and what do they show?

Four main checks, each reported as the share of cross-country TFP variation explained (data std dev 0.201): (1) Productivity-amenity correlation — allowing entrants to draw correlated (z,a) with σ_za = 0.296 (matching Sockin 2024’s 0.622 wage-satisfaction correlation) lowers explained variation to ~15% (model std dev 0.030), because correlation reduces scope for reallocation. (2) Costs in terms of labor instead of final goods (per Klenow and Li 2025) gives ~22% (std dev 0.044). (3) Imputed firm-level moments covering all firms (not just ≥5 employees) gives ~14% (std dev 0.028). (4) Over-identified alternative identification using size/age/R&D shares and annualized growth gives ~11% (std dev 0.023). The headline range is therefore 25% baseline, no less than 11% across checks.
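The percentages in these checks can be reproduced from the reported standard deviations. The sketch below assumes, as the pairing of figures in the text suggests, that the "share explained" is the ratio of the model-implied TFP standard deviation to the data standard deviation; the dictionary labels are shorthand introduced here for illustration.

```python
DATA_TFP_STD = 0.201  # cross-country TFP standard deviation in the data

# Model-implied standard deviations quoted in the summary.
MODEL_TFP_STD = {
    "baseline": 0.051,
    "correlated productivity and amenities": 0.030,
    "costs in units of labor": 0.044,
    "imputed moments for all firms": 0.028,
    "over-identified alternative": 0.023,
}


def share_explained(model_std: float, data_std: float = DATA_TFP_STD) -> float:
    """Ratio of model to data standard deviation (assumed definition of the share)."""
    return model_std / data_std


shares = {name: share_explained(sd) for name, sd in MODEL_TFP_STD.items()}
for name, share in shares.items():
    print(f"{name}: {100 * share:.0f}% of cross-country TFP variation")
```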

The paper builds on static monopsony cost estimates: Berger et al. (2022, eliminating US labor market power raises average wage 48%, welfare +6% of lifetime consumption); Armangüé-Jubert et al. (2025, labor market power explains 15% of GDP-per-capita gap over development); Deb et al. (2022, less competition lowered US low/high-skill wages 12% and 11%); Amodio et al. (2025b, eliminating monopsony in Peru raises earnings 26%); Bachmann et al. (2022, monopsony caused a 10% aggregate productivity loss in East Germany). Its contribution is to add the entrepreneurial-selection and innovation channels, yielding larger losses than static studies, and to bridge the monopsony-cost literature with the misallocation literature (Restuccia-Rogerson, Guner et al., Hsieh-Klenow).

Q6. What are the policy implications and their scope conditions?

Raising labor market competition (higher firm-level labor supply elasticity) improves allocative efficiency, selection into entrepreneurship, and innovation, raising firm growth and aggregate productivity. Scope conditions: the quantitative results apply to middle- and high-income countries (sample restricted to those ever above 25,000 USD GDP per capita); the 25% headline depends on the assumption that initial productivity and amenities are independent (falls to ~15% under positive correlation); and the decomposition attributing 85% to innovation is specific to the Netherlands-vs-Greece comparison. The model treats labor supply elasticity differences as the sole varying parameter, so the counterfactuals isolate the labor-market-power channel rather than reproducing total cross-country income gaps.

Q7. What is the Netherlands-vs-Greece comparison specifically?

Greece has roughly half the GDP per capita of the Netherlands (29,000 vs 54,000 USD) and much weaker competition (wage markdown 2.623 vs 1.301, labor supply elasticity 0.616 vs 3.318). In the Greece counterfactual, average firm size is 26 vs 59 employees, life-cycle growth 84.5% vs 153%, average age 22.5 vs 30 years, and R&D investing share 18% vs 41%. Labor market competition differences explain 29% of the firm-size gap, 27% of the firm-age gap, and 74% of the R&D-share gap between the two countries.

Q8. What does the model get right that was not targeted?

The firm size and age distributions are not targeted yet are matched: in the data ~57.6% of firms have <20 employees and ~6.2% have >100; ~60% of firms are under 30 years old and ~10% over 60. The estimated parameters imply investing firms are 15 percentage points more likely to grow (p_i=0.649 vs p_n=0.499); innovation and operating costs equal ~43% and ~8% of average incumbent profits respectively; standard errors are small, indicating informative moments.

Key Concepts

Import Liberalization as Export Destruction? Evidence from the United States

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Research question and motivation. How does import liberalization affect a country’s export performance and welfare? Economic theory (Graham 1923, Ethier 1982, Krugman 1984) shows the answer hinges on whether production exhibits increasing returns to scale at the sector level. Krugman (1984) argued that with scale economies, import protection can be export-promoting because a protected industry expands, exploits scale economies, becomes more productive, and exports more — so conversely import liberalization is “export destroying.” The paper turns this logic into an empirical test: the sign of the import-liberalization-to-export relationship discriminates between constant-returns and increasing-returns trade models. Researchers otherwise lack tools to choose between these model classes, yet the choice matters greatly for multi-sector trade policy analysis.

Model and data. The authors build a multi-sector general-equilibrium gravity model generalizing Krugman (1980) to many countries/sectors with input-output linkages (as in Caliendo-Parro 2015). The model nests constant returns (Armington, σ→∞) and increasing returns. The “scale elasticity” is 1/(σ−1); the “output elasticity” of exports equals the trade elasticity (ε−1) times the scale elasticity, and is positive iff there are increasing returns. The empirical application exploits US Permanent Normal Trade Relations with China (PNTR), passed Oct 2000, which removed tariff-revocation uncertainty. Exposure is measured by Pierce-Schott’s NTR gap (log gap between non-NTR and NTR tariffs; mean 0.23, SD 0.13, range 0–0.59). Trade data are from CEPII BACI; the baseline sample covers exports from 23 OECD countries (including the US) to 141 importers across 444 NAICS goods industries, in long differences (1995–2000 pre-period vs 2000–07 post-period).

Main findings. Reduced-form: US export growth fell in higher-NTR-gap industries after PNTR. The raw Figure 1 slope is −0.51 (SE 0.057); a 10-log-point NTR-gap increase is associated with 5.0 log points lower annual export growth, and the NTR gap explains 18% of cross-industry variation. This is inconsistent with constant returns and implies increasing returns in US goods production. An offsetting input cost effect (lower imported-input costs) raises exports: PNTR reduced 2007 exports by 13% more for a 75th- vs 25th-percentile NTR-gap industry, but raised them 20% more for a 75th- vs 25th-percentile input-cost-shock industry; net effects range from −18% (Cigarettes) to +56% (Automobiles). A structural IV (NTR gap instrumenting output growth) yields an output elasticity of 0.74 (SE 0.41, preferred column).

Quantitative GE results. Calibrating the output elasticity to 0.821 (matching the −0.10 conditional NTR-gap effect; trade elasticity set to 5), PNTR raised aggregate US exports/GDP by 3.2%, decomposed into −1.8% real market potential (export destruction), +2.4% input cost, and +2.7% foreign demand. Aggregate export growth is 28% larger with scale economies than without, because scale economies make the input-cost effect almost five times stronger (2.4% vs 0.5%). Exports nevertheless declined in the most exposed sectors (Textiles & Leather, Other Manufacturing), shifting US comparative advantage away from high-NTR-gap sectors. Welfare: PNTR raised US real income 0.068% (real expenditure 0.087%); gains are ~30% smaller than under constant returns because a negative specialization effect (−0.15%) offsets part of a larger ACR openness gain (0.22%). Chinese gains exceed US gains more than tenfold.

In depth

Q1. What is the core theoretical test and why does the sign of the import-liberalization-to-export relationship identify returns to scale?

From the bilateral trade equation, the elasticity of exports to output equals the output elasticity (ε−1)/(σ−1), which is strictly positive iff there are increasing sector-level returns. Under constant returns (Proposition 1), conditional on foreign demand and domestic input costs, import liberalization does not affect exports (α1=0). Under increasing returns (Proposition 2), import liberalization shrinks domestic real market potential, lowers output, and — because productivity falls with output under scale economies — reduces exports to ALL destinations (α1<0), with the effect’s magnitude strictly increasing in the output elasticity. So estimating whether export growth falls in more-liberalized industries distinguishes the two model classes.

Q2. What is the identification strategy and its main threats?

A triple-difference: changes in US bilateral export growth by sector after PNTR relative to changes in other OECD exporters’ growth, identified from the NTR gap interacted with Post and a US-exporter dummy. The estimating equation (12) uses importer-exporter-industry, importer-exporter-period, and importer-industry-period fixed effects to absorb importer demand, common-across-exporter technology shocks, and industry trends in supply capacity and trade costs. The NTR gap is plausibly exogenous because variation stems mostly from Smoot-Hawley (1930) non-NTR tariffs, unlikely related to economic conditions 70 years later; any endogeneity from NTR tariffs being higher in weak-growth industries would bias against finding a negative effect. Threat 1: unobserved US-specific technology shocks negatively correlated with the NTR gap not captured by input/skill/capital intensity controls. Addressed by re-estimating at HS 6-digit level with NAICS-industry-exporter-period fixed effects (Table 3), still finding negative effects. Threat 2: US-China competition in third markets — if PNTR shifted China’s export basket toward US-type products in high-NTR-gap industries. Tested by interacting with China’s market share (Table 4); the quadruple interaction is positive and insignificant, ruling this out.

Q3. What are the three mechanisms and how are they distinguished empirically and quantitatively?

(1) Real market potential / export destruction: import liberalization lowers the US price index, makes the domestic market more competitive, shrinks real market potential and output, and (under scale economies) cuts productivity and exports — identified by the negative α1 on the NTR gap. (2) Input cost effect: lower imported-input costs cut production costs and raise exports — identified by α2 on the input-output-weighted upstream NTR gap (CostShock), found negative and significant (lower input costs → higher exports). (3) Foreign demand effect: GE expansion of global demand and the trade-balance link between imports and exports — absorbed by fixed effects in the regression but recovered in the calibrated model’s decomposition (equation 16). In GE: −1.8% (market potential), +2.4% (input cost), +2.7% (foreign demand), netting +3.2%.

Q4. What heterogeneity is documented?

Sector-level: the real market potential effect is negative in all goods sectors and stronger where the NTR gap is higher; the input cost effect is positively correlated with the NTR gap (due to heavy diagonal weight in the I-O table); the foreign demand effect is positive everywhere but uncorrelated with the NTR gap. On net, exports/GDP rise in 12 of 15 goods sectors but fall in the highest-NTR-gap sectors: Textiles & Leather falls 22% (−32% market potential, +8.5% input cost, +4.6% foreign demand), and exports decline in 3 of the 4 highest-NTR-gap sectors. Under constant returns, by contrast, export growth is positive in all sectors and weakly POSITIVELY correlated with the NTR gap, which is qualitatively the opposite pattern. The correlation between sector-level export growth with vs without scale economies is insignificant (excluding Textiles & Leather) or significantly negative (including it).

Q5. What robustness checks are run?

Appendix C checks robustness to: starting the post-period in 2001 instead of 2000; alternative NTR-gap definitions; aggregating exports across destinations; varying the exporter/importer/industry samples; allowing PNTR to affect domestic expenditure; and controlling for China import growth driven by non-PNTR shocks. An event study (equation 13, Figure 2) shows no NTR-gap/export relationship before 2000 and a negative one from 2001 until the 2007–08 financial crisis, ruling out pre-trends. The first-stage (Table 5) confirms higher-NTR-gap industries had lower OUTPUT growth (paralleling Pierce-Schott’s employment result). Alternative calibrations (Appendix D.5): without I-O linkages the market potential effect weakens but total export growth is roughly unchanged; allowing services scale economies raises US gains; combining Textiles & Leather with Other Manufacturing preserves results; using Bartelme et al. (2019) sector-varying elasticities still yields a negative specialization effect.

Q6. How is the output elasticity calibrated and how does it compare to the structural estimate?

The output elasticity for goods is calibrated to 0.821 by matching the simulated NTR-gap effect to the −0.10 conditional reduced-form estimate (Table 2, column i), with services output elasticity set to zero and trade elasticity (ε−1) set to 5 (Head-Mayer 2014). This is below the value of 1 implied by Krugman (1980) or the Pareto-Melitz model but close to the Bartelme et al. (2019) mean of 0.83. It is reassuringly close to the independent structural IV estimate of 0.74 (SE 0.41). The simulated effect becomes more negative as the output elasticity rises (consistent with Proposition 2 part ii), and its magnitude grows sharply as the elasticity approaches one; the model has a unique solution for output elasticities below 0.95.
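Because the output elasticity is defined as the trade elasticity (ε−1) times the scale elasticity 1/(σ−1), the calibrated values pin down an implied scale elasticity and σ. The sketch below is back-of-the-envelope arithmetic under that definition only; the implied numbers are not figures reported in the summary, and the function names are hypothetical.

```python
def implied_scale_elasticity(output_elasticity: float, trade_elasticity: float) -> float:
    """Scale elasticity 1/(sigma-1) implied by output elasticity = trade elasticity * scale elasticity."""
    if trade_elasticity <= 0.0:
        raise ValueError("trade elasticity must be positive")
    return output_elasticity / trade_elasticity


def implied_sigma(output_elasticity: float, trade_elasticity: float) -> float:
    """Invert scale elasticity = 1/(sigma-1) to recover sigma."""
    scale = implied_scale_elasticity(output_elasticity, trade_elasticity)
    if scale <= 0.0:
        raise ValueError("constant returns (scale elasticity 0) corresponds to sigma -> infinity")
    return 1.0 + 1.0 / scale


TRADE_ELASTICITY = 5.0  # epsilon - 1, as set in the calibration
for label, oe in [("calibrated", 0.821), ("structural IV", 0.74)]:
    print(label, round(implied_scale_elasticity(oe, TRADE_ELASTICITY), 4),
          round(implied_sigma(oe, TRADE_ELASTICITY), 2))
```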

Q7. How does the welfare decomposition work and why are gains smaller with scale economies?

Following Costinot-Rodríguez-Clare (2014), real-income gains decompose into an ACR term (changes in domestic expenditure share / trade openness) and a specialization term that exists only with scale economies (welfare from sectoral reallocation of employment, weighted by adjusted Leontief forward-linkage coefficients). With scale economies the ACR effect is +0.22% (vs +0.10% without), but it is partly offset by a −0.15% specialization effect, netting +0.068% real income, about 30% below the constant-returns gain. The specialization effect is negative because PNTR shifted resources toward services (weaker scale economies; goods output −0.55%, services +0.11%) and, more importantly per Appendix D.5, toward sectors with weaker FORWARD input-output linkages; cross-sectoral heterogeneity in scale economies alone contributes negligibly.
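The arithmetic of this decomposition can be checked directly. The sketch below uses only the rounded percentages quoted above and assumes the constant-returns gain equals its ACR term (since the specialization term exists only with scale economies); the variable names are illustrative.

```python
# Rounded figures from the summary, in percent of real income.
ACR_WITH_SCALE = 0.22          # ACR (openness) term with scale economies
SPECIALIZATION = -0.15         # specialization term, present only with scale economies
GAIN_CONSTANT_RETURNS = 0.10   # ACR term without scale economies (assumed to be the full gain)
REPORTED_NET_GAIN = 0.068      # reported US real-income gain with scale economies

# Sum of the two rounded components; differs slightly from 0.068 because of rounding.
net_from_components = ACR_WITH_SCALE + SPECIALIZATION

# How far the gain with scale economies falls short of the constant-returns gain.
shortfall_vs_constant_returns = 1.0 - REPORTED_NET_GAIN / GAIN_CONSTANT_RETURNS

print(f"net from components: {net_from_components:.3f}%")
print(f"shortfall vs constant returns: {100 * shortfall_vs_constant_returns:.0f}%")
```

The net gain stays positive, so the specialization loss reduces but does not overturn the openness gain.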

The paper extends Krugman (1984)’s partial-equilibrium oligopoly mechanism to a class of quantitative GE trade models (love-of-variety, external economies, Melitz-Pareto, or endogenous innovation, shown equivalent in Appendix A.3). Unlike prior scale-economy estimates (Antweiler-Trefler 2002, Lashkaripour-Lugovskyy 2018, Bartelme et al. 2019) and home-market-effect tests (Davis-Weinstein 2003, Costinot et al. 2019), it uses TRADE POLICY variation (not factor content, market size, or exchange rates) for identification and performs an ex-post policy analysis (echoing Goldberg-Pavcnik 2016). Relative to the PNTR/China-shock literature (Pierce-Schott 2016, Handley-Limão 2017, Autor-Dorn-Hanson 2013), it adds a new outcome, US EXPORTS and comparative advantage, and argues the ‘surprisingly swift’ manufacturing decline would have been smaller absent scale economies. It complements Juhász (2018)’s infant-industry evidence (Napoleonic France) by quantifying the export-destruction cost while showing PNTR’s net effect on exports and welfare is positive. Dick (1994) tested the same hypothesis cross-sectionally for 1970 US data but found little support.

Q9. What are the policy implications and their scope conditions?

The findings support the existence of the scale-economies channel traditionally invoked to justify protection: pre-PNTR import protection shifted US comparative advantage toward the most-protected industries, and in the calibrated model targeted import protection CAN promote sector-level exports — but not under constant returns. However, the export-destruction effect is dominated, for most sectors and in aggregate, by export-promoting channels (input cost, foreign demand); total export growth is even greater WITH scale economies; and the negative specialization effect is more than offset by traditional gains from trade, so US gains from PNTR remain positive (+0.068% real income). Scope conditions: results rest on the calibrated output elasticity (0.821) and trade elasticity (5); the model assumes constant markups and full employment, so welfare excludes pro-competitive effects (Jaravel-Sager 2020, Amiti et al. 2020) and employment effects (Autor-Dorn-Hanson 2013); it studies a single liberalization episode; and the analysis cannot distinguish among alternative SOURCES of increasing returns. The authors stress accounting for scale economies (or their absence) is a prerequisite for correctly evaluating sector-level trade flows and welfare.

Q10. What other notable findings or caveats appear?

PNTR is calibrated as a reduced-form openness shock (α5=0.43; equation 15), equivalent to a 13% average trade-cost reduction on US imports from China (SD 6.6% across industries) given trade elasticity 5, matching Handley-Limão’s 13-percentage-point estimate. The calibrated model covers 12 economies and 24 sectors (15 goods). Chinese gains exceed US gains more than tenfold (because the US was much larger in 2000, so PNTR was a bigger shock to China), and China’s nominal wage rose 6.0% relative to the US, contributing to factor-price convergence. For comparison, Caliendo-Parro (2015) find NAFTA raised US welfare 0.08% and Fajgelbaum et al. (2020) find the Trump trade war cut US real income 0.04%. The model in changes is solved via exact hat algebra, holding each country’s trade deficit as a constant share of global value-added (which induces the positive import-export link in the foreign-demand term).

Key Concepts

Means-Tested Transfers in the US: Facts and Parametric Estimates

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

Guner, Rauh, and Ventura document the scope, generosity, distributional impact, and time evolution of means-tested transfers to working-age US households, and provide parametric estimates of transfer functions for use in applied macroeconomics and public finance. The paper addresses three questions: How large are these transfers? How do they affect income inequality? How have they changed over time? The contribution is descriptive and empirical rather than structural; the paper does not estimate behavioral effects but rather characterizes the effective transfer schedule that households face.

The data source is the Survey of Income and Program Participation (SIPP), using five waves spanning 1998 to 2016. The benchmark analysis uses the 2014 wave (years 2013–2016). The sample is restricted to household-years in which the head is aged 25–54, is not self-employed, and does not switch marital status within the year — yielding 18,612 households and 38,375 household-year observations. Six programs are covered: TANF, SNAP, WIC, SSI, housing assistance, and Medicaid. For TANF, SNAP, WIC, and SSI, transfer values are observed directly. Medicaid values are imputed using regional HMO premium costs; housing values are imputed as the difference between Fair Market Rent and actual rent paid.

In the 2013–2016 benchmark period, approximately 35% of working-age households receive some means-tested transfer in a given year, and, conditional on receipt, the average household receives about $17,000 (in 2016 dollars), exceeding one-fourth of average household income. Unconditional total transfers decline steeply with income but in a non-monotone way: households with zero non-transfer income receive $7,500 in non-medical and $13,700 in Medicaid transfers ($21,000 total, or 26% of mean household income). Transfers dip for households with small positive incomes (creating a hump shape), then rise slightly before declining again. At the bottom income decile (0–10%), households receive on average $4,125 in non-medical transfers and $14,141 total. At the median income decile (50–60%), households receive $425 non-medical and $3,006 total. In the top decile, non-medical transfers are negligible ($169) and total transfers are $1,200. The decline in unconditional transfers with income is driven primarily by reduced coverage: conditional on receipt, transfer amounts are relatively stable across income levels, remaining above 15% of mean household income throughout the distribution. The extensive margin of coverage is 82% for zero-income households, 70% for the bottom decile, 29% at the median, and still 5% (non-medical) to 11% (including Medicaid) in the top decile.

Medicaid is the dominant program throughout. For zero-income households, Medicaid transfers are more than six times larger than the next-largest program (SNAP). Medicaid’s share of total transfers rises with income. As a single program, Medicaid reaches 31% of working-age households with an average conditional benefit of about $15,000 per recipient. SNAP covers 18% of households with conditional benefits of about $3,000.

Transfers substantially compress inequality. The pre-transfer Gini coefficient is 0.48 and falls to 0.42 when all transfers (including Medicaid) are included, and to 0.46 with non-medical transfers only. The pre-transfer 50-10 income ratio of 10.2 drops to 3.0 with all transfers and to 5.6 with non-medical transfers only. The variance of log income falls by nearly 36% (47 log points) with all transfers and by 21% with non-medical transfers. These equalizing effects are concentrated at the bottom of the distribution; for households at 10% of average pre-transfer income, total transfers more than double disposable income.
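To make the Gini comparison concrete, the sketch below computes the coefficient before and after adding transfers. The four-household data set is a hypothetical toy example, not SIPP data, and the rank-based formula is one standard way to compute the Gini; the paper's exact implementation (for example, its use of survey weights) is not described in this summary.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one household and positive total income")
    # Rank-weighted sum over the sorted incomes, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical toy data (NOT SIPP values): pre-transfer income and transfers
# that are concentrated at the bottom of the distribution.
pre_transfer = [0.0, 10.0, 20.0, 70.0]
transfers = [20.0, 5.0, 0.0, 0.0]
post_transfer = [income + t for income, t in zip(pre_transfer, transfers)]

gini_pre = gini(pre_transfer)
gini_post = gini(post_transfer)
print(f"pre-transfer Gini {gini_pre:.2f}, post-transfer Gini {gini_post:.2f}")
```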

Between 1998–1999 and 2013–2016, total unconditional transfers per household nearly quadrupled, from approximately 2% to 7.3% of mean household income (from about $1,535 to $6,000). Household coverage rose from 19% to 35%. The expansion is driven almost entirely by Medicaid; non-medical transfers rose only marginally in magnitude (from about 1.3% to 1.8% of mean income), though their coverage increased from 16% to 24% of households. Notably, over this period the concentration of non-medical transfers shifted upward in the income distribution: households with zero income received a smaller relative share in 2013–2016 than in 1998–1999, while shares for households in the second, third, and fourth deciles increased. Pre-transfer income inequality rose substantially over the period, with the Gini increasing from 0.40 to 0.48; the post-transfer Gini rose more moderately, from 0.38 to 0.42, indicating that transfer growth offset about half of the rise in the pre-transfer Gini.

For the parametric section, the paper estimates a flexible four-parameter Ricker-style function T(I) = exp(alpha) * exp(beta_0 * I) * I^beta_1 for positive income I (normalized by mean income), with a separate level parameter gamma at I = 0. This captures the hump-shaped pattern at low incomes and the rapid decline thereafter. Implicit benefit reduction rates derived from these estimates are large: earning one additional dollar when starting from zero income reduces total transfers by more than $11,000, as crossing from zero into positive income sharply reduces program eligibility. A more realistic $10,000 income increase reduces total transfers by more than $5,000 — an implicit marginal tax penalty exceeding 50%. Non-medical transfer penalties are somewhat smaller: the first dollar earned reduces non-medical transfers by more than $4,500, and a $10,000 income increase reduces them by about $3,300.
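The implicit benefit reduction rates quoted here are simply the transfer loss divided by the income gain. A minimal sketch using the rounded dollar figures from the text (which are stated as lower bounds or approximations, so the resulting rates are approximate too); the function name is hypothetical.

```python
def implicit_reduction_rate(transfer_loss: float, income_gain: float) -> float:
    """Transfers lost per additional dollar of income over a discrete income change."""
    if income_gain <= 0:
        raise ValueError("income gain must be positive")
    return transfer_loss / income_gain


# Moving from zero income to $10,000, using the figures quoted in the summary.
total_rate = implicit_reduction_rate(5_000, 10_000)        # "more than $5,000" lost
non_medical_rate = implicit_reduction_rate(3_300, 10_000)  # "about $3,300" lost

print(f"total transfers: at least {100 * total_rate:.0f}% implicit penalty")
print(f"non-medical transfers: about {100 * non_medical_rate:.0f}% implicit penalty")
```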

In depth

Q1. What is the identification strategy and what are the main threats to it?

The paper is descriptive, not causal — there is no causal identification strategy in the traditional sense. The authors document reduced-form facts about transfer receipt by income level and demographic group using SIPP microdata. The main methodological choices and data limitations are: (1) Medicaid and housing assistance values are imputed rather than directly observed — Medicaid is valued at regional HMO premiums, which may not accurately reflect the value recipients place on coverage; housing benefits are valued at the difference between state Fair Market Rent and actual rent paid, which can produce negative values (2.7% of cases, set to zero). (2) SIPP is known to under-report income at the top of the distribution relative to the CPS; the paper documents that income shares of the top quintile differ by about five percentage points between SIPP and CPS, largely due to SIPP’s poor measurement of asset income. This means the effective transfer schedule at the top of the income distribution may be somewhat distorted. (3) The SIPP was overhauled after 2016, precluding analysis of more recent waves and meaning the trends analysis ends in 2013–2016. (4) Self-employed households are excluded (~7% of households) as their income measurement is noisier.

Q2. How does the paper handle the non-linear hump-shaped pattern in transfers at low income levels?

The paper documents a hump-shaped pattern: transfers are positive at zero income, fall sharply at very low positive income (around the bottom 1% of the distribution), then increase modestly before declining monotonically. This arises because crossing from zero income to any positive income can reduce eligibility for several programs simultaneously. The parametric functional form, the Ricker function from fisheries biology, is specifically chosen to capture this pattern: for I > 0, T(I) = exp(alpha) * exp(beta_0 * I) * I^beta_1, where the power term I^beta_1 governs the shape near zero income and the exponential term exp(beta_0 * I) governs the decline at higher incomes. The zero-income level gamma is estimated separately as a discontinuity. The tight confidence intervals around observed income-percentile averages confirm that the fitted function closely tracks the data.
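The shape of this function can be explored with a short sketch. The parameter values below are hypothetical, chosen only to produce a hump, and are not the paper's estimates; the peak formula I* = -beta_1 / beta_0 follows from setting the derivative of log T(I) = alpha + beta_0 * I + beta_1 * log(I) to zero, and applies when beta_0 < 0 < beta_1.

```python
import math


def ricker_transfer(income, alpha, beta0, beta1, gamma):
    """T(I): gamma at I == 0, exp(alpha) * exp(beta0 * I) * I**beta1 for I > 0.

    Income and transfers are measured relative to mean household income.
    """
    if income < 0:
        raise ValueError("income must be non-negative")
    if income == 0:
        return gamma
    return math.exp(alpha) * math.exp(beta0 * income) * income ** beta1


def interior_peak(beta0, beta1):
    """Income at which T(I) peaks on I > 0; requires beta0 < 0 < beta1."""
    if not (beta0 < 0 < beta1):
        raise ValueError("a hump requires beta0 < 0 < beta1")
    return -beta1 / beta0


# Hypothetical illustration values (NOT the paper's estimates).
ALPHA, BETA0, BETA1, GAMMA = -2.0, -3.0, 0.2, 0.26

peak = interior_peak(BETA0, BETA1)
for income in [0.0, 0.001, peak / 2, peak, 2 * peak, 1.0]:
    value = ricker_transfer(income, ALPHA, BETA0, BETA1, GAMMA)
    print(f"I = {income:.4f}  T(I) = {value:.4f}")
```

With these illustrative values the function starts at gamma, drops at the first positive income, climbs to the interior peak, and then declines, which is the qualitative pattern described above.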

Q3. What heterogeneity by demographic group is documented?

The paper documents heterogeneity along three dimensions — marital status, number of children, and age of children — in each case reporting both unconditional and conditional transfer amounts and coverage by income decile. Key findings: (a) Marital status: Single-woman households with zero income receive 12% of mean household income in non-medical transfers and about 31% in total transfers. Married households with zero income receive 27% total, and single men receive 17.9% total. At higher income levels, married households can receive more in total transfers than single women, because Medicaid coverage is broader for families. Single-woman households show the highest coverage at very low incomes (88% receive some transfer), but married households lead in coverage at middle income levels. Single men show surprisingly high coverage even at relatively high incomes. (b) Number of children: Transfers increase substantially with children. A first-decile married household without children receives about 1.7% of average income in non-medical transfers and 9% total; with two or more children, non-medical transfers rise nearly five-fold for single-woman households in the same decile. (c) Age of children: Transfers decline as children age, but the magnitude of the age gradient is smaller than the number-of-children gradient.

Q4. How do conditional and unconditional transfers compare across the income distribution?

Unconditional transfers (averaged over all households including non-recipients) decline steeply with income, driven primarily by falling coverage rates. Conditional transfers (among recipients only) are much more stable. For zero-income households, total conditional transfers average $26,500 (32% of mean income) versus $21,000 unconditionally. In the bottom decile, conditional total transfers are about $21,000 or 26% of mean income. After the third income decile, conditional transfer levels stabilize and remain above 15% of mean income throughout most of the distribution. This means that once a household is enrolled in the transfer system, the amounts received are relatively constant regardless of where in the distribution they fall; the intensive margin differences are largely accounted for by Medicaid, which has high conditional values even at middle income levels.
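
The link between the two measures is an accounting identity: if non-recipients count as zero, the unconditional mean equals the coverage rate times the conditional mean. The sketch below backs out the coverage rate implied by the two zero-income figures quoted above. The implied rate is a derived illustration, not a number reported in this summary, and it assumes both averages use the same sample and weights.

```python
def implied_coverage(unconditional_mean, conditional_mean):
    """Coverage rate implied by: unconditional = coverage * conditional."""
    if conditional_mean <= 0:
        raise ValueError("conditional mean must be positive")
    return unconditional_mean / conditional_mean

# Zero-income households, total transfers, using the dollar figures quoted above.
coverage_zero_income = implied_coverage(21_000, 26_500)
print(f"implied coverage at zero income: {coverage_zero_income:.1%}")  # 79.2%
```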

Q5. What role does Medicaid play relative to non-medical programs?

Medicaid dominates the transfer system for working-age households by every measure. It reaches 31% of households in the benchmark period (the next largest program, SNAP, covers 18%). For zero-income households, Medicaid transfers are more than six times larger than SNAP (the next largest non-medical program). Medicaid’s share of total transfers grows with income: for zero-income households, total transfers are less than three times non-medical transfers; for households in the 50–60th percentile, this ratio exceeds six. In terms of aggregate spending, Medicaid rose from below 1% of GDP in 1980 to more than 3% in 2022, while non-medical transfers declined from 1.6% to about 1% of GDP over the same period. Almost the entire growth in household transfers between 1998 and 2016 is attributable to Medicaid expansion. Medicaid is also the most important single contributor to measured inequality reduction.

Q6. How do transfers affect income inequality and how has this changed over time?

In the 2013–2016 benchmark, total transfers reduce the Gini coefficient by 6 points (from 0.48 to 0.42) and the variance of log income by nearly 36%. The 50-10 income ratio falls from 10.2 to 3.0. Non-medical transfers alone reduce the Gini by 2 points (to 0.46) and the 50-10 ratio to 5.6. The impact is concentrated at the bottom of the distribution: transfers more than double the total income of households with pre-transfer income around 10% of the mean. Over time, pre-transfer inequality rose sharply, with the Gini going from 0.40 (1998–1999) to 0.48 (2013–2016) and the 50-10 ratio more than doubling from 4.19 to 10.2. Post-transfer inequality rose more mildly: the Gini increased from 0.38 to 0.42 (all transfers), and the 50-10 ratio remained stable at around 3 throughout. Excluding Medicaid, the moderating effect is weaker; the Gini rose from 0.39 to 0.46 on a post-non-medical-transfer basis.
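
As a rough illustration of the pre- versus post-transfer comparison, the sketch below computes an unweighted Gini coefficient from the mean absolute difference. The household incomes and transfers are hypothetical numbers invented for this example. The paper works with survey data, so an actual replication would presumably need survey weights, which this sketch omits.

```python
def gini(values):
    """Unweighted Gini via the mean absolute difference (O(n^2), fine for small n)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    abs_diff_sum = sum(abs(x - y) for x in values for y in values)
    return abs_diff_sum / (2 * n * n * mean)

# Hypothetical households, in thousands of dollars (NOT data from the paper).
pre_transfer = [0, 5, 20, 40, 60, 100, 150]
transfers = [21, 15, 8, 3, 1, 0, 0]  # concentrated at the bottom
post_transfer = [y + t for y, t in zip(pre_transfer, transfers)]

print(f"pre-transfer Gini:  {gini(pre_transfer):.3f}")
print(f"post-transfer Gini: {gini(post_transfer):.3f}")
```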

Q7. How has the concentration of transfers across income groups evolved over time?

A notable distributional shift occurred between 1998–1999 and 2013–2016. For non-medical transfers, the share accruing to households with zero income declined substantially, from about $9 of every $100 of transfers distributed in 1998–1999 to about $4 in 2013–2016. The share going to the bottom decile declined as well. In contrast, the share going to households in the second, third, and fourth income deciles increased. For total transfers including Medicaid, the pattern is similar but the shift is less pronounced, partly because Medicaid expansion was broad and reached middle-income working families. The authors interpret this as reflecting design changes in the transfer system: TANF (which targeted the very bottom) declined sharply while Medicaid (which reaches further up the distribution) expanded.

Q8. What are the implicit benefit reduction rates and why do they matter?

The paper derives implicit benefit reduction rates from the estimated parametric transfer functions. At zero income, earning the first dollar of income triggers a very large decline in transfers because eligibility for several programs is lost simultaneously. Specifically, earning $1 reduces non-medical transfers by more than $4,500 and total transfers by more than $11,000. This enormous implicit marginal tax reflects the discontinuity at zero income. For more realistic income increments, earning an additional $10,000 when starting from zero income reduces total transfers by more than $5,000 (over 50% implicit tax rate) and non-medical transfers by about $3,300. These findings are directly relevant for quantitative macroeconomic models that study labor supply and welfare, since the effective marginal tax on low-income workers entering employment is substantially higher than the statutory rate.
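
For reference, the rates discussed above can be written out from the functional form T(I) = exp(alpha) * exp(beta_0 * I) * I^beta_1 with T(0) = gamma. The block below is a sketch of that algebra. The symbol \overline{b} for the average rate over a discrete income step is notation introduced here, not the paper's.

```latex
% Marginal benefit reduction rate on the I > 0 branch
\[
  T(I) = e^{\alpha}\, e^{\beta_0 I}\, I^{\beta_1}
  \quad\Longrightarrow\quad
  -\frac{dT}{dI} = -\,T(I)\left(\beta_0 + \frac{\beta_1}{I}\right), \qquad I > 0 .
\]
% Average rate for a discrete move from zero income to \Delta > 0
\[
  \overline{b}(0,\Delta) = \frac{T(0) - T(\Delta)}{\Delta} = \frac{\gamma - T(\Delta)}{\Delta} .
\]
% Total transfers, using the dollar figures quoted above
\[
  \overline{b}(0,\ \$10{,}000) > \frac{\$5{,}000}{\$10{,}000} = 0.50 .
\]
```

The same arithmetic applied to non-medical transfers gives roughly $3,300 / $10,000, or about 0.33.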

Q9. How does the paper differ from prior work on parametric tax and transfer functions?

The closest antecedents are Gouveia and Strauss (1994), Heathcote, Storesletten, and Violante (2017) (who use the Benabou log-linear tax function), and Guner, Kaygusuz, and Ventura (2014) (who provide effective income tax estimates). Prior work either focused on taxes only or combined taxes and transfers into a single progressivity measure. This paper is the first to estimate effective transfer functions separately from the tax system, decomposed by program, by marital status, and by number of children. Relative to Guner et al. (2023), which assumed transfers decline linearly with income, this paper estimates a more flexible non-linear function that captures the hump at very low incomes. Relative to Ferriere et al. (2023), who propose a transfer function that increases then decreases with income, the current paper provides empirical estimates rather than a theoretical prescription. The functional form (a Ricker-style function with a separate parameter at zero income) is also more flexible than prior approximations.

Q10. What data limitations are noted and how do they affect comparability with other sources?

The paper compares SIPP income distributions with the CPS. Both surveys yield similar Gini coefficients and variance of log income, but SIPP shows higher income shares for the bottom quintiles and lower shares for the top quintile (a discrepancy of about five percentage points). This reflects SIPP’s weaker measurement of asset income, which is a larger component of total income as one moves up the distribution. The analysis excludes self-employed households (about 7%) because their income is harder to measure. The SIPP was overhauled after 2016, making cross-wave comparisons infeasible for later years; this means the paper cannot characterize the effects of post-2016 Medicaid expansion, the COVID-19 pandemic transfer surge, or recent SNAP reforms. For Medicaid, the imputation using regional HMO costs does not capture the insurance value as households themselves perceive it, a standard limitation in this literature also noted by Ben-Shalom et al. (2012) and Scholz et al. (2009), whose methods the paper follows.

Q11. What are the policy implications of the findings?

Several implications follow, subject to the scope conditions stated at the end: (1) The transfer system substantially reduces income inequality, but the lion’s share of the reduction comes from Medicaid. Policies that reduce Medicaid coverage would substantially raise measured inequality, particularly at the bottom of the distribution. (2) The implicit benefit reduction rates documented (above 50% for a $10,000 income gain at the bottom) generate large effective marginal taxes on low-income households entering employment, relevant for evaluating welfare-to-work policies and for calibrating labor supply elasticities in quantitative models. (3) Despite the large size of the system, the decline in TANF spending (from above 1% of GDP to 0.1%) means that unrestricted cash assistance to the very poorest has fallen sharply; the system has shifted toward in-kind and medical programs that provide less flexibility to recipients. (4) The shift in transfer concentration away from zero-income households toward the second through fourth deciles suggests that the system increasingly supports the working poor rather than the non-working poor, a structural change in the composition of welfare that quantitative models should incorporate. These implications pertain to households headed by working-age adults (25–54), are based on pre-2016 data, and exclude the institutionalized population and self-employed households.

Q12. What are the key features of the parametric function and how well does it fit the data?

The estimated function has the form T(I) = exp(alpha) * exp(beta_0 * I) * I^beta_1 for I > 0 and T(0) = gamma, estimated by non-linear least squares on income-percentile averaged data. The function is flexible enough to capture: (a) a strictly positive level at zero income; (b) an initial increase then decrease at very low positive incomes (the hump); (c) a decay toward zero at high incomes that can be faster or slower depending on beta_1. The fit is shown to be close — Figure 7 documents tight confidence intervals around mean transfers by percentile, confirming that a smooth function well approximates the data. Parameter estimates are provided for each individual program, for non-medical aggregates, for total transfers, and separately for married and single households and by number of children (in appendix tables C10–C12). The zero-income gamma parameter is notably small for TANF (0.00) and large for Medicaid (0.24) and total transfers (0.26), consistent with the descriptive findings on coverage.
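
The paper estimates the function by non-linear least squares. As a simpler stand-in, the sketch below exploits the fact that the positive-income branch is linear in logs, log T = alpha + beta_0 * I + beta_1 * log I, and fits it by ordinary least squares. This is a swapped-in technique (useful, for example, for NLS starting values), not the paper's estimator, and the data are synthetic and noise-free, generated from hypothetical parameters. The gamma parameter is not fitted here because it is simply the level at I = 0.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_log_ricker(incomes, transfers):
    """OLS of log T on (1, I, log I) for I > 0; returns (alpha, beta0, beta1)."""
    rows = [(1.0, i, math.log(i)) for i in incomes]
    y = [math.log(t) for t in transfers]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(3)]
    return tuple(solve_linear(xtx, xty))

# Synthetic percentile-style data from hypothetical parameters (NOT the paper's).
TRUE_PARAMS = (-1.0, -4.0, 0.2)  # alpha, beta0, beta1
incomes = [0.01 * k for k in range(1, 100)]
transfers = [
    math.exp(TRUE_PARAMS[0] + TRUE_PARAMS[1] * i + TRUE_PARAMS[2] * math.log(i))
    for i in incomes
]

alpha_hat, beta0_hat, beta1_hat = fit_log_ricker(incomes, transfers)
print(alpha_hat, beta0_hat, beta1_hat)
```

With real, noisy data the log-linear fit weights observations differently from NLS in levels, so the two sets of estimates would generally differ.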

Key Concepts

Means-tested transfer: In this paper, a government transfer program for which eligibility and benefit amounts are conditioned on household income and assets, targeting the non-retired working-age population. The six programs studied are TANF, SNAP, WIC, SSI, housing assistance, and Medicaid.

Intensive margin of coverage: The fraction of months in a given calendar year during which a household receives a positive transfer amount, as distinct from the extensive margin (whether the household receives any transfer at all during the year). The paper documents both margins separately.

Implicit benefit reduction rate (implicit penalty): The reduction in transfer payments associated with a marginal increase in non-transfer income, expressed as the derivative of the estimated transfer function with respect to income. In this paper the implicit penalty at zero income is very large because moving from zero to any positive income simultaneously triggers loss of eligibility in multiple programs.

Unconditional vs. conditional transfer: Unconditional transfers are averages computed over all households at a given income level, including non-recipients. Conditional transfers are averages computed only among households that actually receive a positive amount. The paper shows that the steep decline in unconditional transfers with income is almost entirely a coverage effect; conditional amounts remain relatively stable across the distribution.

Ricker transfer function: The parametric functional form T(I) = exp(alpha) * exp(beta_0 * I) * I^beta_1 adopted by the paper to fit the non-linear relationship between normalized household income and normalized transfer receipt for I > 0, with a separate parameter gamma for I = 0. Borrowed from the Ricker (1954) stock-recruitment model in fisheries biology and chosen for its flexibility in capturing the hump-shaped pattern at very low incomes.

Non-medical transfers: The aggregate of TANF, SNAP, WIC, SSI, and housing assistance — the programs that provide cash or in-kind support excluding health insurance. The paper distinguishes these from total transfers throughout to separate the role of Medicaid, which dominates all other programs in magnitude.

Medicaid imputation: The procedure used to assign a monetary value to Medicaid enrollment, following Scholz et al. (2009) and Ben-Shalom et al. (2012). Each enrolled household member is assigned the cost of a single HMO policy in their Census region (from the Kaiser Foundation Employer Health Benefits survey), with family policies or sums of individual policies used for multi-member households, and a 2.5× multiplier for elderly or disabled individuals to reflect higher medical needs.
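
A minimal sketch of this kind of imputation follows, under simplifying assumptions: it always sums single-policy costs over enrolled members (the rule for when a family policy is used instead is not spelled out in this summary, so it is omitted), and the regional premiums are hypothetical placeholders rather than Kaiser survey values. Only the 2.5 multiplier comes from the description above.

```python
# Hypothetical annual single-policy HMO premiums by Census region, in USD.
# Placeholder values for illustration; NOT taken from the Kaiser survey.
SINGLE_HMO_COST = {
    "Northeast": 6000.0,
    "Midwest": 5500.0,
    "South": 5200.0,
    "West": 5800.0,
}
HIGH_NEED_MULTIPLIER = 2.5  # elderly or disabled members, per the description above

def impute_medicaid_value(region, members):
    """Sum imputed values over household members.

    members: iterable of (enrolled, elderly_or_disabled) boolean pairs.
    Simplification: sums single policies; no family-policy pricing.
    """
    base = SINGLE_HMO_COST[region]
    total = 0.0
    for enrolled, high_need in members:
        if enrolled:
            total += base * (HIGH_NEED_MULTIPLIER if high_need else 1.0)
    return total

# One enrolled adult, one enrolled disabled adult, one non-enrolled member.
household = [(True, False), (True, True), (False, False)]
print(impute_medicaid_value("South", household))  # 18200.0
```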

Medical innovation and health disparities

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks why medical innovation can widen health disparities even when it unambiguously improves health for everyone who takes it. The authors argue that the standard access-versus-preferences dichotomy is a false one: disadvantaged patients can rationally forgo effective medications because treatment side effects interfere with work, and the income cost of not working is particularly severe for low-education workers who hold physically demanding, inflexible jobs. Health-maximizing and welfare-maximizing behavior are therefore not the same thing, and the gap between the two is systematically larger for lower-education individuals.

The empirical setting is the introduction of Highly Active Antiretroviral Therapy (HAART) for HIV in the mid-1990s. HAART was substantially more effective than prior mono- and combo-therapy at preventing AIDS progression and death, but it produced harsh physical side effects (fatigue, diarrhea, headache, fever). Data come from the Multi-Center AIDS Cohort Study (MACS), a semi-annual panel of men who have sex with men in Baltimore, Chicago, Pittsburgh, and Los Angeles, covering 1991–2003. After sample restrictions, the analysis uses 11,290 person-visit observations for 1,201 HIV-positive individuals aged 30–64, approximately 63% of whom hold a college degree or more. The study dichotomizes education into less-than-college versus college-or-more and tracks treatment choices, labor supply, immune-system health (CD4 count, with AIDS threshold at 250), physical ailments, income, insurance, and out-of-pocket medical expenditures.

The structural model is a lifecycle discrete-choice dynamic programming framework in which forward-looking individuals simultaneously choose treatment (no treatment, monotherapy, combotherapy, and post-1995 HAART) and full-time work or non-work each half-year period to maximize expected lifetime utility. Health and survival evolve stochastically as functions of prior health, treatment, and age. Utility is a function of consumption (income minus out-of-pocket expenses), ailments, and labor supply, with utility parameters allowed to differ by education. The model is estimated via maximum likelihood using nested backwards induction; the quasi-experimental introduction of HAART as an unanticipated shock helps identify utility parameters.

Key quantitative results: (1) HAART drastically reduced mortality for both groups (six-month mortality fell from 9% to 2% for less-educated men and from 6% to 1% for college graduates) and raised the probability of maintaining a high CD4 count from 62% to 78% (less-educated) and from 68% to 83% (college+). (2) Despite equivalent access (both groups face roughly 91–95% insurance coverage and similarly low out-of-pocket costs), lower-educated men adopted HAART at a lower rate (58% of post-HAART visits versus 66% for college graduates) and approximately five months later. (3) The structural utility parameters confirm that while the direct disutility of ailments is not significantly different across education groups, the disutility of working while experiencing ailments is substantially larger in magnitude for less-educated men (estimated parameter -2.73) than for college graduates (-1.97). (4) Measured as expected lifetime utility, HAART’s introduction increased value for low-CD4 men by 236.1% (less-educated) versus 176.6% (college+), but in absolute utility units the gains were larger for college graduates, establishing that HAART increased welfare inequality. (5) Decompositions show the largest single driver of the education gap in HAART value is the differential survival process; income differences also matter but financial access variables (insurance, out-of-pocket costs) explain little. (6) A simulated six-month HAART mandate improves health (by 1.7 percentage points more for less-educated men) but reduces expected lifetime value by 2.8% for the less-educated versus 1.4% for college graduates, and reduces employment by 4.1% versus 1.6%, as mandated HAART forces men into ailment-producing treatment whose side effects they cannot manage alongside work. (7) A counterfactual $10,000-per-six-months non-labor income subsidy (similar to COVID-19 transfer policies) reduces work by 31–49% for less-educated men and by 25–39% for college graduates, while inducing an 81.2% increase in HAART take-up among less-educated men in good health who were not previously on treatment (from 5% to 9% baseline probability), and a 44.5% increase for similar college graduates (8% to 11%). For men with AIDS-level CD4 counts not on treatment, the policy raises the probability of being healthy next period by 12.6% for less-educated men and 5.3% for college graduates.

The central mechanism is a wedge between health and welfare that is steeper for disadvantaged workers: occupational conditions make it harder to work while experiencing side effects, so the opportunity cost of HAART compliance is higher. This means effective medical innovation—precisely by creating more severe side effects than older regimens—can widen welfare inequality even as it compresses mortality gaps. Clinical trials that randomize assignment to treatment and measure health outcomes will register the innovation as a success while masking the distributional welfare costs. Policy interventions that reduce the cost of not working (income transfers, labor market restructuring) can simultaneously increase HAART take-up and improve health, with effects concentrated among the disadvantaged.

In depth

Q1. What is the main identification strategy and what are the key threats to identification?

The model is estimated by maximum likelihood using nested backwards induction over observable state variables. A key identifying variation is the quasi-experimental, unanticipated introduction of HAART in 1995, which shifts the choice set mid-panel and allows the authors to trace behavioral responses to an exogenous change in treatment efficacy and side-effect profiles. Disutility of ailments and work parameters are identified by conditional choice probabilities given state variables (health, ailment status, prior treatment) and by comparing behavior before and after HAART availability. The authors follow Magnac and Thesmar (2002) to establish that under the distributional assumptions (Type I EV shocks, fixed discount factor β=0.95) and the normalization imposed, the likelihood has a unique maximum. The main threats are: (a) the assumption that individuals were surprised by HAART (no forward-looking anticipation), which simplifies the model but is explicitly noted—Hamilton et al. (2021) show that incorporating individual expectations substantially complicates the framework; (b) the exclusion of unobserved heterogeneity in the utility function, though specifications including it produce very small probabilities of a second type (below 5%); (c) the absence of borrowing and saving, which could allow more educated individuals to smooth consumption across treatment cycles—the authors note this would bias downward the disutility of working with ailments for higher-educated individuals, meaning the estimated cross-education difference in that parameter is a lower bound; (d) the sample is restricted to white men in four cities, limiting external validity; and (e) the education dichotomy collapses heterogeneity within education groups.
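
To make the mechanics concrete, here is a toy sketch of backward induction with Type I extreme value shocks, where the expected maximum has the closed log-sum form and choice probabilities are logit. It is a deliberately stripped-down illustration, not the paper's model: there is no survival risk, no stochastic ailment state, no insurance or expenditure process, and treatment is a single binary choice. All payoffs and transition probabilities are hypothetical. Only the discount factor (0.95) and the two values of the work-with-ailments parameter (-2.73 and -1.97) are taken from the summary, and here that parameter penalizes working while treated as a stand-in for working while experiencing ailments.

```python
import math
from itertools import product

BETA = 0.95           # discount factor, as stated in the summary
EULER = 0.5772156649  # mean of a standard Type I extreme value shock
ACTIONS = list(product((0, 1), (0, 1)))  # (treat, work)

def flow_utility(health, treat, work, theta_work_ail):
    # Hypothetical payoffs: income from work, a health bonus, an ailment cost of
    # treatment, and a penalty theta_work_ail (< 0) for working while treated.
    return 1.0 * work + 0.5 * health - 0.3 * treat + theta_work_ail * treat * work

def prob_high_next(health, treat):
    # Hypothetical transitions: treatment raises the chance of high CD4 next period.
    return {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.8}[(health, treat)]

def solve(horizon, theta_work_ail):
    """Backward induction; returns ccp[t][health] = {action: probability}."""
    v_next = [0.0, 0.0]  # terminal continuation value
    ccp = [None] * horizon
    for t in reversed(range(horizon)):
        v_now, ccp_t = [], []
        for health in (0, 1):
            vals = []
            for treat, work in ACTIONS:
                p = prob_high_next(health, treat)
                cont = p * v_next[1] + (1 - p) * v_next[0]
                vals.append(flow_utility(health, treat, work, theta_work_ail) + BETA * cont)
            top = max(vals)  # subtract the max for numerical stability
            logsum = top + math.log(sum(math.exp(v - top) for v in vals))
            v_now.append(EULER + logsum)  # expected max under Type I EV shocks
            ccp_t.append({a: math.exp(v - logsum) for a, v in zip(ACTIONS, vals)})
        v_next, ccp[t] = v_now, ccp_t
    return ccp

def treat_prob(choice_probs):
    return sum(p for (treat, _), p in choice_probs.items() if treat == 1)

less_educated = solve(20, theta_work_ail=-2.73)
college = solve(20, theta_work_ail=-1.97)

# In the final period only the treat-and-work payoff differs across the two runs,
# so the treatment probability is unambiguously lower with the larger penalty.
print(treat_prob(less_educated[-1][0]), treat_prob(college[-1][0]))
```

In an estimation routine, these choice probabilities would enter a likelihood that is maximized over the utility parameters, with the model re-solved at each candidate parameter vector.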

Q2. What are the main mechanisms through which education moderates the health-welfare tradeoff, and how are they distinguished empirically?

The paper identifies two nested channels. First, the estimated structural utility parameter for working while experiencing ailments is larger in magnitude for less-educated men (θ = -2.73) than for college graduates (θ = -1.97), indicating greater disutility from combining work and side effects. The paper argues this reflects occupational sorting: lower-education men are significantly more likely to hold manual occupations (occupation score 5.12 versus 4.49 for college graduates, where higher scores indicate more manual tasks per Autor et al. 2003), making physical side effects especially incompatible with job performance. Second, lower-educated men have lower incomes ($15,373 versus $22,290 per half-year for less-educated versus college-educated, pre-HAART), so the income cost of not working is larger in relative terms, creating stronger incentives to maintain employment even at the cost of forgoing treatment. The authors decompose the relative contribution of these mechanisms in the non-labor income subsidy simulation: when they give lower-educated men the income process of higher-educated men (Appendix Figure A1), the gap in behavioral response narrows but does not close; when they give lower-educated men the disutility parameters of higher-educated men (Figure A2), similarly the gap narrows but remains. Both mechanisms are jointly operative.

Q3. What heterogeneity in HAART take-up and welfare value is documented?

Education is the primary heterogeneity dimension examined. Post-HAART, lower-educated men used HAART in 58% of observations versus 66% for college graduates, were slower to start (5 months later on average), and less likely to ever use it (67% versus 81%). Health status interacts with education: low-CD4 men gain more in percentage terms from HAART because they are more in need of its health-improving effects (236.1% gain for less-educated low-CD4 versus 176.6% for college-educated low-CD4; 85.7% versus 76.3% for high-CD4 men, with college graduates gaining more in absolute utility units throughout). The welfare cost of a treatment mandate is higher for less-educated men (2.8% lifetime value decline versus 1.4%), and the employment reduction induced by the mandate is also larger for them (4.1% versus 1.6%). In the income subsidy simulation, low-CD4 men not on any medication show the largest health response. The paper does not examine race/ethnicity heterogeneity, having excluded non-white individuals from the analysis due to sampling methodology concerns.

Q4. What does the value decomposition reveal about why HAART benefited more-educated men more?

Table A17 sequentially replaces the processes and parameters of lower-educated agents with those of higher-educated agents. Giving lower-educated men the income process of college graduates narrows but does not close the gap—income is not the primary driver. Replacing the insurance and medical expenditure processes slightly reduces value for less-educated men relative to giving them only the income process, because more-educated individuals actually have somewhat higher out-of-pocket costs. Changing the health and ailments processes has modest positive effects. The largest single contributor to closing the education gap is the survival process: less-educated men face much higher baseline mortality, which depresses the expected present value of all future flows including the gains from HAART. This suggests that policies targeting survival differentials (e.g., access to other health services) could partially close the HAART welfare gap. Finally, replacing the utility parameters mechanically closes the remaining gap, but preferences are less amenable to direct policy intervention than the survival process.

Q5. What do the treatment mandate simulations show, and why do they matter for evaluating clinical trials?

A six-month HAART mandate mimics randomized assignment to treatment in a clinical trial. Relative to baseline it improves health: the probability of high CD4 rises, by 1.7 percentage points more for less-educated men (reflecting their lower baseline HAART use), which would appear a policy success from a health-only perspective. However, expected lifetime utility falls by 2.8% for less-educated men and 1.4% for college graduates, because mandated HAART forces individuals into ailment-inducing treatment they would not have chosen, inhibiting labor supply. Employment falls by 4.1% for less-educated men versus 1.6% for college graduates. Appendix analyses removing the ailment-producing properties of treatment largely eliminate both the welfare cost and the employment effect, confirming that ailments are the mediating channel. This shows that clinical trials, which typically report health endpoints and do not measure welfare or distributional consequences, can mask the costs that effective but side-effect-heavy treatments impose, and that those costs fall disproportionately on less-advantaged patients.

Q6. What does the non-labor income subsidy simulation show, and which groups respond most?

A permanent $10,000-per-six-months increase in non-employment income (approximately 50% of median income, calibrated to COVID-era transfer policies) induces labor force exit across all groups but concentrates its health-promoting effects among disadvantaged men who were not already on HAART. Among relatively healthy (high-CD4) less-educated men not using any medication, HAART take-up rises by 81.2% (from 5% to 9%); the corresponding figure for college graduates is 44.5% (from 8% to 11%). Among men with AIDS-level (low) CD4 not on treatment, the probability of being healthy next period increases by 12.6% for less-educated men and 5.3% for college graduates. Men already on HAART—who are unlikely to change treatment regardless—show little response. The policy has small but positive health externalities beyond the immediate recipients, since people on antiretrovirals have lower viral loads and lower transmission risk. Decomposition simulations (Appendix Figures A1–A2) show that both the income-level channel and the disutility-of-work-with-ailments channel independently contribute to the larger lower-education response, with neither alone sufficient to fully explain the differential.

Q7. How does the paper relate to prior work?

The paper is most closely related to Papageorge (2016, Quantitative Economics), which uses the same MACS data and setting to link non-uptake of HAART to labor supply and side effects. The key difference is scope: Papageorge (2016) focuses on individual-level mechanisms; the present paper’s goal is to characterize distributional differences in the health-welfare tradeoff across education groups and to show that innovation can exacerbate existing inequality. Chan, Hamilton, and Papageorge (2016, Review of Economic Studies) also use the MACS setting to study the value of medical innovation, and Hamilton, Hincapié, Miller, and Papageorge (2021, International Economic Review) examine the diffusion of HAART. Relative to the sociological fundamental cause theory literature (Link and Phelan 1995; Phelan et al. 2010), which documents that medical innovations tend to widen health disparities, the present paper provides a structural quantification of the specific mechanisms and their relative magnitude. Relative to papers attributing health disparities primarily to access barriers (insurance, cost), the paper provides evidence that for this sample—where insurance coverage exceeds 91% even for less-educated men and HIV drugs are inexpensive—access explains little of the educational disparity in HAART use or health outcomes.

Q8. What are the policy implications and their scope conditions?

The core implication is that policies reducing the cost of not working—income transfers, disability benefits, worker protections—can raise HAART adoption and improve health among disadvantaged patients, precisely the group for whom standard health-access policies have limited traction. The non-labor income subsidy simulation suggests that the health improvements are modest in absolute magnitude (a 0.2% rise in probability of being healthy next period for the best-responding group among high-CD4 non-HAART users, and 13% for low-CD4 non-HAART users), but there are unmodeled positive externalities through reduced transmission risk that would multiply the social return. Scope conditions: (1) The sample is white men who have sex with men in four U.S. cities during 1991–2003, enrolled in a prospective cohort study; generalizability to other populations (women, racial minorities, other diseases) is uncertain. (2) The income subsidy that triggers HAART take-up must be large enough to induce labor force exit; a $10,000 per-six-months transfer is needed to generate the simulated behavioral response, larger for higher-income workers. (3) The paper explicitly notes that drug costs and insurance are not binding constraints in this sample, and the policy conclusions may differ in settings with weaker drug coverage. (4) Mental health is excluded from the model; the paper shows depression variables have smaller effects on treatment choice than the physical mechanisms included, but mental health could independently affect some populations’ response. The paper’s conclusions extend to other conditions where effective treatment has disabling side effects and disadvantaged patients hold inflexible physical jobs—the authors invoke COVID-19 as a contemporary analog.

Q9. What robustness checks are conducted?

The authors report several robustness exercises. Treatment transition results are shown to be robust to defining the HAART introduction period as survey visit 23 or 25 rather than 24. Ailment specifications are noted to be robust to varying the type or frequency of ailments counted (citing Papageorge 2016 for this). Specifications including unobserved heterogeneity in the utility function produce very small second-type probabilities (below 5%), arguing against its inclusion. The treatment mandate simulations are run under three alternative shock-assignment methods (2 draws, 8 draws, and the preferred 2-draw approach), with results consistent across methods on the main welfare-versus-health asymmetry. Appendix Tables A19 and A20 remove ailments from all medications and from HAART only, respectively, confirming that the welfare cost of mandates is driven by treatment-induced ailments. Appendix Figures A1 and A2 mechanically decompose the education-differential response to the income subsidy by replacing income processes and disutility parameters separately, confirming that both channels are active. The model fit (Table A9) shows overall employment (66% model, 66% data) and HAART use (33% model, 36% data) closely matching, though the model slightly over-predicts medication use among low-CD4 individuals.

Q10. Why does the paper focus on white men only, and what does this imply for interpretation?

The authors drop 1,098 observations from 390 non-white individuals because of concerns about the sampling methodology used to recruit the refresher sample for those individuals—specifically, non-white participants entered the panel via a different selection process that could confound estimates. The paper does not investigate racial disparities in HAART take-up, which are also well-documented in the literature. This is a significant limitation because HIV/AIDS has disproportionately affected Black men in the United States, and the mechanisms the paper identifies—occupational sorting, income constraints, disutility of working with ailments—may operate differently or more intensely along racial lines. The authors acknowledge this limitation and note that the structural framework could in principle be applied to other groups if appropriate data were available.

Key Concepts

Health-welfare tradeoff: In this paper, the wedge between the action that maximizes health (taking effective medication despite side effects) and the action that maximizes lifetime utility (avoiding medication to remain employed and maintain income). The tradeoff is not a bias or error but a rational response to economic constraints, and it is wider for less-educated individuals whose occupational conditions make working with side effects especially costly.

HAART (Highly Active Antiretroviral Therapy): A combination antiretroviral HIV treatment introduced in the mid-1990s, far more effective than prior mono- or combo-therapy at improving CD4 count and preventing AIDS-level immune decline and death. In this paper’s model, HAART serves as the innovation whose adoption the authors study: it is more efficacious but produces harsher side effects than earlier treatments, and its introduction is treated as an unanticipated aggregate shock.

Disutility of working with ailments: A structural utility parameter (θ_2,f=0) capturing how much worse-off an agent feels from working while experiencing physical ailments (fatigue, diarrhea, headache, fever). Estimated at -2.73 for less-educated men and -1.97 for college graduates, this parameter is the primary driver of the differential health-welfare tradeoff across education groups and explains why side-effect-bearing treatments like HAART are disproportionately avoided by lower-education workers.

Treatment mandate simulation: A counterfactual in which all agents are assigned to HAART for six months (eliminating choice among other treatment options), used to mimic randomized assignment in a clinical trial. The simulation is designed specifically to illustrate that health improvements observable in a clinical trial coexist with welfare reductions and employment disruptions that would not be captured in standard trial endpoints.

Fundamental cause theory: A sociological framework (Link and Phelan 1995) arguing that socioeconomic status is a ‘fundamental cause’ of health disparities that persists despite or is even amplified by medical innovation, because more advantaged individuals are better positioned to adopt and benefit from new treatments. The paper provides structural economic microfoundations for this theory by quantifying the mechanisms through which HAART’s introduction widened the welfare gap.

Non-labor income subsidy: A counterfactual policy simulation in which non-employment income is raised by $10,000 per six months (approximately 50% of the median person’s income), modeled after COVID-19 transfer policies. In the paper’s model this policy reduces employment but increases HAART take-up and health improvements particularly for less-educated HIV-positive men who were previously forgoing treatment to maintain income from work.

Source text origin: This is a note on provenance rather than a concept from the paper. The full working paper text was obtained from NBER Working Paper No. 28864, not from the abstract alone.

Mortgage securitization and information frictions in general equilibrium

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper develops a quantitative general equilibrium model of the U.S. housing finance system that jointly determines mortgage credit and mortgage-backed security (MBS) issuance, with the aim of measuring how information frictions in the securitization market amplify aggregate credit cycles. The central motivation is the tight co-movement of mortgage credit and MBS issuance documented in HMDA data from 1990 to 2016: from 2000 to 2019, originators sold or securitized roughly 70 percent of all residential mortgages within the first year of origination, making securitization the dominant source of funding for new lending. When this source of liquidity collapsed during the Great Financial Crisis (GFC), aggregate residential mortgage credit contracted by roughly 41 percent and RMBS issuance contracted by roughly 37 percent on average from 2008 to 2013.

The model is a discrete-time, infinite-horizon DSGE framework with three types of agents: an impatient representative borrower household, a unit-mass continuum of heterogeneous lenders, and a government. Borrower households consume non-durables and housing services, take on long-term fixed-rate mortgages modeled as perpetuities with geometrically declining payments, and can endogenously default when idiosyncratic housing valuation shocks erode their equity. Lenders face stochastic loan origination costs drawn i.i.d. from a continuous distribution, can privately identify the quality of loans in their portfolios, and access a securitization market modeled after the to-be-announced (TBA) forward market for agency MBS — the largest liquid MBS market in the U.S. The TBA market features anonymous, non-exclusive trades at a single pooling price, and the “cheapest-to-deliver” convention gives sellers the incentive to offload their lowest-value loans, giving rise to a classic Akerlof-style adverse selection problem. The government captures GSE credit guarantees through a state-contingent subsidy to MBS buyers, financed by a distortionary fee on originators and lump-sum taxes on households. The model is calibrated to match key cross-sectional moments of the HMDA dataset for 1990 to 2006, including the distribution of lending: the top 1 percent of originators accounted for 62 percent of lending and the top 10 percent for 89 percent. These moments of market concentration are central to quantifying the amplification channel.

Two novel theoretical features distinguish this framework. First, the mortgage interest rate and the security price are jointly determined in equilibrium — a “joint price determination” property. Second, the severity of information frictions is itself an endogenous function of equilibrium prices, the household default rate, and lenders’ trading decisions. When household credit risk rises, more loans become low-quality, deteriorating the average quality of the pool offered by sellers. MBS buyers, aware of sellers’ incentives, demand a larger adverse selection discount; security prices fall; fewer lenders find it profitable to securitize; an endogenous liquidity shortage follows in the credit market; and tighter lending conditions further weaken household balance sheets. This feedback constitutes the adverse selection multiplier.

Quantitatively, when the calibrated model is fed the sequence of income and housing-valuation shocks observed from 2006 to 2016, it replicates two-thirds of the observed 41 percent contraction in mortgage lending and the full 37 percent contraction in MBS issuance from 2008 to 2013. A shock decomposition (Table 7) shows that, on average over 2008–2013, information frictions account for 40 percent of the model’s predicted decline in mortgage lending, housing valuation shocks for roughly half (52 percent), and income shocks for about 5 percent; comparable shares hold in the securitization market. The implied adverse selection multiplier is about 1.5: absent information frictions, credit would have contracted by 27 percent rather than 41 percent.
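
The multiplier arithmetic in the preceding paragraph can be checked in a few lines. This is a minimal sketch that uses only the two contraction figures quoted in this summary; the function name is illustrative and does not come from the paper.

```python
def adverse_selection_multiplier(with_frictions: float, without_frictions: float) -> float:
    """Ratio of the credit contraction with information frictions to the one without."""
    if without_frictions <= 0:
        raise ValueError("the frictionless contraction must be positive")
    return with_frictions / without_frictions


# Contractions quoted in the summary, in percent.
multiplier = adverse_selection_multiplier(41.0, 27.0)
print(round(multiplier, 2))  # 1.52, which the summary rounds to 1.5
```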

Regarding the post-GFC structural changes, the paper evaluates the effect of GSEs expanding their market share to 100 percent (up from 69 percent in 1990–2006) and the threefold increase in the guarantee fee (from 20 to 60 basis points after 2012). These changes reduce the volatility of the mortgage spread from 6.3 to 4.7 percentage points and lower the unconditional probability of a securitization market collapse from 6.5 percent to near zero. However, the policy generates inefficiently high levels of liquidity, produces only small welfare gains for borrowers (0.06 percent in consumption-equivalent units), and distributes gains unequally: lenders gain approximately 1.3 percent. Households face higher interest rates (lenders pass through the guarantee fee) and higher taxes. The model corroborates other GE studies in finding that credit guarantees were underpriced before the GFC; the actuarially fair price is closer to the post-2012 fee.

In depth

Q1. What is the paper’s identification strategy and what is the nature of the quantitative exercise?

The paper does not use a reduced-form empirical identification strategy; it is a structural DSGE model. The quantitative exercise feeds the calibrated model the observed sequences of aggregate household income shocks and housing valuation shocks from 2006 to 2016, with the model calibrated to match pre-GFC (1990–2006) moments of the U.S. mortgage market. The decomposition of information frictions is accomplished by simulating a complete-information counterfactual for the same shock sequence: the difference between the benchmark model and the complete-information economy quantifies the contribution of private information.

Q2. What is the securitization liquidity channel, and how does it operate mechanically in the model?

The securitization liquidity channel is the transmission mechanism from the securitization market to mortgage credit supply. In normal times, lenders with low origination costs (sellers) securitize their loan portfolios, freeing up funds to originate new loans, while high-cost lenders purchase securities rather than originate, effectively specializing their roles through the market. A shock that increases household default risk worsens pool quality. Buyers face a larger adverse selection discount, security prices fall, and the wedge between the market price and a seller’s valuation of high-quality loans widens. Many lenders switch from selling to holding, reducing the supply of liquidity in the securitization market. Constrained by limited access to debt markets, lenders cut new mortgage origination. The resulting tightening in credit further deteriorates household balance sheets, creating an amplification loop.

Q3. What are the three types of lenders in the model, and what determines their trading decisions?

Lenders endogenously sort into three groups based on their idiosyncratic origination cost draw z relative to two equilibrium cutoffs. Sellers (low-cost lenders, z below the first cutoff) find origination sufficiently profitable to sell their inventory of loans into the securitization market and originate new ones. Buyers (high-cost lenders, z above the second cutoff) find origination too costly and instead buy securities from sellers. Holders (lenders with z between the two cutoffs) neither sell at the prevailing adverse-selection-discounted price nor buy at the effective cost grossed up by the information wedge; they retain their illiquid loan portfolios and originate fewer new loans. The information wedge — the distance between the two cutoffs — is a decreasing function of the subsidy coverage and an increasing function of the adverse selection discount.
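
The sorting rule described above can be sketched as a simple classifier. The function and argument names are hypothetical, the cutoff values are made-up numbers for illustration, and assigning a lender that sits exactly on a cutoff to the holder group is an assumption of this sketch rather than something the summary specifies.

```python
def classify_lender(z: float, sell_cutoff: float, buy_cutoff: float) -> str:
    """Sort a lender by origination cost z relative to two equilibrium cutoffs."""
    if sell_cutoff > buy_cutoff:
        raise ValueError("sell_cutoff must not exceed buy_cutoff")
    if z < sell_cutoff:
        return "seller"  # low cost: sells its loans and originates new ones
    if z > buy_cutoff:
        return "buyer"   # high cost: buys securities instead of originating
    return "holder"      # in between (ties included by assumption): keeps its portfolio


def information_wedge(sell_cutoff: float, buy_cutoff: float) -> float:
    """Distance between the two cutoffs, i.e. the width of the holder region."""
    return buy_cutoff - sell_cutoff


# Illustrative cutoffs only; the paper's calibrated values are not given in this summary.
for cost in (0.1, 0.45, 0.9):
    print(cost, classify_lender(cost, sell_cutoff=0.3, buy_cutoff=0.6))
```

A wider wedge enlarges the holder region, which is how a larger adverse selection discount takes lenders out of the securitization market in this reading.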

Q4. How is the adverse selection discount endogenously determined, and why does it amplify shocks?

The per-unit adverse selection discount mu_t is defined as the aggregate fraction of low-quality loans traded in the securitization market: mu_t = S_B_t / S_t, where S_B_t is the aggregate supply of low-quality loans and S_t is total loans traded. This fraction is endogenous: it depends on which lenders sort into the seller category and what quality distribution their portfolios have, which in turn depends on the household default rate and the equilibrium price. When household credit risk rises, the default rate increases, more loans become low-quality, and sellers selectively offload bad loans while retaining good ones. The resulting rise in mu_t raises buyers’ required discount and further reduces the security price, which causes additional sellers to switch to holding and compounds the adverse selection problem. This self-reinforcing dynamic is the multiplier.
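
The definition of mu_t can be written out directly. This is a minimal sketch; the loan volumes are hypothetical and are chosen only to show the discount rising when low-quality supply grows while total traded volume shrinks.

```python
def adverse_selection_discount(low_quality_supply: float, total_supply: float) -> float:
    """mu_t = S_B_t / S_t: share of low-quality loans among all loans traded."""
    if total_supply <= 0:
        raise ValueError("total supply of traded loans must be positive")
    if not 0 <= low_quality_supply <= total_supply:
        raise ValueError("low-quality supply must lie between 0 and total supply")
    return low_quality_supply / total_supply


# Hypothetical volumes, not taken from the paper.
calm = adverse_selection_discount(low_quality_supply=10.0, total_supply=100.0)
stressed = adverse_selection_discount(low_quality_supply=30.0, total_supply=90.0)
print(round(calm, 3), round(stressed, 3))
```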

Q5. Under what conditions can the securitization market shut down entirely, and what happens to credit in that case?

Proposition 2 establishes that a sufficient condition for market shutdown in the steady state is that the market effective cost of buying securities exceeds the origination cost of the highest-cost lender in the economy. When this condition holds: (1) the securitization market does not operate; (2) every lender originates using only her own technology; and (3) the mortgage rate is higher than when the market operates. Critically, even when the securitization market collapses, the credit market continues to function, but with higher interest rates and lower intermediation volumes. The economy can transition between states with and without an active securitization market.

Q6. What role does market concentration of mortgage originators play in the quantitative results?

Market concentration is crucial for the magnitude of amplification. From 1990 to 2016, the top 1 percent of originators accounted for 62 percent of lending and the top 10 percent for 89 percent (from HMDA data). The model is calibrated to match these moments. Because large originators specialize as securitization sellers, their decision to switch from selling to holding — triggered by rising adverse selection discounts — produces very large contractions in aggregate credit supply. The calibrated lending-cost distribution shows a large discontinuity: the last marginal securitization seller originates a volume four times larger than the next marginal holder. When the most efficient, high-volume lenders exit the securitization market, the aggregate effect is disproportionately large.

Q7. How does the government subsidy policy interact with adverse selection, and what are its theoretical properties?

The GSE credit guarantee is modeled as a state-contingent subsidy tau_t = alpha_G * mu_t, where alpha_G in [0,1] represents the degree of insurance provided. Any positive subsidy reduces the adverse selection wedge by moving the second cutoff leftward, expanding the mass of security buyers. A full subsidy (alpha_G = 1) completely offsets buyers’ losses from default risk, stabilizing security demand regardless of household credit risk and minimizing the probability of market collapse. However, Proposition 3 establishes that a full subsidy generates inefficiently high levels of liquidity compared to the complete information benchmark: it expands the volume of MBS at lower average quality relative to an economy where low-quality loans are screened out. A full subsidy also fails to replicate complete-information allocations because the guarantee fee distorts lenders’ origination decisions and raises borrowers’ mortgage rates.
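
The subsidy rule can be sketched as follows. The value of mu_t is hypothetical, the two alpha_G values are the pre- and post-GFC coverage levels quoted in this summary, and treating mu_t minus tau_t as the discount left uncovered for buyers is an assumed reading of the text rather than a formula taken from the paper.

```python
def gse_subsidy(alpha_g: float, mu: float) -> float:
    """State-contingent subsidy tau_t = alpha_G * mu_t."""
    if not 0.0 <= alpha_g <= 1.0:
        raise ValueError("alpha_G must lie in [0, 1]")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu_t is a fraction and must lie in [0, 1]")
    return alpha_g * mu


def uncovered_discount(alpha_g: float, mu: float) -> float:
    """Assumed reading: discount left to buyers after the subsidy, (1 - alpha_G) * mu_t."""
    return mu - gse_subsidy(alpha_g, mu)


mu = 0.2  # hypothetical adverse selection discount
for alpha_g in (0.69, 1.0):
    print(alpha_g, round(gse_subsidy(alpha_g, mu), 3), round(uncovered_discount(alpha_g, mu), 3))
```

With alpha_G = 1 the uncovered part is zero for any mu_t, which is the sense in which a full subsidy stabilizes security demand regardless of household credit risk.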

Q8. What are the welfare implications of the post-GFC policy changes?

The welfare analysis (Table 9) finds small and unequally distributed welfare gains. The overall post-GFC policy changes (full subsidy plus higher guarantee fee) yield borrower welfare gains of 0.06 percent and lender welfare gains of 1.3 percent in consumption-equivalent units. Decomposing the changes: the increase in the subsidy (alpha_G from 69 to 100 percent) generates borrower welfare losses of 0.16 percent (due to higher taxes and interest rates, offset partially by lower volatility) and lender gains of 3.01 percent (from improved lending efficiency). The increase in the guarantee fee reverses some of this by generating borrower gains of 0.18 percent and lender losses of 1.53 percent. The paper characterizes these as upper bounds because the full subsidy may generate moral hazard by weakening originators’ incentives to screen loan quality.

Q9. How does this paper relate to and extend Justiniano et al. (2015, 2019) and Landvoigt (2016)?

Justiniano et al. (2015, 2019) argue that credit supply constraints — limits on the funds available to lenders — are quantitatively more important than credit demand forces in explaining mortgage credit fluctuations. This paper provides a microfoundation for those constraints by modeling securitization as the dominant source of liquidity for lenders and deriving endogenously how adverse selection limits that liquidity. Landvoigt (2016) introduces securitization in a DSGE housing model in reduced form. This paper goes further by modeling an endogenous securitization market where lenders optimally trade off liquidity benefits against information friction costs, so security prices and mortgage rates are jointly determined rather than imposed exogenously.

Q10. How does this paper relate to the Kurlat (2013) and Bigio (2015) models of adverse selection in asset markets?

The securitization design combines the asset creation and reallocation framework of Kurlat (2013) with two additional features specific to the TBA market: (1) the cheapest-to-deliver convention, which means sellers can select the lowest-value loans in their inventory satisfying trade terms; and (2) the non-exclusive, anonymous nature of TBA trades, which ensures a pooling price. Bigio (2015) models endogenous liquidity and the business cycle through information frictions in interbank markets. This paper extends the adverse selection approach to the mortgage market specifically and provides an equilibrium linkage between the securitization market and the credit market rather than modeling them as a single market.

Q11. What are the non-targeted moments and how well does the model fit the data?

Three non-targeted moments are reported (Table 5). The model generates a fraction of loan sales of 73.9 percent (data: 61.8 percent from HMDA), a correlation between loan sales and new lending of 0.86 (data: 0.90), and a mortgage spread of 178 basis points (data: 330 basis points). The loan sales fraction is somewhat above data and the spread is substantially below. For targeted cross-sectional moments (Table 6), the model closely matches the distribution of lending by quartile, with Q4 market shares of 0.957 in the model versus 0.959 in the data. For the dynamic GFC episode, the model replicates two-thirds of the 41 percent contraction in mortgage lending and the full 37 percent contraction in MBS issuance.

Q12. What are the sources of aggregate shocks and how are they calibrated?

The two exogenous aggregate state variables are household income Y_t and the variance of idiosyncratic housing valuation shocks sigma_omega_t (the proxy for mortgage credit risk). They follow a first-order joint Markov process. Income is identified using the cyclical component of disposable personal income from the flow-of-funds accounts. The variance of housing shocks is calibrated to match the national delinquency rate for loans 90+ days delinquent or in foreclosure from the National Mortgage Database (FHFA). The calibrated states produce default rates of 1.8 percent in the low-risk state and 7.9 percent in the high-risk state, with an unconditional default rate of 2.6 percent.
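
As a rough consistency check on these three default rates, one can back out how often the high-risk state would have to occur. This is a back-of-envelope sketch, not a number reported in the summary: it assumes the unconditional default rate is a probability-weighted average of the two state-specific rates, and the function name is illustrative.

```python
def implied_high_state_share(low_rate: float, high_rate: float, unconditional_rate: float) -> float:
    """Weight w on the high-risk state solving: unconditional = (1 - w) * low + w * high."""
    if not low_rate < high_rate:
        raise ValueError("low_rate must be strictly below high_rate")
    if not low_rate <= unconditional_rate <= high_rate:
        raise ValueError("unconditional_rate must lie between the two state rates")
    return (unconditional_rate - low_rate) / (high_rate - low_rate)


# Default rates quoted in the summary, in percent.
share = implied_high_state_share(low_rate=1.8, high_rate=7.9, unconditional_rate=2.6)
print(round(share, 3))  # 0.131 under the weighted-average assumption
```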

Q13. What are the key limitations and caveats of the analysis?

Several limitations are noted. First, the welfare analysis of the full subsidy is characterized as an upper bound because moral hazard — the impact of guaranteed insurance on originators’ incentives to screen loan quality — is not modeled. Second, the model abstracts from other consequences of default for borrowers, such as reputation concerns and long-term credit market exclusion. Third, the paper focuses on information frictions between lenders and investors (the securitization chain), not between borrowers and lenders. Fourth, the non-targeted mortgage spread (178 bps in model versus 330 bps in data) suggests some quantitative limitations in matching all features of the credit market simultaneously. Fifth, the exercise is a structural model exercise and not empirically identified through exogenous variation.

Key Concepts

Securitization liquidity channel: The mechanism by which mortgage originator funding capacity depends on their ability to sell loan portfolios in the securitization market; when securitization demand falls, originators face an endogenous liquidity shortage and reduce new mortgage lending, transmitting shocks from the MBS market to the credit market.

Adverse selection multiplier: The amplification factor arising from private information in the securitization market: as household credit risk rises, sellers’ incentives to offload low-quality loans worsen pool quality, causing buyers to demand a larger discount, which causes more lenders to withdraw from selling, creating a feedback loop that magnifies the initial shock to credit supply. Quantified at 1.5 for the GFC episode.

TBA (to-be-announced) forward market: The dominant trading venue for agency MBS in the U.S., accounting for over 90 percent of MBS trading volume, where the specific securities to be delivered are not identified at the trade date and sellers can deliver the cheapest eligible pool (‘cheapest-to-deliver’), institutionalizing adverse selection incentives.

Cheapest-to-deliver convention: A TBA market practice by which a seller selects and delivers the lowest-value mortgage pools in its inventory that satisfy the terms of trade, giving sellers a systematic informational advantage and incentivizing selective retention of high-quality loans.

Adverse selection discount (mu_t): In this paper, the per-unit discount arising from adverse selection, defined as the endogenous equilibrium fraction of low-quality loans in the aggregate supply of traded loans (S_B_t / S_t); this fraction is determined jointly with prices and lenders’ trading decisions, and rises when household default risk increases.

Mortgage credit risk (sigma_omega_t): The standard deviation of idiosyncratic housing valuation shocks to household members, which is the exogenous aggregate state variable that drives default rates; when sigma_omega_t rises, more households fall below the default threshold, increasing the aggregate default rate and degrading the quality composition of lenders’ portfolios.

Joint price determination: A novel equilibrium property of the model in which the mortgage interest rate (in the credit market) and the price of securities (in the securitization market) are simultaneously determined; this interdependence means that adverse selection dynamics in the securitization market directly affect the cost of credit and vice versa.

GSE credit guarantee (subsidy policy): A state-contingent subsidy tau_t = alpha_G * mu_t paid to MBS buyers, representing the credit guarantees of Fannie Mae and Freddie Mac; financed by a guarantee fee (distortionary tax on originators) and lump-sum taxes on households; alleviates adverse selection by stabilizing security demand but generates inefficiently high liquidity and fails to deliver meaningful household welfare gains.

Pricing-to-market in business cycle models

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper evaluates five microfounded pricing-to-market (PTM) mechanisms and one reduced-form aggregator in a two-country DSGE model with volatile exchange rates driven by financial shocks (following Gabaix and Maggiori 2015) and real productivity shocks. The central question is whether existing open-economy theories can jointly achieve three empirically mandated targets — low exchange-rate pass-through to import prices, muted expenditure switching (low short-run trade elasticity), and plausible producer markups — when exchange rates are volatile and act as a major independent source of fluctuations. The paper’s main contribution is to show analytically and quantitatively that no existing microfounded PTM model fully escapes a structural tension among these three targets, which the authors call the parameterization trilemma.

The models evaluated are: (i) the Kimball Aggregator (KA; reduced-form, Itskhoki-Mukhin application); (ii) the Distribution Cost model (CD; Corsetti-Dedola 2005); (iii) the Price Dispersion model (PD; Alessandria 2009); (iv) the Nested CES/Cournot model (NCES; Atkeson-Burstein 2008); (v) the Deep Habits model (DH; Ravn-Schmitt-Grohe-Uribe 2007); and (vi) the Customer Capital model (CC; Drozd-Nosal 2012). The encompassing framework uses the Backus-Kehoe-Kydland (1995) two-country structure augmented with a financial sector that generates UIP deviations via a capacity-constrained arbitrageur segment and exogenous noise-trader positions. The model is estimated/calibrated to quarterly U.S. data (1981Q1–2009Q4 for prices, 1980Q1–2004Q1 for quantities), HP-filtered with lambda = 1,600.

The baseline markup target is 50%, consistent with BEA input-output tables for U.S. tradable sectors (ranging 45–50% across 2007, 2012, 2017); listed-firm SEC data imply higher values around 73–75%, which the authors treat as an upper bound. The empirical pass-through target is 0.4 (midpoint of a 0.2–0.6 range estimated by Campa-Goldberg 2005 and others; Gopinath-Itskhoki 2022 estimate 0.2–0.3). The short-run trade elasticity target is 0.7, measured using the volatility ratio of quantities to prices, which yields an upper-bound estimate. Real exchange rate volatility is targeted at 3.97 (standard deviations relative to GDP). Imports-to-GDP ratio is targeted at 12%.

The central analytic finding, the parameterization trilemma, is characterized precisely for each model. For the KA model, the demand elasticity parameter gamma(1) simultaneously pins down both the markup and the trade elasticity, so matching 50% markups implies a trade elasticity of 3 (well above the desired range of less than 1), and any value of TE below 1 is unattainable. For the CD model, pass-through of 0.4 requires a distribution cost markup wedge of 150% above the producer’s markup, which is inconsistent with the 50% markup target. For the PD model, the structural formula links PT and markups but less severely, so the trilemma is partially mitigated. For the NCES model, the trade elasticity equals the firm-level elasticity theta, which is also the main driver of pass-through, recreating a binding version of the KA trilemma on the quantity side. For the CC model, the market-expansion friction (captured by adjustment-cost parameter psi) provides an additional degree of freedom that allows trade elasticity to be set independently of pass-through and markups; at symmetric bargaining power eta = 0.5 and 50% markups, the model delivers PT = 0.33 analytically, close to the data target.

Quantitative results confirm the analytic predictions. The KA model fails on quantity statistics because it implies trade elasticity far above target, generating counterfactually negative international comovement of consumption, investment, and employment. The CD model delivers only moderately incomplete pass-through (substantially above the 0.4 target), underperforming on price statistics, and implies a counterfactual correlation of net exports with the terms of trade. The PD model delivers pass-through of approximately 0.70 — better than CD but still above target — and performs well on quantities. The NCES model achieves pass-through of 0.63 (close to but above the 0.4 target) but at the cost of large, negative international comovement in general equilibrium, including a counterfactual positive correlation of net exports with output. The DH model generates more-than-complete pass-through in the presence of persistent exchange rates, failing on prices. The CC model delivers PT = 0.36, closest to the empirical target, achieves correct signs for international quantity comovement, and generates a positive terms-of-trade/net-exports correlation — but requires assumed productivity shock correlation of 0.75 to match measured TFP correlation of 0.3 due to endogenous marketing investment affecting measured TFP, and fails to deliver a positive correlation between terms of trade and the exchange rate.

The paper concludes that further research is needed into frictions that simultaneously dampen the price and quantity responses to volatile exchange rates without violating markup discipline. The reduced-form KA model neither nests nor outperforms the microfounded alternatives. The CC and PD search-based models perform best overall but introduce frictions that are harder to identify and measure directly.

In depth

Q1. What is the parameterization trilemma and how is it characterized analytically?

The trilemma is the structural impossibility of jointly satisfying three empirically necessary targets: (a) plausible steady-state producer markups (calibrated at 50%), (b) low short-run trade elasticity (targeted at 0.7 or below), and (c) low exchange-rate pass-through to import prices (targeted at 0.4). The authors derive closed-form expressions for pass-through (PT), trade elasticity (TE), and markups (mu) for each model and show that satisfying any two targets forces a violation of the third. For the KA model, the key parameter gamma(1) satisfies TE = gamma(1) and mu = (gamma(1) - 1)^{-1}, so targeting 50% markups forces TE = 3 and targeting TE = 1.5 forces markups of 200%. For the CD model, PT = 0.4 requires the distribution-cost wedge xi/(theta-1) = 1.5, implying markups more than 150% above the friction-free level, incompatible with a 50% target. For the PD model the formula is PT = 1 - mu/(1+mu), which is less restrictive. For the NCES model, TE = theta (the firm-level elasticity) and theta also drives pass-through, recreating the KA-type trilemma on the quantity side. For the CC model, the friction parameter psi in marketing capital accumulation independently controls TE, providing an extra degree of freedom that lets the model partially escape the trilemma.
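
The KA and PD expressions quoted above are simple enough to evaluate directly. This is a minimal sketch using only the formulas as stated in this summary; the function names are illustrative.

```python
def ka_markup(gamma1: float) -> float:
    """KA model: mu = 1 / (gamma(1) - 1), with TE = gamma(1)."""
    if gamma1 <= 1:
        raise ValueError("gamma(1) must exceed 1 for a finite positive markup")
    return 1.0 / (gamma1 - 1.0)


def ka_trade_elasticity_for_markup(mu: float) -> float:
    """Invert the KA markup formula: TE = gamma(1) = 1 + 1 / mu."""
    if mu <= 0:
        raise ValueError("markup must be positive")
    return 1.0 + 1.0 / mu


def pd_pass_through(mu: float) -> float:
    """PD model: PT = 1 - mu / (1 + mu)."""
    if mu < 0:
        raise ValueError("markup must be non-negative")
    return 1.0 - mu / (1.0 + mu)


print(ka_trade_elasticity_for_markup(0.5))  # 3.0: a 50% markup forces TE = 3
print(ka_markup(1.5))                       # 2.0: TE = 1.5 forces a 200% markup
print(round(pd_pass_through(0.5), 3))       # 0.667 at a 50% markup
```

Because the KA markup formula requires gamma(1) above 1, no positive finite markup is compatible with a trade elasticity below 1 in that model.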

Q2. What is the identification strategy for pass-through and trade elasticity, and what are its main assumptions?

The theoretical pass-through coefficient (PT) is defined as the partial equilibrium, on-impact elasticity of the import price with respect to the exchange rate, computed at the steady state while holding constant marginal costs (v, v*), the stochastic discount factor, and the domestic price of the home good. This mimics what regression-based pass-through estimates do (controlling for local costs). Trade elasticity (TE) is defined analogously as the PT-scaled elasticity of the import/domestic quantity ratio with respect to the exchange rate, under a one-time shock that reverts to the steady state next period (except for the DH model, where a permanent shock is considered). A key assumption is that importers take aggregate price indices as consistent with all importers behaving the same way (a rational-expectations fixed point). General-equilibrium co-movements between exchange rates and marginal costs are abstracted from in the analytic section, consistent with the goal of isolating each model’s intrinsic PTM mechanism.

Q3. Why does the KA model fail on quantity statistics despite being able to match any degree of pass-through?

The KA model can match pass-through of 0.4 by freely choosing the curvature of the demand aggregator g’’(1) (independently of gamma(1)). However, the steady-state demand elasticity gamma(1) simultaneously determines both the markup (mu = (gamma(1)-1)^{-1}) and the trade elasticity (TE = gamma(1)). Matching 50% markups forces gamma(1) = 3 and therefore TE = 3, far above the target of 0.7. This excessive trade elasticity generates counterfactually large expenditure switching in response to exchange-rate shocks, leading to counterfactual negative international comovement of consumption, investment, and employment. A modified Kimball aggregator with a convex adjustment cost (equation 62) does not resolve the problem because the convex cost parameter also enters the steady-state markup formula, so targeting 50% markups still forces high effective trade elasticity.

Q4. Why does the Deep Habits model generate more-than-complete pass-through when exchange rates are persistent?

In the DH model, producers internalize the law of motion for habits: by lowering prices today they accumulate more customer habits, which allows them to raise prices later. When the exchange rate appreciates persistently (from the foreign exporter’s perspective), exporters expect their foreign sales and thus foreign habit stocks to fall over time. This reduces the shadow value of habit (Delta_f), so producers let prices fall by more than the exchange rate movement, generating pass-through greater than one. The authors derive analytically that, for a permanent shock, PT > 1 because dlog(gh)/dlog(x) < 0 (habit falls upon appreciation), and this dominates the direct pricing effect. For a purely transitory shock, the sign reverses (PT < 1), but since exchange rates are highly persistent in the data, the first property dominates. The quantitative section confirms this: the DH model generates PT > 1, marked as 1.00 in Table 4, disqualifying it on prices.

Q5. How does the Customer Capital (CC) model partially escape the trilemma?

The CC model introduces two key elements absent from other frameworks: (1) Nash bargaining over prices within bilateral matches, which directly ties pass-through to the sharing of exchange-rate-driven surplus rather than to demand elasticity; and (2) a convex adjustment friction on marketing capital (psi) that controls the pace of trade-share adjustment, independently setting the short-run trade elasticity. Because prices are determined by bargaining (equation 53: pf = eta*P_d + (1-eta)*v), they depend on the retail marginal value of the foreign good (P_d) and the foreign marginal cost (v), but not on quantity within the match. This decouples PT from TE. Analytically, at static steady state, PT = (1-eta)(1 + mu - (TE/gamma)(eta+mu)*omega)^{-1}; for eta = 0.5 and 50% markups and TE/gamma approaching zero, PT approaches (1-eta)/(1+mu) = 1/3. The psi parameter then tunes TE separately from markups and PT. However, a high long-run elasticity gamma (= 7.9) is required to generate sufficient retail-price responsiveness.
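As a rough check of the limiting case, the steady-state expression quoted above can be evaluated numerically. This is a sketch under stated assumptions: the function name is hypothetical, and omega is treated as a given constant whose placeholder value plays no role once TE/gamma is zero.

```python
def cc_pass_through(eta: float, mu: float, te_over_gamma: float, omega: float) -> float:
    """PT = (1 - eta) / (1 + mu - (TE/gamma) * (eta + mu) * omega), as quoted in the text."""
    denominator = 1.0 + mu - te_over_gamma * (eta + mu) * omega
    return (1.0 - eta) / denominator


# Limiting case from the text: eta = 0.5, 50% markups, TE/gamma -> 0.
# omega = 1.0 is a placeholder assumption; it drops out when te_over_gamma = 0.
pt_limit = cc_pass_through(eta=0.5, mu=0.5, te_over_gamma=0.0, omega=1.0)
print(pt_limit)
```

The limit reproduces (1-eta)/(1+mu) = 1/3, which is why lowering TE through psi does not require giving up low pass-through.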

Q6. What does the NCES model achieve on prices and why does it fail on quantities?

The NCES (Nested CES with Cournot competition) model generates incomplete pass-through of 0.63, the second-best performance on prices after the CC model. The mechanism is that non-atomistic (Cournot) firms internalize the impact of their pricing on the sectoral price index; when the exchange rate moves, foreign exporters’ market share changes, altering the endogenous demand elasticity they face and dampening their pass-through. To maximize the Cournot effect, the authors calibrate the model with only one exporting firm (NX=1 out of N=5). However, this calibration implies TE = theta (the firm-level elasticity, set at 7.9 in calibration), far exceeding the target of 0.7. A quantity adjustment cost cannot remedy this because it would simultaneously constrain import-share movements, which are the source of the endogenous demand elasticity variation that generates incomplete pass-through. Consequently, the model implies large negative international comovement of output, consumption, employment, and investment, a worse quantity performance than most other models.

Q7. How does the paper measure markups and what data sources does it use?

The paper equates markups with gross margins under the maintained assumptions of Cobb-Douglas production and static cost minimization (Hall 1988; De Loecker et al. 2020). Under Cobb-Douglas, marginal cost v = wl/y, so markup mu = Py/(wl) - 1 = sales/(cost of goods sold) - 1. Three data sources are used, all for U.S. data 2007-2017: (1) BEA 402 Industry Input-Output Use Tables, which give gross margins of approximately 39-41% for all sectors and 45-50% for traded sectors (import share > 3%). (2) S&P 500 Compustat with BEA sector value-added adjustment, yielding approximately 73-74% for all non-FIRE/GOV/NGO firms. (3) Unadjusted Compustat, yielding 43-49%. The paper adopts 50% as the baseline calibration target, treating it as conservative given the data range, and noting that the BEA I-O measure is the broadest and likely most accurate. The paper explicitly holds that models must respect profit and margin accounting within their own structure.
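The gross-margin definition amounts to a one-line calculation. The sketch below uses made-up sales and cost figures purely for illustration; they are not data from the paper.

```python
def gross_margin_markup(sales: float, cost_of_goods_sold: float) -> float:
    """Net markup measured as a gross margin: mu = sales / COGS - 1."""
    if cost_of_goods_sold <= 0.0:
        raise ValueError("cost of goods sold must be positive")
    return sales / cost_of_goods_sold - 1.0


# Hypothetical firm: 150 in sales against 100 in cost of goods sold.
example_markup = gross_margin_markup(sales=150.0, cost_of_goods_sold=100.0)
print(example_markup)
```

A firm with these hypothetical figures sits exactly at the 50% baseline target.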

Q8. How does the paper’s conclusion differ from Itskhoki and Mukhin (2021) regarding the Kimball Aggregator?

Itskhoki and Mukhin (2021) use indirect inference and treat producer margins/markups as a free parameter, implicitly allowing for a much higher markup value — substantially above 50%. Under their calibration approach, the KA model can reconcile low pass-through with better quantity performance. Drozd, Kolasa, and Nosal instead impose a markup discipline: models must match empirically observed gross margins of 50% (for tradable sectors from BEA I-O tables) in their steady state. Under this discipline, the KA model’s trilemma becomes binding, and the model fails on quantity statistics. The authors argue that higher markup assumptions change the effective structure of the model and should be treated as a separate research agenda rather than a free calibration choice.

Q9. What is the role of financial shocks in the model and how are they implemented?

Financial shocks generate exchange-rate volatility that is largely decoupled from real fundamentals, mimicking the observed ‘exchange rate disconnect’ from output and consumption. They are modeled following Gabaix and Maggiori (2015): a global financial sector with short-lived arbitrageurs and noise traders. Arbitrageurs face a capacity constraint (parameterized by Gamma) that prevents them from fully exploiting UIP violations, resulting in a distorted UIP condition where the interest rate differential includes a term proportional to the arbitrageur’s position. Noise traders take exogenous positions n(t) that follow an AR(1) process (persistence rho_n = 0.97 in calibration) with standard deviations ranging from 21.2 (CC model) to 114.9 (NCES model) across calibrations. These shocks generate real exchange rate volatility of 3.97% (standard deviations relative to GDP), matching the data target. The paper notes that the precise implementation (Gabaix-Maggiori vs. Itskhoki-Mukhin) has little impact on exchange-rate properties in a linearized setting.
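To see what a persistence of 0.97 means for the noise-trader position, a minimal AR(1) simulation may help. This is a sketch, not the paper's code: the innovation scale sigma is a placeholder, and treating it as the innovation (rather than unconditional) standard deviation is an assumption.

```python
import numpy as np


def simulate_ar1(rho: float, sigma: float, n_periods: int, seed: int = 0) -> np.ndarray:
    """Simulate n(t) = rho * n(t-1) + sigma * eps(t), eps ~ N(0, 1), starting from zero."""
    rng = np.random.default_rng(seed)
    path = np.zeros(n_periods)
    for t in range(1, n_periods):
        path[t] = rho * path[t - 1] + sigma * rng.standard_normal()
    return path


def ar1_unconditional_std(rho: float, sigma: float) -> float:
    """Stationary standard deviation of an AR(1): sigma / sqrt(1 - rho^2)."""
    return sigma / np.sqrt(1.0 - rho ** 2)


noise_path = simulate_ar1(rho=0.97, sigma=1.0, n_periods=200)
amplification = ar1_unconditional_std(rho=0.97, sigma=1.0)
print(len(noise_path), amplification)
```

At rho_n = 0.97 the stationary standard deviation is roughly four times the innovation standard deviation, which is one way to read how persistent the shock is.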

Q10. What robustness checks and extensions does the paper consider?

The paper considers a modified Kimball aggregator with a convex adjustment cost on the ratio of imported to domestic quantities (equation 62) as a potential fix for the KA model’s high trade elasticity. This is shown not to resolve the trilemma because the convex cost parameter also enters the steady-state markup formula, keeping the binding constraint in place. Results for this modified model are reported in the Online Appendix. The paper also notes that the DH model’s pass-through is analyzed under both permanent and transitory shocks, with the sign reversal for purely transitory shocks documented analytically. The paper abstracts from nominal rigidities throughout, justifying this by citing Gopinath-Itskhoki (2011) evidence that conditioning pass-through on price adjustments versus non-adjustments makes little difference in observed pass-through patterns, suggesting limited pass-through is largely a real phenomenon.

Q11. What are the paper’s main implications for the DSGE modeling of open economies?

The paper implies that the standard toolkit for generating incomplete exchange-rate pass-through and muted expenditure switching is inadequate when exchange rates are volatile and act as a major shock. All models face tension among the three targets; the best performers (CC and PD) do so by introducing search frictions that are intrinsically difficult to identify and measure directly. The paper does not claim to provide a solution; rather, it performs a clean diagnostic showing that more research is needed into real frictions that simultaneously insulate import prices and trade quantities from exchange-rate volatility. The finding that the Kimball reduced-form aggregator neither nests nor outperforms microfounded alternatives has implications for monetary-policy DSGE models that frequently use the KA for tractability, suggesting that researchers should be aware of the high implicit markup that is required for the KA to work well in open-economy settings with volatile exchange rates.

Q12. What moments from the data are targeted in calibration and what is the quantitative approach?

The model is calibrated quarterly and HP-filtered (lambda = 1,600). Common targets include: imports/GDP = 12%; 50% producer markups; 30% work hours relative to time endowment; investment volatility relative to GDP = 2.79; short-run trade elasticity (volatility ratio) = 0.7; cross-country TFP correlation = 0.3; TFP volatility = 0.8% and autocorrelation = 0.72; real exchange rate volatility = 3.97%. The pass-through target of 0.4 is used only as an additional degree of freedom for the KA model; for all others, pass-through is an outcome of the structural parameterization. The financial shock persistence is set arbitrarily at rho_n = 0.97 for lack of a target. When a model cannot satisfy all targets (as with KA and NCES on trade elasticity), that target is dropped in favor of best performance on prices. Pass-through is measured in the quantitative section by running regressions analogous to Campa-Goldberg (2005) on model-generated data, rather than using the analytic partial-equilibrium formula.
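The two measurement steps named here, HP filtering at lambda = 1,600 and a volatility-ratio trade elasticity, can be sketched as follows. The filter is the standard penalized least-squares form; the series are synthetic and the function names are hypothetical, so this illustrates the procedure rather than the paper's implementation.

```python
import numpy as np


def hp_cycle(y, lam: float = 1600.0) -> np.ndarray:
    """Cyclical component of the HP filter: y minus the trend solving (I + lam * D'D) trend = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # D is the (n-2) x n second-difference operator.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return y - trend


def volatility_ratio(log_quantity_ratio, log_price_ratio, lam: float = 1600.0) -> float:
    """Ratio of HP-filtered standard deviations: quantities over prices."""
    return float(np.std(hp_cycle(log_quantity_ratio, lam)) / np.std(hp_cycle(log_price_ratio, lam)))


# Synthetic illustration: relative quantities move 0.7 for 1 against relative prices.
rng = np.random.default_rng(0)
log_p = 0.01 * np.cumsum(rng.standard_normal(120))
log_q = -0.7 * log_p
ratio = volatility_ratio(log_q, log_p)
print(ratio)
```

Because the filter is linear, the synthetic series returns a ratio of 0.7 by construction; with real data the ratio ignores the correlation between quantities and prices, which is why it serves as an upper-bound estimate.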

Q13. What is the sign of the terms-of-trade and exchange-rate correlation, and what does it imply for model evaluation?

In model-generated data (without noise), the correlation of terms of trade (tot = pf/px) with the exchange rate (x) is either -1 (when PT < 0.5) or +1 (when PT > 0.5). The empirical target from U.S. data is approximately -1. This means matching PT < 0.5 and a negative tot-x correlation are equivalent predictions. In the quantitative results, only the KA and CC models achieve PT < 0.5 and thus generate the correct negative correlation; all other models (CD, PD, NCES, DH) generate PT > 0.5 and thus positive tot-x correlation. The authors note that the strict 0.4 target may be too aggressive for aggregate data — PT slightly above 0.5 would be consistent with a positive (near zero) correlation — pointing to Gopinath et al. (2020) who find small, statistically insignificant tot-x coefficients ranging from positive to negative.
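One way to see why the sign flips at 0.5 is a symmetric two-country sketch. The symmetry assumption below is introduced here for illustration and is not taken from the paper: if import prices in home currency move by PT times the exchange rate, and the foreign importer faces the same pass-through, then export prices in home currency move by (1 - PT) times the exchange rate.

```python
import numpy as np


def tot_x_correlation(pass_through: float, log_x: np.ndarray) -> float:
    """Correlation of log terms of trade (pf - px) with the log exchange rate.

    Assumes a symmetric two-country setting (an illustrative assumption):
    log pf = PT * x and log px = (1 - PT) * x, so log tot = (2 * PT - 1) * x.
    """
    log_pf = pass_through * log_x
    log_px = (1.0 - pass_through) * log_x
    log_tot = log_pf - log_px
    return float(np.corrcoef(log_tot, log_x)[0, 1])


rng = np.random.default_rng(0)
log_x = rng.standard_normal(500)
corr_low_pt = tot_x_correlation(0.4, log_x)    # PT below 0.5
corr_high_pt = tot_x_correlation(0.63, log_x)  # PT above 0.5
print(corr_low_pt, corr_high_pt)
```

Under this assumption the terms of trade are proportional to (2 PT - 1) times the exchange rate, so without noise the correlation is exactly -1 or +1 depending on which side of 0.5 pass-through falls.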

Key Concepts

Parameterization Trilemma: The structural impossibility of jointly achieving three empirically necessary targets in standard PTM models: (1) plausible producer gross margins (~50%), (2) low short-run trade elasticity (~0.7 or below), and (3) low exchange-rate pass-through to import prices (~0.4). Each PTM model can satisfy at most two of the three targets simultaneously under quantitative discipline; the third is either infeasible or inconsistent given the model’s internal constraints.

Pricing-to-Market (PTM): The practice by which internationally active firms set different prices in home and foreign markets as a function of the bilateral exchange rate, rather than uniformly passing exchange-rate changes through to import prices. In this paper, PTM is measured by the degree of incomplete pass-through (PT < 1) and is generated by specific microfounded frictions (distribution costs, search, habits, market power, customer capital) rather than by nominal rigidities.

Exchange-Rate Pass-Through (PT): The elasticity of the import price (in the importing country’s currency) with respect to the bilateral real exchange rate, computed in partial equilibrium at the steady state, controlling for local costs. Values used in calibration: empirical short-run range 0.2–0.6; paper target 0.4. Models in which PT = 1 satisfy the law of one price; models with PT < 1 exhibit pricing-to-market.

Short-Run Trade Elasticity (TE): The elasticity of import quantities relative to domestic quantities with respect to the exchange rate (equivalently, the expenditure-switching response to import price changes), measured at business-cycle frequencies. The paper measures this using the volatility ratio of trade-flow quantities to prices (an upper-bound estimate abstracting from correlations), targeting a value of 0.7. Long-run elasticity estimates based on trade liberalization episodes are much higher (typically 6 and above) and are used as the long-run elasticity parameter gamma in search-based models.

Customer Capital (CC) Model: A PTM model (Drozd-Nosal 2012) in which firms build market-specific customer relationships through costly, time-consuming investment in marketing capital, and within-match prices are set by Nash bargaining. The combination of a capacity constraint on quantities traded within each match and bargaining-determined prices decouples the short-run trade elasticity from pass-through, allowing the model to partially escape the parameterization trilemma via the adjustment-cost parameter psi.

Kimball Aggregator (KA): A reduced-form, implicitly defined demand aggregator (Kimball 1995) that generates variable demand elasticity through the curvature of the function g(·) around the steady state. In the open-economy application of Itskhoki-Mukhin (2021), two curvature parameters (g’(1) and g’’(1)) can independently control markup and pass-through — but not trade elasticity simultaneously, which is bound to the steady-state demand elasticity gamma(1) and hence to the markup. The paper shows this model neither nests nor outperforms microfounded alternatives under markup discipline.

Financial Shock: An exogenous disturbance to the position of noise traders in the international bond market (following Gabaix-Maggiori 2015), which drives deviations from Uncovered Interest Parity via the capacity constraint on arbitrageurs. These shocks generate exchange-rate volatility that is largely disconnected from real fundamentals (productivity), calibrated with persistence rho_n = 0.97 to match U.S. real exchange rate volatility of 3.97% relative to GDP.

Gross Margin / Producer Markup: In this paper, defined as (price - marginal cost) / marginal cost = (sales - cost of goods sold) / cost of goods sold, where under Cobb-Douglas production and static cost minimization, the markup equals the gross margin. The paper targets 50% for U.S. tradable-sector firms based on BEA 402 Industry I-O Use Tables (which yield 45–50% for tradable sectors across 2007–2017), treating this as a hard empirical constraint that models must satisfy in the steady state.

Resource Misallocation in European Firms: The Role of Constraints, Firm Characteristics and Managerial Decisions

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper investigates why firms in the European Union exhibit wide dispersion in marginal revenue products (MRP) of capital and labor — a direct indicator of resource misallocation — and asks how much aggregate productivity the EU forfeits as a result. The research question is motivated by the persistent productivity gap between the EU and the United States, by evidence that within-country MRP dispersion in Europe has been trending upward since the mid-1990s, and by an institutional context in which the EU single market (launched in 1993) has not eliminated cross-country factor market frictions even three decades later.

The primary data source is the EIB Investment Survey (EIBIS), a stratified random survey of non-financial enterprises conducted annually since 2016 across all 28 EU member states, covering manufacturing, services, utilities, and construction (NACE categories C–J). The analysis uses three waves (2016–2018), with approximately 12,500 firms per wave and a panel component of roughly 2,000 firms appearing in all three waves. Survey responses are matched to Orbis administrative data; the correlation between log employment in EIBIS and Orbis is 0.91, confirming data quality. MRP of capital (MRPK) is measured as the capital cost share times revenue divided by fixed assets; MRP of labor (MRPL) is the labor cost share times revenue divided by employment. Cost shares are calibrated from OECD STAN and Eurostat national accounts at the country–year–industry level.

The theoretical framework is a dynamic model of a profit-maximizing firm with Cobb-Douglas production, isoelastic demand, and quadratic adjustment costs. Under the assumption that pure economic profits are small and that the labor output distortion is negligible (following Hsieh-Klenow 2009), the model implies that log MRPK and log MRPL can be approximated by observable average revenue products. The empirical strategy is a Mincerian regression of log MRPK (and log MRPL) on a rich vector of firm-level characteristics — firm demographics, input quality, capacity utilization, investment constraints, dynamic adjustment variables, and financing sources — plus country, industry, and year fixed effects (and their interactions). Because regressors are endogenous, the R² from OLS is interpreted as an upper bound on the share of MRP variance attributable to each factor (formally shown to dominate the IV R²). Marginal R² increments when a variable block is added identify the contribution of that block to the variance in MRP, which is then mapped into productivity gains via the Hsieh-Klenow formula.
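The marginal R² logic can be illustrated on synthetic data. In this sketch the block names, coefficients, and sample size are invented for illustration, and the marginal contribution depends on the order in which blocks are added.

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R² from an OLS regression of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return float(1.0 - residuals.var() / y.var())


def marginal_r2(y: np.ndarray, X_base: np.ndarray, X_block: np.ndarray) -> float:
    """Increase in R² when X_block is added to a regression that already contains X_base."""
    return r_squared(y, np.column_stack([X_base, X_block])) - r_squared(y, X_base)


# Synthetic example with two hypothetical variable blocks.
rng = np.random.default_rng(0)
n = 2000
demographics = rng.standard_normal((n, 2))
adjustment = rng.standard_normal((n, 1))
log_mrpk = (demographics @ np.array([0.5, -0.3])
            + 0.8 * adjustment[:, 0]
            + rng.standard_normal(n))

gain_from_adjustment = marginal_r2(log_mrpk, demographics, adjustment)
print(gain_from_adjustment)
```

The increment is the share of variance in the synthetic log MRPK that the added block accounts for, which is the quantity the paper then feeds into the productivity-gain calculation.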

The main quantitative findings are as follows. Raw dispersion is large: the standard deviation of log MRPK is 1.43 and of log MRPL is 1.19 (and 1.63 for log MRPL minus log MRPK), all substantially exceeding comparable US figures (0.98 for capital and 0.58 for labor from Asker et al. 2014 and Bartelsman et al. 2013). The R² in the full regression is 0.14 (without fixed effects) and 0.49 (with country × industry × year fixed effects) for MRPK, and 0.29 and 0.74 respectively for MRPL. Among firm-characteristic blocks, the “adjustment” (dynamic investment and employment growth) and “demographics” (firm size, age, subsidiary and exporter status) blocks carry the largest marginal R² contributions; the “obstacles to investment” block (direct reports of constraints) contributes modestly by comparison. Country fixed effects alone explain R² = 0.052 for MRPK and R² = 0.445 for MRPL, while industry fixed effects alone explain R² = 0.239 for MRPK and R² = 0.268 for MRPL. The combined country–industry–year fixed-effects R² reaches 0.275 for MRPK and 0.611 for MRPL; adding the full interaction yields 0.492 and 0.736 respectively.

Treating the “distortions” block of variables as genuine frictions, removing them would raise EU aggregate productivity by more than 40 percent (computed as 1.5 × 1.42 × 0.186 + 0.13 × 2.66 × 0.134 = 0.442). If all variables in X are treated as distortions, the implied gain is approximately 72 percent (0.715 in logs). Removing cross-country inequality in average MRPs (equalizing country fixed effects) would imply a gain of 102 log points in productivity under the Hsieh-Klenow formula; removing barriers between industries and countries could raise productivity by at least 143 log points.
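The headline calculation can be reproduced directly. In the sketch below the labels are interpretive assumptions: 1.42 and 2.66 match the squares of the reported standard deviations 1.19 and 1.63, while reading 1.5 and 0.13 as formula weights and 0.186 and 0.134 as variance shares is a guess at the roles of those numbers, not something the text states.

```python
# Each term: (weight, variance of the log MRP measure, share attributed to distortions).
# The labels are assumptions; only the numbers come from the text.
terms = [
    (1.5, 1.42, 0.186),   # 1.42 is approximately 1.19 ** 2
    (0.13, 2.66, 0.134),  # 2.66 is approximately 1.63 ** 2
]

gain_in_logs = sum(weight * variance * share for weight, variance, share in terms)
print(round(gain_in_logs, 3))
```

The first term contributes about 0.396 and the second about 0.046, so nearly all of the gain comes from the first term.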

A Machado-Mata distributional decomposition comparing Germany (σ(log MRPK) = 0.92, σ(log MRPL) = 0.61) and Greece (σ(log MRPK) = 1.64, σ(log MRPL) = 0.91) reveals that the primary driver of Greece’s higher dispersion is the “prices” component (regression coefficients reflecting the institutional and policy environment), not the “endowments” component (firm characteristics). Giving Greece German institutional “prices” reduces the counterfactual standard deviation of Greek MRPK from 1.66 to 0.94. This pattern generalizes across EU countries: German b (coefficients) tends to reduce MRPK dispersion for most countries, while German X (firm characteristics) tends to increase it, because Germany has more heterogeneous firms but an environment that prices those characteristics in a way that equalizes returns. This finding constitutes large-scale microeconomic evidence that institutions matter: cross-country differences in MRP dispersion reflect how business, institutional, and policy environments translate firm heterogeneity into outcomes, more than they reflect differences in firm characteristics per se.

The policy implication is that deep institutional reform — not merely changes in firm composition — is required to narrow EU resource misallocation. The scope condition is that these estimates are upper bounds, and some observed MRP dispersion likely reflects compensating differentials (e.g., higher-quality capital commanding a higher MRPK) rather than pure distortions.

In depth

Q1. What is the identification strategy, and what are the main threats to it?

The paper does not attempt causal identification. Instead, it uses OLS to estimate equilibrium (Mincerian-type) regressions of log MRPK and log MRPL on firm characteristics plus fixed effects. The key insight is that OLS R² provides an upper bound on the share of MRP variance causally attributable to each regressor, because simultaneity or omitted variables can only inflate OLS R² above the true IV R². The main threats are: (1) endogeneity of regressors: a growing firm facing red tape will have high MRPK and a binding constraint simultaneously, inflating the R² attributed to constraints; (2) classical measurement error in survey responses, which attenuates R² toward zero (so OLS actually understates causal effects in this direction); (3) omitted variable bias via unobserved firm quality (managerial talent, etc.); (4) use of the same variables (employment, fixed assets) on both the left- and right-hand sides, addressed by cross-checking with Orbis data as instruments. The authors argue these threats are mostly conservative: they overstate, not understate, the upper bound.

Q2. What is the theoretical justification for using average revenue products to measure marginal revenue products?

Under the assumption that the share of pure economic profits is small (following Basu and Fernald 1997), the optimality conditions of the dynamic model imply that MRPK ≈ (capital cost share) × (revenue / capital) and MRPL ≈ (labor cost share) × (revenue / employment). These are average revenue products scaled by factor cost shares, matching Hsieh and Klenow (2009). The distortion framework further implies that the variance of log MRPK and log MRPL, when distortions are log-normally distributed and uncorrelated, maps directly into the Hsieh-Klenow productivity-loss formula, linking the regression R² to quantitative welfare calculations.
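The measurement formulas translate into two short functions. The firm figures in this sketch are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def mrpk(capital_cost_share: float, revenue: float, fixed_assets: float) -> float:
    """MRPK approximated as the capital cost share times revenue over fixed assets."""
    return capital_cost_share * revenue / fixed_assets


def mrpl(labor_cost_share: float, revenue: float, employment: float) -> float:
    """MRPL approximated as the labor cost share times revenue over employment."""
    return labor_cost_share * revenue / employment


# Hypothetical firm: revenue 1000, fixed assets 600, 20 employees.
firm_mrpk = mrpk(capital_cost_share=0.3, revenue=1000.0, fixed_assets=600.0)
firm_mrpl = mrpl(labor_cost_share=0.7, revenue=1000.0, employment=20.0)
print(firm_mrpk, firm_mrpl, math.log(firm_mrpk), math.log(firm_mrpl))
```

The dispersion statistics in the paper are standard deviations of the logged versions of these quantities across firms.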

Q3. What is the role of compensating differentials versus true distortions in interpreting the results?

The paper emphasizes that not all dispersion in MRPs reflects inefficient distortions. Some dispersion — particularly from ‘quality of capital,’ ‘capacity utilization,’ and ‘dynamic adjustment’ — may reflect compensating differentials: firms that invest in higher-quality capital rationally face higher costs, demanding a higher MRPK in equilibrium, analogous to how more educated workers earn higher wages in a Mincerian framework. If these variables reflect compensating differentials rather than frictions, using ‘raw’ MRP dispersion overstates misallocation. Conversely, if all variables proxy for distortions, the productivity gains from reform are even larger (72 percent versus 40 percent). The paper presents both interpretations explicitly, making the framework ‘highly portable’ for different views of what drives observed dispersion.

Q4. What heterogeneity in MRP dispersion is documented across EU countries and industries?

Dispersion is notably lower in Germany (σ(log MRPK) = 0.92, σ(log MRPL) = 0.61) than in Greece (1.64 and 0.91) or smaller countries such as Malta, Luxembourg, and Cyprus. Country fixed effects alone yield an R² of 0.445 for MRPL but only 0.052 for MRPK, meaning labor is more segmented across countries than capital. Industry fixed effects alone yield an R² of 0.239 for MRPK versus 0.268 for MRPL, indicating capital is more segmented across industries than across countries. Core EU countries (France, Denmark) are relatively insensitive to counterfactual substitution of German coefficients, while periphery countries (Portugal, Ireland) show large movements. Romania, which resembles Slovenia in raw MRPK dispersion, looks much more like the Netherlands after controlling for firm characteristics, illustrating that observed dispersion rankings can be misleading without adjustment.

Q5. What does the Machado-Mata decomposition reveal, and how is it implemented?

The Machado-Mata (2005) decomposition separates the distribution of MRP into an ‘endowments’ component (due to the values of firm characteristics X) and a ‘prices’ component (due to the regression coefficients b, which capture how the institutional and policy environment translates X into outcomes). The decomposition draws B = 10,000 bootstrap samples from the empirical distribution of X for each country, combines them with quantile regression coefficients estimated separately for each country, and constructs counterfactual distributions. Applying Greek X with German b reduces Greece’s counterfactual σ(log MRPK) from 1.66 to 0.94 (close to Germany’s actual 0.92), while applying German X with Greek b increases dispersion. The main finding is that differences in ‘prices’ (institutional environment) dominate differences in ‘endowments’ (firm characteristics) in explaining cross-country variation in within-country MRP dispersion. This pattern holds generally across EU countries: gains from ‘importing’ German institutions are correlated with poor World Bank Governance Indicators and International Country Risk Guide scores.
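The counterfactual construction can be sketched in a simplified form. This is not the paper's implementation: it uses a single binary characteristic and group-wise empirical quantiles in place of linear quantile regressions, and both "countries" are synthetic.

```python
import numpy as np


def counterfactual_draws(x_endowments, y_prices, x_prices, n_draws, rng):
    """Simulate outcomes using one country's X distribution and another's conditional quantiles.

    Simplification (an assumption of this sketch): X is discrete, so the conditional
    quantile function is the empirical quantile of y within each X group.
    """
    x_draw = rng.choice(x_endowments, size=n_draws, replace=True)
    u_draw = rng.uniform(0.0, 1.0, size=n_draws)
    y_sim = np.empty(n_draws)
    for group in np.unique(x_draw):
        mask = x_draw == group
        # Evaluate the "prices" country's conditional quantile function at random quantiles.
        y_sim[mask] = np.quantile(y_prices[x_prices == group], u_draw[mask])
    return y_sim


rng = np.random.default_rng(0)
n = 5000
# Synthetic country A: characteristics are "priced" with wide dispersion.
x_a = rng.integers(0, 2, size=n)
y_a = 1.5 * x_a + 1.5 * rng.standard_normal(n)
# Synthetic country B: the same characteristic is priced with narrow dispersion.
x_b = rng.integers(0, 2, size=n)
y_b = 0.5 * x_b + 0.6 * rng.standard_normal(n)

sd_actual_a = y_a.std()
sd_a_with_b_prices = counterfactual_draws(x_a, y_b, x_b, 10_000, rng).std()
print(sd_actual_a, sd_a_with_b_prices)
```

Keeping country A's characteristics but applying country B's pricing collapses the simulated dispersion, which mirrors the direction of the Greek X with German b exercise.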

Q6. How do the paper’s estimates of EU misallocation compare to US benchmarks?

The EU standard deviations of log MRPK (1.43) and log MRPL (1.19) substantially exceed comparable US figures of 0.98 for capital (Asker et al. 2014) and 0.58 for labor (Bartelsman et al. 2013). The paper discusses three caveats for this comparison: (1) EIBIS uses revenue rather than value added, which affects dispersion (approximately +0.16 log points for MRPL, -0.21 for MRPK), an adjustment insufficient to explain the full gap; (2) survey measurement error is present but small, since averaging over multiple waves reduces the standard deviation of log MRPK by only 8–12 percent; (3) EIBIS measures firms (not plants), and since about two-thirds of MRPK variance occurs across plants within firms (Kehrig and Vincent 2017), the EU–US comparison likely understates the true difference. Qualitatively, the greater EU dispersion is consistent with lower EU aggregate TFP relative to the US.

Q7. What specific regression results are reported for individual variable blocks?

The full R² (without / with country × industry × year fixed effects) is 0.14 / 0.49 for MRPK and 0.29 / 0.74 for MRPL. Among variable blocks, the ‘adjustment’ (investment, employment growth, past and planned investment) and ‘demographics’ (size, age, subsidiary, exporter) blocks have the largest marginal R². The ‘obstacles to investment’ (direct constraint reports) block contributes modestly, with some coefficients not statistically significant. Among the regression coefficients (from Table A.4): older, exporting, high-utilization firms have higher MRPK and MRPL; investment is strongly negatively associated with MRPK (movement down the MRPK curve as capital rises) and positively with MRPL (labor becomes relatively scarcer); employment growth is positively associated with MRPK and negatively with MRPL (symmetric logic); credit-constrained status is negatively correlated with both MRPK and MRPL.

Q8. What robustness checks are run?

The paper reports: (1) ‘between’ regressions on multi-year firm averages to reduce transitory variation and measurement error — results are qualitatively similar with slightly larger productivity gains; (2) restricting the sample to firms appearing in all three survey waves (Appendix Table A.5) — qualitatively similar results; (3) estimating equation (4) for each wave separately — similar results; (4) using Orbis employment and investment as regressors instead of EIBIS responses to address mechanical measurement-error correlation — nearly identical results (Appendix Table A.17); (5) replacing log(1+investment) with an indicator for positive investment (Appendix Table A.7) — similar results; (6) using industry-specific rather than country–year–industry cost shares — similar results; (7) confirming that measurement error can account for only a portion of the EU–US dispersion difference (8–12 percent reduction in standard deviation when averaging over waves). The paper also reports separate coefficient estimates for three blocs of EU countries (North/West, South, Center/East) in Appendix Tables A.10–A.16.

Q9. How does the paper extend Hsieh and Klenow (2009) and the related literature?

The paper extends Hsieh and Klenow (2009) in several directions. First, while Hsieh-Klenow use administrative census-type data for India and China restricted to manufacturing, this paper uses a consistent cross-country survey covering all sectors in 28 EU countries, enabling direct cross-country comparison. Second, Hsieh-Klenow implicitly assume all MRP dispersion reflects distortions; this paper explicitly distinguishes distortions from compensating differentials and shows the distinction matters quantitatively. Third, this paper develops the Mincerian regression approach to apportion the variance in MRPs across observable factors — analogous to labor economists decomposing wage dispersion — and shows OLS R² provides a valid upper bound without requiring exogenous variation. Fourth, unlike country-level distortion measures (Gamberoni et al. 2016), tight theoretical restrictions (David and Venkateswaran 2017), or specific reforms (Rotemberg 2019), this paper draws on firm-level survey data with minimal restrictions and maintains high external validity. Fifth, the Machado-Mata distributional decomposition adds a new dimension absent from Hsieh-Klenow: decomposing cross-country differences into endowments vs. institutional ‘prices.’

Q10. What are the policy implications and their scope conditions?

The primary policy implication is that EU productivity could rise by more than 40 percent if distortions to resource allocation were removed — and up to 72 percent if all observed MRP variation is attributed to distortions. A more modest goal of equalizing within-industry MRP dispersion across countries (i.e., making Germany and Greece similar within industries) implies gains of approximately 31–53 percent depending on interpretation. The decomposition evidence implies that institutional reform (changing how environments price firm characteristics) is more important than directly changing firm composition. The scope conditions are: (1) these are upper bounds derived from OLS; (2) some dispersion reflects compensating differentials that should not be counted as losses; (3) the EIBIS covers firms with at least 5 employees, so very small firms are excluded; (4) the framework assumes log-normal, uncorrelated distortions and constant returns to scale — relaxing these can increase estimated losses further (Jones 2011); (5) the estimates do not account for firm-level markup heterogeneity, which could overstate or understate other channels.

Q11. What does the paper contribute to the literature on measurement error in MRP studies?

The paper shows formally (Appendix D) that classical measurement error in regressors attenuates OLS R² toward zero, so OLS provides a conservative upper bound from this direction. It also shows that averaging across multiple survey waves reduces measurement error while also attenuating transitory adjustment-cost variation, so multi-year averages likely overstate the role of measurement error. Crucially, the paper validates EIBIS against Orbis administrative data, finding a 0.91 correlation for log employment, similar standard deviations of log MRPK (1.44 in Orbis vs. 1.37 in EIBIS) and log MRPL (1.07 in Orbis vs. 1.30 in EIBIS) for matched firms, and a mean absolute log difference in standard deviations of approximately 2 percent across countries. This contributes to the debate initiated by Bils et al. (2017) on whether measured MRP dispersion reflects mismeasurement, and corroborates that surveys can be reliable substitutes for census-type administrative data in cross-country analysis.
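The attenuation result is easy to see in a simulation. The sketch below uses synthetic data with invented parameters, and it relies on the fact that R² in a one-regressor OLS with an intercept equals the squared correlation.

```python
import numpy as np


def r2_single_regressor(y: np.ndarray, x: np.ndarray) -> float:
    """R² of an OLS regression of y on one regressor plus intercept (squared correlation)."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


rng = np.random.default_rng(0)
n = 5000
x_true = rng.standard_normal(n)
y = 0.8 * x_true + rng.standard_normal(n)
# Classical measurement error: noise independent of the true regressor and of y.
x_observed = x_true + rng.standard_normal(n)

r2_clean = r2_single_regressor(y, x_true)
r2_noisy = r2_single_regressor(y, x_observed)
print(r2_clean, r2_noisy)
```

With noise as large as the signal, the R² computed from the mismeasured regressor is roughly half the R² from the true regressor, so measurement error of this kind pushes the estimated share of explained variance down rather than up.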

Q12. What does the paper find about the role of credit constraints specifically?

Credit constraint status (defined as loan rejection, discouragement from applying, or receiving a loan that was too small or too expensive) is negatively correlated with both MRPK and MRPL in the full regression. The causal prior runs the other way: credit-constrained firms cannot invest to the point where MRPK is equalized with the cost of capital, which would leave them with a high MRPK. The negative sign therefore illustrates the interpretive caveat noted by the authors: cross-sectional equilibrium relationships can have signs inconsistent with causal priors because constraints may be more binding for firms that are already performing poorly. The ‘source of funds’ block (share of investment from internal vs. external sources, and credit constraint) is grouped with ‘distortions’ in the paper’s preferred decomposition.

Key Concepts

Marginal Revenue Product (MRPK/MRPL): In this paper, the marginal revenue product of capital (MRPK) and labor (MRPL) are measured as observable average revenue products — the capital or labor cost share times revenue divided by the stock of capital or employment. Under the paper’s model assumptions, these approximate the shadow cost of inputs and serve as the primary measure of firm-level resource allocation efficiency. A firm with a high MRPK relative to its cost of capital is under-capitalized; dispersion of MRPK across firms signals misallocation.

Compensating differentials (in the MRP context): The paper adapts the Mincerian concept of compensating differentials from labor markets to the firm side: some observed dispersion in MRPK and MRPL may reflect optimal responses to heterogeneity in input quality, capital utilization, or adjustment dynamics — not inefficient distortions. For example, a firm with state-of-the-art machinery may face a higher MRPK reflecting the quality premium, not a barrier to investment. Because such dispersion is rational, it should be subtracted from productivity-loss calculations rather than counted as welfare-reducing misallocation.

Machado-Mata decomposition: A distributional decomposition technique (Machado and Mata 2005) applied here to attribute cross-country differences in the dispersion of MRPK and MRPL to two components: ‘endowments’ (the empirical distribution of firm characteristics X in a given country) and ‘prices’ (the regression coefficients b, which capture how the country’s business, institutional, and policy environment translates those characteristics into marginal revenue products). The decomposition constructs counterfactual MRP distributions by combining one country’s X with another country’s b.
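
The counterfactual step can be illustrated with a minimal sketch. It assumes a simplified one-regressor, mean-regression version of the idea (the full Machado-Mata procedure works with quantile regressions and resampling), and all data and names (`fit_ols`, `predicted_sd`, countries A and B) are hypothetical, not taken from the paper.

```python
import statistics

def fit_ols(x, y):
    """Intercept and slope of a one-regressor OLS fit (closed form)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def predicted_sd(x, coef):
    """Dispersion of fitted log MRP when characteristics x meet 'prices' coef."""
    intercept, slope = coef
    return statistics.pstdev([intercept + slope * xi for xi in x])

# Hypothetical country A: compressed endowments, flat prices.
x_a = [0.0, 1.0, 2.0, 3.0, 4.0]
y_a = [0.1 + 0.2 * xi for xi in x_a]
# Hypothetical country B: dispersed endowments, steep prices.
x_b = [0.0, 2.0, 4.0, 6.0, 8.0]
y_b = [0.3 + 0.5 * xi for xi in x_b]

coef_a, coef_b = fit_ols(x_a, y_a), fit_ols(x_b, y_b)

actual_b = predicted_sd(x_b, coef_b)        # B's X with B's b
cf_prices = predicted_sd(x_b, coef_a)       # B's X with A's b
cf_endowments = predicted_sd(x_a, coef_b)   # A's X with B's b

print(actual_b, cf_prices, cf_endowments)
```

In this made-up example, swapping in country A's prices lowers country B's dispersion by more than swapping in country A's endowments, which is the kind of comparison the decomposition is designed to make.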

Mincerian productivity regression: The paper’s core empirical framework, modeled explicitly on Mincer’s (1958) wage regression: just as wages are regressed on worker characteristics (education, experience) to decompose earnings dispersion, log MRPK and log MRPL are regressed on firm characteristics (demographics, quality, utilization, adjustment, constraints, financing) to decompose MRP dispersion. The increment in OLS R² from adding a regressor (or block of regressors) is an upper bound on the share of MRP variance attributable to it.

EIB Investment Survey (EIBIS): An annual firm-level survey administered by Ipsos MORI on behalf of the European Investment Bank since 2016, covering all 28 EU member states with a stratified random sample of approximately 12,500 non-financial enterprises per wave (minimum 5 employees, NACE C–J). Unique features include consistent cross-country design, merger with Orbis administrative data, and questions on investment plans, capital quality, capacity utilization, perceived obstacles, and financing sources — all directly informative about sources of MRP variation.

Institutional ‘prices’ on firm characteristics: In the Machado-Mata framework as applied here, ‘prices’ refer to the country-specific regression coefficients b in the MRP regression — how steeply a country’s environment (regulations, institutions, policies) translates a given unit of firm heterogeneity in X into a difference in marginal revenue products. Countries with smaller b magnitudes (like Germany) achieve more equalization of MRPs across heterogeneous firms, reflecting an efficient institutional environment; countries with large b (like Greece) amplify firm-level heterogeneity into large MRP dispersion.

Upper-bound R² approach to productivity gains: The paper’s portable method for quantifying productivity gains from removing a friction: the marginal R² increment in an OLS regression of log MRPK (or log MRPL) when a friction variable is added is an upper bound on the share of MRP variance attributable to that friction. This bound, multiplied by the variance of log MRP and the Hsieh-Klenow productivity-loss formula parameters, gives an upper-bound estimate of the aggregate TFP gain from eliminating that friction. The method does not require exogenous variation or tight structural assumptions.
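
A minimal sketch of the first step, the marginal R² increment, is given below. It uses synthetic data and the Frisch-Waugh-Lovell result to obtain the two-regressor fit from one-regressor regressions. The variable names and the data-generating process are assumptions for illustration only, and the final mapping to an aggregate TFP gain is omitted because it requires the Hsieh-Klenow parameters, which are not reproduced here.

```python
import random
import statistics

def residuals(y, x):
    """Residuals from a one-regressor OLS of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

def sum_sq(v):
    return sum(vi ** 2 for vi in v)

def marginal_r2(y, base, friction):
    """R² without and with the friction variable, and the increment."""
    my = statistics.fmean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    e_y = residuals(y, base)           # y purged of the base regressor
    e_f = residuals(friction, base)    # friction purged of the base regressor
    e_full = residuals(e_y, e_f)       # Frisch-Waugh-Lovell: full-model residuals
    r2_base = 1.0 - sum_sq(e_y) / sst
    r2_full = 1.0 - sum_sq(e_full) / sst
    return r2_base, r2_full, r2_full - r2_base

# Synthetic firm-level data (hypothetical, not EIBIS).
rng = random.Random(0)
n = 500
base = [rng.gauss(0.0, 1.0) for _ in range(n)]
friction = [rng.gauss(0.0, 1.0) for _ in range(n)]
log_mrpk = [0.5 * b + 0.3 * f + rng.gauss(0.0, 1.0) for b, f in zip(base, friction)]

r2_base, r2_full, increment = marginal_r2(log_mrpk, base, friction)
# Upper bound on the variance of log MRPK attributable to the friction.
bound_var = increment * statistics.pvariance(log_mrpk)
print(r2_base, r2_full, increment, bound_var)
```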

Self-Fulfilling Prophecies in the Transition to Clean Technology

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper by Smulders and Zhou challenges the standard lock-in narrative for the slow green transition. The conventional explanation — path dependency in directed technical change (DTC) — is hard to reconcile with forward-looking investors who anticipate an eventual move to clean technology. The authors propose an alternative: strategic investment complementarities among innovators can produce self-fulfilling prophecies that delay the low-carbon transition even when all agents foresee it will ultimately occur.

The framework is a continuous-time general equilibrium DTC model in the tradition of Acemoglu et al. (2012), modified in two key ways: patents last forever (rather than one period), and labor is mobile between production and R&D. The economy has a clean and a dirty final-goods sector with substitution elasticity σ between them. A continuum of monopolistic intermediate goods suppliers in each sector invest in R&D to improve product quality. The key mechanism is a demand externality: when goods are gross substitutes (σ > 1), innovation in a sector reduces the relative price of that sector’s output, shifting consumer expenditure toward it. This raises the return to all innovation in the sector. For σ > 2, this demand externality outweighs the intra-sector business-stealing effect, making within-sector innovations strategic complements — each firm’s R&D raises the payoff to R&D for all others in the same sector. The threshold σ > 2 is necessary and sufficient for a coordination problem to arise in the unregulated economy.

The paper establishes three steady states: two saddlepath-stable corner steady states (one with innovation only in the clean sector, one only in the dirty sector) and an unstable interior steady state with simultaneous R&D. When σ > 2, there exists a range of initial clean market shares θc,0 (the “overlap”) from which both corner steady states are reachable under rational expectations. The overlap grows with σ and shrinks with impatience ρ (Proposition 3). Furthermore, for any initial condition within the overlap, multiple transition paths to the same corner steady state exist: a “fast” path with immediate concentration of R&D in one sector, and “delayed” paths in which firms temporarily innovate in the competing sector before finally converging. For higher σ values, these delays may involve regime switches between the clean-only and dirty-only innovation regimes (σ ∈ [σ-bar, σ-bar-bar)) or even stagnation periods with zero R&D (σ > σ-bar-bar), producing non-monotonic patterns of clean innovation — rises followed by falls before eventual clean dominance (Proposition 4).

The welfare-maximizing path always leads to the clean steady state: a dirty steady state violates the transversality condition on the carbon stock because unbounded climate damages accumulate. The paper calibrates to 2019 data: initial clean sector share θc,0 = 0.177 (matching the 17.7% renewable energy share in global final energy consumption), world GDP per capita of $11,019 (constant 2015 USD), per capita carbon emissions of 1.22 metric tons, emission intensity ad = 0.198 tonnes per thousand USD, and σ = 1.5. Under this calibration, three distinct equilibrium paths coexist under an optimal Pigouvian carbon tax — one with clean-only innovation from the start and two involving temporary dirty R&D — all converging to the clean steady state but at different speeds and with different amounts of stranded dirty assets.

The central policy finding (Proposition 7) is that a Pigouvian carbon tax set equal to the social cost of carbon at all times eliminates the dirty steady state but does not pin down a unique transition path. Multiple equilibria with different durations of dirty innovation persist under the first-best carbon tax. Effective coordination requires a second instrument that directly controls relative innovator profitability: a minimum clean revenue guarantee, an emission cap, a dirty R&D tax, or a contingent super-Pigouvian carbon tax all qualify. A clean R&D subsidy works but is an inferior device because it distorts labor allocation between production and research. Crucially, commitment is required: unless the government commits to maintaining the coordination instrument until the economy exits the multiple-equilibria region, delayed transitions remain possible.

In depth

Q1. What is the core mechanism generating multiple equilibria, and why does it require σ > 2?

Intermediate good monopolists in each sector earn profits proportional to their sector’s expenditure share, which rises with relative quality when σ > 1 (demand shift effect). But a firm’s share of sector profits falls as rivals innovate (business-stealing effect). From equation (24), the relative marginal profit of clean versus dirty innovation scales as (Qc/Qd)^(σ-2). The demand shift effect dominates the business-stealing effect if and only if σ > 2. When σ > 2, innovations within a sector are strategic complements: any firm’s R&D raises all other firms’ marginal return to R&D in the same sector. This complementarity means beliefs about which sector will be large in the future become self-reinforcing: if investors expect the clean sector to grow, clean innovation is profitable, and the expectation is validated.
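
The role of the σ = 2 threshold can be seen in a small sketch of the scaling term alone. The constant of proportionality from equation (24) is omitted, and the function name is a hypothetical label rather than the paper's notation.

```python
def relative_marginal_profit(q_clean, q_dirty, sigma):
    """Term proportional to the clean/dirty relative marginal profit:
    (Qc/Qd)^(sigma - 2). The proportionality constant is left out."""
    return (q_clean / q_dirty) ** (sigma - 2.0)

# Doubling relative clean quality under three elasticities.
for sigma in (1.5, 2.0, 3.0):
    print(sigma, relative_marginal_profit(2.0, 1.0, sigma))
```

For σ > 2 the term rises with relative clean quality (a quality lead is self-reinforcing), for σ < 2 it falls, and at σ = 2 the two effects cancel exactly.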

Q2. How do the two modifications from Acemoglu et al. (2012) affect the results?

First, infinite (rather than one-period) patents allow future expected profits to influence innovation decisions, giving expectations a more direct role. Second, labor mobility between production and R&D makes the speed of innovation endogenous alongside its direction. However, the paper shows (OA3.2 and Section 3.3) that neither modification is necessary for the qualitative result: the overlap and strategic complementarity arise even with finite patent length and segmented labor markets. Longer patent length has an effect similar to lower impatience — it increases the overlap. OA4 shows that a segmented labor market model has essentially identical dynamics but requires a third state variable (an effective savings-rate proxy), so it is no simpler than the baseline.

Q3. What types of transition delays are possible and how do they depend on σ?

Proposition 4 identifies three regimes of delay: (a) for 2 < σ < σ-bar, only temporary simultaneous R&D is possible as a delay; (b) for σ ∈ [σ-bar, σ-bar-bar), delay must include temporary regime switches between the clean-only and dirty-only innovation regimes; (c) for σ > σ-bar-bar, delay must include a stagnation period with no R&D at all. The numerical example shows that for σ = 2.5 and σ = 3, delayed paths involve a flat simultaneous-research segment (mc = 1/2). For σ = 5 and σ = 7, equilibrium paths involve switches between clean-only and dirty-only regimes. For σ = 8 and σ = 9, paths contain vertical stagnation sections and multiple regime switches, with clean innovation peaking, falling, then rising again before converging to the clean steady state.

Q4. What does the welfare analysis reveal about the costs of delayed transition?

Under the calibrated model (σ = 1.5, θc,0 = 0.177), three equilibrium paths coexist under the Pigouvian carbon tax, corresponding to no delay, short delay, and long delay in clean innovation. Paths with delay accumulate more dirty capital (Qd,∞ > Qd,0), creating more stranded assets in the long run. Figure 4 shows that, at calibrated emission intensity (ad = 0.198), the clean-only path dominates in welfare whenever multiple equilibria arise. However, at a counterfactually low pollution intensity (ad = 0.0198, one-tenth of calibrated), the planner may prefer some temporary dirty innovation when the clean sector starts small, because investment complementarities in the (larger) dirty sector generate higher short-run consumption growth that outweighs the smaller pollution cost.

Q5. Why does a Pigouvian carbon tax fail to coordinate the transition, and what instruments can succeed?

A Pigouvian tax changes the marginal cost of emissions and affects relative profitability, but it does not fully control relative innovation profitability because strategic complementarities within a sector persist: total innovation in a sector still raises marginal returns for all firms in it, and the complementarity can dominate the tax effect. An emission cap, by contrast, fixes the quantity of dirty output (given the Leontief emissions-to-output structure), which mutes the complementarity: expanding dirty productivity no longer pays if the quantity cap is binding. A minimum clean revenue guarantee sets a floor on clean firms’ profits that controls relative profitability directly without taxing the dirty sector. A dirty R&D tax raises the marginal cost of dirty research, shifting the innovation regime border and eliminating dirty equilibrium paths. A contingent super-Pigouvian carbon tax (above the social cost of carbon) that activates only when the economy innovates in the dirty sector also works. All of these require policy commitment over the duration of the multiple-equilibria region; without commitment they fail.

Q6. How does the paper relate to and differ from Acemoglu et al. (2012)?

The model starts from Acemoglu et al. (2012) but reaches a qualitatively different policy conclusion. Acemoglu et al. (2012) acknowledge the multiplicity of equilibria in their appendix but restrict their analysis to initial conditions and policies that make equilibrium unique, concluding that a Pigouvian tax combined with an R&D subsidy is sufficient for the optimal transition. This paper shows that when forward-looking expectations and investment complementarities are fully accounted for, the coordination failure is separate from the pollution and monopoly externalities, and a Pigouvian tax — even when optimal — does not resolve it. The paper also differs by using infinite patent length (vs. one-period) and an integrated labor market (vs. segmented), though Appendices OA3.2 and OA4 show the qualitative conclusions are robust to these modeling choices.

Q7. How does the paper relate to the stranded asset literature?

Van der Ploeg and Rezai (2020) and Kalkuhl et al. (2020) explain asset stranding through policy uncertainty, distributional effects, or disordered transition. This paper provides a complementary explanation: excess dirty investment and asset stranding can occur even under a committed, fully optimal Pigouvian tax — not because of uncertainty, but because of rational coordination failure. Firms continue investing in polluting technologies, knowing a clean steady state is inevitable, because strategic complementarities make the dirty sector temporarily attractive when the dirty sector is larger. The amount of stranded assets varies across equilibria: the longer the delay in clean innovation, the larger the accumulated stock of ultimately worthless dirty technology capital (Qd,∞ > Qd,0).

Q8. What role do knowledge spillovers and cross-sectoral knowledge externalities play?

The baseline model assumes knowledge spillovers within sectors (quality in sector j benefits from sector-wide average quality Qj). The Online Appendix (OA3) shows that inter-sectoral knowledge spillovers (parameter χ) do not affect complementarities at all, because knowledge stock is predetermined and current rival innovation cannot affect one’s own value through the knowledge channel. Learning-by-doing production spillovers (parameter ε) strengthen complementarities. The general condition for self-fulfilling prophecies in the extended model is ψ > max{0, -η}, where ψ = (1+ε)(σ-1)(1-α)/(1-ωα) - 1 and η measures own-sector knowledge advantage in innovation productivity. The baseline model (ε=0, ω=1) gives ψ = σ-2, recovering the σ > 2 condition.
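
As a quick check of the condition, the sketch below evaluates ψ as written above and confirms that the baseline case collapses to σ - 2. The parameters α and ω are taken as given from the formula, and the numerical values are arbitrary illustrations, not the paper's calibration.

```python
import math

def psi(sigma, alpha, eps=0.0, omega=1.0):
    """psi = (1 + eps)(sigma - 1)(1 - alpha) / (1 - omega * alpha) - 1."""
    return (1.0 + eps) * (sigma - 1.0) * (1.0 - alpha) / (1.0 - omega * alpha) - 1.0

def self_fulfilling_possible(sigma, alpha, eta, eps=0.0, omega=1.0):
    """Condition for self-fulfilling prophecies: psi > max{0, -eta}."""
    return psi(sigma, alpha, eps, omega) > max(0.0, -eta)

# Baseline (eps = 0, omega = 1): psi equals sigma - 2 for any alpha.
print(math.isclose(psi(3.0, 0.3), 3.0 - 2.0))

# Learning-by-doing (eps > 0) can satisfy the condition below sigma = 2.
print(self_fulfilling_possible(1.8, 0.3, eta=0.0))
print(self_fulfilling_possible(1.8, 0.3, eta=0.0, eps=0.5))
```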

Q9. What are the policy implications and their scope conditions?

The main policy implication is that a single Pigouvian carbon tax is insufficient for the optimal green transition even if credibly committed to; a coordination device is necessary as a second instrument. Scope conditions: (1) This conclusion holds whenever σ > 1 under optimal industry policy (which internalizes monopoly and spillover externalities) — the threshold is lower than σ > 2 in the unregulated economy. (2) The preferred coordination device (revenue guarantee, emission cap, dirty R&D tax, or contingent super-Pigouvian tax) depends on institutional constraints. (3) All coordination devices require policy commitment for the duration of the multiple-equilibria region. (4) The conclusion that the clean-only path is welfare-superior when multiple equilibria arise holds at calibrated emission intensity; at very low pollution intensity the planner might prefer some temporary dirty innovation. (5) The analysis abstracts from uncertainty, heterogeneous beliefs, large players, multiple abatement options, and physical capital — directions for future quantitative work.

Q10. What is the role of impatience (ρ) and patent length in the size of the coordination problem?

Proposition 3 shows that the overlap (the range of initial conditions admitting multiple equilibria) decreases with impatience ρ. When ρ is large, investors discount future profits heavily, limiting how far ahead expectations can drive current investment choices. In the limit of infinite impatience, only current profit matters and the game collapses to a static one-period coordination problem (Section 3.3). Shorter patent length, modeled as a Poisson patent infringement risk ι (OA3.2), acts identically to higher ρ in the equilibrium dynamics: the dynamics of the model with infringement risk ι are identical to the baseline with ρ replaced by ρ + ι. Hence shorter patents shrink the overlap, and policy must subsidize R&D to compensate for the excessively short investment horizon.

Key Concepts

Strategic investment complementarity: Within-sector R&D is a strategic complement when σ > 2: one firm’s innovation raises the return to other firms’ innovation in the same sector, because the demand shift effect (innovation increases sector expenditure share) outweighs the business-stealing effect (innovation dilutes rivals’ profit share). This is not a knowledge spillover but a demand externality operating through the market size of the innovating sector.

Overlap: The range of initial clean market shares θc,0 from which both the clean and dirty corner steady states can be reached in a rational expectations equilibrium. The overlap exists if and only if σ > 2 in the unregulated economy (σ > 1 under optimal industry policy), grows with the substitution elasticity σ, and shrinks with impatience ρ or shorter patent length.

Market valuation share (mc): The share of the clean sector in the total marginal value of innovation across sectors, defined as mc = Qcλc / (Qcλc + Qdλd). When mc > 1/2, the economy is in the clean-only innovation regime; when mc < 1/2, in the dirty-only regime; when mc = 1/2, simultaneous research is active. Because mc is a forward-looking, continuous variable, it captures investors’ collective expectation about future market conditions and directly determines the direction of technical change.
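
The regime rule can be written as a short sketch. The shadow values λc and λd are treated as given inputs here (in the model they are forward-looking equilibrium objects), and the tolerance used to detect mc = 1/2 is an implementation assumption.

```python
def market_valuation_share(q_c, lam_c, q_d, lam_d):
    """m_c = Qc*lambda_c / (Qc*lambda_c + Qd*lambda_d)."""
    clean, dirty = q_c * lam_c, q_d * lam_d
    return clean / (clean + dirty)

def innovation_regime(m_c, tol=1e-9):
    """Map the clean valuation share to the active R&D regime."""
    if m_c > 0.5 + tol:
        return "clean-only"
    if m_c < 0.5 - tol:
        return "dirty-only"
    return "simultaneous"

print(innovation_regime(market_valuation_share(1.0, 3.0, 1.0, 1.0)))
print(innovation_regime(market_valuation_share(1.0, 1.0, 3.0, 1.0)))
print(innovation_regime(market_valuation_share(2.0, 1.0, 2.0, 1.0)))
```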

Self-fulfilling prophecy (in innovation): An equilibrium in which investors’ shared belief about the future direction of innovation is rational precisely because all investors, acting on that belief, make it come true. If all investors expect the dirty sector to remain large, they concentrate R&D there, the dirty sector grows, and the belief is confirmed. The same logic applies to clean beliefs. In the paper’s context, self-fulfilling prophecies extend to the speed of transition: even if firms agree the economy will eventually go clean, pessimistic beliefs about timing can rationally support periods of dirty innovation before the switch.

Delayed transition: An equilibrium path in which the economy ultimately converges to the clean steady state but investors temporarily concentrate R&D in the dirty sector before switching permanently to clean. The delay generates more stranded dirty assets (a higher terminal dirty technology stock Qd,∞) and higher short-run growth (via dirty-sector complementarities) relative to the fast-transition path. Multiple delayed paths may coexist, distinguished by the length of the dirty innovation period and the amount of accumulated dirty capital.

Coordination device: A policy instrument that directly controls the relative profitability of clean versus dirty innovation, thereby eliminating the undesired equilibrium paths without relying solely on price incentives. The paper identifies four classes: (1) minimum clean revenue guarantee, (2) emission cap (quantity-based), (3) dirty R&D tax or clean R&D subsidy, and (4) contingent super-Pigouvian carbon tax. All require government commitment for the duration of the multiple-equilibria region. A clean R&D subsidy is inferior because it distorts labor allocation toward innovation.

Stranded assets: In this paper, the dirty technology capital that becomes economically worthless in the clean steady state. The amount of stranding is determined by the dirty technology stock at the moment the economy permanently switches to clean innovation (Qd,∞). Different equilibrium paths — fast vs. delayed transitions — imply different terminal dirty stocks and hence different quantities of stranded assets. Excess stranding relative to the social optimum is a welfare cost of coordination failure.

Unconventional Monetary Policies and Inequality

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper asks whether the Federal Reserve’s unconventional monetary policies (UMP) — specifically quantitative easing (QE) and forward guidance — exacerbated income and welfare inequality in the United States during the effective lower bound (ELB) episode following the Great Recession (2009–2015). The question is empirically and theoretically contested: QE raises profits and equity prices, benefiting wealthy households who hold most equity, while simultaneously reducing unemployment, which benefits poorer households who rely almost entirely on labor income. Resolving the net effect requires a unified framework that captures both channels simultaneously, with empirically realistic responses of profits, wages, and unemployment to monetary policy.

The paper builds a medium-scale Heterogeneous Agent New Keynesian (HANK) model that incorporates: (i) a two-asset structure (liquid deposits and illiquid equity) with portfolio adjustment costs; (ii) three working statuses — employed, unemployed, and business owner — with endogenous job-finding rates determined by a search-and-matching labor market; (iii) a banking sector modeled after Gertler and Karadi (2011), with a moral-hazard leverage constraint; (iv) a substantial fixed cost in production that, combined with wage rigidity, generates procyclical profit responses to monetary policy shocks — a feature absent from standard New Keynesian models and critical for capturing benefits to wealthy households; and (v) an occasionally binding ELB constraint with QE modeled as central bank asset purchases and forward guidance modeled as exogenous expected ELB durations following Jones (2017). The model is calibrated to match the 2007 Survey of Consumer Finances (SCF), targeting the top decile’s share of wealth (~70%), income composition across wealth groups, and standard labor market and financial sector moments. Remaining parameters are estimated using Bayesian methods on U.S. quarterly data from 1992 Q1 to 2018 Q4, using ten observables (output, consumption, investment, inflation, nominal interest rate, real wage, unemployment, lump-sum transfers, profits, and Federal Reserve assets), with the ELB regime handled via an inversion filter and the Kulish-Jones method for exogenous ELB durations.

At the posterior mode, the model attributes the Great Recession primarily to a series of large negative risk premium shocks around 2008–2009, causing investment to fall by more than 20% relative to the pre-crisis level. The central counterfactual compares the actual ELB episode (with UMP) against a scenario where the central bank held its balance sheet constant and allowed ELB durations to be determined endogenously by fundamentals. Between 2009 and 2015, UMP on average produced: a 3.3% increase in profits, a 0.9% increase in equity prices, a 1.5 percentage-point reduction in the unemployment rate, and only a 0.1% increase in real wages (reflecting high estimated wage rigidity). Output and investment were higher by approximately 1% and 3% respectively on average, with profits rising as much as 8% during the ELB episode.

These aggregate effects translated into non-linear distributional outcomes. For the Gini index, lower unemployment reduced the income Gini by up to 0.6 percentage points, but roughly 80% of that reduction was offset by the increase in profits and equity prices, leaving only a marginal net Gini reduction of 0.04 percentage points on average. When computed for the bottom 90% alone, the Gini reduction was more pronounced because that group relies overwhelmingly on labor income. However, the income share of the top 10% rose by an average of 0.17 percentage points, driven mainly by higher profits and equity prices. Thus the answer to whether UMP raised inequality is measure-dependent: UMP reduced within-bottom-90% inequality while widening the top-decile income gap.
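
The measure-dependence can be made concrete with a toy example. The ten incomes below are hypothetical and chosen only to show that the Gini index and the top-decile share can move in opposite directions. They are not model output.

```python
def gini(incomes):
    """Gini coefficient from the rank-weighted formula on sorted incomes."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def top_share(incomes, fraction=0.10):
    """Income share of the top `fraction` of households."""
    xs = sorted(incomes, reverse=True)
    k = max(1, round(fraction * len(xs)))
    return sum(xs[:k]) / sum(xs)

# Hypothetical economy: one unemployed household, eight workers, one owner.
before = [1.0] + [5.0] * 8 + [20.0]
# Stylized stimulus: the unemployed household finds a job, profits rise.
after = [5.0] * 9 + [24.0]

print(gini(before), gini(after))            # overall Gini falls
print(top_share(before), top_share(after))  # top-decile share rises
print(gini(before[:9]), gini(after[:9]))    # bottom-90% Gini falls
```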

Welfare gains (consumption equivalents over the ELB episode) were U-shaped across the wealth distribution: the average gain was 0.27% of lifetime consumption, but households at both extremes gained more than the middle. The bottom 10% benefited from higher job-finding rates (gaining ~0.3%), the top 10% from profits and equity prices (also ~0.3%), and the top 1% gained ~0.33%. The middle 60% gained only ~0.26%. By working status, business owners gained the most (0.82%), followed by the unemployed (0.35%) and the employed (0.27%).

Decomposing UMP into QE and forward guidance, the paper finds that forward guidance accounted for approximately 55% of total UMP stimulus. Forward guidance amplified both the aggregate and distributional effects of asset purchases: QE alone raised the top 10% income share by about 0.1 percentage point, and forward guidance added a further 0.09 percentage point increase. Forward guidance lowered the overall Gini by about 0.05 percentage points more than QE alone around 2013, and reduced the bottom-90% Gini by an additional 0.2 percentage points during the same period. The interaction intensified what the paper calls a “hollowing out” of the middle class: forward guidance further reduced middle-60% income shares while leaving bottom-10% shares nearly unchanged, because the additional stimulus disproportionately raised profits and equity prices (by about 2% and 1%, respectively, between 2011 and 2014).

Comparing QE with a hypothetical conventional monetary policy (CMP) that would have allowed the nominal rate to drop to approximately -1%, the paper finds that CMP would have produced larger aggregate stimulus than QE but more adverse distributional effects. Under CMP, lower financing costs disproportionately boosted bank net worth, indirectly raising profits and benefiting wealthy households even more than QE did. Under QE, central bank asset purchases crowded out private bank investment by reducing expected equity returns even as they raised equity prices, partially dampening the profitability gains to the financial sector. Consequently, CMP would have delivered above-average welfare gains only to the bottom 1% (debtors benefiting from lower real rates) and the top 10% (through larger bank profit effects), while the broad middle class would have fared no better and in some dimensions worse.

The paper’s key methodological contribution is the first Bayesian estimation of a HANK model with an occasionally binding ELB constraint. Its key substantive finding is that standard NK models, which generate countercyclical profits, systematically understate the benefits that expansionary monetary policy delivers to wealthy households, producing a misleading or incomplete picture of the distributional effects of monetary policy.

In depth

Q1. What is the model’s identification strategy and how is the ELB period handled in estimation?

The model is estimated with Bayesian methods using an inversion filter (following Guerrieri and Iacoviello 2017 and Cuba-Borda et al. 2019) on ten quarterly observables from 1992 Q1 to 2018 Q4. The key identification challenge is the occasionally binding ELB constraint. The paper follows Kulish et al. (2014) and Jones (2017), treating the ELB as a temporary alternative regime with exogenous expected durations. These expected durations are themselves estimated as latent variables, with priors informed by the New York Fed’s primary dealer survey. The Metropolis-Hastings algorithm is used for structural parameters (treating ELB durations as fixed in each draw), while ELB durations are drawn separately using a discrete uniform proposal density. To make estimation computationally feasible given the large idiosyncratic state space, the paper follows Bayer and Luetticke (2020) and updates only the subset of the model Jacobian corresponding to ‘aggregate’ and ‘summary’ equations during each iteration, leaving the ‘idiosyncratic’ blocks fixed across estimated parameters.
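
The duration-sampling step can be sketched in isolation as a Metropolis-Hastings update with a discrete uniform proposal. This is a toy illustration under strong assumptions: the log posterior below is a made-up stand-in for the model likelihood combined with the survey-informed prior, the support of 1 to 12 quarters is arbitrary, and the interleaving with the structural-parameter block is omitted.

```python
import math
import random
from collections import Counter

def sample_durations(log_post, support, n_draws, seed=0):
    """Metropolis-Hastings over a discrete duration with a uniform proposal.

    The proposal does not depend on the current state and is uniform on
    `support`, so the acceptance probability reduces to the posterior ratio.
    """
    rng = random.Random(seed)
    current = support[0]
    draws = []
    for _ in range(n_draws):
        proposal = rng.choice(support)
        log_ratio = log_post(proposal) - log_post(current)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current = proposal
        draws.append(current)
    return draws

def toy_log_post(duration):
    """Hypothetical log posterior peaked at six quarters (up to a constant)."""
    return -0.5 * (duration - 6) ** 2

support = list(range(1, 13))
draws = sample_durations(toy_log_post, support, n_draws=5000)
print(Counter(draws).most_common(3))
```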

Q2. What are the main mechanisms by which UMP affects inequality and how does the model distinguish them empirically?

The paper identifies four main channels: (1) Profit and equity price channel — QE raises equity prices and reduces financing costs, increasing profits and the dividend rate on illiquid assets. Because the top decile holds ~70% of total wealth overwhelmingly in the form of equity, with capital and business income accounting for ~50% of their income, this channel benefits the wealthy disproportionately. (2) Unemployment channel — lower interest rates stimulate demand and raise the job-finding rate. Because households at the bottom of the wealth distribution are more likely to be unemployed at the onset of the ELB episode (8.75% of the bottom decile vs. 6.54% in the middle quintile in 2009 Q1), this channel is progressive. (3) Wage channel — nominal and real wage rigidity (only one-fifth of the real wage adjusts to labor productivity changes) means that the wage channel is very weak; average real wages rose by only 0.1% due to UMP. (4) Inflation/redistribution channel — forward guidance generates inflationary expectations that compress real rates, redistributing from savers to debtors. The empirical decomposition is performed by first isolating QE alone (endogenizing ELB durations) and then comparing to the full UMP scenario (exogenous ELB durations), attributing the residual effect to forward guidance.

Q3. What is the key modeling innovation regarding profits, and why does it matter for inequality?

Standard New Keynesian models generate countercyclical profit responses to monetary policy shocks: when demand rises, price rigidity keeps prices sticky while factor prices (wages) adjust upward, squeezing markups and reducing profits. This contradicts empirical evidence from structural VARs, which show procyclical profits. The paper introduces three interacting features that resolve this: (a) a substantial fixed cost of production calibrated to roughly 20% of steady-state output, so that average production cost falls even as marginal cost rises, boosting net profits; (b) wage rigidity with search-and-matching frictions, so that real wages respond very weakly to monetary shocks; and (c) a banking sector with a financial accelerator, so that rising equity prices boost banks’ net worth and their investment demand, further amplifying profits. Without procyclical profits, the model would understate the benefits wealthy households (whose income depends heavily on profits and equity returns) gain from expansionary monetary policy, producing an incomplete picture of distributional effects.

Q4. What heterogeneity in households’ balance sheets and income composition is documented, and how does it shape distributional results?

Using the 2007 SCF, the paper documents stark differences in income composition. The bottom 80% of the wealth distribution derives ~80% of income from labor, with transfer income making up most of the rest. The top 10% derives about 50% from labor and 50% from capital (equity and business income). For the top 0.1%, labor income is only 16% and capital/business income is about 83–85%. In the model, the top 10% hold about 70% of total wealth, overwhelmingly in illiquid equity. These composition differences mean that any policy raising profits and equity prices strongly favors the top and has a neutral-to-mild effect at the bottom, while any policy reducing unemployment strongly favors the bottom. The interplay of these two forces explains why UMP simultaneously reduces bottom-90% inequality (through the unemployment channel) and widens the top-vs.-rest gap (through the profit and equity channel), and why welfare gains are U-shaped rather than monotone.

Q5. What is the welfare accounting methodology and what are the key welfare findings?

Welfare gains are measured as consumption equivalents — the fraction of lifetime consumption that a household in the counterfactual (no UMP) scenario would be willing to forgo to enjoy the UMP outcome. Households are sorted into wealth groups based on their 2009 Q1 wealth position (so group composition is not affected by UMP), and the same households are followed throughout the episode. Beyond the sample end (2018 Q4), no further shocks are assumed. The average welfare gain at the posterior mode is 0.27% of lifetime consumption. Bottom 10%: ~0.3% (driven by higher job-finding rates). Top 10%: ~0.3% (driven by profits and equity gains). Top 1%: ~0.33%. Middle 60%: ~0.26%. Business owners: 0.82%. The unemployed: 0.35%. The employed: 0.27%. Critically, the welfare gaps between extremes and middle are smaller than the income gaps, because anticipated tapering after the sample implies lower future profits and equity prices for wealthy households, narrowing their long-term advantage.

Q6. How do the contributions of QE and forward guidance compare in aggregate and distributional terms?

Forward guidance accounted for approximately 55% of the total UMP stimulus at the posterior mode. Exogenous expected ELB durations exceeded endogenous (fundamentals-based) durations by 1–2 quarters on average, and sometimes by up to 8 quarters, with the divergence widening from 2011 onward. In distributional terms, QE alone initially reduced the bottom-90% Gini and raised the top 10% income share by about 0.1 percentage point. Forward guidance amplified both effects: it lowered the overall Gini by an additional ~0.05 pp and the bottom-90% Gini by an additional 0.2 pp around 2013, but also added a further ~0.09 pp to the top 10% income share between 2011 and 2014. The amplification occurred because forward guidance raised profits and equity prices by about 2% and 1% respectively during that window, intensifying the income concentration at the top while also stimulating job creation at the bottom. The middle class saw its income share further compressed.

Q7. How does QE compare with conventional monetary policy in terms of aggregate and distributional effects?

In the counterfactual CMP scenario, the nominal policy rate drops to approximately -1% and remains negative for an extended period. CMP produces larger aggregate stimulus than QE because the stimulus effects of QE were partly crowded out by general equilibrium effects: QE reduced banks’ expected return on equity even as it raised equity prices, discouraging private bank investment. Under CMP, lower nominal rates instead benefit banks through lower financing costs, boosting bank net worth via an accelerator mechanism more strongly than under QE. This difference has distributional consequences: CMP would have delivered higher welfare gains only to the bottom 1% (low-wealth debtors benefiting from lower real rates on their liabilities) and the top 10% (benefiting from larger bank profits). Households in the broad middle (already employed, holding limited equity, neither heavy borrowers nor large business income recipients) would have been no better off and in some dimensions worse off under CMP. The paper thus concludes that QE had less adverse distributional effects than CMP would have had, absent the ELB constraint.

Q8. What robustness checks and sensitivity analyses are conducted?

The paper checks results against: (a) the full 10th–90th percentile range of the posterior distribution for all key findings on aggregate effects, income inequality, welfare gains, and QE vs. CMP comparisons, showing that qualitative findings are robust to parameter uncertainty; (b) a comparison between rigid-wage and flexible-wage model variants (Table A1), showing that the flexible-wage version generates countercyclical profits, a weak unemployment response, and a strong real wage response — inconsistent with empirical SVAR evidence — validating the modeling choice of high wage rigidity; (c) a structural VAR analysis on U.S. data confirming procyclical profits, weak real wage responses, and significant unemployment responses to monetary policy shocks; (d) a comparison of the OccBin method (endogenous ELB durations, Guerrieri and Iacoviello 2015) vs. the Kulish-Jones method (exogenous durations) for solving the occasionally binding constraint; (e) a check that wages implied by the calibrated wage function always remain in the bargaining set, validating the equilibrium wage assumption.

Q9. What are the key differences between this paper and the closest prior work?

Kaplan, Moll, and Violante (2018) and Bayer et al. (2020) have two-asset HANK models but omit frictional labor markets, so they cannot capture how monetary policy affects employment and thus the progressive unemployment channel. Gornemann et al. (2016) include search-and-matching labor markets but only one asset, so they cannot capture the capital income benefits to wealthy households. Broer et al. (2019) and Auclert et al. (2023) identify the countercyclical profit problem but their solutions (wage rigidity alone) produce procyclical profits that are too weak quantitatively. This paper combines fixed costs, wage rigidity, and a banking sector to produce procyclical profits quantitatively consistent with SVAR evidence. On unconventional policy specifically, Lenza and Slacalek (2018) and Casiraghi et al. (2018) study ECB QE with partial equilibrium methods and find inequality-reducing effects; Bivens (2015) and Montecino and Epstein (2015) reach opposite conclusions for U.S. QE. This paper is the first to study both QE and forward guidance jointly in a Bayesian-estimated HANK model with an explicitly binding ELB, and is to the author’s knowledge the first to estimate a HANK model with an occasionally binding ELB constraint.

Q10. What are the main policy implications and their scope conditions?

First, UMP’s inequality effects are measure-dependent: policies that simultaneously stimulate employment and profits can reduce within-bottom-90% inequality while widening the top-vs.-rest gap. Policymakers who cite Gini reductions and those who cite rising top-income shares are both correct, pointing to different parts of the distribution. Second, forward guidance amplifies inequality effects as much as it amplifies aggregate effects, so its use carries a distributional cost concentrated at the top of the distribution. Third, QE had less adverse distributional effects than conventional monetary policy would have had, suggesting that concerns about QE’s inequality effects should be placed in context of the ELB constraint — the relevant comparison is not QE vs. no policy but QE vs. CMP with the ELB absent. Fourth, models that generate countercyclical profits will systematically understate benefits to the wealthy and potentially reach qualitatively different conclusions about whether monetary policy raises or reduces inequality. These findings are scoped to the U.S. Great Recession ELB episode, estimated with the specific HANK model structure and Bayesian posterior; findings may differ for different financial structures, more generous unemployment insurance, or different asset price dynamics.
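The measure-dependence in the first point can be illustrated with a toy calculation. The sketch below uses made-up incomes for ten households (not the paper's data, and not its model) to show how a stimulus that lifts everyone can lower the Gini within the bottom 90% while raising the top 10% income share. The helper names `gini`, `top_share`, and `bottom_group` are hypothetical.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference formula."""
    n = len(incomes)
    mean = sum(incomes) / n
    abs_diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return abs_diff_sum / (2 * n * n * mean)


def top_share(incomes, top_fraction=0.1):
    """Income share of the richest `top_fraction` of households."""
    ranked = sorted(incomes)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[-n_top:]) / sum(ranked)


def bottom_group(incomes, bottom_fraction=0.9):
    """Incomes of the poorest `bottom_fraction` of households."""
    ranked = sorted(incomes)
    return ranked[: round(len(ranked) * bottom_fraction)]


# Hypothetical incomes for ten households (illustrative numbers only).
before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 55]
# Stylized stimulus: every bottom-90% household gains 1 (employment),
# while the top household gains 15 (profits and equity).
after = [2, 3, 4, 5, 6, 7, 8, 9, 10, 70]

gini_b90_before = gini(bottom_group(before))
gini_b90_after = gini(bottom_group(after))
top10_before = top_share(before)
top10_after = top_share(after)

print(f"bottom-90% Gini: {gini_b90_before:.3f} -> {gini_b90_after:.3f}")
print(f"top-10% share:   {top10_before:.3f} -> {top10_after:.3f}")
```

Both statistics describe the same before-and-after distributions; they simply weight different parts of it, which is the sense in which the two policy narratives can both be correct.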

Q11. What drives the Great Recession in the model and how is UMP modeled mechanically?

At the posterior mode, the Great Recession is primarily attributed to a series of large negative risk premium shocks (shocks to banks’ discount factor) around 2008–2009, which caused banks to sharply contract their investment, leading to the investment collapse (>20% below pre-crisis). QE is modeled following Gertler and Karadi (2011): the central bank issues bonds (sold to the private sector) and uses proceeds to purchase equity directly, converting non-productive asset demand into productive capital demand and raising equity prices and investment. Forward guidance is modeled as setting exogenous expected ELB durations longer than would be implied endogenously by the Taylor rule fundamentals, effectively mimicking future negative interest rate shocks and inducing inflationary pressure via intertemporal substitution. The expected ELB durations at the posterior mode range from 6 to 8 quarters through 2013, falling sharply to 1–2 quarters by late 2014–2015.

Key Concepts

Heterogeneous Agent New Keynesian (HANK) model: As used in this paper, a DSGE model where households differ ex-post in idiosyncratic productivity, asset holdings (liquid deposits and illiquid equity), and employment status; combined with search-and-matching labor markets, a banking sector with leverage constraints, and a zero lower bound on the policy rate. The heterogeneity in wealth composition and income sources determines how aggregate policy shocks translate into distributional outcomes.

Procyclical profits: The property, established empirically via SVAR and reproduced in the model, that firm profits rise in response to expansionary monetary policy shocks. Standard New Keynesian models generate the opposite (countercyclical profits) because price rigidity compresses markups when demand rises. In this paper, the combination of large fixed costs in production, wage rigidity, and a banking sector financial accelerator is required to generate quantitatively realistic procyclical profit responses.

Effective lower bound (ELB) episode: The period from 2009 Q1 to 2015 Q4 during which the Federal Reserve’s policy rate was constrained at zero. In the model, this is treated as a temporary alternative regime with exogenous expected durations; when the policy rate hits the ELB, the central bank can only affect the economy through asset purchases (QE) and forward guidance.

Forward guidance (as exogenous expected ELB durations): In this paper’s framework, forward guidance is operationalized as the central bank committing to maintain the policy rate at zero for a longer period than the endogenous (fundamentals-based) Taylor rule would prescribe. This is parameterized as an exogenous expected ELB duration that exceeds the endogenous one, creating anticipations of future negative interest rate shocks and thus stimulating activity through intertemporal substitution.

Consumption equivalent welfare gain: The fraction of lifetime consumption that a household in the counterfactual scenario (no UMP) would be willing to forgo in order to instead experience the outcomes under UMP. Used to compare welfare across heterogeneous households in a cardinal, utility-based metric rather than income alone.
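As a minimal sketch of how such a number can be computed, the code below uses one common convention: find the uniform scaling of counterfactual consumption that makes discounted utility equal to utility under the policy. Log utility, the discount factor, and the consumption paths are all assumptions made for illustration; the paper's preferences and household problem are richer.

```python
import math


def consumption_equivalent(c_policy, c_counterfactual, beta=0.99):
    """Uniform consumption scaling lam that equates discounted log utility.

    Solves sum_t beta^t * log((1 + lam) * c_cf[t]) = sum_t beta^t * log(c_pol[t]).
    Log utility is an assumption of this sketch, not the paper's specification.
    """
    weights = [beta ** t for t in range(len(c_policy))]
    v_policy = sum(w * math.log(c) for w, c in zip(weights, c_policy))
    v_counter = sum(w * math.log(c) for w, c in zip(weights, c_counterfactual))
    return math.exp((v_policy - v_counter) / sum(weights)) - 1.0


# Hypothetical paths: consumption is 0.27% higher in every period under the policy.
c_cf = [1.0] * 40
c_pol = [1.0027 * c for c in c_cf]
lam = consumption_equivalent(c_pol, c_cf)
print(f"consumption-equivalent gain: {100 * lam:.2f}%")
```

With a uniform proportional difference the measure simply returns that proportion; with uneven paths it weights early periods more heavily through the discount factor.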

Business owner working status: A third working status (alongside employed and unemployed), following Bayer et al. (2019), in which households receive a fixed fraction of aggregate profits as income without supplying labor. Business owners transition into and out of this status exogenously and are the highest-income group in the model, calibrated to match the top-decile’s share of liquid assets and the income composition data showing that capital and business income dominate the very top of the wealth distribution.

Inversion filter: The likelihood evaluation method used in this paper for Bayesian estimation, following Guerrieri and Iacoviello (2017). Rather than running a Kalman filter, structural shocks are backed out directly by inverting the linear solution of the model given the observed data and a given set of expected ELB durations. This avoids continuously updating the large state-transition matrix and makes estimation computationally feasible.
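The idea of backing out shocks by inversion can be sketched with a one-equation toy. Here the state follows x_t = rho * x_{t-1} + sigma * eps_t and is observed directly (y_t = x_t), so each shock is recovered by solving that equation for eps_t. This scalar setup, the parameter values, and the function names are assumptions for illustration; the paper's model is multivariate with regime-dependent solutions during the ELB.

```python
import math


def invert_shocks(observed, rho, sigma, x0=0.0):
    """Back out eps_t from x_t = rho * x_{t-1} + sigma * eps_t with y_t = x_t."""
    shocks, x_prev = [], x0
    for y in observed:
        shocks.append((y - rho * x_prev) / sigma)
        x_prev = y
    return shocks


def log_likelihood(observed, rho, sigma, x0=0.0):
    """Gaussian log-likelihood from the recovered shocks, eps_t ~ N(0, 1).

    The -log(sigma) term is the Jacobian of the map from y_t to eps_t.
    """
    shocks = invert_shocks(observed, rho, sigma, x0)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * e * e - math.log(sigma)
               for e in shocks)


# Simulate with known shocks, then check that inversion recovers them.
true_shocks = [0.5, -1.0, 0.25, 0.0, 1.5]
rho, sigma = 0.9, 0.2
path, x = [], 0.0
for e in true_shocks:
    x = rho * x + sigma * e
    path.append(x)

recovered = invert_shocks(path, rho, sigma)
print([round(e, 6) for e in recovered])
print(round(log_likelihood(path, rho, sigma), 4))
```

Because the number of shocks equals the number of observables, no filtering recursion over state uncertainty is needed, which is the computational saving the entry describes.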

University Research and the Market for Higher Education

Thu, 01 Jan 2026 00:00:00 +0000

Layer 1: Overview

This paper proposes that university R&D is determined endogenously by competition for tuition and talented students in the market for higher education, and asks why universities fund research internally with tuition despite negligible returns to patenting. Motivation: between 2000 and 2018 U.S. universities accounted for 13% of aggregate R&D spending and 53% of all basic-research spending, yet in 2018 over 25% of university research was internally funded (25.54% in 2018; federal government 52.97%) while between 1991 and 2018 the median university earned patent licensing revenue totaling less than 2% of its R&D expenditure. Internal funds therefore come essentially from tuition.

Approach: (1) four stylized facts from administrative microdata (IPEDS, NSF HERD survey covering 916 universities / 99.1% of sector R&D, AUTM patent-licensing survey, Web of Science / Leiden bibliometrics); (2) a causal natural experiment; (3) a general-equilibrium model of the higher-education sector with heterogeneous universities choosing teaching and research, calibrated to U.S. data; and (4) policy counterfactuals.

Causal evidence: the authors exploit the 1998-2003 doubling of the NIH budget (from $13.6bn to $27.1bn) using a Bartik shift-share instrument built from each university’s pre-period (1993-1997) share of federal life-science grants, regressing the change in net tuition (1993-1997 to 2004-2008) on the instrumented change in R&D per student, with state-clustered standard errors and state-specific trends. The benchmark estimate is that a $1.00 increase in R&D spending per student raises tuition by $0.15 (s.e. 0.05) — universities recoup up to 15% of R&D through higher tuition. Across specifications the effect ranges $0.10-$0.15; it is driven by research universities (non-liberal-arts), is statistically insignificant for liberal arts colleges, and a placebo using student-amenities spending shows no significant effect. The point estimate is about 60% larger at private non-profits than publics, but that difference is not statistically significant.

Model and mechanism: education quality q = k^ωk * z̄^ωz * eT^ωe depends on intangible knowledge capital k (accumulated via research, k’ = k^γk * eR^γe), peer ability z̄, and teaching spending. Universities maximize discounted education quality, funding research from tuition. Equilibrium features an endogenous college hierarchy with two-dimensional sorting by ability and family income. The research share sR rises with the steepness of the college quality-ladder Σq/Σk; when students are highly stratified or tuition rises sharply with rank, universities invest in research even if the direct contribution to teaching (ωk) is small — research persists even as ωk→0 (acting as a pure signal). Incentives fall when intangible capital is highly dispersed across colleges.

Calibration matches the joint distribution of research, tuition, and student ability, plus untargeted R&D dispersion; simulated NIH expansion yields $0.18 per $1 in steady state and $0.11 along the transition, bracketing the empirical $0.10-$0.15.

Policy findings (long-run, vs baseline): removing all need-based federal tuition subsidies cuts university research by 8.1% (replacing progressive with revenue-neutral flat tuition subsidy: -2.2%); progressive aid compresses revenue dispersion, steepens the quality-ladder, and raises the research share (+0.8 pp). Removing all federal research grants cuts research by 69.1% — only 6.9 pp below the government’s 76% funding share, implying crowding-out: the meritocratic grant structure concentrates funds at top schools, flattening the ladder and cutting the research share by 16.4 pp. A revenue-neutral flat research subsidy would instead raise research by 14.8%, human capital by 9.6%, and output by 11.1%.

In depth

Q1. What is the identification strategy and what are the main threats to it?

A Bartik/shift-share IV exploiting the 1998-2003 NIH budget doubling. Each university’s change in R&D is instrumented by its pre-period (1993-1997) share of all federal life-science research grants. Relevance: NIH was the bulk of federal life-science funding before the shock and did not substantially change award criteria, so high-share schools received mechanically larger funding increases. Exogeneity requires that universities did not systematically invest in life-science research in the pre-period in anticipation of the expansion. The estimation is in long-differences comparing steady states; standard errors are clustered at the state level with state-specific tuition trends. Threats: the NIH expansion occurs at a common point in time, so it may correlate with other contemporaneous market changes; initially larger or higher-quality research universities might have raised tuition for reasons unrelated to R&D. The authors address this with group-specific time trends (public/private, pre-existing life-science status, school size, initial quality via faculty-student ratio) and pre-trend controls (1987-1992 faculty-student ratio, FTE size, life-science status). A limitation the authors acknowledge: they cannot test the effect on subsequent student ability because ability proxies are only available after the intervention.
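A minimal sketch of the shift-share logic, on synthetic data, is given below. Each hypothetical university's exposure is its pre-period share times the aggregate budget increase, and the single-instrument IV estimate is the ratio of covariances. The data-generating process, the confounder, and the true effect of 0.15 are constructed for illustration; the sketch omits the state trends, controls, and clustered standard errors of the paper's specification.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def residualize(ys, xs):
    """Residuals from regressing ys on a constant and xs."""
    slope = cov(xs, ys) / cov(xs, xs)
    mx, my = mean(xs), mean(ys)
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical universities: pre-period life-science grant shares.
n = 200
shares = [(i + 1) / n for i in range(n)]
shift = 27.1 - 13.6  # aggregate NIH budget increase, $bn (from the text)
instrument = [s * shift for s in shares]  # shift-share exposure

# Unobserved confounder, built to be uncorrelated with the instrument.
raw = [((37 * i) % 11) / 11 for i in range(n)]
confounder = residualize(raw, instrument)

true_effect = 0.15  # tuition response per $1 of R&D, set by construction
d_rd = [0.5 * z + u for z, u in zip(instrument, confounder)]
d_tuition = [true_effect * x + 2.0 * u for x, u in zip(d_rd, confounder)]

ols = cov(d_rd, d_tuition) / cov(d_rd, d_rd)
# Wald / 2SLS estimate with a single instrument.
iv = cov(instrument, d_tuition) / cov(instrument, d_rd)
print(f"OLS: {ols:.3f}  IV: {iv:.3f}")
```

In this construction OLS is biased upward because the confounder moves both R&D and tuition, while the IV estimate recovers the assumed effect because the exposure measure is unrelated to the confounder. That orthogonality is exactly the exogeneity condition the threats above call into question.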

Q2. What are the main mechanisms and how are they distinguished?

The college quality-ladder Σq/Σk (the cross-sectional elasticity of education quality with respect to intangible capital) is the sufficient statistic for research incentives. Equation (14) decomposes it into three channels: (i) the direct teaching contribution of research ωk; (ii) attracting better students, ωz × Σz̄/Σk; and (iii) charging higher tuition, ωe × ΣR/Σk. Channels (ii) and (iii) flow from competition for talented students and tuition and can dominate even when ωk is tiny. Empirically, Σz̄/Σk maps to the cross-sectional elasticity of student ability w.r.t. research (Figure 3) and ΣR/Σk to the elasticity of tuition w.r.t. research (Figure 4), so the calibration disciplines these channels with observable cross-sectional relationships.
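The three channels can be added up in a few lines. The sketch below assumes the decomposition is additive, as the channel list suggests and as the log-linear quality function implies; all parameter values are hypothetical, and the two cross-sectional elasticities are held fixed even though in the model they are equilibrium objects.

```python
def quality_ladder_slope(omega_k, omega_z, omega_e,
                         ability_elasticity, tuition_elasticity):
    """Sum of the three channels described for Equation (14).

    ability_elasticity stands in for Sigma_zbar / Sigma_k and
    tuition_elasticity for Sigma_R / Sigma_k.
    """
    direct = omega_k
    students = omega_z * ability_elasticity
    tuition = omega_e * tuition_elasticity
    return {"direct": direct, "students": students, "tuition": tuition,
            "total": direct + students + tuition}


# Hypothetical parameter values, chosen only to illustrate the decomposition.
baseline = quality_ladder_slope(0.05, 0.4, 0.5, 0.3, 0.6)
pure_signal = quality_ladder_slope(0.0, 0.4, 0.5, 0.3, 0.6)
print(baseline)
print(pure_signal)
```

Setting the direct contribution to zero leaves the slope positive as long as research still sorts students or moves tuition, which is the pure-signal case discussed in the text.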

Q3. What heterogeneity is documented?

The tuition effect is concentrated in research universities (non-liberal-arts), with a larger, highly significant point estimate; for liberal arts colleges the NIH shock has no statistically significant effect on tuition (the authors caution that the LAC sample is smaller, at ~32% of institutions and ~24% of FTE, and more heterogeneous, so power may be insufficient). The effect appears ~60% stronger at private non-profits than publics, but the difference is not statistically significant. In the model, top schools and bottom schools both invest less in research when intangible capital is highly dispersed (top schools face weak incentives to improve already-secure rank; bottom schools find climbing too costly).

Q4. What robustness checks are run?

Empirically: adding pre-trend controls (column 3) leaves estimates intact; splitting by NLA vs LAC; and a placebo replacing R&D with student-services (amenities) spending, which yields no significant effect, rejecting spurious cross-category correlation. In the model: (1) the limiting case ωk→0 where research is a pure signal — the research share falls from 8.8% to 2.4% of tuition but stays strictly positive, and policy effects retain 50% (tuition-subsidy removal: -0.4 pp vs -0.8) and 66% (research-subsidy removal: +10.8 vs +16.4 pp) of their magnitude; (2) allowing some teaching expenditure to also enter intangible-capital production (γT>0), where the research share falls from 8.8% to 4.7% and policy effects moderate (-0.4 pp and +7.1 pp). In both, existing tuition policies still boost research and federal research grants still crowd it out.

Q5. How does this relate to and differ from prior work?

It builds on equilibrium higher-education models — Epple, Romano & Sieg (2006) (quality maximization, exogenous endowment hierarchy, finite universities with market power) and Cai & Heathcote (2022) (competitive, constant-returns technology) — but endogenizes university R&D alongside teaching. A theoretical contribution is proving existence of a unique dynamic equilibrium with quality maximization and an endogenous college-quality hierarchy with a continuum of colleges; Cai & Heathcote argued no quality-maximization equilibrium exists when colleges are ex-ante identical (all want to be at the top), which this paper resolves via the endogenous knowledge hierarchy. It contributes to the economics of science / university-R&D literature by adding market-driven incentives, and to the basic-research-subsidy literature (Akcigit et al.) by showing universities have private incentives to do basic research, implying the need for government subsidy may be smaller than the standard Nelson/Arrow/Rosenberg view holds.

Q6. What are the policy implications and their scope conditions?

Two main implications. First, a novel complementarity between equity and innovation: progressive need-based tuition aid compresses revenue dispersion across colleges, makes them more similar, steepens the quality-ladder, and raises research (+8.1% relative to a no-subsidy world; flat subsidy gives only ~one-quarter of that, +2.2%). Second, current meritocratic federal research grants partially crowd out internal research and raise educational inequality by concentrating resources at top schools; removing them cuts research by 69.1% (only 6.9 pp below the 76% federal share, the gap being the crowding-out). A revenue-neutral flat research subsidy would raise research by 14.8%, human capital 9.6%, and output 11.1%, eliminating the equity-innovation trade-off because it lowers research cost without altering market structure. Scope conditions: these are long-run steady-state comparisons in a calibrated model of 4-year public and private non-profit U.S. institutions; magnitudes depend on the hard-to-measure ωk and on the research-technology specification, as the robustness exercises show.

Q7. Why do universities fund research from tuition rather than patents, and does the model rationalize it?

Because patent licensing is too small (median <2% of R&D, 1991-2018) to fund the >25% of R&D that is internal, and unrestricted operating funds are composed almost entirely of tuition (much of it from unrecovered facilities-and-administration costs on sponsored projects — roughly $7bn in 2018). The model rationalizes diverting tuition to research because research raises education quality and thus students’ willingness to pay, so in a competitive sector students accept it. The model also replicates the joint pattern that higher-R&D universities are higher-ranked, attract wealthier and abler students, and charge higher tuition.

Q8. What are the sources of inefficiency in the model?

Two. First, borrowing constraints prevent efficient sorting of students by ability (a social planner would send the ablest to the best colleges, but students are limited by parental capacity to pay). Second, university knowledge has positive spillovers to the real economy (calibrated ιk = 0.1) that colleges do not internalize, causing under-investment; however, quality-maximizing colleges face extra competitive incentives to do research, so net under- or over-investment is ambiguous and depends on stratification relative to spillover strength.

Key Concepts

College quality-ladder (Σq/Σk): The equilibrium cross-sectional elasticity of education quality with respect to a university’s intangible knowledge capital — a sufficient statistic for a university’s private incentive to invest in research. Steeper ladder (more stratification, tuition rising more with rank) means stronger research incentives.

Intangible (knowledge) capital k: Institution-specific intangible capital accumulated by investing in research (k’ = k^γk eR^γe). It is primarily frontier knowledge and ideas exposed to students, but also networks, recruiting, labs, and methods; it can act purely as a reputation signal in the limiting case ωk→0.
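The law of motion implies a steady state for any constant research spending. Taking logs of k = k^γk · eR^γe gives k* = eR^(γe / (1 − γk)) when γk < 1. The sketch below checks this by iteration; the parameter values and spending level are hypothetical, not the paper's calibration.

```python
def next_capital(k, research_spending, gamma_k, gamma_e):
    """One step of k' = k**gamma_k * e_R**gamma_e."""
    return k ** gamma_k * research_spending ** gamma_e


def steady_state_capital(research_spending, gamma_k, gamma_e):
    """Fixed point of the law of motion, valid for 0 <= gamma_k < 1."""
    return research_spending ** (gamma_e / (1.0 - gamma_k))


# Hypothetical parameters and spending level (not the paper's calibration).
gamma_k, gamma_e, e_r = 0.8, 0.3, 2.0
k = 1.0
for _ in range(200):
    k = next_capital(k, e_r, gamma_k, gamma_e)

print(round(k, 6), round(steady_state_capital(e_r, gamma_k, gamma_e), 6))
```

A persistence parameter close to one means the stock adjusts slowly, so a change in research spending takes many periods to show up fully in k.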

Research share (sR): The share of a university’s tuition revenue allocated to research in equilibrium (≈8.8% under existing policies). It increases with college forward-lookingness (βc) and the steepness of the quality-ladder, and decreases with the dispersion of intangible capital across colleges.

Crowding-out of internal research: In the paper’s sense, the phenomenon whereby federal grants, by concentrating funds at top schools, raise the dispersion of research (Σk), flatten the quality-ladder (Σq/Σk), lower the research share, and thereby reduce universities’ internal research spending — so total research rises less than the government’s funding share (69.1% decline vs 76% share on removal).
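The arithmetic behind the comparison is short enough to spell out. The two inputs are the figures quoted above; the ratio at the end is a back-of-envelope reading added here, not a number reported in the text.

```python
federal_share = 76.0      # government share of research funding, percent (from the text)
research_decline = 69.1   # fall in total research when grants are removed, percent

# If internal research were unaffected, removing grants would cut research by
# the full federal share. The shortfall is the internal research that returns
# once grants stop flattening the quality-ladder.
crowding_out_pp = federal_share - research_decline
crowded_out_fraction = crowding_out_pp / federal_share

print(f"gap: {crowding_out_pp:.1f} pp")
print(f"share of grant funding offset: {crowded_out_fraction:.1%}")
```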

Equity-innovation complementarity: The model’s finding that progressive need-based tuition aid, by compressing revenue dispersion and making colleges more similar, steepens competition and raises university research — so equity-promoting policy also boosts basic research, rather than trading off against it.

Education-innovation gap (ωk calibration): Biasi & Ma’s (2021) measure of how frontier-current a university’s curriculum is, interpreted in the model as log(k). A one-unit decrease is associated with a 0.011% rise in graduate income; normalized by its school-level standard deviation of 0.85, it is used to pin down ωk via ωk·α = .011/.85·Σk.

(Not) Thinking About the Future: Financial Information and Maternal Labor Supply

This paper investigates whether information constraints — rather than fully forward-looking choices — contribute to mothers’ reduced labor supply after childbirth, a key driver of gender inequality. The authors deploy two complementary methods in Switzerland: a representative descriptive survey of Swiss mothers aged 25–50, and a large-scale randomized controlled trial (RCT) among approximately 2,400 female public school teachers with children who work part-time.

The descriptive survey first establishes that long-term financial factors are not top of mind for mothers making labor supply decisions: only about 11% of mothers spontaneously mention pensions or long-term career considerations when asked about their post-childbirth employment choices, compared to roughly half who mention child or own well-being. Beyond salience, the survey documents substantial misperceptions: 62% of women over-estimate pension receipt under part-time work by more than 10%, and a similar share believes wage growth under low part-time hours (40% FTE) is at least as high as under 80% employment. The authors label mothers with overly optimistic beliefs on both dimensions “cost-unaware”; 42% of the sample qualifies. Cost-unawareness is more prevalent among less-educated mothers and correlates with less financial interest and more gender-conservative attitudes.

The RCT tests whether providing objective, individualized information shifts financial planning and labor supply. Teachers in treatment schools (two-thirds of all schools) were individually randomized either into a treatment group, which viewed an informational video about the long-run earnings, pension, and life-event consequences of sustained part-time employment and received access to a Future Calculator tool, or into a group that viewed a placebo video on unrelated financial topics. The two-stage randomization (school-level first, then individual within treated schools) allows identification of both direct treatment effects and spillovers. Outcomes are measured in a Wave 1 post-video survey, a follow-up survey two months later, and linked administrative personnel records from the Department of Education one year post-intervention.

Main findings: treated teachers are 31.26 percentage points (58% over the pure control mean) more likely to correctly rank the relative magnitude of long- versus short-term financial factors. Demand for financial planning tools rises by 0.39 standard deviations (SD) overall and by 0.31 SD among cost-unaware women specifically. In terms of stated labor supply plans, the treatment raises planned employment for the next academic year by 1.69 percentage points (ppt) in the full sample and by 4.95 ppt (9% over the pure control mean) among cost-unaware women. These plan effects persist two months later for cost-unaware women but fade for the full sample.

Critically, stated plans translate into verified behavior: linked administrative data one year post-intervention show that cost-unaware teachers increase their contracted employment level by 3.87 ppt, or 7% over the pure control mean of 53.30% FTE. Cost-aware and overly pessimistic women do not reduce their labor supply upon learning they are better off than feared, an asymmetry consistent with agents responding more to perceived losses than gains. If the 3.87 ppt increase were sustained from age 40 onward, cost-unaware teachers would accumulate an additional 130,000 CHF in lifetime income and 40,000 CHF in pension wealth, shrinking the gender gap in lifetime income and pension receipt among teachers by approximately 18% each.
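The relation between the absolute and relative effect sizes quoted above can be checked directly. The inputs are the two figures from the text; the implied post-treatment employment level is a simple sum added here for illustration.

```python
control_mean_fte = 53.30   # pure control mean contracted employment, % FTE (from the text)
effect_ppt = 3.87          # treatment effect for cost-unaware teachers, ppt

relative_effect = effect_ppt / control_mean_fte
implied_level = control_mean_fte + effect_ppt

print(f"relative effect: {relative_effect:.1%}")
print(f"implied employment level: {implied_level:.2f}% FTE")
```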

The paper is scoped to Swiss female public school teachers — a population with linear pay scales, no part-time promotion penalty, and relatively low adjustment barriers — meaning the measured lifetime earnings and pension losses likely represent a lower bound relative to other occupations. Short-term RCT findings replicate among a sample of pregnant women in the general Swiss population, and the paper argues that similar labor supply adjustment magnitudes are feasible for a broader segment of part-time working mothers.

Q: What is the central research question and why does it matter? A: The paper asks whether mothers’ post-childbirth reduction in labor supply is partly driven by information constraints — specifically, whether mothers fail to account for the full long-term financial consequences of working reduced hours. This matters because if the child penalty partly reflects uninformed choices rather than deliberate tradeoffs, standard policy tools (parental leave, childcare subsidies) may underperform precisely because their long-term financial benefits are not internalized.

Q: How prevalent is cost-unawareness among Swiss mothers? A: 62% of mothers in the descriptive survey over-estimate pension receipt under part-time work by more than 10%, a similar share believes wage growth under low part-time (40% FTE) is at least as high as under 80% employment, and 42% are overly optimistic on both dimensions simultaneously. Cost-unawareness follows an education gradient: 77% of low-education women over-estimate pension receipt versus 51% of high-education women.

Q: What share of mothers spontaneously considers long-term financial factors when deciding on their labor supply? A: Only about 11% of mothers mention any long-term financial factor (pensions, financial independence, long-term career considerations) in open-ended responses; the share is similarly low across education groups (6% low, 12% mid, 13% high). About 50% mention child or own well-being; roughly 30% raise short-term financial factors such as current childcare costs.

Q: What are the actual long-term financial stakes of the average female teacher’s part-time employment pattern in Switzerland? A: Compared to full-time employment, the average female teacher’s employment trajectory produces a 35% reduction in potential lifetime earnings (approximately 3.34 million CHF versus 5.12 million CHF). Monthly pension receipt under the part-time scenario is 31% lower overall and 43% lower from the occupational second-pillar scheme specifically — a gap comparable to the average 47.5% gender pension gap observed in the second pillar in Switzerland in 2024.

Q: How was the RCT designed and what populations were included? A: The study recruited 2,359 part-time working mothers employed as public school teachers in a German-speaking Swiss canton. A two-stage randomization assigned two-thirds of schools to treatment schools (within which teachers were individually randomized 50/50 to treatment or spillover control) and one-third to pure control schools. This design allows estimation of direct treatment effects and spillover effects. The intervention was timed to precede December–January, the period when teachers communicate their preferred employment levels for the next school year.
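The two-stage assignment described above can be sketched in a few lines. This is a minimal illustration, assuming simple unstratified randomization; the function name `two_stage_assign`, the roster, and the seed are hypothetical and are not taken from the paper, which may have used stratification or other refinements.

```python
import random

def two_stage_assign(teachers_by_school, seed=0):
    """Return {teacher_id: arm} for a two-stage design (illustrative only)."""
    rng = random.Random(seed)

    # Stage 1: randomly pick two-thirds of schools as treatment schools.
    schools = sorted(teachers_by_school)
    rng.shuffle(schools)
    n_treated_schools = round(len(schools) * 2 / 3)
    treated_schools = set(schools[:n_treated_schools])

    assignment = {}
    for school, teachers in teachers_by_school.items():
        if school not in treated_schools:
            # Pure control school: nobody is treated.
            for teacher in teachers:
                assignment[teacher] = "pure_control"
            continue
        # Stage 2: within a treatment school, split teachers 50/50.
        shuffled = list(teachers)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for i, teacher in enumerate(shuffled):
            assignment[teacher] = "treatment" if i < half else "spillover_control"
    return assignment

# Hypothetical roster: 9 schools with 4 teachers each.
roster = {f"school_{s}": [f"s{s}_t{t}" for t in range(4)] for s in range(9)}
arms = two_stage_assign(roster, seed=42)

counts = {}
for arm in arms.values():
    counts[arm] = counts.get(arm, 0) + 1
print(counts)
```

With this roster, six schools become treatment schools and three become pure control schools, so each of the three arms ends up with 12 teachers. Comparing treatment with pure control gives the direct effect, and comparing spillover control with pure control gives the within-school spillover.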

Q: What was the treatment intervention? A: Treated teachers watched an informational video following a representative female teacher considering an employment-level increase, covering the impact of part-time work on lifetime earnings, monthly pension receipt, and financial exposure after adverse events such as divorce; it also benchmarked these magnitudes against childcare costs. Treated teachers additionally received individualized access to the Future Calculator, an online projection tool developed with a Swiss bank, calibrated to teachers’ deterministic salary and pension schedules.

Q: Did treated teachers understand and retain the treatment information? A: Yes. Immediately after the intervention, treated teachers were 31.26 ppt more likely (58% over the pure control mean) to correctly rank long- versus short-term financial factors in a vignette. Two months later, the treatment group remained significantly more likely to apply the information correctly (22.63 ppt higher), indicating the knowledge was not short-lived.

Q: How did demand for financial planning tools respond to the treatment? A: The treatment raised a financial information/tools index by 0.39 SD overall. For cost-unaware women specifically, demand for financial tools rose by 0.31 SD; cost-aware and pessimistic women showed no significant change. There was no significant average treatment effect on sign-up for an incentivized financial consultation.

Q: How large were the labor supply plan effects in the survey, and did they persist? A: For the full sample, treated teachers planned a 1.69 ppt higher employment level for the next school year immediately after the treatment, and 3.13 ppt higher in 10 years. For cost-unaware women, the short-run planned increase was 4.95 ppt (9% over the pure control mean of about 55%), and plans for 5 and 10 years into the future rose by approximately 4 ppt (6–7% over the mean). The short-run effects for cost-unaware women persisted to the two-month follow-up, while full-sample short-run effects faded.

Q: What do the linked administrative data show about actual labor supply one year post-intervention? A: Cost-unaware women in the treatment group increased their contracted employment level by 3.87 ppt relative to the pure control group (7% over the pure control mean of 53.30% FTE), closely matching the planned increase stated immediately after the treatment. Cost-aware women and the full sample showed no statistically significant shift in actual hours.
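As a quick arithmetic check, the relative effects quoted in these answers follow from dividing the percentage-point effect by the pure control mean. The helper name `relative_effect` is hypothetical; the inputs are the figures reported above (the 55% survey mean is stated only approximately in the text).

```python
def relative_effect(effect_ppt, control_mean_pct):
    """Express a percentage-point effect as a percent of the control mean."""
    return 100.0 * effect_ppt / control_mean_pct

# Administrative data: 3.87 ppt on a pure control mean of 53.30% FTE.
admin_effect = relative_effect(3.87, 53.30)

# Survey plans: 4.95 ppt on a pure control mean of about 55%.
survey_effect = relative_effect(4.95, 55.0)

print(round(admin_effect, 1), round(survey_effect, 1))
```

Both values round to the 7% and 9% figures reported in the text.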

Q: What asymmetry did the authors observe between cost-unaware and cost-aware women? A: Cost-unaware (overly optimistic) women increased their labor supply upon learning the true financial costs; cost-aware and overly pessimistic women did not reduce their labor supply upon learning they were better off than expected. The authors interpret this as consistent with agents responding more to perceived losses (bad news for cost-unaware women) than to gains (good news for pessimistic women), and with cost-aware women already having incorporated the financial logic into their decisions even without precise estimates.

Q: What is the estimated lifetime impact of the observed labor supply adjustment? A: If cost-unaware teachers maintain the 3.87 ppt employment increase from age 40 to retirement, they accumulate an additional 130,000 CHF in lifetime income and 40,000 CHF in pension wealth on average. This would reduce the gender gap in both lifetime income and pension receipt among teachers by approximately 18% each.

Q: What emotional and social mechanisms did the paper document? A: The treatment initially produced significantly negative emotional responses (−0.41 SD on an emotions index overall; −0.68 SD for cost-unaware women), consistent with cognitive dissonance from information conflicting with prior beliefs. Two months later, the treatment group reported feeling more in control and less stressed, and cost-unaware women returned to a neutral emotional baseline. Treated women were also 19.61 ppt more likely to have discussed the topic with anyone, with the largest effect on conversations with partners or family.

Q: Did the treatment affect household-level labor supply — specifically, did partners reduce their hours? A: No. The authors found no evidence that partners of cost-unaware women planned to work less in response to the treatment, and women did not plan to adjust future fertility. This suggests the observed hours increase by treated cost-unaware women was not offset by partner adjustments within the household.

Q: Were there social spillover effects within schools? A: Treated teachers were 11.59 ppt more likely to report having discussed the video with colleagues. Two months later, cost-unaware control teachers in treated schools (the spillover group) showed some evidence of absorbing the general treatment message and adjusting short-term labor supply plans upward, and a noisy increase in actual employment of roughly one-third the magnitude of the direct treatment effect, though these estimates were imprecise.

Q: Why might cost-unaware women be uninformed in the first place? A: In both the descriptive survey and the RCT sample, cost-unaware women lean more gender-conservative in their attitudes and report less interest in financial topics. The authors interpret this as suggesting a lack of information (rather than mere salience or forgetting) drives cost-unawareness, implying that passive information delivery through employers or pension funds could be effective.

Q: What constraints to labor supply adjustment did the authors explore? A: In a hypothetical scenario exercise, the scenario that produced the largest desired employment increase, for both treatment and control groups, was one in which the partner was more engaged (roughly double the adjustment under a scenario of higher pay for additional hours). The treatment group adjusted their desired employment level by an additional 0.62–2.03 ppt relative to pure control across all scenarios except relaxing conservative gender norms.

Q: How generalizable are the findings beyond the teacher sample? A: The short-term RCT findings replicated among a sample of pregnant women in the general Swiss population. The authors also document that potential net gains from increasing labor supply — net of additional childcare costs — are large for the broader population of part-time working Swiss mothers, supporting feasibility of similar-magnitude adjustments outside teaching. The teaching context likely represents a lower bound for lifetime earnings and pension losses in other professions due to the absence of a part-time promotion penalty in teaching.

Q: What are the policy implications? A: The findings suggest that default exposure to individualized financial information about the long-term costs of part-time work — delivered by employers, pension funds, or the state — could improve decision quality and labor supply. More broadly, the results imply that policies designed to increase female labor supply (parental leave reforms, childcare subsidies) may underperform if mothers do not fully internalize the financial benefits of additional hours; ensuring that families solve the correct optimization problem is a precondition for unlocking the full potential of such policies.

Child Penalty: The large and persistent reduction in women’s labor force participation and income following the birth of a first child, identified in the paper as the key driver of remaining gender inequality in the labor market in industrialized countries and a source of profound life-cycle financial consequences including reduced lifetime earnings and pension savings.

Cost-Unaware: The authors’ term for women who hold overly optimistic expectations about the financial consequences of part-time work — specifically, who over-estimate pension receipt under low part-time employment by more than 10% and who believe wage growth under low part-time is at least as high as under higher employment levels. In the descriptive survey 42% of mothers qualify on both dimensions.

Future Calculator: An online individualized projection tool developed by the authors in cooperation with a Swiss bank, calibrated to teachers’ deterministic salary and pension schedules, allowing users to estimate the long-term financial implications of different employment levels. Used both in the descriptive survey vignette and as part of the RCT treatment.

Second Pillar (Occupational Pension Scheme, PP): Switzerland’s occupational pension scheme, the pillar most heavily affected by part-time work because contributions are directly proportional to earnings above a minimum annual earnings threshold. The paper documents an average gender pension gap of 47.5% in this pillar in 2024 and a 43% lower monthly pension receipt for the average female teacher’s part-time trajectory relative to full-time employment.

Two-Stage Randomization: The experimental design used to separate direct treatment effects from spillover effects within schools. One-third of schools are assigned to a pure control group; in the remaining two-thirds, teachers are individually randomized into treatment or spillover control (untreated teachers in treated schools), enabling identification of both causal treatment impacts and social learning channels.

Information Constraint: The paper’s central mechanism — mothers’ failure to spontaneously account for the full long-term financial implications of reduced labor supply when making employment decisions, distinct from deliberate forward-looking tradeoffs. The authors document this both through the absence of long-term financial factors in open-ended decision narratives (only 11% of mothers mention them) and through systematic misperceptions of pension and wage outcomes.

Cognitive Dissonance (as used in the paper): The authors use this term to describe the initial negative emotional response (−0.41 SD overall, −0.68 SD for cost-unaware women) when treated women learn that the true financial costs of part-time work are higher than they expected — information that conflicts with prior beliefs and prior choices, producing unpleasant emotions that subsequently reverse into lower stress levels two months later.

A choice-based approach to the measurement of inflation expectations


Standard survey-based measurement of inflation expectations relies on density forecasts in which respondents assign probabilities to pre-specified inflation bins; this method has been found to induce biases through its bin structure (suggesting that values near zero are more likely), to impose cognitive demands that raise dropout rates, and to become uninformative during high-inflation episodes when responses cluster in open-ended extreme bins—making cross-time and cross-country comparisons unreliable. This paper proposes a new choice-based elicitation method rooted in decision theory that uses a bisection process: respondents first state a minimum and maximum inflation level for which they see almost no chance of actual inflation falling outside the range, avoiding external anchors, and then answer a series of binary choices from which the relevant percentiles of their subjective distribution can be inferred. Two large surveys (UK and US) and a laboratory experiment demonstrate that the method leads to well-defined expectations that fulfil both subjective and objective quality criteria, that it is neither perceived as more difficult nor more time-consuming than the density forecast standard, and that—unlike density forecasts—it is robust to differences in the state of the economy, enabling comparisons across time and countries. The method is portable and can be applied to elicit distributions over other macroeconomic variables beyond inflation.

In depth

Q1. What specific failures of density forecasts motivate the new method?

The paper identifies four problems with the standard density forecast format: (i) bin-structure bias—narrower bandwidths around zero may lead respondents to infer that near-zero inflation is more likely by design, biasing responses toward zero; (ii) cognitive demands that raise dropout rates and may introduce selection bias; (iii) sensitivity to question wording and response-scale changes; and (iv) loss of informativeness during high-inflation episodes when responses bunch in extreme open-ended bins, compounded by the incompatibility of adjusted bin structures across survey waves. During the recent surge in inflation these problems became especially visible, motivating a method more robust to the state of the inflation environment.

Q2. How does the choice-based bisection method work?

The method, building on Baillon (2008), elicits respondents’ subjective inflation distribution via a series of binary choices structured as a bisection algorithm that partitions the state space into equally likely subevents, allowing the relevant percentiles of the distribution to be recovered without imposing external anchors. The procedure begins by asking respondents for a minimum and maximum inflation level for which they believe there is “almost no chance” actual inflation falls outside the interval—avoiding the bin-structure bias of the density forecast by letting respondents define their own relevant range. Subsequent binary choices then narrow down the median, quartiles, and further quantiles according to a strict algorithm.
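The bisection idea can be sketched with a simulated respondent. This is a generic illustration and not the paper's exact Midpoint protocol: the respondent's beliefs are assumed to be normal (mean 4, standard deviation 1.5), the question wording is reduced to "which side of x is more likely?", and 12 questions per quantile are used for numerical precision, far more than a real survey would ask. All function names are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def make_respondent(mean, sd):
    """Simulated respondent with normal beliefs (an assumption for this sketch)."""
    def prefers_lower(a, x, b):
        # Binary choice: is inflation more likely in [a, x) than in [x, b]?
        mass_low = normal_cdf(x, mean, sd) - normal_cdf(a, mean, sd)
        mass_high = normal_cdf(b, mean, sd) - normal_cdf(x, mean, sd)
        return mass_low > mass_high
    return prefers_lower

def equal_split(prefers_lower, a, b, n_questions=12):
    """Bisect on x until [a, x) and [x, b] are judged about equally likely."""
    lo, hi = a, b
    for _ in range(n_questions):
        x = 0.5 * (lo + hi)
        if prefers_lower(a, x, b):
            hi = x  # too much mass below x, so the split point lies lower
        else:
            lo = x  # too much mass above x, so the split point lies higher
    return 0.5 * (lo + hi)

def elicit_quartiles(prefers_lower, minimum, maximum):
    """Median first, then the quartiles from each half of the stated range."""
    median = equal_split(prefers_lower, minimum, maximum)
    q25 = equal_split(prefers_lower, minimum, median)
    q75 = equal_split(prefers_lower, median, maximum)
    return q25, median, q75

# The respondent first states a range with "almost no chance" outside it.
respondent = make_respondent(mean=4.0, sd=1.5)
q25, median, q75 = elicit_quartiles(respondent, minimum=-2.0, maximum=10.0)
print(round(q25, 2), round(median, 2), round(q75, 2))
```

Because every question is posed relative to the respondent's own minimum and maximum, no pre-specified bin structure enters the procedure, which is the property the text emphasizes.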

Q3. What do the field surveys and laboratory experiment show?

Two large surveys—one in the UK testing feasibility across multiple protocol variants, one in the US—and a laboratory experiment demonstrate that the choice-based method produces well-defined expectations fulfilling both subjective and objective quality criteria, and is neither perceived as harder nor more time-consuming than the standard density forecast. The UK survey compared two variants of the proposed “Midpoint method” against existing density forecast formats. The convergence on quality criteria across different samples and settings supports the method’s potential for adoption in large-scale central bank surveys.

Q4. What makes the method robust to the state of the economy, and why does that matter for monetary policy?

In contrast to density forecasts, the choice-based method is robust to differences in the level and volatility of inflation because respondents define their own relevant range rather than choosing among fixed pre-specified bins, so the method does not become uninformative when actual inflation is far from the bins’ central mass. This robustness allows comparisons of inflation expectations distributions across time (including across high- and low-inflation regimes) and across countries—a feature density forecasts cannot deliver without adjusting bin structures in ways that compromise comparability. For monetary authorities that use survey expectations as both an indicator and a policy tool, this portability is a key advantage.

Key concepts

density forecast : the standard survey format in which respondents assign subjective probabilities to pre-specified inflation intervals (bins); the format used by the Federal Reserve Bank of New York’s Survey of Consumer Expectations and widely adopted by central banks.

choice-based elicitation (Midpoint method) : the paper’s proposed alternative; a bisection procedure in which respondents first report a subjective min/max range and then answer binary choices, yielding quantiles of the subjective inflation distribution without imposing an external bin structure or anchors.

bisection process : the algorithmic structure in which each binary choice partitions the remaining probability mass so that successive responses identify the median, quartiles, and further quantiles of the respondent’s subjective distribution.

bin-structure bias : the distortion introduced by the density forecast’s pre-specified bins when narrower intervals near zero suggest to respondents that near-zero inflation is considered more likely by the survey designers, biasing their reported probabilities toward zero.

A Goldilocks Theory of Fiscal Deficits


This paper develops a tractable continuous-time model to study the fiscal sustainability of government deficits and the dynamics of public debt, with two main ingredients: an endogenous interest rate R that rises with the debt level through a convenience yield mechanism (savers value holding government bonds), and a potentially binding zero lower bound (ZLB) on the nominal interest rate. The paper’s central theoretical contribution is deriving the correct free-lunch condition: not the commonly cited $R < G$, but the stricter condition $R < G - \varphi$, where $\varphi$ captures the sensitivity of $R - G$ to debt. Even when $R < G$, accumulating more debt raises R through reduced convenience yields, and this endogenous feedback tightens fiscal sustainability. The paper maps the full deficit-debt space with a hump-shaped locus, analyzes ZLB dynamics where the deficit-debt relationship can invert, and studies the role of income inequality and tax policy. Calibrating to the U.S. and Japan as of December 2019, the paper finds little room for free-lunch policies in the U.S. (a maximum permanent deficit of just over 2% of GDP at a stable debt-to-GDP ratio of 110%), while Japan is in the “inverted” ZLB regime where deficit increases can reduce debt through higher nominal growth.

Layer 1: Overview

Mian, Straub, and Sufi construct a tractable deterministic continuous-time model with savers who derive convenience utility from holding government bonds, hand-to-mouth spenders, and a monetary authority that targets inflation (except at the ZLB), to systematically analyze when deficits can be “free lunches.” The core insight is that the standard r < g analysis treats interest rates as exogenous to the debt level, but if R rises as debt accumulates, through the declining marginal convenience yield of bonds, then the condition for a free-lunch policy is not R < G but R < G − φ. This matters empirically: the paper infers φ (the debt-to-interest-rate sensitivity) from empirical estimates of the convenience yield elasticity, and calibrates the model to conditions in the U.S. and Japan as of December 2019. The U.S. calibration finds a maximum free-lunch deficit of just over 2% of GDP at a stable debt ratio of 110%, implying the U.S. was barely inside the free-lunch region pre-Covid. By contrast, the paper finds ample free-lunch space for Japan and an “inverted” ZLB regime in which higher deficits can reduce the debt-to-GDP ratio by stimulating nominal growth. The analysis is extended to incorporate aggregate risk, capital, debt maturity structure, and inequality, each with distinct implications for the size and location of fiscal space.

In depth

Q1. What is the deficit-debt diagram, and what is the free-lunch condition?

The deficit-debt diagram is the locus of steady-state combinations of the primary deficit z and the debt-to-GDP ratio b, derived from the government budget constraint $\dot{b} = -(G^* - R^*(b))b + z$ at steady state; this locus is hump-shaped, with the maximum sustainable permanent deficit z* occurring at the debt level b* where $R^*(b^*) = G^* - \varphi(b^*)$. The hump shape arises because at low debt levels the convenience yield is high (R is low relative to G, so each unit of debt supports a large deficit), while at high debt levels the convenience yield is saturated (R rises toward G, leaving little deficit room). The left branch of the locus, where debt levels are below b*, is the free-lunch region: any permanent increase in the deficit to a value below z* raises the steady-state debt level but requires no future tax increases. The right branch, where debt is above b*, is the conventional region: any deficit increase must eventually be accompanied by higher taxes. The key departure from the standard r < g analysis is that R is endogenous; Proposition 1 and Corollary 1 formally establish that the correct free-lunch threshold is $R^*(b_0) < G^* - \varphi(b_0)$, not simply R < G.
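The peak condition can be recovered in two steps from the budget constraint. The sketch below assumes that steady-state growth $G^*$ does not depend on b (the case away from the ZLB) and writes $\varphi(b)$ as the derivative of $R^* - G^*$ with respect to log debt, matching the definition of φ used in this summary; the intermediate steps are a reconstruction, not a quotation from the paper.

```latex
\begin{align*}
\dot{b} = 0 \;&\Longrightarrow\; z(b) = \bigl(G^* - R^*(b)\bigr)\, b, \\
z'(b) &= G^* - R^*(b) - b\,\frac{\partial R^*(b)}{\partial b}
       = G^* - R^*(b) - \varphi(b),
\qquad \varphi(b) \equiv \frac{\partial \bigl(R^*(b) - G^*\bigr)}{\partial \log b}, \\
z'(b^*) = 0 \;&\Longleftrightarrow\; R^*(b^*) = G^* - \varphi(b^*).
\end{align*}
```

The locus is increasing (the free-lunch branch) exactly where $R^*(b) < G^* - \varphi(b)$ and decreasing where the inequality is reversed.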

Q2. Why is R < G insufficient as a free-lunch condition, and what does φ capture?

The condition R < G fails as a free-lunch criterion because, when the government borrows an additional dollar and rolls it over forever, it faces two opposing budget effects: a positive cash flow of G − R from rolling over the existing debt, and a tightening of the budget constraint from the endogenous rise in R on all infra-marginal outstanding debt; the parameter φ measures the magnitude of this second effect as the semi-elasticity of R − G with respect to the log of debt. When φ is positive — as it is empirically because convenience yields are declining in debt supply — the net fiscal benefit of rolling over additional debt is G − R − φ, not G − R. An economy can exhibit R < G yet be in the conventional debt region if φ is sufficiently large that R > G − φ at the current debt level. The U.S. calibration illustrates this: the traditional R < G condition holds up to a debt ratio of 220% of GDP, but the stricter R < G − φ condition breaks down already at 110%, which is the actual boundary of the free-lunch region for the U.S.
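A small numerical sketch makes the gap between the two conditions concrete. It assumes R − G is linear in the debt ratio, with two hypothetical parameters (`B_UPPER` and `SLOPE`) picked only so that the output echoes the U.S. figures quoted above; this is not the paper's calibration.

```python
# Illustrative assumptions, not the paper's calibration:
# R - G is linear in the debt-to-GDP ratio b.
B_UPPER = 2.20  # debt ratio at which R = G (220% of GDP)
SLOPE = 0.017   # d(R - G)/db, a hypothetical value

def r_minus_g(b):
    return SLOPE * (b - B_UPPER)

def phi(b):
    # Semi-elasticity of R - G with respect to log debt: b * d(R - G)/db.
    return SLOPE * b

def steady_state_deficit(b):
    # Steady state of the budget constraint: z = (G - R(b)) * b.
    return -r_minus_g(b) * b

def free_lunch(b):
    # Stricter condition R < G - phi, written as (R - G) < -phi.
    return r_minus_g(b) < -phi(b)

# Locate the peak of the hump-shaped locus on a grid of debt ratios.
grid = [i / 1000 for i in range(1, 2201)]
b_star = max(grid, key=steady_state_deficit)
z_star = steady_state_deficit(b_star)

print(f"b* = {b_star:.2f}, z* = {z_star:.4f}")
print("b = 1.0:", "R < G" if r_minus_g(1.0) < 0 else "R >= G", "| free lunch:", free_lunch(1.0))
print("b = 1.5:", "R < G" if r_minus_g(1.5) < 0 else "R >= G", "| free lunch:", free_lunch(1.5))
```

Under these assumptions the locus peaks at a debt ratio of 110% with a deficit slightly above 2% of GDP. At a debt ratio of 150% the economy still has R < G but already fails R < G − φ, which is the wedge between the two criteria.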

Q3. How does the analysis change at the zero lower bound, and what is the “inverted” fiscal regime?

At the ZLB, the direction of causality reverses: instead of the debt level determining the interest rate, the debt level determines the nominal growth rate G (via aggregate demand and the Phillips curve), creating an “inverted” regime in which higher deficits can reduce rather than increase debt by stimulating nominal growth and inflating away the debt. The mechanism is as follows: when the nominal rate is constrained at zero, fiscal expansion raises aggregate demand, which via the Phillips curve (slope κ) raises inflation, which raises nominal growth G, which in turn erodes the debt ratio faster. If the product of the fiscal multiplier, κ, and the debt level exceeds one (a sufficient-statistic condition), then higher deficits reduce the debt ratio. The paper finds this condition plausible for Japan (debt ratio ~225%, estimated κ = 0.1–0.3, multipliers of 1.5–2) but not for the U.S. in 2019. The deficit-debt locus in this regime is “backward-bending”: as the ZLB binds more tightly (lower debt), the locus can curve back and eventually allow the inverted relationship between deficits and debt.
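The sufficient-statistic condition can be evaluated directly for the Japanese ranges quoted above. The sketch takes the condition literally as stated in this summary (multiplier × κ × debt ratio > 1); the function names are hypothetical, and the paper's exact expression may involve additional terms or different units.

```python
def inversion_statistic(multiplier, kappa, debt_ratio):
    """Product of fiscal multiplier, Phillips curve slope, and debt ratio."""
    return multiplier * kappa * debt_ratio

def deficits_reduce_debt(multiplier, kappa, debt_ratio):
    return inversion_statistic(multiplier, kappa, debt_ratio) > 1.0

JAPAN_DEBT_RATIO = 2.25  # about 225% of GDP, as quoted above

# Evaluate the four corners of the quoted parameter ranges.
for multiplier in (1.5, 2.0):
    for kappa in (0.1, 0.3):
        stat = inversion_statistic(multiplier, kappa, JAPAN_DEBT_RATIO)
        verdict = deficits_reduce_debt(multiplier, kappa, JAPAN_DEBT_RATIO)
        print(f"multiplier={multiplier}, kappa={kappa}: {stat:.2f} -> inverted: {verdict}")
```

Taken at face value, the statistic exceeds one only toward the upper end of the quoted κ range, which fits the summary's wording that the condition is plausible, rather than assured, for Japan.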

Q4. How does income inequality affect fiscal space, and why does the ZLB reverse the sign?

Outside the ZLB, greater income inequality (a larger income share of savers relative to hand-to-mouth spenders) expands fiscal space, because savers have a higher propensity to save, which reduces the natural interest rate R and thus raises G − R and allows larger sustainable deficits; at the ZLB, greater inequality shrinks fiscal space, because it reduces aggregate demand and hence nominal growth G rather than R. Formally, outside the ZLB: $z(b) = (v'(b)(1-x-\mu) - \rho)b$, which increases as the spender share μ falls (Corollary 3). At the ZLB, nominal growth G becomes demand-determined via equation (20), and lower μ reduces demand, lowering G and hence z(b). The policy implication is a potential conflict between redistributive policies and deficit finance: redistribution (raising μ) reduces fiscal space outside the ZLB but expands it at the ZLB. The paper notes that roughly 69% of U.S. government debt held by households is directly or indirectly held by the top 10% of the wealth distribution, making savers’ saving propensity the primary driver of the convenience yield.

Q5. What are the U.S. and Japan calibration results for fiscal space as of December 2019?

For the U.S. in December 2019, the model calibrates a maximum permanent primary deficit z* of just over 2% of GDP at a stable debt-to-GDP ratio of b* ≈ 110%, implying the U.S. was just inside the free-lunch region; for Japan, the model finds the economy in the inverted ZLB regime where higher deficits reduce debt by raising nominal growth. The calibration uses empirical estimates of φ from the literature on convenience yield demand elasticities (Krishnamurthy and Vissing-Jorgensen 2012, Laubach 2009, Presbitero and Wiriadinata 2020). For the U.S., the standard r < g condition holds up to a debt ratio of 220% (the upper bound), but the binding free-lunch condition R < G − φ limits fiscal space to 110%. Deficits beyond the 2%-of-GDP limit must be financed by future tax increases or spending cuts, even though R < G throughout the range. The Japan calibration illustrates the ZLB regime: with a debt ratio already above 200%, the fiscal multiplier effect on inflation is large enough that the backward-bending locus applies, and Japan’s economy lies in the inverted region.

Q6. How does the analysis extend to aggregate risk, capital crowding-out, and debt maturity?

With aggregate risk, the free-lunch condition R < G − φ remains informative: when the condition holds on average, free-lunch policies can be designed with probability approaching one; when it fails on average, no free lunch is possible. The risk extension follows Mehrotra and Sergeyev (2020) and confirms numerically that the deterministic condition provides a valid signal for the stochastic case. Adding capital and crowding-out (Section 7.2) yields a counterintuitive finding: greater crowding-out of capital actually increases fiscal space by reducing the sensitivity of interest rates to debt (lower φ), because each additional unit of government debt displaces private capital rather than reducing convenience yields as sharply. Regarding debt maturity: issuing long-term debt reduces fiscal space at low debt levels (locking in higher interest costs), but increases it at high debt levels; this suggests that QE-style maturity shortening may constrain fiscal space as debt rises. These extensions confirm that the φ parameter — and the R < G − φ condition — is robust to a range of model ingredients, making it a practically useful criterion beyond the baseline model.

Key Concepts

free-lunch fiscal policy : a permanent increase in the primary deficit that raises steady-state debt to a new higher level without requiring any future tax increases or spending cuts; feasible only when $R^*(b_0) < G^* - \varphi(b_0)$, which is strictly tighter than the standard r < g condition when φ > 0.

debt-rate sensitivity (φ) : the semi-elasticity of R − G with respect to the log of debt, capturing how much the endogenous convenience yield on government bonds falls (and hence interest rates rise) as the debt supply increases; the paper’s addition to the standard r < g framework that tightens the sustainability condition from R < G to R < G − φ; estimated empirically from convenience yield demand curves.

deficit-debt diagram : the hump-shaped locus of sustainable steady-state combinations of the primary deficit and the debt-to-GDP ratio; the left (increasing) branch is the free-lunch region where fiscal expansion is self-sustaining, and the right (decreasing) branch is the conventional region where fiscal expansion requires future tax increases.

inverted ZLB fiscal regime : the case where the nominal interest rate is zero and the deficit-debt locus bends backward, so that higher deficits reduce rather than increase the debt ratio; occurs when the fiscal multiplier is large enough that deficit-induced nominal growth more than offsets the direct debt accumulation effect; found to apply to Japan as of December 2019 but not the U.S.

convenience yield : the non-pecuniary benefit savers derive from holding government bonds (capturing liquidity, safety, and regulatory premia), modeled as the utility function v(b) for savers; the mechanism making R endogenous to debt: as debt supply rises, the marginal convenience yield v’(b) falls, pushing R toward G and shrinking fiscal space.

A model of expenditure shocks


A common observation from account-level bank data is that low-income, low-liquidity households often use additional income to repay debt rather than consume, and that household-level consumption is extremely volatile even though aggregate consumption is smooth. This paper formalizes these patterns using four new facts from the PSID: household consumption is as volatile as income (contradicting PIH); the correlation between household consumption and income growth is only about 0.2 (low); consumption growth is negatively autocorrelated (contradicting both PIH and habit models); and—a finding new to the literature—the cross-sectional correlation between consumption and income growth is far smaller among households experiencing high consumption episodes than in the full sample. The paper proposes an explanation based on stochastic consumption thresholds: unanticipated shocks such as medical expenses or vehicle repairs create time-varying minimum-consumption floors whose violation incurs large utility costs, inducing households to prioritize expenditures on these needs over income-responsive consumption and to rebuild savings after the shock. This mechanism increases the welfare cost of income fluctuations by an order of magnitude relative to standard models.

In depth

Q1. What are the four empirical facts and why do they challenge standard models?

Fact 1: for the average PSID household, consumption is as volatile as income; Fact 2: the correlation between consumption growth and income growth is about 0.2; Fact 3: household consumption growth is negatively autocorrelated; Fact 4 (new): the cross-sectional correlation between consumption and income growth is far smaller among households with high consumption than in the full sample. Fact 1 contradicts the permanent income hypothesis (PIH), under which consumption should be smoother than income. Facts 1 and 2 together cannot both be explained by liquidity constraints (which would tie consumption to current income, producing a high correlation) or by very persistent income shocks (same problem). Fact 3 contradicts habit models (which generate positive autocorrelation) and is inconsistent with PIH (which implies zero autocorrelation). Fact 4 is novel: in standard models the level of consumption barely affects the income-consumption growth relationship, so this fact requires a new explanation.

Q2. What is the expenditure shock mechanism, and how does it rationalize the four facts?

The model introduces stochastic, time-varying consumption thresholds—representing unavoidable expenditures such as medical emergencies, vehicle breakdowns, or appliance repairs—that, if violated, incur large utility costs; this forces households to prioritize meeting these minimum needs over income-proportional consumption. When a threshold shock hits, consumption jumps to meet it regardless of current income (explaining volatile, income-disconnected consumption). After the shock the household rebuilds savings, reducing consumption below its long-run level (generating negative autocorrelation). During high-consumption episodes (threshold shocks), income and consumption growth are decoupled (explaining Fact 4). Meanwhile, without a threshold shock, households are saving to self-insure against future shocks (explaining why low-income households save rather than consume when income rises).
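A toy simulation shows how occasional spending spikes alone produce negatively autocorrelated consumption growth. It is deliberately much simpler than the paper's model: income is held constant, there is no saving decision, and the shock probability and size are hypothetical values. Consumption equals a baseline unless a threshold shock forces it higher for one period.

```python
import random

def simulate_consumption(periods=20000, base=1.0, shock_prob=0.1,
                         shock_size=1.0, seed=0):
    """Consumption path with i.i.d. threshold shocks (toy assumptions)."""
    rng = random.Random(seed)
    path = []
    for _ in range(periods):
        # With probability shock_prob the minimum-consumption floor jumps.
        threshold = base + shock_size if rng.random() < shock_prob else 0.0
        path.append(max(base, threshold))
    return path

def growth_autocorrelation(path):
    """Lag-1 autocorrelation of first differences of the path."""
    growth = [later - earlier for earlier, later in zip(path, path[1:])]
    x, y = growth[:-1], growth[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

consumption = simulate_consumption()
rho = growth_autocorrelation(consumption)
print(f"lag-1 autocorrelation of consumption growth: {rho:.2f}")
```

Each spike is followed by a reversal, so a positive growth rate tends to be followed by a negative one, and the autocorrelation comes out close to −0.5 even though income never moves. The paper's mechanism adds savings rebuilding after the shock, which this sketch omits.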

Q3. What does the model imply for the welfare cost of income fluctuations?

The stochastic thresholds increase the welfare cost of income fluctuations by an order of magnitude relative to standard consumption models, because households must maintain precautionary buffers against the risk of hitting a threshold and being unable to meet it. The large welfare cost arises from two sources: the direct cost of violating a threshold (large utility penalty), and the precautionary motive it creates, which forces households to save at the expense of current consumption utility even when no threshold shock is present.

Q4. What empirical evidence does the paper use and what is the scope of the findings?

The PSID (post-1999 comprehensive consumption module) provides panel data on total household consumption and income; the authors use this to document all four facts, including the novel Fact 4. The negative autocorrelation of consumption growth (Fact 3) is documented in the prior literature (Blundell et al. 2008) as indicative of preference shocks or measurement error, but the paper’s model gives it a structural interpretation as evidence of expenditure shocks. The finding that consumption is volatile yet disconnected from income (Facts 1 and 2) is robust to restricting attention to nondurable consumption, ruling out durable goods as the driver. The results hold at the household level; aggregate consumption is smooth because household threshold shocks are largely idiosyncratic and average out.

Key concepts

stochastic consumption threshold : a time-varying, unanticipated minimum consumption level (representing unavoidable expenditures like medical emergencies or vehicle repairs) whose violation incurs large utility costs; the paper’s key modeling innovation.

expenditure shock : an unanticipated increase in the required minimum consumption level, representing events that force households to spend on necessities regardless of current income or savings; the proposed explanation for the four empirical facts about household consumption dynamics.

A Monetary-Fiscal Theory of Sudden Inflations


Overview

Research Question. Why do sudden inflations and currency crises occur, while symmetric sudden deflations never do? The paper asks whether treating nominal government bonds as analogous to ordinary corporate bonds — with an asymmetric payoff structure capped at face value on the upside but exposed to real losses when fiscal surpluses are insufficient — can generate a unified theory of these crises endogenously from a single model.

Intellectual Lineage and Approach. The paper sits at the intersection of two literatures. The first is the Fiscal Theory of the Price Level (FTPL), originating with Leeper (1991), Sims (1994), and Sargent and Wallace (1985), which links the real value of nominal government debt to expected future surpluses. The second is the safe-asset literature, where Holmstrom (2015) and Gorton (2017) explain that assets can circulate as safe stores of value precisely because their backing is costly to investigate and consumers rationally remain uninformed. The paper applies this information-economics logic to nominal government bonds, so that consumers normally hold bonds without investigating the government’s true fiscal capacity, and only pay the cost to investigate when real repayment doubts become sufficiently severe.

Model Structure. The model is a two-period reduced-form general equilibrium. In period 1, a representative consumer buys nominal government bonds at an interest rate set by the monetary authority. In period 2, the government must repay those bonds. The fiscal authority attempts to hit a price-level target P* by raising tax revenue, but faces a hard ceiling τ_max on the surplus it can collect — arising from Laffer limits on taxation, political constraints on austerity, or the need to fund financial-sector bailouts. The consumer has prior beliefs that τ_max is low (L) with probability π and high (H) with probability 1−π, and can pay a fixed utility cost γ to learn τ_max before deciding how many bonds to purchase.

Bond Payoff Structure and Asymmetry. The key mechanism is the asymmetric, bond-like real payoff of nominal government debt. If τ_max ≥ B1/P*, the government raises enough surplus to repay bonds fully in real terms at the price-level target; the real payoff is flat at face value (the “in-the-money” region). If τ_max < B1/P*, the government sets taxes to the ceiling τ_max and the price level rises above P* to balance the budget constraint, reducing the real payoff proportionally (the “default” region). Critically, because the nominal payoff is capped at face value, there is no upside region: governments will not run surpluses large enough to deliver a windfall to bondholders, so sudden deflations — analogous to a corporate bond being worth more than face value — cannot occur. This asymmetry is the direct source of the one-sided nature of crises.
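As a rough illustration of the kinked payoff just described, the sketch below computes the period-2 price level as max(P*, B1/τ_max) and the real payoff per nominal bond as its reciprocal. The expression B1/τ_max for the default region is inferred from the statement that the price level rises until the capped surplus τ_max repays the bonds; the function names and example numbers are illustrative assumptions.

```python
def period2_price_level(tau_max, B1, P_star=1.0):
    """P2 equals the target P* when tau_max covers B1/P*; otherwise it rises
    to B1/tau_max, the level at which the capped surplus exactly repays B1."""
    if tau_max <= 0 or B1 <= 0:
        raise ValueError("tau_max and B1 must be positive")
    return max(P_star, B1 / tau_max)

def real_bond_payoff(tau_max, B1, P_star=1.0):
    """Real payoff per nominal bond, 1/P2: flat at 1/P* when the surplus is
    sufficient, proportional to tau_max/B1 in the default region."""
    return 1.0 / period2_price_level(tau_max, B1, P_star)
```

With B1 = 1 and P* = 1, raising τ_max from 2 to 5 leaves the payoff unchanged (no upside), while lowering it to 0.1 cuts the payoff to about a tenth of face value.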

Two Illustrative Mechanisms for Sudden Inflations. The paper numerically and analytically characterizes two triggering scenarios:

Lower surplus expectations (fiscal stress narrative, corresponding to Burnside et al. 2001 on the 1997 Asian crisis): As the probability π of a low future surplus (e.g., from a prospective banking-sector bailout) rises, the value of information about τ_max increases. In the numerical example (i = 0.05, γ = 0.13, L = 0.1), the value of information equals the cost γ at π = 0.15. For π above 0.15, consumers pay to investigate, learn τ_max = L, and refuse to purchase bonds beyond what will be repaid in real terms (B1 = τ_max = L = 0.1). The price level in period 1 rises discontinuously as a function of π at this threshold.
Interest rate increases (speculative attack narrative): As the monetary authority raises the interest rate to defend a currency, consumers demand more bonds. Larger bond quantities increase the risk that surpluses will be insufficient, raising the value of fiscal information. In the numerical example (π = 0.5, γ = 0.24, 1+i ∈ [1, 1.2]), the value of information equals γ at 1+i = 1.1 (i.e., i = 10%). For interest rates above this threshold, consumers learn τ_max = L, restrict bond purchases to what will be repaid, and the price level in period 1 jumps discontinuously. Further interest rate increases above the threshold produce only upward drift in the price level, not additional monetary tightening effects — illustrating the limits of monetary policy in fiscally stressed environments.

Theoretical Results. Two formal theorems establish generality. Theorem 1 shows that, given bond demand B1(π) such that L < B1 for all π ∈ (0,1), there exist thresholds k and γ > 0 such that the period-1 price level P1 is discontinuous as a function of π on (0, k]. Theorem 2 establishes an analogous discontinuity in P1 as a function of the interest rate i, given that B1(i) > L for all i in the relevant range.

Scope Conditions. The model is a two-period reduced form that abstracts from dynamics, multiple maturities, and secondary market trading. The informational friction is a fixed binary cost γ, not a richer signal structure. The results depend on the existence of a binding surplus ceiling τ_max; when the government is far from this ceiling (i.e., consumers’ beliefs are far from the “default boundary”), shocks produce only small, smooth price-level changes. Large discontinuous price-level jumps require the economy to be near the kink point of the bond payoff curve.

In depth

Q1. What is the fundamental analogy that drives the paper’s theory, and what economic literature does it build on?

The paper analogizes nominal government bonds to corporate bonds (following Sargent 1982’s advice that “government debt is valued according to the same economic considerations that give private debt value”). Like a corporate bond, the nominal government bond pays its face value if the underlying project (government fiscal capacity) delivers a surplus at least equal to the face value, but pays only a share of the realized surplus if the surplus falls short. This bond-like payoff — flat on the upside, proportional to outcomes on the downside — is the direct source of asymmetric crisis dynamics. The paper combines this with Holmstrom (2015) and Gorton (2017)’s framework in which safe assets function because their backing is costly to investigate, so consumers rationally remain uninformed in normal times.

Q2. What is the key information friction, and how does it generate the switch between “normal times” and crisis?

In normal times, consumers are confident that the government’s future maximum surplus τ_max is sufficient to repay bonds in real terms. The fixed utility cost γ of investigating the true surplus exceeds the benefit, so consumers remain uninformed and bonds trade at a price reflecting only uninformed prior beliefs. A crisis arises when the value of information V(.) rises above γ — either because the probability of a low surplus state rises (fiscal stress) or because the interest rate rises and consumers demand more bonds, bringing them closer to the repayment boundary. Once V > γ, consumers investigate and, upon learning τ_max = L (low surplus), refuse to hold bonds that will not be repaid in real terms, triggering a discrete upward jump in the price level.

Q3. How does the bond payoff structure explain the absence of sudden deflations?

The real payoff of a nominal government bond cannot exceed its face value: the bond is capped at face value on the upside because the government will not voluntarily raise tax surpluses to deliver a windfall to bondholders. In the event that surpluses turn out to be higher than needed (τ_max ≥ B1/P*), the government simply sets taxes to exactly repay the bonds at P* and returns no additional real value to bondholders. This is the flat portion of the payoff curve. Because there is no upside kink (no region where learning that τ_max is unexpectedly large causes the price level to fall sharply), there is no mechanism for sudden deflations symmetric to sudden inflations. The 1933 U.S. episode (Jacobson et al. 2019) is cited: when staying on gold would have meant deflation, and hence fiscal austerity to repay bonds in full real terms, Roosevelt chose to exit the gold standard rather than allow deflation.

Q4. How does the first numerical example (lower surplus expectations) work quantitatively?

The baseline parameters are: i = 0.05, γ = 0.13, L = 0.1, H ≈ ∞, P* = 1, e1 = e2 = 1, B0 = 1, τ1 = 0.8, β = 1. The analysis is restricted to π ∈ (0, 0.3]. As π (probability that τ_max = L) rises, the value of information V(.) rises. At π = 0.15, V equals the cost γ = 0.13. For π > 0.15, consumers pay to investigate and, upon learning τ_max = L, purchase only B1 = L = 0.1 in bonds — the amount that will be repaid — causing the period-1 price level P1 to jump discontinuously from approximately 0.95 to approximately 1.13. For π ≤ 0.15, consumers remain uninformed and P1 rises only smoothly from below 1 as π increases (fewer bonds demanded as repayment risk rises, even without investigation).

Q5. How does the second numerical example (interest rate increase) work quantitatively, and what does it imply for monetary policy?

With π = 0.5, γ = 0.24, and 1+i ∈ [1, 1.2], as the monetary authority raises the interest rate, consumers demand more bonds, increasing real repayment risk and the value of information. At 1+i = 1.1 (i.e., i = 10%), V equals γ. For 1+i > 1.1, consumers investigate and learn τ_max = L; they then only purchase bonds up to the repayment limit, causing P1 to jump discontinuously to approximately 1.15. For interest rates above the threshold, further increases yield only a smooth upward slope in P1 (bond purchases are fixed in real amount but nominal revenue falls). This illustrates that the monetary authority’s ability to use higher interest rates to lower the price level is limited by the surplus constraint: once the interest rate is high enough to trigger consumer investigation and a fiscal crisis, raising rates further is inflationary rather than deflationary.

Q6. What are the two regions of the deterministic model and how do they differ in fiscal and price-level dynamics?

In the deterministic version (π = 1, so τ_max = L with certainty), the model produces two distinct regions. In the “insufficient surplus” region where τ_max < B1/P*, the fiscal authority sets taxes to their maximum τ_max, the real payoff of bonds is τ_max/B1 < 1, the period-1 price level is P1 = B0/(βτ_max), and real bond revenue is Π = βτ_max (constant in B1). Selling additional bonds does not raise additional real revenue because any extra bonds lead to a proportional rise in P2 and a fall in Q. In the “sufficient surplus” region where τ_max ≥ B1/P*, the government meets its fiscal target (τ2 = B1/P*), P2 = P* is hit, P1 = B0P*/(βB1), and Π = βB1/P* (increasing in B1). In this region, selling additional bonds does raise real revenue and lowers P1 as the government absorbs more money.
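The two regions can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it assumes the period-1 price level is pinned down by P1 = B0/Π, which reproduces P1 = B0/(βτ_max) in the insufficient-surplus region and is continuous at the boundary τ_max = B1/P*. The function name and the example numbers are hypothetical.

```python
def deterministic_outcome(tau_max, B1, B0=1.0, P_star=1.0, beta=1.0):
    """Taxes, price levels and real bond revenue when tau_max is known."""
    sufficient = tau_max >= B1 / P_star
    if sufficient:
        tau2 = B1 / P_star   # government hits its fiscal target
        P2 = P_star
    else:
        tau2 = tau_max       # surplus ceiling binds
        P2 = B1 / tau_max    # price level rises until tau_max repays B1
    revenue = beta * tau2    # Pi = beta*B1/P* or beta*tau_max
    P1 = B0 / revenue        # assumption: P1 = B0 / Pi in both regions
    return {"sufficient": sufficient, "tau2": tau2,
            "P1": P1, "P2": P2, "revenue": revenue}
```

For example, with τ_max = 0.5 the outcome for B1 = 1 and B1 = 2 has the same revenue and the same P1 but a higher P2, whereas with τ_max = 3 doubling B1 doubles revenue and halves P1 while P2 stays at P*.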

Q7. What are the two interest rate regions in the deterministic model, and what is their implication for monetary policy effectiveness?

Using B1 = B0(1+i) (debt rolled over at the chosen rate), the monetary authority has two interest-rate regions. In the “constrained” region where 1+i > τ_max P*/B0 (the surplus ceiling binds), raising i does not change the period-2 surplus (τ2 = τ_max), does not change real revenue (Π = βτ_max), and does not affect P1 — but raises P2 above the target P*. In the “unconstrained” region where 1+i ≤ τ_max P*/B0, raising i increases bond demand, increases real surplus backing, raises real revenue, and lowers P1 while P2 = P* is maintained. The boundary between these regions determines the limit of monetary policy: the monetary authority can reduce P1 by raising i only up to the point where the surplus ceiling would be hit.
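To make the limit on monetary policy concrete, the sketch below maps the interest rate into the two regions using B1 = B0(1+i). It is a hedged illustration: the rule P1 = B0/Π is an assumption inferred from the summary's formulas, and the function name and parameter values (for example τ_max = 1.1) are hypothetical rather than the paper's calibration.

```python
def rate_outcome(i, tau_max, B0=1.0, P_star=1.0, beta=1.0):
    """Price levels and real revenue as a function of the policy rate i."""
    B1 = B0 * (1.0 + i)                          # debt rolled over at rate i
    constrained = (1.0 + i) > tau_max * P_star / B0
    tau2 = tau_max if constrained else B1 / P_star
    P2 = B1 / tau_max if constrained else P_star
    revenue = beta * tau2
    P1 = B0 / revenue                            # assumption: P1 = B0 / Pi
    return {"constrained": constrained, "P1": P1, "P2": P2, "revenue": revenue}
```

With τ_max = 1.1 the threshold is 1+i = 1.1. Raising i from 5% to 8% lowers P1 with P2 held at P*; raising it from 20% to 30% leaves P1 unchanged and only pushes P2 further above target.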

Q8. How does the paper relate to and extend prior FTPL literature?

The paper is grounded in the FTPL of Leeper (1991), Sims (1994), and Cochrane (2005, 2020), in which the price level is determined by the requirement that real government liabilities equal the present value of future surpluses. The paper’s contribution is to make the information structure endogenous: consumers’ beliefs and their decision to acquire fiscal information determine whether or not the FTPL logic is operative. In normal times (consumers uninformed), the price level does not respond to changes in the maximum surplus — a result that resembles the “Ricardian” or non-FTPL regime. When consumers investigate and learn the surplus is insufficient, the connection between the surplus and the price level is restored, reproducing FTPL-type dynamics. This provides an endogenous, single-model rationale for the regime-switching behavior between FTPL and non-FTPL environments documented empirically in Bianchi and Melosi (2013, 2017) and Davig and Leeper (2006).

Q9. What is the welfare role of consumer ignorance in this framework?

Consumer ignorance of the government’s true surplus plays a dual role. On the one hand, ignorance is individually rational in normal times because the cost γ of investigating exceeds the benefit V(.) when beliefs are comfortably away from the default boundary. On the other hand, following Dang et al. (2017), knowledge of the safe asset’s backing destroys the symmetric ignorance that supports the asset’s role as a safe store of value, reducing welfare. In this model the concern is repayment risk rather than adverse selection: the consumer fears not being repaid in real terms and chooses to investigate when that risk is sufficiently high, potentially triggering the very crisis they feared.

Q10. What are the scope conditions and limitations of the model?

The model is explicitly a two-period reduced form designed to illustrate the bond-payoff mechanism in the simplest possible setting. It abstracts from: multi-period bond maturities and secondary market trading; rich heterogeneity among consumers; endogenous monetary and fiscal policy responses beyond the simple rules specified; and the general equilibrium interactions between inflation, output, and labor markets. The information cost γ is modeled as a fixed binary cost rather than a continuous or richer signal structure. The results on discontinuous price-level jumps hold when bond demand is sufficiently large relative to L (i.e., L < B1), ensuring genuine repayment risk; when surpluses are very large relative to bond liabilities, no crisis dynamics arise.

Key Concepts

Maximum Surplus (τ_max). The paper’s name for the hard ceiling on the net tax revenue (taxes minus money transfers) the government can collect in the second period. This ceiling can arise from a Laffer limit on taxable income, political-economy constraints on austerity, or from a banking crisis requiring government transfers to bail out the financial sector. It is the paper’s analogue of a project’s liquidation value: the maximum the “project” (the government) can deliver to bondholders.

Bond-Like Payoff of Nominal Government Debt. The paper’s central structural claim: the real payoff to holding a nominal government bond is capped at face value on the upside (the government will not raise surpluses beyond what is needed to repay bonds at the price-level target) but falls proportionally below face value when τ_max is insufficient for full real repayment. This is precisely the payoff structure of a standard corporate bond — flat on the upside, proportional to recovery on the downside — and it is the source of the asymmetry between sudden inflations and the absence of sudden deflations.

Value of Information (V(.)). Defined as the difference in expected utility between a consumer who learns the true τ_max before making bond-purchase decisions and one who remains uninformed and acts only on prior beliefs (π, 1−π). The consumer investigates if and only if V(.) > γ. V goes to zero as beliefs become certain (π → 0 or π → 1), can be hump-shaped in π, and is increasing in the interest rate i (through its effect on bond demand). The threshold condition V = γ defines the boundary between “normal times” (no investigation) and crisis (investigation and possible sudden inflation).

Endogenous Information Structure. The paper’s term for the property that whether consumers choose to learn the government’s fiscal capacity is itself determined within the model by the parameters of the economy (the interest rate, prior beliefs, the cost of investigation). This contrasts with models that exogenously specify whether agents are informed or not. The endogenous information structure is the mechanism by which the paper generates the two apparent regimes (FTPL-active vs. FTPL-dormant) from a single unified model.

Default Boundary. The kink point in the bond payoff curve at τ_max = B1/P*: the level of the maximum surplus at which the government exactly repays bonds in real terms at the price-level target. When beliefs or bond quantities place the economy near the default boundary, small changes in π or i can push the economy across it, triggering large price-level responses. When the economy is far from the boundary (τ_max comfortably above B1/P*), small shocks have only small smooth effects.

Sudden Inflation / Currency Crisis (as defined in this paper). A discrete, discontinuous jump in the period-1 price level P1 that occurs when consumers pass the threshold V(.) = γ and investigate the government’s fiscal capacity, finding surpluses to be insufficient. The mechanism is: informed consumers refuse to hold bonds they know will not be repaid in real terms at P*, forcing the price level to jump to clear the government’s budget constraint with fewer bonds outstanding. The paper treats sudden inflations and currency crises as the same mechanism in different institutional contexts.

Repayment Risk Premium. The markup above the risk-free rate that consumers require on government bonds to compensate for the probability that the government’s surplus will be insufficient for full real repayment (i.e., the probability that the economy is in the τ_max < B1/P* region). This premium is present even when consumers are uninformed (i.e., do not know which state of τ_max will occur), and is reflected in the consumer’s first-order condition for bond demand.

A Preferred-Habitat Model of Term Premia, Exchange Rates, and Monetary Policy Spillovers


Layer 1 — Core Argument

The paper develops a two-country preferred-habitat model in which currency and bond markets are populated by different investor clienteles — currency traders with price-elastic demand for foreign assets, and bond investors whose preferences are habitat-specific by country and maturity — with segmentation partly overcome by global arbitrageurs who have limited capital and bear mean-variance risk. Risk premia in the model are time-varying, connected across markets, and consistent with the empirical violations of Uncovered Interest Parity (UIP) and the Expectations Hypothesis (EH): in particular, currency carry trade (CCT) and bond carry trade (BCT) strategies earn abnormally high expected returns in ways that co-vary across the two markets in a manner the standard frictionless model cannot generate. Through these time-varying, connected risk premia, large-scale bond purchases (QE) lower domestic bond yields, lower foreign bond yields, and depreciate the purchasing country’s currency; short-rate cuts also lower foreign yields, but with smaller effects than bond purchases. A key structural finding, quantified in the estimated model calibrated to US and Eurozone data, is that currency returns are nearly uncorrelated with long-maturity bond returns — an exchange-rate disconnect — yet the currency market is instrumental in transmitting bond demand shocks across countries, because arbitrageurs hedge their cross-currency positions in bond markets and vice versa. Sterilized foreign-exchange interventions have strong effects on the exchange rate but weak effects on bond yields, while QE/QT has weak effects on the exchange rate but sizeable effects on foreign bond yields — a sharp asymmetry that follows directly from the disconnect.

In depth

Q1. Why do UIP and EH fail in the standard model, and what changes in this model?

In the standard model with perfect capital mobility, risk premia are constant, so the yield curve depends only on expectations of the domestic short rate and the exchange rate absorbs short-rate differentials exactly. In this model, arbitrageurs bear the residual risk when currency traders and bond clienteles are unwilling to absorb excess supply or demand at prevailing prices. Because arbitrageurs have limited capital (captured by a risk-aversion parameter a ≥ 0 that can also represent capital or Value-at-Risk constraints in reduced form), they demand compensation — time-varying risk premia — for holding currency and maturity risk. When a = 0, arbitrageurs are risk-neutral, UIP and EH both hold, and the model collapses to the standard frictionless benchmark.

Q2. What are the three types of agents and what does each do?

Currency traders hold foreign assets and have a demand that is downward-sloping (price-elastic, with slope coefficient αe ≥ 0) in the log exchange rate; their demand also shifts with a stochastic currency demand factor γt. They can be interpreted as households engaged in expenditure switching or central banks managing reserve levels. Bond investors form clienteles, each with a preferred-habitat demand for bonds of a specific country and maturity that is downward-sloping in the log bond price (slope αj(τ)) and shifts with a country-specific bond demand factor βjt; examples are pension funds and insurance companies whose liabilities are long-dated and denominated in their home currency. Global arbitrageurs trade the currency and all bonds of both countries, maximizing mean-variance utility over instantaneous wealth changes; they bridge the segmented markets and their positions pin down equilibrium risk premia.

Q3. What is the equilibrium structure and which factors drive prices?

The equilibrium exchange rate and bond prices are log-affine functions of five stochastic factors: the home short rate iHt, the foreign short rate iFt, the currency demand factor γt, and the two bond demand factors βHt and βFt. These factors follow a mean-reverting (Ornstein-Uhlenbeck) system. The equilibrium is characterized by a scalar nonlinear system (25 equations in the general case) whose solution pins down the loadings of prices on each factor. This affine structure means each asset’s risk premium is the product of the arbitrageur’s risk-aversion coefficient, the factor covariance matrix, and arbitrageur net positions, which are themselves determined by market-clearing.
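For intuition about the factor dynamics, here is a minimal simulation sketch of a mean-reverting (Ornstein-Uhlenbeck) vector using an Euler-Maruyama step. It assumes independent factors with diagonal mean reversion and volatility, which is a simplification: the paper's system may allow cross-factor dependence. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_ou(x0, mean, kappa, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dx = kappa * (mean - x) dt + sigma dW,
    applied element by element (independent factors, an assumption)."""
    rng = np.random.default_rng(seed)
    x0, mean, kappa, sigma = (np.asarray(v, dtype=float)
                              for v in (x0, mean, kappa, sigma))
    path = np.empty((n_steps + 1, x0.size))
    path[0] = x0
    for t in range(n_steps):
        shock = rng.standard_normal(x0.size) * np.sqrt(dt)
        path[t + 1] = path[t] + kappa * (mean - path[t]) * dt + sigma * shock
    return path

# Five factors in the order (i_H, i_F, gamma, beta_H, beta_F); values are made up.
factors = simulate_ou(x0=[0.03, 0.01, 0.5, -0.2, 0.4],
                      mean=[0.02, 0.02, 0.0, 0.0, 0.0],
                      kappa=[0.5] * 5, sigma=[0.01] * 5,
                      dt=0.1, n_steps=200)
```

In the model, log prices would then be affine functions of each row of `factors`, with loadings determined by the equilibrium system.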

Q4. How does a conventional short-rate cut transmit domestically and internationally in the model?

Following a home short-rate cut, arbitrageurs find it attractive to enter the CCT — borrow home currency, invest in foreign currency. If currency traders’ demand is price-elastic (αe > 0), arbitrageurs’ equilibrium foreign-currency holdings rise, and the expected return on the CCT rises too (arbitrageurs must be compensated for the increased risk). This attenuation effect means the foreign currency appreciates less than implied by UIP: the exchange rate response is dampened. Simultaneously, arbitrageurs enter the home BCT (borrow at the home short rate, invest in long home bonds); if home bond investors’ demand is price-elastic (αH(τ) > 0), arbitrageurs’ long-bond holdings rise and the BCT’s expected return rises, attenuating the transmission to domestic long-maturity yields (which fall less than EH would imply). A propagation effect to foreign bond yields arises through arbitrageur hedging: by taking long positions in foreign currency (CCT), arbitrageurs become exposed to the risk that the foreign short rate drops and the foreign currency depreciates; long-maturity foreign bonds provide a natural hedge (their price rises when the foreign short rate drops), so arbitrageurs increase foreign bond demand, depressing foreign yields. This international transmission of conventional policy is absent from the standard model.

Q5. How does unconventional policy (QE/QT) transmit domestically and to the exchange rate and foreign yields?

Following QE purchases of home bonds, their prices rise; arbitrageurs accommodate by holding fewer home bonds, which reduces their exposure to home short-rate risk. With less home-rate risk on their books, arbitrageurs become more willing to hold foreign currency, which depreciates when the home short rate rises and so carries the same home-rate risk they have just shed. The increased foreign-currency position in turn makes arbitrageurs more willing to hold foreign bonds (which hedge the foreign-currency position against foreign rate changes). The net result in the model is: QE lowers domestic bond yields, lowers foreign bond yields, and depreciates the home currency. The quantitative finding from the estimated model is that QE/QT effects on foreign bond yields are sizeable and stronger than those of conventional short-rate policy.

Q6. What explains the exchange-rate disconnect, and how can the currency market still transmit bond demand shocks?

In the estimated model, variance decompositions reveal that long-maturity bond yields in each country are driven primarily by bond demand factors (βHt and βFt), while the exchange rate is driven primarily by the currency demand factor (γt); short rates account for a small fraction of movements in both, and each factor type accounts for negligible variation in the other asset class’s price. The disconnect between bond yields and the exchange rate arises because bond demand shocks in the two countries move the exchange rate in opposite directions — a home bond demand shock that lowers home yields also raises the exchange rate via arbitrageur hedging, while a foreign bond demand shock moves the exchange rate in the opposite direction. These offsetting effects make the exchange rate nearly uncorrelated with long-maturity bond yields. However, bond demand shocks in one country are transmitted to bond yields in the other country through the currency market: arbitrageurs hedge their bond positions using the currency, so a shock to home bond demand moves arbitrageurs’ currency positions, which in turn affects their willingness to hold foreign bonds. Cross-country bond yield comovement is therefore positive and sizeable, despite the exchange-rate disconnect.

Q7. What are the model’s implications for foreign exchange intervention?

A sterilized purchase of foreign currency by the home or foreign central bank — which shifts the currency demand factor — has strong effects on the exchange rate but weak effects on bond yields. This follows directly from the variance decomposition: the exchange rate loads heavily on the currency demand factor and bond yields load lightly on it. The asymmetry mirrors the QE result in reverse: QE shifts bond demand factors, which load heavily onto bond yields and lightly onto the exchange rate; FX intervention shifts the currency demand factor, which loads heavily onto the exchange rate and lightly onto bond yields. The model thus delivers a sharp policy instrument separation between QE/QT (primarily a bond yield tool) and FX intervention (primarily an exchange-rate tool), with each having spillovers in the other dimension that are quantitatively weaker.

Q8. How is the relationship between currency risk premia and bond risk premia captured, and what empirical regularities does the model match?

The model’s risk premia are linked through the shared arbitrageur portfolio: the price of each risk factor is proportional to the covariance between that factor and the arbitrageur’s overall portfolio return, so a shock that changes arbitrageurs’ currency positions also changes the compensation required for bond positions, and vice versa. The estimated model is reported to match closely the violations of UIP (CCT profitability) and EH (BCT profitability) documented in the literature, and the ways in which these violations are connected — including findings that yield-curve slope differentials predict CCT profitability, and that CCT profitability declines when carried out with long-maturity rather than short-maturity bonds. These matches are described as consistent with the empirical regularities, not structural identification of the underlying causes.

Q9. What is the role of segmented versus global arbitrage, and why does the distinction matter?

The paper considers both cases. Under segmented arbitrage, separate arbitrageur pools operate in the currency market (risk aversion ae), home bond market (aH), and foreign bond market (aF); first-order conditions for each pool reflect only their own portfolio risk, so the prices of risk factors differ across markets. Under global arbitrage, a single pool of arbitrageurs trades all assets, and their shared portfolio means the price of each risk factor is the same across currency and bond markets — this is the mechanism through which bond demand shocks in one country propagate through the currency market to bond yields in the other. Global arbitrage is the primary specification; segmented arbitrage serves as a benchmark to isolate the hedging-based transmission channel that requires global positions.
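The difference between the two cases can be illustrated with the mean-variance first-order condition, under which required risk premia equal risk aversion times the return covariance matrix times positions. The sketch below is a stylized three-asset version (currency, one home bond, one foreign bond); collapsing each bond market to a single asset, the function names, and the covariance numbers are all illustrative assumptions.

```python
import numpy as np

def global_premia(a, cov, positions):
    """One arbitrageur pool holds everything: premia = a * cov @ positions,
    so a position in one asset raises the premium on every correlated asset."""
    return a * np.asarray(cov, dtype=float) @ np.asarray(positions, dtype=float)

def segmented_premia(a_by_market, cov, positions):
    """Separate pools, one asset each: only own-variance risk is priced."""
    cov = np.asarray(cov, dtype=float)
    return (np.asarray(a_by_market, dtype=float) * np.diag(cov)
            * np.asarray(positions, dtype=float))

# Asset order: currency, home bond, foreign bond (covariances are made up).
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.02, 0.005],
       [0.00, 0.005, 0.03]]
positions = [1.0, 0.0, 0.0]  # arbitrageurs long the currency only
print(global_premia(2.0, cov, positions))
print(segmented_premia([2.0, 2.0, 2.0], cov, positions))
```

Under global arbitrage the currency position raises the required premium on the correlated home bond; under segmentation it does not, which is why cross-market transmission needs global positions. Setting `a = 0` returns zero premia, the frictionless benchmark.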

Q10. How does the model relate to and extend predecessor frameworks?

The model extends Vayanos and Vila (2021) — a closed-economy preferred-habitat yield curve model — to two countries by adding a currency market and a second country’s bond market, with arbitrageurs who are global rather than country-specific. In the currency dimension, the attenuation of UIP deviations parallels Gabaix and Maggiori (2015), which models exchange-rate dynamics with financially constrained intermediaries but without a yield curve. The two-country structure allows the paper to simultaneously study term premia (EH violations), exchange rate dynamics (UIP violations), and their connection, and to quantify the effects of QE, conventional monetary policy, and FX intervention within a single internally consistent framework estimated on US-Eurozone data.

Key Concepts

Preferred-habitat demand: A bond investor’s demand for bonds of a specific country and maturity that does not arise from portfolio optimization over the full menu of available assets, but rather from institutional constraints or liability-matching motives (e.g., pension funds matching long-dated domestic liabilities). In the model, preferred-habitat demand is price-elastic with slope αj(τ) and shifts with a country-specific bond demand factor βjt; the elastic component means that as bond prices rise, clientele demand falls, so arbitrageurs must absorb the residual supply and require a risk premium to do so.

Global arbitrageur: An investor who trades the currency and bonds of both countries simultaneously, bridging the segmented currency and bond markets. In the model, global arbitrageurs maximize mean-variance utility over instantaneous wealth changes; their shared portfolio across all asset classes is the mechanism through which shocks in one market create hedging-driven demand in other markets, generating the cross-market linkages in risk premia and monetary policy transmission.

Currency carry trade (CCT): A strategy that borrows at the home short rate and invests at the foreign short rate, profiting when the foreign currency does not depreciate enough to offset the interest rate differential. Under UIP, the CCT earns zero expected return; the model generates a positive expected CCT return (a currency risk premium) when arbitrageurs are risk-averse and currency traders’ demand is price-elastic. In the paper’s notation, the CCT return is de_t/e_t + (iFt − iHt)dt.

Bond carry trade (BCT): A strategy that borrows at the short rate and invests in long-maturity bonds of the same country, profiting when long yields fall or when expected short rates are below current long yields. Under EH, the BCT earns zero expected return; the model generates a positive expected BCT return — a term premium — when arbitrageurs are risk-averse and bond clientele demand is price-elastic.

Exchange-rate disconnect: The empirical and model finding that movements in the exchange rate are nearly uncorrelated with movements in long-maturity bond yields, even though both are endogenously determined in the same model. The disconnect arises in the estimated model because long bond yields are driven primarily by bond demand factors, while the exchange rate is driven primarily by the currency demand factor, and the two sets of factors move the exchange rate in offsetting directions so that their net effect on bond yield-exchange rate covariance is approximately zero.

Attenuation effect: The dampening of monetary policy transmission to asset prices caused by the need to compensate risk-averse arbitrageurs for the increased risk they bear when accommodating the policy-induced excess demand. In the currency market, a home short-rate cut causes the CCT’s expected return to rise (arbitrageurs must be paid more to hold foreign currency), which means the foreign currency appreciates less than UIP predicts. In the bond market, a short-rate cut causes the BCT’s expected return to rise (term premia increase), so long yields fall less than EH predicts.

Propagation effect: The international transmission of a domestic monetary policy shock to foreign asset prices through arbitrageur hedging. A home short-rate cut causes arbitrageurs to increase their foreign-currency position (CCT); this exposes them to the risk of foreign short-rate declines (which depreciate the foreign currency), and long-maturity foreign bonds hedge this risk; so arbitrageurs increase foreign bond demand, depressing foreign yields. This channel is absent from the standard model where risk premia are constant.

Log-affine equilibrium: The conjectured and verified form of the equilibrium in which the log exchange rate and log bond prices are affine (linear plus constant) functions of the five state factors (iHt, iFt, γt, βHt, βFt). This structure allows the model to be solved as a system of ordinary differential equations and scalar equations, and enables closed-form or numerically tractable characterization of risk premia, variance decompositions, and policy effects.

Bond demand factor (βjt): A stochastic variable that shifts the intercept of bond clientele demand in country j, independent of maturity τ. A positive shock to βjt increases desired bond holdings of country-j clienteles at any given price, forcing arbitrageurs to shed country-j bonds, which lowers bond yields. The factor follows a mean-reverting process and in the estimated model is found to be the primary driver of long-maturity yields in both countries.

Currency demand factor (γt): A stochastic variable that shifts the intercept of currency traders’ demand for foreign assets, independent of the exchange rate level. A positive shock to γt increases desired foreign asset holdings of currency traders, so arbitrageurs reduce their foreign-currency position, which affects their bond positions through hedging. In the estimated model, γt is the primary driver of exchange-rate movements.
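As a rough numerical illustration of the currency carry trade defined above, the following Python sketch evaluates a discretized version of the CCT return (exchange-rate change plus interest differential) over one short period. It assumes that the exchange rate is quoted as the home-currency price of foreign currency, which is how the return formula reads; the function name and all input values are hypothetical and are not taken from the paper.

```python
def cct_return(e_start, e_end, i_foreign, i_home, dt):
    """Discretized CCT return: exchange-rate change plus interest differential.

    e_start, e_end: home-currency price of one unit of foreign currency
    (assumed sign convention). Rates are annualized; dt is in years.
    """
    fx_change = (e_end - e_start) / e_start   # stands in for de_t / e_t
    carry = (i_foreign - i_home) * dt         # stands in for (iFt - iHt) dt
    return fx_change + carry


# Hypothetical inputs: one-month horizon, 1% home rate, 4% foreign rate.
dt = 1.0 / 12.0
i_home, i_foreign = 0.01, 0.04
carry = (i_foreign - i_home) * dt

# UIP benchmark: the foreign currency depreciates by exactly the carry.
e_uip = 1.0 * (1.0 - carry)
uip_return = cct_return(1.0, e_uip, i_foreign, i_home, dt)

# Half the UIP-implied depreciation: the trade keeps part of the carry.
partial_return = cct_return(1.0, 1.0 - 0.5 * carry, i_foreign, i_home, dt)

print(f"UIP benchmark return: {uip_return:.6f}")
print(f"Return with half the UIP depreciation: {partial_return:.6f}")
```

The bond carry trade can be illustrated the same way by replacing the exchange-rate change with the price change of a long-maturity bond and the interest differential with the short-rate funding cost.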

A Theory of How Workers Keep up with Inflation


A Theory of Supply Function Choice and Aggregate Supply


Research Question

Modern macroeconomic models of aggregate supply universally restrict firms to price-setting — committing to a price and supplying whatever quantity the market demands. Flynn, Nikolakoudis, and Sastry ask: what happens if instead firms choose any supply function, a mapping that describes the price charged at each quantity of production? The paper develops the first general-equilibrium, macroeconomic theory of supply function choice and characterizes its implications for the slope of aggregate supply, monetary non-neutrality, and time-varying inflation-output tradeoffs.

Methodology

The paper proceeds in two stages. In partial equilibrium, a single monopolistic firm with constant-returns-to-scale technology and constant-elasticity demand faces log-normal uncertainty about demand shifters, the aggregate price level, real marginal costs, and the stochastic discount factor. The firm chooses a non-parametric supply function — any implicit mapping f(p,q) = 0 — to maximize expected real profits. The paper shows that supply function choice is equivalent to conditioning price-quantity decisions on the realized nominal demand state z = ΨP^η. The authors prove (Theorem 1) that the optimal supply function is endogenously log-linear: log p = α₀ + α₁ log q, where the inverse supply elasticity α₁ is characterized in closed form.

In general equilibrium, the authors embed supply function choice in an otherwise standard monetary business cycle model (in the tradition of Woodford 2003a and Hellwig and Venkateswaran 2009), featuring a representative household demanding differentiated goods, a money supply following a random walk with time-varying volatility, and idiosyncratic shocks to productivity, wages, and demand. They guess and verify a log-linear equilibrium and derive a scalar fixed-point equation for the equilibrium supply elasticity (Theorem 3).

For quantification, the authors calibrate structural parameters (η = 8 from Hottman et al. 2016 scanner data; γ = 0.11 from Gagliardone et al. 2023 Belgian firm data; κ^M = 0.29 calibrated to match an average aggregate supply slope of 0.11 from Hazell et al. 2022) and estimate time-varying uncertainty via a GARCH model of quarterly US data on GDP growth, inflation, and real marginal cost growth from 1960 Q1 to 2024 Q4. Idiosyncratic demand uncertainty is set proportional to aggregate TFP uncertainty using the proportionality factor R = 6.5 from Bloom et al. (2018).

Main Findings

Optimal supply function. The optimal firm-level supply function is log-linear with inverse supply elasticity α₁ determined by the relative variances and covariances of demand, the price level, and real marginal costs. Three comparative statics drive the macroeconomic results: (1) higher idiosyncratic demand uncertainty (σ²_Ψ) flattens the supply function toward price-setting, because a fixed price insulates profit markups against demand variation; (2) higher price-level uncertainty (σ²_P) steepens the supply function toward quantity-setting, because setting a fixed quantity allows relative prices to adjust; (3) lower price elasticity of demand (less elastic demand, more market power) flattens the supply function, conditional on a sufficient condition that holds in US data whenever η > 2.5.

From micro supply to aggregate supply. With fixed log-linear supply functions, the economy has a unique log-linear equilibrium with an AD/AS representation (Theorem 2). The slope of aggregate supply ε^S_t depends on ω₁ (the transformed inverse supply elasticity), κ^M (firms’ signal precision about the money supply), γ (income effects), and η (demand elasticity). Aggregate supply is maximally elastic — money is as non-neutral as possible — if and only if firms are pure price-setters (ω₁ = 0). Aggregate supply is perfectly inelastic — money is neutral — if and only if firms are quantity-setters (ω₁ = 1/η). A lower elasticity of demand flattens aggregate supply through general equilibrium strategic complementarities, a prediction opposite to the New Keynesian model.

Equilibrium supply slope and its determinants. The equilibrium ω₁ solves a fixed-point equation (Theorem 3) in which macroeconomic uncertainty shapes firms’ optimal supply functions, which in turn shape macroeconomic dynamics. Under the special case of balanced strategic interactions (ηγ = 1), the slope of aggregate supply has a clean closed form depending only on the ratio ρ_t = σ_{ϑ,t}/σ^M_{t|s} (idiosyncratic demand uncertainty relative to posterior monetary uncertainty). Critically, the equilibrium supply slope is invariant to the overall level of uncertainty — only the composition of uncertainty matters (Proposition 3). Even vanishingly small uncertainty can generate any level of monetary non-neutrality depending on uncertainty composition.

Quantitative results: United States over time. The model’s estimated slope of aggregate supply has varied sharply since 1960. The slope is relatively flat and stable during the 1960s, the Great Moderation (1991–2007), and the Great Recession and its recovery (2008–2019). It spikes during the 1970s oil crisis and the post-Covid inflation of the 2020s. Compared to Ball and Mazumder (2011), the model qualitatively matches the steepening during 1973–1984 (+58% in the model vs. +175% in the data) and the subsequent flattening during 1985–2007 (−25% in the model vs. −32% in the data). Compared to Cerrato and Gitti (2022), the model accounts for approximately 4/5 of the steepening between the pre-Covid and post-Covid periods (+112% in the model vs. +145% in the data). In the comparison with Hazell et al. (2022), the model accounts for approximately 1/2 of the estimated flattening from 1978–1990 to 1991–2018.

Quantitative results: Cross-country. Using OECD annual data from 1960–2019, the model’s predicted slope of aggregate supply is not positively correlated with average inflation across countries. For the countries with the highest inflation rates, the model predicts a negative slope of aggregate supply, driven by a very high correlation between price-level uncertainty and real marginal cost uncertainty. The model-predicted slope correlates positively with the reduced-form regression coefficient of inflation on real output growth across countries, even after instrumenting for demand. This predictive power goes beyond what the level or volatility of inflation alone can explain.

Scope Conditions

All results are derived under log-normality of uncertainty, which ensures the log-linear structure of optimal supply functions. The quantification relies on GARCH-estimated uncertainty and treats idiosyncratic demand uncertainty as proportional to aggregate TFP uncertainty. The model abstracts from microeconomic nominal price stickiness (though the authors show in Appendix B that Calvo-style sticky prices can be incorporated). The baseline model requires firms’ beliefs to be consistent with the equilibrium law of motion (rational expectations). The scalar fixed-point equation can in principle admit multiple solutions, but there are at most five log-linear equilibria (Proposition 2).

In depth

Q1. What is wrong with assuming price-setting or quantity-setting as a primitive restriction on firm behavior?

A: Price-setting and quantity-setting are two isolated, generically non-optimal points in the larger space of supply functions. Corollary 2 establishes that price-setting is optimal only in the limit as idiosyncratic demand uncertainty becomes unboundedly large (σ²_Ψ → ∞), while quantity-setting is optimal only in the limit as price-level uncertainty becomes unboundedly large (σ²_P → ∞). In a macroeconomic environment where both sources of uncertainty are present in comparable magnitudes, both extreme policies perform poorly and the analyst who imposes either inadvertently restricts firms’ strategies in ways that have large macroeconomic consequences — for example, making money neutral under quantity-setting even when information frictions are present, or making the slope of aggregate supply invariant to demand elasticity under price-setting.

Q2. What is the formal equivalence between supply function choice and conditioning on realized demand?

A: The firm’s problem of choosing a supply function f(p,q) = 0 ex ante is mathematically equivalent to choosing a price-quantity plan (p(z), q(z)) indexed by the nominal demand state z = ΨP^η (Equation 4 in the paper). After the supply function is set, the firm produces where the supply function intersects the demand curve, which pins down the market-clearing outcome as a function of z. Choosing the supply function ex ante is therefore the same as choosing z-contingent prices and quantities without any parametric constraint. This links the model to rational expectations equilibrium in the spirit of Lucas (1972): firms use the demand for their product as a noisy signal to update beliefs and set their optimal price and quantity in response to realized demand conditions.

Q3. How is the optimal inverse supply elasticity α₁ derived, and what is the 2SLS interpretation?

A: Because the optimal supply function allows the firm to set a z-contingent price, the first-order condition at each realized demand state z = t equates expected marginal revenue and expected marginal cost (Equation 7). Under log-normality, this yields a log-linear relationship log p = α₀ + α₁ log q. The elasticity α₁ equals the ratio (d log p / d log z) / (d log q / d log z) = Cov[log z, log p**] / Cov[log z, log q**], where p** and q** are the full-information optimal price and quantity (Equation 9). This is formally equivalent to a 2SLS regression: the firm estimates how its optimal price should change with its optimal quantity, using the nominal demand state z as an instrument for the optimal quantity. The supply function is steep if nominal demand strongly predicts movements in the full-information optimal price (large reduced-form coefficient); it is flat if nominal demand primarily predicts movements in the full-information optimal quantity (large first-stage coefficient).
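The equivalence between the covariance ratio and a 2SLS coefficient can be checked numerically. The sketch below is a toy simulation under assumptions that are not from the paper: the three log shocks are independent and mean-zero normal, the full-information price is a constant markup over nominal marginal cost (so log p** = log P + log mc up to a constant), and demand is constant-elasticity (log q** = log z − η log p**). Under these toy assumptions the population ratio works out to ησ²_P/σ²_Ψ; the paper’s general closed form also involves covariance terms that are set to zero here. All function names and standard deviations are hypothetical.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def simulate_logs(n, eta, sd_psi, sd_price, sd_mc, seed=0):
    """Draw (log z, log p**, log q**) under the toy assumptions stated above."""
    rng = random.Random(seed)
    log_z, log_p, log_q = [], [], []
    for _ in range(n):
        psi = rng.gauss(0.0, sd_psi)       # log idiosyncratic demand shifter
        price = rng.gauss(0.0, sd_price)   # log aggregate price level
        mc = rng.gauss(0.0, sd_mc)         # log real marginal cost
        z = psi + eta * price              # log z = log Psi + eta * log P
        p_star = price + mc                # markup constant dropped
        q_star = z - eta * p_star          # constant-elasticity demand
        log_z.append(z)
        log_p.append(p_star)
        log_q.append(q_star)
    return log_z, log_p, log_q


def covariance_ratio(log_z, log_p, log_q):
    """alpha_1 as Cov[log z, log p**] / Cov[log z, log q**]."""
    return cov(log_z, log_p) / cov(log_z, log_q)


def two_stage_least_squares(log_z, log_p, log_q):
    """Regress log q** on log z, then log p** on the fitted values."""
    first_stage = cov(log_z, log_q) / cov(log_z, log_z)
    mean_z = sum(log_z) / len(log_z)
    # Centered fitted values; the intercept does not affect the slope.
    fitted_q = [first_stage * (z - mean_z) for z in log_z]
    return cov(fitted_q, log_p) / cov(fitted_q, fitted_q)


eta = 8.0
sd_psi, sd_price, sd_mc = 0.2, 0.02, 0.05   # hypothetical values
log_z, log_p, log_q = simulate_logs(20000, eta, sd_psi, sd_price, sd_mc)

alpha_ratio = covariance_ratio(log_z, log_p, log_q)
alpha_2sls = two_stage_least_squares(log_z, log_p, log_q)
alpha_toy_population = eta * sd_price**2 / sd_psi**2

print(f"covariance ratio : {alpha_ratio:.4f}")
print(f"2SLS coefficient : {alpha_2sls:.4f}")
print(f"toy population   : {alpha_toy_population:.4f}")
```

The two sample estimates coincide up to floating-point error because the second-stage slope is algebraically the reduced-form covariance divided by the first-stage covariance. Raising sd_price steepens the implied supply function and raising sd_psi flattens it, in line with the comparative statics described in the summary.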

Q4. How do uncertainty and demand elasticity shape the firm’s optimal supply function in partial equilibrium?

A: Three key comparative statics apply when the supply function is upward-sloping. (1) Greater price-level uncertainty (higher σ²_P) raises α₁, steepening the supply function toward quantity-setting: not knowing competitors’ prices makes aggressive dynamic pricing attractive because it allows the firm’s relative price to adjust ex post. (2) Greater idiosyncratic demand uncertainty (higher σ²_Ψ) lowers α₁, flattening the supply function toward price-setting: demand uncertainty favors a fixed price that keeps the markup over real marginal costs constant and accommodates demand through quantity variation. (3) A lower price elasticity of demand (more market power, lower η) lowers α₁: more market power reduces the cost of setting the “wrong” price, reducing the benefit of dynamic pricing. Corollary 1 provides a sufficient condition (σ_{M,P} ≥ 0, 2ησ_{M,P} + σ_{M,Ψ} ≥ σ_{P,Ψ}, and α₁ ≥ 0) under which ∂α₁/∂η > 0, implying that greater market power flattens supply; the paper verifies that this condition holds in US data whenever η > 2.5.

Q5. How does the model generate an aggregate supply and demand representation from supply function choices?

A: Theorem 2 establishes that, given any fixed log-linear supply functions with slope ω₁,t, there is a unique log-linear equilibrium. In this equilibrium, the price level and real output are jointly determined by an aggregate demand curve — shifting with the money supply but not productivity — and an aggregate supply curve — shifting with productivity but not the money supply. The inverse elasticity of aggregate supply is ε^S_t = γ(κ^M_t + ω₁,t(η − 1/γ)(1 − κ^M_t)) / ((1 − ω₁,t η)(1 − κ^M_t)), derived from aggregating firm-level pricing decisions. The slope depends on ω₁,t (micro supply), κ^M_t (signal precision about money), γ (income effects), and η (demand elasticity). An aggregate demand shock of ∆ log M raises the price level by ε^S_t ∆ log M / (ε^D_t + ε^S_t) and raises real output by ∆ log M / (ε^D_t + ε^S_t), where ε^D_t = γ is the inverse elasticity of aggregate demand.
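The aggregate supply slope and the split of a demand shock between prices and output can be evaluated directly from the expressions quoted above. The Python sketch below does so using the calibrated values η = 8, γ = 0.11, and κ^M = 0.29 from the summary; the grid of ω₁ values and the function names are hypothetical choices for illustration.

```python
def as_slope(omega1, kappa_m, gamma, eta):
    """Inverse elasticity of aggregate supply, from the expression quoted above."""
    denominator = (1.0 - omega1 * eta) * (1.0 - kappa_m)
    if denominator == 0.0:
        # omega1 = 1/eta is the quantity-setting limit, where supply is vertical.
        raise ValueError("slope is unbounded at omega1 = 1/eta")
    numerator = gamma * (kappa_m + omega1 * (eta - 1.0 / gamma) * (1.0 - kappa_m))
    return numerator / denominator


def demand_shock_split(d_log_m, eps_s, eps_d):
    """Responses of the log price level and log real output to a money shock."""
    total = eps_d + eps_s
    d_log_p = eps_s * d_log_m / total
    d_log_y = d_log_m / total
    return d_log_p, d_log_y


eta, gamma, kappa_m = 8.0, 0.11, 0.29   # calibration reported in the summary
eps_d = gamma                           # inverse elasticity of aggregate demand

for omega1 in (0.0, 0.05, 0.10):        # hypothetical grid, below 1/eta = 0.125
    eps_s = as_slope(omega1, kappa_m, gamma, eta)
    d_log_p, d_log_y = demand_shock_split(0.01, eps_s, eps_d)
    print(f"omega1={omega1:.2f}  eps_S={eps_s:.4f}  "
          f"dlogP={d_log_p:.5f}  dlogY={d_log_y:.5f}")
```

As ω₁ rises from the price-setting value of 0 toward 1/η, the slope increases, so a given money shock moves prices more and output less.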

Q6. What is the equilibrium fixed-point equation and why can there be multiple equilibria?

A: Theorem 3 shows that the equilibrium transformed inverse supply elasticity ω₁,t solves a quintic polynomial fixed-point equation (Equation 29) that depends on the variances of idiosyncratic demand shocks (σ²_ϑ,t), posterior uncertainty about productivity (σ^A_{t|s}), and posterior uncertainty about money (σ^M_{t|s}). Multiple equilibria can arise because of a self-reinforcing feedback: if firms set steep supply functions, prices respond more to demand, which raises price-level volatility, which in turn makes quantity-setting more attractive, further steepening supply functions. Proposition 2 establishes existence of at least one log-linear equilibrium and at most five. Idiosyncratic productivity and factor price uncertainty do not enter the fixed-point equation because the variance of real marginal costs per se does not affect optimal supply function choice — only the covariance of marginal costs with demand and the price level matters.

Q7. What determines the slope of aggregate supply in the special case of balanced strategic interactions (ηγ = 1)?

A: Under ηγ = 1 — where strategic complementarities from relative price effects exactly offset strategic substitutabilities from aggregate consumption effects — the slope of aggregate supply has the closed-form expression ε^S_t = γ(κ^M_t / (1 − κ^M_t))(1 + 1/(γ²ρ²_t κ^M_t)) where ρ_t = σ_{ϑ,t}/σ^M_{t|s} is the ratio of idiosyncratic demand uncertainty to posterior monetary uncertainty (Corollary 5). Aggregate productivity uncertainty drops out entirely because firms do not use the demand state to infer aggregate productivity when strategic interactions are balanced. As ρ_t → ∞ (idiosyncratic demand dominates), the slope converges to the price-setting value γκ^M_t/(1 − κ^M_t). As ρ_t → 0 (monetary uncertainty dominates), the slope goes to infinity, corresponding to quantity-setting and monetary neutrality.
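The two limits of the closed form can be seen by evaluating it on a grid of ρ_t. The sketch below is illustrative only: it sets γ = 1/8 so that ηγ = 1 holds with η = 8 (the calibration reported in the summary, γ = 0.11, does not satisfy this condition exactly), keeps κ^M = 0.29, and uses a hypothetical grid for ρ_t.

```python
def as_slope_balanced(rho, kappa_m, gamma):
    """Closed-form slope of aggregate supply when eta * gamma = 1."""
    price_setting_slope = gamma * kappa_m / (1.0 - kappa_m)
    return price_setting_slope * (1.0 + 1.0 / (gamma**2 * rho**2 * kappa_m))


gamma = 1.0 / 8.0    # assumption: chosen so that eta * gamma = 1 with eta = 8
kappa_m = 0.29
price_setting_slope = gamma * kappa_m / (1.0 - kappa_m)

for rho in (0.1, 1.0, 10.0, 1000.0):   # hypothetical grid
    slope = as_slope_balanced(rho, kappa_m, gamma)
    print(f"rho={rho:8.1f}  eps_S={slope:12.4f}  "
          f"ratio to price-setting={slope / price_setting_slope:10.4f}")
```

The slope falls monotonically in ρ_t and approaches the price-setting value from above, while small values of ρ_t produce very steep aggregate supply.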

Q8. What is the role of total uncertainty versus the composition of uncertainty?

A: Proposition 3 establishes a striking invariance result: if all standard deviations in the economy are scaled by a common factor λ > 0, the equilibrium supply elasticity and slope of aggregate supply are unchanged. The equilibrium outcomes depend only on the ratios of different sources of uncertainty, not their absolute magnitudes. This sharply distinguishes the model from menu-cost models, in which any increase in uncertainty unambiguously raises the benefit of price adjustment and steepens aggregate supply. A corollary is that idiosyncratic productivity uncertainty has no effect on the slope of aggregate supply in the supply function model, whereas it would steepen aggregate supply in Golosov-Lucas menu-cost models. Moreover, even a vanishingly small level of uncertainty can generate any level of monetary non-neutrality, because the equilibrium supply elasticity is discontinuous at zero uncertainty (ε^S_t (0) = {∞} while ε^S_t (λ) is bounded for any λ > 0).

Q9. How does market power (demand elasticity) affect the slope of aggregate supply, and why does this differ from the New Keynesian prediction?

A: In the supply function model, a lower elasticity of demand (more market power, lower η) flattens aggregate supply by weakening general-equilibrium strategic complementarities. When other firms raise their prices following a demand shock, a given firm faces higher relative demand; the strength of this effect is parameterized by η. With supply functions (ω₁,t ≠ 0), this increase in relative demand generates an additional price response, so a higher η steepens aggregate supply. Crucially, this effect is exactly zero if and only if firms are pure price-setters (ω₁,t = 0), so this channel from market power to aggregate supply is absent from price-setting models. The New Keynesian prediction runs the other way: in Woodford (2003b) with decreasing returns to scale, a higher elasticity of demand (less market power) flattens the Phillips curve, because more elastic demand amplifies the quantity response to a firm’s own price change and thereby the movement in its marginal cost, which dampens its desired price adjustment.

Q10. How does the model rationalize the steepening of aggregate supply in the 1970s and 2020s?

A: The GARCH estimates of macroeconomic uncertainty show abrupt increases in inflation uncertainty during the 1970s oil crisis period and after the Covid-19 shock in the 2020s. In the model, a spike in aggregate price-level uncertainty (σ²_P increases) causes firms to choose steeper supply functions — closer to quantity-setting — endogenously. This steepens the aggregate supply curve so that demand shocks have larger nominal effects and smaller real effects. Quantitatively, relative to the base period, the model predicts a steepening of +58% during 1973–1984 and +112% during 2021–2023. The empirical comparisons are +175% (Ball and Mazumder 2011, 1973–1984) and +145% (Cerrato and Gitti 2022, 2021–2023). The model thus accounts for the direction and rough order of magnitude of both episodes but not their full extent. The quarterly time series of model-implied ε^S_t has a correlation of 0.93 with one-quarter-ahead inflation uncertainty and 0.62 with the quarterly level of inflation.

Q11. How does the cross-country evidence help distinguish the model from alternatives based on the level of inflation?

A: The cross-country analysis uses OECD data from 1960–2019 to construct country-level model-implied slopes of aggregate supply using the same structural parameters (η = 8, γ = 0.11, κ^M = 0.29) and country-specific GARCH uncertainty estimates from a one-lag VAR. The key finding is that the model-implied slope is not positively predicted by average inflation across countries (Panel A of Figure 5) — in fact, for the highest-inflation countries such as Chile, Israel, and Mexico, the model predicts a negative slope of aggregate supply, reflecting high correlation between price-level uncertainty and real marginal cost uncertainty. By contrast, the model-implied slope correlates positively with the reduced-form regression coefficient of inflation on real output growth (Panel B), and this positive correlation is also found using a model-derived instrument isolating exogenous monetary variation. This implies that relative uncertainties, not the mean or volatility of inflation per se, help account for cross-country heterogeneity in inflation-output tradeoffs beyond the predictions of Ball et al. (1988).

Q12. How can supply functions be integrated into larger linearized macroeconomic models?

A: Section 4.5 provides a general framework. For any model in which firms face a demand function q_it = d(p_it, z^D_it) and a value function V(p_it, q_it, z^V_it), log-linearization around a deterministic steady state yields an optimal pricing rule ˆp_it = ω₁,it ˆz^D_it (Equation 35) for some scalar ω₁,it determined by the covariance structure of the linearized model. The coefficients ω₁,it enter the standard representation of aggregate dynamics (McKay and Wolf 2023) through the ideal price index ˆP_t = ∫₀¹ ˆp_it di. The additional “rational expectations” restriction is that ω₁,it must be consistent with the equilibrium law of motion for prices. The paper argues that supply functions can thereby be embedded in the broad class of linearized DSGE models used for quantitative work, including models with decreasing returns, monopsony, endogenous markups, sticky prices, investment, and quality choice.

Q13. What are the implications of supply function choice for monetary policy discretion?

A: The model implies a thorny tradeoff for monetary policymakers. If a central bank wishes to maintain discretion (the ability to surprise private agents), it increases firms’ uncertainty about the money supply (higher σ²_M). Discretion is useful only to the extent that monetary surprises have real effects, which requires a relatively flat aggregate supply curve. However, under balanced strategic interactions (ηγ = 1), greater posterior monetary uncertainty (σ^M_{t|s}) lowers the ratio ρ_t = σ_{ϑ,t}/σ^M_{t|s}, and a lower ρ_t raises ε^S_t: firms endogenously set steeper supply functions, closer to quantity-setting, so the aggregate supply curve steepens and monetary surprises move prices more and output less. The paper therefore concludes that maintaining monetary policy discretion may be, at least partially, self-defeating.

Key Concepts

Inverse supply elasticity (α₁): The percentage by which a firm increases its price in response to a one percent increase in production, characterizing the slope of the firm’s optimal supply function. The optimal supply function is endogenously log-linear, and α₁ is determined by the ratio of covariances relating the nominal demand state to the firm’s optimal price vs. optimal quantity under full information, which is formally equivalent to a 2SLS coefficient using nominal demand as an instrument.

Supply function: A mapping f(p, q) = 0 describing the locus of prices and quantities a firm commits to, as an implicit function over price-quantity pairs. Unlike price-setting (f depends only on p) or quantity-setting (f depends only on q), the general supply function allows prices to vary with realized demand, nesting both polar cases as limits of extreme uncertainty.

Nominal demand state (z): The composite variable z = ΨP^η that indexes the demand curve. Firms observing their own output market clearing can use z as a noisy signal for inference about the aggregate price level, real marginal costs, and monetary conditions. The supply function is formally equivalent to conditioning price-quantity choices on z.

Slope of aggregate supply (ε^S): The inverse elasticity of the aggregate supply curve in the AD/AS representation, measuring the relative within-period response of the price level versus real output to an aggregate demand shock. It depends on the slope of firm-level supply functions (ω₁), interacted with the precision of firms’ information about the money supply (κ^M), income effects (γ), and the demand elasticity (η).

Transformed inverse supply elasticity (ω₁): The reparameterization ω₁ = α₁/(1 + ηα₁), where α₁ is the firm-level inverse supply elasticity and η is the price elasticity of demand. ω₁ = 0 corresponds to price-setting; ω₁ = 1/η corresponds to quantity-setting. The equilibrium value of ω₁ solves a fixed-point equation that maps macroeconomic uncertainty back into firms’ optimal supply function choices.

Balanced strategic interactions (ηγ = 1): A parametric special case in which strategic complementarities from aggregate demand externalities (parameterized by η) exactly offset strategic substitutabilities from wage pressure (parameterized by 1/γ). Under this condition, the slope of aggregate supply has a closed-form solution that depends only on the relative uncertainty about idiosyncratic demand vs. the money supply.

Relative uncertainty sufficient statistic (ρ_t): The ratio σ_{ϑ,t} / σ^M_{t|s}, measuring firms’ uncertainty about idiosyncratic demand shocks relative to posterior uncertainty about the money supply. Under balanced strategic interactions (ηγ = 1), ρ_t is the single sufficient statistic determining the equilibrium slope of aggregate supply. As ρ_t → ∞ (idiosyncratic demand uncertainty dominates), firms converge to price-setting and aggregate supply flattens; as ρ_t → 0 (monetary uncertainty dominates), firms converge to quantity-setting and aggregate supply becomes vertical.

Invariance to total uncertainty: A key property of the model: the equilibrium slope of aggregate supply is invariant to the overall scale of uncertainty (Proposition 3). Only the composition of uncertainty across idiosyncratic vs. aggregate sources and demand vs. productivity shocks matters. This distinguishes the model from menu-cost models, in which any increase in uncertainty raises the benefit of price flexibility and steepens aggregate supply regardless of uncertainty composition.
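The reparameterization ω₁ = α₁/(1 + ηα₁) defined above, together with its two polar cases, can be checked with a few lines of Python. The inverse mapping α₁ = ω₁/(1 − ηω₁) is obtained here by rearranging that definition; the function names and the intermediate value α₁ = 0.08 are hypothetical.

```python
def omega_from_alpha(alpha1, eta):
    """Transformed inverse supply elasticity: omega_1 = alpha_1 / (1 + eta * alpha_1)."""
    return alpha1 / (1.0 + eta * alpha1)


def alpha_from_omega(omega1, eta):
    """Inverse mapping, valid for omega_1 < 1/eta."""
    return omega1 / (1.0 - eta * omega1)


eta = 8.0
print(omega_from_alpha(0.0, eta))    # price-setting: flat supply function
print(omega_from_alpha(1e9, eta))    # very steep supply function: close to 1/eta
print(alpha_from_omega(omega_from_alpha(0.08, eta), eta))   # round trip
```

The mapping compresses the range of α₁ from [0, ∞) into [0, 1/η), which is why quantity-setting corresponds to the finite value ω₁ = 1/η.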

A traffic-jam theory of growth


Research Question. Finocchiaro and Weil ask whether financial development necessarily promotes long-run economic growth, or whether congestion externalities in R&D markets can offset — and even reverse — the growth benefits of easier credit access. The paper proposes that the empirical coexistence of expanding financial sectors and roughly constant per-capita GDP growth rates (approximately 2% annually in the United States over the last century) can be explained by the interplay of search frictions in two sequential markets: credit and innovation.

Methodology. The authors build a continuous-time endogenous growth model in which all growth is innovation-led. Firms must pass through four sequential stages — creation, fund-raising (Stage 0–1), R&D search (Stage 1–2), and high-productivity production (Stage 2–3) — before being exogenously destroyed. Both the credit market (firms searching for banks/venture capitalists) and the innovation market (firms searching for innovators after securing finance) are characterized by constant-returns-to-scale matching functions with endogenous market tightness. Nash bargaining determines the loan repayment, and free entry drives profits to zero in both markets. The model is then calibrated to annual U.S. data, with the risk-free rate r = 3.5%, separation rate s = 4%, symmetric bargaining power ω = 0.5, a productivity jump γ = 0.023 targeting a baseline growth rate of 2%, credit market duration for creditors just below one month and for firms slightly above one year (consistent with Wasmer and Weil, 2004), a two-year average patent approval time (USPTO 2020), 6% employment in finance (BLS 2020), and 0.5% employment in scientific R&D (BLS 2020).

Core Mechanism. The paper derives a “spillover function” Q(p,g) that links the equilibrium probability of finding an innovator (q) to the probability of finding a bank (p) and the growth rate (g). Because free entry holds profits at zero, easier credit — a higher p — forces q downward: if a firm spends less time raising funds, the innovation market becomes more congested (Qp < 0). This negative spillover between the two markets is the paper’s central traffic-jam analogy: relieving one bottleneck shifts congestion downstream.

Main Findings. The GG curve — the locus of (p, g) pairs consistent with equilibrium — is hump-shaped under the symmetric cost condition c = ωn (flow search cost for firms in credit markets equals the firm’s share of search costs in innovation markets). Growth is maximized when expected credit search time equals expected innovation search time (1/p = 1/q). Beyond that interior optimum, further financial deepening lowers the growth rate. The calibrated economy sits to the right of the hump in a flat region (p > q), so that reducing credit frictions alone has a marginally negative effect on growth: eliminating credit frictions lowers g from 2.000% to 1.997%, a reduction of 0.003 percentage points. Reducing innovation frictions alone raises g modestly to 2.071% (+0.071 pp). Only a simultaneous reduction of frictions in both markets raises g meaningfully, to 2.122% (+0.122 pp). The quantitative effects are deliberately small, consistent with the near-constancy of long-run growth despite financial deepening.

Scope Conditions. The non-monotonicity requires both markets to carry search frictions; when only one friction is present, financial development is unambiguously good for growth (Section 4.3). The hump-shape is established analytically in the symmetric case c = ωn; more generally, the paper shows (via back-of-envelope approximation) that the sign of the finance–growth link depends on whether c/ω is less than or greater than n. The quantitative insensitivity of growth to finance is amplified when the real interest rate is close to the growth rate and when potential growth γ is close to actual growth g: the elasticity of growth with respect to finance is proportional to (γ − g)/γ. Extensions to fixed bank entry costs (introducing a growth-to-finance feedback), endogenous innovator wages (Section 4.2), and frictionless innovation (Section 4.3) all confirm the benchmark conclusions under stated parameter conditions.

Q1: What is the paper’s central theoretical claim about the finance–growth nexus? The paper claims that the finance–growth relationship is non-monotonic: financial development raises growth when credit is scarce (left of the hump on the GG curve) but lowers it when credit is readily available (right of the hump), because easier financing draws more firms into the innovation market, tightening it and reducing the probability of finding an innovator. This congestion spillover from the credit market to the innovation market is the “traffic-jam” mechanism. The non-monotonicity vanishes if either market lacks search frictions.

Q2: What is the “spillover function” and why is it central to the model? The spillover function Q(p, g) is derived from the free-entry zero-profit condition for firms and expresses the innovation-matching probability q consistent with equilibrium for a given credit-matching probability p and growth rate g. It has Q_p < 0 (easier credit reduces q) and Q_g < 0 (faster growth reduces q), capturing the two-way negative interaction between the markets. It is central because all equilibrium and comparative-statics results flow through it: the GG curve is defined by substituting Q into the growth equation g = γ/(1 + s/p + s/Q(p, g)).
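To make the growth equation concrete, the following Python sketch evaluates g = γ/(1 + s/p + s/q) for a given pair of matching rates. It takes q as an input rather than solving for Q(p, g), whose functional form is not reproduced in this summary. The rates p = 1.0 and q = 0.5 are illustrative assumptions, not the paper's calibrated values.

```python
import math


def growth_rate(gamma: float, s: float, p: float, q: float) -> float:
    """Growth equation g = gamma / (1 + s/p + s/q).

    gamma: productivity jump, s: separation rate,
    p: credit-matching rate, q: innovation-matching rate.
    q is taken as given here; in the paper it is pinned down by Q(p, g).
    """
    return gamma / (1.0 + s / p + s / q)


GAMMA, S = 0.023, 0.04  # values quoted in the calibration paragraph

# Assumed rates: one year to find credit, two years to find an innovator.
g_frictional = growth_rate(GAMMA, S, p=1.0, q=0.5)

# Frictionless limit: both searches are instantaneous, so g equals gamma.
g_frictionless = growth_rate(GAMMA, S, p=math.inf, q=math.inf)

print(f"g with frictions:    {g_frictional:.4%}")
print(f"g without frictions: {g_frictionless:.4%}")
```

Holding q fixed, a higher p always raises g in this function. The non-monotonicity described above arises only because q itself falls as p rises, which this sketch deliberately leaves out.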

Q3: Under what condition is the GG curve hump-shaped, and what is the intuition? The GG curve is hump-shaped when the flow search cost for firms in the credit market c equals the firm’s share of innovation search costs ωn (Proposition 4). The intuition mirrors equalizing travel times across two congested roads: growth is maximized when expected credit search time (1/p) equals expected innovation search time (1/q). When credit is very tight (p small), a marginal increase in p raises the share of innovating firms faster than it tightens the innovation market, so growth rises. Once credit is abundant (p large), the congestion effect on innovation dominates and growth falls.

Q4: What does the benchmark calibration predict about the quantitative effect of financial development on growth? The benchmark calibration, targeting 2% annual U.S. growth, places the economy to the right of the hump in a flat region of the GG curve (p > q). Eliminating credit market frictions alone reduces the annual growth rate by 0.003 percentage points (from 2.000% to 1.997%) while lengthening expected innovation search time from 2 years to 3.4 years. This marginally negative effect arises because the economy is already well to the right of the optimum. The effects are small, consistent with the empirical near-constancy of growth alongside financial deepening.

Q5: What combination of policies does the model recommend for raising growth? Only a simultaneous reduction of frictions in both the credit and the innovation market raises the growth rate meaningfully, to 2.122% in the calibration (+0.122 pp relative to the 2.000% benchmark). Isolated improvements in credit markets have a marginally negative effect; isolated improvements in innovation markets have a marginally positive effect (+0.071 pp). The authors interpret this as supporting the OECD view that growth-stimulating policies should be designed as a system rather than as isolated pro-growth measures.

Q6: How does the elasticity of growth to finance depend on the gap between potential and actual growth? The authors show (referenced as available on request) that the elasticity of the growth rate with respect to financial factors is proportional to (γ − g)/γ, where γ is the potential growth rate (the productivity jump per innovation) and g is the actual equilibrium growth rate. When actual growth is close to potential — as in the benchmark calibration with γ = 0.023 and g = 2.000% — this factor is near zero, making growth nearly insensitive to changes in financial conditions. This provides a structural rationale for why empirically measured finance–growth effects are often small or nil in advanced economies.
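As a quick check on the magnitude involved, the sketch below evaluates the factor (γ − g)/γ at the rounded benchmark values quoted above. The function name is hypothetical, and the constant of proportionality linking this factor to the elasticity is not given in the summary, so only the factor itself is computed.

```python
def finance_elasticity_factor(gamma: float, g: float) -> float:
    """Return (gamma - g) / gamma, the factor the elasticity is proportional to."""
    return (gamma - g) / gamma


# Rounded benchmark values from the text: gamma = 0.023, g = 2.000%.
factor = finance_elasticity_factor(gamma=0.023, g=0.020)
print(f"(gamma - g) / gamma = {factor:.3f}")
```

With these rounded inputs the factor comes out at roughly 0.13, and it shrinks to zero as actual growth approaches potential growth.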

Q7: How does introducing fixed bank entry costs (Section 4.1) change the results? When banks bear a fixed licensing cost K (paid each time they enter the credit market), credit market tightness φ becomes an increasing function of (r − g)K: the annuity value of the fixed cost falls as growth rises, inducing more bank entry and reducing credit tightness. This introduces an upward-sloping PP curve (rather than a vertical one) and creates a direct positive feedback from growth to financial deepening. The qualitative conclusions on non-monotonicity are preserved: lower licensing costs shift the PP curve right and steepen it, with the equilibrium effect on growth remaining ambiguous due to the congestion spillover into the innovation market.

Q8: What happens to the spillover function when innovators are paid (Section 4.2)? When innovators receive a Nash-bargained wage, the equilibrium wage (Equation 30) is increasing in innovator productivity (πγ), innovation market tightness (θn), and the growth rate, and decreasing in total credit market search costs K(φ). Easier credit raises both expected revenues and innovator wages for the firm. For innovator bargaining power α sufficiently small (and always for α < 1, as shown in the Appendix), the revenue effect dominates so that Q_p < 0 is preserved: finance still creates bottlenecks in the innovation market, and the core non-monotonicity result carries through.

Q9: What does the model predict when only one market has search frictions? When only the credit market is frictional and innovators are found instantly after financing is secured, improving credit market efficiency unambiguously raises growth (Section 4.3, Figure 4). The GG curve becomes g = γ/(s/p + 1), which is strictly increasing in p, and the PP curve shifts in a way that unambiguously raises equilibrium growth. The paper uses this case to isolate the source of non-monotonicity: the negative spillover from credit ease to innovation congestion requires frictions in both markets to operate.

Q10: How does the paper relate to the empirical “too much finance” literature? The paper offers a distinct theoretical mechanism for the inverted-U relationship between credit and productivity growth documented by Arcand et al. (2015), Aghion et al. (2019), and Popov (2018), among others. While Aghion et al. (2019) explain the inverted-U through less-efficient incumbents surviving longer with better credit access, and Malamud and Zucchi (2019) emphasize how financing frictions differentially affect entrant and incumbent composition, Finocchiaro and Weil’s mechanism operates through congestion externalities in sequential search markets — a channel not previously formalized in the innovation-led growth literature.

Search frictions in credit markets: Firms searching for financiers (banks or venture capitalists) and banks searching for firms face a matching technology with constant returns to scale; credit market tightness φ is the ratio of firms searching for banks to banks searching for firms, and the matching probability p(φ) is strictly decreasing in φ. Free entry drives bank profits to zero, pinning equilibrium tightness.

Search frictions in innovation markets: After securing financing, firms search for innovators who can upgrade their productivity by factor γ; innovation market tightness θ is the ratio of firms searching for innovators to innovators, and the matching probability q(θ) is strictly decreasing in θ. The number of innovators is held fixed (analogously to fixed labor supply in Mortensen-Pissarides).

Spillover function Q(p, g): Derived from the free-entry zero-profit condition for firms, Q expresses the equilibrium innovation-matching probability q as a function of the credit-matching probability p and the growth rate g. It has Q_p < 0 and Q_g < 0, meaning easier credit and faster growth both reduce q by tightening the innovation market. It is the formal embodiment of the traffic-jam mechanism.

GG curve: The locus of (p, g) pairs consistent with the equilibrium growth equation g = γ/(1 + s/p + s/Q(p,g)). Under the symmetric cost condition c = ωn, the GG curve is hump-shaped: it rises from the origin, reaches a maximum interior growth rate, then declines toward an asymptote g∞ < γ. Its shape encodes the non-monotonic relationship between finance and growth.

PP curve: The locus of equilibrium credit-matching probabilities consistent with free entry in the credit market. In the benchmark model it is a vertical line at p* = p(ω/(1−ω) · k/c), independent of q and g. When banks bear a fixed entry cost K, the PP curve becomes upward-sloping, introducing a direct positive feedback from growth to financial deepening.

Potential growth rate γ: The productivity jump per successful innovation; in a frictionless world (p = q = ∞) the economy grows at γ. Actual growth g falls below γ to the extent that search frictions delay the delivery of credit and innovation. The elasticity of g to financial factors is proportional to (γ − g)/γ, so when actual and potential growth are close, financial factors matter little for growth.

Congestion externality in R&D: The mechanism by which financial deepening (a higher p) drives more firms to seek innovators, tightening the innovation market and reducing q. This negative spillover (Q_p < 0) is the paper’s central departure from models with only a single friction, where finance is always growth-enhancing.

Abundance from Abroad: Migrant Income and Long-Run Economic Development

Layer 1 — Overview

Research Question

This paper asks how persistent increases in international migrant income prospects affect long-run economic development in migrant-origin areas. The central question is whether Philippine provinces with persistent access to higher-income migration opportunities develop faster than provinces with less attractive migration opportunities, and through which channels.

Natural Experiment and Identification Strategy

The authors exploit the 1997 Asian Financial Crisis as a large-scale natural experiment. The crisis triggered sharp, heterogeneous, and persistent exchange rate changes across Philippine migrants’ destination countries — ranging from a 4% depreciation against the Philippine peso (Korea) to a 57% appreciation (Libya), with Japan and Saudi Arabia in between (appreciations of 32% and 52%, respectively). Because Philippine provinces differed in the pre-crisis distribution of migrant income across destinations (measured using unusual POEA/OWWA administrative contract data covering all overseas worker contracts, including migrant incomes, origins, and destinations), these exchange rate shocks generated exogenous, province-level variation in a shift-share instrument: the predicted change in province migrant income per capita due to the 1997 shocks. Identification follows the “exogenous shares” framework of Goldsmith-Pinkham et al. (2020). Pre-trend tests across up to 12 years of pre-shock panel data find no evidence of differential trends across provinces. The five destinations with the highest Rotemberg weights — Saudi Arabia, Japan, United States, Taiwan, and Hong Kong — collectively account for 75% of the identifying variation. The exchange rate shocks and the exposure weights both exhibit strong persistence over two decades post-1997.

Data

Philippine government administrative data (POEA/OWWA) on all overseas worker contracts, 1992–2015, matched at a 95% rate, providing province-of-origin and destination-specific migrant income.
Philippine Family Income and Expenditure Survey (FIES), up to twelve triennial rounds from 1985–2018 (74 provinces, ~40,000 households per round), for domestic income and expenditure.
Six rounds of the Philippine Census of Population (1990–2015) for education, migration rates, and sectoral employment shares.
Province-level consumer price index data (1994–2017) and firm-level export survey data for robustness checks.
Unit of analysis: 74 Philippine provinces (consistent 1990 borders).

Main Findings with Quantitative Magnitudes

Six-fold magnification of migrant income: Each unit of initial short-run shock (1997–1998) to migrant income per capita is magnified more than six-fold by 2009–2015. A one-standard-deviation shock (0.093) raises long-run migrant income per capita by 14.7% of the baseline mean (PhP 601 per capita, 0.2 standard deviations).
Domestic income gains predominate: A one-standard-deviation shock raises domestic income per capita (excluding migrant income and remittances) by 6.4% of the baseline mean (PhP 1,676, 0.18 standard deviations). Remarkably, 73.6% of the long-run global income increase comes from domestic income and only 26.4% from migrant income.
Global income and expenditure: A one-standard-deviation shock raises global income per capita by PhP 2,277 (0.2 standard deviations, or 7.5% of the baseline mean) in 2009–2015. Expenditure per capita rises by PhP 1,159 (0.13 standard deviations). Effects emerge gradually over two decades.
Education: A one-standard-deviation shock increases the college-educated share of the population by 0.46–0.51 percentage points (0.11–0.12 standard deviations) and secondary completion by 0.63 percentage points. There is no significant effect on primary completion.
Migration rates and skill composition: A one-standard-deviation shock increases the migration rate by 0.19 percentage points (0.22 standard deviations), raises the share of skilled migrants by 1.84 percentage points (0.19 standard deviations), and increases average migrant annual salary by PhP 23,703 (0.16 standard deviations). New migration concentrates in higher-education-quartile occupations.
Structural change: The shock reduces primary sector employment shares by 1.2 percentage points per standard deviation (0.06 standard deviations), with over 70% of that shift absorbed by non-tradable goods and services sectors. Domestic income gains are driven almost entirely by non-agricultural income, and roughly 55% of the increase in entrepreneurial income is from service sectors.
Education’s contribution to income: Model-based calculations assign 19.6% of the global income gain, 17.8% of the migrant income gain, and 20.2% of the domestic income gain to educational investments. Exchange rate persistence plus altered migration flows explain an additional 64.6% of the migrant income increase, so together these mechanisms account for 82.3% of the six-fold magnification. A demand multiplier (assuming 64% of migrant income returns to origin economies and a multiplier of 2.9, consistent with estimates from the literature) accounts for approximately 83.3% of the non-education-related portion of the domestic income increase.

Threats to Identification Ruled Out

Import and export shift-share controls (constructed analogously using bilateral trade data and province-level industry employment shares) are uncorrelated with the migrant income shock and leave coefficient estimates unchanged. Province-level manufactured exports, agricultural income, the CPI, and national-level FDI inflows show no statistically significant response to the shock. Internal migration rates are unaffected. Geographic spillover controls and tourism controls do not alter results. Placebo regressions in the pre-period yield small, statistically insignificant coefficients.

Scope Conditions

The paper studies formal, government-regulated temporary labor migration from the Philippines, where migrants sign contracts through POEA-licensed agencies and typically expect to return after one or more contracts. The findings apply specifically to settings where persistent (not transitory) migrant income shocks occur. Approximately 60% of contract migrants are female. The study period spans 1985–2018, with main long-run outcome analyses comparing 1994 (pre-shock) with 2009–2015 (post-shock).

In depth

Q1. What makes the 1997 Asian Financial Crisis useful as a natural experiment for this paper’s purposes?

A1: The crisis was largely unanticipated by policymakers, international organizations, and financial markets, making it implausible that pre-1997 migration destination choices reflected anticipation of the shocks. Exchange rate changes were heterogeneous across destinations (ranging from a 4% depreciation to a 57% appreciation), and crucially, these changes proved highly persistent over two decades — regression coefficients of long-run exchange rate changes on the initial 1997–1998 shock are close to and statistically indistinguishable from 1 in nearly all post-shock periods. Combined with the province-specific variation in migrant destination exposure, this generates persistent, exogenous, and heterogeneous shocks to migrant income prospects across provinces.

Q2. How is the shift-share variable constructed, and what does identification rely on?

A2: The shift-share variable Shiftshare_o equals the sum over destinations d of (ω_do0 × ΔR_d), where ω_do0 is province o’s pre-shock migrant income per capita from destination d (the “exposure weight” or “share”), and ΔR_d is the fractional change in destination d’s exchange rate from before to after the crisis (the “shift”). It captures the predicted change in province-level migrant income per capita due to the 1997 exchange rate shocks, and is derived directly from a theoretical model of migration. Identification relies on the “exogenous shares” approach of Goldsmith-Pinkham et al. (2020): the pre-1997 exposure weights are treated as as-good-as-randomly assigned conditional on controls, because they reflect historical migration networks formed well before the crisis.
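The construction can be sketched in a few lines of Python. The exchange rate shocks below are the four quoted in the overview. The provinces and their exposure weights are entirely hypothetical and serve only to show how the sum over destinations works.

```python
# Exchange rate shocks (fractional change against the peso) quoted in the
# overview; a negative value is a depreciation against the peso.
shocks = {"Korea": -0.04, "Japan": 0.32, "Saudi Arabia": 0.52, "Libya": 0.57}

# HYPOTHETICAL pre-shock migrant income per capita by destination for two
# made-up provinces (arbitrary units); these are not data from the paper.
exposure = {
    "Province A": {"Korea": 0.10, "Japan": 0.05, "Saudi Arabia": 0.20},
    "Province B": {"Korea": 0.20, "Japan": 0.05},
}


def shift_share(weights: dict[str, float], shifts: dict[str, float]) -> float:
    """Sum over destinations of exposure weight times exchange rate shock."""
    return sum(w * shifts[dest] for dest, w in weights.items())


for province, weights in exposure.items():
    print(province, round(shift_share(weights, shocks), 4))
```

In this made-up example, Province A receives a much larger predicted income shock than Province B because more of its pre-shock migrant income came from a destination whose currency appreciated strongly.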

Q3. Why is the six-fold magnification of the initial migrant income shock so striking, and what does the structural model say about its sources?

A3: The coefficient on migrant income per capita (6.463 in Panel D of Table 1) implies that for each unit of initial short-run migrant income shock, migrant income per capita is more than six units higher in 2009–2015 — a far larger response than a one-for-one pass-through would predict. The structural model, which augments a Fréchet-based gravity model of migration with endogenous education investments, accounts for 82.3% of this magnification. Education investments explain 17.8% of the migrant income increase; persistent favorable exchange rates and resulting shifts in migration flows across destinations explain an additional 64.6%. The Fréchet elasticity of migration flows with respect to destination wages is estimated at θ = 3.42 via PPML, implying that even partial reorientation of migrants toward now-higher-wage destinations substantially raises aggregate migrant income.
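The per-standard-deviation magnitudes reported in the overview appear to follow from multiplying the per-unit coefficient by the standard deviation of the shock. The sketch below checks that arithmetic. It assumes the outcome is measured in thousands of pesos per capita, which is an inference from the reported numbers rather than something stated in this summary.

```python
COEF_MIGRANT_INCOME = 6.463  # long-run coefficient, Panel D of Table 1
SHOCK_SD = 0.093             # one standard deviation of the shift-share shock

# Assumption: the outcome is in thousands of PhP per capita.
effect_thousand_php = COEF_MIGRANT_INCOME * SHOCK_SD
effect_php = effect_thousand_php * 1000

print(f"Effect of a one-SD shock: about PhP {effect_php:.0f} per capita")
```

Under that unit assumption the product matches the PhP 601 per capita figure given for a one-standard-deviation shock.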

Q4. What evidence supports the parallel trends assumption in the pre-shock period?

A4: The authors present event study diagrams (Figure 2) showing no differential positive pre-trends in either expenditure per capita or domestic income per capita prior to 1997 — for domestic income, there is a statistically insignificant negative trend from 1985–1991 and no trend in 1991–1994. Placebo regressions estimated on the pre-period only (1985, 1988, 1991 as “pre,” 1994 and 1997 as “post”) yield small, statistically insignificant coefficients on both domestic income and expenditure. Balance tests focusing on the five high-Rotemberg-weight destination shares (Saudi Arabia, Japan, US, Taiwan, Hong Kong) — which collectively account for 75% of the identifying variation — also show no significant pre-trends in key outcomes across provinces with varying levels of exposure.

Q5. How do the authors rule out trade flows as an alternative mechanism for the estimated income effects?

A5: They construct separate import and export shift-share variables, analogous to the “China shock” of Autor et al. (2013), using baseline bilateral trade values (from COMTRADE, disaggregated to 36 ISIC industries), province-level employment shares in import and export industries (from the 1990 Census), and the same destination exchange rate shocks. These trade shift-share variables are uncorrelated with the migrant income shock after conditioning on baseline controls (Appendix Table A5). Including them as additional controls in Panel D of all main regression tables leaves the migrant income coefficient stable. Further, province-level manufactured exports per capita show no large or statistically significant response to the migrant income shock, agricultural income similarly shows no significant response, and consumer price indices are unresponsive — ruling out import price changes as a confound. FDI inflows at the national level also show no significant relationship with destination-country exchange rate shocks.

Q6. What is the composition of the domestic income gains — where do they come from?

A6: Both wage income and entrepreneurial/rental income rise significantly and by similar magnitudes, while “other income” (pensions, interest, dividends) shows no robust increase (Table 4). Non-agricultural income drives virtually the entire domestic income gain; the effect on agricultural income per capita is statistically insignificant (Table 5, columns 1–2). Within entrepreneurial income, approximately 55% of the increase is from service sectors, while the effects on manufacturing and primary sector entrepreneurial income are insignificant at the 10% level (Table 5, columns 3–5). These patterns are consistent with the structural change finding: the shock shifts labor from primary sectors toward non-tradable goods and services rather than toward tradable manufacturing.

Q7. How is global income per capita defined, and how is the long-run gain split between domestic and migrant income?

A7: Global income per capita is defined as the sum of domestic income per capita (earned within the Philippine economy, excluding all international transfers) and migrant income per capita (the full income earned abroad by a province’s international migrants, calculated from contract data). Of the long-run global income increase, 73.6% comes from domestic income and 26.4% from migrant income. A one-standard-deviation shock raises global income by PhP 2,277 per capita in 2009–2015 (0.2 standard deviations, or 7.5% of the baseline mean).
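The split can be verified directly from the per-standard-deviation gains reported in the overview (PhP 1,676 of domestic income and PhP 601 of migrant income). The short Python check below uses only those two figures; the variable names are illustrative.

```python
domestic_gain = 1676  # PhP per capita, per one-SD shock
migrant_gain = 601    # PhP per capita, per one-SD shock

global_gain = domestic_gain + migrant_gain
domestic_share = domestic_gain / global_gain
migrant_share = migrant_gain / global_gain

print(f"Global gain: PhP {global_gain}")
print(f"Domestic share: {domestic_share:.1%}, migrant share: {migrant_share:.1%}")
```

The two components sum to the PhP 2,277 global gain, and the shares round to the 73.6% and 26.4% quoted above.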

Q8. How do education effects translate into more and higher-skilled migration?

A8: A one-standard-deviation migrant income shock increases college completion by 0.46 percentage points and secondary completion by 0.63 percentage points (with no significant effect on primary completion), consistent with the shock raising the return to higher education in the broader population. These better-educated workers then migrate at higher rates: the share of migrants who are skilled (college-educated) rises by 1.84 percentage points per standard deviation. Migration increases are concentrated in the two highest-education quartiles of occupations (engineers, medical professionals, teachers in the 4th quartile; caregivers, restaurant workers, performing artists in the 3rd quartile), with no significant effect in the two lowest quartiles. Average annual migrant salary rises by PhP 23,703 per standard deviation (0.16 standard deviations).

Q9. What mechanisms does the structural model invoke to explain the domestic income gains?

A9: The model treats domestic income changes as arising through at least two channels: (1) the education channel, which the model assigns 20.2% of the domestic income increase (using the estimated college completion response of 0.046 per unit shock, baseline skill-migration probabilities, and baseline skill premia for domestic income); and (2) a demand multiplier operating on the portion of migrant income remitted to origin provinces, combined with capital accumulation from sustained migrant income flows. Assuming 64% of migrant income returns to origin economies (estimated indirectly from KNOMAD/ILO and Survey on Overseas Filipinos data) and a multiplier of 2.9 (consistent with estimates from Kenya and India), this demand-plus-investment channel can explain approximately 83.3% of the remaining (non-education-related) domestic income increase of PhP 14.4 per unit shock. Under baseline assumptions (α = 0.64), the stylized dynamic model generates PhP 18.88 of domestic income by 2015 from a PhP 1 initial shock — close to the empirical estimate of PhP 18.02.
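The demand-multiplier figure can be roughly reconstructed from numbers given in this summary. The sketch below multiplies the long-run migrant income coefficient by the assumed return share and multiplier, then compares the result with the non-education part of the domestic income gain. Combining the pieces this way is an inferred reading of the text, not a formula taken from the paper.

```python
DOMESTIC_PER_UNIT = 18.02  # PhP of domestic income per PhP 1 of initial shock
EDUCATION_SHARE = 0.202    # share of the domestic gain assigned to education
MIGRANT_PER_UNIT = 6.463   # long-run migrant income per unit shock (Table 1, Panel D)
RETURN_SHARE = 0.64        # share of migrant income returning to origin economies
MULTIPLIER = 2.9           # demand multiplier assumed in the text

# Domestic gain left over once the education channel is removed.
non_education_gain = DOMESTIC_PER_UNIT * (1 - EDUCATION_SHARE)

# Domestic income generated by returned migrant income under the multiplier.
demand_channel = RETURN_SHARE * MIGRANT_PER_UNIT * MULTIPLIER

explained = demand_channel / non_education_gain
print(f"Non-education gain: PhP {non_education_gain:.2f} per unit shock")
print(f"Demand channel:     PhP {demand_channel:.2f} per unit shock")
print(f"Share explained:    {explained:.1%}")
```

Under these inputs the non-education gain is about PhP 14.4 and the explained share is close to the 83.3% reported; any small gap presumably reflects rounding in the published figures.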

Q10. How do the authors assess SUTVA and internal migration?

A10: They test whether the migrant income shock affects net internal migration rates at the provincial level (Appendix Table A6) and find no large or statistically significant impact. There is a small negative effect on outmigration of young adults (aged 16–24) that the authors judge cannot account for the documented income impacts. The Philippines’ archipelago geography (over 7,000 islands) is noted as likely limiting inter-provincial economic spillovers; to the extent spillovers occur, they would be positive (demand spillovers from provinces experiencing income gains to neighboring provinces), making estimates conservative lower bounds. Direct tests controlling for the inverse-distance-weighted migrant income shock in neighboring provinces leave main estimates unchanged.

Q11. Are the exposure weights (migration shares) persistent, and does this support interpreting the shock as persistent?

A11: Yes. Regressions of dyadic migrant income per capita in post-shock years (2009, 2012, 2015) on dyadic migrant income per capita in 1995 yield coefficients ranging from 0.4 to 0.6, each statistically significantly different from zero (and from 1, indicating partial but substantial persistence). The exchange rate shocks ΔR_d are even more persistent: regression coefficients on the initial 1997–1998 shock are close to 1 and statistically indistinguishable from 1 in nearly all post-shock periods (with the only exceptions in 2009–2012 during the Great Recession). Both components of the shift-share variable thus show persistence over two decades, supporting interpretation of the long-run effects as responses to a persistent (not transitory) income shock.

Q12. What are the policy implications and how do the authors connect findings to migration policy?

A12: The findings suggest migration policy should be an important part of the development policy toolkit. The results are directly relevant to origin-country policies facilitating formal, contract-based labor migration (e.g., regulation of recruitment agencies, educational investments to raise worker skills and competitiveness for overseas employment) and destination-country policies governing legal immigration opportunities. The authors also note implications for overseas development assistance: development agencies could consider supplementing traditional foreign aid with programs that facilitate international labor migration. The paper’s context — formal, government-regulated migration through POEA and OWWA — is described as highly policy-relevant, with 94% of developing countries with populations exceeding 1 million having a dedicated government migration agency and 78% having policies promoting migrant remittances.

Key Concepts

Shift-share variable (Shiftshare_o): The paper’s primary independent variable, equal to the sum over all overseas destinations d of (ω_do0 × ΔR_d), that is, the province’s pre-shock migrant income per capita from each destination (the exposure weight or “share”) multiplied by that destination’s exchange rate shock (the “shift”). It is the predicted change in province migrant income per capita due to the 1997 Asian Financial Crisis exchange rate shocks, and is derived directly from the theoretical model of migration (Equation A9). Identification treats the exposure weights as exogenous following the “exogenous shares” approach of Goldsmith-Pinkham et al. (2020).

Exposure weights (ω_do0): Province o’s pre-shock aggregate migrant income per capita earned in destination d, calculated from administrative POEA/OWWA contract data for 1995. These serve as the “shares” in the shift-share and capture the extent to which a province’s residents are exposed to a given destination’s exchange rate shock. They reflect historically formed migration networks rather than anticipation of future shocks.

Global income per capita: The sum of domestic income per capita and migrant income per capita. Domestic income is household income earned within the Philippine economy (wages, entrepreneurial, and other sources), explicitly excluding all income from international sources including remittances. Migrant income is the full income earned abroad by all international migrants from the province, calculated from contract data (not remittances sent home). Global income thus captures the full resource gain available to a province from the combination of domestic production and international migration.

Magnification (of migrant income shock): The empirical finding that the long-run coefficient on migrant income per capita (6.463 in Panel D, Table 1) far exceeds 1 — meaning each unit of initial short-run shock becomes more than six units of migrant income per capita in 2009–2015. The paper decomposes this magnification into contributions from persistent exchange rates, educational investments raising skill levels and migration, and shifts in migration flows toward now-higher-wage destinations.

Brain gain: The paper’s term for the process by which improved migrant income prospects raise educational investments among the broader population (not just among migrants), leading to higher skill levels among non-migrants as well. The paper distinguishes this from “brain drain” (where migration of skilled workers reduces origin-area human capital) and provides evidence of a “virtuous cycle”: education raises migration rates and migrant skill levels, which in turn raises migrant and domestic incomes, potentially funding further education.

Rotemberg weights: Province-destination-level weights (following Goldsmith-Pinkham et al. 2020) characterizing which destination-specific exchange rate shocks drive the estimates most. Saudi Arabia (0.20), Japan (0.19), United States (0.18), Taiwan (0.10), and Hong Kong (0.08) together account for 75% of the total Rotemberg weight. These weights guide which destination-specific exposure shares receive the most scrutiny in pre-trend and balance tests.

Fréchet elasticity (θ): The elasticity of migration flows from an origin province to a destination with respect to destination wages (in Philippine pesos), estimated at 3.42 via PPML using the exchange rate shocks. This parameter governs how much migration flows — and thereby migrant income — respond to the persistent exchange rate changes, and is central to the model’s decomposition of the six-fold magnification of migrant income effects.

Domestic income multiplier: The ratio of long-run domestic income increase to the portion of the migrant income shock that returns to origin provinces. Assuming 64% of migrant income returns to origin economies (estimated from multiple administrative data sources), the implicit demand multiplier in the paper’s context ranges from about 2.9 to 3.4, consistent with multipliers found in related literature on cash transfers and credit supply shocks in low-income settings.

Across-Country Wage Compression in Multinationals

Layer 1 — Summary

Many multinationals do not fully adjust wages to the local context of their foreign establishments; instead, they partially link the wages of foreign workers in a given position to the wages paid in the same position at headquarters — a practice the authors call “wage anchoring.” Using yearly establishment-level compensation data on roughly 1,200 multinationals operating across 174 cities worldwide (2000–2015) and matched employer-employee administrative data (RAIS) from Brazil, Hjort, Li, and Sarsons document that a 10 percent higher headquarters wage is associated with 1.63–2.8 percent higher wages for workers in the same occupation at foreign establishments, with the within-firm across-country correlation substantially exceeding the correlation between a given establishment’s wages and the local average paid by other multinationals for the same position. To establish a causal link between externally imposed headquarters wage changes and subsequent foreign establishment wage responses, the paper exploits two identification strategies: minimum wage shocks in the headquarters country or U.S. state and exchange rate fluctuations, both of which generate plausibly exogenous variation in headquarters wages that is then partially transmitted to foreign workers in the same position. Wage change transmission appears to be direct and to operate through firm-wide wage-setting procedures rather than through associated changes in technology or employment at foreign establishments, a conclusion the Brazil RAIS data support because total employment at multinationals’ Brazilian establishments shows little change following positive external shocks to headquarters wages. 
Wage anchoring is strongest for low-skill occupations (cleaners, drivers, security guards), where a 10 percent higher headquarters wage is associated with a 2.8 percent higher foreign establishment wage, versus roughly 1.2 percent for middle- and high-skill occupations; the resulting spatial compression of wages is in line with how many multinationals themselves report setting pay across locations.

In depth

Q1. What is the central phenomenon documented in this paper, and what are the two broad empirical components of the analysis?

The central phenomenon is “wage anchoring”: multinationals link wages at their foreign establishments to the wage level at headquarters for the same narrowly-defined occupation, so that the within-firm across-country wage distribution is more compressed than what local labor-market conditions alone would imply. The first empirical component is descriptive — documenting the high cross-sectional correlation between headquarters and foreign establishment wages within a firm×occupation cell, controlling for city×year effects and local wage benchmarks. The second component is causal — using minimum wage shocks in the headquarters country or U.S. state and exchange rate shocks to generate externally imposed changes in headquarters wages, and tracing whether and how quickly those changes are partially transmitted to foreign establishments.

Q2. What is the primary dataset, what does it cover, and what are its key limitations?

The primary dataset was compiled by an unidentified consulting company that gathers compensation information from client employers and harmonizes positions globally into 309 occupations across 16 skill levels and 26 occupational categories. It covers roughly 1,200 multinationals (private-sector firms and multinational public-sector employers such as NGOs and multilateral organizations), operating in more than 170 cities, with yearly observations spanning 2000–2015. The data report average nominal gross total monthly wages for domestic (non-expat) workers in each establishment-occupation-year cell. Key limitations: the panel is unbalanced because multinationals choose which establishments report each year and often rotate establishments in and out; matching between the headquarters and any given foreign establishment requires observing the same occupation in the same year at both, which reduces the headquarters-matched sample to 80 employers and 611 foreign establishments (Sample 3, the most comparable subsample). The publicly listed U.S. firms in the data account for about one-third of total revenue of all publicly listed U.S. firms, so the sample is skewed toward unusually large employers.

Q3. How do the authors define and measure “wage anchoring” in the descriptive section?

The authors regress log average wages of workers in occupation j at a firm f’s foreign establishment in city c in year t (wjfct) on log average wages for the same occupation at the firm’s headquarters (HQwjft), controlling for firm×occupation fixed effects, city×year fixed effects, and a local market wage benchmark measured either as the average paid by other multinationals in the same city-occupation-year cell or as a city×occupation×year fixed effect. The estimated coefficient on the headquarters wage — around 0.163 using the benchmark-wage control and about 0.09 using the more restrictive city×occupation×year fixed effect — measures how much of a headquarters wage difference is “passed through” to foreign establishment wages within the same firm and occupation. They further document that the within-firm wage slope (the difference between wages in consecutive skill levels within an occupational category) at foreign establishments is similarly anchored to the corresponding slope at headquarters, with a 10 percent greater consecutive-skill wage gap at headquarters associated with about a 1.4 percent greater gap at the foreign establishment.
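
As a rough worked illustration of how to read these coefficients: in a log-log specification the coefficient is an elasticity, so the reading "a 10 percent higher headquarters wage goes with a 1.63 percent higher foreign wage" is a first-order approximation. The sketch below computes both the approximation and the exact change implied by the log-log form. The function name is hypothetical, and the coefficients are the ones quoted above.

```python
def implied_pct_change(beta, hq_pct_increase=10.0):
    """Return (approximate, exact) percent change in the foreign wage.

    beta is the log-log coefficient on the headquarters wage.
    The approximation is beta * percent change; the exact figure
    compounds the change through the log-log functional form.
    """
    approx = beta * hq_pct_increase
    exact = ((1.0 + hq_pct_increase / 100.0) ** beta - 1.0) * 100.0
    return approx, exact


# Coefficients quoted in the text: benchmark-wage control and the
# more restrictive city x occupation x year fixed effect.
for label, beta in [("benchmark-wage control", 0.163),
                    ("city x occupation x year FE", 0.09)]:
    approx, exact = implied_pct_change(beta)
    print(f"{label}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For a 10 percent difference the two readings are close, so the approximation used in the text is harmless at this magnitude.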

Q4. What exactly do the minimum wage and exchange rate identification strategies exploit, and what do they identify?

The minimum wage strategy compares multinationals whose headquarters are located in a country or U.S. state that experiences a minimum wage increase (“treated”) against multinationals whose headquarters are not exposed (“control”), conditioning on establishments being in the same foreign city. Within the treated group, it also exploits cross-occupation variation: within a given foreign establishment, workers in positions whose headquarters counterparts are more exposed to the minimum wage increase (because their wages are closer to the new minimum) experience larger foreign wage gains. The exchange rate strategy exploits appreciation of a non-U.S. headquarter country’s currency against the dollar: when the USD-measured headquarters wage of such a multinational increases following an appreciation, this tests whether foreign establishment wages in USD also rise. Because exchange rates increase and decrease, are less stable than minimum wages, and have different underlying drivers, the exchange rate design provides an independent corroboration of the minimum wage findings. Both strategies identify the effect of externally imposed headquarters wage changes on wages at the same firm’s foreign establishments in the same narrowly defined occupation.

Q5. What evidence is marshaled against indirect pathways (technology changes, employment changes, offshoring) as the driver of foreign wage transmission?

The paper presents three types of evidence against indirect pathways. First, including headquarters country×year fixed effects in the descriptive wage regressions — which absorbs any technology shocks originating in the headquarters country that affect all occupations uniformly — leaves the estimated wage anchoring coefficient essentially unchanged. Second, event study and panel regressions using the Brazil RAIS data show little change in total employment at multinationals’ Brazilian establishments following positive external shocks to headquarters wages, which is hard to reconcile with employment-driven or offshoring-driven wage adjustment. Third, a causal forest analysis of the conditional average treatment effect of minimum wage shocks on foreign wages — estimated allowing responses to vary with a wide range of job, employer, sector, and location characteristics — finds that occupation characteristics and sector have little explanatory power for which establishments transmit more, while differences in transmission are more closely related to characteristics of the headquarter-establishment country pair (proximity, similarity, shared language), which are more naturally associated with administrative coordination than with technology or production-style linkages.

Q6. How does occupation skill level moderate wage anchoring, and what does this heterogeneity imply?

Wage anchoring is strongest for low-skill occupations. In the descriptive correlations, a 10 percent higher headquarters wage is associated with 2.8 percent higher foreign wages in low-skill jobs (cleaners, drivers, data entry clerks, security guards) but only about 1.2 percent higher foreign wages in both middle-skill and high-skill jobs. The occupation heterogeneity is visible graphically (Figure 1 Panel C) and holds in regressions interacting the headquarters wage with skill-level indicators. A natural interpretation, consistent with the firm-wide wage-setting procedure explanation, is that firms are most likely to apply standardized pay rules to lower-level positions where local market customization may be seen as less important; higher-skill workers may be more likely to have individually negotiated contracts responsive to local conditions. The heterogeneity also implies that the spatial compression effect (wages in foreign establishments being pulled toward headquarters levels) is particularly pronounced at the lower end of the within-firm wage distribution. For positions like cleaners and guards, the resulting foreign wages can be, relative to GDP per capita, an order of magnitude higher than what headquarters workers in the same position receive.

Q7. What is the “spatial compression” implication and how does it relate to within-firm wage inequality?

Wage anchoring implies that workers in the same occupation at foreign establishments located in lower-income countries receive wages that are compressed toward headquarters levels rather than fully adjusted to local wages. The paper shows that nominal wages at foreign establishments average about 89 percent of headquarters wages in the same occupation and year — and about 78 percent for establishments in countries poorer than the headquarter country — a ratio that is roughly stable across the within-firm headquarters wage distribution. This partial equalization is what the authors call “across-country wage compression”: it reduces the within-multinational cross-country wage dispersion relative to what would arise from purely market-based, locally responsive wage-setting. The spatial compression is consistent with how many firms self-report setting wages: a survey of primarily North American employers (Culpepper & Associates, 2011) found 29 percent report paying the same nominal wages across locations, and several large employers (Amazon, IKEA, Walmart) have self-imposed country-wide wage floors.

Q8. What role do headquarter-establishment country-pair characteristics play in predicting which establishments exhibit stronger wage transmission?

Using a causal forest algorithm to estimate the conditional average treatment effect of a minimum wage shock at headquarters and then constructing above- versus below-median predicted treatment groups, the paper finds that differences in transmission are “generally not large” but that higher transmission is somewhat associated with characteristics of the headquarter-establishment country pair: pairs that are more closely connected and share more similarities (e.g., common language, closer geographic distance) transmit more. Some foreign-establishment-country characteristics such as inequality and urbanization also appear related. In contrast, occupation characteristics (such as offshorability), the sector the multinational operates in, and characteristics of the headquarter country alone have little explanatory power. The paper notes these findings do not conclusively rule out alternative explanations but are more consistent with administrative coordination channels than with technology- or employment-based ones.
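
The second step described above (splitting establishments at the median predicted effect and comparing group characteristics) can be sketched as follows. This is only an illustration of the comparison step, not the paper's code: the records are invented toy numbers, the field layout is hypothetical, and the predicted effects are taken as given rather than estimated by a causal forest.

```python
from statistics import mean, median

# Hypothetical toy records, NOT data from the paper:
# (predicted_effect, shares_language, distance_km)
records = [
    (0.30, 1, 1200.0),
    (0.25, 1, 3000.0),
    (0.22, 0, 2500.0),
    (0.10, 0, 9000.0),
    (0.08, 1, 7000.0),
    (0.05, 0, 11000.0),
]


def split_by_median(rows):
    """Split rows into above-median and at-or-below-median predicted effect."""
    cutoff = median(r[0] for r in rows)
    above = [r for r in rows if r[0] > cutoff]
    below = [r for r in rows if r[0] <= cutoff]
    return above, below


def mean_difference(rows_a, rows_b, idx):
    """Difference in the mean of column idx between two groups."""
    return mean(r[idx] for r in rows_a) - mean(r[idx] for r in rows_b)


above, below = split_by_median(records)
print("language share gap:", mean_difference(above, below, 1))
print("distance gap (km):", mean_difference(above, below, 2))
```

In an actual application the gaps would be computed for every candidate characteristic and reported with standard errors; the toy version only shows the mechanics of the median split.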

Q9. What role do potential fairness preferences and firm-wide wage norms play in the paper’s interpretation?

The authors suggest several possible mechanisms through which firm-wide wage-setting procedures could operate. Firms may adopt uniform wage-setting to reduce the menu and information costs of localized wage-setting (Lemieux et al., 2012); to increase foreign worker morale, particularly if workers are averse to pay inequality relative to headquarters peers (Card et al., 2012; Dube et al., 2019); or to respond to fairness preferences from headquarters workers or consumers (Harrison & Scorse, 2010). Survey evidence from Alfaro-Urena et al. (2019) explicitly records that multinationals pay high wages abroad in part to “ensure cross-country pay fairness within the MNC.” Alternatively, the authors note that firm-wide wage-setting may represent a form of firm inertia or mistakes — an inability or unwillingness to fully adapt pricing and compensation to local contexts — consistent with DellaVigna & Gentzkow (2019). The paper presents this as an open question for future research rather than definitively adjudicating among the explanations.

Q10. How does the Brazil RAIS data corroborate and extend the global multinationals findings?

The RAIS matched employer-employee administrative data cover all employees at each Brazilian establishment of the 44 multinationals in the global dataset that operate in Brazil, with individual-level information on wages, education, race, gender, age, and tenure. Because RAIS is an administrative census of formal-sector employment rather than a consulting firm’s client dataset, it provides independent corroboration of the main findings. The paper confirms using RAIS that wages of individual workers at multinationals’ Brazilian establishments rise abruptly when their foreign headquarters experience positive external shocks. The RAIS data then enable the additional step of examining employment responses, where event study and panel regressions find little change in total employment at multinationals’ Brazilian establishments following such shocks — evidence against employment- or technology-driven indirect pathways as the primary explanation for wage transmission.

Key Concepts

Wage anchoring: The practice by which a multinational ties wages at its foreign establishments, for workers in a given occupation, to the wage level at its headquarters for the same occupation. In this paper’s usage, anchoring does not mean wages are set identically across locations but that they are partially linked — externally imposed changes in headquarters wages are partially transmitted to foreign establishment wages — rather than being independently set based on local labor-market conditions.

Across-country wage compression: The reduction in the cross-country dispersion of wages within a multinational that results from wage anchoring. Because foreign establishment wages are partially pulled toward headquarters levels rather than fully adjusting to local wages, the multinational’s within-firm wage distribution is more compressed across countries than it would be under purely localized wage-setting. In the paper’s data, this compression is particularly pronounced for low-skill occupations in lower-income host countries.

Firm-wide wage-setting procedures: Administrative practices, such as applying a single pay scale or a fixed wage ratio across all of a firm’s establishments regardless of location, that mechanically link foreign establishment wages to headquarters wages. The paper argues these procedures — rather than correlated technology shocks or employment adjustments — are the proximate driver of wage anchoring, on the basis of the employment non-response in Brazil, the persistence of anchoring after controlling for headquarters-country technology shocks, and the pattern of heterogeneity across country pairs.

Partial transmission: A load-bearing qualifier in this paper describing the magnitude of wage anchoring: headquarters wage changes arising from external shocks are not fully extended to foreign workers, but a fraction of the change is passed through. The estimated pass-through in descriptive regressions ranges from about 0.09 to 0.31 depending on specification and sample; across skill groups, it is highest (around 0.28) for low-skill occupations. The partial nature of transmission means that the spatial compression is real but incomplete.

Wage slope: The difference between log average wages paid by an employer to workers in jobs of consecutive skill levels within an occupational category, at a given establishment. The paper documents that the wage slope at foreign establishments is correlated with the wage slope at headquarters — a 10 percent greater consecutive-skill wage gap at headquarters is associated with a roughly 1.4 percent greater gap at the foreign establishment — suggesting that the anchoring extends beyond the level of wages to the internal wage structure.

External shocks to headquarter wages: Minimum wage increases in the headquarters country or U.S. state, and exchange rate fluctuations that change the USD value of wages set in local currency. These shocks serve as instruments or quasi-experimental sources of variation in headquarters wages that are plausibly exogenous to conditions at foreign establishments, enabling causal identification of the effect of headquarter wage changes on foreign establishment wages.

Causal forest (heterogeneous treatment effect estimation): A machine learning algorithm used in the paper to estimate the conditional average treatment effect of a minimum wage shock at headquarters, allowing the size of the foreign wage response to vary flexibly with a large set of characteristics (job, employer, sector, headquarter country, establishment country, headquarter-establishment country pair). The resulting predicted treatment effect scores are used to construct above- and below-median transmission groups, which are then compared across observable characteristics to identify what predicts stronger wage anchoring.

Aggregate demand externality and self-fulfilling default cycles

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question. Why do corporate defaults cluster in recurring episodes rather than occurring smoothly? The paper asks whether observable fundamental factors — firm characteristics and macroeconomic variables — are sufficient to account for the clustered default patterns documented in the data, and, if not, what theoretical mechanism can explain them.

Empirical Motivation. Using Moody’s historical default rate data, the authors document that the long-run average corporate bond default rate during 1866–2008 was approximately 1.50%, yet defaults were highly episodic: the worst three-year period during the Great Depression totaled 12.88%, and the three-year period 1873–1875 after the railroad boom reached 35.80%. A Markov switching regression on post-war default rate data (1951–2017) strongly rejects a linear no-switch model in favor of a two-regime model across all information criteria (AIC, HQ, SC, and log-likelihood). The estimated high-default regime has a mean default rate of 1.93% (unconditional mean µ/(1−ρ)) — roughly eight times the 0.23% mean of the low-default regime — and a standard deviation nearly six times larger. The high-default regime persists on average 5.81 years (transition probability of staying ≈ 0.83), while the low-default regime lasts approximately 7.52 years (staying probability ≈ 0.87).

Model. The authors build a continuous-time general equilibrium model with Dixit-Stiglitz monopolistic competition (CES aggregation with elasticity σ) and an endogenous entry/exit/default mechanism. Households are risk-neutral and also act as entrepreneurs. At each instant, δµ new project blueprints are invented; entrepreneurs borrow to invest, then face an idiosyncratic liquidity shock z drawn from a Pareto distribution G(z). Entrepreneurs continue if z ≤ Z*, a cutoff determined by the continuation value of the firm, and default otherwise. Continuing firms become monopolists for a new variety until that variety becomes obsolete at a Poisson rate δ. Each operating firm must borrow working capital constrained by its firm value V_t (collateral constraint w_t n_jt ≤ θV_jt). The entire equilibrium reduces to a two-dimensional dynamical system in (M_t, V_t), where M_t is the number of operating firms (state variable) and V_t is the firm value (control variable).

Key Mechanism: Demand Externality and Positive Feedback. Under CES aggregation, each firm’s gross revenue is y_jt^(1–1/σ) · Y_t^(1/σ), making individual firm revenue increasing in aggregate output Y_t. A decline in Y_t lowers firm profits and firm value V_t, which lowers the continuation cutoff Z* (entrepreneurs continue only if z ≤ Z*) and increases the fraction of projects that are abandoned. Fewer operating firms further depress Y_t, closing a positive feedback loop. This static strategic complementarity (through CES) is combined with dynamic strategic complementarity through the borrowing constraint: higher expected future firm value relaxes current working capital constraints, raising current production.

Multiple Equilibria and Global Dynamics. The two-locus phase diagram (Ṁ_t = 0 and V̇_t = 0) yields multiple intersections, and hence multiple steady states, when productivity A lies in an intermediate range (A̲ < A < Ā, where A̲ and Ā are the lower and upper productivity thresholds). When A > Ā, a single good saddle-point equilibrium exists. When A < A̲, no equilibrium can be sustained. In the intermediate range, a good steady state (low default rate, high firm value) coexists with a bad steady state (high default rate, low firm value). The good steady state is always a saddle; the bad steady state is a sink (locally indeterminate, κ < κ_Hopf) or a source (locally determinate but globally indeterminate, κ > κ_Hopf), depending on parameter κ = 1 + (θ + ρ)/δ.

Bogdanov-Takens Bifurcation. Using global dynamical methods, the paper demonstrates richer indeterminacy than local analysis permits. Near the Bogdanov-Takens point (κ, Ā), the system can exhibit: (a) infinite equilibrium trajectories converging to the bad steady state; (b) saddle-loop bifurcation at κ = κ_SL ≈ 14.25 (under the baseline calibration); (c) stable or unstable periodic orbits for κ ∈ (κ_Hopf, κ_SL) — endogenous business cycles in a perfect-foresight equilibrium; and (d) multiple trajectories from near the source that converge to the good saddle equilibrium.

Simulation of Clustered Defaults. With a two-state Markov process for productivity (Ah = 10, Al = 9.34) and pessimistic sentiment shifts (the “ugly” state), the model replicates the cluster pattern: in the good/high-productivity state, the default rate is near zero; when productivity falls to low and sentiment turns pessimistic, the default rate can spike to approximately 12%, consistent with the Great Depression observation. Critically, the paper shows that the cluster pattern is generated only under global dynamics — restricting to local dynamics produces substantially smaller fluctuations in the default rate, confirming that the ugly (sink) equilibrium is essential.

Policy. A countercyclical subsidy to non-defaulting entrants, financed by a lump-sum tax and calibrated as tr(V_t) = τ(VG − V_t), shifts the Ṁ_t = 0 locus downward and can eliminate the bad steady state entirely, leaving only the good saddle-path equilibrium. The paper provides a closed-form sufficiency condition for τ (Proposition 7).

Scope Conditions. Multiple equilibria require: (i) productivity in the intermediate range A̲ < A < Ā; (ii) the elasticity of substitution σ not too large (below a threshold σ̄ that itself depends on µ); (iii) the borrowing constraint binding (δ > θσ/((σ–1)κ), which can always be ensured by choosing δ sufficiently large). Clustered defaults in the simulation require the joint occurrence of a negative fundamental shock (productivity falling from high to low) and a shift to pessimistic sentiment; either factor alone generates only limited default amplification.

In depth

Q1. What is the core empirical motivation for the model, and what does the regime-switching analysis establish?

The paper documents that the corporate bond default rate, drawn from Moody’s data covering 1866–2008, clusters sharply in episodes: the long-run average is 1.50%, yet the worst three-year period of the Great Depression totaled 12.88% and 1873–1875 reached 35.80%. A Markov switching regression on 1951–2017 data strongly rejects a linear no-regime-switch model across all four criteria (log-likelihood, AIC, HQ, SC). The two-regime model identifies a high-default regime with unconditional mean 1.93% and standard deviation roughly six times the low-default regime’s, a persistence probability of approximately 0.83 (duration ≈ 5.81 years), and a low-default regime with unconditional mean 0.23% and persistence approximately 0.87 (duration ≈ 7.52 years). The regime-switching result supports the prior literature’s claim (Das et al. 2007; Duffie et al. 2009; Azizpour et al. 2018) that observable fundamentals alone cannot account for clustered defaults.
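
The link between the persistence probabilities and the durations quoted above is the standard geometric-spell formula: a regime with constant stay probability p lasts 1/(1 − p) periods on average. The sketch below inverts that formula for the reported durations and also computes the long-run share of time in the high-default regime. The function names are hypothetical, and the long-run share is a back-of-the-envelope figure rather than one reported in the source.

```python
def expected_duration(stay_prob):
    """Average spell length (in periods) for a constant stay probability."""
    return 1.0 / (1.0 - stay_prob)


def stay_prob_from_duration(duration):
    """Invert expected_duration: the stay probability implied by a duration."""
    return 1.0 - 1.0 / duration


def ergodic_share_high(p_high_stay, p_low_stay):
    """Long-run fraction of periods spent in the high regime of a two-state chain."""
    exit_high = 1.0 - p_high_stay
    exit_low = 1.0 - p_low_stay
    return exit_low / (exit_low + exit_high)


p_high = stay_prob_from_duration(5.81)  # reported high-default duration
p_low = stay_prob_from_duration(7.52)   # reported low-default duration
print(round(p_high, 3), round(p_low, 3))
print(round(ergodic_share_high(p_high, p_low), 3))
```

The implied stay probabilities round to the 0.83 and 0.87 quoted in the text, which is a useful consistency check on the reported durations.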

Q2. How does the Dixit-Stiglitz CES structure generate a demand externality that links aggregate output to individual firm default decisions?

Under CES aggregation with elasticity σ, each firm’s gross revenue equals y_jt^(1–1/σ) · Y_t^(1/σ) (equation 7), so aggregate output Y_t directly enters individual firm revenue. Each firm takes Y_t as given, yet the aggregation of all firms’ output determines Y_t. When aggregate output falls because more firms have defaulted and exited production, each remaining firm’s revenue and profit fall, reducing the firm’s continuation value V_t. A lower V_t tightens the borrowing constraint (w_t n_jt ≤ θV_jt), reduces working capital, and lowers the cutoff Z*, raising the probability that an idiosyncratic liquidity shock exceeds it and producing further defaults. This positive feedback constitutes the demand externality: individual firms’ decisions are strategic complements, both statically (through CES demand) and dynamically (through the borrowing constraint on working capital).
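
The revenue expression follows from the standard Dixit-Stiglitz demand system. The derivation below is a sketch under two assumptions that are not quoted from the paper: the final good's price is normalized to one, and varieties are indexed on the interval [0, M_t].

```latex
Y_t = \left( \int_0^{M_t} y_{jt}^{\frac{\sigma-1}{\sigma}} \, dj \right)^{\frac{\sigma}{\sigma-1}},
\qquad
p_{jt} = \frac{\partial Y_t}{\partial y_{jt}}
       = \left( \frac{y_{jt}}{Y_t} \right)^{-\frac{1}{\sigma}},
\qquad
p_{jt}\, y_{jt} = y_{jt}^{\,1-\frac{1}{\sigma}} \; Y_t^{\frac{1}{\sigma}} .
```

The last expression is increasing in Y_t for any given y_jt, which is the sense in which aggregate output enters each firm's revenue.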

Q3. What is the two-dimensional dynamical system that summarizes the equilibrium, and what do the two loci look like in the phase diagram?

The entire equilibrium reduces to two differential equations in (M_t, V_t): Ṁ_t = –δ[M_t – µG(Z(V_t))] and V̇_t = κδV_t[1 – F(V_t, M_t)], where F captures the ratio of monopoly profit to firm value including the borrowing constraint. The Ṁ_t = 0 locus slopes strictly upward because a higher firm value V_t raises the default cutoff Z* and lowers the fraction of entrants who default, so more firms survive and M_t rises until exit through obsolescence equals entry. This locus has a minimum at Mm = µG(zm) because firm value must exceed the threshold that sustains the credit market. The V̇_t = 0 locus is non-monotonic: it first slopes upward (more firms raise aggregate demand and profit through the scale/externality channel) and then slopes downward (more firms tighten the labor market, raising wages and lowering profits). The two opposing channels make the V̇_t = 0 locus hump-shaped, creating the possibility of two intersections and hence two steady states.
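
To make the law of motion for the number of firms concrete, the sketch below integrates dM/dt = −δ[M − µG(Z)] with the cutoff Z held fixed, so that M converges to µG(Z). It is a partial illustration only: the mapping from firm value to the cutoff, Z(V), and the equation for V are not modeled, and the values of µ, δ, and the cutoff are hypothetical. The Pareto parameters (η = 6.5, Zmin = 0.88) are the ones mentioned in the source's calibration.

```python
import math


def pareto_cdf(z, z_min, eta):
    """Pareto CDF G(z) = 1 - (z_min / z)**eta for z >= z_min, else 0."""
    if z < z_min:
        return 0.0
    return 1.0 - (z_min / z) ** eta


def simulate_m(m0, cutoff, mu, delta, z_min, eta, dt=0.01, horizon=50.0):
    """Euler-integrate dM/dt = -delta * (M - mu * G(cutoff)), cutoff fixed.

    Returns the terminal M and the rest point mu * G(cutoff).
    """
    target = mu * pareto_cdf(cutoff, z_min, eta)
    m = m0
    for _ in range(int(round(horizon / dt))):
        m += dt * (-delta * (m - target))
    return m, target


# Hypothetical values for m0, cutoff, mu, and delta.
m_end, m_star = simulate_m(m0=0.1, cutoff=1.5, mu=1.0, delta=0.1,
                           z_min=0.88, eta=6.5)
closed_form = m_star + (0.1 - m_star) * math.exp(-0.1 * 50.0)
print(m_end, m_star, closed_form)
```

With the cutoff fixed, the equation is linear and has the closed-form solution M(t) = M* + (M(0) − M*)e^(−δt), which the Euler path tracks closely; in the full model the cutoff moves with V_t, which is what makes the system two-dimensional.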

Q4. Under what conditions do multiple steady states exist, and what does each look like?

Multiple steady states exist when productivity A satisfies A̲ < A < Ā, where A̲ and Ā are closed-form thresholds given by Equations (A.3) and (A.4), and the elasticity of substitution σ is below a threshold σ̄ (Equation A.5). When A < A̲, the two loci do not intersect and no equilibrium is sustainable. When A > Ā, a single good saddle-point equilibrium exists. In the multiple-equilibria range, the good steady state has a higher firm value and a smaller fraction of firms defaulting; the bad steady state has a lower firm value and a higher default rate. Under the paper’s numerical calibration (A = 10, η = 6.5, Zmin = 0.88), the low default rate at the good steady state is approximately 1.5% and the high default rate at the bad steady state is between 12% and 13%.

Q5. What are the local dynamics around each steady state, and how does parameter κ determine whether the bad steady state is a sink or a source?

Proposition 5 shows that the good steady state is always a saddle point, ensuring a unique convergent path for initial Mt near Mg_0. The bad steady state’s local nature depends on κ = 1 + (θ + ρ)/δ and the critical value κ_Hopf = 1 + ψ/(θMb_0Vb_0). When κ is between 1 and κ_Hopf, the Jacobian trace is negative and the bad steady state is a sink with one order of indeterminacy: given Mt close to Mb_0, infinitely many initial values of the control variable Vt satisfy all equilibrium conditions. When κ > κ_Hopf, the bad steady state is a source point; the economy diverges from it. Because κ does not affect the steady-state locations (Proposition 3), one can vary κ to change the dynamic character without moving the equilibria in the phase diagram.
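
The saddle/sink/source distinction used here is the standard trace-determinant classification for a planar system. The sketch below applies it to a generic 2x2 Jacobian; the function name and the example matrices are hypothetical and are not the paper's Jacobian.

```python
def classify_steady_state(jacobian):
    """Classify a steady state of a planar system from its 2x2 Jacobian.

    det < 0            -> saddle (eigenvalues of opposite sign)
    det > 0, trace < 0 -> sink   (both eigenvalues have negative real part)
    det > 0, trace > 0 -> source (both eigenvalues have positive real part)
    otherwise          -> non-hyperbolic (linearization is inconclusive)
    """
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    if det < 0:
        return "saddle"
    if det > 0 and trace < 0:
        return "sink"
    if det > 0 and trace > 0:
        return "source"
    return "non-hyperbolic"


# Hypothetical example matrices.
print(classify_steady_state([[1.0, 0.0], [0.0, -1.0]]))
print(classify_steady_state([[-1.0, 0.5], [-0.5, -2.0]]))
print(classify_steady_state([[1.0, 0.5], [-0.5, 2.0]]))
```

With one predetermined variable (M_t) and one free variable (V_t), a saddle pins down a unique convergent path, while a sink admits a continuum of initial V_t values, which is the indeterminacy described above. The borderline case of zero trace with positive determinant is where a Hopf bifurcation can occur.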

Q6. What does the global dynamics analysis reveal that local analysis misses?

Global analysis via Bogdanov-Takens bifurcation (Proposition 6) reveals three classes of dynamics absent from local analysis. First, even in the saddle-source case (locally determinate), there exist multiple equilibrium trajectories diverging from near the bad (source) steady state and converging to the good (saddle) steady state; these paths satisfy all equilibrium conditions including transversality but are incorrectly ruled out by local methods. Second, at the critical value κ_SL ≈ 14.25 (under the baseline calibration), a homoclinic saddle-loop orbit connects the saddle point to itself — all trajectories interior to the loop converge to the bad steady state. Third, for κ between κ_Hopf and κ_SL, periodic orbits arise in a perfect-foresight equilibrium with no external shocks. For example, at κ = 14.9, the phase diagram displays a unique periodic orbit around the bad steady state, with two distinct initial values of Vt for any given Mt near the orbit — endogenous, perpetual oscillations without any exogenous driving force. Numerical experiments confirm that Mt = 0.23 admits two rational-expectations values of Vt (2.09 and 3.55) on the saddle path alone, illustrating abundant indeterminacy even at the endpoint.

Q7. How does the paper simulate the clustered default pattern and what is the role of the “ugly” equilibrium?

The paper constructs a three-state Markov economy: “good” (high productivity Ah = 10, single saddle equilibrium, near-zero default rate), “bad” (low productivity Al = 9.34, saddle-path equilibrium, modestly elevated defaults), and “ugly” (low productivity, sink-path equilibrium, sharply elevated defaults). The ugly state is reached when, upon a productivity decline, firms adopt pessimistic expectations and the economy slides to the high-default sink instead of remaining on the low-default saddle path. Transition probabilities are set so that the average ugly-state duration is approximately 6 years and roughly 45% of periods are ugly, consistent with the regime-switching estimates. With Zmin = 0.2 and η = 15, the ugly-state default rate can reach approximately 12%, matching the Great Depression observation. The counterfactual experiment deletes the ugly state (pGU = 0) and resets pGB = 0.45: the resulting default rate stays close to zero with no cluster pattern, demonstrating that global dynamics (the ugly sink) rather than the fundamental shock alone generate the clustering.
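
The long-run share of time spent in each state follows from the stationary distribution of the three-state chain. The sketch below computes it by repeated multiplication for a hypothetical transition matrix: only the ugly-state stay probability (5/6, implying an average duration of 6 periods) is tied to the description above, and every other entry is invented for illustration and is not the paper's calibration.

```python
def stationary_distribution(P, iters=10_000):
    """Approximate the stationary distribution of transition matrix P
    by iterating dist <- dist @ P from a uniform starting point."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical transition matrix over (good, bad, ugly); rows sum to one.
P = [
    [0.85, 0.05, 0.10],
    [0.20, 0.70, 0.10],
    [1 / 6, 0.0, 5 / 6],
]

shares = stationary_distribution(P)
ugly_duration = 1.0 / (1.0 - P[2][2])
print([round(s, 4) for s in shares], round(ugly_duration, 2))
```

Setting the good-to-ugly entry to zero and reassigning its probability mass, as in the counterfactual described above, would make the ugly state unreachable from the good state; the same function can be rerun on the modified matrix to see how the long-run shares change.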

Q8. Can purely sentiment-driven cycles generate the clustered default pattern?

Section 6.2 fixes productivity at a low level (A = 9.53) and drives switches between the bad (saddle path) and ugly (sink path) states by pure sentiment shocks alone (πBU and πUB). The simulated default rate does spike upward when sentiment turns pessimistic, but the rises are generally more modest than in the combined fundamental-plus-sentiment exercise, and the default rate can no longer be characterized as countercyclical. The authors conclude that the realistic observed default cluster is the result of a combination of negative fundamental shocks and pessimistic sentiment shifts; either ingredient alone is insufficient to replicate all features of the data.

Q9. How does the collateral constraint on working capital create dynamic strategic complementarity?

Following Jermann and Quadrini (2012), Liu and Wang (2014), and Lian and Ma (2021), each operating firm must borrow to pay wages each period, subject to the constraint w_t n_jt ≤ θV_jt. Since V_t is forward-looking (the discounted present value of the firm’s monopoly profit stream), optimistic expectations about future output raise V_t, relax the borrowing constraint, allow firms to hire more labor and produce more output today, and thereby validate optimism. This intertemporal complementarity means that the equilibrium is sensitive not only to current fundamentals but also to beliefs about the future, opening the channel for sentiment-driven multiple equilibria and self-fulfilling cycles.

Q10. What is the policy remedy for the bad equilibrium, and how does it work?

Proposition 7 establishes that a countercyclical lump-sum-tax-financed subsidy to non-defaulting entrants, tr(V_t) = τ(VG − V_t), with τ exceeding a computable threshold, eliminates the bad steady state. The subsidy works by effectively raising the value of continuing for a firm at any given V_t and M_t, shifting the Ṁ_t = 0 locus downward until it lies below the V̇_t = 0 locus everywhere in the relevant range, eliminating the second intersection and leaving only the good saddle-path equilibrium. The numerical illustration uses parameters from Section 6 with A = 9.67 and τ = 1/3 to demonstrate that the bad steady state vanishes and the phase diagram has a single equilibrium. The subsidy is self-limiting: in normal conditions when firm value is already high (V_t ≈ VG), the transfer is near zero.

Q11. How does this paper differ from Cui and Kaas (2021)?

Cui and Kaas (2021) show default cycles from self-fulfilling beliefs in a fully competitive firm environment, focusing on intertemporal default coordination. The present paper differs in three respects. First, firms engage in monopolistic competition under CES preferences, and the main novel mechanism is cross-firm default contagion through the demand externality, which can produce multiple equilibria even in a static setting, without any intertemporal coordination. Second, the paper examines the joint role of fundamental shocks and aggregate-demand externalities, showing that multiple equilibria arise only in the presence of sufficiently low productivity (A̲ < A < Ā), making indeterminacy contingent on external fundamentals rather than structural parameters alone. Third, the continuous-time framework with full global analysis via Bogdanov-Takens bifurcation allows characterization of periodic orbits and of the interaction of the ugly sink path with Markov productivity regimes, dynamics not covered in Cui and Kaas (2021).

Q12. What is the markup prediction of the model, and is it consistent with empirical evidence?

Under Dixit-Stiglitz CES with elasticity σ, the equilibrium markup of each intermediate good equals σ/(σ–1) at the firm level. However, the measured gross markup — which includes the effective collateral constraint — is predicted to comove positively with the default rate in the model, and hence the markup is countercyclical. The paper notes this is consistent with the well-documented empirical regularity in Bils (1987) and Rotemberg and Woodford (1999). Additionally, the model replicates the finding in Gilchrist and Zakrajšek (2012) that a low default rate is associated with a high firm entry rate.

Key Concepts

Demand Externality (Dixit-Stiglitz type). In the paper’s sense, this is the mechanism by which individual firms’ revenues depend on aggregate output Yt through the CES aggregator: each firm’s gross revenue is y_jt^(1–1/σ) · Y_t^(1/σ). Each firm takes Yt as given, but the aggregation of all firms’ output determines Yt. This creates a positive spillover: more operating firms raise aggregate output, which raises each firm’s revenue, and vice versa. The paper uses this as the central transmission channel for self-fulfilling defaults, in contrast to prior literature that emphasized debt networks or asymmetric information contagion.
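A minimal sketch of the spillover, assuming only the revenue expression quoted above. The elasticity σ = 5 and the output levels are hypothetical values chosen for illustration; they are not the paper's calibration.

```python
def firm_revenue(y_j, aggregate_output, sigma):
    """Gross revenue y_j^(1 - 1/sigma) * Y^(1/sigma) under CES demand."""
    return y_j ** (1 - 1 / sigma) * aggregate_output ** (1 / sigma)


SIGMA = 5.0  # hypothetical elasticity of substitution

# Hold the firm's own output fixed and vary aggregate output Y.
for big_y in (0.5, 1.0, 2.0):
    print(f"Y = {big_y:3.1f}  ->  revenue = {firm_revenue(1.0, big_y, SIGMA):.4f}")
```

With the firm's own output held fixed, revenue rises with Y. This is the channel through which other firms' defaults, by lowering Y, reduce the revenue of a surviving firm.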

Self-Fulfilling Default Cycle. A dynamic equilibrium path in which pessimistic expectations about aggregate output are validated: if firms anticipate that more other firms will default (lowering Yt), their own continuation value Vt falls, raising the probability that their idiosyncratic liquidity shock will exceed the default threshold, increasing actual defaults, further lowering Yt, and so on. The paper distinguishes this from shock-amplifier stories by constructing a model with multiple rational-expectations equilibria in which the aggregate default rate is determined in part by initial beliefs.

Bogdanov-Takens Bifurcation. A mathematical tool for global dynamics analysis applied to two-dimensional continuous-time systems. In the paper, it is used to characterize system behavior when the parameters (κ, A) are near the point (κ̄, Ā) at which the Jacobian has two zero eigenvalues. Near this point, the system can exhibit saddle-loop bifurcations, Hopf bifurcations, homoclinic orbits, and stable or unstable periodic orbits — all of which are invisible to local linearization analysis. The paper uses this to establish that indeterminacy is more pervasive than local analysis suggests.

Good / Bad / Ugly Steady States. In the paper’s three-regime framework: the “good” state is the unique saddle-point equilibrium under high productivity Ah, with near-zero default rates; the “bad” state is the saddle-path equilibrium under low productivity Al, with modestly elevated defaults; the “ugly” state is the sink-path equilibrium under low productivity, characterized by self-fulfilling high default rates (up to ~12%). The ugly state is reached only when pessimistic sentiment coincides with the low-productivity regime, and it is the ugly state that generates the cluster pattern in simulation.

Collateral Constraint on Working Capital. The firm-level borrowing constraint w_t n_jt ≤ θV_jt, where θ is the collateral ratio and V_jt is the firm’s continuation value. This constraint means that higher expected future profits, by raising Vt, relax the current borrowing limit, increase current labor demand and output, and create dynamic strategic complementarity between current and future production. It is this constraint, combined with the CES demand externality, that makes the dynamical system two-dimensional and generates the non-monotonic V̇t = 0 locus.

Global Indeterminacy. The existence, given an initial state variable Mt, of multiple equilibrium trajectories — each satisfying all equilibrium conditions including transversality — that converge to different steady states or follow periodic paths. In the paper, global indeterminacy arises even when the system is locally determinate (e.g., in the saddle-source case): trajectories diverging from near the source steady state can converge to the saddle steady state along multiple paths, none of which is detectable by local linearization.

Periodic Orbit (Endogenous Cycle). In the paper, a closed trajectory in the (Mt, Vt) phase plane that the economy follows indefinitely in perfect-foresight equilibrium without any exogenous shocks. Such orbits exist for κ ∈ (κ_Hopf, κ_SL), are stable if S < 0 and unstable if S > 0 (where S is a computable quantity defined in Equation A.13). Their existence demonstrates that business cycles can arise purely from internal forces — the demand externality and borrowing constraint — consistent with the view in Beaudry, Galizia, and Portier (2020).

Aggregation and the Estimation of Quality Change

Mon, 01 Jan 0001 00:00:00 +0000

Errico and Lashkari address two intertwined problems in the measurement of aggregate price indices: how to account for quality change and variety entry/exit when the demand system is not CES, and how to identify flexible demand systems from prices and market shares alone when supply and demand shocks are correlated. The paper makes a theoretical contribution and a methodological one, then applies both to the measurement of US import price inflation over 1989–2018.

The theoretical contribution generalizes the unified CES price index of Redding and Weinstein (2020a) and the Feenstra (1994) variety correction to the full class of smooth, invertible demand systems. The key insight is that the contribution of quality change to the aggregate price index depends on heterogeneous cross-product elasticities of substitution, not a single scalar as in the CES case. For practical implementation, the paper specializes to the Homothetic with Aggregator (HA) family of demand systems — which includes Kimball (1995), CRESH (Hanoch, 1971), and HSA (Matsuyama and Ushchev, 2017) — showing that within this family cross-product elasticities collapse to product-level elasticities, dramatically reducing dimensionality. The resulting approximate price index (Proposition 2) weights each product by its love-of-variety index 1/(epsilon_it − 1), departing from the uniform CES weighting.

The methodological contribution is a dynamic panel (DP) identification strategy that exploits the Markov structure of quality shocks. The paper assumes that innovations to product quality are mean-zero conditional on lagged prices. Under flexible pricing, firms maximize current-period profits without regard to future demand shocks, so lagged prices are valid instruments for current prices. This permits identification of rich demand systems without external cost instruments and without the conventional assumption of uncorrelated supply and demand shocks. The conventional Feenstra–Broda–Weinstein (FBW) approach imposes zero correlation between quality shocks and prices; the paper shows that when quality and marginal cost are positively correlated, FBW produces downward-biased elasticity estimates (endogeneity bias).

The empirical application constructs a dataset covering 155 time-consistent 5-digit NAICS industries over 1989–2018, matching US customs import data with domestic production data and treating country-of-origin varieties as the unit of observation. The paper estimates both CES and Kimball demand systems using the DP approach and compares them to FBW estimates.

Key quantitative findings: First, DP-estimated CES elasticities are larger on average than FBW estimates (weighted mean 5.99 vs. 4.62), confirming a downward endogeneity bias in conventional methods. Second, the mean of the Kimball elasticities exceeds the CES estimates: the Kimball distribution has a mean of 17.0 and a median of 4.70, although the industry-level weighted mean is 3.11 for Kimball vs. 5.99 for CES. This gap reflects a heterogeneity bias: CES understates the dispersion of elasticities and thereby understates the love-of-variety index of the base (domestic) product, whose market share is declining. Third, quality improvements in imported goods reduced the US import price index by approximately 20.2 percentage points cumulatively (0.67 p.p. annually) under Kimball demand, and 15.9 percentage points cumulatively (0.53 p.p. annually) under CES demand, over 1989–2018. The headline figure cited in the abstract is approximately 0.7 p.p. annually. The aggregate import price index (price plus quality components combined) fell by 8.25 p.p. cumulatively under Kimball and 4.01 p.p. under CES, compared to a BEA PCE index increase of 57.8 p.p. over the same period. Sectorally, machinery and electrical equipment account for roughly 60% of total quality gains (~200 p.p. cumulative). By country, China accounts for approximately 35% of cumulative quality gains, with other non-OECD countries collectively contributing ~59%, and China’s quality upgrading accelerating after WTO accession.

Validation using US automobile market data (1980–2018) confirms the DP identification assumption: controlling for current product characteristics, future characteristics are uncorrelated with current prices. The DP approach produces elasticity estimates and quality change measures similar to those obtained using real exchange rate cost-shock instruments, and the Kimball demand closely matches mixed logit (BLP) estimates of both price elasticities and price indices. CES estimates exhibit a measurable downward heterogeneity bias in this validation setting, which the paper traces theoretically and empirically to a positive covariance between demand elasticities and price volatility across products.

Scope conditions: results apply to homothetic (income-invariant) demand; nonhomothetic extensions are provided as a generalization (Proposition 4) but not the primary focus. The import price index measures the cost of imports conditional on given domestic consumption; it does not capture full consumption-side welfare effects including substitution away from domestic varieties.

Q1: What is the core theoretical result on price index measurement beyond CES? Proposition 1 shows that for any smooth, invertible demand system satisfying the connected substitutes property, the change in the log aggregate price index can be approximated as a weighted sum of log price changes and log expenditure share changes, with the expenditure share changes premultiplied by the inverse of the matrix Psi_t capturing cross-product elasticities of substitution. In the CES special case this reduces to the scalar (1/(sigma−1)) weight of the Redding-Weinstein (2020a) CUPI. The key departure in general demand is that the weight applied to each product’s expenditure share change is heterogeneous and depends on the full matrix of cross-product substitutabilities, not a single constant.

Q2: How does the HA (Homothetic with Aggregator) family simplify the theoretical results? For HA demand, which nests Kimball, CRESH, and HSA, Lemma 1 establishes that cross-product elasticities sigma_ij depend only on product-level elasticities epsilon_i through simple analytic formulas (e.g., epsilon_i * epsilon_j / epsilon-bar for homothetic demand with direct implicit additivity, HDIA, the class that contains Kimball), reducing the estimation problem from an N×N matrix to a vector of N scalars. Proposition 2 then gives an approximate price index in which each product’s expenditure share change is weighted by its love-of-variety index 1/(epsilon_it − 1), rather than a common CES scalar. This is the operative formula for the Kimball application.

Q3: What is the endogeneity bias in conventional elasticity estimation and how large is it? Conventional FBW methods assume supply and demand shocks are uncorrelated; when quality improvements are positively correlated with product prices (e.g., higher-quality goods command higher prices and also have higher marginal costs), FBW estimates are biased downward. The paper documents this: for CES demand, the DP-estimated weighted mean elasticity is 5.99 versus 4.62 under FBW, and for median estimates the DP value is 4.27 versus 2.58 under FBW, across 155 industries. The bias matters because underestimated elasticities imply underestimated quality changes and a smaller quality correction to the price index.

Q4: What is the heterogeneity bias and how does it differ from the endogeneity bias? Even after correcting for endogeneity, CES demand imposes a single elasticity per industry, ignoring the cross-product distribution. The paper shows that the CES estimate is an average that does not correctly capture the behavior of the base product (the domestic US variety) whose market share is declining. Because the domestic variety tends to have a lower elasticity than the import average, CES understates this product’s love-of-variety index and thereby understates the quality correction attributable to rising import shares. Theoretically and empirically (Appendix E.4), this bias is larger when demand elasticities covary positively with price volatility across products.

Q5: What is the dynamic panel identification assumption and why does it hold under flexible pricing? The paper assumes that quality shock innovations u_it are mean-zero conditional on lagged log prices: E[u_it | log p_{i,t−1}] = 0. Under flexible pricing, firms maximize current-period profits using current variables only; current prices are determined by current quality but are not chosen in anticipation of future quality shocks. Therefore lagged prices are uncorrelated with future quality innovations, making them valid instruments for current prices. This assumption is validated empirically in the automobile market: controlling for current product characteristics (horsepower, weight, fuel economy), future characteristics are not correlated with current prices.
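The logic of the moment condition can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions and is not the paper's estimator: demand is a stylized log-linear equation log q = −σ log p + φ, quality φ follows an AR(1) whose persistence ρ is treated as known, log cost is an independent AR(1), and price loads positively on quality. All function names and parameter values are hypothetical.

```python
import random


def simulate_panel(n_products=200, n_periods=50, sigma=4.0, rho=0.3,
                   rho_cost=0.9, gamma=1.0, shock_sd=0.1, seed=0):
    """Return rows (log_q, log_p, lag_log_q, lag_log_p) for a toy panel."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_products):
        # Start both AR(1) states from their stationary distributions.
        phi = rng.gauss(0.0, shock_sd / (1 - rho ** 2) ** 0.5)
        cost = rng.gauss(0.0, shock_sd / (1 - rho_cost ** 2) ** 0.5)
        prev = None
        for _ in range(n_periods):
            phi = rho * phi + rng.gauss(0.0, shock_sd)    # quality innovation u_it
            cost = rho_cost * cost + rng.gauss(0.0, shock_sd)
            log_p = cost + gamma * phi                    # price rises with quality
            log_q = -sigma * log_p + phi                  # stylized log demand
            if prev is not None:
                rows.append((log_q, log_p, prev[0], prev[1]))
            prev = (log_q, log_p)
    return rows


def estimate_elasticity(rows, rho):
    """Return (IV, OLS) estimates of sigma from quasi-differenced data."""
    zy = zx = xy = xx = 0.0
    for log_q, log_p, lag_q, lag_p in rows:
        y = log_q - rho * lag_q   # equals -sigma * x + u_it
        x = log_p - rho * lag_p   # contains gamma * u_it, hence endogenous
        z = lag_p                 # instrument: lagged log price
        zy += z * y
        zx += z * x
        xy += x * y
        xx += x * x
    return -zy / zx, -xy / xx


rows = simulate_panel()
sigma_iv, sigma_ols = estimate_elasticity(rows, rho=0.3)
print(f"true sigma = 4.00, IV = {sigma_iv:.2f}, OLS = {sigma_ols:.2f}")
```

In this toy setup the least-squares estimate is pulled below the true elasticity because the current price moves with the current quality innovation, while the lagged price is uncorrelated with that innovation and recovers σ. The instrument is informative here only because cost and quality have different persistence, which is a feature of the simulation rather than a claim about the paper's data.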

Q6: What are the headline findings on quality change in US import prices? Under Kimball demand, quality improvements in imported goods reduced the US import price index by 20.2 percentage points cumulatively over 1989–2018, equivalent to 0.67 p.p. annually (the abstract rounds this to approximately 0.7 p.p. annually). Under CES demand, the quality contribution is 15.9 p.p. cumulatively (0.53 p.p. annually). The aggregate import price index combining price and quality changes fell by 8.25 p.p. under Kimball and 4.01 p.p. under CES over the same period. These figures imply that official import price statistics substantially overstate import price inflation by failing to account for quality improvements.
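The annual figures can be reproduced from the cumulative ones by simple division. The sketch below assumes a linear average over 30 sample years (1989 to 2018 counted inclusively); that averaging convention is an assumption made here because it matches the reported numbers, not a documented detail of the paper.

```python
SAMPLE_YEARS = 30  # assumption: 1989-2018 counted inclusively

cumulative_quality_pp = {"Kimball": 20.2, "CES": 15.9}

annual_quality_pp = {
    demand: total / SAMPLE_YEARS for demand, total in cumulative_quality_pp.items()
}

for demand, rate in annual_quality_pp.items():
    print(f"{demand}: {rate:.2f} p.p. per year")
```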

Q7: Which sectors and countries drive the quality gains? Machinery and electrical equipment account for approximately 60% of total cumulative quality gains, with roughly 200 p.p. cumulative quality improvement in that sector. Computer and peripheral equipment (NAICS 3341) is a notable contributor — the official import-to-producer price ratio shows a nearly five-fold increase between 1989 and 2018, but after quality adjustment this ratio reverses direction. By country of origin, China accounts for approximately 35% of cumulative quality gains; other non-OECD countries collectively contribute approximately 59%; OECD countries contribute approximately 7%. China’s quality upgrading is documented to accelerate following its WTO accession.

Q8: Why does CES understate the quality correction relative to Kimball? The primary mechanism is that the US domestic variety — which serves as the numeraire for quality measurement — has a declining market share over the sample period. In Kimball demand, products with declining market shares are assigned lower elasticities (higher love-of-variety indices), amplifying the quality correction associated with import share gains. CES imposes a uniform elasticity, failing to capture this asymmetry. The paper shows that the key driver of the CES-Kimball gap in the import price index is CES underestimating the love-of-variety index of the base domestic product.

Q9: How is the identification approach validated in the automobile market? Using the Berry-Levinsohn-Pakes dataset extended by Grieco et al. (2024) for 1980–2018, the paper first verifies empirically that future product characteristics (horsepower, weight, fuel efficiency) are uncorrelated with current prices after controlling for current characteristics. It then compares DP estimates for both CES and Kimball demand against estimates obtained using real exchange rate (RER) variation as a cost-shock instrument, finding similar results in both cases. Finally, it compares Kimball and CES estimates against mixed logit (BLP) demand: Kimball closely matches BLP price elasticities and implied quality changes, while CES shows a downward heterogeneity bias.

Q10: What does the automobile market validation imply for the import price index methodology? Since Kimball demand matches the richer mixed logit demand in the auto setting — where product characteristics are observed — the validation provides evidence that Kimball demand serves as a good approximation to rich heterogeneous-elasticity models when characteristics are unavailable. The paper constructs price indices for the US auto industry based on mixed logit, mixed CES, Kimball, and standard CES, and shows that the Kimball index is closer to the mixed logit and mixed CES indices than is the standard CES index.

Q11: How does the paper handle product entry and exit? Proposition 3 generalizes Proposition 1 to accommodate product entry and exit. The expression includes a variety correction analogous to Feenstra (1994) but generalized to non-CES settings via the mean love-of-variety index of entering and exiting products. In the CES special case this reduces exactly to the Feenstra (1994) correction. In the empirical application to US imports, entry and exit of country-of-origin varieties within industries is a relevant margin given the expansion of trading partners over the sample.

Q12: How does the paper relate to Redding and Weinstein (2020a)? Redding and Weinstein (2020a) derive a price index formula under CES demand that accounts for taste shocks, applied to US retail scanner data where quality is constant at the barcode level. The present paper generalizes their CUPI formula beyond CES to general and HA demand systems, and extends their identification strategy to settings where demand changes partly reflect quality changes rather than pure taste shocks. The paper also shows that the CES assumption used in Redding-Weinstein may overstate the contribution of taste shocks to cost-of-living indices, since part of the expenditure share variation attributed to taste shocks under CES would be reassigned under heterogeneous-elasticity demand.

Q13: Does the paper address welfare implications beyond the import price index? The paper explicitly notes that the import price index does not capture the full consumption-side welfare effects of rising imports, since gains from lower import prices may be partly offset by substitution away from domestic varieties. The paper also notes that it abstracts from nonhomotheticity (income effects), pointing to Jaravel and Lashkari (2021) for that extension. The primary welfare-relevant quantity reported is the quality-adjusted change in the cost of the imported goods basket, which is the import price index in the conventional sense.

Key Concepts

Love-of-variety index: For a product i, defined as 1/(epsilon_it − 1) where epsilon_it is the product-level demand elasticity in an HA demand system. It measures the welfare value of having access to that variety and serves as the weight applied to expenditure share changes in the generalized price index formula (Proposition 2). In the CES special case all products share the same love-of-variety index 1/(sigma−1).
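A small numerical sketch of the definition follows. The function name is hypothetical; the two CES elasticities are the weighted means reported above, and the product-level elasticities in the second loop are made-up values used only to show how the weight varies.

```python
def love_of_variety(elasticity):
    """Love-of-variety index 1 / (epsilon - 1); requires epsilon > 1."""
    if elasticity <= 1.0:
        raise ValueError("the index is defined only for elasticities above 1")
    return 1.0 / (elasticity - 1.0)


# CES: one common weight per industry (weighted-mean estimates from the text).
for label, sigma in (("DP", 5.99), ("FBW", 4.62)):
    print(f"CES {label}: sigma = {sigma:.2f}, index = {love_of_variety(sigma):.3f}")

# Heterogeneous elasticities: each product gets its own weight (hypothetical values).
for eps in (2.0, 4.0, 8.0):
    print(f"epsilon = {eps:.1f}, index = {love_of_variety(eps):.3f}")
```

The index falls as the elasticity rises, so a product with a low elasticity carries a large weight on its expenditure share change.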

Homothetic with Aggregator (HA) demand: A family of income-invariant (homothetic) demand systems — including Kimball (1995), CRESH (Hanoch, 1971), and HSA (Matsuyama and Ushchev, 2017) — in which preferences are represented by a utility function with a specific aggregator structure. The key property exploited in the paper is that cross-product elasticities of substitution sigma_ij depend only on product-level elasticities epsilon_i through simple analytic formulas, reducing the dimensionality of the estimation problem from an N×N matrix to N scalars.

Endogeneity bias (in elasticity estimation): Downward bias in estimated elasticities of substitution arising from a positive correlation between product quality shocks and prices. When higher-quality products command higher prices and also have higher marginal costs, conventional methods (FBW) that assume zero correlation between supply and demand shocks will attribute part of the price variation to supply, underestimating how much demand responds to price. The paper documents this bias as the gap between DP and FBW estimates.

Heterogeneity bias (in elasticity estimation): Additional downward bias in CES elasticity estimates relative to the mean of Kimball elasticities, arising from CES imposing a single elasticity per industry when the true elasticities are heterogeneous across products. The bias is stronger for differentiated products and is theoretically traced to a positive covariance between demand elasticities and price volatility across products.

Dynamic panel (DP) identification: The paper’s proposed identification strategy, which exploits the Markov structure of quality shocks. The key moment condition is that quality shock innovations are mean-zero conditional on lagged prices, which holds under flexible pricing. Lagged prices (and higher-order lags and nonlinear transformations) serve as instruments for current prices, permitting identification of demand parameters without external cost instruments.

Quality shock (phi_it): An unobserved product characteristic that shifts demand for product i at time t, defined through the utility function as a scalar multiplying the quantity consumed. Quality is identified from residual demand — the component of demand not explained by price — following the approach of Khandelwal (2010) and Hallak and Schott (2011). The paper models quality shocks as following a stationary AR(1) process with product-specific means.

Unified CES price index (CUPI): The price index formula of Redding and Weinstein (2020a) for CES demand, which decomposes the aggregate price change into a price component (expenditure-share-weighted price changes) and a quality/taste component proportional to (1/(sigma−1)) times expenditure share changes. The present paper’s Proposition 2 generalizes CUPI to HA demand by replacing the scalar 1/(sigma−1) with product-specific love-of-variety indices.

AI and task efficiency

Mon, 01 Jan 0001 00:00:00 +0000

AI can improve decisions, raise firm productivity, and accelerate human capital growth through its effect on signal quality in problem-solving tasks, but the consequences are heterogeneous across the skill distribution and depend on how AI changes the hierarchy within firms. This paper proposes a framework in which AI improves the accuracy of the signals that guide human decisions—individually and in groups—and derives implications for firm organization, wages, and productivity. It also examines preliminary evidence: a cross-sectional regression of changes in TFP growth (2024 versus 2022) on sectoral AI exposure (Eisfeldt et al. 2024) for Compustat firms yields a positive relationship, statistically significant at the 10% level at the 3-digit NAICS sector level and at the 5% level at the firm level, with a slope coefficient of 0.206 for the firm-level regression. The paper compares AI to earlier general purpose technologies (GPTs)—electricity and information technology—finding that if there is a productivity delay for AI it appears shorter than the five- and eight-year delays documented for electrification and IT.

In depth

Q1. What is the theoretical framework linking AI to decisions and productivity?

The paper models several mechanisms through which AI may improve outcomes by raising the accuracy of signals that guide problem-solving: when signal accuracy rises, individuals and groups make better decisions, potentially enabling lower-level workers to handle more complex tasks and reducing the need for expensive higher-level solutions. For example, if AI allows managers to understand problems faster, they can handle more problems at a given time, potentially reducing demand for specialized expert judgment at lower hierarchy levels. Alternatively, if AI allows lower-level workers (clerks, nurses) to handle tasks previously requiring specialists (partners, doctors), the demand for specialists may fall and the wage premium for top-tier workers may narrow. The direction of the effect depends on whether AI is a better complement to high-skill or to low-skill tasks.

Q2. What does the empirical evidence show about AI’s current productivity effects?

A cross-sectional regression using 5,009 Compustat firm-level observations for 66 three-digit NAICS sectors finds a positive and statistically significant relationship between sectoral AI exposure in 2022 (from Eisfeldt et al. 2024) and the change in annual TFP growth between 2024 and 2022, with a sector-level slope coefficient that is statistically significant at the 10% level. The firm-level regression (including 3-digit NAICS fixed effects) yields a slope of 0.206 on AI exposure, significant at the 5% level (t-statistic 2.08), with R² = 0.20 and 1,996 observations. The relationship is absent when examining TFP growth levels in any individual year between 2019 and 2022, consistent with AI’s macroeconomic effects only becoming measurable after the release of GPT-4 in March 2023.
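The reported slope and t-statistic pin down the implied standard error and an approximate p-value. The sketch below uses a normal approximation to the t distribution, which is an assumption made here for simplicity and is reasonable only because the sample is large.

```python
import math

slope = 0.206   # firm-level coefficient on AI exposure (from the text)
t_stat = 2.08   # reported t-statistic

implied_se = slope / t_stat
# Two-sided p-value under a standard normal approximation: 2 * (1 - Phi(|t|)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"implied standard error = {implied_se:.3f}")
print(f"approximate two-sided p-value = {p_value:.3f}")
```

The p-value falls between 0.01 and 0.05, in line with the statement that the coefficient is significant at the 5% level.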

Q3. How does AI compare with prior general purpose technologies?

The paper relates AI to the earlier GPT literature, noting that productivity growth tended to be lower at the start of both the electrification and IT eras—with delays of approximately five and eight years respectively before productivity gains became measurable—and that if there is a similar delay for AI it appears shorter based on the preliminary 2024 data. This comparison suggests that AI may be a GPT with unusually rapid diffusion or a shorter learning curve, though the authors caution that the evidence is still preliminary and depends on the dating of AI’s “arrival.”

Q4. Why might AI effects differ across the hierarchy within firms?

AI’s effect on a firm hierarchy depends on whether it complements or substitutes for skills at each level: if AI primarily helps managers (by speeding problem diagnosis), it may reduce demand for specialized lower-level workers; if it primarily helps clerks (by enabling them to handle more complex documents), it may reduce demand for partners while raising demand for lower-level staff. The paper argues that the distributional consequences—whether AI raises or lowers wage dispersion—depend on this complementarity/substitutability pattern, which likely varies by industry, as illustrated by the contrasting cases of automotive assembly (AI may help managers but not line workers) and law firms (AI may help clerks handle more complex work).

Key concepts

AI as signal accuracy improvement: the paper’s framework for thinking about AI’s effect on decision quality, in which AI raises the precision of the signals that guide problem-solving and thereby leads to better individual and group decisions regardless of the specific mechanism.

general purpose technology (GPT) delay: the empirical phenomenon documented by Jovanovic and Rousseau (2005) in which productivity growth is lower at the start of a major GPT era before eventually accelerating; the paper examines whether AI exhibits the same pattern, finding that any delay appears shorter than for electrification (five years) or IT (eight years).

An endogenous gridpoint method for distributional dynamics

Mon, 01 Jan 0001 00:00:00 +0000

This paper introduces the Distributional Endogenous Gridpoint Method (DEGM), a novel numerical technique for solving the distributional dynamics that arise in heterogeneous agent macroeconomic models. The core problem is how to efficiently update the distribution of agents over the state space as the economy evolves. The dominant existing approach — the “lottery method” of Young (2010) — discretizes the state space and represents policy functions as lotteries over nearby gridpoints, producing a transition matrix that is linear in optimal policies. This linearity renders the lottery method incapable of capturing nonlinear effects in distributional dynamics, a limitation that becomes quantitatively significant for higher-order perturbation solutions.
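To make the linearity concrete, here is a minimal one-dimensional sketch of a lottery-style update in the spirit of Young (2010). It is an illustration only: the grid, the initial mass, and the savings rule are hypothetical, and a single income state is assumed.

```python
from bisect import bisect_right


def lottery_weights(grid, a_next):
    """Split a choice a_next between its two neighbouring gridpoints."""
    a_next = min(max(a_next, grid[0]), grid[-1])          # clamp to the grid
    k = min(bisect_right(grid, a_next) - 1, len(grid) - 2)
    w_high = (a_next - grid[k]) / (grid[k + 1] - grid[k])  # linear in a_next
    return k, 1.0 - w_high, w_high


def lottery_step(grid, mass, policy):
    """Push the mass at each gridpoint forward through the lottery."""
    new_mass = [0.0] * len(grid)
    for a, m in zip(grid, mass):
        k, w_low, w_high = lottery_weights(grid, policy(a))
        new_mass[k] += m * w_low
        new_mass[k + 1] += m * w_high
    return new_mass


grid = [0.0, 1.0, 2.0, 3.0, 4.0]
mass = [0.2] * 5                        # hypothetical initial distribution


def policy(a):
    return 0.25 + 0.9 * a               # hypothetical savings rule


new_mass = lottery_step(grid, mass, policy)
print([round(m, 3) for m in new_mass])
```

Because each weight is a linear function of the chosen asset level within a grid interval, the update conserves total mass and the mean of next-period assets, but second and higher derivatives of the transition with respect to the policy are zero by construction.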

DEGM extends Carroll’s (2006) endogenous gridpoint method from individual optimization to the distributional level. Rather than discretizing the density and integrating forward, DEGM works directly on the cumulative distribution function (CDF). The key insight is that when the policy function is monotone — as savings functions typically are — the endogenous gridpoints generated by the policy function trace out exact points on the post-policy CDF without requiring integration. Specifically, if A*_{i,j} = a*(A_i, Y_j) are optimal asset choices from grid point A_i at income Y_j, then the CDF values at those endogenous points are known analytically as F_t(A_i | Y_j). An interpolant using shape-preserving splines constructed through these points allows evaluation of the updated CDF at any point without integration. The income transition step is handled separately via standard quadrature over the discretized income process.
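The CDF update described above can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it uses piecewise-linear interpolation in place of shape-preserving splines, a single income state, and a strictly increasing policy with no binding borrowing constraint. The grid, the initial exponential CDF, and the savings rule are hypothetical.

```python
import math
from bisect import bisect_right


def degm_cdf_update(grid, cdf_on_grid, policy):
    """Return the post-policy CDF as a function, without any integration."""
    endo_points = [policy(a) for a in grid]   # endogenous gridpoints a*(A_i)
    if any(lo >= hi for lo, hi in zip(endo_points, endo_points[1:])):
        raise ValueError("policy must be strictly increasing on the grid")

    def updated_cdf(x):
        # Monotonicity implies P(a' <= a*(A_i)) = F(A_i), so the pairs
        # (a*(A_i), F(A_i)) are exact points on the new CDF.
        if x < endo_points[0]:
            return 0.0
        if x >= endo_points[-1]:
            return cdf_on_grid[-1]
        k = bisect_right(endo_points, x) - 1
        w = (x - endo_points[k]) / (endo_points[k + 1] - endo_points[k])
        return (1 - w) * cdf_on_grid[k] + w * cdf_on_grid[k + 1]

    return updated_cdf


grid = [0.1 * i for i in range(101)]
cdf_today = [1 - math.exp(-a) for a in grid]   # hypothetical initial CDF


def policy(a):
    return 0.2 + 0.9 * a                        # hypothetical savings rule


cdf_next = degm_cdf_update(grid, cdf_today, policy)
for x in (0.5, 1.0, 3.0):
    exact = 1 - math.exp(-(x - 0.2) / 0.9)      # F evaluated at the pre-image of x
    print(f"x = {x:.1f}: interpolated = {cdf_next(x):.4f}, exact = {exact:.4f}")
```

In this sketch the only approximation error comes from interpolating between the endogenous points; the CDF values at those points are exact.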

The paper demonstrates DEGM’s performance with two applications. First, in the Aiyagari (1994) economy, DEGM converges to the stationary equilibrium an order of magnitude faster than the lottery method in terms of gridpoints. At nk=40 gridpoints, the lottery method deviates from the benchmark capital stock by 1.72% and the wealth Gini by 2.24% (for nh=5), while DEGM deviates by only 0.09% and 0.12% respectively. Both methods converge to the same solution as the number of gridpoints increases, but DEGM reaches this limit far faster.

Second, the authors introduce a Krusell-Smith style model with aggregate investment risk (capital depreciation shocks calibrated following Barro, 2006, as a 0.4% quarterly probability of 7.5% capital destruction causing a 10% annual GDP drop) as a new baseline for studying aggregate nonlinearities with household heterogeneity. This model overcomes the near-linearity of aggregate capital dynamics in the original Krusell-Smith specification. Using a third-order perturbation solution with DEGM, aggregate investment risk lowers the capital stock by 5 to 11 basis points and increases wealth inequality by up to 11 basis points relative to the non-stochastic steady state, depending on idiosyncratic income risk calibration. The lottery method systematically mispredicts these effects: it always predicts a decrease in wealth inequality in the presence of investment risk, while DEGM predicts an increase. At third order, the lottery method predicts wealth Gini changes of +2.0 bp (persistent calibration) and -149.7 bp (transitory calibration), while DEGM predicts +10.7 bp and +2.1 bp respectively.
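For intuition about the disaster calibration, the quarterly probability can be converted to an annual one. The sketch assumes that disaster events are independent across quarters, which is an assumption made here for the arithmetic and not a statement about the paper's shock process.

```python
quarterly_prob = 0.004  # 0.4% per quarter, from the text

# Probability of at least one disaster within four independent quarters.
annual_prob = 1 - (1 - quarterly_prob) ** 4

print(f"annual disaster probability = {annual_prob:.4%}")
```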

The mechanism for increased inequality under investment risk is heterogeneous: for less wealthy households the substitution effect dominates (they reduce saving more in response to risky returns), while for wealthy households the income effect is stronger and precautionary saving motives dominate. The lottery method, by making the distributional transition matrix linear in policies, zeros out the second derivative of the transition matrix with respect to the policy function, missing the term capturing how the density at the pre-image of each asset level is affected nonlinearly. DEGM’s cubic spline interpolant captures all nonlinearities up to third order, enabling economically meaningful results that qualitatively differ from lottery-method predictions on wealth inequality.

Q: What is the fundamental numerical problem that DEGM solves? A: Evolving the distribution of agents forward over time in heterogeneous agent models requires evaluating a Kolmogorov forward equation, which naively demands numerical integration. The lottery method avoids integration by discretizing the state space and expressing transitions as a linear matrix operation, but this forces the distributional dynamics to be linear in optimal policies. DEGM avoids integration by exploiting policy function monotonicity: the endogenous policy gridpoints are the interpolation nodes, so the CDF update requires only interpolation, not integration. This preserves nonlinear effects up to the order of the splines used.

Q: How does DEGM handle the borrowing constraint and the resulting mass point? A: Savings policy functions are typically weakly monotone: constant at the borrowing constraint for sufficiently poor households, then strictly monotone above a threshold. DEGM accommodates this by starting the endogenous grid at the EGM solution corresponding to the borrowing constraint (the threshold a_j above which the policy is strictly monotone), restoring strict monotonicity on the relevant domain. The mass point at the borrowing constraint is captured by evaluating F_t(a_j, Y_j). Echoes of the borrowing constraint diminish as the number of income states increases, and in practice 10 income gridpoints are sufficient to smooth them.

Q: How much faster does DEGM converge relative to the lottery method for the stationary equilibrium? A: In the Aiyagari economy with nk=40 asset gridpoints, the lottery method’s capital stock deviates from the benchmark by 1.72% and the wealth Gini by 2.24% (nh=5), while DEGM deviates by only 0.09% and 0.12% respectively — roughly a 20-fold improvement in accuracy for the same gridpoints. At nk=80, the lottery method still shows 0.56%/0.78% deviations while DEGM shows 0.03%/0.00%. Although for a fixed number of gridpoints the lottery method is faster in wall-clock time (0.35s vs 0.82s at nk=40, nh=20), DEGM is faster for a given level of accuracy because it requires far fewer gridpoints.

Q: Why does the lottery method fail at higher-order perturbations? A: The lottery method constructs its transition matrix as a piecewise linear function of the optimal policy a*, so its second derivative with respect to a* is zero. As a result, it misses the second term in the second-order derivative of the end-of-period CDF: the term involving the derivative of the density at the pre-image of each asset level times the squared linear policy effect. This missing nonlinearity becomes quantitatively important at second and third order. DEGM’s cubic Hermite spline interpolant captures all nonlinearities up to third order, allowing it to correctly represent how the distribution responds nonlinearly to aggregate shocks.

Q: What does the paper find about the effect of aggregate investment risk on the capital stock and wealth inequality? A: Using a third-order perturbation solution with DEGM, aggregate investment risk lowers the capital stock by 5 to 11 basis points from the non-stochastic steady state, depending on whether income risk is persistent or transitory (DEGM third-order: -4.7 bp persistent, -11.4 bp transitory). Wealth inequality increases by up to 11 basis points (DEGM third-order: +10.7 bp persistent, +2.1 bp transitory). The lottery method diverges dramatically at third order, predicting Gini changes of +2.0 bp and -149.7 bp for the persistent and transitory calibrations respectively, compared to DEGM’s +10.7 bp and +2.1 bp.

Q: What is the mechanism through which aggregate investment risk increases wealth inequality? A: The mechanism operates through heterogeneous saving responses across the wealth distribution. For less wealthy households, capital income is a small share of total income, so the substitution effect of risky returns dominates: higher investment risk reduces their incentive to save. For wealthy households, capital income is central, so the income effect is stronger and precautionary saving motives intensify. A capital depreciation shock upon realization compresses the wealth distribution, but the risk of such a shock increases inequality on average because it disproportionately reduces saving among poorer households.

Q: How do the authors extend DEGM to handle aggregate risk and higher-order perturbations? A: The authors follow Reiter (2009) in including the distribution and value functions in the state space, defining a nonlinear difference equation over these objects. Higher-order perturbation of this system proceeds using the algorithms of Andreasen et al. (2018) and Levintal (2017), with second-order terms solved via a generalized Sylvester equation using Kim et al.’s (2008) doubling algorithm. The implementation handles up to 3,200 variables at second order and 220 variables at third order. For the second-order solution, the Bayer-Luetticke (2020) state-space reduction and its refinement in Bayer et al. (2024) yield results identical to the full unreduced system.

Q: What is the state-space reduction procedure and how much does it compress the system? A: The full system uses 402 states and 412 controls (persistent calibration). A copula representation of the distribution reduces this to 213 states and 412 controls; adding DCT compression of the value function gives 213 states and 98 controls; further adding a factor representation from the first-order solution yields 111 states and 98 controls — a 75% reduction. The R-squared-like IRF statistic remains 1.00 across all reductions, and ergodic moments are identical (capital: 25.54, Gini: 0.61 for the persistent calibration).
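
The compression at each stage can be checked with simple arithmetic on the state and control counts quoted above. The sketch below only restates those counts; the step labels are shorthand introduced here.

```python
# (label, states, controls) as reported in the answer above
steps = [
    ("full system", 402, 412),
    ("copula", 213, 412),
    ("copula + DCT", 213, 98),
    ("copula + DCT + factors", 111, 98),
]

full_size = steps[0][1] + steps[0][2]         # 814 variables in the unreduced system

# Share of variables removed relative to the full system at each stage
reduction = {
    label: 1.0 - (states + controls) / full_size
    for label, states, controls in steps
}
final_reduction = reduction["copula + DCT + factors"]
```

The final system keeps 209 of 814 variables, a reduction of roughly three quarters, in line with the stated figure.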

Q: Does DEGM produce different first-order impulse responses than the lottery method? A: For first-order perturbations, DEGM and the lottery method converge to the same solution as the number of gridpoints increases, but DEGM converges faster. For the first-order dynamics of the wealth distribution (wealth Gini IRFs), DEGM reaches convergence with nk=40 gridpoints while the lottery method requires nk=160. For aggregate capital stock IRFs, both methods converge quickly at first order. Quantitative differences become significant only at second and higher orders.

Q: What calibration is used for the investment risk model? A: Capital depreciation deviates from its steady-state value by a shock with second moment sigma_delta = 0.005 and third moment tau_delta = 0.012. This corresponds to a 0.4% quarterly probability that a disaster destroys 7.5% of the capital stock and causes a 10% drop in annual GDP, consistent with the evidence in Barro (2006). The model is solved under both a persistent income calibration (beta=0.98, rho=0.98, sigma_epsilon=0.14, implied Gini=0.66) and a transitory income calibration (beta=0.99, rho=0.88, sigma_epsilon=0.18, implied Gini=0.42).
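
The link between the disaster description and the two moment parameters can be checked numerically. The sketch below rests on assumptions that the summary does not state: the shock is a two-point (Bernoulli) disaster, sigma_delta is read as the square root of the second central moment, and tau_delta as the cube root of the third central moment.

```python
p = 0.004    # quarterly disaster probability (from the text)
d = 0.075    # share of the capital stock destroyed in a disaster (from the text)

# Central moments of d * B, where B is Bernoulli(p) (assumed shock distribution)
second_moment = p * (1 - p) * d ** 2
third_moment = p * (1 - p) * (1 - 2 * p) * d ** 3

sigma_delta = second_moment ** 0.5            # assumed reading: standard deviation
tau_delta = third_moment ** (1 / 3)           # assumed reading: cube root of third moment
```

Under these assumptions the implied values are close to the quoted sigma_delta = 0.005 and tau_delta = 0.012.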

Distributional Endogenous Gridpoint Method (DEGM): A numerical method for evolving the joint CDF of agents over the state space by constructing an interpolant at endogenous gridpoints A*_{i,j} = a*(A_i, Y_j) — the optimal policy values — at which CDF values are known analytically as F_t(A_i | Y_j), thus updating the distribution through interpolation rather than integration and preserving nonlinearities up to the order of the spline.

Lottery Method (LM): Young’s (2010) standard technique that replaces the continuous distribution with a discrete counterpart and represents optimal policy functions as probability weights over nearby gridpoints, yielding a single transition matrix A* such that f_{t+1} = f_t * A*. The transition matrix is linear in optimal policies, which zeroes out the second derivative of the distributional dynamics with respect to policies and causes systematic misprediction of distributional dynamics under higher-order perturbation.
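
A minimal sketch of the lottery construction makes the linearity visible: each off-grid policy value is split between its two neighboring gridpoints with weights that are linear in the policy. The function names and the toy inputs are illustrative, and income states are omitted.

```python
from bisect import bisect_right

def lottery_matrix(grid, policy):
    """Row i spreads the mass at grid[i] over the two gridpoints bracketing policy[i]."""
    n = len(grid)
    T = [[0.0] * n for _ in range(n)]
    for i, a_next in enumerate(policy):
        a_next = min(max(a_next, grid[0]), grid[-1])     # keep the choice on the grid
        hi = min(max(bisect_right(grid, a_next), 1), n - 1)
        lo = hi - 1
        w_hi = (a_next - grid[lo]) / (grid[hi] - grid[lo])  # linear in the policy
        T[i][lo] += 1.0 - w_hi
        T[i][hi] += w_hi
    return T

def push_forward(f, T):
    """f_{t+1} = f_t * T for a row vector of probability masses."""
    n = len(f)
    return [sum(f[i] * T[i][j] for i in range(n)) for j in range(n)]

grid = [float(i) for i in range(11)]          # toy grid
policy = [0.5 * a + 1.2 for a in grid]        # toy policy, not from the paper
f = [1.0 / len(grid)] * len(grid)             # toy initial distribution
T = lottery_matrix(grid, policy)
f_next = push_forward(f, T)
```

Because every weight is linear in the policy within a grid cell, the second derivative of each entry of T with respect to the policy is zero there, which is the property the definition above points to.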

Kolmogorov Forward Equation (Distributional Dynamics): The law of motion for the joint CDF F_t(a, y) describing how the distribution of households over assets and income evolves given optimal policies and the income transition process. In DEGM, this equation is split into a sub-period for asset choices (where endogenous gridpoints allow integration-free updating) and a sub-period for income transitions (handled by quadrature over the discretized income process).

Higher-Order Perturbation Solution: A Taylor expansion of the model’s nonlinear equilibrium conditions around the non-stochastic steady state beyond first order. Second-order solutions capture precautionary motives and mean deviations from the steady state; third-order solutions additionally capture asymmetric effects of shocks, requiring DEGM’s nonlinear distributional representation to produce accurate results.

Aggregate Investment Risk (Capital Depreciation Shocks): Shocks to the aggregate capital depreciation rate calibrated following Barro (2006) as a 0.4% quarterly probability of a disaster that destroys 7.5% of the capital stock and causes a 10% annual GDP drop. Proposed as a replacement for near-linear Krusell-Smith aggregate productivity shocks to generate genuine nonlinearities in aggregate capital dynamics while remaining equally parsimonious.

State-Space Reduction: A sequence of compression techniques — copula representation of the wealth distribution, discrete cosine transform (DCT) compression of the value function, and factor representation from the first-order solution — that reduce the Reiter (2009) system from 402 states and 412 controls to 111 states and 98 controls (a 75% reduction) with no measurable loss of accuracy in impulse responses or ergodic moments.

Shape-Preserving Interpolation: Interpolation methods (linear splines or piecewise cubic Hermite splines) that maintain the monotonicity of the CDF when constructing the interpolant from endogenous gridpoints. Cubic Hermite splines additionally preserve differentiability, making the distributional dynamics smooth enough for third-order perturbation and capturing the nonlinear effects that the lottery method misses.

An Equilibrium Analysis of the Effects of Neighborhood-Based Interventions on Children

Overview

Research question. How should governments design neighborhood-based policies to improve long-run outcomes for children, once one accounts for general equilibrium (GE) forces—endogenous rents, neighborhood quality, wages, and distortionary taxation—that small-scale experimental studies cannot identify?

Model. The paper embeds neighborhood effects into a quantitative, heterogeneous-agent overlapping-generations (OLG) model with endogenous location choice and child skill development. The economy has three building blocks: (1) a dynastic life-cycle structure in which parents choose a neighborhood (from two options: a disadvantaged n=1 and an advantaged n=2) and allocate time to child development, with child skills produced by a nested CES aggregator combining parental time and neighborhood quality (proxied by per-capita income in the tract); (2) a GE Aiyagari incomplete-markets framework with endogenous labor supply, wage uncertainty, and progressive labor taxation; and (3) a government that finances housing vouchers or place-based wage subsidies by adjusting the labor income tax parameter, with all additional net expenses fully offset by tax revenue. Housing supply is upward-sloping (elasticity 1.75, from Saiz 2010), so rents are endogenous.

Data and calibration. The model is estimated by simulated method of moments to match U.S. data from the 2000s, drawing on the PSID, NLSY, ATUS, the 2012–2016 ACS, and the Opportunity Atlas (Chetty et al. 2018). Neighborhoods are mapped to Census tracts divided into bottom-10-percent and top-90-percent median household income groups within each commuting zone. Key targeted moments include the income gap between neighborhoods (108 percent higher mean individual income in n=2), the 30 percent higher incomes for children from low-income families raised in the better neighborhood, and a 32 percent gap in weekly parental time with children across neighborhoods.

Validation. Before policy counterfactuals, the calibrated model is validated against two bodies of reduced-form evidence. First, a simulated small-scale, single-generation, partial-equilibrium voucher experiment generates 23 percent higher income for children—close to the 31 percent MTO experimental estimate from Chetty et al. (2016), with the difference largely explained by a smaller poverty-rate contrast (18 vs. 22 percentage points) in the simulation. Second, a simulated 20 percent place-based wage subsidy generates 17–21 percent earnings gains for adult residents of n=1, consistent with Busso et al.’s (2013) quasi-experimental EZ estimates of 17–24 percent.

Main findings — housing vouchers. The welfare-maximizing voucher program features a 100 percent subsidy rate, targets households with children and wages below the 80th percentile (fourth quintile), and is financed by progressive labor taxes. In the long-run steady state this policy increases the number of children raised in the advantaged neighborhood by 12.5 percent, increases labor productivity by 1.1 percent, reduces income inequality (variance of log after-tax lifetime earnings) by 6.3 percent, comparable in magnitude to the Sweden–U.S. after-tax inequality gap, and raises upward mobility by 27.7 percent (roughly half its standard deviation across U.S. Census tracts). The average marginal tax rate must increase by 15.7 percent to fund the program. Despite this, long-run welfare rises by 3.4 percent in consumption equivalence units. A decomposition shows that intergenerational dynamics add 11.5 percentage points to welfare (relative to a short-run, single-generation scenario), while taxation subtracts 10.2 percentage points, and rent plus neighborhood-quality effects together subtract only 1.4 percentage points, leaving the net long-run GE gain similar to the short-run partial-equilibrium gain of 3.5 percent. Crucially, extending the voucher to households without children generates a welfare loss of 5.0 percent, confirming that restricting eligibility to households with children is essential.

Main findings — place-based wage subsidies. A 12 percent wage subsidy to workers in the disadvantaged neighborhood yields the highest steady-state welfare gain of 0.7 percent. This is approximately one-fifth of the gain achievable with the optimal voucher. The subsidy induces substantial resorting toward n=1, reducing the share of children in n=2 by 6.7 percent while raising neighborhood quality in n=1 by 19.7 percent. Income inequality falls by 8.7 percent and upward mobility rises by 20.4 percent. However, in a short-run partial-equilibrium setup, the wage subsidy has a negative welfare effect of −1.0 percent because it draws parents (and their children) into the disadvantaged area; the positive net effect only emerges through long-run intergenerational channels (+2.5 percentage points) and equilibrium neighborhood-quality adjustments.

Political economy. Because voucher gains are concentrated among young cohorts (those aged 16–43 at introduction), only 33 percent of incumbent adults would rationally vote for the housing voucher program. In contrast, the place-based wage subsidy provides positive average welfare gains for all age cohorts alive at introduction, yielding estimated majority support from over 63 percent of adults. This creates a fundamental political economy tradeoff: the policy with the larger long-run social gains lacks majority democratic support, while the policy with broader support delivers smaller long-run gains.

In depth

Q1. What are the two market frictions that justify government intervention in the model?

A1: The first friction is the absence of intergenerational borrowing markets: parents cannot borrow against their child’s future income, which limits the parent’s willingness to pay the higher rent in n=2 to give their child a developmental advantage. Housing vouchers act as a tax-financed substitute for this missing contract by paying the rent premium and recovering the cost through taxes on the high-earning adults the children become. The second friction is a neighborhood externality: individuals do not internalize the effect of their own income on the neighborhood quality experienced by neighbors’ children. Place-based wage subsidies partially correct this externality by subsidizing work in the disadvantaged area, raising local income per capita and thereby improving the neighborhood quality index for all children resident there.

Q2. How is neighborhood quality defined and modeled, and why is this specification chosen?

A2: Neighborhood quality sn is defined as total income per capita (the sum of labor and capital income) for all residents of neighborhood n, including non-workers. This specification is intended to capture multiple mechanisms: school quality (which depends on local tax bases), role-model effects from productive adults, and social organization effects through adult supervision of children. The formulation includes retired and non-working residents, which means the arrival of children mechanically reduces neighborhood quality per capita in the model, partially capturing a crowding channel. Formally, the neighborhood spillover function takes the power form f(sn) = A * sn^ζ, where ζ governs the elasticity of child development to neighborhood quality.

Q3. How does the paper validate the model’s key mechanism — the neighborhood effect on children?

A3: The validation mimics the MTO RCT within the calibrated model: the government provides a 100 percent rent voucher usable only in n=2 to households in n=1 with incomes below the 10th percentile, holding prices and neighborhood qualities fixed (as in a small-scale experiment). The model generates 25 percent voucher take-up and a 23 percent increase in children’s income in their late 20s. This compares to the experimental MTO estimate of approximately 31 percent. The paper attributes most of the gap to the smaller poverty-rate contrast in the simulation (18 percentage points) relative to MTO (22 percentage points). It also shows that, in a scatterplot of child income gains against neighborhood poverty reductions, the simulated result lies on the line fitted through the site-specific MTO estimates.

Q4. What is the quantitative role of long-run intergenerational dynamics in the voucher program, relative to other GE channels?

A4: The decomposition in Table 5 isolates four GE channels. Starting from a short-run partial-equilibrium welfare gain of 3.5 percent (for the children of a single treated generation), allowing the economy to operate for the long run while holding prices and taxes fixed raises welfare to 15.0 percent — an increase of 11.5 percentage points — because improved skills in one generation create higher-skilled, higher-income parents who invest more in the next generation. Introducing housing market price adjustments (rents rise by 3.9 percent in n=2) reduces welfare by only 0.6 percentage points. Allowing neighborhood quality to adjust (quality in n=2 falls by 4 percent as lower-income families move in) reduces welfare by an additional 0.8 percentage points. Adding full taxation to balance the government budget reduces welfare by 10.2 percentage points, from 13.6 to 3.4 percent. The four channels nearly cancel, leaving the long-run GE steady-state gain close to the short-run single-generation gain.
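
The decomposition can be followed as a running total. The sketch below only replays the percentage-point contributions quoted in the answer above; the labels are shorthand.

```python
# Starting level and successive percentage-point contributions (from the text)
start = ("short-run PE, single generation", 3.5)
channels = [
    ("long-run intergenerational dynamics", +11.5),
    ("rent adjustment in n=2", -0.6),
    ("neighborhood quality adjustment", -0.8),
    ("taxation to balance the budget", -10.2),
]

levels = [start[1]]
for _, contribution in channels:
    # Round to one decimal to avoid floating-point noise in the running total
    levels.append(round(levels[-1] + contribution, 1))

# levels holds the welfare gain after each channel is switched on
final_gain = levels[-1]
```

The running total passes through 15.0 and 13.6 percent before taxation brings it back to 3.4 percent, which is why the long-run GE gain ends up close to the 3.5 percent short-run gain.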

Q5. Why does the optimal voucher program require targeting to families with children, and what happens without this restriction?

A5: When the voucher is extended to all households regardless of children (Column 6 of Table 4), 82.6 percent of the population receives a subsidy, pushing almost everyone to n=2. Rents in n=2 rise by 5.3 percent. To finance this much broader program, the average marginal tax rate must increase by 44 percent, far exceeding the 15.7 percent required for the children-targeted program. The large tax increase suppresses labor supply and income, which reduces neighborhood quality in n=2 by 11.6 percent. The net effect is a welfare loss of 5.0 percent. The intuition is that the benefit of the voucher program flows primarily through child skill development, so subsidizing adults without children is fiscally expensive without producing the intergenerational gains that justify the cost.

Q6. What drives the difference in long-run welfare gains between vouchers (3.4 percent) and place-based wage subsidies (0.7 percent)?

A6: The primary channel is labor productivity. The optimal voucher program raises labor productivity by 1.1 percent by increasing the average neighborhood quality to which children are exposed by 1.2 percent. The wage subsidy raises productivity by only 0.2 percent because it induces resorting toward the disadvantaged neighborhood: children’s average neighborhood quality actually decreases by 0.2 percent despite the large improvement in n=1’s quality (up 19.7 percent), since fewer children reside in the still higher-quality n=2 after the subsidy draws their parents to n=1. Inequality reduction is not the source of the gap: the wage subsidy actually reduces inequality more (8.7–8.9 percent) than the voucher (6.3 percent), but this does not translate into a larger aggregate welfare gain because productivity effects dominate.

Q7. How does the wage subsidy produce positive long-run welfare when it generates negative welfare in the short run?

A7: In the short run, the wage subsidy draws parents into the disadvantaged neighborhood to exploit higher wages, which reduces the share of children in the advantaged neighborhood n=2 and lowers children’s later-life productivity (welfare of −1.0 percent for treated children in the single-generation scenario). Two long-run channels flip the sign. First, the subsidy is permanent, so children themselves receive it as adults, providing a direct wage income benefit. Second, the sustained presence of higher-income workers in n=1 raises neighborhood quality there durably (by 19.7 percent at the steady state), which benefits the children who reside in n=1. Together these intergenerational effects add 2.5 percentage points to welfare, while taxation costs reduce it by only 1.4 percentage points; combined with the remaining equilibrium adjustments, such as the change in neighborhood quality, the net long-run gain is 0.7 percent.

Q8. What determines the political economy divide between the two policies?

A8: For the housing voucher, welfare gains are concentrated among younger incumbent adults (ages 16–43), particularly those who are about to have or already have children, while older adults tend to lose because they face higher taxes without benefiting from improved neighborhood quality for their (now independent) children. This concentration implies only 33 percent of incumbent adults would support the voucher under the model’s welfare metric. For the place-based wage subsidy, average welfare gains are positive for every age cohort alive at introduction (though larger for younger cohorts), because the subsidy immediately raises incomes for workers in n=1 and because equilibrium rent declines in n=1 benefit all residents there. Over 63 percent of adults would support the wage subsidy. The paper notes that if the government could borrow to initially finance the voucher program and pay for it later (as in Daruich 2020 for early childhood programs), majority support for the voucher could potentially be achieved.

Q9. How sensitive are the welfare results to the key calibrated parameters?

A9: The sensitivity analysis (Table 9, following Andrews et al. 2017) shows that individual parameters would need to change substantially to overturn the conclusion that vouchers generate larger steady-state welfare gains than wage subsidies. For example, the altruism parameter β̃ would need to increase by 22 percent to eliminate the voucher welfare gain, which would require average parental transfers to rise to 198 percent of income — far from the empirical target of 125.4 percent. Using the more conservative tract-level housing supply elasticity from Baum-Snow and Han (2021) of 0.3–0.4 (about 80 percent below the baseline Saiz 2010 estimate of 1.75) would reduce the voucher welfare gain from 3.37 to approximately 2.57 percent, not reversing the qualitative conclusion. The parameters with the largest influence on welfare gains are the labor disutility parameter µ and the altruism parameter β̃; the housing supply elasticity matters more for the voucher than the wage subsidy because easier housing supply accommodates growth in n=2 without displacement under the voucher.

Q10. What does the transition path of the voucher program look like, and why do welfare gains initially dip before recovering?

A10: When the voucher is unexpectedly introduced, the first newborn cohort gains approximately 4 percent welfare, but gains for subsequent cohorts initially dip to around 3 percent before stabilizing at 3.4 percent by the 20th post-introduction cohort. The dip occurs because moving costs slow resorting: immediately after introduction, rents in n=2 begin rising and neighborhood quality there begins falling as low-income families move in, but the capital stock adjustment (which would counteract these effects by raising GDP) lags the resorting. The rebound comes as capital accumulates in n=2 over time and as intergenerational productivity gains build through successive cohorts of better-skilled parents. Labor productivity jumps noticeably for the first cohort born to parents who received the voucher (approximately 28 years after introduction) and again for the first cohort born to grandparents who received it, visibly demonstrating the intergenerational mechanism. In contrast, the wage subsidy’s welfare gains are approximately constant at 0.7 percent across all cohorts because the key channels (neighborhood quality improvement in n=1 and wage gains) materialize rapidly and remain stable throughout the transition.

Key Concepts

Neighborhood quality (sn): In this paper, neighborhood quality is not school quality or amenities in a generic sense but is explicitly defined as total income per capita — the sum of labor income and capital income — for all residents of neighborhood n, including non-workers. This endogenous measure rises when higher-income or more productive residents move in and falls when lower-income residents or additional children arrive.

Intergenerational borrowing constraint: The inability of parents to borrow against their child’s future income, modeled as a non-negativity constraint on the monetary transfer from parent to child (transfer ≥ 0). This is the paper’s first key market friction: without it, a poor parent who moved to a better neighborhood would smooth consumption across generations by having the high-earning child compensate the parent. The constraint prevents this, reducing parental investment below the socially efficient level.

Consumption equivalence (veil of ignorance): The welfare metric used throughout the policy analysis. It is defined as the percentage change in consumption that would make a newborn individual indifferent between the pre-policy and post-policy steady states, computed before knowing their position in the skill or income distribution. This is the paper’s measure of long-run steady-state welfare.

Parental investment aggregator (CES): A nested constant-elasticity-of-substitution function that determines how parental time τ and neighborhood quality sn combine to form the effective investment input I into child skill development: I = Ā[αI f(sn)^γ + (1 − αI)τ^γ]^(1/γ). The elasticity of substitution 1/(1 − γ), estimated at 0.41 (γ = −1.43), governs the degree of complementarity between time and neighborhood quality; because it is below one, the two inputs are complements, so parents with children in better neighborhoods also spend more time with them.
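
The aggregator and the implied elasticity can be written out directly. In the sketch below only γ = −1.43 comes from the text; the values used for Ā, αI, A, and ζ are placeholders chosen for illustration, not the paper's estimates.

```python
GAMMA = -1.43                                  # from the text

def spillover(s_n, A=1.0, zeta=0.5):
    """Neighborhood spillover f(s_n) = A * s_n**zeta (A and zeta are placeholders)."""
    return A * s_n ** zeta

def investment(tau, s_n, gamma=GAMMA, alpha_I=0.5, A_bar=1.0):
    """CES investment input I = A_bar * [alpha_I f(s_n)^gamma + (1 - alpha_I) tau^gamma]^(1/gamma).

    alpha_I and A_bar are placeholder values, not estimates from the paper.
    """
    f = spillover(s_n)
    return A_bar * (alpha_I * f ** gamma + (1 - alpha_I) * tau ** gamma) ** (1 / gamma)

# Elasticity of substitution between parental time and neighborhood quality
elasticity = 1 / (1 - GAMMA)

baseline = investment(tau=1.0, s_n=1.0)
better_neighborhood = investment(tau=1.0, s_n=2.0)
little_time = investment(tau=0.1, s_n=2.0)
```

With an elasticity below one, the scarcer input dominates: in this toy parameterization a better neighborhood raises the investment input, but very low parental time pulls it down sharply even when neighborhood quality is high.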

Place-based wage subsidy: A neighborhood-specific wage premium (denoted w̃s) paid to all workers who both live and work in the disadvantaged neighborhood n=1, raising their effective wage to w1 = (1 + w̃s)w2. This policy targets the neighborhood externality by increasing the income of residents in n=1, which raises neighborhood quality and provides an incentive for higher-skilled workers to relocate to (or remain in) the disadvantaged area.

Upward mobility: Measured in this paper as the probability that a child born to parents in the bottom 20 percent of the income distribution reaches the top 20 percent of the income distribution during the working stage of their own life. This is distinct from mean income rank measures; it specifically tracks cross-quintile transitions in the model’s stationary distribution.
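
The definition translates into a short computation on parent and child incomes. The sketch below is illustrative only: it uses a hypothetical ten-household sample, ranks households by income, and ignores ties; the function name is not from the paper.

```python
def upward_mobility(parent_income, child_income, share=0.2):
    """Fraction of children of bottom-`share` parents who reach the top `share` of child incomes."""
    n = len(parent_income)
    k = max(1, int(share * n))                 # households per quintile
    by_parent = sorted(range(n), key=lambda i: parent_income[i])
    by_child = sorted(range(n), key=lambda i: child_income[i])
    bottom_parents = set(by_parent[:k])        # poorest parents
    top_children = set(by_child[-k:])          # highest-income children
    return len(bottom_parents & top_children) / k

# Hypothetical sample: household i has parent_income[i] and child_income[i]
parent_income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
child_income = [9, 3, 1, 2, 4, 5, 6, 7, 8, 10]
mobility = upward_mobility(parent_income, child_income)
```

Here one of the two children born to bottom-quintile parents ends up in the top quintile, so the measure is one half.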

Equilibrium decomposition: A simulation-based method in which GE channels are progressively activated. Starting from a short-run, partial-equilibrium, single-generation baseline (analogous to an RCT), the authors sequentially allow: (i) long-run intergenerational dynamics while holding prices fixed; (ii) housing market price adjustments; (iii) neighborhood quality adjustments; (iv) tax and production-price adjustments. Each step’s change in outcomes identifies the quantitative contribution of that specific channel.

Anatomy of the Phillips Curve: Micro Evidence and Macro Implications

Layer 1 — Overview

Research Question

This paper addresses a fundamental puzzle in macroeconomics: why do estimates of the New Keynesian Phillips curve (NKPC) slope differ sharply depending on whether real marginal cost or the output gap is used as the real activity variable? The conventional, output gap-based NKPC yields very flat slope estimates (e.g., 0.006 to 0.024 in Hazell et al. 2022 and Rotemberg and Woodford 1997), which has led to the widespread view that the Phillips curve is “flat,” at least during the pre-pandemic period. The authors argue that this view conflates two distinct structural relationships: the elasticity of inflation with respect to real marginal cost, and the elasticity of marginal cost with respect to the output gap.

Data and Methodology

The authors assemble a unique quarterly micro-level dataset covering 4,598 manufacturing firms in Belgium over 84 quarters (1999:Q1–2019:Q4), totaling 132,915 observations. The dataset combines product-level domestic prices and quantities from the PRODCOM administrative database, customs data on foreign competitors’ prices, and firms’ variable production costs (labor costs from social security declarations plus intermediate input costs from VAT declarations). Intermediate inputs account for approximately 75 percent of total variable costs on average and are the most volatile cost component (within-firm coefficient of variation 1.77, versus 0.77 for labor costs).

Their estimation strategy follows a “bottom-up” approach. Starting from a theoretical framework with heterogeneous firms subject to Calvo (1983) nominal rigidities and strategic complementarities in price setting (imperfect competition including dynamic oligopoly and Kimball demand), they derive a forward-looking dynamic pass-through regression linking a firm’s current price to discounted present values of its own marginal costs and competitors’ prices, plus a lagged price level that serves as an error-correction term. This is Model A; robustness variants include Model B (absorbing competitor prices via industry-by-time fixed effects), Model C (imposing an AR(1) process for marginal cost), and Model A-U (unrestricted lagged-price coefficient).

The structural parameters governing the NKPC slope — the degree of nominal rigidity (θ) and the strength of strategic complementarities (Ω) — are estimated jointly via GMM. Instruments for marginal cost are four-quarter-lagged firm-level total factor productivity (TFPQ), and instruments for competitors’ prices exploit variation in EU-area export prices to third-country destinations and bilateral exchange rates between non-EU competitor currencies and the Euro. Sector-by-time fixed effects and firm fixed effects absorb confounding trends, shifting trend inflation, and permanent markups.

Main Findings with Quantitative Magnitudes

The baseline estimate (Model A) yields θ = 0.711 (SE 0.014), implying that prices remain fixed for approximately three to four quarters on average, consistent with Nakamura and Steinsson (2008) Belgian PPI data (0.72). The strategic complementarity parameter is Ω = 0.570 (SE 0.059), indicating that competitor price dynamics reduce the pass-through of own marginal cost shocks by approximately half relative to the no-complementarities benchmark.

These structural estimates imply a slope of the marginal cost-based NKPC of λ = 0.052 (SE 0.007), tightly estimated and robust across specifications: λ = 0.077 in Model B, λ = 0.069 in Model C, and λ = 0.056 in the unrestricted Model A-U. This slope is two to ten times larger than existing estimates of the conventional output gap-based NKPC slope (κ ≈ 0.024, Rotemberg and Woodford 1997; κ ≈ 0.006, Hazell et al. 2022).

Reconciling the High Cost-Based Slope with the Flat Output-Based Slope

The paper shows that the output-based slope κ equals the product of the cost-based slope λ and the output elasticity of marginal cost σ_y: κ = λ · σ_y. Using Bartik-style instruments based on high-frequency ECB monetary policy surprises interacted with industry-level sensitivities, the authors estimate σ_y using two models. Model D yields σ_y = 0.406 and κ = 0.021; Model E (directly regressing changes in marginal cost on changes in output) yields σ_y = 0.112 and κ = 0.006. These estimates are consistent with, and overlap with, Rotemberg and Woodford (1997) and Hazell et al. (2022) during the pre-pandemic sample period. The low elasticity of marginal cost to output is attributed to near-constant short-run returns to scale at the firm level and wage rigidity that mutes general equilibrium effects.
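The product relationship can be checked with simple arithmetic. The sketch below assumes the baseline cost-based slope λ = 0.052 is the value multiplied by each σ_y estimate; the variable names are illustrative only.

```python
# Cost-based slope reported for the baseline Model A.
LAMBDA_BASELINE = 0.052

# Output elasticity of marginal cost reported for each model.
sigma_y_estimates = {"Model D": 0.406, "Model E": 0.112}

# Output-based slope implied by kappa = lambda * sigma_y.
kappa_by_model = {
    model: LAMBDA_BASELINE * sigma_y
    for model, sigma_y in sigma_y_estimates.items()
}

for model, kappa in kappa_by_model.items():
    print(f"{model}: kappa = {kappa:.3f}")
```

Rounded to three decimals, the two products line up with the κ values of 0.021 and 0.006 quoted above.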

Aggregate Inflation Dynamics

Feeding an aggregate marginal cost index (constructed as a Törnqvist-weighted average of firm-level marginal costs) into the model-implied inflation expression produces a series that tracks Belgian manufacturing PPI inflation well: marginal cost fluctuations alone account for approximately 70 percent of inflation variation (R² = 0.68, correlation 0.8), without appealing to unobservable cost-push shocks or inflation lags.
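As a rough illustration of Törnqvist weighting, the sketch below uses the standard textbook form: each firm's log cost change is weighted by the average of its shares in the two periods. The summary does not state which shares or other construction details the authors use, so the function name, the share definition, and the two-firm numbers are all hypothetical.

```python
import math


def tornqvist_log_change(costs_prev, costs_curr, shares_prev, shares_curr):
    """Aggregate log change: sum of (average share) x (firm log cost growth)."""
    total = 0.0
    for firm in costs_curr:
        avg_share = 0.5 * (shares_prev[firm] + shares_curr[firm])
        total += avg_share * math.log(costs_curr[firm] / costs_prev[firm])
    return total


# Hypothetical two-firm example; the numbers are not from the paper.
costs_prev = {"firm_a": 100.0, "firm_b": 50.0}
costs_curr = {"firm_a": 110.0, "firm_b": 50.0}
shares_prev = {"firm_a": 0.6, "firm_b": 0.4}
shares_curr = {"firm_a": 0.7, "firm_b": 0.3}

aggregate_change = tornqvist_log_change(
    costs_prev, costs_curr, shares_prev, shares_curr
)
print(f"aggregate log change in marginal cost: {aggregate_change:.4f}")
```

Only firm_a's cost moves in this example, so the aggregate change is its log growth scaled by its average share of 0.65.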

Model Validation via Supply Shocks

A validation exercise using identified oil shocks (Känzig 2021 — measured as unexpected OPEC-day movements in oil futures prices) confirms the model. A one-standard-deviation shock to oil prices (a 15.7 percent increase in Brent crude) raises firms’ real marginal costs by approximately 1.5 to 3 percent within the first three quarters, before reverting. The price response peaks at approximately 3 percent after six quarters, consistent with nominal rigidities generating a delayed but persistent response. Impulse-response matching yields λ_IRF = 0.042 (SE 0.005), within the confidence bands of the micro-level estimate λ = 0.052, validating the bottom-up approach.

Scope Conditions

All estimates are drawn from Belgian manufacturing firms over 1999–2019, a period of moderate inflation during which Calvo pricing provides a good approximation of firm behavior. The authors note that the elasticity of marginal cost to output may be time-varying and nonlinear, and that during large aggregate shocks (such as the post-pandemic inflation surge), both the frequency of price adjustment and the sensitivity of marginal cost to output can rise substantially, requiring state-dependent pricing models (addressed in a companion paper, Gagliardone et al. 2025).

In depth

Q1. What is the primitive formulation of the NKPC, and how does it differ from the conventional formulation?

A1: The primitive NKPC features real marginal cost (in log-deviation from its steady state) as the real activity variable: π_t = λ·mc_t + β·E_t{π_{t+1}} + u_t, where λ is the slope depending on nominal rigidities and strategic complementarities. The conventional formulation uses the output gap (or unemployment gap) as a proxy for marginal cost, which is valid only under specific conditions including perfectly flexible wages. When those conditions fail, the output gap is a poor proxy for marginal cost, typically leading to downward bias in slope estimates. Even when a proportionality holds, the output-based slope κ equals λ multiplied by σ_y (the output elasticity of marginal cost), so the two slopes carry different economic content.

Q2. What structural parameters govern the slope of the cost-based NKPC, and what is the formula?

A2: The slope is λ = (1−Ω)(1−θ)(1−βθ)/θ, where θ is the Calvo probability of price non-adjustment (capturing nominal rigidity) and Ω = Γ/(1+Γ) is the strategic complementarities parameter derived from the markup elasticity Γ with respect to relative prices. At the baseline estimates (θ = 0.711, Ω = 0.570, β = 0.99) this gives λ ≈ 0.052. High nominal rigidity (high θ) flattens the slope by making individual price adjustments less frequent; strong strategic complementarities (high Ω) flatten it further because firms mute their price response to marginal cost in order to avoid deviating from competitors. The discount factor β is calibrated at 0.99 for quarterly data.
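A minimal numerical check of the slope, assuming it takes the form (1−Ω)(1−θ)(1−βθ)/θ, which is the form that reproduces the reported 0.052 at the baseline estimates. The function name is illustrative.

```python
def nkpc_cost_slope(theta: float, omega: float, beta: float = 0.99) -> float:
    """Cost-based NKPC slope under the assumed form
    (1 - omega) * (1 - theta) * (1 - beta * theta) / theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return (1.0 - omega) * (1.0 - theta) * (1.0 - beta * theta) / theta


# Baseline Model A estimates quoted in the text.
baseline_slope = nkpc_cost_slope(theta=0.711, omega=0.570)

# Counterfactual without strategic complementarities (Omega = 0).
no_complementarity_slope = nkpc_cost_slope(theta=0.711, omega=0.0)

print(f"lambda at baseline estimates: {baseline_slope:.3f}")
print(f"lambda with Omega = 0:        {no_complementarity_slope:.3f}")
```

Comparing the two values shows how much of the flattening comes from strategic complementarities rather than from nominal rigidity alone.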

Q3. How does the dynamic pass-through regression differ from the static (long-run) pass-through regressions used in prior literature?

A3: The dynamic pass-through regression (Model A) includes the firm’s lagged price as a regressor, which functions as an error-correction term controlling for persistent deviations between the price and the optimal reset price. Failing to include this term with quarterly data leads to omitted variable bias of magnitude −θ·Var(Δp_ft), since the cointegration error is autocorrelated with coefficient θ. Static pass-through regressions (as in Amiti, Itskhoki and Konings 2019 using annual data) are appropriate only when nominal rigidities can be ignored (θ ≈ 0); with quarterly data and θ ≈ 0.711, the orthogonality condition of the static model fails and the dynamic framework is necessary.

Q4. What are the baseline estimates of the structural parameters, and how robust are they?

A4: The baseline Model A yields θ = 0.711 (SE 0.014) and Ω = 0.570 (SE 0.059), implying prices fixed for approximately three to four quarters and competitor-price influence roughly equal to own marginal cost influence. The implied NKPC slope is λ = 0.052 (SE 0.007). Robustness checks across six specifications (Models B, C, A-U, variable SR-RTS controls, Translog TFPQ, eight-quarter-lagged instrument) yield λ in the range 0.044 to 0.077, with all estimates statistically significant and within each other’s confidence bands. The unrestricted model (A-U) cannot reject the restriction Ϛ = θ on the lagged-price coefficient (p-value 0.90).

Q5. What is the short-run elasticity of a firm’s own price to a permanent marginal cost shock, and how do nominal rigidities and strategic complementarities each contribute?

A5: The short-run pass-through elasticity is (1−Ω)(1−θ) ≈ (1−0.570)(1−0.711) ≈ 0.125. This is substantially below one because both forces dampen price adjustment: nominal rigidity (1−θ ≈ 0.289) means most firms cannot adjust in any given quarter, and strategic complementarities (1−Ω ≈ 0.430) mean that adjusting firms reduce their pass-through to avoid deviating from competitors’ prices. Without strategic complementarities (Ω = 0), the elasticity would be roughly 0.289; without nominal rigidities (θ = 0), it would be roughly 0.430; both together produce the observed 0.125.
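The decomposition can be reproduced directly from the two point estimates. This is a sketch with illustrative variable names; with the rounded inputs the product comes out near 0.124, in line with the roughly 0.125 quoted above.

```python
theta = 0.711  # Calvo probability of price non-adjustment
omega = 0.570  # strategic complementarities parameter

adjusting_share = 1.0 - theta   # share of firms able to reset prices in a quarter
own_cost_weight = 1.0 - omega   # weight on own marginal cost in the reset price

short_run_pass_through = own_cost_weight * adjusting_share

print(f"share of firms adjusting:        {adjusting_share:.3f}")
print(f"weight on own marginal cost:     {own_cost_weight:.3f}")
print(f"short-run pass-through (product): {short_run_pass_through:.3f}")
```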

Q6. How is marginal cost measured in the data, and why is the inclusion of intermediate input costs important?

A6: Marginal cost is proxied by average variable cost per unit of output: the log-nominal marginal cost equals ln(TVC_ft/Y_ft) + ln(1+ν_ft), where TVC is the sum of intermediate input costs (from VAT declarations) and labor costs (wage bill from social security declarations), and Y_ft is a quantity index. Intermediate inputs account for approximately 75 percent of total variable costs on average and are the most volatile component (within-firm coefficient of variation 1.77 vs 0.77 for labor). The authors note that DSGE models typically feature only labor as a variable input, but accounting for intermediates is pivotal because intermediate goods price shocks were among the most important drivers of the post-pandemic inflation surge.

Q7. What instruments are used for marginal cost and competitors’ prices, and what are the identifying assumptions?

A7: The instrument for marginal cost is the four-quarter lagged firm-level TFPQ (physical total factor productivity), estimated as the residual from a gross-output production function. Its relevance depends on TFP persistence (confirmed); the exclusion restriction requires that persistent TFP variation is orthogonal to current and future demand shocks after removing permanent demand components (via firm fixed effects) and industry trends (via sector-by-time fixed effects). Two instruments for competitors’ prices exploit international trade variation: (i) sales-weighted average export prices of EU-area competitors to non-Belgium, non-EU destinations (orthogonal to Belgian demand shocks by construction), and (ii) bilateral exchange rate movements between non-EU competitor currencies and the Euro. All instruments pass the Cragg-Donald and Kleibergen-Paap F-statistics (strongly rejecting weak instruments) and Hansen-Sargan over-identification tests (failing to reject validity).

Q8. What evidence supports the validity of the TFPQ instrument against capacity utilization concerns?

A8: The authors run two empirical tests. First, regressing marginal cost on four-quarter-lagged capacity utilization yields a small, statistically insignificant elasticity (0.011, SE 0.052), suggesting the TFPQ instrument’s predictive power does not reflect capacity utilization variation. Second, re-estimating with “purified” TFPQ instruments adjusted for capital utilization (Column 4) and for both capital and labor utilization (Column 5) produces parameter estimates and NKPC slopes essentially unchanged from baseline. Additionally, regression residuals show only weak and short-lived autocorrelation (−0.09 at one-quarter lag, p=0.09; −0.01 at two-quarter lag, p=0.69), indicating demand shocks are highly transitory after conditioning on fixed effects.

Q9. How does the model track aggregate Belgian manufacturing PPI inflation, and what does this imply for cost-push shocks?

A9: Using the reduced-form expression π_t = λ̃(mc_t^n − p_{t-1}) + α + θu_t, where the reduced-form slope λ̃ = 0.22 is evaluated at baseline structural estimates, the authors construct a model-implied inflation series that accounts for approximately 70 percent of variation in manufacturing PPI inflation (R² = 0.68, correlation 0.8), without including inflation lags or cost-push shocks. The series captures the inflation drop during the 2008 financial crisis, the run-up in 2016, and the subsequent decline. This contrasts with the quantitative DSGE literature in which cost-push shocks (variation in desired price and wage markups) account for approximately 70 percent of inflation volatility (e.g., Primiceri, Schaumburg and Tambalotti 2006).

Q10. How do the authors estimate the output elasticity of marginal cost σ_y, and what do they find?

A10: They use two approaches. Model D is a pricing equation directly relating firm-level prices and nominal output (value added), estimated via GMM, instrumented with Bartik-style shifters based on high-frequency ECB monetary policy surprises (Altavilla et al. 2019) interacted with industry-level sensitivities. Model E directly regresses changes in nominal marginal cost on changes in nominal output, also instrumented. Model D yields σ_y = 0.406 (SE 0.099) and implied κ = 0.021 (SE 0.005); Model E yields σ_y = 0.112 (SE 0.026) and κ = 0.006 (SE 0.001). The low σ_y is consistent with near-constant short-run returns to scale at the firm level and wage rigidity muting general equilibrium labor-market feedback, at least during the moderate-inflation pre-pandemic period.

Q11. How does the oil shock validation exercise confirm the cost-based NKPC slope estimate?

A11: Following Känzig (2021), the authors identify oil shocks as unexpected movements in Brent crude oil futures around OPEC meeting days, normalizing to a one-standard-deviation shock (15.7 percent Brent increase). Local linear projection IRFs show that firms’ real marginal costs rise 1.5 to 3 percent within three quarters and then revert, while prices peak at approximately 3 percent increase after six quarters (consistent with nominal rigidity delaying the price response). Impulse-response matching — minimizing the weighted distance between empirical and model-implied price IRFs — yields λ_IRF = 0.042 (SE 0.005), which is close to and within the confidence bands of the micro-level estimate λ = 0.052, validating the bottom-up estimation approach.

Q12. What do the estimates imply about why the conventional NKPC appears flat in normal times?

A12: The flat conventional NKPC slope (κ ≈ 0.006–0.024) does not reflect limited transmission of marginal cost fluctuations to inflation — that transmission is high (λ ≈ 0.052–0.077). Rather, flatness reflects a weak link between the output gap and marginal cost during the pre-pandemic period (σ_y ≈ 0.112–0.406), attributable to near-constant short-run returns to scale in production and wage rigidity. This decomposition matters for policy: supply shocks that directly raise marginal cost will pass through strongly to inflation even when output does not move much, whereas demand shocks that operate through the output-cost channel face attenuated transmission.

Q13. Under what conditions does the cost-based Phillips curve decompose cleanly into a product of the two elasticities?

A13: The decomposition κ = λ · σ_y requires assuming that real wages are flexible and determined in general equilibrium at the industry level, with real wages increasing in industry output with elasticity σ_w; that the natural level of output is defined as the equilibrium under flexible prices and constant desired markups; and that the firm’s marginal product of labor depends on productivity and output with a common short-run returns-to-scale parameter ν (homogeneous across firms and time-invariant). Under these assumptions (which parallel those used to derive the conventional NKPC in the standard NK model), the output elasticity of marginal cost is σ_y = σ_w + ν, and the theoretical restriction κ = λ · σ_y holds exactly.

Q14. How do macroeconomic complementarities from aggregate decreasing returns to scale affect the NKPC slope?

A14: If aggregate SR-RTS fall below unity, the NKPC slope formula gains an additional term Θ = 1/(1+γν(1−Ω)) < 1, where ν is inversely related to average SR-RTS and γ is the within-industry elasticity of substitution. However, empirical estimates of sectoral SR-RTS range from 0.93 to 0.98, with an aggregate estimate of approximately 0.965 (implying ν ≈ 0.036). Given this and calibrating γ = 4, Θ ≈ 0.941, so macroeconomic complementarities would reduce the NKPC slope by only about 6 percent — well within the confidence bounds of the baseline estimates. The authors conclude that the constant-returns assumption in their main framework is a good approximation.
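The size of this attenuation can be reproduced with a short calculation. The sketch assumes ν = 1/SR-RTS − 1, a mapping not stated in the summary but one that matches the quoted ν ≈ 0.036 at SR-RTS ≈ 0.965; the function name is illustrative.

```python
def macro_complementarity_factor(sr_rts: float, gamma: float, omega: float) -> float:
    """Attenuation factor 1 / (1 + gamma * nu * (1 - omega)).

    Assumption: nu = 1 / sr_rts - 1 (not stated explicitly in the summary).
    """
    nu = 1.0 / sr_rts - 1.0
    return 1.0 / (1.0 + gamma * nu * (1.0 - omega))


attenuation = macro_complementarity_factor(sr_rts=0.965, gamma=4.0, omega=0.570)
slope_reduction_pct = 100.0 * (1.0 - attenuation)

print(f"attenuation factor: {attenuation:.3f}")
print(f"implied reduction in the NKPC slope: {slope_reduction_pct:.1f} percent")
```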

Key Concepts

Primitive (cost-based) NKPC slope (λ): The coefficient linking inflation to real marginal cost in the underlying New Keynesian pricing equation, defined as λ = (1−Ω)(1−θ)(1−βθ)/θ. It captures how strongly firms’ aggregate price setting responds to movements in real marginal cost per unit of output, holding the discount factor, nominal rigidity, and strategic complementarities fixed. Estimated at 0.052 for Belgian manufacturing (range 0.044–0.077 across specifications).

Calvo probability of price non-adjustment (θ): The parameter from Calvo (1983) staggered price setting capturing the share of firms that cannot change their price in a given period, equal to one minus the per-period probability of price adjustment. In this paper, θ is estimated directly from the dynamic pass-through regression coefficient on lagged prices, yielding θ ≈ 0.711, implying prices fixed approximately three to four quarters on average.
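Under Calvo pricing, a price that survives each period with probability θ lasts 1/(1−θ) periods in expectation. The sketch below applies that standard result to the baseline estimate; the function name is illustrative.

```python
def expected_price_duration(theta: float) -> float:
    """Expected number of periods a price stays fixed when it survives
    each period with probability theta (geometric waiting time)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return 1.0 / (1.0 - theta)


duration_quarters = expected_price_duration(0.711)
print(f"expected price duration: {duration_quarters:.2f} quarters")
```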

Strategic complementarities parameter (Ω): Defined as Ω = Γ/(1+Γ), where Γ is the elasticity of a firm’s desired markup with respect to its own relative price. Captures the extent to which a firm weights competitors’ prices (rather than its own marginal cost) when resetting its price. High Ω means firms strongly mute price responses to own cost changes to avoid relative price deviations from competitors. Estimated at Ω ≈ 0.570, implying competitor prices and own marginal cost enter the reset price with roughly equal weight.

Dynamic pass-through regression: A forward-looking pricing equation (Model A) relating observed firm prices to the discounted present values of own marginal costs and competitors’ prices, plus lagged own price as an error-correction term. The structural parameters θ and Ω are identified jointly from the regression coefficients, using GMM with instruments for the present values. The dynamic specification is necessary at quarterly frequency because the error-correction term (omitted in static pass-through models) is non-negligible when θ > 0.

Output elasticity of marginal cost (σ_y): The elasticity of firm-level real marginal cost with respect to the firm-level output gap, defined under the assumptions that real wages are flexible and industry-level, equal to σ_y = σ_w + ν (wage elasticity with respect to industry output plus the short-run returns-to-scale parameter). This parameter bridges the cost-based and output-based Phillips curve slopes via κ = λ · σ_y. Estimated from micro data using monetary policy shock instruments at σ_y ≈ 0.112–0.406 in the pre-pandemic period.

Short-run returns to scale (SR-RTS): The extent to which a firm’s marginal cost rises with output scale in the short run, parameterized by ν in the cost function MC^n_ft = C_{it} · A_{ft} · Y_ft^ν. If ν = 0, marginal cost is independent of output scale (constant returns), which the authors assume in their baseline. Firm- and sector-level estimates from Translog production functions yield SR-RTS ≈ 0.93–0.98 across sectors (aggregate ≈ 0.965), broadly consistent with the constant-returns assumption and implying modest macroeconomic complementarities.

Reduced-form aggregate pass-through slope (λ̃): A composite parameter capturing the contemporaneous pass-through of aggregate real marginal cost (defined as nominal marginal cost relative to the lagged price level) into quarterly inflation under the assumption that nominal marginal cost follows a random walk. Evaluated at θ ≈ 0.70 and Ω ≈ 0.52 (median across models), λ̃ = 0.22. This is distinct from the structural NKPC slope λ because it also captures the persistence of cost shocks.

Are Inflationary Shocks Regressive? A Feasible Set Approach

Layer 1 — Overview

Research Question. The paper asks whether inflationary shocks are regressive, and demonstrates that the answer depends critically on the source of the shock. A single aggregate inflation statistic conceals radically different distributional consequences depending on whether inflation is driven by an oil supply contraction or by expansionary monetary policy.

Framework. The authors develop a “feasible set approach” grounded in the envelope theorem. They show that the first-order money-metric welfare effect of any macroeconomic shock on a household is summarized by the present discounted value of changes to five components of the household’s budget constraint: (1) consumption prices, (2) wage income, (3) asset dividends, (4) asset prices, and (5) government transfers. Because the envelope theorem implies that endogenous substitution responses are not welfare-relevant to a first order, no assumption about the utility function’s form or the economy’s general equilibrium structure is required. The framework is valid for generic stationary shocks that do not directly shift household preferences.

Empirical Strategy. The welfare formula requires two inputs: (i) impulse response functions (IRFs) for all prices, dividends, wages, and unemployment, estimated using internal-instrument SVAR methods applied to two identified shocks, the Känzig (2021) oil supply news shock (instrumented by oil futures surprises around OPEC announcements) and the Gertler-Karadi (2015) monetary policy shock (instrumented by fed funds futures surprises in 30-minute windows around FOMC announcements); and (ii) cross-sectional data on consumption bundles, labor income, and asset portfolios from the CEX, CPS, SCF, and SIPP for three education groups (high school or less, some college, college-educated) across the full lifecycle. The baseline cross-section uses 2019 data. Shocks are normalized to produce comparable aggregate inflation responses: a 10% WTI oil price increase and a 25 basis point decline in the one-year Treasury yield each generate roughly 15–16 basis points of CPI-U inflation on impact, rising to approximately 34–35 basis points after two quarters.

Main Findings. Oil supply contractions are regressive and monetary expansions are progressive, and this divergence is primarily driven by the asset price channel, not the consumption price or labor income channels.

For the 10% oil supply shock: middle-aged households with high school education or less must be paid approximately $870 (around 2% of annual consumption) to be made whole relative to their pre-shock utility; college-educated middle-aged households, by contrast, gain the equivalent of approximately $833 (1.1% of annual consumption). Younger college-educated households (still net equity accumulators) gain around $572.

For the 25 basis point monetary rate cut: low-education households approximately break even (net welfare effect near $23), while middle-aged college-educated households must be paid approximately $4,051 (around 5.5% of annual consumption) to restore their pre-shock utility. Older college-educated households must be paid approximately $851.

Why asset prices dominate. Oil supply contractions reduce equity prices (S&P500 falls approximately 2% one year post-shock) and depress dividends (approximately 82 basis points), while leaving house prices and bond prices largely unaffected. Because middle-aged college-educated households are the primary accumulators of equities, they benefit from the price decline (cheaper future accumulation), which makes oil shocks regressive through this channel and reinforces the mildly regressive consumption and labor income channels. Monetary expansions do the opposite: equity prices rise approximately 3 percentage points on impact, house prices rise approximately 1.5% after three years, and dividends increase. These asset price increases hurt those in the accumulation phase, disproportionately middle-aged college-educated households, creating a progressive distributional pattern.

Consumption and labor income channels. Both shocks generate disproportionate inflation in motor fuel and fuel and utilities, and low-education households spend a larger share of their budget on these goods, making the consumption channel mildly regressive for both shocks. The labor income channel differs sharply: oil shocks raise unemployment (approximately 0.15 log points for low-education households two years post-shock) and reduce weekly earnings by 0.2–0.6 log points, mildly harming low-education workers; monetary expansions reduce unemployment (approximately 0.83 log points for low-education workers one year post-shock) and similarly benefit low-education households through the labor market, pushing toward progressivity.

Scope conditions. Results apply to short-run first-order welfare effects of identified stationary macroeconomic shocks (four-year horizon). The framework does not incorporate uncertainty shocks, preference shocks, or the role of hedging motives in portfolio choice. Results concern policy shocks rather than policy rules.

Robustness. Qualitative conclusions hold across six alternative specifications: incorporating borrowing constraints (with or without empirical death rates), adjusting for unemployment insurance replacement rates (approximately 6% true average replacement rate), allowing for log-linear trends in no-shock choices, and dropping aggregate CPI controls from IRF estimation.

In depth

Q1. What is the “feasible set approach” and how does it differ from prior work on inflation incidence?

A: The feasible set approach measures welfare effects through changes in the household’s entire budget constraint — consumption prices, wage income, asset dividends, asset prices, and government transfers — rather than focusing on any single channel. Prior work either examined the Fisher channel (net nominal positions), or consumption price heterogeneity, or labor income responses in isolation. The key insight is that the envelope theorem implies substitution responses are not welfare-relevant to a first order, so the money-metric welfare change is simply the discounted sum of changes in the five budget constraint components evaluated at pre-shock choices, without requiring knowledge of the utility function’s form or the economy’s general equilibrium structure.
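The accounting can be sketched as a discounted sum over the five budget components, evaluated at pre-shock choices. Everything below is a hypothetical illustration rather than the authors' implementation: the class and field names, the sign conventions (higher consumption prices hurt, higher asset prices hurt net buyers), and the example numbers are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PeriodInputs:
    """Pre-shock choices and shock-induced changes for one period (hypothetical)."""
    consumption: float          # pre-shock consumption spending, dollars
    d_consumption_price: float  # proportional change in consumption prices
    d_wage_income: float        # dollar change in wage income
    asset_holdings: float       # pre-shock asset holdings, dollars
    d_dividend_yield: float     # change in dividends per dollar of holdings
    net_asset_purchases: float  # planned net purchases, dollars (negative = sales)
    d_asset_price: float        # proportional change in asset prices
    d_transfers: float          # dollar change in government transfers


def first_order_welfare_change(path: Sequence[PeriodInputs], discount: float) -> float:
    """Discounted sum of budget-constraint changes at pre-shock choices."""
    total = 0.0
    for t, p in enumerate(path):
        period_effect = (
            -p.consumption * p.d_consumption_price    # consumption prices
            + p.d_wage_income                         # wage income
            + p.asset_holdings * p.d_dividend_yield   # asset dividends
            - p.net_asset_purchases * p.d_asset_price # asset prices (traders only)
            + p.d_transfers                           # government transfers
        )
        total += discount ** t * period_effect
    return total


# Hypothetical households facing a 2% asset price decline and nothing else.
accumulator = [PeriodInputs(10_000.0, 0.0, 0.0, 0.0, 0.0, 5_000.0, -0.02, 0.0)]
decumulator = [PeriodInputs(10_000.0, 0.0, 0.0, 0.0, 0.0, -5_000.0, -0.02, 0.0)]

accumulator_gain = first_order_welfare_change(accumulator, discount=0.99)
decumulator_gain = first_order_welfare_change(decumulator, discount=0.99)
print(accumulator_gain, decumulator_gain)
```

In this toy example the net buyer gains and the net seller loses by the same dollar amount, because the asset price term depends on planned trades rather than on the level of holdings.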

Q2. Why is the asset price channel — rather than consumption prices — the dominant channel in both shocks?

A: Asset holdings are large relative to annual consumption (net worth averages $1.5 million for college-educated and $260,000 for high-school-educated households in 2019), so even modest percentage movements in asset prices generate large dollar welfare effects. By contrast, the budget shares on the goods most responsive to both shocks (motor fuel, fuel and utilities) are relatively modest, so the consumption channel, while mildly regressive, is quantitatively small relative to the portfolio channel. Under the oil shock, the portfolio channel produces gains of roughly 0.5% of consumption for middle-aged college-educated households, while the consumption channel produces losses of only about 0.1% for college-educated and 0.25% for low-education households.

Q3. How does the direction of the equity price response differ between oil and monetary shocks, and why does this create opposite distributional effects?

A: An oil supply contraction reduces equity prices (approximately 2% decline one year post-shock) and dividends (approximately 82 basis points decline), while a monetary expansion raises equity prices (approximately 3 percentage points on impact, approximately 4% higher after four quarters) and increases dividends. The welfare effect of asset price changes falls on those who trade the asset, not those who merely hold it at a constant level: middle-aged college-educated households are the primary net accumulators of equity, so falling prices benefit them (they can buy more cheaply) while rising prices hurt them. This is the principal reason oil shocks are regressive, both through the portfolio channel and overall, while monetary expansions are progressive through the portfolio channel and overall.

Q4. What are the precise welfare numbers for oil supply shocks by education group (baseline, ages 25–65)?

A: From Table 3 (baseline row, lifecycle-weighted averages for ages 25–65): households with high school or less experience a welfare loss of approximately $798; those with some college experience a loss of approximately $816; and college-educated households experience a welfare gain of approximately $494. These numbers reflect the sum of the consumption, labor income, portfolio, and transfer channels over a 16-quarter horizon, discounted at the one-year Treasury yield.

Q5. What are the precise welfare numbers for monetary policy shocks by education group (baseline, ages 25–65)?

A: From Table 3 (baseline row): households with high school or less experience a small welfare gain of approximately $23; those with some college experience a welfare loss of approximately $1,278; and college-educated households experience a welfare loss of approximately $3,055. These losses for college-educated households are driven overwhelmingly by rising equity and house prices that raise the cost of planned asset accumulation.

Q6. How does the life cycle interact with the distributional incidence of both shocks?

A: There is substantial heterogeneity within education groups across the life cycle because asset accumulation and decumulation patterns are age-dependent. Under oil shocks, younger college-educated households (who are net equity accumulators) gain approximately $572, middle-aged college-educated households gain approximately $833, while older college-educated households lose approximately $69 (because they hold large equity positions and lose dividend income). Under monetary shocks, middle-aged college-educated households lose the most (approximately $4,051) because they are simultaneously accumulating equities and housing, both of which become more expensive. Older college-educated households lose less (approximately $851) because rising dividends on existing holdings partially offset the asset price cost. Low-education households are approximately flat across the life cycle under monetary shocks.

Q7. How does the consumption channel compare across education groups and across the two shocks?

A: The consumption channel is mildly regressive for both shocks, but of similar absolute magnitude across the two shocks because both generate similar inflation in motor fuel and fuel and utilities — the goods with the largest price response. Low-education households spend a larger share on motor fuel and fuel and utilities; as a result, they lose approximately 0.25% of consumption from the consumption channel under the oil shock, compared with less than 0.1% for college-educated households. For monetary shocks, the consumption channel affects all household types roughly equally in proportional terms.

Q8. How does the labor income channel differ between oil and monetary shocks across education groups?

A: Oil shocks raise unemployment disproportionately for low-education workers (approximately 0.15 log point increase after two years, roughly 0.68 standard deviations, compared with near-zero response for college-educated workers) and reduce weekly earnings by 0.2–0.6 log points across groups. Monetary expansions reverse this: a 25 basis point rate cut reduces unemployment by approximately 0.83 log points for low-education workers and approximately 1.96 log points for college-educated workers after one year, with limited response in conditional wages. Thus the labor income channel pushes toward regressive incidence for oil shocks and toward progressive incidence for monetary expansions, though in both cases it is quantitatively smaller than the portfolio channel.

Q9. What is the role of housing in the portfolio channel?

A: Housing behaves simultaneously as a durable consumption good and a financial asset. A house price increase raises welfare for households planning to decumulate (sell) housing (primarily older households) through the portfolio channel, but also raises the implicit rental cost for those who use housing — a negative consumption-side effect. Monetary expansions raise house prices by approximately 1.5% after three years. College-educated households accumulate housing at a faster rate and earlier in the life cycle than low-education households, making them more exposed to the cost of rising house prices during the accumulation phase. This amplifies the progressive pattern of monetary shocks through the portfolio channel.

Q10. How does the paper handle the dual role of durable goods (vehicles and housing)?

A: Durable goods are treated as both a consumption good and a financial asset. The utility-relevant consumption price of a durable is proportional to the price times the depreciation rate per unit of use, capturing the “implicit rent” of ownership. On the asset side, the durable enters the portfolio channel like a zero-dividend financial asset. This allows the framework to correctly attribute, for example, that a rise in house prices hurts net accumulators (through the portfolio channel) while also raising the implicit cost of housing services (through the consumption channel), rather than treating house price appreciation as an unambiguous welfare gain for homeowners.

Q11. What happens to the main conclusions when borrowing constraints are introduced?

A: Incorporating net worth constraints (with either constant or empirical death rates) dampens the portfolio channel for young and middle-aged college-educated households, because rising asset prices relax borrowing constraints for these households, partially offsetting the welfare cost of more expensive accumulation. Under constant death rates with borrowing constraints, college-educated households’ oil shock welfare gain falls from +$494 to +$76; under empirical death rates, it turns into a loss of $394. For monetary shocks, the college-educated loss shrinks from $3,055 to $1,718 (constant death rates) or $1,036 (empirical death rates). Despite these quantitative changes, the qualitative conclusion (oil shocks are regressive, monetary expansions are progressive) holds across all specifications.

Q12. What is the implication of these findings for the policy interaction between oil shocks and monetary tightening?

A: If the monetary authority responds to oil-price-induced inflation with unexpected interest rate increases, it may exacerbate the distributional consequences of the initial oil shock. An oil supply contraction is already regressive (harming low-education households through consumption prices and labor market effects); a disinflationary monetary tightening would additionally harm low-education households through the labor income channel (higher unemployment, lower wages) while partially benefiting college-educated households through lower asset prices. The paper highlights this policy interaction, while cautioning that the results concern identified policy shocks rather than policy rules.

Q13. How are the two shocks calibrated to be comparable?

A: The oil shock is normalized to a 10% increase in WTI crude oil prices (approximately one standard deviation of monthly oil price growth). The monetary shock is normalized to a 25 basis point decline in the one-year Treasury yield — chosen because it generates approximately the same aggregate CPI-U inflation response as the oil shock (approximately 15–16 basis points on impact, rising to approximately 34–35 basis points after two quarters). This normalization allows the paper to attribute the different distributional outcomes to the source of inflation rather than to differences in the aggregate inflation magnitude.

Q14. What role does the transfer channel play, and for whom?

A: The transfer channel is small relative to the other three channels for the vast majority of working-age households, because transfer income is less than $100 per month for most households under age 65. Social Security payments, the bulk of transfer income, are explicitly indexed to the CPI; the paper models them as moving with CPI with a one-year lag. The transfer channel exclusively benefits older households (those receiving Social Security), and its quantitative effect is modest even there. For prime-age households of all education groups, transfer income is less than one-twentieth of labor and asset income.

Key Concepts

Feasible set approach. The paper’s organizing framework, in which the first-order welfare impact of a macroeconomic shock is measured by how the shock changes the household’s budget constraint (consumption prices, wage income, asset dividends, asset prices, and government transfers) evaluated at the household’s pre-shock choices. Substitution responses are not welfare-relevant to a first order by the envelope theorem.

Money-metric welfare gain. The willingness-to-pay measure used throughout: the welfare change from a shock divided by the household’s marginal utility of consumption at time zero, expressed in time-zero dollars. Interpreted as an equivalent variation — the amount the household must be paid or would give up to be indifferent to receiving the shock. Used because it places households with very different utility functions on a common dollar scale.

Portfolio channel. The component of the welfare formula capturing the effect of asset price and dividend changes on household welfare. Asset price changes are welfare-relevant only for households that trade (accumulate or decumulate) the asset: rising prices benefit sellers and harm buyers; falling prices benefit buyers and harm sellers. This is distinct from the “Fisher channel” in prior literature, which focuses on net nominal positions rather than on which households are in the accumulation versus decumulation phase.

Internal instrument SVAR. The time-series estimation procedure used throughout: the pre-estimated identified shock series (oil supply news or monetary policy surprise) is included as a variable ordered first in a recursive structural VAR for each outcome variable. This separates shock identification (using the published instruments and controls from Kanzig 2021 and Gertler-Karadi 2015) from IRF estimation for each outcome variable, allowing the use of the full available sample for each outcome series.

Oil supply news shock (Kanzig 2021). An identified supply shock to oil markets, constructed from changes in oil price futures in tight windows around OPEC production announcements. Used to capture exogenous cost-push inflation driven by supply constraints rather than demand.

Monetary policy shock (Gertler-Karadi 2015). An identified demand-side shock, constructed from federal funds rate futures surprises in 30-minute windows around FOMC announcements, instrumented into a monetary SVAR. Captures exogenous interest rate cuts that generate aggregate demand expansion and inflation.

Borrowing constraint wedge. An additional term that appears in the welfare formula when households face net worth constraints. Proportional to the Lagrange multiplier on the net worth constraint, it discounts future periods more heavily when constraints bind, and adds a term for the welfare value of relaxed constraints when asset prices rise. Identified from deviations from perfect consumption smoothing using CEX lifecycle consumption data.

Artificial intelligence and cognitive inequality

Mon, 01 Jan 0001 00:00:00 +0000

Artificial intelligence and technological unemployment

Mon, 01 Jan 0001 00:00:00 +0000

Wang and Wong develop a continuous-time labor-search model to assess the dynamic effects of generative AI (GenAI) on labor productivity and unemployment. The paper is motivated by conflicting empirical evidence: micro studies find productivity gains of 14% (Brynjolfsson, Li, and Raymond 2025) and 55.8% faster coding (Peng et al. 2023), while macro estimates suggest modest TFP gains of at most 0.064% annually (Acemoglu 2024), and occupation-level evidence shows a 13% relative employment decline in AI-exposed jobs (Brynjolfsson, Chandar, and Chen 2025).

The model distinguishes GenAI from earlier automation technologies by its learning-by-using mechanism: AI capability grows at rate µ per employed worker (law of motion dAt/At = µHt − δ), raises employed workers’ productivity, and creates a displacement threat through renegotiation. When renegotiation fails, AI replaces the worker, generating technological unemployment. Firms renegotiate wages at a rate ρµAt proportional to AI’s learning rate and the job’s exposure ρ. The joint surplus condition governs whether replacement occurs: AI replaces a worker if and only if πA (AI’s net present value per output) exceeds the post-renegotiation joint surplus St.

The model admits three steady states: (i) a some-AI steady state with finite AI capability, persistent AI adoption (It = 1), and expanded job creation but lower long-run employment at H∞ = δ/µ; (ii) an unbounded-AI steady state with sustained endogenous growth, no displacement (It = 0), and employment at H∞ = α/(α+σ); and (iii) a no-AI steady state reverting to the Mortensen-Pissarides benchmark. In the benchmark model (exogenous job-finding rate, AI-augmented productivity), multiple steady states can coexist (global indeterminacy) when condition (28) holds. In the full model (endogenous job creation via free entry), both global and local indeterminacy are possible, and a continuum of oscillatory transition paths converges to the some-AI steady state.

Calibrated to U.S. data, targeting a pre-AI unemployment rate of 5%, AI elasticity of productivity εy = 1.069 (from Czarnitzki et al. 2023), initial AI productivity boost of 14% (Brynjolfsson et al. 2025), worker exposure ρ = 0.618 (Brynjolfsson et al. 2018’s machine learning suitability index), AI replacement cost ϕ = 0.0043 (from U.S. business GenAI spending), AI learning rate µ = 0.632, and AI error rate δ = 0.462 (Moore’s law half-life of 1.5 years), the model converges to a some-AI steady state. The long-run results are: a 23% employment loss (H∞ = 0.732 vs. H0 = 0.95), AI capability improvement of 321%, and labor productivity gain of 366%. Approximately half of the employment loss—11.5 percentage points—occurs within the first five years, alongside a 49.3% output gain and 45.5% AI capability improvement over that period.
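Several of these figures can be cross-checked with simple arithmetic. The sketch below is only an illustration built from the numbers quoted in this summary; the function names are hypothetical and do not come from the paper, and it assumes the 1.5-year half-life maps to δ through continuous exponential decay.

```python
import math

def decay_rate_from_half_life(half_life_years):
    """Continuous decay rate implied by a half-life: delta = ln(2) / half-life."""
    return math.log(2.0) / half_life_years

def some_ai_employment(delta, mu):
    """Long-run employment in the some-AI steady state, H_inf = delta / mu."""
    return delta / mu

def relative_employment_loss(h_initial, h_long_run):
    """Employment loss expressed as a fraction of initial employment."""
    return (h_initial - h_long_run) / h_initial

# mu = 0.632 and H0 = 0.95 are the values quoted in the summary.
delta = decay_rate_from_half_life(1.5)            # close to the reported 0.462
h_inf = some_ai_employment(delta, mu=0.632)       # close to the reported 0.732
loss = relative_employment_loss(0.95, h_inf)      # close to the reported 23%
print(round(delta, 3), round(h_inf, 3), round(loss, 3))
```

The small gap between the computed H∞ and the reported 0.732 is consistent with rounding in the quoted parameter values.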

Untargeted moments are validated: the model implies 7.08% labor productivity growth over the first 10 years (consistent with Briggs and Kodnani 2023) and an AI elasticity of vacancies averaging 0.16 over the first five years (consistent with Acemoglu et al. 2022).

On welfare, equilibria are inefficient even when the Hosios condition holds. AI introduces four externalities beyond standard matching frictions: job destruction via displacement, productivity enhancement for employed workers, feedback from AI learning depending on employment, and direct effects on matching surpluses. The constrained-optimal subsidy to jobs at risk of AI displacement is 26.6% in the short run and exceeds 50% in the long run. In the full model, the Hosios condition requires setting firm bargaining power θ equal to the vacancy elasticity of matching ξ, but an additional per-output transfer T = µApωA to firm-worker matches is necessary to correct AI adoption inefficiency.

Q: What is the core mechanism by which AI generates unemployment in this model? A: AI capability grows through a learning-by-using process (dAt/At = µHt − δ), improving as it observes employed workers. As capability rises, firms gain a displacement option that arrives at rate ρµAt per matched pair. When renegotiation over wages fails—i.e., when the AI’s NPV πA exceeds the joint surplus—firms replace workers with AI, causing unemployment. This creates a feedback loop: higher employment accelerates AI learning, which increases displacement pressure and reduces employment.

Q: What are the three steady states and what distinguishes them? A: The some-AI steady state features finite AI capability, persistent displacement (It = 1), and long-run employment H∞ = δ/µ; it involves technological unemployment. The unbounded-AI steady state features infinite AI capability, no displacement (It = 0), endogenous productivity growth, and employment H∞ = α/(α+σ) as in the standard Mortensen-Pissarides model. The no-AI steady state has A∞ = 0 with the same H∞ = α/(α+σ) but no AI contribution. Employment is higher in the unbounded-AI equilibrium than in the some-AI equilibrium.

Q: What does the calibration imply for long-run employment and productivity? A: The calibrated full model converges to a some-AI steady state with a 23% employment loss (H∞ = 0.732), a 321% improvement in AI capability, and a 366% gain in labor productivity. The parameters yield a unique equilibrium under the baseline calibration (πA = 1.949 > sAI = 0.8735 confirms some-AI existence). These results reflect a large worker replacement effect under the calibrated AI learning and error rates, while the job creation effect is relatively modest.

Q: How fast does technological unemployment materialize? A: Approximately half of the total 23% employment loss occurs within the first five years; specifically, employment falls by 11.5 percentage points over that period. Over the same five years, AI capability improves by 45.5% and output rises by 49.3%. Over the first 10 years, AI capability improvement accumulates to 94.0% and output gain to 103% (approximately double the five-year output gain).

Q: How does the full model differ from the benchmark model in transition dynamics? A: In the full model, job-finding rates are endogenous: firms post vacancies until a free-entry condition (κyt = ftΠt) is satisfied, tying job-finding rate αt to the surplus ratio st via αt = α(st). This endogeneity implies that as AI raises labor productivity, firms create more vacancies, slowing the employment decline relative to the benchmark model with a fixed job-finding rate. At the same time, AI capability grows faster in the full model because higher employment accelerates AI learning.

Q: What is global indeterminacy and when does it arise? A: Global indeterminacy occurs when both the some-AI and unbounded-AI steady states coexist, so the long-run outcome depends on initial conditions or expectations. In the benchmark model this requires condition (28): 0 < r + σ + α(1−θ) − (1−b)/πA ≤ εy(µα/(α+σ) − δ). In the full model, global indeterminacy is plausible when firm bargaining power rises to θ = 0.95 given the baseline AI replacement cost ϕ = 0.0043. The region of global indeterminacy is larger when firm bargaining power is higher.
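Condition (28) is a two-sided inequality and is easy to evaluate for a given parameter vector. The sketch below is a minimal check, not the paper's code: the function name is hypothetical, and the values of r, σ, α, θ, and b are illustrative assumptions (the summary does not report them). Only εy, µ, δ, and πA are figures quoted in the text, and πA is borrowed from the full-model calibration purely for illustration.

```python
def condition_28(r, sigma, alpha, theta, b, pi_A, eps_y, mu, delta):
    """True when 0 < r + sigma + alpha(1-theta) - (1-b)/pi_A <= eps_y (mu alpha/(alpha+sigma) - delta)."""
    middle = r + sigma + alpha * (1.0 - theta) - (1.0 - b) / pi_A
    upper = eps_y * (mu * alpha / (alpha + sigma) - delta)
    return 0.0 < middle <= upper

# Quoted in the summary: eps_y, mu, delta, pi_A.
quoted = dict(pi_A=1.949, eps_y=1.069, mu=0.632, delta=0.462)
# Hypothetical values, chosen so that alpha / (alpha + sigma) = 0.95.
assumed = dict(r=0.05, sigma=0.01, alpha=0.19, theta=0.5)

holds_high_b = condition_28(b=0.8, **assumed, **quoted)
holds_low_b = condition_28(b=0.4, **assumed, **quoted)
print(holds_high_b, holds_low_b)
```

With these illustrative numbers the condition holds for the higher value of b and fails for the lower one, because a lower b pushes the middle expression below zero.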

Q: What is local indeterminacy and what does it imply for transition paths? A: Local indeterminacy means there is a continuum of equilibrium paths converging to the some-AI steady state in the neighborhood of that steady state, rather than a unique saddle path. In the full model, under alternative parameters (θ = 1, ξ = 0.765, εy = 6), the eigenvalues feature a negative real root and two complex roots with negative real parts, yielding oscillatory local dynamics in employment and AI capability. This implies short-run cycles in productivity and unemployment, consistent with the wide range of empirical findings on AI’s labor-market effects.

Q: Why does the Hosios condition fail to deliver efficiency in this model? A: The Hosios condition eliminates the standard matching externality by setting firm bargaining power to the vacancy elasticity of matching. But AI introduces four additional externalities: (i) job destruction through displacement, (ii) productivity enhancement for employed workers, (iii) feedback from AI learning that depends on aggregate employment, and (iv) direct effects on matching surpluses and job-finding rates. These externalities mean the standard Hosios rule alone is insufficient; additional instruments are required.

Q: What is the constrained-optimal policy response? A: In the simple model, the constrained optimal AI adoption threshold differs from the equilibrium threshold because firm bargaining power θ distorts adoption decisions: AI is over-adopted when πA > (1−b)/(r+σ+α(1−θ)) and under-adopted when (1−b)/(r+σ+α) < πA ≤ (1−b)/(r+σ+α(1−θ)). In the full model, constrained optimality requires setting θ = ξ (Hosios) plus a per-output subsidy T = µApωA to firm-worker matches exposed to AI displacement. This targeted subsidy is 26.6% in the short run and exceeds 50% in the long run.

Q: How does AI compare to computers in this model’s counterfactual? A: The paper reports that exogenous productivity growth from computers reduced unemployment only modestly—by 0.16 percentage points. By contrast, AI’s learning-by-using and displacement features imply a nearly 20% long-run employment loss in a comparable counterfactual. The key distinction is that computers lack the self-learning improvement and associated renegotiation-triggered displacement that characterize GenAI in this model.

Q: How is AI exposure parameterized and what does it capture? A: The exposure parameter ρ captures the degree to which a job is subject to AI-driven replacement risk. It is calibrated using Brynjolfsson et al. (2018)’s suitability for machine learning (SML) index: on a 1–5 scale, SML averages 3.47 across 964 O*NET occupations, translating to (3.47−1)/(5−1) = 61.8%, so ρ = 0.618. The effective exposure measure is ρµ, which is higher when facing a faster-learning AI.

Q: What is the predator-prey analogy in the model’s dynamics? A: The dynamical system for AI capability (At) and employment (Ht) in the simple model resembles the Lotka-Volterra predator-prey system. Employment (prey) feeds AI learning; as AI capability (predator) grows, it displaces workers faster, reducing employment; lower employment then slows AI learning, causing capability to decay; and the cycle repeats with diminishing magnitude until the steady state is reached. This mechanism operates only when the AI learning rate µ is neither too high nor too low, with the convergence path being a spiral when µα < 4δ²(1 − δ(α+σ)/(µα)).
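The spiral condition can be illustrated by linearizing a two-equation system around the some-AI rest point. The sketch below assumes the employment law of motion dH/dt = α(1 − H) − σH − ρµAH, which is inferred from the rates quoted in this summary rather than copied from the paper, together with dA/dt = A(µH − δ). Under that assumption the Jacobian has trace −αµ/δ and determinant αµ − δ(α+σ), and complex roots arise exactly when the quoted inequality holds. The values of α and σ are hypothetical.

```python
import cmath

def steady_state(alpha, sigma, mu, delta, rho):
    """Interior rest point of the assumed system: H* = delta/mu, A* from dH/dt = 0."""
    h = delta / mu
    a = (alpha * (1.0 - h) - sigma * h) / (rho * mu * h)
    return h, a

def local_eigenvalues(alpha, sigma, mu, delta):
    """Eigenvalues of the Jacobian at the interior rest point (rho cancels out)."""
    trace = -alpha * mu / delta
    det = alpha * mu - delta * (alpha + sigma)
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def spiral_condition(alpha, sigma, mu, delta):
    """The inequality quoted in the text: mu*alpha < 4 delta^2 (1 - delta(alpha+sigma)/(mu*alpha))."""
    return mu * alpha < 4.0 * delta ** 2 * (1.0 - delta * (alpha + sigma) / (mu * alpha))

# mu, delta, rho are quoted in the summary; alpha and sigma are hypothetical,
# chosen so that alpha / (alpha + sigma) = 0.95.
alpha, sigma, mu, delta, rho = 0.19, 0.01, 0.632, 0.462, 0.618
h_star, a_star = steady_state(alpha, sigma, mu, delta, rho)
root_1, root_2 = local_eigenvalues(alpha, sigma, mu, delta)
print(h_star, a_star, root_1, root_2, spiral_condition(alpha, sigma, mu, delta))
```

Complex roots with a negative real part correspond to the damped cycles described above; scaling α and σ up tenfold in this sketch yields real roots and a monotone approach instead.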

Q: What is the labor-share implication of the unbounded-AI equilibrium? A: In the unbounded-AI steady state, employment is higher than in the some-AI steady state (H^AJJ > H^AI) and labor productivity grows without bound. However, the labor share is lower in the unbounded-AI equilibrium if the firm’s bargaining power θ is sufficiently low. This implies that while workers are not fully displaced and rising AI-augmented productivity sustains employment, workers’ income share may still decline even in the more favorable unbounded scenario.

Technological unemployment: A phenomenon in which AI adoption raises labor productivity and expands job creation, yet still causes sizable employment losses because the worker displacement effect (driven by renegotiation failure when AI’s NPV πA exceeds the joint surplus) dominates the job-creation effect. In the calibrated model this amounts to a 23% employment loss despite a 366% productivity gain.

Learning-by-using AI: The model’s representation of GenAI as a technology whose capability At grows through reinforced learning from employed workers at rate µ per worker, so aggregate AI growth is µHt, offset by deterioration at rate δ. This distinguishes GenAI from earlier automation technologies (computers, robotics) that do not self-improve through usage.

Some-AI steady state: A long-run equilibrium with finite AI capability (gA∞ = 0), persistent AI adoption (It = 1), and employment pinned at H∞ = δ/µ—the ratio of AI’s error rate to its learning rate. Characterized by expanded job creation but lower employment than the no-AI benchmark, constituting the model’s primary calibrated outcome.

Unbounded-AI steady state: A long-run equilibrium with infinite AI capability (A∞ = ∞), no displacement (It = 0), and endogenous growth at rate gA = µH^AJJ − δ. Employment equals the Mortensen-Pissarides level H∞ = α/(α+σ), and labor productivity grows without bound, complementing Aghion, Jones, and Jones (2019)’s idea production framework.

Global indeterminacy: Coexistence of multiple steady states (some-AI and unbounded-AI) such that the long-run equilibrium depends on initial conditions or expectations rather than being uniquely determined. Arises in the benchmark model when condition (28) holds and becomes more likely with higher firm bargaining power θ.

Local indeterminacy: A continuum of equilibrium transition paths converging to a single steady state from nearby initial conditions, rather than a unique saddle path. Arises in the full model under certain parameter configurations (e.g., θ = 1, ξ = 0.765, εy = 6), implying oscillatory short-run dynamics in employment and AI capability.

AI exposure (ρ): A job-level parameter capturing the degree to which a job match is subject to AI-driven displacement risk. The displacement option arrives at rate ρµAt per matched pair; ρ is calibrated at 0.618 using the average suitability-for-machine-learning score across O*NET occupations. The effective exposure measure is the product ρµ.

Renegotiation-proof displacement: Proposition 1’s result that the joint surplus Snt is independent of the renegotiation round n, so the AI adoption decision It is also round-invariant. This simplifies the model to a single indicator function: AI replaces the worker if and only if πA exceeds the joint surplus St, regardless of how many renegotiation rounds have occurred.

Auctions with Frictions: Recruitment, Entry, and Limited Commitment

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops an auction model that jointly incorporates three frictions pervading informal price-formation processes: (1) costly recruitment by the seller, (2) costly participation by bidders, and (3) the seller’s inability to commit to a recruitment level or reserve price. The authors argue these frictions are especially prevalent in markets for idiosyncratic assets such as mergers and acquisitions, real estate, and home repair contracting, where auction houses like Christie’s and Sotheby’s command fees of 20–30% of revenues precisely because they reduce the underlying inefficiencies.

The model features a single seller who exerts recruitment effort gamma at cost gamma*s, generating a Poisson-distributed number of contacted bidders with mean gamma. Each contacted bidder independently decides whether to pay entry cost c > 0 to learn their private value and participate in a first-price auction (FPA). The seller cannot commit to gamma (which is unobservable to bidders) or to a reserve price. Two scenarios are analyzed: PO (participation-observable, where bidders observe the number of entrants before bidding) and PU (participation-unobservable).

The central tension is between the seller’s incentive to recruit more bidders to intensify competition and raise revenue, and bidders’ rational concern that excessive recruitment makes entry unprofitable. Because the seller cannot commit, this tension generates several novel inefficiency results.

In the PO scenario, the seller’s marginal revenue from recruitment Ro’(lambda) is single-peaked, meaning there is a minimum profitable participation scale lambda_o below which the seller will never recruit. Combined with a maximum participation level lambda-bar_c above which bidders will not enter (defined by U(lambda-bar_c) = c, where U is the bidder’s expected payoff), no-trade equilibrium is the unique outcome whenever lambda-bar_c < lambda_o, even for arbitrarily small recruitment cost s. This result holds because with unobservable effort, bidders correctly anticipate the seller will target participation above lambda_o, making entry unprofitable. When lambda-bar_c > lambda_o, three regimes arise: (i) no trade if s exceeds a threshold s-bar_o; (ii) an interior equilibrium with full entry (q* = 1) and lambda* = lambda_o(s) for intermediate s; and (iii) for small s, an equilibrium with lambda* = lambda-bar_c and partial entry q* = s/Ro’(lambda-bar_c) < 1. In regime (iii), total recruitment cost lambda*(s/q*) equals the constant lambda-bar_c * Ro’(lambda-bar_c) regardless of s, so even as s approaches zero, wasteful recruitment costs do not vanish, because they are determined by incentive constraints rather than by technology.

In the PU scenario, a no-trade equilibrium always exists for all parameter values, because the seller cannot credibly disclose participation, creating self-reinforcing expectations of zero competition. The seller’s recruitment incentive xi(lambda) is strictly weaker than Ro’(lambda) in the PO scenario (proven via revenue equivalence: Ro’(lambda) = xi(lambda) + a positive term reflecting how greater participation induces more aggressive bidding). This yields ranking reversals: for intermediate s and small c, the PO scenario dominates PU; but for small s or large c, the PU scenario’s weaker recruitment incentive reduces wasteful over-recruitment, making PU preferable. These comparisons translate directly to a comparison of FPA and SPA with unobservable participation: the two formats are not equivalent in the presence of recruitment and entry frictions because they generate different recruitment incentives.

A sampling-curse mechanism drives near-complete market unraveling when sellers have privately known recruitment costs drawn from a continuous uniform distribution on [s-bar, s_o]. Because low-cost sellers recruit more, a contacted bidder believes the seller is more likely to have low costs, and hence to have recruited many other bidders, making entry unprofitable. Proposition 3 establishes a threshold c-hat such that for c in (c-hat, c-bar), as the lower bound s-bar of the cost distribution approaches zero, the fraction of seller types that remain inactive approaches one (near-complete unraveling), even though each type would be active if its cost were commonly known.

Q: What is the paper’s main modeling innovation relative to the existing literature? A: The paper’s central novelty is combining all three frictions (costly recruitment by the seller, costly participation by bidders, and limited seller commitment) in one model. The existing literature had studied these frictions separately: Szech (2011) examined costly recruitment with costless entry; McAfee and McMillan (1987) and Levin and Smith (1994) studied costly entry with an exogenously given number of potential bidders; Milgrom (1987) and McAfee and Vincent (1997) studied limited commitment to a reserve price with a fixed bidder set. None combines all three.

Q: What is the “minimum profitable scale” result and why does it arise? A: Because the seller cannot commit to a reserve price, the first few bidders are complementary — they stimulate competitive bidding, causing the seller’s marginal revenue Ro’(lambda) to be initially increasing, then decreasing (single-peaked). This means the seller’s profit Pi_o(lambda, q) is maximized either at zero or at a participation level above a minimum scale lambda_o, defined by Ro’(lambda_o) = s-bar_o. The seller will never choose a participation level between 0 and lambda_o.

Q: Under what conditions does the market completely shut down in the PO scenario? A: No-trade is the unique equilibrium outcome whenever lambda-bar_c < lambda_o, where lambda-bar_c is defined by U(lambda-bar_c) = c (the participation break-even level) and lambda_o is the seller’s minimum profitable scale. This condition arises when entry costs c are large enough relative to the competitive dynamics. Importantly, no trade occurs for every recruitment cost s > 0, including arbitrarily small s — commitment failure alone can cause complete market breakdown even when recruiting bidders is nearly costless.

Q: What is the inefficiency in regime (iii) of Proposition 2 (small s, PO scenario)? A: When s < Ro’(lambda-bar_c), equilibrium has lambda* = lambda-bar_c and q* = s/Ro’(lambda-bar_c) < 1. The total recruitment cost is lambda* * (s/q*) = lambda-bar_c * Ro’(lambda-bar_c), a strictly positive constant independent of s. As s approaches zero, total recruitment effort and its cost do not vanish; the cost is pinned down by incentive constraints. This waste could be avoided if the seller could commit to an effort level below lambda-bar_c, illustrating that commitment failure creates persistent inefficiency even when the technology of recruitment is inexpensive.
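The cost invariance in regime (iii) can be seen in a small numerical sketch. It relies on assumptions that are not in the summary: bidder values are uniform on [0, 1], there is no reserve price, and the number of participants is Poisson with mean lambda, which gives the closed forms U(lambda) = (1 − e^(−lambda)(1 + lambda))/lambda² and Ro’(lambda) = 2U(lambda) − e^(−lambda). The entry probability is taken to be q* = s/Ro’(lambda-bar_c), the value implied by the cost identity lambda*(s/q*) = lambda-bar_c * Ro’(lambda-bar_c). The sketch also takes as given that the parameters fall in regime (iii); all function names are hypothetical.

```python
import math

def bidder_payoff(lam):
    """U(lambda) for uniform[0,1] values and no reserve (assumed)."""
    return (1.0 - math.exp(-lam) * (1.0 + lam)) / (lam * lam)

def marginal_revenue(lam):
    """Ro'(lambda) under the same assumptions: 2 U(lambda) - e^(-lambda)."""
    return 2.0 * bidder_payoff(lam) - math.exp(-lam)

def solve_decreasing(f, target, lo=1e-3, hi=200.0):
    """Bisection for f(x) = target, where f is strictly decreasing on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def regime_iii(s, c):
    """Return (lambda*, q*, gamma*, total recruitment cost) for small s."""
    lam_bar = solve_decreasing(bidder_payoff, c)   # U(lam_bar) = c
    mr = marginal_revenue(lam_bar)
    if not 0.0 < s < mr:
        raise ValueError("regime (iii) requires 0 < s < Ro'(lambda-bar_c)")
    q = s / mr                 # partial entry, q* < 1
    gamma = lam_bar / q        # contacts needed so that gamma * q = lam_bar
    return lam_bar, q, gamma, gamma * s

results = {s: regime_iii(s, c=0.1) for s in (0.05, 0.005, 0.0005)}
for s, (lam_bar, q, gamma, cost) in results.items():
    print(f"s={s}: q*={q:.4f}, gamma*={gamma:.1f}, total cost={cost:.6f}")
```

As s shrinks, the entry probability falls and the number of contacts rises in proportion, so the product gamma* times s stays fixed at lambda-bar_c * Ro’(lambda-bar_c).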

Q: Why does a no-trade equilibrium always exist in the PU scenario but not always in the PO scenario? A: In the PU scenario, if bidders expect zero participation, they bid zero conditional on being contacted; the seller then has no incentive to recruit, validating the expectation. This equilibrium is self-sustaining for all parameter values (Claim 2). In the PO scenario, the equilibrium refinement (requiring that off-path beliefs not support negative seller payoff at lambda = 0 when trade equilibria exist) rules out no-trade equilibria when lambda-bar_c > lambda_o and s is not too large; specifically, Proposition 2 shows that no-trade equilibrium is unique only when s > s-bar_o or lambda-bar_c < lambda_o.

Q: What drives the ranking reversal between PO and PU scenarios? A: The core result is Claim 3: Ro’(lambda) > xi(lambda) for all lambda > 0, meaning the marginal incentive to recruit is strictly stronger under PO than PU. This follows from revenue equivalence: Ro’(lambda) = xi(lambda) + (d/d lambda-hat) Ru(lambda, beta_{lambda-hat})|_{lambda-hat=lambda}, and the second term is strictly positive because greater expected participation induces more aggressive bidding. For intermediate s and small c, stronger PO recruitment incentives support higher participation and revenue. For small s or large c, those same stronger incentives generate wasteful over-recruitment in PO, making PU preferable.

Q: How does the paper connect its PO/PU comparison to a comparison of first- and second-price auctions? A: In any standard auction where the highest-value bidder wins, payoff and revenue equivalence imply that the bidder payoff function U(lambda) and seller revenue Ro(lambda) are the same across formats. In particular, the dominant-strategy equilibrium of the SPA (where bidders bid their true values regardless of participation) generates the same outcomes as the PO equilibrium, because with truthful bidding the observability of participation is irrelevant. Therefore, comparing PO and PU with an FPA is equivalent to comparing the SPA and FPA with unobservable participation. The two formats are not revenue-equivalent when recruitment and entry frictions are present: their ranking depends on s and c in exactly the way described for PO vs. PU.

Q: What is the “sampling curse” and how does it cause market unraveling? A: The sampling curse arises when sellers have privately known recruitment costs. Because a lower-cost seller optimally recruits more bidders, the probability of any given bidder being contacted is higher when the seller has a lower cost. Conditional on being contacted, a bidder therefore believes the seller more likely has a low cost and thus has recruited many competitors, reducing the value of entry. In the binary-type case (Claim 8), if sL is sufficiently small relative to sH, the low-cost seller must recruit so many bidders that entry becomes unattractive; the resulting low q* makes the marginal recruitment cost sH/q* prohibitively high for the high-cost type, driving it out (lambda*_H = 0).
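The belief updating behind the sampling curse is a standard size-biased Bayes calculation. The sketch below assumes, as a simplification not spelled out in this summary, that each potential bidder's chance of being contacted is proportional to the seller's recruitment effort; the function name and the numbers are hypothetical.

```python
def posterior_low_cost(prior_low, gamma_low, gamma_high):
    """P(seller is low-cost | this bidder was contacted).

    Assumes the contact probability is proportional to recruitment effort,
    so the likelihood ratio of being contacted is gamma_low / gamma_high.
    """
    weight_low = prior_low * gamma_low
    weight_high = (1.0 - prior_low) * gamma_high
    return weight_low / (weight_low + weight_high)

# Hypothetical example: equal prior, low-cost seller recruits nine times as much.
belief = posterior_low_cost(prior_low=0.5, gamma_low=9.0, gamma_high=1.0)
print(belief)
```

In this example a bidder who starts from an even prior ends up nearly certain, once contacted, that the seller is the heavy recruiter, which is what depresses the value of entry.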

Q: What does Proposition 3 establish about near-complete unraveling with a continuum of seller types? A: With seller costs uniformly distributed on [s-bar, s_o], Proposition 3 establishes a threshold c-hat strictly between 0 and c-bar such that: (i) for c in (c-hat, c-bar), as s-bar approaches zero, the fraction of seller types with zero recruitment approaches one — near-complete market unraveling; (ii) for c < c-hat, all seller types remain active regardless of how small s-bar is. This is striking because for any commonly known s in (0, s_o), the PO scenario supports trade for all c < c-bar; unraveling arises purely from the interaction of private cost information and the sampling curse, not from any type’s cost being intrinsically too high.

Q: What does the welfare analysis say about equilibrium efficiency? A: The welfare-maximizing participation level lambda_w satisfies U(lambda_w) = c + s (equating the marginal bidder’s surplus to the full social cost of one more participant), with full entry q_w = 1. In equilibrium under PO, q* < 1 in some cases (wasted recruitment) and lambda* differs from lambda_w for almost all (s, c) pairs — both excessive participation (lambda* > lambda_w) and deficient participation (lambda* < lambda_w) can arise. Full efficiency requires Ro’(lambda*) = s and U(lambda*) = s + c simultaneously, but since both U and Ro’ are independent of s and c as parameters, these equalities generically fail.

Q: Does the seller benefit from being able to commit to recruitment effort? A: Claim 10 shows that with observable effort in the PO scenario, the seller commits to gamma-hat = min{lambda-bar_c, lambda_o(s)} when lambda-bar_c >= lambda_o, and to lambda-bar_c (if profitable) when lambda-bar_c < lambda_o. Commitment strictly improves the seller’s profit whenever gamma-hat = lambda-bar_c: it enables positive trade when lambda-bar_c < lambda_o and Ro(lambda-bar_c) > lambda-bar_c * s (otherwise impossible without commitment), and it saves recruitment costs when lambda-bar_c > lambda_o and Ro’(lambda-bar_c) > s. However, the commitment outcome is always welfare-inefficient: lambda-bar_c > lambda_w whenever s > 0.

Q: What anecdotal evidence do the authors cite for the model’s relevance? A: Subramanian (2010) and Boone and Mulherin (2004, 2009) show that the majority of merger and acquisition auctions are “informal” — mixtures of auctions and negotiations rather than structured processes with rules laid out in advance — and that sellers are typically unable to credibly commit to participation levels. Milgrom (2003) states from consulting experience that marketing an auction is often more critical than clever mechanism design. Fees of 20–30% of revenues paid to intermediaries like Christie’s and Sotheby’s are offered as quantitative evidence of the magnitude of the inefficiencies that such intermediaries reduce. Home repair contracting is cited as a familiar informal-auction setting where both recruitment and entry costs are material.

Recruitment effort (gamma): The seller’s costly action of contacting potential bidders, modeled as a Poisson process with mean gamma at cost gamma*s; unobservable to bidders in the baseline model.

Participation-observable (PO) vs. participation-unobservable (PU) scenarios: The two variants of the model; in PO, bidders observe the total number of entrants n before bidding; in PU, they do not observe n and the seller cannot credibly disclose it.

Minimum profitable scale (lambda_o): The smallest positive participation level the seller will ever choose in equilibrium, defined as the value where Ro’(lambda_o) equals the peak of the average revenue curve s-bar_o. The seller always recruits either zero bidders or at least lambda_o, due to the initial complementarity of bidders (they stimulate each other’s bids) under no-commitment-to-reserve-price.

Break-even participation level (lambda-bar_c): The maximum participation level at which a bidder’s expected gross payoff U(lambda) equals the entry cost c; bidders will not enter if they expect participation above lambda-bar_c.

Sampling curse: The adverse-selection mechanism arising when sellers have privately known recruitment costs: because low-cost sellers recruit more, a contacted bidder infers the seller is more likely to have a low cost and thus to have recruited many competitors, making entry less attractive and potentially driving higher-cost seller types out of the market.

xi(lambda): The seller’s marginal revenue with respect to recruitment in the PU scenario, defined as the total derivative of Ru(lambda, beta_{lambda-hat}) evaluated where actual and expected participation coincide (lambda-hat = lambda). Strictly less than Ro’(lambda) for all lambda > 0, reflecting that in PU the seller loses the ability to leverage bidder aggression via observable competition.

Wasteful recruitment: The equilibrium phenomenon in which total recruitment cost lambda*(s/q*) remains at the positive constant lambda-bar_c * Ro’(lambda-bar_c) even as s approaches zero, because incentive constraints — not technology — pin the equilibrium effort level.

Automated credit limit increases and consumer welfare

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question. Should regulators restrict banks from proactively raising credit card limits using machine-learning algorithms, and if so, how? The paper asks: to what extent are bank-initiated credit limit increases directed toward revolving borrowers (those who carry interest-accruing balances month-to-month), and what are the welfare consequences of policies that constrain such increases?

Data. The empirical analysis uses the Federal Reserve’s Capital Assessments and Stress Testing (Y-14M) regulatory data, January 2014 to December 2024, covering monthly account-level records for all credit cards issued by large stress-tested banks (assets > $100B). The 26 banks in the sample collectively represent more than 70% of U.S. credit card balances. A 0.5% sample yields more than 150 million observations across more than 3.6 million unique active credit cards. A key advantage of Y-14 over credit bureau data is that it identifies whether each limit change was bank-initiated or consumer-initiated — a distinction not available in other datasets.

Stylized Facts. Credit limit increases are an important and understudied source of consumer credit. During the post-pandemic period, limit increases generate more than $40 billion of additional available credit per quarter, roughly 60% of the approximately $70 billion coming from new card originations; prior to the pandemic the figure was about $30 billion, or roughly half of new issuance. The number of accounts undergoing a limit increase each quarter is on average 30% higher than the number of new cards issued. Consistent with “low-and-grow” lending strategies, limit increases are disproportionately important for lower credit-score borrowers: average subprime credit limits rise from $700 at origination to $2,700 by five years after origination (a 285% increase) and to nearly $5,000 by eight years, while average superprime limits rise only from approximately $12,000 to $15,000 (a 25% increase). About 30% of total revolving balances are made possible by limit increases, with the share reaching 60% for subprime borrowers but only 12% for superprime borrowers. Approximately 75–80% of all limit increases — both by dollar amount and by number of cards — are bank-initiated rather than consumer-initiated. Banks that more frequently reference “artificial intelligence” or “machine learning” in their 10-K filings support a larger share of revolving balances through limit increases. Bank-initiated increases are roughly 1.5–2 times more prevalent among accounts that have revolved in the prior three months, whereas consumer-initiated increases show essentially no differential by revolving status.
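The percentage figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the rounded dollar amounts reported in this summary; the helper name `pct_increase` is hypothetical, and the inputs are the rounded values from the text rather than the underlying Y-14M data.

```python
def pct_increase(start: float, end: float) -> float:
    """Percentage increase from start to end."""
    return 100.0 * (end - start) / start

# Rounded average limits quoted in the summary (not the underlying microdata).
subprime_5y = pct_increase(700, 2_700)        # about 285.7
superprime_5y = pct_increase(12_000, 15_000)  # 25.0

# Post-pandemic credit from limit increases relative to new originations,
# using the rounded $40 billion and $70 billion figures.
share_of_originations = 100.0 * 40 / 70       # about 57, i.e. "roughly 60%"

print(f"subprime: {subprime_5y:.1f}%  superprime: {superprime_5y:.1f}%")
print(f"limit increases vs. originations: {share_of_originations:.0f}%")
```

Because the text says "more than $40 billion", the 57% ratio is best read as a lower bound that is consistent with the "roughly 60%" description.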

Empirical Analysis. Using a linear probability model with card-portfolio-group fixed effects, month fixed effects, and controls for credit score, income, prior limit changes, and other account characteristics, the authors show that the probability of a bank-initiated limit increase follows an inverted-U shape in revolving utilization: accounts with revolving utilization in the moderate range (roughly 0.2–0.7) are most likely to receive an increase, while those near zero or near 1.0 are not. An account with revolving utilization in the (0.2, 0.3] bin is approximately as likely to receive a limit increase as an account whose credit score just rose by 66 points. Transacting utilization, by contrast, follows a logistic growth pattern: the probability rises monotonically until about a utilization of 0.3 and is flat above that. An event study shows that after a bank-initiated limit increase, revolving utilization rebounds to its pre-increase level within approximately 8 months; on average, revolving balances increase by about 40% of the limit increase, with approximately 30% of the limit increase going toward revolving balances. This rebound occurs even for accounts with revolving utilization below the pre-increase mean of 0.28, indicating that the effect is not confined to liquidity-constrained borrowers.

Model. The authors develop a life-cycle consumption–saving model with credit card borrowing, uninsurable income and employment risk, potential default (Chapter 7 style), and heterogeneous preferences following Nakajima (2017) and Gul–Pesendorfer (2001, 2004). Two household types coexist: 60% with standard exponential-discounting preferences (calibrated β = 0.92) and 40% with temptation preferences (calibrated β = 0.96, temptation parameter λ = 0.28 from Kovacs et al., 2021). The credit limit increase function is calibrated using Y-14M data via a latent-variable formulation, replicating the empirical inverted-U relationship between revolving utilization and limit increase probability. The four internally calibrated targets are: share of households with revolving credit card debt (data: 45%, model: 41.8%); utilization rate conditional on debt (data: 35%, model: 28.9%); default probability (data: 0.94%, model: 0.94%); debt-to-income ratio (data: 8.6%, model: 6.8%).

Main Findings — Baseline. Through the model, tempted agents are disproportionately likely to receive credit limit increases because they are more likely to revolve. For customers with utilization above 50%, the majority of credit limit increases are detrimental from the borrower’s own perspective. Standard agents almost always benefit from higher credit limits.

Counterfactual 1 — UK-style (prohibit limit increases for revolving borrowers). This policy reduces the annual probability of limit increases from roughly 5.5% to approximately 1.0%. The default probability falls from about 0.9% to near zero. The debt-to-income ratio declines by roughly 2 percentage points. Aggregate welfare improves by 1.12% in consumption equivalent variation (CEV) when the social planner internalizes the psychological cost of temptation (0.98% without). Standard households incur a modest welfare loss of 0.21% from reduced consumption-smoothing flexibility, while tempted households gain approximately 3.12% in CEV.

Counterfactual 2 — Canada/EU-style (require consumer consent). This policy reduces the annual limit-increase probability from 5.5% to approximately 1.9%. Aggregate welfare improves by 1.16% in CEV (1.04% without psychological costs). Standard households lose 0.19%, while tempted households gain approximately 3.19%. Under the baseline assumption of sophisticated tempted households, results are nearly identical to the UK-style policy. However, when the fraction of naïve tempted households is large, the consent-based policy becomes ineffective (naïve consumers accept limit increases they will regret), whereas the UK-style revolving-borrower ban remains welfare-improving regardless of the naïve share.
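As a rough consistency check on the two counterfactuals, the aggregate CEV figures can be compared with a population-weighted average of the type-level figures, using the 60/40 split between standard and tempted households. This is a sketch under the assumption that aggregate CEV is approximately linear in the type-level CEVs; it is not presented as the authors' aggregation method, and the function name `weighted_cev` is hypothetical.

```python
STANDARD_SHARE = 0.60   # share of standard households in the calibration
TEMPTED_SHARE = 0.40    # share of tempted households in the calibration

def weighted_cev(standard_cev: float, tempted_cev: float) -> float:
    """Population-weighted average of type-level CEVs (in percent)."""
    return STANDARD_SHARE * standard_cev + TEMPTED_SHARE * tempted_cev

# Type-level CEVs (percent) reported for each counterfactual.
uk_style = weighted_cev(standard_cev=-0.21, tempted_cev=3.12)
consent = weighted_cev(standard_cev=-0.19, tempted_cev=3.19)

print(f"UK-style ban:  {uk_style:.2f}%")   # compare with the reported 1.12%
print(f"Consent-based: {consent:.2f}%")    # compare with the reported 1.16%
```

Under this linear approximation the weighted averages line up with the reported aggregate gains that internalize the psychological cost of temptation.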

Robustness. When the firm is allowed to re-optimize its credit limit increase policy, it endogenously reallocates more limit increases toward standard consumers. Welfare gains remain positive but are attenuated: the UK-style policy yields 0.21% CEV (vs. 1.12% in the baseline calibration) and the consent-based policy yields 0.27% CEV.

Policy Implications. The U.S. lacks regulation of bank-initiated proactive credit limit increases (existing rules under ECOA and ability-to-pay provisions are largely non-binding for this purpose). The authors conclude that banks’ revealed preference for targeting revolvers constitutes an implicit targeting of consumers with self-control issues, and that if a meaningful share of households have self-control issues, there are strong consumer protection grounds for regulating algorithmic credit limit increases.

In depth

Q1. Why do the authors use Y-14M data rather than credit bureau data, and what does this data uniquely enable?

A: The Y-14M dataset allows the authors to distinguish between bank-initiated and consumer-initiated credit limit changes — a distinction not observable in credit bureau data. It also contains actual payment information enabling identification of revolvers (those carrying interest-accruing balances) rather than just total balances. The sample covers more than 70% of U.S. credit card balances and more than 150 million monthly observations over the January 2014 to December 2024 period.

Q2. How large are credit limit increases relative to new card originations in the U.S. credit card market?

A: During the post-pandemic period, limit increases produce more than $40 billion of additional available credit per quarter, roughly 60% of the approximately $70 billion created by new card originations. Prior to the pandemic the figure was approximately $30 billion, or about half of new issuance. On a count basis, the number of cards undergoing a limit increase each quarter is on average 30% higher than the number of new cards issued.

Q3. What is the “low-and-grow” strategy, and how large is the subsequent credit expansion?

A: The low-and-grow strategy involves originating higher-risk borrowers at low initial credit limits and then expanding limits based on observed borrowing behavior. For the average subprime credit card, the initial limit of $700 grows to $2,700 by five years after origination (a 285% increase) and to nearly $5,000 by eight years. For superprime borrowers, the initial limit of approximately $12,000 grows only to $15,000 (a 25% increase) by five years and then is approximately unchanged.

Q4. How does a borrower’s revolving status affect the probability of receiving a bank-initiated limit increase?

A: Bank-initiated increases are approximately 1.5–2 times more prevalent among accounts that have revolved at least once in the prior three months, compared to non-revolving accounts. By contrast, consumer-initiated increases show essentially no differential between revolvers and non-revolvers. The authors read this pattern as a revealed preference on the bank side for targeting revolvers.

Q5. What is the shape of the relationship between revolving utilization and the probability of a bank-initiated limit increase, and how large is its economic magnitude?

A: The relationship follows an inverted-U shape. Accounts with revolving utilization in bins between approximately 0.2 and 0.7 have the highest probability of receiving an increase; accounts near full utilization are about as unlikely to receive an increase as zero-utilization accounts. Being in the (0.2, 0.3] revolving utilization bin raises the probability of receiving a limit increase by approximately as much as a 66-point increase in credit score, making the effect economically large relative to standard risk signals.

Q6. How does transacting utilization relate to bank-initiated limit increases, and how does this differ from revolving utilization?

A: Transacting utilization follows a logistic growth pattern rather than an inverted-U. The probability of receiving a limit increase rises monotonically with transacting utilization until about a utilization of 0.3, above which the probability does not vary with utilization. This contrasts with revolving utilization, where very high utilization (above 0.9) is actually no more predictive than zero utilization.

Q7. What does the event study show about borrowing behavior following credit limit increases?

A: After a bank-initiated limit increase, revolving utilization (as a share of the credit limit) drops mechanically but then rebounds to pre-increase levels within approximately 8 months. On average, revolving balances increase by about 40% of the amount of the limit increase, with approximately 30% of each dollar of new credit limit going toward revolving balances. These magnitudes are somewhat larger than the 13% (Gross and Souleles, 2002) and 18% (Aydin, 2022) found in prior work, which the authors attribute to the non-causal nature of their event study, higher average utilization in their sample, and their focus on revolving rather than total utilization.
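A back-of-the-envelope illustration helps connect the rebound in utilization to the pass-through figure. If revolving utilization (balance divided by limit) returns to its pre-increase level after the limit rises, balances must grow by the pre-increase utilization times the limit increase. The sketch below uses the pre-increase mean utilization of 0.28 reported in this summary; the account values and the function name `balance_after_full_rebound` are hypothetical, and this is not the authors' calculation.

```python
def balance_after_full_rebound(balance: float, limit: float, limit_increase: float) -> float:
    """Revolving balance that restores the pre-increase utilization ratio."""
    utilization = balance / limit
    return utilization * (limit + limit_increase)

# Hypothetical account at the pre-increase mean revolving utilization of 0.28.
limit, increase = 5_000.0, 1_000.0
balance = 0.28 * limit

new_balance = balance_after_full_rebound(balance, limit, increase)
# Share of each new dollar of limit that ends up as revolving balance.
pass_through = (new_balance - balance) / increase

print(f"utilization right after the increase: {balance / (limit + increase):.3f}")
print(f"balance after full rebound: {new_balance:.0f}, pass-through: {pass_through:.2f}")
```

A full rebound from a utilization of 0.28 implies a pass-through of about 28 cents per dollar of new limit, which is in the neighborhood of the roughly 30% reported above.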

Q8. Is the post-increase borrowing rebound driven by liquidity-constrained borrowers?

A: No. The authors show that limiting the sample to accounts with revolving utilization below the pre-increase mean of 0.28 — accounts that are unlikely to be liquidity constrained — yields very similar results. This finding is consistent with the presence of self-control issues rather than binding credit constraints.

Q9. What household types does the model feature, and how is the share of tempted households determined?

A: The model features two types: 60% with standard exponential-discounting preferences (estimated discount factor β = 0.92) and 40% with temptation preferences (β = 0.96, temptation parameter λ = 0.28 set from Kovacs et al., 2021). The 40% tempted share is internally estimated via the Method of Simulated Moments targeting four aggregate moments: share with revolving credit card debt (45% in data, 41.8% in model), utilization rate conditional on debt (35% vs. 28.9%), default probability (0.94% vs. 0.94%), and debt-to-income ratio (8.6% vs. 6.8%).

Q10. How do tempted and standard households differ in their credit card usage within the model?

A: In the model, 76% of tempted agents carry revolving credit card debt, with an average utilization rate of 73.6%, a debt-to-income ratio of 15.4%, and a default probability of 2.22%. Standard agents carry debt only 18.9% of the time, with average utilization of 4.1%, a debt-to-income ratio of 1.1%, and a default probability of 0.08%. Tempted agents also pay a substantially higher share of income on credit card interest.
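The type-level statistics above can be combined with the 40/60 population split to see how they relate to the aggregate model moments (41.8% with revolving debt, 0.94% default probability, 6.8% debt-to-income). The sketch below is an illustrative check only: it assumes simple population weighting, which for the debt-to-income ratio implicitly treats the two types as having similar income. The conditional utilization rate is left out because it is not a simple population-weighted average.

```python
TEMPTED_SHARE, STANDARD_SHARE = 0.40, 0.60

# Type-level model statistics in percent: (tempted, standard).
type_stats = {
    "share with revolving debt": (76.0, 18.9),
    "default probability": (2.22, 0.08),
    "debt-to-income ratio": (15.4, 1.1),
}

# Population-weighted aggregate for each statistic.
aggregates = {
    name: TEMPTED_SHARE * tempted + STANDARD_SHARE * standard
    for name, (tempted, standard) in type_stats.items()
}

for name, value in aggregates.items():
    print(f"{name}: {value:.2f}%")
```

The weighted values come out close to the reported aggregates, with small gaps that are consistent with rounding in the type-level inputs.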

Q11. How does the model capture the mechanism by which credit limit increases harm tempted households?

A: The Gul–Pesendorfer temptation utility function makes household welfare depend on both actual consumption and the most tempting consumption alternative available (the budget-set maximum). When credit limits rise, the most tempting alternative c̃_t increases, which raises the utility cost of self-restraint even for households that do not succumb to temptation. This mechanism is distinct from hyperbolic discounting: temptation imposes a psychic cost even on those who ultimately choose not to over-borrow.
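The mechanism can be illustrated with a stylized one-period version of temptation utility. The sketch assumes a common static representation, u(c) − λ[u(c_max) − u(c)], with log utility; both the functional form and the numbers are assumptions for illustration and are not taken from the paper's dynamic model. Only λ = 0.28 comes from the summary.

```python
import math

LAMBDA = 0.28  # temptation weight reported in the summary

def gp_period_utility(c: float, c_max: float, lam: float = LAMBDA) -> float:
    """Log utility of c minus the self-control cost lam * (u(c_max) - u(c))."""
    u = math.log  # assumed utility function
    return u(c) - lam * (u(c_max) - u(c))

consumption = 1.0  # chosen consumption, held fixed across both cases

# A higher credit limit enlarges the budget set and raises the most tempting option.
before = gp_period_utility(consumption, c_max=2.0)
after = gp_period_utility(consumption, c_max=3.0)

print(f"utility before limit increase: {before:.4f}")
print(f"utility after limit increase:  {after:.4f}")
```

Even though chosen consumption is unchanged, utility falls when the most tempting alternative grows, which is the sense in which a limit increase can hurt a household that never over-borrows.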

Q12. What are the quantitative welfare effects of the UK-style policy prohibiting limit increases for revolving borrowers?

A: The policy yields an overall welfare gain of 1.12% in consumption equivalent variation (CEV) when the social planner internalizes the psychological cost of temptation (0.98% without). Standard households suffer a modest welfare loss of 0.21% from reduced consumption-smoothing flexibility. Tempted households gain approximately 3.12% in CEV, because the benefit from reduced temptation and lower interest expenditure outweighs the cost of reduced credit access.

Q13. What are the quantitative welfare effects of the Canada/EU-style policy requiring consumer consent?

A: The consent-based policy yields an overall welfare gain of 1.16% in CEV (1.04% without psychological costs). Standard households lose 0.19%, and tempted households gain approximately 3.19%. Under the baseline assumption of fully sophisticated tempted households, results are nearly identical to the UK-style ban.

Q14. How do the two policies compare when some tempted households are naïve about their self-control problems?

A: The UK-style ban on limit increases for revolving borrowers remains welfare-improving regardless of whether tempted households are sophisticated or naïve: the welfare impact is approximately flat as the naïve fraction rises from zero to one. The consent-based policy, by contrast, exhibits a negative linear relationship between the naïve fraction and welfare impact, with welfare gains disappearing as the naïve fraction approaches one. Naïve consumers accept limit increases they would regret, so the policy’s effectiveness depends on households accurately recognizing their own self-control issues.

Q15. What happens when the firm is allowed to re-optimize its credit limit increase policy in response to regulation?

A: With firm re-optimization, both counterfactual policies continue to improve welfare but the magnitudes are attenuated. The UK-style policy yields 0.21% CEV overall (tempted: 0.89%) and the consent-based policy yields 0.27% overall (tempted: 0.98%), compared to 1.12% and 1.16% without re-optimization. The re-optimizing firm reallocates more limit increases toward standard consumers, which reduces the number directed at tempted households but also limits the welfare gains from regulation.

Q16. What do lenders’ 10-K filings reveal about the role of AI/ML in targeting revolvers for limit increases?

A: Banks that mention “artificial intelligence” or “machine learning” above the median number of times in their 2024 10-K filings support a higher share of revolving balances through credit limit increases, for all credit score groups. This difference is not driven by differences in credit limits at origination between higher-AI and lower-AI lenders, suggesting that AI/ML adoption affects the targeting of limit increases toward revolvers rather than the initial credit allocation.

Key Concepts

Revolving utilization. In this paper, revolving utilization is defined as the portion of overall credit card utilization attributable to balances that the borrower carries from one month to the next without full repayment, thereby accruing interest. It is measured as revolving balances divided by credit limit, averaged over the prior three months. This is distinct from transacting utilization (new purchases as a share of limit) and is the primary signal banks use — implicitly, via their algorithms — to select accounts for proactive limit increases.
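The definition above can be sketched as a small helper. The code assumes that the three-month average is taken over the monthly ratios (rather than as a ratio of averages), which the summary does not specify; the function name and the account values are hypothetical.

```python
from statistics import mean

def revolving_utilization(revolving_balances: list[float], credit_limits: list[float]) -> float:
    """Mean of monthly (revolving balance / credit limit) over the supplied months."""
    if len(revolving_balances) != len(credit_limits) or not revolving_balances:
        raise ValueError("need one balance and one limit per month")
    return mean(b / l for b, l in zip(revolving_balances, credit_limits))

# Hypothetical account: revolving balances and limits for the prior three months.
util = revolving_utilization([500.0, 750.0, 1000.0], [2500.0, 2500.0, 2500.0])
print(f"revolving utilization: {util:.2f}")
```

In this hypothetical case the account falls at the upper edge of the (0.2, 0.3] bin discussed in the empirical analysis.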

Bank-initiated vs. consumer-initiated credit limit increase. A bank-initiated limit increase is one in which the lender proactively raises a borrower’s credit limit without a request from the borrower. A consumer-initiated increase is one explicitly requested by the borrower. The Y-14M data uniquely identify the source of each change. The paper documents that approximately 75–80% of all limit increases are bank-initiated, and that bank-initiated increases are strongly correlated with revolving utilization whereas consumer-initiated increases are not.

Low-and-grow strategy. The practice of originating higher-risk borrowers at low initial credit limits and then expanding those limits over time based on observed borrowing behavior. In the paper this is a documented empirical pattern, not an assumption: subprime accounts start at an average $700 limit at origination, reach $2,700 by five years (a 285% increase), and reach nearly $5,000 by eight years, versus an increase of only 25% for superprime accounts over the first five years.

Temptation preferences (Gul–Pesendorfer). A utility framework in which household welfare depends not only on actual consumption but also on the most tempting consumption alternative within the budget set. The disutility from temptation arises even when the household does not succumb — it reflects the psychological cost of self-restraint. In the paper, λ (set to 0.28) parameterizes the weight of this temptation cost relative to standard utility. Temptation preferences are time-consistent, which facilitates welfare analysis, and are preferred to hyperbolic discounting in this setting because they predict that individuals may pay to have tempting options removed even without acting on them.

Revealed preference for targeting revolvers. The paper’s characterization of banks’ credit limit increase behavior as reflecting a systematic preference for giving increases to revolving borrowers, inferred from the empirical pattern in the Y-14M data (the inverted-U shape between revolving utilization and limit increase probability). Because banks’ algorithms are proprietary and unobserved, the paper interprets the observed allocation of limit increases as a revealed preference, consistent with banks’ profit motive since revolvers generate the majority of credit card interest income.

Consumption equivalent variation (CEV). The welfare metric used throughout the paper’s counterfactual analysis. CEV is defined as the percentage change in consumption in every period and state that would make households indifferent between the baseline policy regime and the counterfactual policy. A positive CEV indicates that the counterfactual policy improves welfare; a negative CEV indicates harm. The paper considers two versions: one in which the social planner internalizes the psychological cost of temptation (consistent with tempted households’ actual preferences), and one in which the planner ignores that cost (λ = 0 for the planner) but households still face temptation.
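To make the definition concrete, the sketch below computes a CEV in the textbook case of CRRA utility with no temptation term, where scaling consumption by (1 + g) in every period and state scales lifetime value by (1 + g)^(1 − σ). This closed form is an assumption for illustration; with the paper's temptation preferences and default option the CEV would have to be computed from the solved model. The function name and the input values are hypothetical.

```python
def cev_crra(v_baseline: float, v_counterfactual: float, sigma: float) -> float:
    """CEV g solving (1 + g)**(1 - sigma) * v_baseline == v_counterfactual (sigma != 1)."""
    if sigma == 1.0:
        raise ValueError("log utility needs a separate formula")
    if v_baseline * v_counterfactual <= 0:
        raise ValueError("lifetime values must be non-zero and share a sign")
    return (v_counterfactual / v_baseline) ** (1.0 / (1.0 - sigma)) - 1.0

# Hypothetical lifetime values under sigma = 2 (values are negative for sigma > 1).
g = cev_crra(v_baseline=-10.0, v_counterfactual=-9.9, sigma=2.0)
print(f"CEV: {100 * g:.2f}% of consumption")
```

A positive g means households would need that percentage more consumption in the baseline regime to be as well off as under the counterfactual policy.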

Persistent revolving debt (UK regulatory definition). In the UK Financial Conduct Authority’s framework, a borrower is considered in “persistent revolving debt” when the cumulative amount paid toward interest and fees exceeds the cumulative amount of principal repaid over a 12-month period. The UK rule prohibits lenders from increasing credit limits for borrowers meeting this definition. The paper models a stylized version: any account currently carrying a revolving balance is ineligible for a bank-initiated limit increase in the UK-style counterfactual.

Automation and Rent Dissipation

Mon, 01 Jan 0001 00:00:00 +0000

Acemoglu and Restrepo examine the effects of automation in economies where labor market distortions cause some workers to earn rents—wages above their opportunity cost or outside option. The central question is how the interplay between automation and these distortions shapes wages, inequality, and productivity. The paper makes three contributions: a theoretical framework identifying a rent dissipation mechanism, reduced-form empirical evidence using US data from 1980 to 2016, and a general equilibrium quantification of automation’s aggregate effects.

The theoretical framework extends the task model of Acemoglu and Restrepo (2022) to incorporate task-specific wage wedges. In this setup, a firm employing labor of type g in task x pays a wage equal to the base wage multiplied by an exogenous wedge capturing rents from efficiency wages, bargaining, licensing, regulations, or norms. Because these wedges artificially inflate labor costs in high-rent tasks, firms have a stronger incentive to automate precisely those tasks—automation saves more in labor costs where rents are highest. Proposition 3 establishes that endogenous adoption decisions are tilted toward high-rent tasks: the rent distribution in automated tasks first-order stochastically dominates the rent distribution across all tasks. This targeting generates the rent dissipation mechanism. The equilibrium is inefficient on both the intensive margin (too little employment in high-rent tasks) and the extensive margin (excessive automation of high-rent tasks that a social planner would prefer to keep labor-intensive).

The rent dissipation mechanism has three consequences identified theoretically. First, it amplifies average wage losses for exposed groups beyond what displacement alone would produce, pushing displaced workers toward lower-paying jobs. Second, it compresses within-group wage dispersion by concentrating losses at higher percentiles of the within-group distribution, generating a U-shaped pattern of wage changes: workers at low percentiles earn no rents and experience only base-wage adjustments, while workers between the 70th and 95th percentiles face the steepest declines due to loss of high-rent jobs. Third, it is inefficient: because the tasks targeted by automation are not those where wages reflect scarcity or skill but rather distortionary rents, a planner would have preferred more labor allocated to these tasks, and rent dissipation offsets part or all of the cost-saving productivity gains from automation.

The empirical analysis covers 500 detailed demographic groups defined by education (five levels), gender, five age groups, five race/ethnicity groups, and nativity. Task displacement is measured as a weighted sum of industry-level automation exposure using three proxies: adjusted industrial robot penetration, specialized software services, and dedicated machinery in value added. Workers in the middle and lower-middle of the wage distribution lost 15–20% of their tasks to automation between 1980 and 2016, while post-college workers saw few tasks automated.

A 10 percentage point increase in task displacement is associated with a 24% decline in group-level relative wages (β = −2.36, s.e. = 0.13), falling to 19% after controlling for gender, education, sectoral demand, and rent shifters (β = −1.90, s.e. = 0.29). The U-shaped pattern in within-group wage changes is clearly visible: wages decline by 25–30% per 10 percentage point task displacement at the 70th–90th percentiles, compared to only 16% at the 5th–40th percentiles. Decomposing the average wage effect, the base-wage component is β = −1.53 (s.e. = 0.33) and the rent-dissipation component is β = −0.37 (s.e. = 0.11), implying a rent dissipation rate of approximately 37%. Across multiple proxies for rents—inter-industry/occupation wage differentials, wage losses after job displacement, and quit rates—the average estimated rent dissipation rate is approximately 35%. Rent dissipation accounts for one-fifth of the overall relative wage decline experienced by groups exposed to automation.
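The coefficients in this paragraph fit together arithmetically, which the following sketch verifies. It reads each coefficient as the approximate proportional wage change per unit of task displacement (a linear approximation); the variable names are hypothetical and the inputs are the point estimates quoted above.

```python
base_wage_beta = -1.53   # base-wage component
rent_beta = -0.37        # rent-dissipation component
total_beta = base_wage_beta + rent_beta   # matches the controlled estimate of -1.90

# Share of the controlled wage effect attributable to rent dissipation.
rent_share = rent_beta / total_beta       # about 0.19, i.e. roughly one-fifth

# Implied relative wage change for a 10 percentage point rise in task displacement.
displacement = 0.10
unconditional_effect = -2.36 * displacement     # about -0.24
controlled_effect = total_beta * displacement   # about -0.19

print(f"total beta: {total_beta:.2f}, rent share: {rent_share:.1%}")
print(f"10 pp displacement: {unconditional_effect:.1%} (raw), {controlled_effect:.1%} (controls)")
```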

In the general equilibrium quantification (with elasticity of substitution λ = 0.5, average cost savings π = 30%, and average rent in automated tasks of 35%), automation accounts for 52% of the rise in between-group wage inequality since 1980: 42 percentage points via baseline displacement effects on labor demand, and 10 percentage points via rent dissipation. Cost savings from automation increased TFP by approximately 3% between 1980 and 2016, but inefficient rent dissipation offsets 60–90% of these gains, leaving net TFP gains of only 0.3–1.3% and net aggregate consumption gains of only 0.45–1.95% over the 36-year period.

Q: What is the rent dissipation mechanism, and why does it arise? A: Rent dissipation arises because labor market wedges make high-rent tasks artificially costly to staff with workers, giving firms a stronger incentive to automate precisely those tasks. When automation displaces workers from high-rent jobs, workers lose the premium above their opportunity cost that those jobs paid, amplifying wage losses beyond what displacement alone would cause. The mechanism is endogenous: firms do not randomly automate tasks but disproportionately target tasks where rents are highest, since doing so saves the most in labor costs. Proposition 3 formalizes this as first-order stochastic dominance of the rent distribution in automated tasks over the rent distribution in all tasks.

Q: Why is rent dissipation inefficient? A: In a distorted economy, high-rent tasks already feature too little employment at the equilibrium—firms under-hire in these tasks because the wage wedge makes labor artificially expensive. A social planner would want to allocate more labor to these tasks, not less. When automation further removes labor from high-rent tasks, it moves the economy further from the efficient allocation, dissipating rents that reflect distortions rather than true scarcity. The TFP formula shows that this inefficient targeting offsets part or all of the cost-saving gains from automation, and can even reduce aggregate productivity if the cost savings are small relative to the rent losses.

Q: What is the U-shaped pattern of within-group wage changes, and what does it indicate? A: The U-shaped pattern means that wage declines due to automation are smallest at the bottom percentiles of a group’s within-group wage distribution, largest in the 70th–95th percentile range, and then smaller again at the very top. Workers at low percentiles earn no rents, so they experience only the base-wage adjustment from reduced labor demand. Workers in the middle-upper range of the distribution hold the high-rent jobs that are disproportionately automated, so they lose both the base-wage component and the rent component of their wages. This pattern is directly visible in US data 1980–2016, with declines of 25–30% per 10 percentage point task displacement at the 70th–90th percentiles versus 16% at the 5th–40th percentiles.

Q: How is task displacement measured, and which groups are most exposed? A: Task displacement is measured as a weighted sum of industry-level automation exposure, accounting for each demographic group’s specialization in routine tasks within industries. Three proxies are used: the adjusted penetration of industrial robots, the increase in specialized software services, and the increase in dedicated machinery in value added. Workers in the middle and lower-middle of the wage distribution—broadly corresponding to non-college workers—lost 15–20% of their tasks to automation between 1980 and 2016. Post-college degree workers saw few tasks automated.

Q: How large is the rent dissipation rate, and how robust is this estimate? A: The baseline estimate from the U-shaped within-group wage change decomposition implies a rent dissipation rate (μ_Ag/μ_g − 1) of approximately 37% (β = −0.37, s.e. = 0.11). Using inter-industry and occupation wage differentials as a proxy for rents, the estimate is 39% (β = −0.39, s.e. = 0.11). Using wage losses after job displacement, the estimate is 20% (β = −0.20, s.e. = 0.04). After purging compensating differentials from the wage differential proxy the estimate remains 37%; after purging from the displacement-loss proxy it falls to 19%. Quit-rate evidence is consistent with rent dissipation: automation shifts workers toward higher-quit-rate jobs, which are lower-rent jobs. The average across proxies is approximately 35%.
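
The rate defined in this answer is a simple ratio, which the minimal Python sketch below spells out. The function name and the two wedge values are hypothetical; the wedges are chosen only so that the result lands near the reported average of about 35%, and they are not estimates from the paper.

```python
def rent_dissipation_rate(mu_automated, mu_all):
    """Average rent wedge in automated tasks relative to the average across all tasks, minus one."""
    if mu_all <= 0:
        raise ValueError("mu_all must be positive")
    return mu_automated / mu_all - 1.0

# Hypothetical wedges (not from the paper), picked to give a rate near 35%.
rate = rent_dissipation_rate(mu_automated=1.62, mu_all=1.20)
print(f"rent dissipation rate: {rate:.0%}")
```

A rate of zero would mean automated tasks carry the same average rent as the group's tasks overall, so displacement would involve no rent loss beyond the base-wage effect.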

Q: How much of between-group wage inequality since 1980 does automation explain, and what share is due to rent dissipation specifically? A: Automation accounts for 52% of the rise in between-group wage inequality in the US since 1980. Of these 52 percentage points, 42 are attributable to the baseline displacement effect working through reduced labor demand for exposed groups. The remaining 10 percentage points are attributable to rent dissipation: automation pushes exposed groups away from high-rent tasks into lower-paying employment. Rent dissipation thus accounts for roughly one-fifth (10/52) of automation’s total contribution to between-group inequality.

Q: How large are the productivity gains from automation, and how much does rent dissipation offset them? A: Cost savings from automation increased TFP by approximately 3% between 1980 and 2016. However, inefficient rent dissipation offsets 60–90% of these gains, because automation disproportionately targets high-rent tasks rather than tasks where the efficiency case is strongest. The net TFP increase attributable to automation is only 0.3–1.3% over the 36-year period, and the corresponding net increase in aggregate consumption is only 0.45–1.95%.
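
The net figures follow from applying the offset range to the gross gain. The sketch below redoes that arithmetic in Python with the rounded inputs quoted above; the function name is illustrative. With a gross gain of exactly 3%, the net range comes out at roughly 0.3 to 1.2, close to the reported 0.3–1.3, and the small gap at the upper end presumably reflects rounding in "approximately 3%". Both ends of the reported consumption range are 1.5 times the corresponding net TFP figure.

```python
def net_gain(gross_gain_pct, offset_share):
    """Net gain in percent after a given share of the gross gain is offset."""
    return gross_gain_pct * (1.0 - offset_share)

gross_tfp = 3.0  # "approximately 3%" in the text, so a rounded input
net_low = net_gain(gross_tfp, 0.90)   # 90% of the gain offset
net_high = net_gain(gross_tfp, 0.60)  # 60% of the gain offset

# Ratio of the reported consumption range to the reported net TFP range.
consumption_per_tfp = (0.45 / 0.3, 1.95 / 1.3)

print(round(net_low, 2), round(net_high, 2))
print([round(r, 2) for r in consumption_per_tfp])
```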

Q: How does automation affect within-group versus between-group inequality, and why is this notable? A: Automation increases between-group inequality by reducing relative wages of exposed groups (largely non-college workers) relative to unexposed groups, accounting for 52% of the rise in between-group inequality since 1980. At the same time, automation reduces within-group wage dispersion for exposed groups by compressing wages at higher percentiles. This contrasts with the standard view that inequality is fractal—rising at all levels of aggregation due to skill-biased demand—and helps explain why within-group inequality has risen steadily for college workers since the 1980s while remaining flat and then declining for non-college workers since the 1990s.

Q: What do the propagation matrix and rent-impact matrix represent in the general equilibrium analysis? A: The propagation matrix encodes how task reallocation due to automation in one demographic group creates competition for marginal tasks across other groups, transmitting the wage effects of automation to groups not directly displaced. The rent-impact matrix encodes how this task reallocation changes the rent composition of employment across groups. Both matrices are estimated from US data on task shares and group-level wage elasticities and are used to translate partial-equilibrium estimates of task displacement and rent dissipation into general equilibrium effects on wages and productivity for all demographic groups simultaneously.

Q: What are the policy implications of inefficient rent dissipation? A: Because rent dissipation is inefficient, the social value of automation is lower than what firms and consumers are willing to pay—firms capture all the labor cost savings but do not internalize the welfare cost of destroying high-rent jobs that the distorted equilibrium already under-supplies. Second-best interventions should address the underlying distortions generating rents rather than trying to slow automation directly. The paper suggests that strengthening labor market institutions supporting worker rents in non-automatable tasks could partially counteract the adverse distributional consequences of automation.

Q: How does this paper relate to Bound and Johnson (1992) and Borjas and Ramey (1995)? A: Bound and Johnson (1992) decompose changes in the US wage structure between 1979 and 1988 into technology, supply, and rent components (modeled as exogenous industry wedges), finding that 10–20% of between-group wage changes reflect rent losses. Borjas and Ramey (1995) estimate that trade increased the college premium by 1.3–2.6 log points between 1976 and 1990, with 15–33% due to loss of rents from trade-exposed jobs. Both are comparable to this paper’s finding that rent dissipation accounts for one-fifth of the wage effect of automation, though Bound and Johnson’s estimates include all factors affecting rents while this paper isolates automation specifically.

Worker rents: Wages above a worker’s opportunity cost or outside option, arising from efficiency wages, bargaining, licensing, regulations, or norms. Modeled as task-specific multiplicative wedges (μ_gx ≥ 1) that force firms to pay more than the base wage for labor in particular tasks. Explicitly excludes compensating differentials and skill premia.

Rent dissipation: The loss of above-opportunity-cost wages experienced by workers displaced from high-rent tasks into lower-paying employment. Occurs because automation endogenously targets high-rent tasks where labor is most expensive, and pushes workers into tasks where rents are lower. Quantified as the ratio of average rents in automated tasks to average rents across all tasks, minus one (approximately 35% in US data 1980–2016).

Task displacement: The share of tasks performed by a demographic group that are automated away, measured as a weighted sum of industry-level automation exposure accounting for the group’s specialization in routine tasks. Distinct from employment loss because it captures reallocation of tasks from labor to capital within the production function.

U-shaped within-group wage change profile: The pattern whereby automation generates the largest wage declines at intermediate-to-upper percentiles (70th–95th) of an exposed group’s within-group wage distribution, with smaller declines at the bottom and again at the very top, because workers in the 70th–95th percentile range disproportionately hold the high-rent jobs targeted by automation. Predicted theoretically and confirmed empirically in US data 1980–2016.

Propagation matrix: A matrix estimated from US data on task shares and group-level wage elasticities that encodes how automation of tasks performed by one demographic group creates competition for marginal tasks with other groups, transmitting wage effects across the demographic distribution in general equilibrium.

Inefficient automation targeting: The mechanism by which labor market distortions cause firms to automate high-rent tasks that a social planner would prefer to keep labor-intensive, since the distorted equilibrium already features too little employment in those tasks. Results in rent dissipation offsetting 60–90% of automation’s direct TFP gains from cost savings.

Rent-impact matrix: A matrix that encodes how task reallocation due to automation changes the rent composition of employment across demographic groups, used alongside the propagation matrix to compute general equilibrium effects of automation on wages and productivity accounting for distortions.

Bargaining and Inequality in the Labor Market

Layer 1: Overview

Research Question. How prevalent is individual wage bargaining in the labor market, what determines firms’ bargaining strategies, how do bargaining encounters unfold for workers, and does heterogeneity in bargaining behavior translate into wage inequality—including the gender wage gap?

Data and Setting. The paper develops and validates novel linked survey data for Germany. A firm survey was fielded by the ifo Institute to senior HR professionals and managers in two waves (September 2021 and January 2022), yielding 772 complete responses across all major sectors and regions. These responses were linked, with consent obtained from 72% of firms, to German Social Security records (the Integrated Employment Biographies, IEB) covering 416,821 full-time employees at matched firms in 2020, and to Orbis balance sheet data for firm productivity proxies. A separate worker survey was fielded by the IAB to 135,000 full-time German workers, of whom 9,756 completed it; these nearly 10,000 responses form the analysis sample, and 7,079 of the respondents were employed at surveyed firms. The worker survey elicited detailed bargaining histories for workers who had received an outside offer in the prior six months, accounts of bargaining at the start of current employment (for workers with tenure of three years or less), and responses to a hypothetical salary expectation scenario.

Definition of Individual Bargaining. The authors define a firm as having a “bargaining strategy” if it differentiates pay between workers in the same position it perceives to have similar productivity—encompassing both variation in initial offers (which may reflect firms using information on workers’ salary expectations) and back-and-forth negotiation. Elicitation distinguishes four employee groups (recent labor market entrants, experienced non-managers, managers, and bottleneck-occupation workers) and two contexts (new external hires and incumbent workers who receive an outside offer).

Prevalence of Bargaining. Approximately 50% of surveyed firms are willing to differentiate base wages for recent labor market entrants, more than 80% for experienced non-managers and managers, and nearly all for workers in bottleneck occupations they are struggling to fill. For incumbent workers facing outside offers, 57% of firms would increase pay for recent entrants, and more than 80% for experienced incumbents, managers, and bottleneck workers. In total, 80% of workers in the sample are in positions where individual bargaining is possible.

Magnitude of Wage Differentiation. For new external hires, the typical firm expects a gap between the highest and lowest offers of 3% for recent entrants, 5% for experienced non-managers, and 10% for managers (conditional on a gap: 6%, 10%, and 12% respectively). For incumbent workers responding to outside offers, the typical firm will adjust pay by 3% for recent entrants, 6% for experienced non-managers, and 10% for managers (conditional on responding: 6%, 7%, and 14% respectively). Forty-four percent of firms report that variation in initial offers is at least as important as back-and-forth negotiation in determining workers’ final pay.

Predictors of Firm Bargaining Strategies. Contrary to models predicting more productive firms are more likely to bargain (Doniger 2015; Postel-Vinay and Robin 2004; Flinn and Mullins 2021), firms that bargain are not more productive—as proxied by firm age, size, or assets per employee—nor do they pay higher mean wages. A variance decomposition shows that employee-group dummies alone explain 33% of variation in bargaining strategies for new hires, comparable to more than 500 firm dummies. Labor market factors—particularly whether a position is hard to fill—are systematically associated with bargaining willingness. Collective bargaining agreement (CBA) coverage and East German location are negatively correlated with bargaining flexibility.

How Bargaining Unfolds. In 57% of worker-firm interactions, the worker provides salary expectations before the firm makes its initial offer; 29% of firms require this information. About one-third of applicants ask for more after the initial offer, requesting on average a 3% increase; conditional on asking, about half of firms raise the offer, but fewer than one-third match what was requested, with the typical worker improving the offer by 1.5%. The majority of outside offers are rejected: only 9% of workers who received an outside offer in the prior six months chose to move to a new firm. Of the 91% who remained at their incumbent firm, 13% successfully renegotiated their pay. Back-and-forth dynamics—where offers are accepted or rejected only after multiple rounds—are consistent with models of two-sided incomplete information.

Worker Heterogeneity and Wage Inequality. Workers with better self-assessed outside options are 9 percentage points more likely to ask for an increase after the initial offer and 7 percentage points more likely to successfully negotiate a raise, relative to same-occupation coworkers with worse outside options. Women are 6 percentage points less likely to successfully negotiate their pay upward and show lower salary expectation provision rates, including in a hypothetical scenario in which pay range information is equalized. These gender differences in bargaining are not explained by women negotiating more over non-wage amenities; controlling for outside options and risk tolerance shrinks the female coefficient by at most 15%. Among surveyed workers, after controlling for occupation-establishment fixed effects, there is no gender wage gap at firms that do not bargain, but a 4–5 percentage point gender wage gap at firms that do bargain. Across specifications, firms that engage in individual bargaining have a 3 percentage point higher gender wage gap. A simple decomposition suggests that at surveyed firms, 44% of the residual gender pay gap can be attributed to bargaining. For workers at bargaining firms, a 10 percentage point higher pay premium at the prior firm is associated with 0.5 percent higher pay at the current firm, conditional on occupation-establishment fixed effects; this relationship is statistically insignificant for workers at non-bargaining firms.

Scope Conditions. Results apply to full-time private-sector workers in Germany between ages 25 and 50, with the firm sample over-representing medium and large firms (median size 50–249 employees). CBA coverage in the sample (41%) reflects Germany’s institutional context where firms retain the right to pay above CBA floors. Results are robust to re-weighting to match the overall distribution of German firm size and sector.

In depth

Q1. How do the authors define “individual bargaining” and why is this definition broader than standard labor economics usage?

The authors define a firm as having a bargaining strategy if it differentiates pay between workers in the same position it perceives to have similar productivity, covering both tailoring of initial offers and back-and-forth negotiation. Standard labor economics definitions typically condition on wages being set ex post once outside options are revealed, and focus on back-and-forth negotiation alone. The authors’ definition is most analogous to standard definitions of price discrimination. Empirically, the vast majority of firms that differentiate initial offers (93%) are also willing to engage in back-and-forth negotiation.

Q2. How was the firm survey designed to elicit bargaining strategies reliably, and what is the “protocol question”?

The protocol question asked: “How much more could a person maximally receive compared to the fixed compensation you would have offered based on the person’s qualification/fit for the position alone?” with options ranging from “0%/no adjustments possible” to “more than 40%.” Wording was developed through over 100 conversations with HR professionals; “qualifications and fit” was the phrase most closely aligned with HR professionals’ concept of productivity. The survey was fielded by the ifo Institute—an organization with decades of experience surveying this population—with a 51% response rate, 83% completion rate, and median response time of 11 minutes.

Q3. What validation exercises support the reliability of the elicited firm bargaining measures?

Four exercises are reported. First, intra-respondent reliability: the cross-tabulations between the protocol and incidence questions show most mass on or below the diagonal (incidence-implied spread no greater than the protocol-implied flexibility). Second, inter-respondent reliability: among 37 firms with multiple respondents, there is significant overlap in independently provided answers. Third, external validity using publicly available data: for 90% of firms reporting no CBA, no CBA evidence is found; for 99% reporting no pay information in job ads, none is found in online postings; for 82% reporting no salary expectation elicitation, no evidence of it appears in online application forms. Fourth, the elicited firm strategies are highly correlated with the matching workers’ survey responses—e.g., workers at firms stating they elicit salary expectations are significantly more likely to report having provided these expectations.

Q4. Is firm productivity associated with whether a firm engages in individual bargaining?

No. Firms that bargain and those that do not are similar with respect to firm size, firm age, and total assets per employee, and they also do not differ significantly in their AKM wage premium. These findings are inconsistent with theoretical models predicting that more productive firms are more likely to set pay via bargaining (Doniger 2015; Postel-Vinay and Robin 2004; Flinn and Mullins 2021). The result holds for both binary and continuous measures of bargaining, and is not overturned by machine learning prediction attempts.

Q5. What firm characteristics other than productivity predict bargaining strategies?

CBA coverage is negatively correlated with wage flexibility—CBA-covered firms report less flexibility even for managers who are typically exempt from CBAs and for groups not covered by CBAs, suggesting institutional norms or culture matter. Firms headquartered in East Germany are less likely to bargain with workers in all groups. Publicly traded firms (stock-based corporations) are more likely to set wages flexibly. These correlations are consistent with the view that managerial style and firm culture (rather than productivity) shape wage-setting strategies.

Q6. What does the variance decomposition say about the relative importance of firm versus market factors in predicting bargaining strategies?

Employee-group dummies alone explain 33% of the variation in bargaining strategies for new hires. After adjusting for the number of fixed effects used, four employee-group dummies explain as much variation as more than 500 firm dummies. Adding firm characteristics or coarse industry dummies does not significantly improve the adjusted R-squared relative to a model containing only group dummies. This supports models emphasizing market-level factors (worker replaceability, labor market tightness) over firm-level factors.
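
The comparison turns on penalizing a model for the number of dummies it uses. A minimal Python sketch, assuming the adjustment is the textbook adjusted R-squared (the paper's exact procedure is not stated here), is shown below. Only the 33% figure comes from the text; the sample size and the raw R-squared of the firm-dummy model are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Textbook adjusted R-squared for a model with an intercept and n_regressors slopes."""
    if n_obs - n_regressors - 1 <= 0:
        raise ValueError("not enough observations for this many regressors")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Hypothetical inputs: only the 0.33 raw R-squared is taken from the text.
n_obs = 2000
group_model = adjusted_r2(r2=0.33, n_obs=n_obs, n_regressors=3)    # 4 groups -> 3 dummies
firm_model = adjusted_r2(r2=0.50, n_obs=n_obs, n_regressors=500)   # about 500 firm dummies

print(round(group_model, 3), round(firm_model, 3))
```

With these made-up inputs, both adjusted values are close to 0.33: a much higher raw fit from hundreds of firm dummies shrinks to about the same level as the fit from a handful of group dummies once degrees of freedom are accounted for.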

Q7. How common is it for workers to provide salary expectations before receiving an initial offer, and what do firms do with this information?

In 57% of worker-firm interactions, the worker provides salary expectations before the firm makes its initial offer. Twenty-nine percent of firms require this information; most ask for it. Forty-four percent of firms report that variation in initial offers is at least as important as subsequent back-and-forth negotiations in determining workers’ final pay. HR professionals and prior research indicate firms interpret variation in stated expectations as reflecting outside options rather than productivity.

Q8. What fraction of outside offers are rejected, and what happens when workers stay at the incumbent firm?

Only 9% of workers who received one or more outside offers in the prior six months chose to move to a new firm. Of the 91% who remained at the incumbent firm, 13% successfully renegotiated their pay at the incumbent. A follow-up survey fielded in spring 2024 corroborates this finding, showing approximately 80% of workers who received an outside offer remained at the incumbent firm; even recoding all job-to-job transitions as accepted offers implies no more than 26% of offers lead to a transition.
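
Because the 13% figure is conditional on staying, it helps to convert everything to shares of all workers who received an outside offer. The Python sketch below does this with the baseline figures quoted above; the variable names are illustrative.

```python
# Shares quoted in the text, as fractions of workers who received an outside offer.
moved = 0.09
stayed = 1.0 - moved                 # the 91% who remained at the incumbent firm
renegotiated_given_stayed = 0.13     # conditional on staying

stayed_with_raise = stayed * renegotiated_given_stayed
stayed_without_raise = stayed * (1.0 - renegotiated_given_stayed)

outcomes = [
    ("moved to the new firm", moved),
    ("stayed and renegotiated pay", stayed_with_raise),
    ("stayed without a renegotiated raise", stayed_without_raise),
]
for label, share in outcomes:
    print(f"{label}: {share:.1%}")
```

On these figures, roughly 12% of all offer recipients obtain a raise at the incumbent firm, slightly more than the 9% who move.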

Q9. What do the back-and-forth dynamics imply for appropriate theoretical models of wage bargaining?

That many offers are accepted or rejected only after multiple rounds of negotiation is difficult to rationalize with models assuming either firms or workers have perfect information, which typically predict immediate acceptance or rejection. The patterns are consistent with models of two-sided incomplete information (Perry 1986; Chatterjee and Samuelson 1983). Sixty-nine percent of HR professionals in the survey report that decision-makers at their firm only have market-level information on wages, not specific information on what competitors pay.

Q10. How do outside options predict worker bargaining behavior and outcomes, controlling for occupation-establishment fixed effects?

Workers who rated it “easy” or “very easy” to obtain a better outside offer are 9 percentage points more likely to ask for an increase after the initial offer and 7 percentage points more likely to successfully negotiate a raise relative to same-occupation-establishment coworkers who rated it “difficult” or “very difficult.” The same pattern persists during the employment spell: workers with better outside options are 9 percentage points more likely to initiate and 8 percentage points more likely to succeed in renegotiation. These workers are not more likely to receive raises without asking.

Q11. How does risk tolerance predict bargaining, and how does it compare to outside options?

Workers with greater risk tolerance (those rating themselves 7 or above on a 10-point scale) are more likely to engage in wage negotiations and more likely to succeed both at the start of and during employment spells. Gaps in successful negotiations are somewhat larger than gaps in attempted negotiations, suggesting risk-tolerant workers also negotiate more effectively. However, outside options explain more of the between-worker variation in bargaining behavior than risk tolerance does.

Q12. What are the gender differences in bargaining behavior, and can they be explained by differences in outside options or risk tolerance?

Women are less likely to engage in back-and-forth negotiations and are 6 percentage points less likely to successfully negotiate pay upward during an employment spell. Women are also less likely to provide salary expectations and provide lower expectations as a fraction of their current salary in the hypothetical scenario, including when the salary range is provided—women are 6 percentage points less likely to provide expectations above the top of the stated range. Controlling for outside options and risk tolerance shrinks the female coefficient by at most 15%. There is no evidence that women substitute toward negotiating for non-wage amenities. The pattern is most consistent with women finding negotiation uncomfortable, not with a belief that it will not pay off or fear of backlash.

Q13. What is the estimated gender wage gap attributable to individual bargaining?

Among surveyed workers, after controlling for occupation-establishment fixed effects, there is no gender wage gap at firms without individual bargaining (the coefficient is close to zero), while a 4–5 percentage point gender wage gap persists at firms with individual bargaining. This difference is robust across measures of pay (total daily pay, base pay, pay conditioning on hours worked), to alternative fixed effect specifications, and to including non-surveyed workers at surveyed firms. A simple decomposition suggests 44% of the residual gender pay gap at surveyed firms can be attributed to bargaining. Across the interaction specifications, the gender wage gap is 3 percentage points larger at bargaining firms than at non-bargaining firms, and in one key specification the difference is 6 percentage points.

Q14. How does a worker’s prior firm wage premium affect current wages, and does bargaining status matter?

In a regression of log current wages on the AKM wage premium of the prior firm (conditional on occupation-establishment fixed effects), a 10 percentage point higher pay premium at the prior firm is associated with 0.5 percent higher pay at the new firm for workers at bargaining firms. For workers whose pay is not set via individual bargaining, the relationship between the prior firm’s pay premium and current pay is statistically insignificant. The result is consistent with the idea that during negotiations with a new firm, workers use their prior firm’s pay policy as an outside option.
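
Written as an equation, the regression described above takes roughly the following form. The notation is an assumption of this sketch rather than the paper's: w_i is worker i's current wage, \hat{\psi}_{prior(i)} is the estimated AKM premium of the worker's prior firm, and \lambda is an occupation-establishment fixed effect. Treating the 10 percentage point premium difference and the 0.5 percent pay difference as log-point approximations gives the implied slope for workers at bargaining firms.

```latex
\[
\ln w_{i} \;=\; \gamma\,\hat{\psi}_{\mathrm{prior}(i)} \;+\; \lambda_{o(i),\,e(i)} \;+\; \varepsilon_{i},
\qquad
\gamma \;\approx\; \frac{0.005}{0.10} \;=\; 0.05 .
\]
```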

Q15. How do AKM person effects relate to bargaining behavior?

Higher-person-effect individuals are more likely to have provided salary expectations when applying to their current firm and ask for a larger fraction of their current salary in the hypothetical scenario (conditional on their wage). These differences persist when controlling for occupation-establishment fixed effects and age and experience. Higher-person-effect workers are not more likely to receive raises without asking. These results are inconsistent with AKM person effects reflecting only productivity differences and instead suggest that fixed differences in individual bargaining behavior contribute to the variance in person effects—which Card, Heining, and Kline (2013) estimated explains a large share (40%) of the growth in German wage inequality.

Q16. Are the bargaining patterns found at surveyed firms representative of bargaining more broadly?

Two robustness exercises support broader representativeness. First, similar bargaining dynamics are found when including a random sample of German workers employed at non-surveyed firms. Second, re-weighting the sample to match the overall distribution of firm size and sector in Germany yields similar results. Because medium and large firms are over-represented in the firm sample, and because small firms hire infrequently and are less likely to have formal bargaining strategies, the true prevalence of individual bargaining among all German firms may be somewhat lower.

Key Concepts

Individual Bargaining Strategy (firm-level). A firm has an individual bargaining strategy if it differentiates pay between workers in the same position that it perceives to have similar productivity. This definition encompasses both tailoring of initial offers (based on, e.g., workers’ stated salary expectations) and back-and-forth negotiation. It is analogous to price discrimination rather than to the standard labor economics distinction between wage posting and Nash bargaining.

Protocol Question. The main survey measure of firm bargaining strategies: firms are asked the maximum percentage by which pay could be increased for a new hire above the fixed compensation the firm would have offered based on qualifications and fit alone, with response bins from “0%/no adjustments” to “more than 40%.” A zero response is used to classify a firm as not bargaining.

Incidence Question. A supplementary survey measure eliciting the expected spread (between highest and lowest offers) that the firm would make to ten candidates with identical qualifications and fit but differing stated salary expectations and competing offers. Used to validate the protocol question and to quantify the importance of initial-offer differentiation relative to back-and-forth negotiation.

Bottleneck Occupation. A firm-defined category of workers in positions that are particularly difficult to fill, drawing on an official German Federal Employment Agency designation. In the paper, bargaining willingness is systematically higher for workers in these positions than for other workers at the same firm, providing evidence that labor market tightness drives bargaining strategies.

Outside Offer Renegotiation. Wage renegotiation at the incumbent firm triggered by a worker receiving an outside offer, without a change in job tasks. The paper documents this is empirically more common than actual job-to-job transitions: of workers receiving outside offers, 91% remain at the incumbent firm, and 13% of those who remain successfully renegotiate their pay.

AKM Person Effect. A worker fixed effect estimated from a two-way fixed effects regression of log wages on worker and firm fixed effects (following Abowd, Kramarz, and Margolis 1999). In this paper, AKM person effects are taken from Bellmann et al. (2020), estimated over 2010–2017 German population data. The paper provides evidence that these effects capture, in part, fixed differences in individual bargaining behavior rather than solely differences in productivity.

AKM Firm Effect (Wage Premium). The firm fixed effect from the same two-way fixed effects regression, representing the pay premium a firm pays relative to what would be expected given its workforce composition. The paper uses the prior firm’s AKM effect as a measure of a worker’s outside option quality when testing whether prior-firm pay policy influences current pay under individual bargaining.

Salary Expectations (Gehaltsvorstellungen). The wage figure a worker provides to a prospective employer, typically before the firm’s initial offer. Legally, German firms (like employers in most US states) cannot ask for salary history but can ask for salary expectations. In the paper, 57% of worker-firm interactions begin with the worker providing expectations; firms report using these to tailor initial offers, interpreting variation in stated expectations as reflecting outside options rather than productivity.
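
For reference, the AKM person and firm effects defined in this list come from a two-way fixed effects wage equation whose standard form is given below. Here \alpha_i is the person effect, \psi_{J(i,t)} is the effect of the firm employing worker i in period t, and x_{it} is a generic vector of time-varying controls; the specific covariates used by Bellmann et al. (2020) are not stated here, so the control vector should be read as a placeholder.

```latex
\[
\ln w_{it} \;=\; \alpha_{i} \;+\; \psi_{J(i,t)} \;+\; x_{it}'\beta \;+\; \varepsilon_{it}
\]
```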

Barriers to Global Capital Allocation

Overview

Research Question. Why do observed international investment positions and cross-country differences in rates of return to capital fail to conform to a frictionless capital-market benchmark? The paper asks how large the efficiency and distributional costs of barriers to global capital allocation are, and which frictions — capital income taxes, political risk, and geographic/cultural/linguistic distances — matter most.

Model. The authors develop a multi-country dynamic spatial general equilibrium model in which the entire network of bilateral cross-border investment positions is endogenously determined. Production in each country i follows a three-factor Cobb-Douglas function in reproducible capital, labor, and natural resources, with country-varying income shares. Capital is the only mobile factor. A logit asset demand system governs portfolio shares: the share of country j’s savings invested in country i is proportional to the risk-adjusted expected return on capital in i, scaled by the capital stock of i, and inversely proportional to a bilateral portfolio wedge Δ_ij. These wedges can be microfounded via either rational inattention (where wedges reflect the precision of prior beliefs about returns) or extreme-value-distributed transaction costs. Both microfoundations yield the same functional form and the same counterfactual welfare calculations.
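
A share system of this shape can be sketched in a few lines of Python. The sketch below is illustrative only: the exponent `elasticity`, the function name, and all numbers are assumptions, and the paper's exact functional form and calibration are not reproduced here. Each investor country's weights are proportional to the destination's capital stock and return and inversely related to the bilateral wedge, then normalized to sum to one.

```python
def portfolio_shares(returns, capital, wedges, elasticity=1.0):
    """Return shares[j][i]: fraction of investor country j's savings placed in destination i.

    weight_ji = capital[i] * (returns[i] / wedges[j][i]) ** elasticity, normalized over i.
    The elasticity parameter is an assumption of this sketch, not a value from the paper.
    """
    shares = []
    for investor_wedges in wedges:
        weights = [k * (r / d) ** elasticity
                   for r, k, d in zip(returns, capital, investor_wedges)]
        total = sum(weights)
        shares.append([w / total for w in weights])
    return shares

# Two hypothetical, symmetric countries.
returns = [1.05, 1.05]
capital = [100.0, 100.0]

frictionless = portfolio_shares(returns, capital, wedges=[[1.0, 1.0], [1.0, 1.0]])
with_wedges = portfolio_shares(returns, capital, wedges=[[1.0, 4.0], [4.0, 1.0]])

print(frictionless[0])  # no wedges: savings split in proportion to capital stocks
print(with_wedges[0])   # a wedge on foreign holdings tilts the portfolio toward home
```

In this toy case, removing the wedges makes every investor hold the same portfolio, while a wedge of 4 on foreign holdings raises the home share from one half to four fifths.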

Frictions measured. Three categories of frictions enter the empirical implementation: (a) bilateral capital income tax rates — a new dataset covering 225 countries (50,625 country pairs), constructed from corporate income tax rates and treaty-adjusted withholding tax rates on dividends and interest, further adjusted for effective tax rates accounting for tax-haven routing; (b) political risk, proxied by an ICRG composite index (excluding socioeconomic conditions) following Alfaro, Kalemli-Ozcan, and Volosovych (2008); (c) geo-political distance, comprising geographic distance, cultural distance (based on 496 World Values Survey questions across 116 countries), and linguistic distance (based on a language-family tree covering 6,737 languages and 242 countries). These distance measures are publicly available at geopoliticaldistance.org. The model covers 96 countries (9,216 dyads), representing 92% of world GDP in 2017.

Gravity Estimation. Bilateral investment data (restated for tax havens using the nationality-basis methodology of Coppola et al. 2020 and Damgaard et al. 2019) are regressed on cultural, geographic, and linguistic distance with origin and destination fixed effects. In OLS, a one-standard-deviation increase in cultural distance (0.023 units) is associated with a 24.0% decrease in foreign assets; geographic distance (0.977 units in logs) with a 78.6% decrease; linguistic distance (0.174 units) with a 51.5% decrease. These results are robust across OLS, PPML, and IV (using religious distance as an instrument for cultural distance). Under IV, the standardized effect of cultural distance on foreign assets strengthens to −76.5%.

Tax haven analysis. A Tobit regression of the share of bilateral investment routed through tax havens on the estimated tax saving from routing through havens yields coefficients of 0.413–0.999 for equity and 1.001–1.777 for debt (across specifications with varying fixed effects), confirming that tax incentives are a primary driver of the discrepancy between residency-based and nationality-based bilateral positions.

Model fit (untargeted moments). The calibrated baseline model produces: (i) a correlation of 0.658 between model-implied and empirical rates of return to capital (vs. 0.325 for the frictionless benchmark), with a standard deviation of 0.417 (vs. 0.091 frictionless; data: 0.496); (ii) a correlation of 0.947 between model-implied and empirical capital per employee (vs. 0.918 frictionless); (iii) a correlation of 0.94 between model-implied and empirical home bias; the model reproduces the mean home bias of 3.973 vs. 4.006 in data and standard deviation of 1.065 vs. 1.224, while the frictionless benchmark produces exactly zero home bias for all countries. Portfolio-share MSE: 1.16 (baseline) vs. 1.86 (frictionless).

Counterfactual findings. World GDP in the observed (distorted) equilibrium is 6.8% below the frictionless benchmark in which all measured barriers are removed. Geo-political distance alone accounts for most of this: when only distance frictions are retained, world GDP is 5.2% below the frictionless level. Capital taxes alone reduce world GDP by 2.6% below frictionless; political risk alone by 0.4%. The standard deviation of log capital per employee is 51.5% higher than it would be without barriers; the standard deviation of log output per employee is 22.5% higher. In the frictionless equilibrium, capital flows from rich to poor countries (the correlation between net foreign assets and development doubles in absolute value), accounting for the Lucas (1990) puzzle. In short-term (one-period) counterfactuals holding wealth fixed, the GDP gain from full barrier removal is 3.6%; the inequality effect remains similar (standard deviation of log capital per employee 48.4% higher with barriers).

Scope conditions. The model focuses on steady-state outcomes; dynamic transition effects are analyzed in extensions but are smaller. Quantitative conclusions are conditioned on: (i) the model sample of 96 countries covering 92% of world GDP in 2017; (ii) the conservative OLS coefficient estimates used for baseline calibration (IV estimates are larger and would amplify results); (iii) the assumption that the logit demand system captures frictions regardless of their microfoundation; (iv) omission of goods-trade frictions from the baseline (when included, the world GDP effect falls to 3.7% and the capital inequality effect to 23.3%).

In depth

Q1. What is the core theoretical prediction about cross-country rates of return when investment barriers exist?

A: In the model’s frictionless benchmark (Propositions 1 and 2), all origin countries hold identical portfolios and risk-adjusted expected returns are equalized across destinations. When bilateral frictions are introduced, countries that are more “peripheral” (harder to access for foreign investors due to high geo-political distance or political risk) receive less inward capital and therefore command higher physical rates of return to capital. Countries that are easily accessible (“central”) attract more capital and exhibit lower rates of return. The Dual Efficiency Theorem establishes that capital is efficiently allocated if and only if marginal products of capital are equalized across countries, which requires that taxes are uniform and that portfolio wedges satisfy a specific cancellation condition.

Q2. How are portfolio wedges measured, and what is the identifying strategy?

A: Portfolio wedges ∆ij are decomposed into a geo-political distance component and a political risk component. The geo-political distance component is specified as a log-linear function of geographic distance, cultural distance, and linguistic distance, with coefficients (β_g, β_c, β_l) estimated from a gravity regression of log bilateral investment on these distances, controlling for origin and destination fixed effects. Because political risk varies only by destination country, it cannot be separately identified from destination fixed effects in the bilateral regression; its elasticity is therefore taken from Alfaro, Kalemli-Ozcan, and Volosovych (2008). The key identification advantage of bilateral data is that origin and destination fixed effects absorb all country-level confounders, so the distance coefficients are identified purely from within-origin, within-destination variation across country pairs.

Q3. What do the OLS gravity regressions find, and are the coefficients stable across specifications?

A: In the baseline OLS specification (Table 2, column 1), the estimated coefficients on cultural distance, geographic distance, and linguistic distance are −11.944, −1.579, and −4.162 respectively (all significant at the 1% level). In standardized terms, a one-standard-deviation increase in cultural distance reduces foreign assets by 24.0%, geographic distance by 78.6%, and linguistic distance by 51.5%. Adding a rich set of control variables (colonial ties, legal origin, currency pegs, trade agreements, effective tax rates) leaves these magnitudes broadly similar: standardized effects on foreign assets are −26.4%, −80.1%, and −47.6%, respectively. Results are also robust across OLS and PPML specifications and across years 2013–2017. Effects are quantitatively similar for foreign equity and foreign debt, though linguistic distance has a somewhat smaller effect on debt.
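
The standardized effects quoted above appear consistent with the usual conversion for a log-linear regression, exp(β · s.d.) − 1. The sketch below applies that conversion to the column-1 coefficients and the standard deviations reported in the overview; the function name `standardized_effect` is illustrative, and the conversion is an assumption about how the percentages were obtained rather than something the summary states.

```python
import math


def standardized_effect(beta, sd):
    """Percent change in the outcome for a one-s.d. increase in a regressor
    when the dependent variable is in logs."""
    return (math.exp(beta * sd) - 1.0) * 100.0


# (coefficient, standard deviation) pairs as reported in the text.
ols = {
    "cultural": (-11.944, 0.023),
    "geographic": (-1.579, 0.977),
    "linguistic": (-4.162, 0.174),
}

effects = {name: standardized_effect(b, sd) for name, (b, sd) in ols.items()}
for name, pct in effects.items():
    print(f"{name}: {pct:.1f}%")
# cultural: -24.0%
# geographic: -78.6%
# linguistic: -51.5%
```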

Q4. How does the instrumental variable strategy address reverse causality in cultural distance, and what does it find?

A: The authors instrument cultural distance with religious distance (based on historical trees of religious affiliation), assuming religious history affects international investment only through its contemporary effect on differences in values and beliefs as captured by the World Values Survey. The instrument is a strong predictor of cultural distance (passes weak-instrument tests comfortably). Under IV, the standardized effect of a one-standard-deviation increase in cultural distance on foreign assets grows in magnitude from −24.0% (OLS) to −76.5% (IV). The authors use conservative OLS estimates for their baseline calibration, so the IV results imply the headline counterfactual effects are likely understated.

Q5. How does the model predict home bias, and how well does it match the data?

A: Home bias is defined as the log difference between the domestic portfolio share and the country’s share in the world capital stock. In the frictionless model, Proposition 1 implies that all countries hold identical foreign portfolios, so the model produces exactly zero home bias for every country. The baseline model, by incorporating bilateral frictions, generates home bias endogenously without targeting it. The model-implied home bias correlates with the empirically measured home bias at 0.94 across countries and matches both the mean (3.973 model vs. 4.006 data) and standard deviation (1.065 vs. 1.224) closely. The model also predicts, consistent with Lau, Ng, and Zhang (2010), that home bias and rates of return on capital are positively correlated (model-implied ρ = 0.55), and that rates of return on capital correlate negatively with the log of GDP per employee (model-implied ρ = −0.70).
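
A small sketch of the home bias measure as defined above, using hypothetical inputs. The function name `home_bias` and the example country are illustrative; only the 3.973 mean comes from the text.

```python
import math


def home_bias(domestic_share, capital, world_capital):
    """Log difference between the domestic portfolio share and the
    country's share of the world capital stock."""
    return math.log(domestic_share) - math.log(capital / world_capital)


# Frictionless case: the domestic share equals the world capital share,
# so the measure is zero.
hb_frictionless = home_bias(0.10, 10.0, 100.0)

# Hypothetical country with 10% of world capital that keeps 80% of its
# savings at home: home bias is log(8), roughly 2.08.
hb_example = home_bias(0.80, 10.0, 100.0)

# Reading the reported model mean of 3.973 in levels: the domestic share is
# roughly 53 times the country's share of world capital.
ratio_at_mean = math.exp(3.973)
```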

Q6. What is the quantitative decomposition of the world GDP loss by type of barrier?

A: World GDP in the observed (distorted) equilibrium is measured at $112.9 trillion (PPP), which is 6.8% below the frictionless counterfactual. When geo-political distance is the only barrier retained, world GDP is 5.2% below frictionless, so distance frictions account for the largest share. When only capital taxes are retained, world GDP is 2.6% below frictionless; when only political risk is retained, it is 0.4% below. The three individual losses sum to 8.2%, more than the 6.8% joint loss, because the distortions interact rather than add up. The results confirm that geo-political distance (cultural, linguistic, and geographic) constitutes the dominant source of global capital misallocation among the three measured frictions.

Q7. How do barriers affect the cross-country distribution of capital and income?

A: The standard deviation of log capital per employee is 51.5% higher in the distorted equilibrium than in the frictionless counterfactual; the standard deviation of log output per employee is 22.5% higher. When only geo-political distance distortions are maintained, dispersion in log capital per employee is 38.2% higher and in log output per employee 15.9% higher. Maintaining only taxes raises the dispersion in log capital per employee by 12.9% and log output per employee by 6.0%; maintaining only political risk raises them by 7.3% and 3.8%, respectively. In the frictionless equilibrium, the poorest countries gain the most: some of the poorest countries see capital per employee increase by an order of magnitude and income per employee double.

Q8. Does the model account for the Lucas puzzle (capital not flowing from rich to poor countries)?

A: Yes. In the observed distorted equilibrium, net foreign asset positions correlate only weakly with the level of development, consistent with Lucas’s (1990) observation that capital fails to flow from rich to poor countries. In the frictionless counterfactual, the absolute value of the correlation between net foreign asset positions and log GDP per employee doubles, and capital indeed flows from rich to poor countries as neoclassical theory predicts. The distortions from taxes, political risk, and geo-political distance thus account for the absence of a strong correlation between net positions and development in the data.

Q9. How do extensions incorporating goods-trade frictions, capital controls, and currency hedging costs affect the headline findings?

A: Adding goods-trade frictions (country-specific prices for output and capital installation following Monge-Naranjo et al. 2019) reduces the world GDP effect to 3.7% (from 6.8% baseline) and the dispersion of log capital per employee to 23.3% higher (from 51.5%), but the overall pattern of results is preserved. Replacing political risk with capital controls (using Jahan and Wang 2016 de-jure capital account openness) yields a comparable world GDP loss of 6.6% and a geo-political distance effect of 6.2%, very close to the 6.8% and 5.2% in the baseline. Adding currency hedging costs leaves world GDP loss and inequality effects essentially unchanged relative to baseline. None of these extensions materially alters the headline conclusions.

Q10. How do the authors validate the model against nationality-based versus residency-based bilateral investment data?

A: The model is calibrated to nationality-based positions (restated for tax havens). The MSE for fitting nationality-based external portfolio shares is 1.16, while the MSE for residency-based positions is 1.22. The model was not explicitly designed to distinguish between the two, yet it naturally produces better predictions for nationality-based positions because its frictions incorporate the incentives for indirect investment routing through tax havens. This cross-validation supports the methodological approach of using nationality-restated data and confirms the internal consistency of the model’s treatment of tax-haven routing.

Q11. What are the implications for global tax policy coordination?

A: In the presence of information frictions, simple harmonization of capital tax rates across countries does not improve capital allocation efficiency and could worsen it. The Dual Efficiency Theorem implies that efficient capital allocation in a world with information frictions requires that taxes, risk premia, and information frictions satisfy a joint cancellation condition. From a normative perspective, a global social planner maximizing world GDP should impose lower capital tax rates in countries that are “peripheral” in the network of informational distances, in order to offset the disadvantage created by information frictions for those countries.

Q12. How is the elasticity parameter η calibrated, and how sensitive are the results?

A: The elasticity of substitution among countries’ assets, η, is calibrated at 18.5 based on the demand-price elasticities in Koijen and Yogo (2020) for long-term debt (3.1, converted to a gross-return elasticity of approximately 30), short-term debt (25.2, converted to approximately 24.3), and equity (1.3, converted to approximately 14.8), with weights reflecting the composition of global portfolios. The baseline gravity coefficients are calibrated from OLS with controls (cultural: −13.129, geographic: −1.645, linguistic: −3.850), chosen as conservative estimates relative to IV or PPML. Sensitivity analysis using PPML or IV estimates of β yields broadly similar steady-state GDP losses (around 6%), confirming robustness.

Key Concepts

Portfolio wedge (∆ij): A bilateral distortionary term in the logit asset demand system that captures all frictions reducing the ability of investors from country j to invest in country i. Decomposed empirically into a geo-political distance component and a political risk component. A wedge of 1 means no friction; larger values reduce the share of investment flowing from j to i. Can be interpreted either as prior-belief imprecision under rational inattention or as systematic transaction costs under the extreme-value microfoundation.

Geo-political distance: A composite of geographic distance (population-weighted geodesic distance), cultural distance (expected disagreement in World Values Survey responses between randomly drawn individuals from two countries, constructed with the “flex” method using up to 496 questions), and linguistic distance (normalized tree distance in the Ethnologue language family graph, covering 6,737 languages). Distinct from simple physical distance: it captures the informational and transactional barriers that arise from societal dissimilarity.

Dual Efficiency Theorem: A theoretical result (Theorem in Section 2.8) establishing that three statements are mutually equivalent in steady-state equilibrium: capital is efficiently allocated; marginal products of capital are equalized across countries; and taxes are uniform while portfolio wedges satisfy a specific cancellation condition. This is not a restatement of the First Welfare Theorem; it is a statement about GDP (not welfare) and does not require risk premia to be equalized.

Effective bilateral tax rate (τij): The composite bilateral tax rate on capital after accounting for tax-haven routing. Firms in the destination country optimally choose the share of capital issued through tax havens (solving a quadratic cost optimization), trading off the lower tax rate available through havens against an increasing quadratic routing cost. The effective rate is therefore lower than the statutory (de jure) rate when the tax-haven rate is lower than the statutory rate, with the gap depending on the estimated βth coefficient from the Tobit regressions.

Logit asset demand system: A portfolio allocation rule in which the share of country j’s savings invested in destination country i is proportional to the destination capital stock times the risk-adjusted expected return raised to the power η (the elasticity of substitution), divided by the portfolio wedge; shares are normalized by the sum of the same expression over all destinations. Microfounded either by rational inattention (Matejka and McKay 2015; Pellegrino 2023) or by extreme-value-distributed transaction costs. Produces portfolio gravity analogous to trade gravity when combined with the market clearing conditions.

Home bias: Defined as the log difference between a country’s domestic portfolio share (πii, the share of domestic savings invested at home) and that country’s share of world capital stock (ki/K). In the frictionless benchmark, home bias is exactly zero for all countries by Proposition 1. The baseline model generates home bias endogenously as a consequence of portfolio wedges and reproduces both the level and cross-sectional distribution of empirically observed home bias without targeting these moments directly.

Core-periphery structure: An emergent property of international capital markets under investment barriers: countries that are easily accessible to international investors (low geo-political distance, low political risk, favorable tax treatment) are “central” and attract capital inflows, driving their rates of return to capital lower; “peripheral” countries that are less accessible have smaller capital stocks and higher rates of return, compensating investors for overcoming barriers. This structure generates persistent capital misallocation and cross-country income inequality.

Nationality-based vs. residency-based bilateral investment positions: Residency-based data (e.g., raw IMF CPIS) attributes investment to the immediate counterparty country, including tax-haven shell companies. Nationality-based data (Coppola et al. 2020; Damgaard et al. 2019; Beck et al. 2024) reattributes investment to the country of the ultimate investor and ultimate issuer, bypassing offshore centers. The model fits nationality-based positions better (MSE 1.16 vs. 1.22 for residency-based) because it incorporates frictions that generate incentives for indirect routing, which is what nationality restatement is designed to undo.

Bayesian inference in proxy SVARs with incomplete identification: Re-evaluating the validity of monetary policy instruments

Beyond the headline: How personal exposure to inflation shapes the financial choices of households

Biased expectations and labor market outcomes: Evidence from German survey data and implications for the East–West wage gap

Layer 1: Overview

Research question. The paper asks two questions: (1) How do workers’ biased expectations about job finding and job separation shape the labor market equilibrium and wages? (2) Are differences in expectation biases across workers a quantitatively important driver of wage differentials, specifically the East–West German wage gap?

Data. The empirical analysis uses the German Socio-Economic Panel (SOEP), a nationally representative longitudinal survey of approximately 30,000 participants per wave. The working-age sample (ages 25–65) covers nine biennial survey waves from 1999 to 2015, yielding 67,772 observations for job separation expectations and 6,423 for job finding expectations. Perceived transition probabilities are reported on a 0–100 scale in steps of 10 percentage points. Actual (statistical) transition probabilities are constructed by estimating probit models that predict realized transitions within 24 months using a rich set of individual, job, and employer characteristics, and are rounded to the nearest decile for consistency with the survey scale.

Main empirical findings. Employed workers in Germany overestimate their job separation probability by 6.4 percentage points on average (perceived: 19.8%; actual: 13.3%), a pessimistic bias significant at the 1% level. Unemployed workers overestimate their job finding probability by 8.2 percentage points on average (perceived: 57.0%; actual: 48.8%), an optimistic bias also significant at the 1% level. The East–West divergence is striking. East German workers exhibit a pessimistic job separation bias of 12.1 percentage points, compared to only 4.7 percentage points in the West, despite broadly similar actual separation rates (15.1% vs. 12.8%). For job finding, West Germans overestimate their probability by 12.9 percentage points, while East Germans overestimate by only 2.0 percentage points — meaning East Germans are also substantially less optimistic about re-employment. These East–West differences survive controls for compositional differences and alternative definitions of job separation (dismissals only; selected reasons; spell-based) and job finding (including those out of the labor force). The biases are stable over the 1999–2015 sample period with no discernible trend. A cohort analysis shows that the excess pessimism in East Germany is concentrated among cohorts who were already in the labor market at the time of German reunification (born in the 1950s and 1960s), consistent with persistent effects of the communist GDR experience. Individuals do not systematically learn over time: mean changes in individual-level absolute deviations between consecutive waves are close to zero. Individual deviations between perceived and actual rates have statistically significant but quantitatively negligible predictive power for subsequent transitions (a 1 pp higher perceived job separation is associated with only a 0.001 pp higher realized separation rate), ruling out private information as a first-order explanation for the biases.

Model. The authors extend the Diamond–Mortensen–Pissarides (DMP) frictional labor market framework by (i) allowing workers to hold biased perceived transition rates (λw for job finding, σw for job separation) while firms have rational expectations, and (ii) introducing wage contracts of explicit length T periods after which parties re-bargain. Common knowledge of each party’s perceived values is assumed, and generalized Nash bargaining is applied. The contract length T is a key parameter: there exists a critical threshold T* such that a pessimistic job separation bias raises the equilibrium wage for T < T* (the continuation-value effect dominates) and lowers it for T ≥ T* (the within-contract discounting effect dominates). An optimistic job finding bias unambiguously raises the equilibrium wage by inflating the perceived value of unemployment and hence the reservation wage.

Quantitative results. The model is calibrated to East Germany. The job separation bias (∆σ = 0.0194) and job finding bias (∆λ = 0.0044) are set to SOEP-based estimates. The critical threshold implied by calibrated parameter values is T* = 10 quarters. The baseline contract length, constructed from the share of permanent (88%) and temporary (12%) contracts in SOEP and average remaining tenure until retirement, is T = 67 quarters (a lower bound). This exceeds T*, so the pessimistic separation bias depresses wages in the baseline. A counterfactual experiment assigns West German bias levels to East German workers, while holding all other parameters fixed. For the preferred calibration range (γ ∈ {0.35, 0.50}, T ∈ {67, 106, 159}), East German wages rise by 1.07 to 2.36 percent. This corresponds to a reduction in the conditional East–West German wage gap (23 percent) of 4.6 to 10.6 percent, and a reduction in the unconditional gap (30 percent) of 3.6 to 7.9 percent. Although wages rise, equilibrium unemployment increases by 0.70 to 1.01 percentage points, widening the already large East–West unemployment gap (approximately 7 percentage points). Net of the unemployment effect, expected lifetime income (computed at actual, unbiased transition rates) rises by 0.7 to 1.88 percent for East German workers under West German biases, implying an unambiguous welfare gain. Under a biennial calibration (robustness), wages increase by up to 3.3 percent and expected lifetime income rises by up to 2.23 percent.

Scope conditions. Results apply to a stationary environment (no aggregate fluctuations). Firms are assumed to have rational expectations; an extension shows results hold provided firm bias is smaller than worker bias. Workers are assumed homogeneous in their bias levels; learning is abstracted from. The quantitative magnitudes are sensitive to the workers’ bargaining power γ and the contract length T, both of which are subject to uncertainty in calibration.

In depth

Q1. How are actual (statistical) transition probabilities constructed, and why are probit-predicted probabilities preferred over realized sample means?

A: Realized transition rates in the sample mix transitions for various idiosyncratic reasons that vary substantially across population groups, so raw sample means do not reflect the probability a given individual faces at interview time. The authors estimate probit models separately for job separation (employed sample) and job finding (unemployed sample), including a rich set of covariates — age, gender, education, tenure, firm size, unemployment experience, industry, survey year, and East Germany indicator, among others — and predict individual-level probabilities at the time of the interview. For consistency with the survey’s discrete response format, probit-predicted probabilities are rounded to the nearest decile (0%, 10%, …, 100%). The bias is computed as the individual-level difference between perceived and probit-predicted actual probabilities, averaged over the sample.
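
The rounding and averaging steps described above can be sketched as follows. The probit stage is not reproduced; the predicted probabilities are hypothetical inputs, and the helper names `round_to_decile` and `mean_bias` are illustrative. Rounding halves upward is also an assumption, since the text only says "nearest decile".

```python
import math


def round_to_decile(prob_pct):
    """Round a probability in percent to the nearest multiple of 10
    (ties are rounded up, an assumption)."""
    return 10 * math.floor(prob_pct / 10 + 0.5)


def mean_bias(perceived_pct, predicted_pct):
    """Average of perceived minus decile-rounded predicted probability.
    Positive values mean the transition is overestimated on average."""
    diffs = [p - round_to_decile(a) for p, a in zip(perceived_pct, predicted_pct)]
    return sum(diffs) / len(diffs)


# Hypothetical respondents: survey answers (0-100 in steps of 10) and
# model-predicted probabilities in percent.
perceived = [20, 30, 10, 0]
predicted = [13.3, 17.0, 4.0, 8.2]  # rounded to 10, 20, 0, 10

bias = mean_bias(perceived, predicted)  # (10 + 10 + 10 - 10) / 4 = 5.0
```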

Q2. What is the magnitude and direction of the aggregate expectation biases in Germany?

A: Employed workers overestimate job separation by 6.4 percentage points on average (perceived 19.8% vs. actual 13.3%), a pessimistic bias significant at the 1% level. Unemployed workers overestimate job finding by 8.2 percentage points (perceived 57.0% vs. actual 48.8%), an optimistic bias also significant at the 1% level. Both directions are statistically robust across alternative definitions of separation and finding, as well as to trimming extreme responses (0% and 100% answers) and adjusting for directional rounding.

Q3. How large are the East–West differences in expectation biases, and do they survive controls for compositional differences?

A: East German workers exhibit a pessimistic job separation bias of 12.1 percentage points, more than 2.5 times the West German level of 4.7 percentage points, despite actual separation rates being broadly comparable (15.1% vs. 12.8%). For job finding, West Germans are optimistic by 12.9 percentage points while East Germans are optimistic by only 2.0 percentage points, a difference of 10.9 percentage points. The paper states these differences persist after accounting for compositional differences between regions, and are robust across all alternative definitions of job separation (Dismissals, Selected, Spell) and job finding (out of U or O). The table of robustness results (Table 2) confirms that in all specifications, the pessimistic separation bias is substantially larger in the East and the optimistic finding bias is substantially smaller.

Q4. What cohort analysis is conducted to explore the origins of greater East German pessimism?

A: The authors conduct a regression of the individual-level bias on birth-cohort indicators, controlling for age, demographic, and economic characteristics. They find that the pessimistic job separation bias is most pronounced among cohorts born in the 1950s and 1960s — those who experienced adult working life in the communist GDR and lived through reunification — and is smaller for cohorts born before 1950 and substantially smaller for cohorts born after 1970. For job finding, the optimistic bias is comparably low among cohorts born in the 1960s and earlier, but rises significantly for later-born East German cohorts. This cohort pattern is consistent with a long-lasting “experience effect” of communist institutions and the reunification shock on beliefs, analogous to findings in the broader literature on the persistent effects of communism.

Q5. Is there evidence that individuals update their biased expectations over time?

A: To assess learning, the authors use the panel dimension and compute for each individual in two consecutive survey waves the absolute value of the deviation between perceived and actual transition probabilities, then examine the change in this absolute deviation between waves. The histograms of individual-level changes show substantial dispersion but means close to zero in all four sub-groups (East/West, job separation/finding), indicating no systematic convergence of beliefs toward actual rates. Biases are also stable in the time-series dimension, with perceived and actual rates moving largely in parallel across survey waves from 1999 to 2015, leaving the aggregate bias level roughly constant.
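
A minimal sketch of the learning check described above, with hypothetical panel rows. A negative change means an individual's belief moved closer to the actual rate between waves; a mean near zero across individuals indicates no systematic convergence. The function name and the data are illustrative.

```python
def abs_deviation_change(perceived_t, actual_t, perceived_next, actual_next):
    """Change in |perceived - actual| between two consecutive waves."""
    return abs(perceived_next - actual_next) - abs(perceived_t - actual_t)


# Hypothetical rows: (perceived_t, actual_t, perceived_next, actual_next),
# all in percentage points.
panel = [
    (30, 10, 20, 10),  # moves closer to the actual rate: change of -10
    (10, 20, 0, 20),   # moves further away: change of +10
    (50, 50, 50, 50),  # already accurate, no change
]

changes = [abs_deviation_change(*row) for row in panel]
mean_change = sum(changes) / len(changes)  # dispersion without convergence
```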

Q6. How does the model rule out private information as an alternative explanation for the biases?

A: If biases reflected private information about idiosyncratic risk not captured by observable characteristics, individual-level deviations between perceived and actual rates should predict subsequent realized transitions. The authors add the individual-level deviation as an additional regressor in the probit transition models. The estimated coefficients are statistically significant and positive, but quantitatively negligible: a 1 percentage point higher expected job separation probability is associated with only a 0.001 percentage point higher realized separation probability, and a 1 percentage point higher expected job finding probability with a 0.002 percentage point higher realized finding probability. These magnitudes are too small to materially alter the interpretation of the biases as reflecting systematic expectation errors rather than private information.

Q7. What is the role of contract length T in the model, and what is the critical threshold T*?

A: The wage contract length T determines which of two opposing effects of pessimistic job separation expectations dominates in bargaining. The first (negative wage) effect: a pessimistic worker discounts future wages within the current contract more heavily than the firm does, so the worker values the contract less and accepts a lower wage. The second (positive wage) effect: a pessimistic worker also discounts the continuation value of future contracts more heavily, making it less attractive to remain in the match, so the firm must offer a higher wage to retain the worker. For short contract lengths (T < T*), the second (positive) effect dominates, so the pessimistic bias raises wages. For long contracts (T ≥ T*), the first (negative) effect dominates, so the pessimistic bias depresses wages. The critical threshold T* is the smallest positive integer such that T*/λw(θ) < β times a weighted sum involving σw and T*. Using calibrated parameter values for East Germany, T* = 10 quarters (2.5 years). The baseline contract length is T = 67 quarters (approximately 16.8 years), well above T*, placing the economy in the regime where pessimism depresses wages.
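
The regime logic above reduces to a comparison of T with T*. The sketch below applies it to the calibrated values quoted in the text (T* = 10, and T ∈ {67, 106, 159} from the preferred calibration range); the function name is illustrative, and T* is taken as given rather than derived from the threshold condition.

```python
def separation_bias_wage_effect(contract_length, threshold):
    """Direction of the wage effect of a pessimistic job separation bias."""
    if contract_length < threshold:
        return "raises wage"   # continuation-value effect dominates
    return "lowers wage"       # within-contract discounting effect dominates


T_STAR = 10                    # quarters, East German calibration
CONTRACT_LENGTHS = [67, 106, 159]

regimes = {T: separation_bias_wage_effect(T, T_STAR) for T in CONTRACT_LENGTHS}
baseline_years = CONTRACT_LENGTHS[0] / 4  # 16.75 years
```

All three contract lengths lie above the threshold, so the pessimistic separation bias lowers wages throughout the preferred calibration range.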

Q8. How does the optimistic job finding bias affect equilibrium wages and unemployment?

A: An optimistic job finding bias (λw > p(θ)) raises the perceived value of unemployment U because workers expect to escape unemployment sooner. A higher value of unemployment raises the worker’s outside option in bargaining, increases the reservation wage, and thereby pushes up the bargained wage. In general equilibrium, the job creation condition (which is unaffected by worker expectations) is unchanged, so the upward rotation of the wage curve reduces labor market tightness θ, raises equilibrium unemployment, and extends average unemployment duration. This comparative static holds unambiguously for any contract length T.

Q9. What are the quantitative results of the counterfactual experiment assigning West German biases to East German workers?

A: The counterfactual assigns West German bias levels (smaller pessimistic separation bias, larger optimistic finding bias) to East German workers while holding all other parameters at East German calibrated values. For the preferred calibration with γ ∈ {0.35, 0.50} and T ∈ {67, 106, 159}, wages in East Germany rise by 1.07 to 2.36 percent. This implies a reduction in the conditional East–West wage gap (23 percent) of 4.6 to 10.6 percent and a reduction in the unconditional gap (30 percent) of 3.6 to 7.9 percent. Equilibrium unemployment in East Germany rises by 0.70 to 1.01 percentage points as a side effect. Net of the unemployment effect, ex-ante unbiased expected lifetime income rises by 0.7 to 1.88 percent, confirming a positive welfare effect of reducing East German pessimism to West German levels. Under the biennial calibration robustness check, wage increases reach up to 3.3 percent, the conditional wage gap narrows by up to 11 percent, and lifetime income rises by up to 2.23 percent.

Q10. How is the bargaining power parameter γ calibrated and why does it matter for the results?

A: The paper considers a range γ ∈ {0.35, 0.50, 0.65}, rather than a single calibrated value, because γ plays a crucial role in the sensitivity of wages to expectation biases. Lower bargaining power reduces the equilibrium wage directly; however, because lower wages spur job creation, the model requires a higher vacancy cost κ to match the empirical job finding rate, which in turn increases the elasticity of wages with respect to the bias (see the wage equation, which shows that the bias effect scales with κθ/p(θ)). The paper argues that γ = 0.65 is inconsistent with the empirical wage–bias relationship estimated in SOEP data (which is negative and about twice as negative in East Germany as in the West), while γ ∈ {0.35, 0.50} is consistent. Lower bargaining power is also argued to be realistic for East Germany given weaker union representation there relative to the West.

Q11. How does the empirical relationship between the job separation bias and wages serve as a model validation target?

A: Using SOEP data, the authors regress log hourly wages on the individual-level difference between perceived and actual job separation rates, controlling for individual fixed effects and other covariates, and allow the slope to differ between East and West Germany. They find a statistically significant and negative relationship in both regions, with the effect approximately twice as large in East Germany as in the West. The estimate implies that if East German workers’ job separation pessimism were reduced to West German levels, hourly wages in the East would be about 1 percent higher. This empirical gradient is used as an external validation check — not a calibration target — to assess which combinations of (γ, T) in the model are quantitatively plausible.

Q12. What does the model predict about the general equilibrium effects on unemployment from reducing East German pessimism?

A: Reducing East German pessimism — both the pessimistic separation bias and the low optimistic finding bias — shifts the wage curve upward in equilibrium. Because the job creation condition is unaffected by worker beliefs (firms have rational expectations), higher wages reduce the firm’s incentive to post vacancies, lowering labor market tightness θ. This leads to higher equilibrium unemployment and longer average unemployment duration. The counterfactual with West German biases implies that East German unemployment would rise by 0.70 to 1.01 percentage points, further widening the approximately 7 percentage point East–West unemployment gap. The authors note this is a welfare-relevant trade-off, but show that the wage gain dominates the unemployment cost in terms of expected lifetime income.

Q13. What robustness checks are performed on the quantitative results?

A: The paper considers (i) a narrower definition of job separation (dismissals only) to match the most likely interpretation of the survey question; (ii) targeting the officially reported East German unemployment rate (14.5% average from the Federal Employment Agency) rather than the SOEP-implied rate of 8.6% as a calibration target; (iii) a biennial calibration frequency instead of quarterly. The main results — wage increases and narrowing of the wage gap — are quantitatively similar across these alternatives, with one exception: the biennial calibration yields substantially larger wage increases (up to 3.3%), a larger reduction in the conditional wage gap (up to 11%), and larger lifetime income gains (up to 2.23%).

Key Concepts

Expectation bias (job separation / job finding). In this paper, a bias in expectations is defined as a systematic average difference between an individual’s perceived transition probability and the actual (statistically predicted) transition probability for their demographic and job group. A pessimistic job separation bias means workers overestimate the probability of losing their job (σw > σ); an optimistic job finding bias means unemployed workers overestimate the probability of re-employment (λw > p(θ)). Biases are not attributed to private information but to systematic expectation errors.

Actual (statistical) transition probability. The paper defines actual transition probabilities not as raw sample transition rates but as individual-level predicted probabilities from probit models estimated on realized transitions within 24 months, conditional on a comprehensive set of individual, job, and employer characteristics observed at interview time. These are rounded to the nearest decile for comparability with the survey’s discrete response format.
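The bias definition above can be sketched in a few lines. This is an illustrative sketch only: the function names and the sample numbers are hypothetical, and the predicted probabilities are taken as given rather than estimated from a probit, which the paper does on survey data.

```python
import math

def round_to_decile(p):
    """Round a probability in [0, 1] to the nearest multiple of 0.1 (ties round up)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return math.floor(p * 10 + 0.5) / 10

def expectation_bias(perceived, predicted):
    """Perceived probability minus the decile-rounded predicted probability."""
    return perceived - round_to_decile(predicted)

def mean_bias(pairs):
    """Average bias over (perceived, predicted) pairs; positive means overestimation."""
    biases = [expectation_bias(q, p) for q, p in pairs]
    return sum(biases) / len(biases)

# Hypothetical (perceived, model-predicted) job separation probabilities.
sample = [(0.30, 0.12), (0.10, 0.08), (0.50, 0.27)]
print(mean_bias(sample))  # positive here, i.e. a pessimistic separation bias
```

A positive average for separation probabilities corresponds to pessimism (σw > σ); a positive average for finding probabilities corresponds to optimism (λw > p(θ)).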

Wage contract length (T). The contract length T is the number of periods for which a bargained wage is fixed before the match parties re-bargain. A job match consists of a sequence of consecutive wage contracts of length T. The paper departs from the standard DMP assumption of period-by-period bargaining (T = 1) and shows that T is central to how job separation expectations feed into the bargained wage. A permanent job approximates T → ∞.

Critical contract length (T*). A theoretically derived threshold: the pessimistic job separation bias raises equilibrium wages for contract lengths T < T* and depresses wages for T ≥ T*. Specifically, T* is the smallest positive integer such that T*/λw(θ) < β times a weighted sum involving β, σw, and T*. In the East German calibration, T* = 10 quarters.

Generalized Nash bargaining with common knowledge / agree to disagree. The model assumes that both the worker and the firm know each other’s perceived values of the job match and outside options and accept them as the basis for bargaining, even though they differ. Workers use their biased perceived transition rates to value employment and unemployment; firms use actual rates. There is no private information. The paper refers to this as workers and firms “agreeing to disagree.”

Ex-ante unbiased expected lifetime income (EI_{W,U}). A welfare measure defined as the present discounted value of income for an individual entering the economy, computed at actual (unbiased) job separation and job finding probabilities rather than at workers’ perceived (biased) rates. This measure captures the net welfare effect of changing expectation biases because it correctly accounts for actual employment transitions, even though the behavioral responses in equilibrium are driven by biased perceptions.

Effective discount factor (β(1 − σw)). When a worker holds pessimistic job separation expectations, future payoffs within the current contract are discounted not at the pure time discount factor β but at β(1 − σw), which is smaller when σw is larger. A more pessimistic worker therefore effectively discounts future wage payments more steeply, and this differential discounting relative to the firm (which uses β(1 − σ)) is the key mechanism generating the contract-length dependence of the wage effect.
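The differential discounting can be illustrated with a deliberately simplified calculation. The sketch below values only a constant wage stream within one contract and ignores continuation values of future contracts, so it isolates the within-contract (negative wage) effect only. The parameter values (β = 0.99, σ = 0.02, σw = 0.05) are hypothetical and are not the paper's calibration.

```python
def contract_value(wage, beta, sep_prob, T):
    """Present value of a constant wage paid for T periods while the match survives.

    Each period is discounted by the effective factor beta * (1 - sep_prob).
    """
    d = beta * (1.0 - sep_prob)
    return wage * sum(d ** t for t in range(T))

def worker_to_firm_ratio(beta, sigma_w, sigma, T):
    """Worker's valuation of the wage stream relative to the firm's valuation."""
    return contract_value(1.0, beta, sigma_w, T) / contract_value(1.0, beta, sigma, T)

# Hypothetical parameters: worker perceives a higher separation rate than the firm.
BETA, SIGMA, SIGMA_W = 0.99, 0.02, 0.05

for T in (1, 10, 67):
    print(T, round(worker_to_firm_ratio(BETA, SIGMA_W, SIGMA, T), 3))
```

Under these assumptions the ratio equals one for T = 1 and falls as T grows, which is one way to see why the gap between the worker's and the firm's valuation of a given contract widens with contract length.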

Borrowing and Spending in the Money: Debt Substitution and the Cash-Out Refinance Channel of Monetary Policy

Overview

Research Question. Does monetary policy stimulate household borrowing and consumption by enabling cash-out mortgage refinancing (“the cash-out refinance channel”), or does it primarily induce substitution across borrowing products without meaningfully changing total new household borrowing?

Motivation. Prior work (Eichenbaum, Rebelo and Wong 2022; Berger et al. 2021) interprets the strong positive correlation between a borrower’s refinance incentive and cash-out refinancing as evidence of a potent, path-dependent monetary policy transmission channel: when rates fall below a borrower’s outstanding mortgage rate (“in-the-money”), the incentive to refinance generates large cash-out activity and consumption. This interpretation presumes that mortgages are effectively the only household borrowing product and that cash-out refinancing reflects a stimulated demand for new borrowing.

Alternative Hypothesis. The authors argue instead that households have inelastic, exogenous liquidity needs (for consumption smoothing, housing repairs, health shocks, etc.) and satisfy those needs using whichever borrowing product is cheapest given the rate environment. When mortgage rates fall below a borrower’s outstanding rate, cash-out refinancing becomes the least-cost vehicle, so borrowers shift from credit cards, HELOCs, personal loans, and second liens (closed-end seconds) toward cash-out refinancing—substituting borrowing products rather than expanding total borrowing.

Data. The authors use the Equifax Credit Risk Insight Servicing McDash (CRISM) dataset, which anonymously matches credit bureau records to mortgage servicing data (McDash). The main sample is a 16.5% draw of fixed-rate, first-lien mortgage loans observed at monthly frequency during 2013, yielding approximately 35 million loan-month observations. For the long time-series analysis, the full 2006–2021 sample is used. Borrowing events are identified across five credit instruments: cash-out refinance, HELOC, closed-end second (CES), credit card, and personal loan, each requiring at least $5,000 in new credit.

Identification Strategy. The paper uses two complementary approaches to address the endogeneity of mortgage rates and borrower refinance incentives.

Taper Tantrum quasi-experiment (main): In late spring 2013, two FOMC communication events triggered an approximately 80 basis-point increase in the 30-year fixed mortgage rate over the course of one month. Critically, because the shock arose from changes in long-term rate expectations (LSAPs), short-term rates—and thus HELOC and consumer credit rates—were largely unchanged. The authors exploit cross-sectional variation in pre-Taper “rate gaps” (outstanding mortgage rate minus estimated current market rate) using a difference-in-differences design (equation 6) to compare how cash-out and alternative borrowing change after the shock for borrowers with different pre-existing refinance incentives.
Monetary policy surprise IV (2006–2021): Following Berger et al. (2021), the authors instrument for the aggregate share of borrowers with rate gaps between 0 and 2 percentage points using the Bu, Rogers and Wu (2021) (BRW) unified measure of Fed monetary policy shocks, which spans both conventional and unconventional policy. This approach tests whether substitution persists when both long and short rates move together.

Main Findings.

Extensive margin (probability of borrowing): After the Taper Tantrum, the monthly probability of cash-out refinancing declines for all rate gap bins, most strongly for borrowers pushed out of the money by the rate increase. For borrowers with pre-Taper rate gaps of approximately 1 percent, the monthly probability falls by roughly 0.0012, more than 85 percent below baseline. Simultaneously, the probability of other borrowing (HELOCs, credit cards, personal loans, CES) rises in a near-mirror image, especially for borrowers at intermediate rate gaps. The combined effect on total borrowing probability is negligible and shows little variation with rate gap.
Intensive margin (amount borrowed conditional on borrowing): Conditional on a cash-out refinance occurring after the Taper, the average extraction amount increases, consistent with a borrower-selection effect: low-liquidity-need borrowers, who face the highest effective borrowing cost increase when they move out of the money, disproportionately exit cash-out refinancing, leaving behind a pool of high-liquidity-need borrowers. For borrowers with pre-Taper rate gaps of around 1 percent, the conditional cash-out amount rises about 20 percent after the Taper.
Aggregate borrowing elasticity: Combining extensive and intensive margin estimates via a hurdle model, a 1 percentage-point increase in mortgage rates reduces total new household borrowing by between 0 and 8 percent: the preferred estimate is not statistically significantly different from zero, and the lower-bound estimate is −8 percent. By comparison, the cash-out probability elasticity is approximately −45 percent.
Debt paydown: About 10–12 percent of new mortgage debt from cash-out refinances is used to pay down other outstanding debt, and this share is constant across rate gap groups and is not affected by the Taper, implying the MPC from cash-out borrowing does not vary with the rate environment.
Conventional monetary policy: Using the BRW IV over 2006–2021, the IV first stage yields an F-statistic of approximately 11. The cash-out extensive margin responds positively to the in-the-money share (elasticity 3.5 in IV), while other borrowing responds negatively (elasticity −0.87 in IV), and the all-borrowing elasticity is 0.09 and statistically insignificant. The intensive margin results are directionally consistent: conditional cash-out amounts fall as more borrowers are in the money, while total borrowing amounts respond positively (but insignificantly). Substitution thus holds even when both long and short rates move together.

Implications for Path Dependence. Because out-of-the-money borrowers substitute toward non-cash-out products, the non-linear dependence of cash-out refinancing on the distribution of outstanding mortgage rates does not translate into a correspondingly path-dependent total borrowing response. A back-of-the-envelope calculation using standard MPC assumptions (100 percent for cash-out, 80 percent for rate-term savings) and empirical refinancing frequencies and amounts (average first-lien equity extraction of $40,000 vs. average annual payment savings of $3,000 from rate-term refinancing, with rate-term frequency about 1.5x higher and semi-elasticity about 2x larger) implies that the potential near-term consumption stimulus from cash-out refinancing is approximately 5.5 times larger than from rate-term refinancing—making cash-out the dominant channel in principle. But because debt substitution substantially offsets the interest-rate sensitivity of cash-out refinancing, and because the path dependence of cash-out refinancing is largely eliminated by borrower substitution, the paper concludes that the overall path dependence of monetary policy is weaker than suggested by Berger et al. (2021) and Eichenbaum, Rebelo and Wong (2022).

In depth

Q1. What is the “rate gap” and why does it capture the cash-out refinance incentive?

The rate gap is defined as a borrower’s outstanding fixed mortgage rate minus an estimate of the 30-year fixed mortgage rate currently available to that borrower if they were to refinance (estimated from a regression of origination-period rates on LTV, credit score, loan type, investor type, and month fixed effects). A positive rate gap means the borrower is “in the money” for a rate-term refinance: they can reset their existing mortgage at a lower rate. The rate gap captures the degree of refinance incentive because a refinance resets the interest cost on the entire outstanding balance. Cash-out refinancing is especially attractive when the rate gap is positive because the rate reduction on the existing balance partially subsidizes the new borrowing, lowering its effective cost relative to alternative products.
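As a minimal sketch of this definition, the snippet below computes a rate gap and shows how an 80 basis-point rise in the market rate, as in the Taper Tantrum, can push a borrower out of the money. The function names and the example rates are hypothetical, and the borrower-specific market rate is taken as an input rather than estimated from the rate regression described above.

```python
def rate_gap(outstanding_rate, market_rate):
    """Outstanding mortgage rate minus the estimated rate available today (percentage points)."""
    return outstanding_rate - market_rate

def in_the_money(outstanding_rate, market_rate):
    """True when refinancing would lower the rate on the existing balance."""
    return rate_gap(outstanding_rate, market_rate) > 0

# Hypothetical borrower: 4.2 percent outstanding rate, 3.7 percent available pre-shock.
pre_market, shock = 3.7, 0.8          # shock: assumed 80 basis-point rate increase
post_market = pre_market + shock

print(rate_gap(4.2, pre_market), in_the_money(4.2, pre_market))    # gap of about +0.5
print(rate_gap(4.2, post_market), in_the_money(4.2, post_market))  # gap of about -0.3
```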

Q2. What is the conceptual model of debt substitution the authors propose?

The authors model a homeowner with an inelastic liquidity need l that arrives with probability λ. The borrower can satisfy this need through a cash-out refinance at mortgage rate r_m (resetting their entire mortgage at r_m, which implies an interest cost on the existing balance) or through an alternative product at rate r_a > r_m. The key trade-off is that a cash-out refinance saves on the rate for the liquidity need itself but incurs a cost or benefit depending on whether r_m exceeds or falls below the outstanding rate r_0. When the rate gap is negative (r_0 < r_m), the cash-out refinance penalizes the borrower on the existing balance; when the gap is positive (r_0 > r_m), it saves on the existing balance, further lowering the effective cost of the liquidity need. The model predicts that: (i) the probability of cash-out refinancing is nonlinear and step-like in the rate gap; (ii) the probability of alternative borrowing has the opposite pattern; (iii) higher mortgage rates raise the conditional cash-out amount through selection (low-l borrowers exit cash-out); and (iv) total borrowing is relatively insensitive to mortgage rates.
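The product choice in this model can be illustrated with a stylized cost comparison. The sketch below is an assumption-laden simplification, not the authors' specification: it compares one period of interest cost, ignores closing costs and loan terms, and uses hypothetical names (`balance` for the existing mortgage balance, `need` for l) and hypothetical numbers.

```python
def cheapest_product(balance, r0, rm, ra, need):
    """Pick the product with the lower one-period interest cost for a given liquidity need.

    Cash-out: the whole balance plus the new borrowing is priced at rm.
    Alternative: the balance stays at r0 and the new borrowing is priced at ra.
    """
    cash_out_cost = rm * (balance + need)
    alternative_cost = r0 * balance + ra * need
    return "cash_out" if cash_out_cost < alternative_cost else "alternative"

def min_need_for_cash_out(balance, r0, rm, ra):
    """Smallest need above which cash-out is cheaper (0 if the borrower is in the money)."""
    if ra <= rm:
        raise ValueError("sketch assumes ra > rm")
    return max(0.0, (rm - r0) * balance / (ra - rm))

# Hypothetical borrower: $200,000 balance at 4.0 percent, alternative credit at 8.0 percent.
B, R0, RA = 200_000, 0.040, 0.080
print(cheapest_product(B, R0, 0.035, RA, 10_000))   # in the money: cash-out is cheaper
print(cheapest_product(B, R0, 0.043, RA, 10_000))   # out of the money, small need
print(cheapest_product(B, R0, 0.043, RA, 40_000))   # out of the money, large need
print(min_need_for_cash_out(B, R0, 0.043, RA))      # cutoff need in dollars
```

Under these assumptions the cutoff rises with the mortgage rate, so borrowers with small needs are the first to leave cash-out refinancing, which matches the selection prediction in (iii).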

Q3. How does the Taper Tantrum provide exogenous variation, and what are its limitations?

The Taper Tantrum began in late spring 2013 when two FOMC communication events (Chairman Bernanke’s congressional testimony and the subsequent FOMC meeting) shifted market expectations about the pace of tapering large-scale asset purchases (LSAPs). The 30-year fixed mortgage rate rose approximately 80 basis points within one month, driven by changes in long-term rate expectations. Because the shock was unanticipated and the FOMC did not announce any concrete policy change, the scope for a “Fed information effect” biasing results is limited. The critical limitation is that the Taper Tantrum affected primarily long-term rates: HELOC rates and consumer credit rates (tied to the federal funds rate and bank prime rate, which were unchanged) were little affected. This means the estimated substitution elasticity holds when the rate spread between mortgage and alternative products widens, which is more directly applicable to unconventional monetary policy (LSAPs) than to conventional policy that moves rates across the full yield curve.

Q4. What do the Taper Tantrum extensive margin results show, and what pattern confirms substitution?

Figure 4 plots the difference-in-differences coefficient β₂ + β₃ by pre-Taper rate gap bin for three outcome variables. The cash-out refinancing probability (blue line) declines for all rate gap bins, most sharply for intermediate rate gap values (borrowers pushed out of the money by the Taper). Borrowers with pre-Taper rate gaps of ~1 percent experience a decline in monthly refinancing probability of about 0.0012, or more than 85 percent below their baseline rate. Other borrowing (black line) shows an almost exact mirror-image pattern: it rises after the Taper, most strongly for the same intermediate rate gap borrowers. The total borrowing probability (red line) shows essentially no response and little variation across rate gap groups, implying substitution nearly completely offsets the cash-out decline.

Q5. How do the intensive margin results for cash-out refinancing compare to the extensive margin, and what explains the difference?

After the Taper, the conditional cash-out amount rises (the intensive margin effect is positive), while the cash-out probability falls (the extensive margin effect is negative). These opposite signs are consistent with borrower selection: borrowers with small liquidity needs face the steepest increase in effective borrowing cost when they move out of the money and so disproportionately exit cash-out refinancing, raising the average extraction amount among those who remain. For borrowers with pre-Taper rate gaps of ~1 percent, the conditional cash-out amount rises approximately 20 percent after the Taper. Figure 6 corroborates this by showing the increase in average extraction is driven by a sharp decline in small extraction amounts (relative to outstanding balance).

Q6. How is the aggregate borrowing elasticity computed and what does it imply about monetary policy transmission?

The authors combine extensive and intensive margin estimates using a two-tiered (hurdle) model that allows the decision to borrow and the decision of how much to borrow to respond differently to covariates. The total expected borrowing amount is the product of the estimated borrowing probability and the expected conditional borrowing amount. Pre- and post-Taper aggregate predicted borrowing is calculated for each rate gap group, and the percentage change is divided by the 80 basis-point rate increase to produce a semi-elasticity. The aggregate borrowing elasticity is not statistically significantly different from zero at the main estimate, and the lower-bound estimate (which avoids reliance on the Post dummy for aggregate borrowing) is at most −8 percent per percentage-point increase in rates. This compares with a cash-out probability elasticity of approximately −45 percent, illustrating that substitution accounts for the overwhelming majority of the observed cash-out response.
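The aggregation step can be sketched as follows. This is only an illustration of the arithmetic described above: the group weights, probabilities, and amounts are hypothetical, the function names are not from the paper, and the sketch takes the fitted probabilities and conditional amounts as given instead of estimating the two tiers.

```python
def expected_borrowing(prob, conditional_amount):
    """Hurdle-model expectation: P(borrow) times E[amount | borrow]."""
    return prob * conditional_amount

def aggregate_borrowing(groups):
    """Weighted sum over rate gap groups given as (weight, prob, conditional_amount)."""
    return sum(w * expected_borrowing(p, a) for w, p, a in groups)

def semi_elasticity(pre, post, rate_change_pp=0.8):
    """Percent change in borrowing per percentage-point change in the mortgage rate."""
    return 100.0 * (post - pre) / pre / rate_change_pp

# Hypothetical pre- and post-shock predictions for two equally weighted rate gap groups.
pre_groups = [(0.5, 0.0100, 20_000), (0.5, 0.0120, 25_000)]
post_groups = [(0.5, 0.0098, 20_000), (0.5, 0.0119, 25_000)]

pre = aggregate_borrowing(pre_groups)
post = aggregate_borrowing(post_groups)
print(semi_elasticity(pre, post))  # percent change per percentage point of rates
```

Multiplying the two tiers before taking the percentage change matters because the margins can move in opposite directions, as they do for cash-out refinancing after the Taper.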

Q7. Why is the BRW monetary policy shock IV important for generalizing the Taper Tantrum findings?

The Taper Tantrum moved only long rates, whereas conventional monetary policy moves both long and short rates. When short rates rise, the alternative borrowing products (HELOCs, credit cards, personal loans) become more expensive, which could dampen substitution in two ways: (a) the rate spread between mortgage and alternative products narrows, reducing the range of borrower-amount combinations for which substitution makes financial sense; and (b) higher absolute borrowing costs on alternative products may reduce total borrowing among borrowers who would otherwise substitute. The BRW IV, which spans 2006–2021 and reflects shocks to the full yield curve (conventional and unconventional), addresses whether substitution holds when both rate types move. The IV results in Table II (F-statistic ~11) confirm that the cash-out probability elasticity is 3.5 (IV), the other-borrowing elasticity is −0.87 (IV), and the all-borrowing elasticity is 0.09 and statistically insignificant, broadly consistent with the Taper Tantrum findings.

Q8. What share of cash-out proceeds goes to paying down other debt, and does it vary with the rate environment?

An event study finds that total household debt increases by about 88 percent of the increase in mortgage balance in the first two months after a cash-out refinance, implying approximately 12 percent debt paydown; by six months out, the net paydown stabilizes at around 8 percent. Crucially, this share is constant across rate gap groups and does not change after the Taper Tantrum. This constancy implies that the marginal propensity to consume (MPC) out of cash-out refinances does not vary with the rate environment, and therefore the path-dependence of the cash-out channel cannot be attributed to compositional changes in how borrowers use extracted funds.

Q9. Why does the paper argue cash-out refinancing has far greater near-term consumption potential than rate-term refinancing, and what are the implications for path dependence?

A back-of-the-envelope calculation uses: (1) empirical frequencies (rate-term refinance probability is ~1.5x higher than cash-out); (2) near-term liquidity per event (average first-lien cash-out extraction ~$40,000 vs. annual payment savings ~$3,000 from rate-term); (3) semi-elasticities (rate-term has ~2x higher semi-elasticity to rates than cash-out per the IV estimates); and (4) standard MPC assumptions (100% for cash-out, 80% for rate-term savings). The calculation implies the consumption stimulus potential from cash-out refinancing is approximately 5.5 times that of rate-term refinancing per percentage-point change in rates. Because the paper shows the path-dependence of cash-out refinancing is largely offset by substitution, and because cash-out is the dominant near-term channel, the overall path-dependence of monetary policy is weaker than prior models predict.
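The arithmetic can be reproduced with the four inputs listed above. The sketch assumes the inputs combine multiplicatively (frequency × liquidity per event × semi-elasticity × MPC), with cash-out normalized to one for frequency and semi-elasticity; that combination rule is an interpretation of the summary, not a formula quoted from the paper.

```python
def stimulus_potential(rel_frequency, liquidity_per_event, rel_semi_elasticity, mpc):
    """Relative near-term consumption response per percentage-point change in rates."""
    return rel_frequency * liquidity_per_event * rel_semi_elasticity * mpc

# Cash-out: normalized frequency and semi-elasticity, $40,000 extracted, MPC of 100 percent.
cash_out = stimulus_potential(1.0, 40_000, 1.0, 1.0)

# Rate-term: 1.5x frequency, $3,000 annual savings, 2x semi-elasticity, MPC of 80 percent.
rate_term = stimulus_potential(1.5, 3_000, 2.0, 0.8)

ratio = cash_out / rate_term
print(round(ratio, 2))  # about 5.56, in line with the "approximately 5.5 times" figure
```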

Q10. What are the key robustness checks and how do they address potential confounds?

Three main robustness exercises are reported. First, a QE1 robustness (Appendix) uses the large decline in mortgage rates after the first LSAP announcement in 2008 as an alternative shock, finding consistent substitution patterns (households shift into cash-out refinancing from other borrowing when pushed into the money). Second, a placebo test shifts the sample back six months and estimates the same specification over the twelve months preceding the Taper; Figure 8 shows no differential substitution by rate gap during this stable-rate period, supporting the interpretation that the Taper Tantrum rate increase drives the cross-sectional substitution pattern. The placebo does reveal a negative Post dummy for other borrowing, consistent with a possible pre-trend in other borrowing, which motivates the lower-bound elasticity calculation that avoids reliance on this coefficient. Third, the authors show that results are little changed when adjustable-rate mortgages (~10 percent of outstanding mortgages in 2013) are included in the sample.

Key Concepts

Rate Gap: The difference between a borrower’s outstanding fixed mortgage rate and the estimated current 30-year fixed mortgage rate available to that borrower if they were to refinance (adjusting for borrower-specific LTV and credit score). A positive rate gap means the borrower is “in the money” for a rate-term refinance. This is the paper’s central measure of refinance incentive, determining whether cash-out refinancing or an alternative borrowing product is the cost-minimizing option for satisfying a given liquidity need.

Debt Substitution: The paper’s core mechanism: households shift their new borrowing across products (cash-out refinance, HELOC, CES, credit card, personal loan) in response to changes in relative borrowing costs, without proportionally changing total new borrowing. When the rate gap is positive, cash-out refinancing is the cheapest way to borrow (it lowers the rate on the existing balance while providing liquidity), so borrowers substitute from alternative products into cash-out. When the rate gap is negative or mortgage rates rise, borrowers substitute in the opposite direction, keeping their original mortgage rate intact by using alternative products.

Cash-Out Refinance Channel of Monetary Policy: The theoretical transmission mechanism by which monetary easing lowers mortgage rates, incentivizes in-the-money borrowers to refinance and extract home equity at reduced cost, and thereby stimulates consumption. Prior literature (Eichenbaum, Rebelo and Wong 2022) treats this channel as path-dependent and quantitatively important because it depends on the distribution of outstanding mortgage rates.

Path Dependence of Monetary Policy: The property by which the same monetary policy shock generates different aggregate borrowing or consumption responses depending on the historical distribution of outstanding fixed mortgage rates, which reflects prior monetary policy. A large share of in-the-money borrowers (due to a prior rate-cutting cycle) amplifies the cash-out refinance channel; a large share of out-of-the-money borrowers weakens it. The paper shows this path dependence is substantially attenuated by debt substitution.

In-the-Money Borrower: A borrower whose outstanding mortgage rate exceeds the current market mortgage rate (positive rate gap), creating a financial incentive to refinance. In-the-money status interacts with borrowing product choice because a cash-out refinance resets the interest cost on the entire existing balance, generating implicit savings that partially subsidize new liquidity extraction.

Hurdle (Two-Tiered) Model: An estimation approach that allows the decision to borrow (extensive margin) and the amount borrowed conditional on borrowing (intensive margin) to respond differently to covariates. The authors use this model to combine extensive and intensive margin estimates into a single aggregate borrowing elasticity, avoiding the distortion that arises from using dollar volume as a dependent variable when intensive and extensive margins have opposite responses to the rate gap.

Taper Tantrum (2013): A quasi-experimental shock used as the paper’s main source of exogenous variation. In late spring 2013, Federal Reserve communications about tapering large-scale asset purchases (LSAPs) caused the 30-year fixed mortgage rate to increase approximately 80 basis points within one month. Because the shock operated through long-term rate expectations, it moved mortgage rates without significantly affecting HELOC or consumer credit rates (tied to the unchanged federal funds and bank prime rates), enabling the authors to estimate substitution holding alternative product rates approximately fixed.

Bridging micro and macro production functions: The fiscal multiplier of infrastructure investment

Layer 1 — Overview

Research Question

This paper investigates the fiscal multiplier of infrastructure investment, specifically by incorporating firm-level investment decisions — a dimension absent from prior literature. The central analytical challenge is bridging the micro (firm-level) and macro (state-level) production functions for infrastructure, given that public capital is non-rivalrous: it can be used simultaneously by all firms without being depleted. The paper demonstrates that this non-rivalry generates a systematic discrepancy between firm-level and aggregate-level estimates of the elasticity of substitution between private and public capital, and it shows how this discrepancy shapes the magnitude of the fiscal multiplier.

Data and Methodology

The authors build and estimate a heterogeneous-firm general equilibrium model. Firms operate a constant-elasticity-of-substitution (CES) production function using private capital, non-rivalrous public capital (infrastructure), and labor. Firms are subject to idiosyncratic productivity shocks and make lumpy investment decisions subject to both fixed and convex capital adjustment costs, following Cooper and Haltiwanger (2006) and Winberry (2021). The economy has two regions — one with poor infrastructure and one with good infrastructure — motivated by the near-invariant cross-state distribution of infrastructure spending observed in U.S. data.

The model is estimated via an extended Simulated Method of Moments (SMM) that treats market clearing prices as additional parameters estimated simultaneously with structural parameters, reducing computational cost relative to standard GE estimation. Estimation uses a multi-block Metropolis-Hastings algorithm. Target moments include lumpy investment fraction (0.14, from Zwick and Mahon 2017), average investment-to-capital ratio (0.10), standard deviation of i/k (0.16), private-to-infrastructure capital ratio (0.75, from BEA), high-infrastructure region’s private capital share (0.83, from Census BDS), and total working hours (0.33).

The identification of the key parameter — the firm-level elasticity of substitution between private and public capital (λ) — comes from the relative size of private capital stocks across the two infrastructure groups: under greater complementarity, regions with more infrastructure should hold relatively more private capital.

External validation is provided by estimating the state-level elasticity from the model’s simulated data using a nonlinear least squares method following An et al. (2019), and comparing it to empirical state-level estimates from actual U.S. state data.

Main Findings with Quantitative Magnitudes

Firm-level vs. aggregate-level elasticity gap. The estimated firm-level elasticity of substitution is λ = 1.185, implying gross substitutability between private and public capital at the firm level. The state-level elasticity implied by the same model is 0.48 (or 0.35 in a decreasing-returns-to-scale specification), implying gross complementarity. The empirical state-level counterpart estimated from actual U.S. data is 0.445. The paper proves theoretically (Proposition 1) that, given non-rivalry and under mild conditions, firm-level gross substitutability implies aggregate-level gross complementarity. Proposition 2 further shows that this same mechanism micro-founds the increasing-returns-to-scale assumption in Baxter and King’s (1993) Cobb-Douglas aggregate production function.
Fiscal multiplier (baseline, 2-year horizon). The aggregate output multiplier over a 2-year horizon in the heterogeneous-firm general equilibrium model is 1.088 in response to a one-time unexpected infrastructure spending shock equal to 1% of steady-state GDP, financed by a lump-sum tax. The corresponding partial-equilibrium output multiplier (holding prices fixed at steady state) is 1.858; the gap reflects crowding out of private investment induced by the general equilibrium interest rate response. In the baseline, the interest rate rises by 0.39% after the shock; the investment multiplier is -0.043.
Comparison with representative-agent model. When the same implied returns-to-scale parameters are used in a representative-agent model (following Baxter and King 1993), the output multiplier is 0.991 and the investment multiplier is -0.157, both substantially lower than the heterogeneous-firm baseline. The key mechanism: under convex adjustment costs, the Jensen’s inequality effect implies that heterogeneous firms face a greater average adjustment burden than the representative firm, making their investment less responsive to the general equilibrium crowding-out pressure.
Sensitivity to elasticity of substitution. In the heterogeneous-firm model, the output multiplier falls to 0.672 at λ = 3 (high substitutability) and rises to 1.364 at λ = 0.5 (complementarity), against 1.088 at the baseline λ = 1.185. The multiplier is significantly more sensitive to λ in the heterogeneous-firm model than in the representative-agent model, because non-rivalry amplifies the effect of any given elasticity value through each firm’s production function.
Cross-state distribution of gains. Under the baseline spending allocation (81% to Good states, 19% to Poor states), per $1 of infrastructure spending, Good states receive $1.072 of the $1.088 total output gain, while Poor states receive only $0.016. In a counterfactual with equal spending across states, the total output multiplier falls to 0.873, Good states’ output multiplier falls to 0.810, and Poor states’ output multiplier rises to approximately 0.062 (about four times the baseline level of 0.016). This quantifies a sharp efficiency-equality trade-off in the allocation of infrastructure investment.
Employment and earnings effects. Compared to steady state, the baseline fiscal shock produces an average annual increase of 0.304% in employment and 0.389% in wages, yielding a $0.713 increase in earnings and a $0.148 increase in consumption per $1 of fiscal spending in general equilibrium. In partial equilibrium (no price changes), earnings increase by $1.294 and consumption by $0.605 per $1 spent.

Scope Conditions

Results are conditional on: (i) lump-sum tax financing of the fiscal shock; (ii) a one-time unexpected (MIT) shock with no persistence; (iii) a closed-economy framework with endogenous real interest rate; (iv) the estimated two-region structure calibrated to U.S. state-level infrastructure data; (v) firm-level investment dynamics calibrated to Compustat and BDS moments. The authors note that incorporating time-to-build assumptions (tested in an appendix) reduces the aggregate fiscal multiplier, consistent with Ramey (2020).

In depth

Q1. What is the core theoretical result connecting firm-level and aggregate-level elasticities, and what is the intuition?

A: Proposition 1 proves that, given non-rivalrous public capital and mild data conditions (at least one firm has private capital below total infrastructure, and aggregate private capital exceeds total infrastructure), if the firm-level elasticity of substitution λ ≥ 1 (gross substitutes), then the aggregate-level elasticity ξ < 1 (gross complements). The intuition is that a marginal increase in public capital raises the marginal product of private capital for every firm simultaneously due to non-rivalry; the sum of these MPK gains across all firms exceeds any single firm’s gain. To represent this amplified benefit within an aggregate production function, a stronger complementarity is required than what any single firm faces. Put differently, non-rivalry means aggregate private and public capital “look” more complementary than they truly are at the firm level.

Q2. How does non-rivalry micro-found the Baxter-King aggregate production function?

A: Proposition 2 shows that if firms use a CES production function with gross substitutability (λ ≥ 1) and non-rivalrous public capital, then fitting aggregate output with a Cobb-Douglas production function (as in Baxter and King 1993, H(K,N,L) = zK^α L^{1-α} N^ζ) yields ζ > 0, implying increasing returns to scale (IRS). This is the paper’s micro-foundation for a widely-used but previously ad hoc assumption in the macro-fiscal literature. The corollary states that both gross complementarity in the aggregate CES function and IRS in the aggregate Cobb-Douglas follow from the same non-rivalry mechanism at the firm level.

Q3. Why does the heterogeneous-firm model produce a higher output multiplier than the representative-agent model?

A: Two mechanisms drive the difference. First, due to Jensen’s inequality and the convexity of adjustment costs, heterogeneous firms face a higher average adjustment burden than the representative (average) firm; this means heterogeneous firms are less responsive to interest rate changes that crowd out investment. The investment multiplier is -0.043 in the heterogeneous-firm baseline versus -0.157 in the representative-agent model. Second, the fixed adjustment cost (present in the baseline but absent from the representative-agent model) further dampens investment sensitivity via the extensive margin. Because less private investment is crowded out, more of the direct output boost from infrastructure spending survives into the aggregate multiplier, yielding 1.088 versus 0.991.

Q4. What is the novel estimation procedure and why is it necessary?

A: Standard SMM applied to GE models requires solving for market-clearing prices for every candidate parameter vector, creating a nested optimization loop that is computationally prohibitive. The authors extend SMM by treating market-clearing prices (wage w and marginal utility of consumption p) as additional parameters and appending market-clearing conditions as additional target moments — effectively requiring those moments to equal zero. A multi-block Metropolis-Hastings algorithm jointly draws from the price block and the parameter block. This approach generates posterior draws that simultaneously satisfy market clearing and fit empirical moments, without the inner loop. The resulting market-clearing accuracy is e^{-4} at the posterior mean.
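As a rough illustration of the idea, the sketch below runs a two-block Metropolis-Hastings sampler on a toy market. Everything in it is an assumption made for illustration and is not the authors' model or code: the linear demand a - p, the supply p, the single data moment Q_DATA, the tolerance SCALE, the flat prior, and the proposal step sizes. The price p is sampled as if it were a parameter, and excess demand enters the objective as a moment targeted at zero, so no inner price-solving loop is needed.

```python
import math
import random

# Toy economy (all hypothetical): demand = a - p, supply = p.
# Structural parameter: a. Price treated as an extra parameter: p.
Q_DATA = 1.0   # assumed data moment: observed quantity traded
SCALE = 0.05   # assumed tolerance on each moment condition


def moments(a, p):
    """Return (empirical-moment gap, market-clearing residual)."""
    quantity_gap = p - Q_DATA      # model quantity (supply) minus data
    excess_demand = (a - p) - p    # demand minus supply, targeted at zero
    return quantity_gap, excess_demand


def log_quasi_posterior(a, p):
    """Flat prior plus a quadratic SMM-style objective in both moments."""
    g1, g2 = moments(a, p)
    return -0.5 * (g1 ** 2 + g2 ** 2) / SCALE ** 2


def accept(rng, log_ratio):
    """Standard Metropolis acceptance rule for a symmetric proposal."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def two_block_mh(n_draws=20000, burn_in=2000, seed=0):
    rng = random.Random(seed)
    a, p = 1.5, 0.8                # arbitrary starting point
    current = log_quasi_posterior(a, p)
    draws = []
    for i in range(n_draws + burn_in):
        # Block 1: update the structural parameter, price held fixed.
        a_new = a + rng.gauss(0.0, 0.06)
        cand = log_quasi_posterior(a_new, p)
        if accept(rng, cand - current):
            a, current = a_new, cand
        # Block 2: update the price, structural parameter held fixed.
        p_new = p + rng.gauss(0.0, 0.03)
        cand = log_quasi_posterior(a, p_new)
        if accept(rng, cand - current):
            p, current = p_new, cand
        if i >= burn_in:
            draws.append((a, p))
    return draws


draws = two_block_mh()
a_mean = sum(a for a, _ in draws) / len(draws)
p_mean = sum(p for _, p in draws) / len(draws)
residual_at_mean = moments(a_mean, p_mean)[1]
print(f"posterior mean a={a_mean:.3f}, p={p_mean:.3f}, "
      f"excess demand at mean={residual_at_mean:.4f}")
```

In this toy setup the two moments are satisfied exactly at a = 2 and p = 1, so the posterior means should settle near those values with a small market-clearing residual.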

Q5. How is the firm-level elasticity of substitution (λ) identified from the data?

A: λ is identified from the cross-state difference in private capital stocks between high- and low-infrastructure regions. Under the model, if private and public capital are more complementary (lower λ), high-infrastructure regions should attract relatively more private capital. The data moment used is the Good region’s share of aggregate private capital (0.83 from Census BDS data). This identification strategy is analogous to Bartik-instrument approaches in the empirical literature, where a parameter governing cross-state sensitivity to aggregate shocks is identified from cross-sectional variation.

Q6. How is the model validated externally?

A: The authors compute the state-level elasticity from the estimated model by fixing firm-level parameters and re-estimating only the elasticity and regional productivity from the model’s simulated state-level data, using the same NLLS estimator as An et al. (2019). The model-implied state-level elasticity is 0.349 (DRS specification) or 0.482 (CRS specification). The empirical estimate from actual U.S. state-level data following the same estimator is 0.445. Both indicate gross complementarity at the state level, consistent with the theoretical prediction. This external validation is not used in the estimation itself, providing an independent check.

Q7. What are the roles of extensive vs. intensive investment margins in the crowding-out effect?

A: Table 9 decomposes the investment multiplier of -0.043 by investment margin. When only the extensive margin (the discrete decision of whether to invest) is allowed to respond, the investment multiplier is -0.032, approximately 74% of the baseline crowding-out effect. When only the intensive margin (investment size conditional on adjusting) responds, the multiplier is -0.011, about 26% of the total. Thus the extensive margin is the dominant channel through which higher interest rates crowd out private investment. When both margins are held fixed, the output multiplier rises to 1.139, confirming that investment crowding-out reduces the output multiplier by about 0.05.
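The shares quoted in this answer can be reproduced from the reported multipliers. The short check below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Investment multipliers reported in the text (Table 9 of the paper).
baseline = -0.043
extensive_only = -0.032
intensive_only = -0.011

extensive_share = extensive_only / baseline
intensive_share = intensive_only / baseline

# Output multipliers with and without investment crowding out.
output_baseline = 1.088
output_margins_fixed = 1.139
crowding_out_cost = output_margins_fixed - output_baseline

print(f"extensive share: {extensive_share:.1%}")
print(f"intensive share: {intensive_share:.1%}")
print(f"output multiplier lost to crowding out: {crowding_out_cost:.3f}")
```

The two single-margin multipliers add up to the baseline of -0.043, which is why the shares sum to one.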

Q8. How does the elasticity of substitution affect the fiscal multiplier quantitatively, and why does this matter more in the heterogeneous-firm model?

A: In the heterogeneous-firm GE model: λ = 3 gives an output multiplier of 0.672, λ = 1.185 (baseline) gives 1.088, and λ = 0.5 gives 1.364 — a range of 0.692. In the representative-agent model, the comparable range across the implied ζ values is much narrower (0.970 to 0.998). The amplification in the heterogeneous-firm model occurs because non-rivalry means each firm’s production function directly incorporates the public capital stock, so the elasticity parameter has first-order consequences for every firm’s investment incentive response to a fiscal shock. This heightened sensitivity underscores why accurately estimating λ at the firm level — rather than importing a state-level estimate — is critical for quantifying infrastructure multipliers.

Q9. What is the efficiency-equality trade-off in cross-state infrastructure allocation?

A: Under the baseline allocation (81% of infrastructure spending to Good states, 19% to Poor states), per $1 of infrastructure spending, the Good states receive $1.072 of output gains and Poor states receive only $0.016. In the equal-spending counterfactual, the total output multiplier falls from 1.088 to 0.873. The Poor states’ output multiplier rises from $0.016 to $0.062 (approximately fourfold), while the Good states’ falls from $1.072 to $0.810. The Poor states also see earnings multipliers more than double (from $0.017 to $0.042). This trade-off arises because Good states have both more private capital (benefiting from non-rivalry) and higher estimated TFP — so each dollar of infrastructure is more productive there. Equal allocation reduces aggregate efficiency while partially mitigating regional inequality.
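The trade-off can be checked with simple arithmetic on the figures reported above. The snippet uses only those figures; the dictionary layout and names are illustrative.

```python
# Output gain per $1 of infrastructure spending, as reported in the text.
baseline = {"good": 1.072, "poor": 0.016}
equal_spending = {"good": 0.810, "poor": 0.062}

total_baseline = sum(baseline.values())
# The text reports 0.873 for this total; the regional parts are rounded.
total_equal = sum(equal_spending.values())

efficiency_loss = 1.088 - 0.873   # drop in the reported total multiplier
poor_gain_ratio = equal_spending["poor"] / baseline["poor"]

print(f"baseline total: {total_baseline:.3f}")
print(f"equal-spending total (from rounded parts): {total_equal:.3f}")
print(f"aggregate multiplier given up: {efficiency_loss:.3f}")
print(f"Poor-state gain relative to baseline: {poor_gain_ratio:.2f}x")
```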

Q10. How do the paper’s multiplier estimates compare to the existing literature?

A: In partial equilibrium (no GE adjustment), the authors find an output multiplier of 1.858, consistent with Chodorow-Reich’s (2019) cross-sectional multiplier of approximately 1.8. Once the general equilibrium interest rate effect is included, the multiplier drops to 1.09, within the 0.6-1.2 range from Ramey (2011). Literature using representative-agent models without non-rivalry (e.g., Ramey 2020) typically reports multipliers of 0.3 to 0.8 using returns-to-scale parameters of 0.07-0.12; the paper shows these correspond to fiscal multipliers of 0.847-0.882 in the representative-agent framework. The heterogeneous-firm model, once it incorporates the non-rivalry-corrected elasticities, yields a meaningfully higher multiplier of 1.088.

Q11. What role does time-to-build play, and how does the paper handle it?

A: The baseline model assumes a time-to-build period s = 1 year (one-year lag before new infrastructure is productive). The paper notes in Appendix H that incorporating extended time-to-build reduces the aggregate fiscal multiplier, operating through two channels: a news effect (agents adjust behavior upon anticipating future infrastructure) and a general equilibrium effect endogenous to the news effect. This finding is consistent with Ramey (2020). The baseline results are therefore reported under the minimal one-year time-to-build assumption, with longer lags serving as a robustness check.

Q12. What is the role of region-specific TFP heterogeneity in the model?

A: The model includes two regions that differ both in infrastructure levels and in region-specific productivity (TFP) levels. The TFP of the Good region is estimated to be approximately double that of the Poor region (x = 2.064 for Good vs. 1 for Poor). This productivity difference is meant to partially capture heterogeneous congestion effects (which are not separately modeled) and is estimated jointly with the infrastructure elasticity. The productivity differential is identified from the Good region’s share of aggregate output (0.849 in the data). The large TFP gap is also the reason why spending on Poor states generates a much smaller output gain than spending on Good states: not only is infrastructure utilization lower (fewer firms), but underlying productivity is also lower.

Key Concepts

Non-rivalry of public capital: The property by which infrastructure stock (Nj,t) enters each firm’s production function at the full regional level, not divided among firms. Formally, a single marginal unit of public capital raises every firm’s marginal product of private capital simultaneously, so the aggregate marginal product gain summed across firms exceeds any single firm’s gain. This is the central mechanism driving the micro-macro elasticity discrepancy in the paper.

Firm-level elasticity of substitution (λ): The elasticity governing the degree of substitutability between private capital (k) and public infrastructure (N) in the firm’s CES production function. At λ = 1 the production function is Cobb-Douglas; λ > 1 is gross substitutability; λ < 1 is gross complementarity. In the paper’s estimation, λ = 1.185, meaning private and public capital are gross substitutes at the firm level.

Gross substitutability vs. gross complementarity: Two inputs are gross substitutes (complements) if an increase in the quantity of one raises (lowers) the demand for the other, holding output price fixed. In the paper’s framework, private and public capital are gross substitutes at the firm level (λ = 1.185 > 1) but gross complements at the state level (ξ ≈ 0.48 < 1), with non-rivalry explaining the inversion upon aggregation.

Convex adjustment cost: A cost C(I,k) = (µ/2)(I/k)² · k that scales quadratically with the investment rate. In the heterogeneous-firm model, this cost plays a critical role: by Jensen’s inequality, heterogeneous firms’ average adjustment burden under a convex cost exceeds that of the representative (average) firm, making aggregate investment less sensitive to interest rate changes and thereby dampening crowding out.
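The Jensen's inequality argument can be seen in a small numerical sketch. The cost function follows the formula above; the parameter value MU, the common capital stock K, and the cross-section of investment rates are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance

MU = 2.0   # hypothetical convex-cost parameter (not the paper's estimate)
K = 1.0    # same capital stock for every firm, to isolate rate dispersion


def adjustment_cost(investment_rate, k=K, mu=MU):
    """C(I, k) = (mu / 2) * (I / k)**2 * k, written in terms of the rate I / k."""
    return 0.5 * mu * investment_rate ** 2 * k


# Hypothetical cross-section of firm investment rates with mean 0.10.
rates = [-0.05, 0.00, 0.10, 0.15, 0.30]

average_of_costs = mean(adjustment_cost(x) for x in rates)
cost_at_average = adjustment_cost(mean(rates))
jensen_gap = average_of_costs - cost_at_average

print(f"average cost across firms: {average_of_costs:.4f}")
print(f"cost of the average firm:  {cost_at_average:.4f}")
print(f"gap (mu/2 * k * variance): {jensen_gap:.4f}")
```

With a quadratic cost and a common k, the gap equals (µ/2) · k times the cross-sectional variance of investment rates, so more dispersion across firms means a larger average adjustment burden.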

Fixed adjustment cost (ξ): A one-time overhead cost drawn from a uniform distribution [0, ξ̄], paid only when a firm makes a large-scale investment outside the “inaction band” [−νk, νk]. This cost generates lumpy investment at the firm level, with about 14% of firms making lumpy investments in any given year. It also creates an extensive margin of investment adjustment that accounts for approximately 74% of the baseline crowding-out effect.

Fiscal multiplier (as defined in this paper): The ratio of the present value of aggregate output deviations from steady state to the present value of the fiscal spending shock, both summed over a T-year horizon. For the short run, T = 2 years; for the long run, T = 5 years. This is computed as a perfect-foresight transition path response to a one-time MIT shock equal to 1% of steady-state GDP.
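A minimal sketch of this definition is given below. The output path, the discount rate, and the convention of discounting from year 0 are assumptions for illustration; they are not taken from the paper.

```python
def present_value(path, discount_rate):
    """Discount a path of annual values back to year 0."""
    return sum(x / (1.0 + discount_rate) ** t for t, x in enumerate(path))


def fiscal_multiplier(output_dev, spending_dev, discount_rate, horizon):
    """PV of output deviations over PV of spending deviations, first `horizon` years."""
    pv_output = present_value(output_dev[:horizon], discount_rate)
    pv_spending = present_value(spending_dev[:horizon], discount_rate)
    return pv_output / pv_spending


# Hypothetical paths, in percent of steady-state GDP.
spending_dev = [1.0, 0.0, 0.0, 0.0, 0.0]    # one-time shock in year 0
output_dev = [0.2, 0.9, 0.3, 0.1, 0.05]     # made-up output response

short_run = fiscal_multiplier(output_dev, spending_dev, 0.04, horizon=2)
long_run = fiscal_multiplier(output_dev, spending_dev, 0.04, horizon=5)
print(f"T=2 multiplier: {short_run:.3f}")
print(f"T=5 multiplier: {long_run:.3f}")
```

Because the spending shock is one-time, the denominator is just the year-0 shock, and the multiplier grows with the horizon as long as output stays above steady state.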

MIT shock (one-time unexpected shock): An unanticipated, non-persistent one-period deviation in infrastructure spending. The term “MIT shock” refers to a deterministic transition experiment where agents have perfect foresight about all future values after the initial shock occurs. This contrasts with persistent policy rules and allows isolating the dynamic effects of a one-time fiscal impulse.

Extended SMM with market-clearing moments: The paper’s estimation innovation. Rather than solving for market-clearing prices at each parameter candidate (the standard costly inner loop), wages (w) and marginal utility of consumption (p) are treated as parameters with associated moments being the market-clearing conditions set to zero. A multi-block Metropolis-Hastings algorithm draws from the price block and the parameter block separately, generating posterior draws that jointly satisfy market clearing and empirical moment conditions.

Can Deficits Finance Themselves?

The paper asks whether a government can run a deficit today — issuing “stimulus checks” — and allow debt to return to its initial level without any future tax hike or spending cut. In environments combining (i) nominal rigidity and (ii) a violation of Ricardian equivalence (due to finite lives or liquidity constraints), this is possible through two complementary self-financing channels: (a) a Keynesian boom in real activity that expands the tax base and automatically raises revenue at existing tax rates; and (b) a surge in inflation that erodes the real value of outstanding nominal government debt. The paper’s headline result is that self-financing increases monotonically as fiscal adjustment is delayed, converging to full self-financing in the limit: if monetary policy does not lean too heavily against the fiscal stimulus, the initial deficit eventually returns debt to trend with no required future adjustment. Calibrated to empirical evidence on intertemporal MPCs, the speed of fiscal adjustment, the Phillips curve slope, and the monetary reaction, the model finds self-financing up to ν ≈ 0.95 — with the tax base channel dominant and inflation contributing negligibly.

Environment (Section 2): Baseline is a perpetual-youth overlapping-generations (OLG) version of the textbook New Keynesian model. Households survive from one period to the next with probability ω ∈ (0,1]; when ω=1 the model reduces to the standard PIH-RANK benchmark in which Ricardian equivalence holds and no self-financing occurs. When ω<1, two properties of consumer demand emerge: (i) consumers discount future disposable income at a rate higher than the interest rate (“discounting”), so a distant future tax hike barely affects today’s spending; (ii) consumers spend transfers relatively quickly (“front-loading”), so the Keynesian boom plays out before the promised tax hike arrives. The supply block is exactly the standard NKPC. Fiscal policy follows a rule in which taxes respond to income through a fixed tax rate τy (tax base channel) and to debt through a speed-of-adjustment coefficient τd ∈ (0,1) (with τd→0 meaning indefinitely delayed adjustment). Monetary policy keeps (expected) real rates constant in the baseline — a “neutral” benchmark that neither offsets nor amplifies the fiscal stimulus.

Self-financing result (Sections 3–4): Starting from a date-0 deficit shock ε0 (lump-sum transfer of 1% of steady-state output), define the degree of self-financing ν as the fraction of ε0 financed by the tax base and debt erosion channels; 1−ν equals the discounted present value of future tax hikes required to stabilize debt. The central results are:

Theorem 1 (baseline, φ=0): If ω<1 and τy>0, ν increases monotonically as τd→0, with ν→1 in the limit. Intuition via two-period analogy: when cumulative short-run MPC → 1, the Keynesian multiplier → 1/τy, and the induced tax revenue → 1 — exactly financing the original ε0.
Proposition 3: For any given τd or delay H, ν is strictly decreasing in ω: larger departures from permanent income (smaller ω) deliver faster and larger Keynesian booms and hence greater self-financing.
Theorem 2 (general monetary policy): Under a general real rate rule rt = φ·yt, there exists a threshold φ̄ ∈ (0, τy/(β·D^ss/Y^ss)) such that: if φ<φ̄, full self-financing is achieved in the limit; if φ>φ̄, ν is bounded strictly below 1 by ν̄(φ). If the monetary authority perfectly stabilizes output and inflation (φ→∞), ν=0 by construction.
Theorem 3 (general aggregate demand): With generalized demand ct = Md·dt + My·(yt−tt) + δ·Et[Σ(βω)^k(yt+k−tt+k)], self-financing holds whenever (i) ω<1 and (ii) Md>1−β and My·(1 + δ·βω/(1−βω)) ≥ 1. This nests the baseline OLG model, hybrid spender-OLG models, and approximately represents quantitative HANK models.

Distinction from FTPL: The Fiscal Theory of the Price Level (Cochrane) breaks Ricardian equivalence through equilibrium selection in a PIH-RANK setting; the self-financing here operates under the conventional equilibrium, with an active monetary authority and passive fiscal authority. The inflation channel is not the focal mechanism — the tax base channel is dominant.

Calibration (Table 1, hybrid OLG-spender model, quarterly frequency):

Consumer spending: share of hand-to-mouth (HtM) spenders µ = 0.073; OLG survival rate ω = 0.865; jointly matched to average MPC = 0.2 and short-run MPC slope from Fagereng, Holm, and Natvik (2021)
Fiscal adjustment: τd ∈ {0.085, 0.026, 0.004} (fast to slow; from Galí et al. 2007, Bianchi-Melosi 2017, Auclert-Rognlie 2020 respectively; equivalent to H ∈ {12, 23, 43} quarters under the non-Markovian rule)
Monetary policy: real rate feedback φ = 0 (neutral baseline)
Nominal rigidities: NKPC slope κ = 0.0062 (Hazell et al. 2022 point estimate)
Standard parameters: EIS σ=1 (log utility); β = 0.998 (1% annual real rate); tax feedback τy = 0.33 (DeLong-Summers benchmark: 33 cents of surplus per dollar of output); liquid wealth D^ss/Y^ss = 1.04 (Kaplan et al. 2018)

Quantitative results (Figure 3, Table 2):

Over the empirically calibrated range of τd, ν reaches up to 0.95, that is, nearly full self-financing in the most realistic (slow-adjustment) specification
Virtually all self-financing (≈95–100%) occurs through the tax base channel — the flat NKPC (κ=0.0062) limits inflation and debt erosion to a negligible share; with steeper NKPC (κ=0.1), about 20% of self-financing comes through date-0 inflation
The quantitative fiscal multiplier at τd=0.085 is 1.11, consistent with Ramey (2011) empirical estimates for transfers with relatively quick adjustment
Table 2 (νmax as a function of the monetary policy coefficient ψ and the NKPC slope κ): Full self-financing (νmax = 1) is attainable when ψ ≤ 1.25 and κ = 0.0062; νmax drops to 0.63 at ψ=1.5 and κ=0.0062; it drops to 0.22 with κ=0.1 and ψ=1; and it approaches 0 when monetary policy is aggressive and prices are flexible. Key lesson: a moderate monetary reaction combined with a flat NKPC (consistent with evidence) supports near-full self-financing.

Robustness:

HANK model: same conclusions as hybrid spender-OLG; intertemporal MPCs nearly identical (Wolf, 2021; Auclert et al., 2023)
Distortionary fiscal adjustment: negligible impact, since the required adjustment itself vanishes in the limit
Government purchases: same self-financing logic applies (Keynesian boom raises tax revenue)
Investment: the Keynesian cross applies to consumption; aggregate demand net of investment follows the same law of motion, so the self-financing result is unchanged

Scope conditions: Self-financing requires Ricardian equivalence to fail (ω<1); in the PIH-RANK benchmark (ω=1), neither self-financing channel is operative. Monetary accommodation is assumed neutral or weak; aggressive offsetting (φ>φ̄) prevents full self-financing. The paper is purely positive: whether deficits are optimal is a separate normative question. Results are log-linearized dynamics; the quantitative conclusions depend on discipline from empirical MPC evidence, NKPC estimates, and fiscal adjustment speed. The self-financing mechanism operates through aggregate demand and is not driven by r<g or by seigniorage from a convenience yield.

In depth

Q1. What is the two-period intuition for full self-financing?

In a two-period economy with fully myopic consumers (MPC=1), a date-0 transfer of ε stimulates output by y = MPC/(1−MPC·(1−τy)) · ε, generating tax revenue τy·y; with MPC→1 the output multiplier converges to 1/τy and tax revenue converges to exactly ε — full self-financing via the tax base. The infinite-horizon economy with ω<1 mirrors this intuition when fiscal adjustment is delayed far enough: the “short run” cumulative MPC approaches 1 (by discounting and front-loading), the Keynesian cross delivers a multiplier of 1/τy, and the additional tax revenue precisely repays the deficit, with no future tax hike needed.
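The two-period formula can be evaluated directly. The sketch below uses the output expression stated above with τy = 0.33 from the calibration; the grid of MPC values and the function name are illustrative.

```python
def two_period_response(mpc, tax_rate, transfer=1.0):
    """Output, induced tax revenue, and self-financed share in the two-period analogy."""
    output = mpc / (1.0 - mpc * (1.0 - tax_rate)) * transfer
    revenue = tax_rate * output
    return output, revenue, revenue / transfer


TAX_RATE = 0.33   # tau_y from the calibration
results = {mpc: two_period_response(mpc, TAX_RATE) for mpc in (0.2, 0.5, 0.9, 1.0)}

for mpc, (output, revenue, share) in results.items():
    print(f"MPC={mpc:.1f}: output={output:.3f}, revenue={revenue:.3f}, "
          f"self-financed share={share:.3f}")
```

The self-financed share rises with the MPC and reaches one only at MPC = 1, where output equals the transfer divided by τy.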

Q2. Why does the degree of self-financing ν increase as fiscal adjustment is delayed?

As the gap H between the date-0 transfer and the promised future tax hike widens, two effects amplify the Keynesian boom: (i) near-term demand is less dampened by anticipation of the future tax hike (discounting makes far-ahead taxes nearly irrelevant to today’s spending); and (ii) the general equilibrium income feedback — the Keynesian cross — has more time to play out before being curtailed by the eventual tax hike, amplifying the total output and revenue response. The longer the delay, the larger the short-run cumulative MPC, and the larger the fraction of the deficit self-financed through the tax base.

Q3. Why does aggressive monetary policy block self-financing?

If the monetary authority raises real interest rates in response to the fiscal boom (φ>0), it discourages household spending, slowing and shrinking the Keynesian boom; above the threshold φ̄, the real rate increase is strong enough to counteract the tax base feedback before the cumulative MPC can converge to 1, meaning full self-financing becomes impossible and some future fiscal adjustment is always required. Conversely, monetary accommodation (φ<0) accelerates the boom and permits full self-financing with less delay, while perfectly stabilizing output and inflation (φ→∞) entirely shuts down both self-financing channels.

Q4. What is the role of the NKPC slope in determining which channel operates?

When the NKPC is flat (κ=0.0062, the Hazell et al. 2022 estimate), a large output boom generates negligible inflation, so debt erosion contributes almost nothing and the tax base channel carries essentially all the self-financing. When the NKPC is steep (κ=0.1, consistent with supply-constrained post-COVID conditions), the same boom generates materially more inflation, shifting the financing split so that ~20% comes through debt erosion while ~80% still comes through the tax base. The overall degree of self-financing ν is affected only through the monetary response: a steeper NKPC triggers a more aggressive real rate response, which moderates the boom; this is captured in the analysis of Theorem 2 and Table 2.

Q5. How does this paper relate to and differ from the Fiscal Theory of the Price Level (FTPL)?

The FTPL (Cochrane) achieves deficit financing through inflation in a PIH-RANK environment by abandoning the Taylor principle and exploiting equilibrium selection; this paper requires no such departure — both monetary and fiscal policy follow conventional active/passive assignments, and the equilibrium studied is the unique bounded one. The key difference is in the consumer block: Ricardian equivalence fails here through finite lives or liquidity constraints (empirically grounded), not through equilibrium selection. Moreover, while FTPL highlights the debt erosion (inflation) channel, this paper finds the tax base (real activity) channel is dominant under empirically calibrated flat Phillips curves.

Q6. What new conditions on aggregate demand ensure self-financing extends beyond the OLG baseline?

Theorem 3 identifies two sufficient conditions: (1) “positive geometric discounting” (ω<1 in the generalized demand block), ensuring that far-ahead future taxes have negligible effect on current demand; and (2) “sufficient front-loading” (Md > 1−β and My·(1 + δ·βω/(1−βω)) ≥ 1), ensuring that income is spent quickly enough for the Keynesian feedback to deliver self-financing before debt explodes. The classical PIH-RANK fails condition (1); the spender-saver model with any margin of PIH consumers fails condition (2); the OLG baseline satisfies both; and the hybrid spender-OLG (the quantitative workhorse) satisfies both for any ω<1.
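The two conditions are simple inequalities and can be checked mechanically. In the sketch below, β and ω come from the calibration reported earlier, while the values of Md, My, and δ are hypothetical placeholders, since the summary does not report them.

```python
def theorem3_conditions(beta, omega, M_d, M_y, delta):
    """Evaluate the two sufficient conditions as stated in the summary."""
    discounting = omega < 1.0
    scaled_income_response = M_y * (1.0 + delta * beta * omega / (1.0 - beta * omega))
    front_loading = (M_d > 1.0 - beta) and (scaled_income_response >= 1.0)
    return {
        "discounting": discounting,
        "front_loading": front_loading,
        "self_financing": discounting and front_loading,
    }


# beta and omega from the calibration; M_d, M_y, delta are made-up values.
example = theorem3_conditions(beta=0.998, omega=0.865, M_d=0.1, M_y=0.2, delta=1.0)
no_discounting = theorem3_conditions(beta=0.998, omega=1.0, M_d=0.1, M_y=0.2, delta=1.0)
weak_front_loading = theorem3_conditions(beta=0.998, omega=0.865, M_d=0.1, M_y=0.1, delta=1.0)

print(example)
print(no_discounting)
print(weak_front_loading)
```

The second and third calls show how each condition can fail on its own: setting ω = 1 removes discounting, and a smaller My leaves the front-loading inequality unmet.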

Q7. Is a margin of truly PIH consumers fatal for self-financing?

Yes. Introducing any strictly positive mass of PIH consumers breaks self-financing entirely, creating a discontinuity: ν=0 whenever µ_PIH > 0, no matter how small. The intuition is that PIH consumers never fully spend any income received in finite time (they smooth it across their infinite horizon), so the cumulative MPC never reaches 1 and the Keynesian boom cannot fully finance the deficit. However, the discontinuity is fragile: replacing literal PIH consumers with “near-PIH” consumers (ω below but close to 1) restores ν→1 in the limit as H→∞ and is consistent with empirical evidence on high MPCs for liquid households.

Key concepts

fiscal self-financing : the property that a deficit-financed government transfer raises output and inflation sufficiently to replenish government revenue (via the tax base channel) and reduce the real debt burden (via the inflation/debt erosion channel), allowing debt to return to steady state without future tax increases; the degree ν ∈ [0,1] measures what fraction of the initial deficit is self-financed.

tax base channel : the mechanism by which a Keynesian boom in real activity — triggered by the deficit-financed transfer — automatically raises tax revenue (by τy dollars per dollar of additional output) without any change in tax rates; dominant over the debt erosion channel whenever the NKPC is flat (empirically, κ ≈ 0.006).

discounting and front-loading : the two consumer demand properties necessary for self-financing; “discounting” (ω<1) means far-ahead future taxes barely affect current spending, allowing the deficit to stimulate demand even with a promised future tax hike; “front-loading” means the income response is spent quickly, so the Keynesian boom plays out before the delayed tax hike arrives, raising tax revenue sufficiently to finance the deficit.

speed of fiscal adjustment (τd) : the quarterly feedback from public debt to tax revenue in the fiscal rule; τd→0 means indefinitely delayed adjustment and maximum self-financing; empirically disciplined values range from τd=0.085 (fast, Galí et al. 2007) to τd=0.004 (slow, Auclert-Rognlie 2020), with ν reaching up to about 0.95 at the slow end of this range under neutral monetary policy and a flat NKPC.

hybrid spender-OLG model : the paper’s quantitative workhorse, combining a fraction µ of hand-to-mouth spenders with OLG perpetual-youth consumers; jointly calibrated to match the impact and short-run MPCs from Fagereng et al. (2021), while also providing a close proxy for aggregate demand in quantitative HANK models (Auclert et al. 2023; Wolf 2021).

Can Trade Policy Mitigate Climate Change?

Overview

Farrokhi and Lashkaripour (2025) study the interaction between trade policy and climate change. The central research question is whether and how countries can use trade policy — specifically import tariffs — to address carbon leakage arising from domestic carbon pricing. When a country prices carbon domestically, production and emissions can shift to countries without carbon pricing, partially offsetting domestic emissions reductions. The paper asks how optimal import tariffs should be designed to internalize this leakage, how they relate to standard terms-of-trade tariffs, and what additional gains multilateral coordination can deliver.

Methodology and Data. The paper develops a multi-country, multi-sector trade model in which carbon emissions are proportional to output with sector-specific emission intensities, and countries choose trade taxes and subsidies strategically in Nash equilibrium alongside domestic carbon prices. The model is calibrated to 43 countries and 56 sectors using the 2014 baseline from the World Input-Output Database (WIOD 2016) for trade flows and input-output linkages, IEA data for sector-level carbon emissions, and GTAP for trade elasticities.

Main Findings. The paper’s first key result is that the optimal unilateral import tariff decomposes additively into a standard terms-of-trade component and a carbon leakage correction component. The carbon leakage correction is proportional to the emission intensity of imports from the exporting country in that sector and to the gap between the social cost of carbon and the actual domestic carbon price in the exporting country, divided by the import price. This decomposition implies that countries have incentives to impose import tariffs beyond those justified by standard terms-of-trade arguments, specifically to correct for the carbon embodied in imports from countries with insufficient carbon pricing.

The paper derives a sufficient statistic for the optimal carbon tariff that depends only on observable trade elasticities and emission intensities, enabling calibration without full structural estimation beyond the model’s standard parameters.

Quantitative Magnitudes. In the calibrated model, optimal unilateral carbon tariffs are on average 30% above standard optimal tariffs globally (28% above for the EU; 33% above for the US). The excess is largest in carbon-intensive sectors: cement and non-metallic minerals (45% above standard optimal), petroleum products (41% above), basic metals (38% above), and chemicals (32% above). Imposing the optimal unilateral carbon tariff yields a welfare gain of +0.8% in consumption-equivalent terms for the imposing country, with trading partners losing on average 0.3%, and a net global gain of +0.4%.

Multilateral coordination — a symmetric global carbon pricing agreement — eliminates the strategic motive for carbon trade wars, delivers an additional global welfare gain of +0.6% above the unilateral optimum, and eliminates 85% of the carbon leakage remaining under unilateral policy.

CBAM Analysis. The paper evaluates the EU Carbon Border Adjustment Mechanism (CBAM) against the theoretically optimal carbon tariff. The EU CBAM as currently implemented — covering only direct emissions — captures 60% of the theoretically optimal carbon tariff. Extending coverage to indirect (supply-chain) emissions would capture 85% of optimal. The welfare gain to the EU from CBAM relative to no border adjustment is +0.4%.

Scope Conditions and Robustness. Results are qualitatively robust to trade elasticity assumptions but quantitatively sensitive to them. Optimal carbon tariffs are regressive with respect to developing countries; multilateral coordination mitigates this distributional effect via income transfers. General equilibrium labor market effects reduce welfare gains by approximately 20% but do not change the qualitative ranking of policies.

In depth

Q1. What is the formal structure of the optimal unilateral import tariff in the presence of carbon externalities?

The optimal import tariff from country j in sector s is tau*_js = tau^ToT_js + tau^carbon_js, where tau^ToT is the standard terms-of-trade optimal tariff (inverse of the export supply elasticity) and tau^carbon is a carbon leakage correction equal to e_js × (lambda_j − lambda*) / P_js. Here e_js is the emission intensity of country j in sector s, lambda_j is the social cost of carbon in the importing country, lambda* is the actual domestic carbon price in the exporting country, and P_js is the import price. Countries therefore have two distinct and additive incentives to impose import tariffs: the classical terms-of-trade motive and a novel carbon leakage correction motive.
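As a minimal sketch of the additive decomposition described in this answer, the Python snippet below computes the two components for a single exporter-sector pair. The function name, argument names, and all numerical inputs are hypothetical illustrations, not values from the paper; the terms-of-trade component is taken to be the inverse of the export supply elasticity, as stated in the answer.

```python
def optimal_tariff(export_supply_elasticity, emission_intensity,
                   social_cost_of_carbon, exporter_carbon_price, import_price):
    """Return (tau_tot, tau_carbon, tau_total) for one exporter-sector pair.

    tau_tot    = 1 / export supply elasticity
    tau_carbon = emission intensity * (SCC - exporter carbon price) / import price
    """
    tau_tot = 1.0 / export_supply_elasticity
    carbon_price_gap = social_cost_of_carbon - exporter_carbon_price
    tau_carbon = emission_intensity * carbon_price_gap / import_price
    return tau_tot, tau_carbon, tau_tot + tau_carbon


# Hypothetical inputs (not from the paper): elasticity 5, 0.5 tCO2 per unit,
# SCC of $100/tCO2, exporter carbon price of $20/tCO2, import price of $400.
tau_tot, tau_carbon, tau_total = optimal_tariff(5.0, 0.5, 100.0, 20.0, 400.0)
print(f"ToT: {tau_tot:.3f}, carbon: {tau_carbon:.3f}, total: {tau_total:.3f}")
```

The carbon component vanishes when the exporter's carbon price equals the social cost of carbon, which is why the correction targets only imports from countries with insufficient carbon pricing.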

Q2. What is the sufficient statistic result and why does it matter for implementation?

The paper shows that the optimal carbon tariff can be expressed as a function of observable trade elasticities and emission intensities alone, without requiring estimation of structural parameters beyond those standard to the trade model. This sufficient statistic result matters because it means regulators can in principle calculate and implement the theoretically optimal carbon border adjustment using data that are already collected — sectoral emission intensities and trade elasticities — rather than relying on unobservable structural primitives.

Q3. By how much do optimal carbon tariffs exceed standard optimal tariffs in the aggregate and in the most carbon-intensive sectors?

Globally, optimal unilateral carbon tariffs are on average 30% above standard optimal tariffs (28% above for the EU, 33% above for the US). The excess is largest in highly carbon-intensive sectors: cement and non-metallic minerals (45% above), petroleum products (41% above), basic metals (38% above), and chemicals (32% above). These are precisely the sectors where emission intensities are highest, consistent with the carbon leakage correction being proportional to emission intensity.

Q4. What are the welfare effects of unilateral optimal carbon tariff policy?

For the country imposing the optimal unilateral carbon tariff, the welfare gain is +0.8% in consumption-equivalent terms relative to no carbon tariff. Trading partners lose on average 0.3%. The net global welfare gain is +0.4%. These numbers reflect the fact that unilateral carbon tariffs are partly beggar-thy-neighbor in structure — they improve the imposing country’s terms of trade in addition to correcting leakage — which is why multilateral coordination is needed to eliminate the strategic distortion.

Q5. What additional gains does multilateral coordination deliver over unilateral policy?

Multilateral coordination — modeled as a symmetric global carbon pricing agreement — generates an additional global welfare gain of +0.6% above the unilateral optimum. It also eliminates 85% of the carbon leakage that persists under unilateral policy. The mechanism is that coordination removes the strategic motive for trade wars over carbon policy: under unilateral policy, each country has an incentive to impose carbon tariffs partly for terms-of-trade reasons, but under a coordinated agreement these beggar-thy-neighbor components are internalized.

Q6. How well does the EU’s CBAM as actually implemented capture the theoretically optimal carbon border adjustment?

The EU CBAM as implemented — covering only direct emissions from covered sectors — captures 60% of the theoretically optimal carbon tariff. Extending the CBAM to include indirect emissions embedded in supply chains would raise this to 85% of optimal. The remaining gap (15% under the extended CBAM) reflects the difficulty of accounting for all upstream emission intensities across complex global supply chains.

Q7. What is the welfare gain to the EU from CBAM relative to no border adjustment?

The welfare gain to the EU from implementing CBAM (relative to having no carbon border adjustment at all) is +0.4% in consumption-equivalent terms. This figure corresponds to the CBAM as implemented, which covers only direct emissions.

Q8. How sensitive are the results to trade elasticity assumptions, and what are the distributional implications for developing countries?

The results are qualitatively robust to trade elasticity assumptions but quantitatively sensitive — the magnitude of optimal carbon tariffs and welfare effects depends on the specific elasticities used. On distributional grounds, optimal carbon tariffs are regressive with respect to developing countries, meaning developing economies bear disproportionate costs from carbon border adjustments. Multilateral coordination partially mitigates this distributional concern through income transfers implied by the symmetric global agreement.

Q9. How do general equilibrium labor market effects alter the conclusions?

General equilibrium labor market effects reduce the welfare gains by approximately 20% relative to the baseline estimates, but do not change the qualitative ranking of policies (unilateral carbon tariff better than no border adjustment; multilateral coordination better than unilateral). This suggests that the core policy conclusions are robust to incorporating labor market general equilibrium effects, even if the precise magnitudes are somewhat smaller.

Key Concepts

Carbon Leakage. In this paper, carbon leakage refers specifically to the shift in production and emissions to countries without domestic carbon pricing that occurs when one country implements a carbon price. It is the mechanism by which domestic carbon pricing is partially offset, motivating the use of trade policy as a complementary instrument.

Carbon Leakage Correction (tau^carbon). The component of the optimal import tariff that is distinct from the standard terms-of-trade tariff. It equals emission intensity × (social cost of carbon − domestic carbon price in exporter) / import price. It corrects for the fact that imports from countries with insufficient carbon pricing embody unpriced carbon externalities.

Terms-of-Trade Tariff (tau^ToT). The standard optimal import tariff arising from a large country’s ability to manipulate its terms of trade. Equal to the inverse of the export supply elasticity of the trading partner. The paper establishes that carbon tariffs add to — rather than replace — this classical component.

Sufficient Statistic for Optimal Carbon Tariff. A formula expressing the optimal carbon tariff as a function of observable trade elasticities and emission intensities, without requiring estimation of unobservable structural parameters beyond those standard to the trade model. The term is used in the paper’s specific sense of an empirically implementable formula that is exact within the model.

Emission Intensity. Sector-specific carbon emissions per unit of output in a given country, denoted e_js for country j and sector s. Used as the key observable that scales the carbon leakage correction component of the optimal tariff.

Multilateral Coordination. Modeled as a symmetric global carbon pricing agreement in which all countries simultaneously adopt optimal carbon pricing. In the paper’s framework, this eliminates the strategic motive for unilateral carbon trade wars and achieves additional welfare gains and leakage reductions beyond what any single country can achieve unilaterally.

Carbon Border Adjustment Mechanism (CBAM). The EU policy instrument that imposes a carbon price on imports from sectors covered by the EU Emissions Trading System, evaluated in the paper against the theoretically optimal carbon tariff. The paper distinguishes between the direct-emissions-only CBAM as implemented (capturing 60% of optimal) and a hypothetical full CBAM including indirect supply-chain emissions (capturing 85% of optimal).

Cap‐and‐Trade and Carbon Tax Meet Arrow–Debreu

Layer 1: Overview

Research Question

Anderson and Duanmu (2025) ask how general equilibrium (GE) interactions — factor reallocation across sectors, capital misallocation under climate uncertainty, and the distributional incidence of damages — alter the social cost of carbon (SCC) relative to the partial equilibrium (PE) estimates embedded in standard integrated assessment models (IAMs). The paper also characterizes conditions for Pareto improvements through climate policy and derives the optimal carbon tax in second-best environments with pre-existing distortions.

Framework

The authors build a dynamic Arrow-Debreu economy with L goods, K capital stocks (including climate stocks), and T periods. In the climate module, the carbon stock evolves as S_{t+1} = S_t + sum_j e_j(q_j) − alpha·S_t, where e_j(q_j) is sector j's emissions as a function of its output and alpha is the natural decay rate of atmospheric carbon. Sector-specific climate damage functions D_j(S_t) = 1 − d_j·(S_t − S_0) shrink each sector's production possibility set as the stock rises above the pre-industrial level S_0. Firms and households take the climate trajectory as given and do not internalize the impact of their own emissions, which generates the externality. Under standard regularity conditions, the authors prove existence of a competitive equilibrium and establish that it is inefficient: output is too high and climate-intensive sectors are too large relative to the social optimum.
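The law of motion and the damage function in this framework can be sketched in a few lines of Python. This is an illustration only: the function names and every numerical value (initial stock, emissions path, decay rate, damage slope) are hypothetical, and total emissions per period are supplied directly rather than derived from sectoral outputs.

```python
def simulate_carbon_stock(initial_stock, emissions_path, alpha):
    """Iterate S_{t+1} = S_t + E_t - alpha * S_t, with E_t total emissions in period t."""
    stocks = [initial_stock]
    for total_emissions in emissions_path:
        s_t = stocks[-1]
        stocks.append(s_t + total_emissions - alpha * s_t)
    return stocks


def damage_factor(stock, reference_stock, d_j):
    """Sector-specific multiplier D_j(S) = 1 - d_j * (S - S_0)."""
    return 1.0 - d_j * (stock - reference_stock)


# Hypothetical values: stock starts at the reference level 100, emissions are
# 12 per period, and 10% of the stock decays each period.
path = simulate_carbon_stock(100.0, [12.0] * 3, alpha=0.1)
for t, stock in enumerate(path):
    print(t, round(stock, 2), round(damage_factor(stock, 100.0, d_j=0.01), 4))
```

With constant emissions E, the stock in this sketch converges toward E/alpha; a sector with a larger d_j loses more productive capacity along the same path.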

General Formula for the SCC

The paper derives a general SCC formula, SCC_t = Sum_{tau >= t} beta^(tau−t) · [dW/dS_tau / (dW/dY_t)], that decomposes into four components: (1) the standard direct productivity-loss term, (2) a GE factor-reallocation term capturing inefficient reallocation as damages shift relative prices, (3) a capital-misallocation term reflecting distortions in investment from climate uncertainty, and (4) a distribution term reflecting the welfare losses from the regressive incidence of climate damages. The three correction terms (2)–(4) are all positive under standard conditions, so the GE SCC exceeds the PE SCC. The paper shows that this formula nests existing IAM frameworks as special cases.

Quantitative Findings

Calibrating to three leading IAMs, the authors find that general equilibrium interactions raise the SCC by 15–40% above standard PE estimates:

DICE-calibrated: GE correction of 18% above the PE estimate.
FUND-calibrated: GE correction of 15% above the PE estimate.
PAGE-calibrated: GE correction of 40% above the PE estimate, the largest correction owing to greater sector heterogeneity in that model.
Median calibration: a PE SCC of $51/tCO₂ rises to a GE SCC of $62/tCO₂.

Decomposing the aggregate GE correction: factor reallocation across sectors accounts for 55%, capital misallocation due to climate uncertainty for 30%, and the distributional regressivity of damages for 15%.
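As a rough arithmetic check on the figures above, the sketch below splits the median GE correction into its three channels. Applying the aggregate 55/30/15 shares to the median calibration is an assumption made here for illustration; the paper may report the decomposition on a different basis. The function and variable names are hypothetical.

```python
PE_SCC = 51.0  # $/tCO2, median PE estimate reported above
GE_SCC = 62.0  # $/tCO2, median GE estimate reported above
SHARES = {"factor reallocation": 0.55,
          "capital misallocation": 0.30,
          "distribution": 0.15}


def split_ge_correction(pe_scc, ge_scc, shares):
    """Allocate the proportional GE correction (GE/PE - 1) across channels."""
    total_correction = ge_scc / pe_scc - 1.0
    return {name: share * total_correction for name, share in shares.items()}


terms = split_ge_correction(PE_SCC, GE_SCC, SHARES)
for name, term in terms.items():
    print(f"{name}: {term:.3f} of PE SCC (about ${term * PE_SCC:.2f}/tCO2)")
print(f"GE/PE ratio: {1.0 + sum(terms.values()):.3f}")
```

Under this assumption, the three channel terms sum to the overall proportional correction, so multiplying one plus their sum by the PE estimate recovers the GE estimate.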

Second-Best Policy and Uncertainty

In environments with pre-existing distortions, the optimal carbon tax deviates from the SCC: revenue recycling through labor tax cuts generates additional welfare gains of 10–15% of carbon tax revenue; undertaxed capital implies the optimal carbon tax should be set above the SCC (double dividend); and in monopolistically competitive sectors the optimal carbon tax is below the SCC because the carbon tax amplifies monopoly distortions. Under climate uncertainty, the SCC carries a risk premium proportional to the variance of damage estimates times the coefficient of relative risk aversion, estimated at +$8–15/tCO₂ (15–25% of the base SCC).

Scope Conditions

The quantitative corrections are calibrated to DICE, FUND, and PAGE and therefore inherit those models’ parameterizations of damage functions and discount rates. The GE factor-reallocation and capital-misallocation channels are larger when sectors are more heterogeneous in damage exposure — as is explicit in the PAGE result. Second-best corrections depend on the sign and magnitude of pre-existing distortions (labor taxes, capital taxes, market structure).

In depth

Q1. What is the core inefficiency result, and what does it imply about the competitive equilibrium?

The paper’s efficiency theorem establishes that the competitive equilibrium is Pareto inefficient because firms and households take the climate trajectory as given and do not internalize the impact of their own emissions on the carbon stock. As a consequence, output is too high and climate-intensive sectors are too large relative to the social optimum. This externality is the fundamental justification for climate policy in the model.

Q2. How does the paper’s general SCC formula extend existing approaches, and what are the novel terms?

The general formula SCC_t = Sum_{tau >= t} beta^(tau−t) · [dW/dS_tau / (dW/dY_t)] nests standard IAM SCC formulas as special cases. The novel terms relative to partial equilibrium are: (i) a GE reallocation term capturing losses from inefficient factor reallocation as climate damages change relative prices across sectors; (ii) a capital-misallocation term capturing distortions in investment arising from climate uncertainty; and (iii) a distribution term capturing welfare losses from the regressive incidence of damages. All three terms are positive under standard conditions, implying GE SCC > PE SCC in all calibrations.

Q3. How are the quantitative GE corrections decomposed, and which channel dominates?

Of the total GE correction above the PE baseline, factor reallocation across sectors contributes 55%, capital misallocation due to climate uncertainty contributes 30%, and the distributional regressivity of damages contributes 15%. Factor reallocation is the dominant channel because, as climate damages alter relative prices, production shifts toward less-damaged sectors in ways that are distorted by the original carbon externality — generating second-order losses absent from PE damage functions.

Q4. Why does the PAGE calibration produce a larger GE correction (40%) than DICE (18%) or FUND (15%)?

The paper attributes PAGE’s larger GE correction to greater sector heterogeneity in that model’s parameterization. When damage exposure is more heterogeneous across sectors, the relative-price effects of marginal carbon are larger, amplifying the factor-reallocation channel. DICE and FUND, with more uniform sector-level damage structures, exhibit smaller reallocation corrections.

Q5. What is the median-calibration implication for the SCC in dollar terms?

In the median calibration, a PE SCC of $51/tCO₂ rises to a GE SCC of $62/tCO₂, an increase of roughly $11/tCO₂ or approximately 22%. This figure is directly computable from observable trade elasticities and sector-level damage estimates.

Q6. How should the carbon tax be adjusted when pre-existing labor market distortions are present, and what is the magnitude of the welfare gain from revenue recycling?

When labor taxes create a pre-existing wedge, using carbon tax revenue to reduce labor taxes generates additional welfare gains of 10–15% of total carbon tax revenue, a double dividend operating through the labor market. The optimal carbon tax in this case equals the SCC plus a correction term for the labor-market distortion.

Q7. How do capital market distortions alter the optimal carbon tax relative to the SCC?

If capital is undertaxed (a pre-existing distortion in capital markets), the optimal carbon tax is set above the SCC. The intuition is that a higher carbon tax partially offsets the under-taxation of capital by raising the effective cost of carbon-intensive investment, capturing a double dividend in the capital market.

Q8. How does monopolistic competition modify the optimal carbon tax?

For monopolistically competitive sectors, the optimal carbon tax is below the SCC. The reasoning is that applying a carbon tax to these sectors amplifies existing monopoly markups and associated distortions, so the social cost of the carbon tax exceeds the raw SCC in those sectors. The optimal policy trades off carbon correction against monopoly amplification.

Q9. What is the risk premium in the SCC under climate uncertainty, and how is it estimated?

The paper adds a term to the SCC proportional to the variance of damage estimates times the coefficient of relative risk aversion. Using empirical estimates of damage uncertainty, this risk premium is estimated at +$8–15/tCO₂, representing 15–25% of the base SCC. This term is absent from deterministic SCC calculations and constitutes a further reason standard PE estimates understate the true social cost.

Q10. What is the paper’s claim regarding computability of the GE correction?

The paper states that the novel GE terms are computable from observable trade elasticities and sector-level damage estimates, implying the GE correction is not merely a theoretical construct but can be implemented in quantitative policy analysis using data sources already available to researchers and policymakers.

Key Concepts

Social Cost of Carbon (General Equilibrium Formula). Defined in the paper as SCC_t = Sum_{tau >= t} beta^(tau−t) · [dW/dS_tau / (dW/dY_t)], the present discounted value of the marginal welfare loss from an additional unit of carbon, expressed relative to the marginal utility of current output. The paper’s version adds GE reallocation, capital-misallocation, and distributional terms absent from standard PE formulations.

GE Adjustment Factor. The ratio of the general equilibrium SCC to the partial equilibrium SCC, expressed as GE/PE = 1 + phi_realloc + phi_capital + phi_distribution. Under standard conditions all three phi terms are positive, so the GE SCC strictly exceeds the PE SCC.

Climate Damage Function (Sector-Specific). Specified as D_j(S_t) = 1 − d_j·(S_t − S_0), a sector-specific multiplicative reduction in the production possibility set as the carbon stock rises above the pre-industrial level S_0. Heterogeneity in d_j across sectors is the driver of the factor-reallocation GE correction.

Carbon Stock Evolution. S_{t+1} = S_t + sum_j e_j(q_j) − alpha·S_t, where alpha is the natural decay rate of atmospheric carbon and e_j(q_j) is sectoral emissions as a function of output. Firms and households treat S_t as exogenous, generating the externality.

Double Dividend. In second-best environments, a carbon tax can generate two welfare gains simultaneously: correcting the carbon externality and reducing the deadweight loss from a pre-existing distortion (labor or capital tax). The paper finds revenue recycling via labor tax cuts yields 10–15% of carbon tax revenue as additional welfare gain; undertaxed capital implies the optimal carbon tax is set above the SCC.

Risk Premium in the SCC. An additive term in the SCC under climate uncertainty, proportional to the variance of damage estimates times the coefficient of relative risk aversion. Empirically estimated at +$8–15/tCO₂, representing 15–25% of the base SCC.

Second-Best Optimal Carbon Tax. Written as tau*_carbon = SCC + CORRECTION, where the correction depends on the sign and magnitude of pre-existing distortions. The correction is positive under undertaxed capital (raise above SCC), negative under monopolistic competition (lower below SCC), and augmented by revenue-recycling gains when labor taxes are present.

Cash or card? A structural model of payment choices

Lippi and Moracci (2026) ask how euro area households choose between cash and card payments, and whether existing theoretical models can explain observed behavior. They draw on ECB payment diary surveys (SUCH and SPACE waves I–III, 2015–2024) covering transaction-level records that include purchase size, payment method chosen, cash on hand before each transaction, and merchant acceptance of cards. This granular data allows the authors to isolate unforced payment choices — transactions in which the consumer had sufficient cash, the merchant accepted cards, and the consumer held a card — from mechanically constrained ones.

The authors document three empirical patterns. First, roughly 39% of individuals in the sample violate the simple transaction-size threshold rule of Whitesell (1989): their largest unforced cash payment exceeds their smallest unforced card payment. Second, between 27% and 49% of unforced transactions are settled by card across survey waves, contradicting the “cash burns” policy of Alvarez and Lippi (2017) under which cards are used only when cash is exhausted. Third, and most novel, the probability of card use rises sharply as implied residual cash holdings (m′ = m − s) approach zero — that is, when a cash payment would nearly deplete the wallet. This suggests a precautionary motive: consumers maintain a cash buffer to cover purchases at merchants who do not accept cards.
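The first pattern can be made concrete with a short Python sketch that flags unforced transactions in a diary and tests one individual against the threshold rule. The record layout and field names are hypothetical and do not reflect the actual ECB diary format; treating cash exactly equal to the purchase size as sufficient is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    size: float                  # purchase amount s
    cash_before: float           # cash on hand m before paying
    merchant_accepts_card: bool
    holds_card: bool
    paid_by_card: bool


def is_unforced(t):
    """Both cash and card were feasible for this purchase."""
    return t.cash_before >= t.size and t.merchant_accepts_card and t.holds_card


def violates_threshold_rule(transactions):
    """True if the largest unforced cash payment exceeds the smallest unforced card payment."""
    unforced = [t for t in transactions if is_unforced(t)]
    cash_sizes = [t.size for t in unforced if not t.paid_by_card]
    card_sizes = [t.size for t in unforced if t.paid_by_card]
    if not cash_sizes or not card_sizes:
        return False  # the rule cannot be tested without both payment types
    return max(cash_sizes) > min(card_sizes)


# Hypothetical diary for one person. The last purchase is a forced card
# payment (cash on hand is too low), so it is ignored by the test.
diary = [
    Transaction(30.0, 50.0, True, True, paid_by_card=False),
    Transaction(12.0, 40.0, True, True, paid_by_card=True),
    Transaction(80.0, 20.0, True, True, paid_by_card=True),
]
print(violates_threshold_rule(diary))
```

A pure size-threshold rule would require every unforced cash payment to be smaller than every unforced card payment, which is the condition this check negates.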

To rationalize these facts, the authors build an inventory-theoretic model with a compound Poisson expenditure flow (random arrival times and random transaction sizes drawn from a lognormal distribution), imperfect card acceptance (fraction ϕ of merchants accept cards, set at 0.89 for 2023–24), a fixed cost b per cash withdrawal, a fixed cost κ per card transaction (sign unrestricted), and a utility penalty u per missed purchase. The optimal policy takes an (s,S) form for withdrawals and a state-dependent threshold for payment choice. When 0 < κ < b, the agent uses cards for purchases large enough that paying cash would push balances below a threshold m̃, thereby avoiding a costly withdrawal or the risk of missing a future purchase. The critical transaction size above which cards are used, s(m), rises with cash on hand, which generates the interaction between cash holdings and payment choice observed in the data.
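The expenditure process in this model can be simulated with the Python standard library. The sketch below is illustrative only: the arrival rate and lognormal parameters are hypothetical placeholders rather than the paper's calibration, and only the acceptance probability 0.89 is taken from the text.

```python
import random


def simulate_purchases(arrival_rate, mu, sigma, phi, horizon, seed=0):
    """Draw purchase opportunities on [0, horizon).

    Arrival times follow a Poisson process with intensity `arrival_rate`,
    sizes are lognormal(mu, sigma), and each merchant accepts cards with
    probability phi. Returns a list of (time, size, accepts_card) tuples.
    """
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        t += rng.expovariate(arrival_rate)  # exponential inter-arrival time
        if t >= horizon:
            break
        size = rng.lognormvariate(mu, sigma)
        accepts_card = rng.random() < phi
        events.append((t, size, accepts_card))
    return events


# Hypothetical parameters: two purchase opportunities per day over one year.
events = simulate_purchases(arrival_rate=2.0, mu=2.5, sigma=1.0, phi=0.89, horizon=365.0)
share_accepting = sum(accepts for _, _, accepts in events) / len(events)
print(len(events), round(share_accepting, 2))
```

Feeding such a stream into a payment and withdrawal policy is one way to approximate the cost components discussed in the calibration, although the paper's own solution method is not described here.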

The model is calibrated by minimum distance to four moments from the 2023–24 SPACE wave: average cash balances relative to daily expenditure, annual withdrawal frequency, the unforced card expenditure share, and realized purchase frequency. The estimated annual cost of managing consumption transactions for the average euro area household is approximately 15 euros — a remarkably small burden. Three counterfactual experiments quantify welfare implications. Removing card access raises the annual cost from 15 to about 50 euros, implying a card ownership value of roughly 35 euros per year. Near-universal card acceptance (ϕ = 0.99) reduces the annual cost by nearly 75%, from 15 to about 4 euros, while average cash holdings fall from 130% to about 20% of daily expenditure. A complete ban on cash would cost the average consumer approximately 60 euros per year more than the current mixed system. A cashless equilibrium requires both near-universal acceptance (ϕ above 99%) and card costs at or below zero (κ ≤ 0); neither condition alone is sufficient given the estimated magnitude of the missed-purchase cost u.

Q: What is the central empirical puzzle the paper addresses? A: Existing models predict either a pure transaction-size threshold (Whitesell 1989) or a pure cash-burns rule (Alvarez and Lippi 2017). The data shows both rules are violated: 39% of individuals with observed unforced transactions of both types violate the threshold rule, and 27–49% of unforced transactions are paid by card despite available cash. Neither model alone accounts for the novel finding that card usage spikes precisely when a cash payment would nearly exhaust the wallet.

Q: What data does the paper use and what is its key advantage? A: The authors use ECB payment diaries from four survey waves: SUCH (2015–16) and SPACE I, II, III (2019, 2021–22, 2023–24). For each transaction the diary records payment method, purchase size, and cash on hand, along with merchant acceptance of each payment method. Critically, the combined information on cash holdings and acceptance allows the authors to distinguish forced from unforced payment choices, which is essential for identifying the behavioral determinants of payment method selection.

Q: What is the novel empirical fact the paper contributes? A: The paper documents that the probability of card use increases sharply as implied residual cash (m′ = m − s) approaches zero. This pattern holds across all survey waves. It is consistent with a precautionary motive: consumers use cards to avoid depleting a cash buffer that provides insurance for encounters with merchants who do not accept cards.

Q: How does the theoretical model generate the precautionary motive for cash? A: Cards are accepted in only fraction ϕ of stores; when a merchant does not accept cards and the consumer lacks cash, the purchase is missed at utility cost u. This creates an incentive to maintain positive cash balances. Combined with a fixed withdrawal cost b and a fixed card cost κ, the agent optimally targets a cash level m* and withdraws before the wallet empties (trigger m̄ > 0), holding a buffer against card-rejection events.

Q: What is the key proposition characterizing the optimal payment policy? A: Proposition 1 establishes three regimes. When κ ≤ 0, the card always dominates and is used for all purchases. When κ ≥ b, cash always dominates and cards are used only for forced transactions. In the intermediate case 0 < κ < b, a threshold m̃ ∈ (m̄, m*) divides behavior: for m < m̃ the agent uses cash for all transactions; for m ≥ m̃ the agent uses a card for any purchase exceeding a size threshold s(m), where s(m) is increasing in m. The threshold s(m) distinguishes this policy from Whitesell (1989)’s fixed threshold.
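The three regimes in this answer can be written as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the size threshold used in the example, s(m) = m − m̃, is an illustrative stand-in that is increasing in m; in the paper s(m) comes from the agent's value function and need not take this form.

```python
def choose_payment(m, s, kappa, b, m_tilde, size_threshold):
    """Return 'card' or 'cash' for an unforced purchase of size s with cash m."""
    if kappa <= 0:
        return "card"  # card always dominates
    if kappa >= b:
        return "cash"  # cash always dominates when both are feasible
    if m < m_tilde:
        return "cash"  # low-cash region: cash for all transactions
    return "card" if s > size_threshold(m) else "cash"


M_TILDE = 20.0  # hypothetical threshold level


def illustrative_threshold(m):
    """Stand-in for s(m): card is used if paying cash would leave less than M_TILDE."""
    return m - M_TILDE


# Hypothetical costs with kappa equal to 60% of b, i.e. the intermediate regime.
for m, s in [(50.0, 10.0), (50.0, 40.0), (15.0, 10.0)]:
    print(m, s, choose_payment(m, s, kappa=0.6, b=1.0,
                               m_tilde=M_TILDE, size_threshold=illustrative_threshold))
```

Because the threshold moves with m, the same purchase size can be paid in cash by a consumer with a full wallet and by card by one whose balance is closer to m̃, which a fixed size threshold cannot produce.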

Q: How does the payment threshold s(m) vary with cash on hand, and why? A: s(m) is the purchase size above which the value loss from paying cash (being pushed closer to m̄, which raises the probability of a missed purchase or a costly withdrawal) exceeds the fixed card cost κ. As m rises, a larger cash payment is needed to trigger this concern, so s(m) increases. Card use therefore becomes less frequent as cash balances grow over most of the state space, consistent with the empirical finding that the probability of paying cash rises with cash on hand.

Q: What are the calibrated parameter values and what do they imply? A: The withdrawal cost b is estimated at 0.003 EUR — very small. The per-transaction card cost κ is about 60% of b, meaning cards are cheaper to use per transaction than visiting an ATM. The cost of a missed purchase u is approximately 1 EUR. The arrival rate λ is calibrated so that about 2% of purchase opportunities are missed under the estimated card acceptance rate of 0.89. These values imply that the payment system imposes a small but non-trivial welfare burden, concentrated in the precautionary costs of maintaining cash.

Q: What is the estimated annual cost of managing consumption transactions? A: Under the optimal policy for 2023–24 parameters, the annual cost C is approximately 15 euros per household. This decomposes into opportunity costs of holding cash (RM), withdrawal costs (bn), card usage costs, and the disutility from missed purchases. The authors characterize this as “remarkably small,” suggesting the current payment system is relatively efficient from the household’s perspective.

Q: How does this cost compare across demographic groups and over time? A: Until 2019 the estimated annual cost was around 20 euros; it stabilized around 15 euros from 2021–22 onward, with the decline driven primarily by households holding less cash in the post-pandemic period. Across age groups, education levels, income brackets, and gender, each subgroup faces a very similar cost as a proportion of their expenditure, indicating limited distributional variation in payment system costs.

Q: What is the welfare value of owning a payment card? A: Setting ϕ = 0 (cash-only economy), the annual cost rises from 15 to approximately 50 euros. The value of card ownership is therefore approximately 35 euros per year. The savings come primarily from lower opportunity costs of holding cash (since card access reduces the precautionary motive) and lower disutility from missed purchases; withdrawal cost reductions play a negligible role.

Q: What happens under near-universal card acceptance (ϕ = 0.99)? A: Average cash holdings fall from about 130% of daily expenditure to about 20% of daily expenditure, a reduction of approximately 110 percentage points. The unconditional card expenditure share rises by 17 percentage points to about 93%, mostly through an increase in forced card transactions (agents more often lack cash). Unforced card expenditure falls by about 10 percentage points because the precautionary motive for using cards — preserving a cash buffer — weakens when acceptance is near-universal. The annual management cost falls by nearly 75%, from 15 to approximately 4 euros.

Q: Under what conditions does a cashless economy emerge? A: The model identifies two jointly necessary conditions: card acceptance near universal (ϕ above 99%) and card costs at or below zero (κ ≤ 0). Raising ϕ alone from the estimated 0.89 to 0.99 reduces cash use substantially but does not eliminate it, because the estimated cost of missed purchases u is large enough that consumers still maintain a small cash buffer. For κ ≤ 0, cash holdings M/e are insensitive to κ and depend only on ϕ. With current card usage costs, even near-universal acceptance would not produce a cashless economy.

Q: What is the cost of a complete cash ban? A: Under a cashless policy, the annual cost is approximately 75 euros: about 5 times the 15-euro baseline and about 25 euros more than the cash-only cost of 50 euros. A complete ban on cash would therefore raise transaction management costs by approximately 60 euros per year for the average consumer. The reason is that at ϕ = 0.89, about 11% of purchase encounters occur at merchants that do not accept cards and would result in missed transactions.
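The cost figures quoted across these answers can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded euro amounts reported above; the dictionary keys and variable names are illustrative labels, not the paper's notation.

```python
# Rounded annual cost per household (euros), as quoted in the answers above.
annual_cost_eur = {
    "baseline": 15,                  # estimated economy, phi = 0.89
    "cash_only": 50,                 # no card available (phi = 0)
    "near_universal_acceptance": 4,  # phi = 0.99
    "cashless": 75,                  # cash banned
}

baseline = annual_cost_eur["baseline"]

# Value of owning a card: cost without a card minus cost with one.
card_value = annual_cost_eur["cash_only"] - baseline

# Cost of a cash ban relative to the baseline, in euros and as a multiple.
ban_cost_increase = annual_cost_eur["cashless"] - baseline
ban_cost_ratio = annual_cost_eur["cashless"] / baseline

# Proportional saving from near-universal card acceptance.
near_universal_saving = 1 - annual_cost_eur["near_universal_acceptance"] / baseline

print(f"value of card ownership: {card_value} euros per year")
print(f"cash ban: +{ban_cost_increase} euros per year ({ban_cost_ratio:.0f}x baseline)")
print(f"near-universal acceptance saving: {near_universal_saving:.0%}")
```

Because the inputs are rounded, the derived percentages are approximate.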

Q: How does card acceptance affect cash management in the model and data? A: As ϕ falls, the precautionary motive for holding cash strengthens: the withdrawal trigger m̄ rises, average cash holdings increase, and withdrawals occur when the wallet is still substantially full. This prediction is qualitatively consistent with the empirical finding that in areas with lower card acceptance, individuals hold higher cash balances and withdraw at higher residual cash levels.

Q: What are the main limitations the authors acknowledge? A: Three caveats are identified. First, the model has no exogenous cash inflows (wage payments, gifts); incorporating Miller-Orr-style inflows could affect cash resilience estimates. Second, the card cost κ is fixed and independent of transaction size s; allowing κ(s) = κ₀ + κₛ·s would better capture reward-program economies relevant for the US. Third, merchant card acceptance is treated as exogenous; endogenizing it as a game between merchants would allow a joint welfare evaluation of acceptance decisions, payment choices, and cash management.

Unforced transactions: Transactions in which both cash and card payments are feasible — specifically, cash holdings exceed the purchase size, the merchant accepts cards, and the consumer holds a card. Isolating unforced transactions is necessary to identify behavioral determinants of payment choice, stripping out mechanical constraints imposed by cash insufficiency or merchant non-acceptance.
Precautionary cash buffer: A positive cash balance that the consumer preserves by withdrawing before cash runs out, that is, a strictly positive withdrawal trigger (m̄ > 0), to insure against purchases at merchants who do not accept cards. In the model, this buffer arises because card non-acceptance combined with insufficient cash results in a missed purchase at utility cost u; the precautionary motive is stronger when ϕ is lower.
Transaction-size threshold s(m): The purchase size above which a consumer with cash holdings m optimally pays by card (when cards are available and 0 < κ < b). Unlike the fixed threshold of Whitesell (1989), s(m) is increasing in m, generating a novel interaction between cash on hand and payment method choice that the ECB diary data confirms.
Cash burns policy: The policy of Alvarez and Lippi (2017) in which cards are used only when cash is fully exhausted (m = 0). The paper documents that 27–49% of unforced transactions are settled by card across survey waves, constituting a systematic violation of this rule that the model resolves by introducing transaction-size heterogeneity and a precautionary motive.
Imperfect card acceptance (ϕ): The exogenous fraction of merchants willing to accept card payments, set at 0.89 for 2023–24 in the calibration. Imperfect acceptance is the primary driver of the precautionary demand for cash; it also determines the frequency of missed purchases under a cashless policy and is the key parameter governing whether a cashless economy can emerge.
Annual transaction management cost (C): The total yearly household cost of operating within the payment system, defined as C = RM + bn + κ·(number of card purchases) + u·(number of missed purchases). Estimated at approximately 15 euros for the average euro area household in 2023–24, decomposed across opportunity costs of cash holdings, withdrawal costs, card usage costs, and missed-purchase disutility.
Ss withdrawal policy: The optimal cash replenishment rule characterized by a trigger level m̄ and a target level m*. The agent withdraws whenever cash falls to m̄, resetting balances to m*. A strictly positive trigger (m̄ > 0) reflects the precautionary motive: the agent refills before cash is exhausted in order to maintain insurance against card non-acceptance events.
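A minimal sketch of how a trigger/target rule of this kind operates is given below. It is not the paper's model: it assumes, for illustration only, that every purchase is at a merchant that does not accept cards, and the function name, purchase sizes, and parameter values are hypothetical.

```python
def simulate_ss(purchases, trigger, target, start_cash):
    """Apply a trigger/target (Ss) withdrawal rule to a list of purchases.

    Simplifying assumption (not from the paper): no merchant accepts cards,
    so a purchase larger than cash on hand is missed.
    """
    cash = start_cash
    withdrawals = 0
    missed = 0
    path = []
    for size in purchases:
        if size <= cash:
            cash -= size      # pay in cash
        else:
            missed += 1       # not enough cash: the purchase is missed
        if cash <= trigger:
            cash = target     # refill to the target level
            withdrawals += 1
        path.append(cash)
    return path, withdrawals, missed


purchases = [30, 30, 30, 30]  # hypothetical purchase sizes
with_buffer = simulate_ss(purchases, trigger=20, target=100, start_cash=100)
no_buffer = simulate_ss(purchases, trigger=0, target=100, start_cash=100)
print("trigger = 20:", with_buffer)
print("trigger = 0: ", no_buffer)
```

With a positive trigger the agent refills while the wallet still holds cash and misses no purchase in this sequence; with a zero trigger the last purchase is missed, which is the event the precautionary buffer insures against.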

Central bank reputation with noise

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

Research Question. How does noise in the mapping from central bank actions to realized inflation affect the existence and character of reputational equilibria in monetary policy? Specifically, can a central bank that faces uncertainty about whether it is perceived as “hawkish” or “dovish” sustain a pure strategy separating equilibrium, and how should each type behave as a function of its current reputation?

Model and Methodology. Amador and Phelan build on the monopolistic-competition, cash-in-advance framework of Chari, Christiano, and Eichenbaum (1998) and extend it to allow for (i) two central bank types — hawkish (type 1, high penalty γ₁ for inflationary actions) and dovish (type 2, lower penalty γ₂ < γ₁) — whose identity is private information; (ii) type switching governed by a Markov process, with probability δ that a hawkish bank is replaced by a dovish one and probability ε that a dovish bank is replaced by a hawkish one; and (iii) noise between the central bank’s chosen action μᵢ and realized money growth μₐ, which is drawn from a density f(μₐ|μᵢ) with full support. The equilibrium concept is pure symmetric Markov perfect equilibrium, in which all strategies are functions only of the public Bayesian posterior ρ that the current central bank is hawkish. The paper proceeds analytically to characterize no-pooling results and then computationally to demonstrate existence of separating equilibria.

Main Findings.

No pooling equilibria exist (analytical). Propositions 2 and 3 establish that no pure symmetric Markov equilibrium can have both types choosing the same positive action for any reputation ρ, as long as γ₁ ≠ γ₂ and Assumption 1 (pricing distortion sufficiently severe) holds. The intuition: if both types pool, realized inflation is uninformative, reputation does not change, and there are no dynamic incentives — but different static incentives (γ₁ ≠ γ₂) then imply different optimal actions, a contradiction.
Without sufficient noise, separating equilibria may also fail to exist. In the no-noise limit, Bayesian updating forces the dovish bank’s reputation to jump to its maximum after one period of mimicking the hawkish action, making mimicry highly attractive when the discount factor β is high or the type-switching probability ε is low. This makes the incentive-compatibility constraint for the dovish bank very difficult to satisfy, potentially precluding existence of a separating equilibrium.
With sufficient noise, pure strategy separating equilibria exist and have appealing properties (computational). The benchmark parameterization sets α = 1, σ = 5, β = 0.99, h(μ) = 0.5μ², ε = δ = 0.02, and the noise distribution such that the hawkish type’s unconstrained target would deliver mean inflation of 2% and the dovish type’s 3%. Under these parameters:
- In the full-information (known-type) world: price P = 1.313 for the hawkish type and P = 1.338 for the dovish type, with E[log(c) − αc] = −1.0297 and −1.0320 respectively, versus the efficient benchmark of −1.
- In the reputational equilibrium, both types choose lower inflationary actions than they would absent reputation considerations — because reputation is valuable (higher ρ lowers household prices and thus improves welfare for both types).
- Both types’ optimal actions are U-shaped in reputation ρ: they are most restrained — choosing the lowest inflationary actions — when ρ is middling (interior), because Bayesian updating is most sensitive (and thus the reputation cost of inflating is greatest) at interior beliefs, while it is difficult to move extreme beliefs.
- Average equilibrium inflation is 2.1%, which lies below the weighted average of unconstrained type targets (2.5% given equal switching probabilities), demonstrating that reputation concerns compress inflation outcomes.
Ergodic distribution of reputation remains interior. Starting from ρ = 0.5, expected reputation conditional on being hawkish stays below 0.63 and conditional on being dovish stays above 0.38, reflecting that noise and type switching prevent reputation from collapsing to its extremes.
Welfare implications. The hawkish type is made worse off by ongoing household uncertainty (relative to the reference game in which type is immediately revealed), while the dovish type is made better off. Households are better off under continuing uncertainty than under immediate revelation, unless reputation is near its maximum — because uncertainty suppresses inflationary temptations for both types.

Scope Conditions. Results apply within a monopolistic-competition, cash-in-advance economy with discrete time, infinite horizon, and Markov strategies. The no-pooling result requires Assumption 1 (the pricing distortion is sufficiently severe that the central bank has a positive incentive to inflate from μ = 0). The no-noise existence failure is an informal argument holding fixed discount and type-switching parameters. Computational results are specific to the benchmark parameterization but are verified to be robust to variation in β, σ, γ₁, γ₂, ε, and δ.

In depth

Q1. What is the fundamental time-inconsistency problem in the underlying Chari et al. (1998) economy, and how does the paper extend it?

A1: In the Chari et al. (1998) monopolistic-competition cash-in-advance economy, households exploit market power when setting prices, and the cash-in-advance constraint depresses consumption efficiency; this creates an ex-post temptation for the central bank to inflate and partially offset these distortions, even though in equilibrium such inflation is anticipated and only worsens inefficiencies. Equilibrium consumption equals (1/α) × ((σ−1)/σ) × (β/(1+μ)), compounding a monopoly distortion (σ−1)/σ < 1 and a cash-in-advance distortion β/(1+μ) < 1 below the efficient level 1/α. Amador and Phelan add household uncertainty about the central bank’s type — captured by the Bayesian posterior ρ that the bank is hawkish — allowing reputation to be endogenously determined and to feed back into equilibrium pricing.

Q2. Why does reputation matter only through differences in inflation costs γᵢ and not through differences in effective discount factors alone?

A2: Proposition 1 establishes that if γ₁ = γ₂ (equal inflation penalties), then even if the two types have different effective discount factors β₁ = β(1−δ) ≠ β₂ = β(1−ε), there exists a pooling Markov equilibrium in which both types choose the same action μ* and reputation plays no role. When both types have identical static incentives, they will always choose the same action given that reputation doesn’t affect payoffs in such an equilibrium. Hence the relevant dimension of heterogeneity for reputation to matter is the inflation cost parameter γᵢ, not patience.

Q3. What is the formal argument that no pooling equilibrium can exist when γ₁ ≠ γ₂?

A3: Propositions 2 and 3 provide the formal argument. If both types pool at any reputation ρ with a common positive action μ, Bayesian updating implies that ρ⁺ is independent of the money growth realization μₐ. The first-order condition for type i then reduces to the static condition (∂E[log(c) − αc|μ]/∂μ) = γᵢh′(μ), which cannot hold simultaneously for types 1 and 2 since γ₁ ≠ γ₂ and h′(μ) > 0 for μ > 0. This logic rules out pooling at the stationary reputation ρ* = ε/(δ+ε) in Proposition 2 and at any reputation where μ > 0 in Proposition 3.

Q4. Why does noise facilitate the existence of separating equilibria?

A4: Without noise, if types separate, observing the hawkish action reveals the bank is hawkish with certainty, pushing reputation to its maximum (1−δ) in a single period. This makes mimicry extremely attractive for the dovish type when its effective discount factor β₂ = β(1−ε) is large, that is, when β is high or ε is small: the incentive compatibility condition requires that the dovish type’s static gain from choosing its own action exceed the value gain from jumping to the best possible reputation, which is a very stringent requirement. With noise, mimicry generates only a probabilistic shift in beliefs rather than a discrete jump to the extreme, so the dovish type must maintain the hawkish action repeatedly to achieve a reputational gain, which makes mimicry costly enough that the incentive compatibility condition can be satisfied.

Q5. What is the “reference game” and what analytical purpose does it serve?

A5: The reference game is a variant in which the central bank’s type is fixed and is revealed to households immediately after they set prices at date t = 0. From t = 1 onward, the game reduces to the full-information, single-type game of Section 4. This allows the authors to isolate the “direct” effect of reputation — the fact that expected type affects equilibrium prices today — from the “indirect” or strategic effect of the central bank actively managing its reputation. In the numerical example, the reference-game prices form the upper dashed line in Figure 1, while the actual game’s prices form the lower solid line, with the gap between them attributable to the central bank’s incentive to restrain inflation in order to protect reputation.

Q6. What are the equilibrium price and welfare levels in the benchmark numerical example, and how do they compare to efficient and full-information benchmarks?

A6: The efficient benchmark delivers log(c) − αc = −1 with consumption c* = 1/α = 1. Under full information with only the hawkish type present, P = 1.313 and E[log(c) − αc] = −1.0297; under only the dovish type, P = 1.338 and E[log(c) − αc] = −1.0320. In the reputational equilibrium, prices lie below the full-information mixed benchmark for any given ρ (the solid line in Figure 1 lies below the dashed reference-game line), reflecting that the central banks’ desire to maintain reputation leads both types to restrain inflation beyond what the direct price effect alone would induce.
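The full-information welfare figures can be approximated directly from the consumption formula in A1, c = (1/α) × ((σ−1)/σ) × (β/(1+μ)). The sketch below assumes that money growth equals each type's mean target exactly (μ = 0.02 or 0.03) and therefore ignores the noise over which the paper's reported values take an expectation; the function names are illustrative.

```python
import math


def consumption(mu, alpha=1.0, sigma=5.0, beta=0.99):
    """Equilibrium consumption from A1, evaluated at a fixed money growth rate mu."""
    return (1.0 / alpha) * ((sigma - 1.0) / sigma) * (beta / (1.0 + mu))


def welfare(c, alpha=1.0):
    """Period payoff log(c) - alpha * c."""
    return math.log(c) - alpha * c


efficient = welfare(1.0 / 1.0)           # c* = 1/alpha with alpha = 1
hawkish = welfare(consumption(mu=0.02))  # assumes mu fixed at the 2% target
dovish = welfare(consumption(mu=0.03))   # assumes mu fixed at the 3% target

print(f"efficient: {efficient:.4f}")
print(f"hawkish:   {hawkish:.4f}  (reported with noise: -1.0297)")
print(f"dovish:    {dovish:.4f}  (reported with noise: -1.0320)")
```

Under these assumptions the noise-free values land within about 0.001 of the reported expectations, and the ordering efficient > hawkish > dovish is preserved.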

Q7. How does the U-shape of optimal central bank actions in reputation arise, and what does it imply for policy?

A7: The U-shape arises because Bayesian updating is most powerful at interior beliefs: for extreme reputations (near ε or 1−δ), any given realization of money growth moves the posterior relatively little, so the reputational cost of inflating is small. For interior (middling) reputations, the same action shifts the posterior substantially, making reputation more sensitive to inflation choices and thus increasing the marginal cost of inflating. Both types therefore choose their minimum inflationary actions at middling reputations. The policy implication is that a hawkish central bank with a very low reputation (following a run of high realized inflation outcomes) should not dramatically tighten, because further contraction does relatively little for its reputation until nature delivers enough favorable realizations to move it to a more interior range.
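The claim that updating is strongest at interior beliefs can be illustrated with a small Bayes-rule sketch. Everything specific here is an assumption made for illustration: the noise density is taken to be Gaussian with a hypothetical standard deviation of 0.01, the two types' actions are held fixed at 2% and 3% rather than solved for, and the update applies Bayes' rule first and type switching second.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def next_reputation(rho, mu_realized, mu_hawk, mu_dove, sd, delta, eps):
    """Bayes update on realized money growth, then apply type switching."""
    f_hawk = normal_pdf(mu_realized, mu_hawk, sd)
    f_dove = normal_pdf(mu_realized, mu_dove, sd)
    posterior = rho * f_hawk / (rho * f_hawk + (1.0 - rho) * f_dove)
    return posterior * (1.0 - delta) + (1.0 - posterior) * eps


def reputation_spread(rho, mu_hawk=0.02, mu_dove=0.03, sd=0.01, delta=0.02, eps=0.02):
    """Gap in next-period reputation between a low and a high realization."""
    after_low = next_reputation(rho, mu_hawk, mu_hawk, mu_dove, sd, delta, eps)
    after_high = next_reputation(rho, mu_dove, mu_hawk, mu_dove, sd, delta, eps)
    return after_low - after_high


for rho in (0.05, 0.5, 0.95):
    print(f"rho = {rho:.2f}: spread in next-period reputation = {reputation_spread(rho):.3f}")
```

In this sketch the same pair of realizations moves reputation several times more at ρ = 0.5 than at ρ = 0.05 or ρ = 0.95, which is the mechanism behind the greater marginal reputational cost of inflating at middling reputations.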

Q8. What happens to the ergodic distribution of reputation and inflation, and what does this imply about the persistence of reputational dynamics?

A8: Starting from ρ = 0.5, expected reputation remains in the interior: above 0.38 for the dovish type and below 0.63 for the hawkish type. The ergodic distribution of ρ (Figure 5) concentrates at interior values rather than the poles, showing that noise and type switching prevent reputation from stabilizing at extremes. The ergodic inflation distribution (Figure 6) has an average of 2.1%, compared to 2% under an all-hawkish world and 3% under an all-dovish world. Because ε = δ (types are equally likely in the long run), the unconstrained-type-weighted average would be 2.5%, so reputational incentives reduce equilibrium average inflation by approximately 0.4 percentage points.

Q9. Who gains and who loses from ongoing type uncertainty relative to immediate revelation?

A9: The hawkish type’s value function (Figure 3a) lies below the reference-game dashed line for intermediate reputations, indicating that the hawkish type is made worse off by uncertainty — it must bear the cost of restraining inflation beyond what is statically optimal in order to signal its type, but the households partially “blame” it for high realized inflation regardless. The dovish type (Figure 3b) is made better off under continuing uncertainty because its reputation benefits from households’ inability to perfectly distinguish types. Households (Figure 3c) are better off under uncertainty unless reputation is very high, because uncertainty suppresses inflation temptations for both types and keeps prices lower.

Q10. What happens to equilibrium behavior under robustness checks on key parameters?

A10: When the discount factor β or the elasticity of substitution σ decreases, both types inflate more and prices rise. When the hawkish type’s penalty γ₁ decreases (becomes less hawkish), both types inflate more and prices rise. When the dovish type’s penalty γ₂ decreases (becomes more dovish), the dovish type inflates more and, somewhat counterintuitively, the hawkish type inflates less, leaving prices only slightly higher. When switching probabilities ε or δ increase, prices rise and both types inflate more, analogously to a decrease in β. Across all robustness exercises, the dovish type never inflates less than the hawkish type, consistent with Proposition 1’s implication that the inflation-cost difference γ₁ − γ₂ is the fundamental driver of separation.

Key Concepts

Hawkish type (type 1): A central bank that receives a relatively large negative payoff γ₁h(μᵢ) for taking inflationary actions, where γ₁ > γ₂. In the paper’s own sense, this type is not behavioral — it optimizes fully and can choose any action — but has a strong intrinsic cost to inflation, making it prefer lower money growth rates ceteris paribus.

Dovish type (type 2): A central bank with a lower penalty parameter γ₂ < γ₁ for inflationary actions. Like the hawkish type, it is fully strategic and optimizing, differing only in the magnitude of its intrinsic inflation cost.

Reputation (ρ): The Bayesian posterior probability that households assign to the current central bank being the hawkish type. It is the single payoff-relevant state variable in the Markov equilibrium, evolving through Bayes’ rule applied to realized money growth and type-switching probabilities.

Pure symmetric Markov perfect equilibrium: An equilibrium in which all households set the same price and consume the same amount (symmetry), and all strategies — prices P(ρ), central bank actions μ₁(ρ) and μ₂(ρ), and household consumption c(μₐ, ρ) — depend on history only through the current reputation ρ (Markov). The paper focuses exclusively on pure (non-mixed) strategy equilibria.

Pooling equilibrium: An equilibrium in which both types choose the same action μ₁(ρ) = μ₂(ρ) at some reputation ρ. The paper proves analytically that no pooling equilibrium can exist when γ₁ ≠ γ₂ and the pricing distortion is sufficiently severe (Assumption 1).

Separating equilibrium: An equilibrium in which μ₁(ρ) ≠ μ₂(ρ) for all ρ, so that realized money growth outcomes are informative about type and reputation evolves non-trivially. The paper argues that sufficient noise is necessary for such equilibria to exist.

Effective discount factor (βᵢ): The discount factor net of type-switching: β₁ = β(1−δ) for the hawkish type (which survives as hawkish with probability 1−δ) and β₂ = β(1−ε) for the dovish type. Central banks care only about payoffs while they are active, so effective discounting captures both time preference and expected tenure.

Noise (disconnection between actions and outcomes): The stochastic wedge between the central bank’s chosen action μᵢ and realized money growth μₐ, governed by a density f(μₐ|μᵢ) with full support. In the paper’s framework, noise is not merely a nuisance but a structural feature that makes reputational equilibria possible by preventing single-period complete revelation of type.
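As a worked example of the definitions above, the snippet below evaluates the effective discount factors, the stationary reputation ρ* = ε/(δ+ε), and the reputation bounds at the benchmark values β = 0.99 and ε = δ = 0.02. Variable names are illustrative.

```python
beta, delta, eps = 0.99, 0.02, 0.02  # benchmark values from the overview

beta_hawk = beta * (1 - delta)   # effective discount factor of the hawkish type
beta_dove = beta * (1 - eps)     # effective discount factor of the dovish type
rho_star = eps / (delta + eps)   # stationary probability that the bank is hawkish
rho_min, rho_max = eps, 1 - delta  # reputation after a fully revealing signal

print(f"beta_1 = {beta_hawk:.4f}, beta_2 = {beta_dove:.4f}")
print(f"rho* = {rho_star:.2f}, reputation range = [{rho_min:.2f}, {rho_max:.2f}]")
```

With equal switching probabilities the two effective discount factors coincide and the stationary reputation is one half, so in the benchmark the types differ only through γ₁ and γ₂.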

Changing Opportunity: Sociological Mechanisms Underlying Growing Class Gaps

Mon, 01 Jan 0001 00:00:00 +0000

This paper documents sharply divergent trends in intergenerational economic mobility by race and class in the United States across the 1978 to 1992 birth cohorts, and investigates the causal mechanisms driving those changes. The core empirical facts are two: between the 1978 and 1992 birth cohorts, the earnings gap between white children from high-income versus low-income families grew by approximately 28–30% (the “white class gap”), while the earnings gap between white and Black children from low-income families shrank by approximately 27–30% (the “white-Black race gap”). These twin trends (growing class gaps and shrinking race gaps) appear consistently across earnings, employment rates, educational attainment, SAT/ACT scores, incarceration, marriage, and mortality, and they hold in nearly every region of the country.

The data are drawn from de-identified federal income tax returns linked to decennial census records and the Numident database, covering 57 million children born between 1978 and 1992, with information on parental and child incomes, employment, marital status, mortality, and residential location, supplemented by ACS educational attainment and linked SAT/ACT records covering 24.8 million students. Children’s outcomes are measured primarily as household income percentile ranks at age 27.

In dollar terms, the white class gap (mean income difference between children raised at the 25th vs. 75th parental income percentile) grew from $17,720 to $20,950 in real 2023 dollars, while the white-Black race gap for low-income families fell from $20,810 to $14,910. The intergenerational rank-rank slope for white children increased from 0.23 to 0.29. The racial gap in intergenerational persistence of poverty — the probability of a child born to the bottom income quintile remaining there — shrank from 14.7 percentage points to 4.1 percentage points (a 72% reduction), driven roughly equally by improvement in Black children’s chances of escaping poverty and deterioration in low-income white children’s chances. The white class gap in early-adulthood mortality more than doubled, while the white-Black race gap in mortality fell by 77%.

The paper systematically rules out alternative explanations based on observable family characteristics and on neighborhood-level common shocks. Observable family characteristics (parental education, wealth, occupation, and marital status) explain only 7% of the growing white class gap and none of the shrinking white-Black race gap. Neighborhood-level common shocks, tested by including childhood county or Census tract-by-cohort fixed effects, similarly explain only 7% of the class gap and none of the race gap. The divergent trends persist even among children raised in the same Census tract, pointing to forces that operate differentially across race and class groups within the same neighborhood.

The paper’s central finding is that changes in children’s outcomes across cohorts are strongly and positively correlated (r = 0.91 across subgroups) with changes in parental employment rates within the child’s social community, defined as families sharing the same race, class, and childhood county. Low-income white communities experienced sharp relative declines in parental employment rates; low-income Black communities experienced relative improvements. These community-level parental employment changes account for nearly all of the divergent trends.

To establish causation, the paper exploits variation in the age at which children move to counties with changing parental employment rates. Children who moved at younger ages (before age 8) to counties where parental employment was increasing experienced larger improvements in earnings than those who moved at older ages (after age 13), consistent with a causal exposure effect with greater impact for longer durations of exposure. Sibling comparisons — comparing outcomes of younger versus older siblings who moved together — confirm that the age gradient reflects causal exposure rather than family-level selection.

The social interaction mechanism is supported by two sources of variation: children’s outcomes are more strongly related to parental employment rates of their own birth cohort than adjacent cohorts (cohort specificity unlikely to be explained by resources), and outcomes are primarily driven by the employment rates of same-race, same-class community members, with cross-racial influence appearing only in counties where cross-racial interaction is greater (counties with small Black population shares or higher interracial marriage rates). The unified explanation the paper proposes is that children’s outcomes mimic those of the adults in their social communities, following Borjas (1992).

Q: What are the precise magnitudes of the growing white class gap and shrinking white-Black race gap in income percentile ranks? A: The white class gap — the difference in mean household income ranks between white children raised at the 25th versus 75th parental income percentiles — increased from 11.1 to 14.1 percentile ranks between the 1978 and 1992 birth cohorts, a 28% increase. The white-Black race gap for children from low-income families fell from 14.9 to 10.9 percentile ranks, a 27% decrease. The intergenerational rank-rank slope for white children increased from 0.23 to 0.29 (a 28% rise in persistence).
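The rank-rank slope and the class gap are closely linked: if the relationship between child and parent income ranks were exactly linear, the gap between children raised at the 75th and 25th parental percentiles would equal 50 times the slope. The sketch below checks how close that approximation comes to the reported gaps; linearity is an assumption made here for illustration, not a claim from the paper.

```python
def implied_class_gap(slope, low_pct=25, high_pct=75):
    """Gap in mean child rank implied by a linear rank-rank relationship."""
    return slope * (high_pct - low_pct)


# (birth cohort, reported rank-rank slope, reported class gap in percentile ranks)
reported = [(1978, 0.23, 11.1), (1992, 0.29, 14.1)]

for cohort, slope, gap in reported:
    implied = implied_class_gap(slope)
    print(f"{cohort}: implied gap {implied:.1f} vs reported {gap:.1f}")
```

The linear approximation overstates each reported gap by a few tenths of a percentile rank, which may reflect rounding of the slope, curvature in the rank-rank relationship, or both.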

Q: How did the trends in poverty persistence versus upward mobility differ? A: The convergence in white-Black outcomes was driven almost entirely by changes in poverty persistence rather than upward mobility. The racial gap in the probability of remaining in the bottom income quintile shrank from 14.7 percentage points to 4.1 percentage points (a 72% reduction), with roughly half from Black children being less likely to remain at the bottom and half from white children being more likely to remain. By contrast, the white-Black gap in the probability of rising from the bottom quintile to the top quintile fell by only 1.9 percentage points (17%).

Q: How widespread geographically were the divergent trends? A: Outcomes declined for low-income white families in nearly every county, but the largest declines occurred in historically high-mobility areas such as the Great Plains and the coasts. For low-income Black families, outcomes improved in most areas, with the largest gains in historically low-mobility regions including the Southeast and the industrial Midwest. The correlation between county-level changes for low-income white and low-income Black children is 0.58, meaning that the areas where Black families improved most tended to be areas where white families declined least, not most.

Q: Do the trends persist when using non-rank, inflation-adjusted dollar outcomes? A: Yes. The white class gap in mean household income grew from $17,720 to $20,950 in real 2023 dollars, and the white-Black race gap for low-income families narrowed from $20,810 to $14,910. The paper also reports similar patterns for individual earnings (as opposed to household income), ruling out changes in household composition as a driver.

Q: What do the pre-labor-market outcomes show? A: The divergent trends emerge before children enter the labor market. The white class gap in educational attainment grew by 20%, driven by growing gaps in four-year college completion. The white-Black race gap in educational attainment disappeared by the 1992 cohort, driven by narrowing gaps in high school graduation. The white class gap in the share of students taking the SAT/ACT increased by 12.1 percentage points between the 1980 and 1991 birth cohorts, while the white-Black race gap in SAT/ACT-taking decreased by 20.3 percentage points. The white class gap in mean SAT/ACT scores grew by 62% between the 1980 and 1997 birth cohorts among test-takers.

Q: How large is the mortality dimension of these trends? A: The white class gap in early-adulthood mortality (ages 24–27) more than doubled between the 1978 and 1992 birth cohorts, while the white-Black race gap in early-adulthood mortality decreased by 77%. These non-monetary outcomes are invariant to inflation and income measurement choices, confirming the robustness of the broader trends.

Q: How much do family-level characteristics explain? A: Controlling jointly for parental education, wealth, occupation, and marital status reduces the estimated growth in the white class gap by only 7% (from 3.37 to 3.13 percentile ranks). The same controls do not explain the shrinking white-Black race gap — the estimated reduction in the race gap actually becomes slightly larger (4.56 rather than 4.16 percentiles) after controlling for family characteristics, indicating that observable family factors work against the observed convergence.
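The "share explained" figures follow from comparing the estimated change in a gap with and without controls. A minimal sketch using the numbers quoted above (the function name is illustrative):

```python
def share_explained(raw_change, adjusted_change):
    """Fraction of the raw change in a gap that disappears once controls are added."""
    return (raw_change - adjusted_change) / raw_change


# Growth in the white class gap (percentile ranks), without and with family controls.
class_gap_share = share_explained(3.37, 3.13)

# Reduction in the white-Black race gap (percentile ranks), without and with controls.
race_gap_share = share_explained(4.16, 4.56)

print(f"class gap growth explained by family controls: {class_gap_share:.0%}")
print(f"race gap reduction explained by family controls: {race_gap_share:.0%}")
```

A negative share for the race gap means the controls enlarge the estimated convergence rather than accounting for any of it.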

Q: How much do neighborhood-level common shocks explain? A: Including childhood county fixed effects interacted with birth cohort explains only 7% of the growing white class gap and none of the shrinking white-Black race gap. Including Census tract fixed effects yields essentially identical results. The divergent trends persist among children growing up in the same Census tract, ruling out explanations based on differential exposure to neighborhood-level economic shocks.

Q: What is the community-level parental employment correlation, and what does it explain? A: Changes in children’s earnings, SAT/ACT scores, and educational attainment across cohorts are strongly positively correlated with changes in parental employment rates within the child’s community (same race, same class, same county), controlling for the employment status of the child’s own parents. The correlation between changes in children’s outcomes and changes in community parental employment rates across all race and class subgroups is 0.91. This single community-level factor — as proxied by parental employment rates — accounts for nearly all of the divergent trends by race and class.

Q: What is the quasi-experimental design for estimating causal effects, and what does it assume? A: The paper compares outcomes of children who moved to counties with increasing parental employment rates at younger versus older ages, across earlier versus later birth cohorts. The identification assumption is “constant selection by age”: any selection of families into moving to a given county in years when parental employment is higher may differ across cohorts, but those selection differences must not themselves vary systematically with the age at which children move. The paper treats this assumption as standard in the neighborhood effects literature.

Q: What do the causal exposure results show? A: Children who moved before age 8 to communities where parental employment was increasing show systematically higher earnings in later birth cohorts, while children who made the same move after age 13 show little difference in earnings across cohorts. This pattern — larger effects at younger ages — is consistent with a causal exposure effect of growing up in an improving community, with effects proportional to the duration of exposure.

Q: How do sibling comparisons validate the identification assumption? A: When siblings move together to a community with increasing parental employment rates, the younger sibling — who receives more years of exposure to the higher-employment environment — earns significantly more than the older sibling. The earnings difference is proportional to the age gap between siblings. This rules out explanations based on fixed unobserved family characteristics and supports the constant-selection-by-age assumption.

Q: What evidence distinguishes social interaction mechanisms from economic resource mechanisms? A: Two sources of variation are used. First, children’s outcomes are much more strongly related to the parental employment rates of peers in their own birth cohort than peers in adjacent cohorts — a cohort-specificity that is implausible for economic resource channels (school budgets, local tax bases) which would not vary sharply across adjacent cohorts. Second, outcomes of low-income white children are driven primarily by the employment rates of low-income white parents, not by low-income Black or high-income white parents’ employment rates, and vice versa for low-income Black children — consistent with interaction patterns being stratified by race and class.

Q: What role does cross-racial interaction play? A: In counties where Black children constitute a small share of the population (making cross-racial interaction more likely), Black children’s outcomes are also related to low-income white parental employment rates. Similarly, in counties with higher interracial marriage rates (a proxy for cross-racial interaction), Black children’s outcomes are related to white parental employment rates even after controlling for racial composition. This cross-sectional variation supports the interpretation that the influence channel is social interaction rather than parallel economic shocks.

Q: How do the findings for Hispanic, Asian, and AIAN children compare? A: Changes in economic mobility for Hispanic, Asian, and AIAN children between 1978 and 1992 birth cohorts were much more modest than for white and Black children. For children from low-income families, mean household income ranks were essentially unchanged for Asian children and rose by only about 0.5 percentiles for Hispanic and AIAN children. However, the same community-level parental employment rate mechanism explains the (smaller) changes for these groups as well; the correlation between changes in children’s outcomes and changes in community parental employment rates is 0.91 across all subgroups.

Q: What is the paper’s unified theoretical account of all the divergent trends? A: The paper concludes that a parsimonious theory — that children’s outcomes mimic those of the parents in their social communities, following Borjas (1992) — explains the divergent trends by race and class. Because social interaction is stratified by race and class even within neighborhoods, changes in parental outcomes in the parent generation propagate differentially to white versus Black and high-income versus low-income children, producing growing class gaps and shrinking race gaps through the same underlying mechanism.

Q: What does the paper imply about the malleability of economic mobility disparities? A: Because the causal exposure effects of community environments on children’s outcomes can be detected within a 14-year span (1978 to 1992 birth cohorts), the paper implies that differences in economic mobility by race and class may be malleable in policy-relevant timeframes. This is despite the fact that long-standing disparities partly trace back to historical factors such as slavery, Jim Crow laws, redlining, and the Great Migration.

White class gap: The difference in mean household income ranks in adulthood for white children born to families at the 25th versus 75th percentiles of the national parental income distribution; increased from 11.1 to 14.1 percentile ranks (28%) between the 1978 and 1992 birth cohorts.

White-Black race gap: The difference in mean household income ranks in adulthood for white versus Black children born to families at the 25th percentile of the national parental income distribution; decreased from 14.9 to 10.9 percentile ranks (27%) between the 1978 and 1992 birth cohorts.

Social community: In this paper’s usage, other families who share the same race, class category, and childhood county as a given child; the unit within which community-level parental employment rates are measured and found to be predictive of children’s outcomes.

Causal exposure effect: The effect on a child’s adult outcomes of an additional year spent growing up in a community with higher parental employment rates, estimated quasi-experimentally by comparing children who moved to counties with changing parental employment rates at younger versus older ages; larger effects at younger ages imply a causal, duration-sensitive exposure channel.

Constant selection by age: The identification assumption underlying the quasi-experimental design; requires that any systematic differences in the types of families who move to a county when parental employment is high versus low do not themselves vary with the age at which children move to that county.

Intergenerational rank-rank slope: The OLS slope coefficient from regressing child income percentile rank on parental income percentile rank; for white children, increased from 0.23 in the 1978 birth cohort to 0.29 in the 1992 birth cohort, indicating greater persistence of economic status.
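As a minimal sketch of how a rank-rank slope is computed, the example below converts incomes to percentile ranks and fits an OLS slope by hand. The five income pairs are hypothetical, and ranking within this tiny sample is an assumption made for illustration (the paper ranks against national distributions).

```python
def percentile_ranks(values):
    """Map raw values to percentile ranks in (0, 100]; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = 100.0 * position / len(values)
    return ranks


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


# Hypothetical parent and child incomes (same units, one pair per family).
parent_income = [20, 35, 50, 80, 120]
child_income = [30, 25, 60, 70, 110]

rank_rank_slope = ols_slope(percentile_ranks(parent_income),
                            percentile_ranks(child_income))
print(round(rank_rank_slope, 3))  # 0.9
```

A slope closer to 1 means child ranks track parent ranks more tightly, which is the sense in which a rise from 0.23 to 0.29 indicates greater persistence.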

Cohort-specificity of community effects: The empirical pattern that children’s outcomes are more strongly related to the parental employment rates of peers in their own birth cohort than those of adjacent cohorts, used in the paper as evidence favoring social interaction over economic resource channels as the mediating mechanism.

Civil War–Induced Displacement and Human Capital

This paper examines the impact of conflict-driven forced displacement on human capital accumulation using the Mozambican civil war (1977–1992) as the empirical setting. During this war, over four million civilians — roughly a third of the population — fled to rural areas, cities, neighboring countries, or UN-managed refugee camps. The study advances on prior work in three dimensions: it uses the full post-war population census (12 million individuals) rather than a small survey; it studies multiple displacement trajectories in a single framework; and it separately identifies place-based exposure effects from a general uprootedness effect.

The primary data source is the 1997 Mozambican census, which records each individual’s place of birth, residence in 1992 (the war’s end), and residence in 1997. Key outcomes are educational attainment and sectoral employment (agricultural versus services). The authors supplement the census with digitized colonial road and school maps, georeferenced conflict events, and landmine contamination data.

The main identification strategy compares approximately 135,000 siblings (from 45,000 families) separated during the war, using the sibling who stayed behind as a within-family counterfactual. This design controls for household-level characteristics including religious and ethnic background, aspirations, and exposure to violence.

The key findings are as follows. First, rural-born IDPs displaced to cities have a 7.3 percentage point higher likelihood of attending primary school and 0.53 more years of schooling compared to their siblings who stayed behind. The attendance gain is roughly one-third of the non-displaced sibling mean of approximately 20%. Rural-born IDPs displaced to other rural areas also show gains, with a 3 percentage point higher likelihood of attending school and 0.24 additional years, supporting the uprootedness hypothesis even for displacements that did not reach urban centers. Urban-born IDPs forcibly relocated to the countryside, primarily through FRELIMO’s villagization scheme, experienced 9 percentage point lower primary school attendance and approximately 0.5 fewer years of schooling relative to siblings who remained in cities.

External displacement (to camps and settlements in neighboring countries, including Malawi and Zimbabwe) generated no significant schooling gains relative to staying siblings, despite UN-built schools in camps, likely because scarce employment opportunities reduced perceived returns to education.

Second, the paper jointly estimates place-based and uprootedness effects in a single within-family framework. Place effects are statistically significant: displacement to a district one standard deviation more developed than one’s birthplace raises schooling likelihood by approximately 3 percentage points (OLS) to 5 percentage points (2SLS reduced form). Crucially, a residual uprootedness effect of approximately 2–4 percentage points persists even after controlling fully for destination-origin differences in development and conflict intensity. This uprootedness effect is quantitatively comparable to being displaced to a district one standard deviation more developed than one’s birthplace.

Third, a primary survey of 208 Nampula residents conducted in early 2020 — three decades after the war — confirms lasting educational gains. IDPs displaced to Nampula have a 10 percentage point higher likelihood of completing primary school relative to their siblings who stayed in the countryside, and their educational attainment converged to levels of urban-born, never-displaced residents despite large urban-rural education gaps. However, IDPs report significantly lower social capital, civic participation, and community trust than urban-born respondents, and score significantly worse on mental health indicators, including depression, loneliness, and pessimism. These psychosocial costs persist three decades after the war’s end.

The findings apply to a low-income, post-colonial African setting characterized by widespread illiteracy (over 60%) and subsistence agriculture (over 85% of employment) at the war’s close. The results are robust to alternative age restrictions, extended family comparisons, dropping the oldest sibling, same-sex sibling pairs, and narrowing the age gap between sibling pairs to as few as two years.

Q: What is the core identification strategy and why is it preferred over cross-sectional estimates? A: The authors compare siblings within the same household who experienced different displacement trajectories during the war. Because siblings share household-level characteristics — parental preferences for education, ethnic and religious background, wealth, and local conflict exposure — the within-family design controls for confounders that would bias cross-sectional estimates. The within-family estimates are systematically smaller than cross-sectional ones (e.g., 7.3 pps vs. 24–30 pps for rural-to-urban displacement in primary school attendance), confirming that sorting was present even in the unpredictable civil war setting.
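The contrast between within-family and cross-sectional estimates can be sketched with a toy family fixed-effects estimator. Everything below is hypothetical: the three families, the schooling values, and the `within_family_estimate` helper are illustrative assumptions, not the authors' code or data. Demeaning within each family removes anything siblings share, so only differences between separated siblings identify the effect.

```python
from collections import defaultdict


def within_family_estimate(rows):
    """Family fixed-effects slope of outcome on displacement.

    rows: iterable of (family_id, displaced, years_of_schooling).
    Families whose members share the same displacement status contribute nothing.
    """
    by_family = defaultdict(list)
    for family_id, displaced, outcome in rows:
        by_family[family_id].append((displaced, outcome))
    numerator = denominator = 0.0
    for members in by_family.values():
        d_bar = sum(d for d, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        for d, y in members:
            numerator += (d - d_bar) * (y - y_bar)
            denominator += (d - d_bar) ** 2
    return numerator / denominator


# Hypothetical data: family B has a higher schooling level for reasons unrelated
# to displacement; family C has no displaced member.
rows = [
    ("A", 0, 1.0), ("A", 1, 1.5),
    ("B", 0, 3.0), ("B", 1, 3.5),
    ("C", 0, 0.0), ("C", 0, 0.2),
]

within = within_family_estimate(rows)

# Naive cross-sectional contrast: displaced mean minus non-displaced mean.
displaced = [y for _, d, y in rows if d == 1]
stayers = [y for _, d, y in rows if d == 0]
cross_section = sum(displaced) / len(displaced) - sum(stayers) / len(stayers)

print(within, round(cross_section, 2))
```

In this constructed example the cross-sectional contrast exceeds the within-family estimate because displaced children happen to come from higher-schooling families, which is the kind of sorting the sibling design is meant to remove.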

Q: What do the results show for rural-born IDPs displaced to urban centers? A: Within the sibling-pair framework, rural-born IDPs displaced to cities and towns have a 7.3 percentage point higher likelihood of attending primary school and 0.53 more years of schooling compared to their siblings who stayed in rural birthplaces, against a non-displaced sibling mean of approximately 20% primary school access and one year of formal schooling. These IDPs also show a 4 percentage point higher likelihood of non-agricultural employment five years after the war’s end.

Q: What do the results show for rural-born IDPs displaced to other rural areas? A: Even displacement to a different rural district — not a city — generates modest but statistically significant gains: a 3 percentage point higher likelihood of attending school and 0.24 additional years of schooling relative to siblings staying in their birthplace rural district. The authors interpret this as evidence for the uprootedness hypothesis, since rural Mozambique at the time was among the most impoverished and insecure environments in the world, meaning destination quality alone cannot explain the gain.

Q: What do the results show for externally displaced refugees? A: Refugees displaced to camps and settlements in Malawi, Zimbabwe, Tanzania, Zambia, and Swaziland show schooling levels statistically similar to their siblings who remained in their rural birthplaces, despite UN-built primary schools in camps. The authors attribute the absence of gains to low perceived returns to education stemming from scarce employment opportunities at displacement destinations. Externally displaced individuals do show a 5 percentage point lower likelihood of agricultural employment relative to staying siblings.

Q: What are the consequences of urban-to-rural forced displacement? A: Urban-born individuals forcibly relocated to the countryside — primarily through FRELIMO’s villagization and food production programs — have approximately 9 percentage point lower likelihood of attending primary school and 0.5 fewer years of schooling compared to siblings who remained in urban areas. These results indicate that FRELIMO’s coercive relocation policies imposed material human capital costs on the displaced.

Q: How are place-based and uprootedness effects separated empirically? A: The authors construct principal component indices for destination-origin differences in regional development (aggregating population density, Portuguese-speaking share, offspring mortality, road density, colonial market density, and school density) and conflict intensity (conflict events per capita and landmine contamination per capita). They then include these continuous exposure measures alongside a binary displacement indicator in within-family regressions. The coefficient on the binary displacement indicator — conditional on destination-origin development and conflict differences — isolates the uprootedness effect for individuals displaced to districts with identical characteristics to their birthplace.
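A minimal sketch of this decomposition, under strong simplifying assumptions: the development index is taken as given (the principal component step is skipped), conflict differences are omitted, and the noise-free data are generated with made-up coefficients `UPROOT` and `PLACE`. The helper names are hypothetical. After demeaning within families, a two-regressor OLS separates the coefficient on the displacement dummy from the coefficient on the destination-origin development gap.

```python
from collections import defaultdict

UPROOT, PLACE = 0.02, 0.03  # made-up "true" effects used to build the toy data


def demean_within(rows):
    """rows: (family, displaced, dev_gap, outcome) -> family-demeaned (d, g, y)."""
    groups = defaultdict(list)
    for family, d, g, y in rows:
        groups[family].append((d, g, y))
    demeaned = []
    for members in groups.values():
        means = [sum(m[k] for m in members) / len(members) for k in range(3)]
        demeaned.extend(tuple(m[k] - means[k] for k in range(3)) for m in members)
    return demeaned


def two_regressor_ols(data):
    """Solve the 2x2 normal equations for y = b_d * d + b_g * g (data already demeaned)."""
    s_dd = sum(d * d for d, g, y in data)
    s_gg = sum(g * g for d, g, y in data)
    s_dg = sum(d * g for d, g, y in data)
    s_dy = sum(d * y for d, g, y in data)
    s_gy = sum(g * y for d, g, y in data)
    det = s_dd * s_gg - s_dg ** 2
    return (s_dy * s_gg - s_gy * s_dg) / det, (s_gy * s_dd - s_dy * s_dg) / det


# One stayer and one displaced sibling per family; the gap is zero for stayers.
rows = []
for family, (family_effect, gap) in enumerate([(0.1, -1.0), (0.3, 0.0), (0.2, 1.0), (0.5, 2.0)]):
    rows.append((family, 0, 0.0, family_effect))
    rows.append((family, 1, gap, family_effect + UPROOT + PLACE * gap))

uprootedness, place_effect = two_regressor_ols(demean_within(rows))
print(round(uprootedness, 4), round(place_effect, 4))
```

The dummy's coefficient is the outcome difference for a sibling displaced to a district with a zero development gap, which matches the interpretation given in the answer.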

Q: What are the magnitudes of the place-based and uprootedness effects? A: Under OLS, displacement to a district one standard deviation more developed than one’s birthplace raises schooling likelihood by approximately 3 percentage points. The residual uprootedness effect (displacement per se, controlling for destination quality) raises schooling likelihood by approximately 2 percentage points. Under 2SLS (instrumenting destination-origin development differences with the development of districts within 100 km of birthplace), the place-based effect rises to approximately 5 percentage points in the reduced form, and the uprootedness effect remains significant at approximately 4 percentage points. The two effects are thus of comparable magnitude under either estimator.

Q: What instrument is used in the 2SLS specifications and what is its first-stage strength? A: The instrument exploits the fact that Mozambique’s heavily mined and rudimentary transportation network constrained civilian movement — the median displaced sibling ended up roughly 97 kilometers from birthplace. The authors instrument actual destination-origin development and conflict differences with the predicted differences based on the characteristics of districts within 100 km of the birthplace. The first-stage elasticity between actual and proximity-predicted differences in development is 0.86, and for conflict is 0.88, both precisely estimated.
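The mechanics of a single-instrument 2SLS can be sketched as the ratio of the reduced form to the first stage. The data below are entirely synthetic: `z` stands in for the proximity-predicted gap, `x` for the actual gap, and `u` for an unobserved confounder. The first-stage slope (0.8) and the effect (0.04) are set by construction and are not the paper's estimates.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


z = [-2.0, -1.0, 0.0, 1.0, 2.0]   # instrument: predicted development gap (synthetic)
u = [1.0, -2.0, 2.0, -2.0, 1.0]   # confounder, uncorrelated with z by construction
x = [0.8 * zi + ui for zi, ui in zip(z, u)]          # actual gap, contaminated by u
y = [0.04 * xi + 0.10 * ui for xi, ui in zip(x, u)]  # outcome; true effect is 0.04

first_stage = cov(z, x) / cov(z, z)    # slope of x on z
reduced_form = cov(z, y) / cov(z, z)   # slope of y on z
iv_estimate = reduced_form / first_stage
ols_estimate = cov(x, y) / cov(x, x)   # biased upward because u moves both x and y

print(round(first_stage, 3), round(iv_estimate, 3), round(ols_estimate, 3))
```

Because the instrument is unrelated to the confounder, the ratio recovers the constructed effect while plain OLS overstates it. With several instruments or controls a full 2SLS routine would be needed instead of this ratio.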

Q: What do the long-run survey results from Nampula show about educational persistence? A: In a 2020 survey of 208 Nampula residents aged over 35, IDPs who fled to Nampula during the war have a 10 percentage point higher likelihood of completing primary school relative to their siblings who stayed in the countryside. Their educational attainment converges to the level of urban-born, never-displaced Nampula residents, despite large historical and contemporary urban-rural education gaps in northern Mozambique. The majority of IDPs (73%) report that extended relatives or friends advised them to attend school upon arriving in the city, and most believed education was necessary for urban employment.

Q: What are the long-run psychosocial costs documented in the Nampula survey? A: Even three decades after the war’s end, IDPs in Nampula report significantly lower social capital, civic participation, and community trust compared to urban-born never-displaced residents. IDPs also score significantly worse on mental health indicators including depression, loneliness, and pessimism. These findings suggest that forced displacement imposes persistent psychosocial costs that are not remediated by economic or educational convergence.

Q: What drives displacement in the data, and does selection threaten identification? A: Linear probability and multinomial logit models show that conflict intensity and geographic proximity (distance to the border for external displacement; distance to cities for urban displacement) are the primary correlates of displacement type, while differences in destination development are uncorrelated with displacement. The overall explanatory power of these models is nonetheless low, consistent with the many idiosyncratic and unpredictable features of the war. The within-family design addresses residual selection on household characteristics, and the 2SLS design addresses selection on destination-specific characteristics.

Q: How do educational gains translate into sectoral employment outcomes? A: Across specifications, gains in schooling move in tandem with a shift out of agriculture into services. Rural-to-urban IDPs have a 4 percentage point higher likelihood of non-agricultural employment five years after the war, while the externally displaced show a 5 percentage point lower likelihood of agricultural employment. Urban-born IDPs displaced to the countryside are more likely to work in agriculture after the war. The authors interpret this co-movement as suggesting that conflict-driven human capital accumulation may contribute to structural transformation away from subsistence agriculture.

Q: How robust are the within-family estimates? A: The authors conduct six sensitivity checks: adding family fixed effects to cross-sectional regressions, restricting to individuals aged 12–18 in 1997 to address co-habitation concerns, extending comparisons to cousins and other relatives, dropping the oldest male sibling to minimize favoritism concerns, restricting to same-sex sibling pairs, and narrowing the age gap to two years. Across all permutations, the qualitative ordering is preserved: refugees show no significant schooling gains, rural-to-urban IDPs show gains of 5–6 percentage points in primary attendance and 0.35–0.5 extra years, rural-to-rural IDPs show small positive gains, and urban-to-rural IDPs show losses.

Uprootedness hypothesis: The idea, traced in the paper to Stigler and Becker (1977) and earlier scholars, that forced displacement incentivizes human capital investment precisely because education is a mobile asset that cannot be expropriated — distinct from place-based effects of destination quality.

Place-based (exposure) effects: The impact on human capital outcomes attributable to differences between the development level and conflict intensity of the displacement destination and the individual’s birthplace, measured as destination-origin differences in a principal component index of regional development.

Separated siblings design: An identification strategy that compares siblings from the same household who experienced different displacement trajectories during the war, holding constant all household-level characteristics including parental preferences, ethnicity, religion, wealth, and local conflict exposure.

Internal displacement (IDP): Conflict-driven movement within national borders to either rural areas or urban centers, constituting approximately 60% of global forced displacement and the majority of displacement in the Mozambican civil war context.

Structural transformation: In this paper’s usage, the shift of workers out of subsistence agriculture into services associated with human capital accumulation triggered by conflict-driven displacement, treated as a potential mechanism of post-conflict recovery.

Psychosocial costs of displacement: Long-run deficits in social capital, civic engagement, community trust, and mental health (depression, loneliness, pessimism) reported by IDPs three decades after displacement, persisting despite convergence in educational attainment and employment.

Climate change and the macroeconomics of bank capital regulation

Layer 1: Overview

Research Question

This paper asks two related questions about the intersection of climate policy and bank capital regulation. First, can differentiated bank capital requirements — imposing higher equity charges on loans to fossil energy firms — serve as a quantitatively meaningful climate policy instrument, in particular relative to carbon taxes? Second, how should optimal bank capital requirements respond to a carbon-tax-induced clean energy transition?

Methodology

The authors build a quantitative multi-sector DSGE model with two layers of default: corporate default at the firm level and bank failure at the bank level. Three intermediate goods sectors are modeled — non-energy, fossil energy, and clean energy — linked via a nested CES final-good production structure. Banks collect deposits from households (who value deposits for liquidity services) and issue defaultable loans to all three sectors. Deposit insurance, combined with limited liability for bank owners, generates an inefficiently high bank risk-taking motive, creating a role for capital regulation. The Ramsey-optimal capital requirement balances the social benefit of liquid deposit provision to households against the social cost of bank failure.
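A nested CES structure of this kind can be sketched as two stacked aggregators. The functional form, the share weights (`energy_share`, `clean_share`), and the function names are assumptions made for illustration; only the two substitution elasticities (0.2 between energy and non-energy, 3 between clean and fossil) follow the calibration reported in this summary.

```python
def ces(inputs, shares, elasticity):
    """CES aggregate for a substitution elasticity different from 1.

    shares are assumed to sum to one; inputs must be strictly positive.
    """
    rho = (elasticity - 1.0) / elasticity
    return sum(s * x ** rho for s, x in zip(shares, inputs)) ** (1.0 / rho)


def final_good(non_energy, clean, fossil,
               energy_share=0.1, clean_share=0.2,      # hypothetical weights
               outer_elasticity=0.2, inner_elasticity=3.0):
    """Inner nest combines clean and fossil energy; outer nest adds non-energy."""
    energy = ces([clean, fossil], [clean_share, 1.0 - clean_share], inner_elasticity)
    return ces([non_energy, energy], [1.0 - energy_share, energy_share], outer_elasticity)


baseline = final_good(1.0, 1.0, 1.0)
scaled = final_good(2.0, 2.0, 2.0)
less_fossil = final_good(1.0, 1.0, 0.5)
print(baseline, scaled, less_fossil)
```

With an outer elasticity well below one, energy and non-energy goods are strong complements in this sketch, so a fall in energy input cannot easily be offset by more non-energy input.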

The model is calibrated to quarterly data, targeting a 0.7% annualized bank failure rate, a 2% annualized corporate default rate, a 30% loan recovery rate, a deposit spread of -100 basis points, and a baseline Ramsey-optimal equity requirement of 8% (consistent with Basel III). Sectoral parameters follow Bartocci, Notarpietro, and Pisani (2022) and Fried, Novan, and Peterman (2022): the energy-to-non-energy elasticity of substitution is 0.2, the clean-to-fossil energy elasticity is 3, and full abatement occurs at carbon taxes exceeding 125 $/tonne of carbon (ToC). The clean transition experiment imposes a linear carbon tax path from zero to 10 $/ToC over 40 quarters, announced as an unanticipated but fully credible shock.
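The transition experiment's tax path is simple enough to write down directly. The sketch below assumes the ramp starts at zero in quarter 0, reaches 10 $/ToC in quarter 40, and stays there afterwards; the exact timing convention and the `linear_tax_path` helper are assumptions, not taken from the paper.

```python
def linear_tax_path(final_tax=10.0, ramp_quarters=40, horizon=60):
    """Carbon tax ($/ToC) by quarter: linear ramp, then constant at final_tax."""
    return [final_tax * min(t, ramp_quarters) / ramp_quarters
            for t in range(horizon + 1)]


path = linear_tax_path()
print(path[0], path[20], path[40], path[60])  # 0.0 5.0 10.0 10.0
```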

Main Findings

Finding 1 — Fossil-penalizing capital requirements are quantitatively negligible as climate policy. Raising the capital requirement on fossil loans from the baseline 8% to 12% (a 150% risk-weight, consistent with current BB- treatment) reduces the fossil capital share within the energy sector by only 0.06 percentage points (from 80.00% to 79.94%) and cuts aggregate emissions by only 0.08%. A 1 $/ToC carbon tax, by contrast, achieves a 5.23% emission reduction while modestly reducing the fossil capital share to 79.80%. The difference arises because capital requirements affect only the size and financing cost of fossil firms, leaving abatement incentives unchanged; the loan-rate effect on fossil firms is small (loan rate rises from 124 bps to 128 bps), consistent with Kashyap, Stein, and Hanson (2010).
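The arithmetic behind these comparisons can be checked in a few lines. The inputs are the rounded figures quoted in this summary, so the results are approximate; the variable names are illustrative.

```python
baseline_requirement = 0.08   # equity requirement on all loans in the baseline
fossil_requirement = 0.12     # penalized requirement on fossil loans

# A risk-weight is the requirement relative to the baseline requirement.
risk_weight = fossil_requirement / baseline_requirement

# Change in the fossil capital share within the energy sector, in percentage points.
share_change_pp = 80.00 - 79.94

# Emission reductions (percent) quoted for each instrument.
cut_from_capital_requirement = 0.08
cut_from_carbon_tax = 5.23    # for a 1 $/ToC tax
relative_effect = cut_from_carbon_tax / cut_from_capital_requirement

print(f"risk-weight: {risk_weight:.0%}")
print(f"share change: {share_change_pp:.2f} pp")
print(f"carbon tax effect relative to capital requirement: about {relative_effect:.0f}x")
```

On these rounded figures, a 1 $/ToC carbon tax cuts emissions by roughly 65 times as much as the four-point increase in the fossil capital requirement.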

Finding 2 — Sustainability-linked capital requirements remain insufficient. Conditioning the fossil capital requirement on firms’ abatement effort (κ_f = 0.12 − η_t) induces an optimal abatement effort of 2.69% and an effective fossil requirement of approximately 9.5%. The implied emission reduction remains far below even a modest carbon tax: the authors state the induced emission reduction falls short by a factor of almost 100 relative to full abatement.

Finding 3 — Ramsey-optimal capital requirements decline monotonically along the transition (in the baseline real model). When a carbon tax gradually rises from zero to 10 $/ToC over 40 quarters, aggregate loan demand contracts permanently because clean, fossil, and non-energy goods are imperfect substitutes and the shock is recessionary for GDP. Banks reduce balance sheets, deposit supply falls, the deposit spread widens by approximately 8 basis points in the long run, and corporate default rates across all sectors rise by almost 0.1 percentage points from the baseline of 2.05% (in steady state). To counteract the deposit scarcity and associated firm risk-taking, the Ramsey-optimal capital requirement declines symmetrically and monotonically to a lower long-run level. Bank capital regulation cannot affect impact default rates because leverage decisions are made before the transition is announced.

Finding 4 — Nominal rigidities produce a temporary tightening before the long-run relaxation. When debt is denominated in nominal terms and Rotemberg price adjustment costs are added, the clean transition is inflationary in the short run (consistent with Ciccarelli and Marotta 2021). Inflation erodes the real value of nominal loan obligations, inducing firms to temporarily increase nominal loan issuance; real deposits rise briefly, the deposit spread narrows by around 2 basis points, and the optimal capital requirement tightens over the initial phase of the transition before converging to the same lenient long-run level as the baseline.

Finding 5 — Differentiated sector-specific capital requirements are only warranted when banks are not diversified across sectors. In the baseline, perfectly diversified banks face a symmetric aggregate loan demand contraction, so uniform adjustment suffices. When sector-specific banks are introduced (an extreme case meant to bound concentration effects), fossil banks experience a strong reduction in deposit supply while clean banks experience the opposite. The optimal response is temporarily tighter capital requirements for clean banks and relaxed requirements for fossil banks. In the long run, both converge to an aggregate risk-weight of approximately 99.85% relative to the baseline (a small but symmetric relaxation), very close to the diversified baseline.

Scope Conditions

All results are derived within a model calibrated to match broad financial-market and macroeconomic regularities rather than a specific country. Physical risk from climate change is abstracted away throughout. The carbon tax is set exogenously (not derived from a climate policy optimum). Firms cannot switch technologies, providing a conservative lower bound on the sectoral reallocation. Results are robust to lowering the deposit demand elasticity parameter (γ_D = 0.6 versus 1.5 in the baseline) and to raising the energy/non-energy substitution elasticity to 3 from 0.2.

In depth

Q1. What is the core trade-off that determines the optimal level of bank capital requirements in this model?

A: The optimal capital requirement balances two welfare-relevant effects of bank leverage. Tighter requirements reduce bank failure rates, limiting the resource losses (proportional to deposits under the management of the deposit insurance agency, DIA) and the inefficient risk-taking that deposit insurance induces. At the same time, tighter requirements force banks to reduce deposit-financed lending, shrinking the supply of liquid deposits that households value directly in utility. The Ramsey planner chooses the capital requirement that equates the marginal welfare benefit of lower bank failure with the marginal welfare cost of reduced deposit provision. In the baseline calibration this optimum is at 8%.

Q2. Why does raising capital requirements on fossil loans have such a small effect on carbon emissions?

A: Capital requirements affect the deposit-financing wedge for fossil loans — the share of loans that can be funded via cheap, deposit-financed sources — but they do not enter firms’ first-order condition for abatement. Firms respond by modestly reducing leverage and investment (the loan rate for fossil energy firms rises from 124 bps to 128 bps), but the emission intensity of fossil production is unchanged. In equilibrium, the fossil capital share within the energy sector declines by only 0.06 percentage points (from 80.00% to 79.94%), reducing total emissions by 0.08%. A 1 $/ToC carbon tax produces a 5.23% emission reduction, many times larger, because carbon taxes directly alter the return to abatement and the profitability of fossil relative to clean production.

Q3. How does the sustainability-linked capital requirement work and why is it still insufficient?

A: Under sustainability-linked capital requirements, the fossil loan charge is set as κ_f = κ̃ − η_t, so firms that abate more face lower capital requirements on their loans and thus lower financing costs. This creates a direct financial incentive for abatement that the simple penalizing factor lacks. With κ̃ = 0.12, the equilibrium abatement effort is 2.69% and the effective fossil requirement falls to approximately 9.5%. Despite this improvement relative to the plain fossil factor, the climate impact remains far smaller than even a modest carbon tax: the induced emission reduction falls short by a factor of almost 100 relative to full abatement. The fundamental limitation is that the feedback from abatement to financing cost is attenuated by deposit-financing wedge mechanics, making the instrument too weak to substitute for direct carbon pricing.

Q4. What are the impact, short-run, and long-run effects of the clean transition on default rates and bank failure?

A: On impact, the unexpected compliance cost increase raises fossil firms’ default threshold, causing a sharp but short-lived uptick in fossil firm default rates (from 2.05% to approximately 2.08% in the baseline transition) and a brief increase in bank failure. Clean firm defaults fall slightly on impact due to higher clean energy prices. In the short run, clean firms increase risk-taking (higher leverage) because the relative attractiveness of debt financing improves as deposit spreads widen; fossil firms deleverage. In the long run, aggregate corporate default rates rise by almost 0.1 percentage points from the baseline of 2.05% (equivalently 2.7% in the Appendix B long-run analysis), driven by the widening of the deposit spread (approximately 8 bps), which raises the deposit financing wedge for all firms. Bank failure rates are always tied to binding capital requirements and revert quickly to their steady-state level.

Q5. Why can bank capital regulation not mitigate the impact default spike when the transition is announced?

A: At the moment of announcement, leverage decisions for the current period have already been made. The bank capital requirement binds on new lending decisions but cannot alter the existing capital structure of banks or firms. Therefore the regulator faces a “bygone” on impact: changing the capital requirement in the announcement period does not affect current corporate default rates or bank failure rates. The regulator’s tool only becomes effective for lending decisions going forward, implying that the transition-induced impact default surge cannot be smoothed by macroprudential policy.

Q6. Why do Ramsey-optimal capital requirements decline along the transition rather than tighten to address higher default risk?

A: The key channel is that aggregate loan demand contracts permanently as imperfect substitutability across sectors makes the carbon tax recessionary. Banks shrink their balance sheets, reducing deposit supply. The resulting deposit scarcity makes deposits more valuable to households (widening the spread), which also makes deposit financing cheaper for banks, partially offsetting the loan demand decline but at the cost of higher corporate leverage. The welfare loss from reduced liquidity provision and higher firm default rates dominates, so the planner relaxes capital requirements to stimulate deposit supply. The dominant effect is the large, permanent decline in credit demand, which makes it welfare-improving to allow banks to operate at lower capital ratios to rebuild deposit provision.

Q7. What is the role of the deposit financing wedge in transmitting carbon tax shocks to the entire corporate sector?

A: The deposit financing wedge (Ξ_t) reflects the benefit for banks of funding loans through deposits rather than equity, combining the liquidity premium households pay on deposits and the deposit insurance put (expected repayment is only 1 − F(μ_{t+1}) per unit of deposits issued). When aggregate loan demand falls due to carbon taxes, deposits become scarcer relative to their steady-state level, making the wedge larger. Through the loan pricing condition, all sectors — not just fossil — face more attractive deposit-financed debt, causing clean and non-energy firms to also increase their leverage and default risk along the transition. This is the mechanism through which a sector-specific shock has symmetric aggregate effects that shape optimal bank regulation.

Q8. How do nominal rigidities change the optimal path of capital requirements along the clean transition?

A: With Rotemberg price adjustment costs and nominally denominated debt, the clean transition is inflationary in the short run (consistent with empirical evidence in Ciccarelli and Marotta 2021). Inflation lowers the real value of outstanding nominal loan obligations, incentivizing firms across all sectors to temporarily increase nominal borrowing. Banks accommodate this demand by increasing deposit issuance, which briefly narrows the deposit spread by around 2 basis points. With deposit supply temporarily elevated, the regulator’s trade-off tilts toward reducing bank failure rather than stimulating deposit provision, so optimal capital requirements tighten during the inflationary phase before reverting to the lenient long-run path of the baseline model. The long-run level is unchanged.

Q9. Under what conditions are sector-specific capital requirements welfare-improving?

A: Sector-specific requirements are welfare-improving only when banks are not perfectly diversified across sectors, so that the transition has heterogeneous effects on sector-specific deposit supply and bank failure rates. In the baseline with perfectly diversified banks, the loan demand decline affects all banks uniformly, so a uniform adjustment is optimal. When sector-specific banks are introduced as an extreme case of carbon concentration, fossil banks experience a sharp reduction in deposit provision while clean banks see deposits temporarily increase. The planner responds by temporarily relaxing requirements for fossil banks and tightening them for clean banks. In the long run, both converge to approximately the same aggregate relaxation as the diversified baseline (aggregate risk-weight of 99.85%).

Q10. How does the carbon tax shock experiment relate to the perfect-foresight transition analysis?

A: In the carbon tax shock experiment, the tax level follows an AR(1) process with persistence ρ_τ = 0.9, starting from a long-run level of 10 $/ToC, with a one-standard-deviation shock implying an additional 10 $/ToC on impact. Fossil firm default rates spike from 2% to approximately 2.8% on impact and revert relatively quickly. Emissions decline by slightly more than 10% on impact and revert as the shock dissipates. The macroeconomic dynamics — GDP, investment, loan demand, and bank failure rate responses — closely resemble the impact and short-run effects of the perfect-foresight transition. Optimal capital requirements decline temporarily in both cases, confirming that the transition-path results are not an artifact of the specific perfect-foresight assumption.
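To make the stated persistence concrete, the following minimal sketch traces the expected tax path after the shock. It assumes the deviation from the long-run level decays as x_t = ρ_τ · x_{t−1} with no further shocks; the function names and the 12-period horizon are illustrative and do not come from the paper, and the period length is whatever the paper's calibration uses.

```python
import math

RHO_TAU = 0.9        # AR(1) persistence stated in the text
LONG_RUN_TAX = 10.0  # long-run tax level, $/ToC
SHOCK = 10.0         # size of the shock on impact, $/ToC


def tax_path(periods, rho=RHO_TAU, base=LONG_RUN_TAX, shock=SHOCK):
    """Expected tax level in each period after a one-time shock (no further shocks)."""
    return [base + shock * rho ** t for t in range(periods)]


def half_life(rho=RHO_TAU):
    """Number of periods until half of the initial deviation has dissipated."""
    return math.log(0.5) / math.log(rho)


if __name__ == "__main__":
    for t, tax in enumerate(tax_path(12)):
        print(f"t={t:2d}  tax={tax:6.2f} $/ToC")
    print(f"half-life: {half_life():.2f} periods")
```

Under these assumptions half of the initial 10 $/ToC deviation has faded after roughly six and a half periods, which is the sense in which the responses revert as the shock dissipates.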

Q11. What is the “forced safety effect” and how does it interact with the model’s capital requirement trade-off?

A: The “forced safety effect” (following Bahaj and Malherbe 2020) refers to the positive effect of tighter capital requirements on loan supply that operates through reducing bank failure probability. When banks are less likely to fail (lower F(μ_{t+1})), the expected bank productivity conditional on not failing — (1 − G(μ_{t+1})) — rises toward one, reducing the discount applied to future loan payoffs in the bank’s stochastic discount factor. This improves the profitability of lending and expands loan supply. In the model, this effect partially offsets the direct loan-supply reduction from higher equity requirements but does not dominate, so the overall effect of tighter requirements on deposit supply is still negative, preserving the core trade-off.

Q12. What robustness checks are performed and do they materially change the main results?

A: The authors consider three main robustness checks. First, reducing the deposit demand elasticity parameter from γ_D = 1.5 to γ_D = 0.6 (recalibrating ω_D = 0.012 to preserve the -100 bp deposit spread target) has almost no effect on the optimal path of capital requirements. Second, raising the energy/non-energy substitution elasticity from ε̃ = 0.2 to ε̃ = 3 (and adjusting the energy weight to maintain a 10% energy share) produces much stronger fossil investment declines and smaller clean investment responses, but aggregate loan demand and bank deposits contract only slightly less, so the relaxation in capital requirements is slightly smaller than in the baseline. Third, recalibrating to a 2% annualized bank failure rate (versus the baseline 0.7%) does not materially change results. The conclusion that capital requirements should decline along the transition is robust across all specifications.

Key Concepts

Deposit financing wedge (Ξ_t): The gain for banks from funding loans via deposits rather than equity. It comprises two components: (i) the liquidity premium — households value deposits for their liquidity services, so the deposit rate lies below the risk-free rate; and (ii) the deposit insurance put — the expected repayment obligation per unit of deposits is only 1 − F(μ_{t+1}), not one, since the DIA covers depositors in the event of bank failure. A larger wedge makes deposit-financed lending more profitable, expanding loan supply. In this paper the wedge is the central transmission mechanism through which capital requirements and aggregate loan demand interact.

Bank failure threshold (μ_t): The realization of the bank-specific idiosyncratic risk shock below which a bank cannot service depositors and transfers all assets and liabilities to the deposit insurance agency. It depends on the ratio of deposit repayment obligations to the aggregate realized loan portfolio return. In the model the threshold increases when aggregate loan payoffs fall (as in a carbon tax shock), temporarily raising bank failure rates.

Ramsey-optimal capital requirement: The sequence of sector-specific (or uniform) capital ratios chosen by a benevolent government planner to maximize household welfare, treating the capital requirement as the sole policy instrument. In this model the Ramsey problem is solved nonlinearly along the perfect-foresight transition path. The planner internalizes that tighter requirements simultaneously reduce bank failure probability and shrink deposit supply; the optimum trades off these two objectives.

Sustainability-linked capital requirement: A capital requirement on fossil loans that explicitly depends on the abatement effort undertaken by fossil firms (κ_f = κ̃ − η_t), creating a direct financing-cost incentive for emission reduction. This contrasts with a plain fossil penalizing factor, which affects only the financing cost of fossil capital without altering abatement incentives. The paper shows that even sustainability-linked requirements are quantitatively negligible as climate policy relative to carbon taxes.

Carbon compliance cost per unit of fossil production (ξ_t): A summary statistic combining the direct carbon tax payment and the abatement cost at the optimal abatement effort. It measures the total policy-induced wedge that reduces the profitability of fossil capital and raises fossil firms’ break-even default threshold. In the transition experiment, compliance costs rise from zero to approximately 4% of fossil production value as the tax increases from 0 to 10 $/ToC.

Asset stranding channel: The mechanism through which an unanticipated tightening of carbon policy raises fossil firms’ default probability on impact (by increasing compliance costs above the level priced into existing loan contracts) and subsequently reduces their loan demand permanently. The paper contrasts its treatment of this channel — where stranding affects bank regulation through aggregate deposit supply effects — against models (such as Carattini, Melkadze, and Heutel 2023) where stranding causes an inefficient credit crunch via a financial accelerator.

Deposit spread (s^D_t): Defined as the annualized difference between the deposit rate and the risk-free rate, expressed in basis points. Because households value deposits for liquidity services, the deposit rate lies permanently below the risk-free rate (spread is negative). In the baseline calibration the target is -100 bps. The spread widens (becomes more negative) when deposits become scarcer, which is the case along the carbon tax transition as bank balance sheets contract.

Comment on 'Asset Bubbles and Overlapping Generations' by Tirole

Mon, 01 Jan 0001 00:00:00 +0000

Tirole (1985) studied an overlapping generations model with capital accumulation and showed that the emergence of asset bubbles can resolve the capital over-accumulation problem when the economy is dynamically inefficient. His Proposition 1(c) claims that a bubble can emerge if and only if the dividend growth rate exceeds the bubbleless steady-state interest rate. This comment identifies an error in that proposition: the stated condition is necessary but not sufficient for bubble existence. The paper constructs an explicit counterexample in which the dividend growth rate exceeds the bubbleless interest rate but no bubble equilibrium exists, and separately constructs a case in which a bubble exists even when the condition in Proposition 1(c) fails. Corrected necessary and sufficient conditions for bubble existence are derived, and the implications of the correction for the welfare results and the relationship between dynamic inefficiency and bubbles are characterized.

In depth

Q1. What is the error in Tirole’s Proposition 1(c)?

The error is in the sufficiency direction: Tirole argued that whenever the dividend growth rate exceeds the bubbleless interest rate, a bubble equilibrium exists; Pham and Toda construct a parameter configuration satisfying this condition where no bubble equilibrium exists, because the continuity argument used in Tirole’s proof fails at boundary parameter values. The necessity direction — that bubble existence requires this rate comparison — is not challenged.

Q2. How do the corrected conditions change the interpretation of dynamic inefficiency?

Tirole’s original result linked bubbles tightly to dynamic inefficiency (r < g), providing a clean condition for when bubbles are both feasible and welfare-improving by absorbing excess saving. The correction weakens this link: bubble existence requires additional structural conditions beyond the rate comparison, meaning dynamic inefficiency is a necessary but not sufficient condition for bubbles in the Tirole framework. Policy prescriptions based on the r < g condition for bubble welfare analysis need qualification.

Key concepts

dynamic inefficiency : the OLG condition in which the interest rate falls below the growth rate, making intergenerational transfers from young to old welfare-improving; related to but not sufficient for bubble existence under the corrected Tirole conditions.

bubble existence condition : the necessary and sufficient conditions under which an asset bubble can emerge and persist in the Tirole OLG model; the corrected version requires more than the dividend-growth-rate-exceeds-interest-rate comparison of the original Proposition 1(c).

Comment on "Artificial Intelligence and Technological Unemployment" by Wang and Wong

Mon, 01 Jan 0001 00:00:00 +0000

This comment, written by J. Carter Braxton (University of Wisconsin), discusses the paper “Artificial Intelligence and Technological Unemployment” by Wang and Wong (2025), which develops and quantifies an equilibrium labor search model to evaluate the employment effects of spreading AI. Wang and Wong’s central finding is that improvements in AI quality will increase productivity by a factor of three while reducing employment by 23%, with approximately half of the employment decline occurring within the next five years. Braxton’s comment serves two purposes: first, to clarify the model’s structural channels through which AI affects employment; and second, to bring empirical evidence from the spread of computers in the 1980s–2000s to bear on the relative magnitude of those channels.

Braxton identifies two competing forces within Wang and Wong’s framework. The job destruction channel arises from endogenous separations: as AI quality improves, firms increasingly replace matched workers with AI, raising outflows from employment. The job creation channel arises from the free-entry condition: rising AI quality increases firm profits on all matches, inducing firms to post more vacancies, which raises workers’ job-finding rates and employment inflows. Whether aggregate employment rises or falls depends on which channel dominates — a quantitative question the authors resolve through calibration, finding the job destruction channel dominant. Braxton notes that three modeling choices (learning-by-using, the requirement that firms must be matched with a worker to adopt AI, and disembodied technological change) each push against the job-destruction result, making the authors’ findings more striking.

Braxton then evaluates the relative strength of these channels using the historical spread of personal computers. Drawing on Bick, Blandin, and Deming (2024), he notes that workplace AI adoption in 2024 follows nearly the same time trend and income-distribution profile as computer adoption in 1984, making computers a plausible historical analog. Using the CPS Computer Supplement (1984–2003), Braxton measures the change in computer usage by occupation and regresses the occupation-level change in employment-to-unemployment (EU) transition rates on it. The estimated coefficient is 0.0146 (robust SE 0.0064), indicating that occupations with higher computer adoption rates saw higher flows into unemployment, which confirms that a job destruction channel was active during the computer era. However, regressing the change in log occupation-level employment (1980–2000 Census) on the change in computer usage yields a coefficient of 0.7761 (robust SE 0.2658), with a positive slope indicating that occupations more exposed to computers saw higher employment growth. For the computer episode, therefore, the job creation channel dominated the job destruction channel, the opposite of Wang and Wong’s AI projection.

Braxton also cites his own prior work showing that even when job creation and destruction balance in aggregate, workers displaced by technological change face lasting earnings losses and elevated permanent income risk, raising the question of how to optimally insure these workers.

The comment concludes by identifying avenues for future research: introducing occupational heterogeneity (with some occupations more exposed to AI than others) and worker heterogeneity (skills that are complements versus substitutes to AI). The central open question is whether AI is qualitatively different from prior episodes of technological change, and if so, why.

Q1: What are the two central channels through which AI quality affects employment in Wang and Wong’s model, and how do they operate? The job destruction channel operates through endogenous separations: as AI quality (A_t) improves, firms that are matched with workers are more likely to replace them with AI at rate ρ, adding the term ρμA_t H_t I_t to outflows from employment in the law of motion for employment. The job creation channel operates through the free-entry condition: higher AI quality raises firm profits on all existing matches (because technological change is disembodied, benefiting matches formed today with future AI gains), inducing firms to post more vacancies, which via free entry reduces the firm’s matching probability but raises the worker’s job-finding rate α_t and thereby increases employment inflows. The net employment effect depends on which channel quantitatively dominates.

Q2: What is Wang and Wong’s quantitative finding about the aggregate employment and productivity effects of AI? Using a calibrated equilibrium labor search model, Wang and Wong find that the spread of AI will increase productivity by a factor of three while reducing employment by 23%. Approximately half of the employment decline is projected to occur within the next five years. A version of the model holding job-finding rates fixed yields a similar result, indicating that through the lens of their model the job creation channel is quantitatively small and the job destruction channel dominates.

Q3: What three modeling choices push against Wang and Wong’s job-destruction result, and why does Braxton view this as making the finding more striking? First, AI improves through “learning by using” — it learns from all output being produced — which creates an incentive for employment to remain elevated to accelerate AI learning, dampening job destruction. Second, firms can only adopt AI if currently matched with a worker, which creates an incentive for vacancy posting and pushes in favor of job creation. Third, AI improvements are disembodied (raising productivity in all matches, including those formed before the improvement), which increases the value of forming new matches today and strengthens job creation. Because each of these assumptions pushes against the job destruction result, Braxton argues that finding job destruction dominant despite these model features makes the result more striking.

Q4: How does Braxton use the historical spread of computers to assess the job destruction and job creation channels? Braxton measures occupation-level computer adoption as the change in the share of CPS Computer Supplement respondents who reported using a computer at work between 1984 and 2003 (denoted ΔCPU_{o,84–03}), using occupation codes from Autor and Dorn (2013). He then regresses the occupation-level change in EU transition rates (ΔEU_{o,84–03}, from monthly CPS micro data) on ΔCPU_{o,84–03} to measure the job destruction channel, and separately regresses the change in log occupation-level employment (Δlog E_{o,80–00}, from the 1980 and 2000 Census IPUMS) on ΔCPU_{o,84–03} to assess the net employment effect. A positive coefficient on the employment regression indicates job creation dominates; a negative coefficient indicates job destruction dominates.

Q5: What do the regression results show about the job destruction and job creation channels during the computer era? The job destruction regression yields a coefficient of β = 0.0146 (robust SE = 0.0064, R² = 0.0178), indicating that occupations with higher computer adoption rates did see higher employment-to-unemployment transition rates — the job destruction channel was present. However, the employment-level regression yields a coefficient of β = 0.7761 (robust SE = 0.2658, R² = 0.0348), with a positive slope indicating that occupations more exposed to computers experienced higher employment growth between 1980 and 2000. Thus, for the computer episode, the job creation channel dominated the job destruction channel — the opposite of what Wang and Wong project for AI.
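As a quick check on how precisely the two slopes are estimated, the sketch below converts each reported coefficient and robust standard error into a t-ratio, an approximate 95% confidence interval, and a two-sided p-value. It relies on a large-sample normal approximation, which is an assumption here because the number of occupations is not given in this summary; the helper name `summarize` is illustrative.

```python
import math


def summarize(beta, se):
    """Return (t-ratio, approximate 95% CI, two-sided normal p-value)."""
    t_ratio = beta / se
    ci = (beta - 1.96 * se, beta + 1.96 * se)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_ratio) / math.sqrt(2.0))
    return t_ratio, ci, p_value


if __name__ == "__main__":
    reported = {
        "EU transition rate on computer use": (0.0146, 0.0064),
        "log employment on computer use": (0.7761, 0.2658),
    }
    for label, (beta, se) in reported.items():
        t_ratio, (lo, hi), p_value = summarize(beta, se)
        print(f"{label}: t={t_ratio:.2f}, 95% CI=({lo:.4f}, {hi:.4f}), p={p_value:.4f}")
```

Under that approximation both intervals exclude zero, so both slopes are distinguishable from zero at the 5% level. The low R² values reported above are a separate matter: computer adoption explains little of the cross-occupation variation in either outcome.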

Q6: What is the basis for treating the computer episode as a relevant analog to the spread of AI? Braxton cites Bick, Blandin, and Deming (2024), who show that AI adoption in the workplace in 2024 is following nearly the same aggregate time trend as the spread of personal computers in the early 1980s. Moreover, the distribution of AI usage across the income distribution in 2024 is nearly identical to computer usage across the income distribution in 1984: for both technologies, workplace usage peaks between the 80th and 90th percentiles of the income distribution before declining modestly at the top. Bick et al. (2024) also show the similarities hold by education level and age.

Q7: Even if job creation and destruction balance in aggregate, what does prior work suggest about the distributional consequences for workers? Braxton and Taska (2023) show that workers in occupations more exposed to technological change (measured by changes in computer and software task requirements) suffered larger earnings losses following displacement. Braxton, Herkenhoff, Rothbaum, and Schmidt (2024, forthcoming AER) show that workers in occupations more exposed to technological change experienced larger increases in permanent income risk between the 1980s and 2010s. These findings imply that even if AI does not reduce aggregate employment, workers who are displaced will face deteriorating labor market prospects, raising the question of how to optimally provide insurance.

Q8: What policy implication does Braxton draw from the distributional consequences of technological change? Braxton and Taska (2025, forthcoming Review of Economic Dynamics) show that technological change expands the motive for governments to provide retraining subsidies. Braxton argues that if AI represents an acceleration of technological change, even larger retraining subsidies — and potentially other forms of insurance — may be needed for displaced workers.

Q9: What are the main avenues for future research identified in the comment? Braxton identifies two principal directions. First, introducing occupational heterogeneity into the Wang-Wong framework, so that some occupations are more exposed to AI displacement than others, would allow the model to generate richer distributional implications. Second, allowing worker heterogeneity in skills — distinguishing skill dimensions that are complements to AI from those that are substitutes — would permit the model to capture differential effects across the workforce. The overarching research question is whether AI is qualitatively different from prior technological change episodes, and if so, to identify the precise mechanisms that make it different.

Job destruction channel: In the Wang-Wong model, the increase in endogenous separations driven by firms replacing matched workers with AI as AI quality improves. Formally, this is the term ρμA_t H_t I_t in the law of motion for employment, representing separations that occur when a firm adopts AI and the worker and firm cannot renegotiate a mutually acceptable wage.

Job creation channel: The increase in vacancy posting and worker job-finding rates induced by rising AI quality. Because higher AI quality raises firm profits on all matches (via disembodied technological change), the free-entry condition implies firms post more vacancies, lowering the firm’s matching probability but raising the worker’s job-finding rate α_t, increasing employment inflows.

Free-entry condition: The equilibrium condition equating the cost of posting a vacancy (κ_t) to the expected benefit (the probability of matching f_t times the firm’s match surplus Π_t). This condition pins down the job-finding rate for workers: when firms find it more profitable to post vacancies, α_t rises.
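The comparative static in this entry can be sketched numerically. The example below assumes a Cobb-Douglas matching function with a scale parameter, so that the firm's matching probability is f(θ) = scale · θ^(−η) and the worker's job-finding rate is α(θ) = scale · θ^(1−η), where θ is vacancies per job seeker. That functional form, the parameter values, and all function names are illustrative assumptions and are not taken from Wang and Wong.

```python
def tightness(profit, cost, scale=0.5, eta=0.5):
    """Market tightness theta that solves the free-entry condition cost = f(theta) * profit."""
    return (scale * profit / cost) ** (1.0 / eta)


def firm_match_prob(theta, scale=0.5, eta=0.5):
    """f(theta) = scale * theta**(-eta): falls as more vacancies chase each job seeker."""
    return scale * theta ** (-eta)


def job_finding_rate(theta, scale=0.5, eta=0.5):
    """alpha(theta) = scale * theta**(1 - eta): rises with market tightness."""
    return scale * theta ** (1.0 - eta)


if __name__ == "__main__":
    cost = 1.0  # hypothetical vacancy posting cost
    for profit in (2.5, 3.0):  # hypothetical match surplus before and after an AI improvement
        theta = tightness(profit, cost)
        print(
            f"surplus={profit:.1f}  theta={theta:.4f}  "
            f"f={firm_match_prob(theta):.4f}  alpha={job_finding_rate(theta):.4f}"
        )
```

With these assumed numbers, a higher match surplus raises tightness, lowers the firm's matching probability, and raises the worker's job-finding rate, which is the direction of the job creation channel described above.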

Disembodied technological change: The modeling assumption that AI quality improvements raise productivity in all existing matches, not just those formed after the improvement. This means future AI gains benefit matches formed today, increasing the incentive to create new matches and pushing in favor of the job creation channel.

Learning by using: The mechanism in Wang-Wong whereby AI quality (A_t) improves as a function of current aggregate employment (H_t) and the learning rate μ. Because AI learns from all output being produced, maintaining higher employment accelerates AI improvement, creating a motive that partially offsets the job destruction channel.

Employment-to-unemployment (EU) transition rate: The rate at which employed workers flow into unemployment in a given occupation, used by Braxton as the empirical measure of the job destruction channel during the computer episode. Measured from monthly CPS micro data.

Capitalization effect: The tendency for firms to post more vacancies today in anticipation of future productivity improvements, because the cost of posting is paid upfront while the benefits of better future AI accrue to the match going forward. Referenced by Braxton as relevant to understanding the job creation channel in Wang-Wong’s framework (citing Pissarides (2000), Chapter 3).

Comment on: Is it AI or data that drives market power?

Mon, 01 Jan 0001 00:00:00 +0000

This paper is a published comment by Miao Ben Zhang (USC Marshall School of Business) on Mihet, Rishabh, and Gomes (2025), “Is It AI or Data That Drives Market Power?” Zhang identifies three contributions of the commented paper and benchmarks each against the existing literature, offering targeted suggestions for strengthening the analysis.

The first contribution Zhang discusses is the commented paper’s distinction between raw data, AI capability, and processed data. Raw data is modeled as a by-product of production linearly related to firm size; processed data is modeled as the abundance of signals improving the precision of firms’ next-period productivity predictions. The commented paper’s key modeling innovation is a formula linking raw data (n_{i,t}), firm-level AI capability (z_i), and processed data (ñ_{i,t}): processed data equals a weighted sum of an information entropy effect, e^(-z_i) * (-n_{i,t} * ln(n_{i,t})), and an AI capability effect, (1 - e^(-z_i)) * n_{i,t} * e^(n_{i,t}). Zhang notes this formula implies that the marginal value of raw data can turn negative for firms with low AI capability, consistent with information-theoretic constraints from the rational inattention literature (Sims, 2003). Zhang requests more empirical support for this equation, specifically asking whether low-AI firms exhibit lower TFP than high-AI firms at similar data-intensity levels, and encouraging discussion of existing measures of data-processing ability such as human capital in data engineering and ML pipeline automation.

The second contribution is the commented paper’s modeling of a secondary market for trading processed data among firms. Zhang notes that facilitating processed data markets — for example via APIs or structured knowledge sharing — can, per the commented paper’s simulation and empirical analysis, democratize innovation and reduce market concentration, enabling even low-AI firms to compete. Zhang flags that the paper is silent on firm acquisition as an alternative channel for accessing processed data, arguing this omission is significant given that processed data, unlike ideas or technologies, is less portable and cannot be obtained simply by poaching skilled employees.

The third contribution is the commented paper’s empirical strategy. The commented paper constructs firm-level proxies for AI intensity and data intensity, then exploits two exogenous technological shocks — the advent of AWS cloud computing and transformer-based architectures — to identify causal effects of improvements in compute and processed data accessibility. The evidence shows that compute improvements disproportionately benefit data-rich firms, while processed data access disproportionately benefits low-AI firms. The central empirical message is that access to raw data tends to foster market concentration, whereas access to processed data tends to reduce market concentration. Zhang raises a measurement concern: the commented paper relies on firm-level Herfindahl-Hirschman Index (HHI) calculations based on time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang argues a positive effect on this HHI could reflect either genuine firm growth relative to competitors or reclassification of the firm into different, possibly more concentrated, sectors — making the HHI measure alone insufficient to support claims about product market concentration. Zhang recommends complementing this with industry-level concentration measures anchored to fixed baseline industry codes (FIC codes from Hoberg and Phillips, 2016), constructed at the FIC-year level, following the approach of Gutierrez and Philippon (2017) on industries’ growth and median Q.

No quantitative magnitudes from regressions or calibrations are reported in the comment itself, as this is a discussion piece rather than an original empirical paper.

Q: What are the three contributions of Mihet, Rishabh, and Gomes (2025) that Zhang identifies? A: First, the paper explicitly models the distinct roles of raw data, AI capability, and processed data, linking the information entropy literature to firm production. Second, it models a secondary market for trading processed data among firms, relevant for policy on data sharing platforms. Third, it empirically tests the model’s predictions using firm-level proxies and two exogenous technological shocks.

Q: What is the core formula linking raw data, AI capability, and processed data in the commented paper? A: Processed data (ñ_{i,t}) equals e^(-z_i) * (-n_{i,t} * ln(n_{i,t})) plus (1 - e^(-z_i)) * n_{i,t} * e^(n_{i,t}), where z_i is firm-level AI capability and n_{i,t} is raw data. The first term captures the information entropy effect (which can reduce or negate the value of raw data for low-AI firms) and the second captures the AI capability effect (where AI turns raw data into abundant useful signals).

Q: Why can the marginal value of raw data turn negative, according to the framework? A: Information-theoretic constraints — long studied through concepts like Shannon entropy and Sims’s rational inattention — imply that unprocessed raw data may harm rather than help firms that lack adequate processing capabilities. Zhang situates this in the broader macro-finance literature on information choice (Sims, 2003; Veldkamp, 2011).
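The sign change can be checked directly from the formula quoted above. The sketch below evaluates processed data and its derivative with respect to raw data; the derivative is obtained here by differentiating the quoted formula and is not reported in the comment, and the values n = 1, z = 0.01, and z = 2 are purely illustrative because the summary does not state the units of raw data.

```python
import math


def processed_data(n, z):
    """Processed data as a weighted sum of the entropy term and the AI capability term."""
    w = math.exp(-z)  # weight on the entropy term; shrinks as AI capability z rises
    entropy_term = w * (-n * math.log(n))
    ai_term = (1.0 - w) * n * math.exp(n)
    return entropy_term + ai_term


def marginal_value(n, z):
    """Derivative of processed_data with respect to raw data n (derived here by hand)."""
    w = math.exp(-z)
    return w * (-math.log(n) - 1.0) + (1.0 - w) * math.exp(n) * (1.0 + n)


if __name__ == "__main__":
    n = 1.0
    for z in (0.01, 2.0):  # low versus high AI capability (hypothetical values)
        print(f"z={z:4.2f}  processed={processed_data(n, z):8.4f}  marginal={marginal_value(n, z):8.4f}")
```

At these illustrative values the marginal value of raw data is negative for the low-capability firm and positive for the high-capability firm, matching the qualitative claim. In the limiting case z = 0 only the entropy term remains and the marginal value is negative whenever n exceeds 1/e.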

Q: What empirical suggestion does Zhang make regarding the raw data versus AI capability distinction? A: Zhang asks whether, in the commented paper’s sample of publicly-traded firms with measures of data intensity and AI intensity, low-AI firms exhibit lower TFP (following Imrohoroglu and Tuzel, 2012) than high-AI firms when controlling for similar levels of data intensity. Zhang also encourages discussion of anecdotal evidence for negative information entropy effects and of existing measures of data processing ability such as human capital in data engineering, annotation, cleaning, or ML pipeline automation (Abis and Veldkamp, 2024).

Q: What is the policy relevance of the secondary market for processed data? A: The commented paper’s simulation and empirical analysis shows that facilitating processed data markets (e.g., via APIs or structured knowledge sharing) can democratize innovation and reduce market concentration, enabling even low-AI firms to compete. This aligns with recent literature on secondary markets for structured data and foundation model outputs (Gans, 2018, 2024; Conti et al., 2023, 2024; Athey, 2019). Platforms may have incentives to restrict processed data access, potentially reinforcing incumbent power (Carballa Smichowski et al., 2023).

Q: What channel does Zhang argue the commented paper neglects in its analysis of market concentration? A: Zhang argues the paper is silent on firm acquisition as an alternative means by which firms access processed data, noting that processed data is less portable than ideas or technologies — it cannot be obtained simply by poaching a skilled employee. Zhang contends this acquisition channel appears central to the paper’s focus on market concentration and encourages the authors to include a discussion of it.

Q: What is the central empirical finding of the commented paper regarding raw versus processed data and market concentration? A: Access to raw data tends to foster market concentration, while access to processed data tends to reduce market concentration. The evidence shows that compute improvements (proxied by the AWS shock) disproportionately benefit data-rich firms, while processed data accessibility (proxied by the transformer architecture shock) disproportionately benefits low-AI firms, consistent with theoretical predictions.

Q: What is Zhang’s specific concern about the HHI measure used in the commented paper? A: The commented paper constructs firm-level HHI using time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang argues a positive effect on this HHI is ambiguous: it could reflect genuine firm growth relative to competitors or reclassification of the firm into different, possibly more concentrated, sectors. Zhang concludes that the HHI measure alone is not strong enough to support claims about product market concentration.
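The ambiguity Zhang describes can be illustrated with a small calculation. The sketch below computes an HHI as the sum of squared sales shares within a firm's peer set and compares two hypothetical scenarios: the focal firm grows against unchanged peers, or its sales stay fixed while a time-varying industry definition assigns it a different peer set. All sales figures and names are invented for illustration and are not drawn from the commented paper.

```python
def hhi(sales):
    """Herfindahl-Hirschman Index on a 0-1 scale: sum of squared sales shares."""
    total = sum(sales)
    return sum((s / total) ** 2 for s in sales)


# The focal firm is listed first in each peer set (hypothetical sales figures).
baseline = hhi([10, 10, 10, 10, 10])  # five equal-sized firms
growth = hhi([30, 10, 10, 10, 10])    # focal firm triples its sales, same peers
reclassified = hhi([10, 40, 10])      # focal sales unchanged, different peer set

if __name__ == "__main__":
    print(f"baseline     HHI = {baseline:.4f}")
    print(f"growth       HHI = {growth:.4f}")
    print(f"reclassified HHI = {reclassified:.4f}")
```

Both scenarios raise the firm-level HHI relative to the baseline, so a positive estimated effect cannot by itself distinguish genuine growth from reclassification. Holding the industry definition fixed removes the second possibility.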

Q: What robustness check does Zhang recommend for the empirical analysis? A: Zhang recommends constructing industry-level concentration measures at the FIC-year level using fixed baseline FIC codes from Hoberg and Phillips (2016), available at the Hoberg-Phillips Data Library. The authors could then analyze how industries with high versus low average or median AI intensity and data intensity respond to the two technological shocks in terms of concentration. Zhang cites Gutierrez and Philippon (2017) as an example of this approach and notes it would help distinguish within-industry dynamics from shifts in firm business focus, aligning with best practices from De Loecker, Eeckhout, and Unger (2020) on persistent market power.

Raw data: A by-product of firms’ production, modeled as linearly related to firm size; represents unprocessed observations that have not yet been transformed into useful signals. Distinguished from processed data, which is what actually improves productivity predictions.

Processed data: Modeled as the abundance of signals that improves the precision of firms’ predictions of their next-period productivity (following Farboodi and Veldkamp, 2022). Unlike ideas or technologies, processed data is less portable and cannot easily be transferred by poaching skilled employees.

AI capability (z_i): Firm-level ability to transform raw data into processed data. Firms with low AI capability may receive negative marginal value from additional raw data due to information entropy effects; firms with high AI capability extract large gains from the same raw data.

Information entropy effect: The component of the raw-to-processed-data transformation — e^(-z_i) * (-n_{i,t} * ln(n_{i,t})) — that captures the information-theoretic cost of possessing raw data without adequate processing capability. At low AI capability, this effect can reduce or negate the precision of signals.

Secondary market for processed data: A market in which firms trade processed data, modeled in the commented paper as a platform or API-based exchange. The commented paper’s analysis shows this market can democratize innovation and reduce market concentration by enabling low-AI firms to access processed data they cannot produce internally.

Firm-level HHI (text-based): Herfindahl-Hirschman Index calculated using time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang identifies a measurement ambiguity: a positive effect on this measure could reflect genuine competitive gains or reclassification into more concentrated sectors.
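
The measurement ambiguity Zhang describes can be made concrete with a small numerical sketch. The sales figures and the two-industry setup below are hypothetical, and the index is computed over a simple industry list rather than the text-based peer sets of Hoberg and Phillips (2016); the point is only that the HHI attached to a firm can rise either because the firm grows or because it is reassigned to a different industry.

```python
def hhi(sales):
    """Herfindahl-Hirschman Index: sum of squared market shares (0 to 1 scale)."""
    total = sum(sales)
    return sum((s / total) ** 2 for s in sales)

# Hypothetical sales figures, not taken from the paper.
industry_a = [40, 30, 30]   # the focal firm is the last entry
industry_b = [80, 20]

baseline = hhi(industry_a)
# Case 1: the focal firm doubles its sales and stays in industry A.
growth = hhi([40, 30, 60])
# Case 2: sales are unchanged, but the firm is reclassified into industry B.
reclassified = hhi(industry_b + [30])

print(f"baseline {baseline:.3f}, growth {growth:.3f}, reclassified {reclassified:.3f}")
```

In both cases the index attached to the focal firm is higher than at baseline, so a positive coefficient on this measure cannot by itself separate the two stories.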

Committed to flexible fiscal rules

Mon, 01 Jan 0001 00:00:00 +0000

A central debate in fiscal policy is whether fiscal rules—numerical constraints on budget deficits or debt levels—impair a government’s ability to respond to adverse economic shocks, creating a fundamental trade-off between debt stabilization and macroeconomic stabilization. This paper uses data on large, random natural disasters as exogenous shocks to address the endogeneity of rule adoption and provides new empirical and theoretical evidence on this trade-off. Contrary to the trade-off hypothesis, countries with fiscal rules perform significantly better following such disasters than countries without rules: GDP and private consumption are persistently higher, and fiscal policy is significantly more expansionary. The superior performance is shown to depend on the existence of prior fiscal space and the presence of escape clauses in the rules. A model of sovereign default with endogenous fiscal space and tax plans rationalizes these findings: tight rules prevent myopic governments from accumulating excessive debt in good times, which creates fiscal space for deficit spending when disasters strike, keeping sovereign spreads lower and enabling more expansionary fiscal responses.

In depth

Q1. What is the identification strategy and why do natural disasters solve the endogeneity problem?

Large natural disasters serve as a source of exogenous, random adverse economic shocks; by interacting disaster exposure with the presence or absence of fiscal rules, the paper identifies the effect of rules on macroeconomic performance without confounding from the non-random adoption of rules. Endogeneity is a central concern in the fiscal rules literature because countries that adopt rules may differ in politically or economically relevant ways from those that do not (e.g., more disciplined political environments, stronger institutions). Using large disasters as quasi-experimental variation removes this concern: the timing and magnitude of natural disasters are uncorrelated with which countries happened to adopt fiscal rules, isolating the effect of rules on crisis response.
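
The logic of the interaction design can be illustrated with a simulated cross-section. This is a minimal sketch, not the paper's specification: the variable names, coefficient values, and the assumption that an unobserved trait ("quality") shifts only the level of growth and not the disaster response are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Unobserved institutional quality drives both rule adoption and growth.
quality = rng.normal(size=n)
rule = (quality + rng.normal(size=n) > 0).astype(float)
# Disasters are drawn independently of everything else.
disaster = (rng.random(n) < 0.10).astype(float)

TRUE_DISASTER, TRUE_INTERACTION = -2.0, 1.0   # hypothetical values
# The rule itself has no direct effect on growth in this simulation.
growth = (0.5 * quality + TRUE_DISASTER * disaster
          + TRUE_INTERACTION * disaster * rule + rng.normal(size=n))

# OLS of growth on a constant, rule, disaster, and their interaction.
X = np.column_stack([np.ones(n), rule, disaster, disaster * rule])
coef, *_ = np.linalg.lstsq(X, growth, rcond=None)

print("rule (confounded by quality):", round(coef[1], 2))
print("disaster:", round(coef[2], 2))
print("disaster x rule:", round(coef[3], 2))
```

Under these assumptions the coefficient on the rule dummy is contaminated by the omitted trait, while the interaction term stays close to its true value because disaster timing is independent of adoption.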

Q2. What are the main empirical findings?

Over a quarterly panel covering 1970Q1–2018Q4, countries with fiscal rules show significantly higher output and private consumption following large natural disasters, and implement significantly more expansionary fiscal policy, than countries without rules. Confidence bands are reported at the 68% and 90% levels, based on 500 Monte Carlo draws. The result directly contradicts the commonly held view that fiscal rules restrict governments’ ability to respond to shocks. Moreover, the paper finds that the superior performance of rule-constrained countries is conditional on two features: the existence of fiscal space prior to the shock (low debt or deficit positions), and the presence of escape clauses that allow rules to be suspended during severe adverse events.

Q3. What is the model mechanism?

In the sovereign default model, a fiscal rule prevents a myopic government from over-borrowing in good times out of political economy considerations (e.g., electoral incentives to spend); this forced restraint creates fiscal space—lower debt, lower sovereign spreads—which allows the government to run deficits when a shock hits without triggering a default episode or a sharp rise in borrowing costs. The model predicts that, relative to a no-rule economy, when a disaster strikes in a rule-constrained economy: sovereign spreads spike by less, the fiscal policy response is more expansionary, and output and consumption are higher. Escape clauses in the rules are important: they allow the government to depart from the rule explicitly in crisis situations without destroying the credibility of the rule in normal times.

Q4. What is the policy implication for the COVID-19 fiscal response?

The paper’s findings directly address the suspension of fiscal rules during COVID-19: the theoretical and empirical results suggest that rules with escape clauses do not impair crisis response and may actually improve it, by ensuring fiscal space is available when needed. The paper’s evidence implies that the COVID-era suspension of rules in many countries (including the EU’s Stability and Growth Pact) was not necessarily required to enable expansionary fiscal responses: countries with well-designed rules that include escape clauses could have responded with expansionary policy while maintaining rule credibility.

Key concepts

escape clause : a provision in a fiscal rule that explicitly permits departure from the rule’s numerical target under defined circumstances (severe recessions, natural disasters, etc.); the paper finds that the presence of escape clauses is one of the two conditions for rule-constrained countries to outperform non-rule countries after adverse shocks.

fiscal space : the buffer of low debt and deficit levels that allows a government to increase spending or cut taxes during a shock without triggering unsustainable debt dynamics or elevated sovereign spreads; the paper shows fiscal space is created by rules in good times and consumed in bad times.

Competition and the Phillips curve

Mon, 01 Jan 0001 00:00:00 +0000

Fujiwara and Matsuyama ask whether the well-documented flattening of the New Keynesian Phillips curve (NKPC) and the concurrent rise in market concentration and markup rates are causally linked or merely coincidental. Under the canonical New Keynesian model with CES demand, competition is irrelevant to the Phillips curve regardless of whether entry is endogenous — concentration neither changes its slope nor affects inflation directly. This paper overturns that irrelevance result by extending the canonical model in two directions: (1) incorporating endogenous firm entry and exit following Bilbiie, Ghironi, and Melitz (2008) and Bilbiie, Fujiwara, and Ghironi (2014), and (2) replacing CES with the Homothetic Single Aggregator (HSA) demand system (Matsuyama and Ushchev 2017, 2020b), a flexible, tractable class of homothetic demand systems that nests CES and Translog as special cases.

The paper’s theoretical results depend on two of Marshall’s laws of demand. The Second law states that the price elasticity of demand rises with the firm’s own price; the Third law states that the rate of increase in that elasticity falls with price. Together these conditions imply that the markup rate and pass-through rate are endogenous to the competitive environment.

The main findings, delivered under both Rotemberg (1982) and Calvo (1983) pricing, are that higher entry costs — leading to market concentration — cause Phillips curve flattening through two distinct, complementary channels:

Structural (steady-state) effect. Under Rotemberg pricing, the slope of the NKPC equals (zeta(z) - 1) / chi, with chi the Rotemberg price adjustment cost parameter, and is therefore increasing in the price elasticity zeta(z); market concentration reduces z, hence reduces zeta(z) under the Second law, directly flattening the curve. Under Calvo pricing, the slope is positively related to the pass-through rate rho(z); the Third law implies that concentration reduces rho(z), again flattening the curve. The Calvo–Rotemberg equivalence, which holds under CES to first order (Roberts 1995), breaks down under HSA: each pricing mechanism highlights a different channel.
Observational (omitted variable bias) effect. Endogenous entry generates an endogenous cost-push shock through strategic complementarity in price setting. Because the number of firms N_t is omitted from a naive regression of inflation on real marginal cost, and because N_t is positively correlated with the marginal cost under the Second law, the omitted variable bias is negative — the estimated slope is biased downward. This bias is amplified with greater concentration under the Third law (Rotemberg case) and under both the Second and Third laws (Calvo case).

Quantitatively, the paper simulates under three parametric HSA families — CES, Translog, and Co-PaTh (Constant Pass-Through). De Loecker, Eeckhout, and Unger (2020) document that aggregate markups rose from 21% above marginal cost to 61% — a rise of approximately 40 percentage points. The authors’ simulations imply this increase corresponds to an entry cost roughly 3.5 times higher under Translog and roughly 2.5 times higher under Co-PaTh with pass-through rate rho = 0.5. Under these parameterizations, the accompanying market concentration can halve the slope of the NKPC. Impulse responses confirm that the responses of inflation to both technology shocks and monetary policy shocks become smaller as market concentration deepens.

Scope conditions: results require departure from CES (the Second and/or Third law must hold); endogenous entry is necessary for the dynamic cost-push channel; the structural flattening requires only the Second law under Rotemberg but additionally the Third law under Calvo; the omitted variable bias requires the Second law under Rotemberg and both laws under Calvo. The model is closed-economy, with symmetric monopolistic competition and Rotemberg or Calvo price adjustment.

Q1: What is the irrelevance result the paper overturns, and why does CES produce it? Under CES, the market share function takes the form s(z) = gamma * z^(1-theta), yielding a constant price elasticity zeta = theta and a pass-through rate rho = 1, regardless of the number of firms or entry costs. As a result, concentration neither alters the slope of the NKPC nor generates any endogenous cost-push shock; competition is simply irrelevant to inflation dynamics. This irrelevance holds even with endogenous entry under CES.
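
The irrelevance result can be verified in a few lines. The sketch below assumes the usual definition of the price elasticity implied by a share function, zeta(z) = 1 - d ln s(z) / d ln z (demand for a variety is its expenditure share divided by its price, with the aggregator A_t held fixed); that definition is not spelled out in this summary. The pass-through formula and the cost-push coefficient are the ones stated elsewhere in this summary.

```latex
\begin{align*}
s(z) &= \gamma z^{1-\theta}
  &&\Rightarrow\quad \zeta(z) = 1 - \frac{d\ln s(z)}{d\ln z} = 1 - (1-\theta) = \theta, \\
\mu(z) &= \frac{\zeta(z)}{\zeta(z)-1} = \frac{\theta}{\theta-1}
  &&\Rightarrow\quad \frac{d\ln \mu(z)}{d\ln z} = 0, \\
\rho(z) &= \left[1 - \frac{d\ln \mu(z)}{d\ln z}\right]^{-1} = 1
  &&\Rightarrow\quad \frac{1}{\chi}\,\frac{1-\rho(z)}{\rho(z)} = 0 .
\end{align*}
```

Neither zeta nor rho depends on z, so a change in the number of firms can neither move the slope nor add a cost-push term.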

Q2: What is the Homothetic Single Aggregator (HSA) and why is it used? HSA is a class of homothetic demand systems, originally proposed by Matsuyama and Ushchev (2017), in which the market share of each intermediate input variety depends solely on its own price normalized by a single price aggregator A_t. This single aggregator serves as a sufficient statistic summarizing all competitive pressure effects on pricing behavior, including the markup rate and pass-through rate. HSA nests CES and Translog as special cases, is analytically tractable (equilibrium existence and uniqueness are straightforward to ensure with endogenous entry), and is flexible enough to accommodate both the Second and Third laws of demand.

Q3: What are Marshall’s Second and Third laws as defined in the paper? The Second law states that the price elasticity of demand zeta(z) is increasing in the normalized price z (equivalently, decreasing in the single price aggregator A_t, which rises with fewer firms and thereby lowers z). The Third law, as defined by Matsuyama and Ushchev (2023b), states that the rate of increase in the price elasticity is decreasing in z. Together they ensure that both markup rates and pass-through rates respond systematically to changes in competitive pressure.

Q4: How does market concentration structurally flatten the NKPC under Rotemberg pricing? Under Rotemberg pricing, the slope of the NKPC equals (zeta(z) - 1) / chi, where chi is the Rotemberg price adjustment cost parameter. Higher entry costs reduce the equilibrium number of firms, which reduces competitive pressure and lowers z. Under the Second law, lower z reduces zeta(z), directly shrinking the slope coefficient. This is the steady-state effect of concentration: the structural slope of the curve declines because the price elasticity falls.
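
A small numerical sketch may help fix ideas. It assumes the elasticity definition zeta(z) = 1 - d ln s(z) / d ln z and a symmetric equilibrium in which each of N firms has market share 1/N. The log-linear share function is an illustrative choice made here because it satisfies the Second law; it is meant to resemble the Translog case but is not taken from the paper, and all parameter values are hypothetical.

```python
import math

GAMMA, Z_BAR, THETA, CHI = 1.0, 1.0, 5.0, 50.0  # hypothetical parameter values

def elasticity(share, z, h=1e-6):
    """zeta(z) = 1 - dln s / dln z, using a central difference in logs."""
    dlns = (math.log(share(z * math.exp(h)))
            - math.log(share(z * math.exp(-h)))) / (2 * h)
    return 1.0 - dlns

def rotemberg_slope(share, z, chi):
    """NKPC slope under Rotemberg pricing: (zeta(z) - 1) / chi."""
    return (elasticity(share, z) - 1.0) / chi

def ces_share(z):
    # CES benchmark; the constant 0.2 does not affect the elasticity.
    return 0.2 * z ** (1.0 - THETA)

def log_share(z):
    # Illustrative Second-law share function, valid for z < Z_BAR.
    return -GAMMA * math.log(z / Z_BAR)

def z_for_n_firms(n):
    # Symmetric equilibrium: solve log_share(z) = 1 / n for z.
    return Z_BAR * math.exp(-1.0 / (GAMMA * n))

for n in (20, 10, 5):
    z = z_for_n_firms(n)
    print(n, round(z, 3),
          round(rotemberg_slope(log_share, z, CHI), 4),
          round(rotemberg_slope(ces_share, z, CHI), 4))
```

With this share function zeta(z) - 1 equals GAMMA times N, so halving the number of firms halves the slope, while the CES slope stays at (THETA - 1) / CHI whatever the value of z.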

Q5: How does market concentration structurally flatten the NKPC under Calvo pricing? Under Calvo pricing, the slope of the NKPC is positively related to the pass-through rate rho(z) rather than the price elasticity. The Third law implies that lower z (more concentration) reduces rho(z). Market concentration therefore causes structural flattening through the pass-through channel under Calvo. This is why the Calvo–Rotemberg equivalence — which holds to first order under CES — breaks down under HSA: Rotemberg highlights the Second law / price elasticity channel and Calvo highlights the Third law / pass-through channel.

Q6: What is the endogenous cost-push shock and how does it arise? When the number of operating firms N_t changes endogenously, it alters the single price aggregator A_t and therefore the competitive environment facing each firm. Under the Second law, firms exhibit strategic complementarity in price setting: a firm reduces its markup when other firms lower their prices (A_t falls with more entry). Consequently, movements in N_t directly enter the NKPC as an additional term — (1/chi) * (1 - rho(z)) / rho(z) * N_hat_t — acting as an endogenous cost-push shock. This channel is absent under CES because rho = 1 makes the coefficient zero.

Q7: How does the endogenous cost-push shock create a negative omitted variable bias? A naive regression of inflation on real marginal cost omits the N_hat_t term. Under the Second law, N_t is positively correlated with the marginal cost (more entry drives markups down, and a lower markup corresponds to a higher real marginal cost), so the omitted variable N_hat_t is positively correlated with the included regressor. Because the true coefficient on N_hat_t in the NKPC is negative, omitting it biases the estimated slope on marginal cost downward (negative omitted variable bias). The estimated relationship between inflation and marginal cost is therefore weaker than the true structural relationship.
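
The bias can be reproduced in a stylized simulation. The sketch below is a static regression that ignores the forward-looking expected-inflation term of the NKPC, and every number in it (the slope KAPPA, the negative coefficient C_N on the number of firms, the correlation between regressors) is a hypothetical value chosen for illustration, not an estimate from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20_000
KAPPA, C_N = 0.10, -0.05   # hypothetical true slope and (negative) coefficient on N_hat

mc = rng.normal(size=T)                         # real marginal cost (log deviation)
n_hat = 0.6 * mc + 0.8 * rng.normal(size=T)     # number of firms, positively correlated with mc
infl = KAPPA * mc + C_N * n_hat + 0.01 * rng.normal(size=T)

def ols(y, regressors):
    """OLS coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

short_slope = ols(infl, mc)[1]                              # omits n_hat
long_slope = ols(infl, np.column_stack([mc, n_hat]))[1]     # includes n_hat

print("slope omitting N_hat: ", round(short_slope, 3))
print("slope including N_hat:", round(long_slope, 3))
```

The short regression converges to KAPPA + C_N × 0.6 = 0.07 rather than 0.10, which is the textbook omitted-variable formula: true slope plus the omitted coefficient times the regression coefficient of the omitted variable on the included one.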

Q8: How is the omitted variable bias amplified by concentration? Under the Third law (Rotemberg case) and under both the Second and Third laws (Calvo case), greater market concentration amplifies the magnitude of this negative bias. The intuition is that higher concentration makes the pass-through rate rho(z) smaller, which increases the magnitude of the coefficient on N_hat_t in the NKPC and thereby raises the magnitude of the bias when N_hat_t is omitted. Greater concentration thus generates both more structural flattening and more observational flattening simultaneously.
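
The dependence of the cost-push coefficient on the pass-through rate can be checked directly from the (1 - rho) / rho factor in the term stated in Q6. A short worked calculation in magnitude terms, with illustrative values of rho that are not taken from the paper:

```latex
\frac{d}{d\rho}\left[\frac{1-\rho}{\rho}\right] = -\frac{1}{\rho^{2}} < 0,
\qquad
\left.\frac{1-\rho}{\rho}\right|_{\rho=0.8} = 0.25,
\qquad
\left.\frac{1-\rho}{\rho}\right|_{\rho=0.5} = 1 .
```

A fall in pass-through from 0.8 to 0.5 therefore quadruples this factor, and with it the weight on the omitted term.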

Q9: What are the quantitative magnitudes of Phillips curve flattening in the simulations? De Loecker, Eeckhout, and Unger (2020) document that aggregate markups rose from 21% above marginal cost to 61% — approximately 40 percentage points. The paper’s simulations imply this corresponds to an entry cost increase of roughly 3.5 times under Translog and roughly 2.5 times under Co-PaTh with rho = 0.5. According to Figure 2, the accompanying market concentration can halve the slope of the NKPC. The slope declines more steeply for demand systems with smaller pass-through rates (rho further from 1).

Q10: How do impulse responses change with market concentration? As entry costs rise (deeper concentration), the responses of the inflation rate to both technology shocks and monetary policy shocks become smaller in magnitude. Under the Second law, a positive technology shock increases the number of firms through a wealth effect, but strategic complementarity in price setting reduces markups, muting the inflation response relative to CES. The dynamic effect of endogenous entry thus weakens the transmission of real economic shocks to inflation — a supply side effect of monetary policy that parallels Baqaee, Farhi, and Sangani (2021) but operates through firm entry rather than the misallocation channel.

Q11: What is the cyclicality of the markup rate under HSA, and why is it ambiguous? Under CES with flexible prices, the markup is constant. Under CES with sticky prices, the markup is procyclical (marginal cost falls with a positive technology shock but the price is rigid in the short run). Under the Second law with flexible prices, a positive technology shock increases firm entry, which reduces markups, making the markup countercyclical. In a sticky price equilibrium under the Second and Third laws, the cyclicality is therefore ambiguous: it depends on the tension between nominal rigidities (pushing toward procyclicality) and the pass-through rate (pushing toward countercyclicality).

Q12: Why do the three price indices in the model differ, and which is used for the NKPC? The model features three aggregate price measures: the final goods price (CPI) P_t, which captures productivity effects of entry; the single price aggregator A_t, which captures competitive effects of entry and is the reference price for firms; and the average price index (PPI) p_t, which is not affected by entry effects and is the measured price index. Because entry effects shift P_t and A_t in ways that are not directly observed, the paper evaluates NKPC responsiveness in terms of p_t (PPI inflation), the measurable index.

Q13: How does this paper relate to Wang and Werning (2022) and Baqaee, Farhi, and Sangani (2021)? Wang and Werning (2022) use a dynamic oligopoly model with exogenous entry and CES/Kimball demand, showing that higher concentration amplifies real effects of monetary policy and generates inflation persistence and endogenous cost-push shocks. Baqaee, Farhi, and Sangani (2021) use monopolistic competition with exogenous entry and Kimball demand under Calvo pricing, showing flattening through real rigidities and a misallocation channel (supply side effects of monetary policy). This paper uses monopolistic competition with endogenous entry and HSA under both Rotemberg and Calvo pricing; it produces supply side effects through firm entry rather than misallocation, and uses HSA rather than Kimball because HSA more readily guarantees equilibrium uniqueness with endogenous entry.

Q14: What parametric families of HSA are used in simulations and what are their properties? Three families are used: CES (constant price elasticity theta, pass-through rho = 1, benchmark); Translog (satisfies the Second law, variable markups and pass-through); and Co-PaTh or Constant Pass-Through (proposed by Matsuyama and Ushchev 2020a, constant pass-through rate rho in (0,1) under flexible prices, containing CES as a limit as rho approaches 1). For Calvo pricing, a fourth family — PEM (Power Elasticity of Markup, proposed by Matsuyama and Ushchev 2023b) — is used; PEM satisfies the Third law in its strong form and contains Co-PaTh as a limit case. Translog is noted to behave similarly to Co-PaTh with rho = 0.5.

Q15: What are the policy implications for central banks? Rising market concentration, by flattening the NKPC both structurally and observationally, reduces the effectiveness of monetary policy in achieving price stability through real economic activity — consistent with the concerns expressed by Federal Reserve officials (Clarida, Daly, Williams) quoted in the paper. The results suggest that empirical estimates of the NKPC slope that omit endogenous entry dynamics will be systematically biased downward, potentially leading central banks to underestimate the true structural responsiveness of inflation to demand conditions. Competition policy and barriers to entry thus have macroeconomic consequences beyond standard allocative efficiency considerations.

Homothetic Single Aggregator (HSA): A class of homothetic demand systems in which the market share of each input variety depends solely on its own price normalized by a single price aggregator A_t, which serves as a sufficient statistic for all competitive pressure effects on firm pricing behavior including the markup rate and pass-through rate. Nests CES and Translog as special cases.

Marshall’s Second Law of Demand (as used in the paper): The condition that the price elasticity of demand zeta(z) is strictly increasing in the firm’s normalized price z. Under this condition, markup rates and pass-through rates vary endogenously with competitive pressure, and strategic complementarity in price setting arises.

Marshall’s Third Law of Demand (as used in the paper): The condition, defined by Matsuyama and Ushchev (2023b), that the rate of increase in the price elasticity is decreasing in z. This law determines how the pass-through rate responds to concentration changes and is the relevant condition for structural flattening under Calvo pricing.

Pass-through rate rho(z): The fraction of a cost change that a monopolistically competitive firm passes through to its price under flexible pricing, defined as rho(z) = [1 - dln(zeta/(zeta-1))/dln(z)]^(-1). Under CES, rho = 1 (complete pass-through); under the Second law, rho < 1 (incomplete pass-through); it declines with concentration under the Third law.
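
The definition can be evaluated numerically for any elasticity function. The sketch below is an illustration under stated assumptions: the second elasticity function corresponds to a log-linear share function chosen here only because it satisfies the Second law (it is not claimed to satisfy the Third law and is not taken from the paper), and THETA and Z_BAR are hypothetical values.

```python
import math

THETA, Z_BAR = 5.0, 1.0   # hypothetical parameter values

def pass_through(zeta, z, h=1e-5):
    """rho(z) = [1 - dln(zeta/(zeta-1)) / dln z]^(-1), by central difference in logs."""
    def log_markup(x):
        return math.log(zeta(x) / (zeta(x) - 1.0))
    slope = (log_markup(z * math.exp(h)) - log_markup(z * math.exp(-h))) / (2 * h)
    return 1.0 / (1.0 - slope)

def zeta_ces(z):
    # Constant elasticity: the markup does not move with z.
    return THETA

def zeta_log(z):
    # Elasticity implied by the illustrative share function s(z) = -ln(z / Z_BAR), z < Z_BAR.
    return 1.0 + 1.0 / math.log(Z_BAR / z)

z = math.exp(-0.1)
print("CES pass-through:        ", round(pass_through(zeta_ces, z), 4))
print("Second-law pass-through: ", round(pass_through(zeta_log, z), 4))
```

The CES case returns complete pass-through, while the Second-law example returns a value strictly below one, because the markup falls as the firm's normalized price rises.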

Endogenous cost-push shock: The direct effect of changes in the endogenous number of firms N_t on inflation in the NKPC, arising from strategic complementarity in price setting under HSA. This term is absent under CES (where the coefficient is zero) and generates an omitted variable bias in naive regressions of inflation on marginal cost.

Steady-state (structural) flattening: The reduction in the true structural slope of the NKPC caused by market concentration operating through lower price elasticity (Rotemberg channel) or lower pass-through rate (Calvo channel). This is the first of the paper’s two reasons for observed Phillips curve flattening.

Observational (omitted variable bias) flattening: The downward bias in empirically estimated NKPC slopes arising because naive regressions omit the endogenous cost-push shock term. The bias is negative and is amplified by greater market concentration under the Third law and/or Second law depending on the pricing mechanism.

Complete Pass-Through in Levels

Mon, 01 Jan 0001 00:00:00 +0000

Consistent Evidence on Duration Dependence of Price Changes

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question. This paper asks two related questions. First, can one develop a robust, distribution-free estimator for the discrete-time mixed proportional hazard (MPH) model of duration with unobserved heterogeneity? Second, what does that estimator reveal about the shape of the hazard of price changes, the role of heterogeneity in shaping aggregate price dynamics, and the distinction between regular price changes and sales?

Methodology. The authors develop a linear generalized method of moments (GMM) estimator for the discrete-time MPH model, building on identification results in Honoré (1993). The model specifies that the probability a price spell ends at duration t, conditional on surviving to t, equals the product of a product-specific frailty parameter θ (unobserved, fixed over time) and a common baseline hazard b_t. The estimator exploits repeated price spells per product via moment conditions that are linear in b_t, making estimation and inference straightforward. It accommodates right- and left-censored data, competing risks, and spell-specific observable characteristics, without requiring any parametric assumption on the frailty distribution. The estimator is consistent as the number of products grows, even with a short time dimension. A Hansen-Sargan J-test of overidentifying restrictions and a test of the monotone-average-type prediction are also developed.

The estimator is applied to two datasets: (1) IRI weekly store data (2001–2011), covering 30 product categories and more than 21 million products, yielding 684,919,778 pairs of durations; and (2) Online Micro Price data from Cavallo (2018), comprising approximately 250,000 products at daily frequency.

Main Findings with Quantitative Magnitudes.

Baseline hazard and heterogeneity. In the pooled IRI data, the Kaplan-Meier hazard is steeply declining throughout the entire range from 2 to 60 weeks. In contrast, the estimated baseline hazard is roughly constant until week 4 and then declines only modestly, with a noticeable spike at week 52. The ratio of the Kaplan-Meier hazard to the baseline hazard — the average type, E[θ|t] — drops by approximately 60 percent within the first 20 weeks, and continues to decline, reaching roughly 0.3 of its initial value after one year. This decomposition reveals substantial unobserved heterogeneity that accounts for a large fraction of the observed decline in the Kaplan-Meier hazard.

Implications for structural models. The finding of a decreasing baseline hazard is inconsistent with canonical state-dependent pricing models (Golosov and Lucas, 2007), which predict an increasing hazard, conditional on a given firm’s type. The decreasing baseline hazard is instead broadly consistent with time-dependent pricing models, though not with a constant-hazard (Calvo, 1983) specification.

Monetary policy impulse response. In a calibrated time-dependent pricing model with a strategic complementarity parameter α ∈ {0, 0.5, 0.95} (α = 0 meaning no complementarity), the aggregate price level dynamics in the estimated heterogeneous-firm MPH economy are close to those of a homogeneous-firm economy that uses the Kaplan-Meier hazard as the common price-change hazard. The homogeneous-firm approximation is substantially closer to the MPH economy than a Taylor (1979, 1980) staggered-contract economy with the same Kaplan-Meier hazard, particularly when strategic complementarity is strong (α = 0.95). The Calvo economy provides a poor approximation due to its exponential (constant-speed) price convergence structure.

Regular versus temporary price changes. Using the competing-risks extension with spell-specific observables, which classifies spells by whether they start and end with a price increase (+) or decrease (−), the authors separately estimate four baseline hazards. The baseline hazard for consecutive price increases (b_t^{++}) is relatively flat through about week 45, especially over the first 6 weeks, with a spike near one year, consistent with price-plan models. The baseline hazard for reversals (particularly b_t^{−+}, price decreases followed by price increases, associated with sales) is steeply declining. The J-test statistics are substantially lower for price trends (J++ = 3,920; J−− = 3,401) than for reversals (J+− = 8,737; J−+ = 7,910), and markedly lower than the pooled-model J = 10,498, indicating that the MPH structure fits regular price changes considerably better than sales.

Scope Conditions. Results are conditional on weekly store-level price data for mostly packaged consumer goods (30 IRI product categories). The analysis focuses on price spells of at least 2 weeks to avoid spurious duration-one spells from mid-week price changes. The maximum duration examined is 60 weeks. The comparison of estimation methods relies on the IRI data only; the Online Micro Price data confirm weekly decision-making through a spike in the daily hazard every 7 days. Comparisons with maximum likelihood estimates show that GMM recovers more heterogeneity (average type declines to 0.37 at 6 months by GMM versus 0.48 by continuous-time MLE), and that time aggregation explains most of the discrepancy between the two methods.

In depth

Q1. What is the mixed proportional hazard (MPH) model as used in this paper, and what does the estimator identify?

A1. The MPH model specifies that the hazard that a price spell ends at duration t, conditional on surviving to t, equals θ·b_t, where θ is a product-specific frailty parameter drawn from an unknown distribution G and b_t is a baseline hazard common to all products. The estimator, which is linear in b_t, identifies the baseline hazard up to a multiplicative constant using moment conditions derived from repeated spell data, without restricting the shape of the frailty distribution. Identification relies on comparing the joint survival probabilities of two consecutive spells for the same product and exploits the symmetry implied by the MPH structure across spells.
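
One way to see how such linear moment conditions can arise is the following derivation. It is a sketch under the added assumption that two spells T_1 and T_2 of the same product are independent conditional on θ; it illustrates the symmetry argument and is not necessarily the exact set of moments used in the paper.

```latex
\begin{align*}
S_\theta(t) &\equiv \prod_{u<t} (1 - \theta b_u), \qquad
\Pr(T \ge t \mid \theta) = S_\theta(t), \qquad
\Pr(T = t \mid \theta) = \theta\, b_t\, S_\theta(t), \\
\Pr(T_1 = s,\; T_2 \ge t) &= b_s \, \mathbb{E}_G\!\left[\theta\, S_\theta(s)\, S_\theta(t)\right], \\
\Pr(T_1 \ge s,\; T_2 = t) &= b_t \, \mathbb{E}_G\!\left[\theta\, S_\theta(s)\, S_\theta(t)\right], \\
\Rightarrow\quad 0 &= b_t \Pr(T_1 = s,\; T_2 \ge t) \;-\; b_s \Pr(T_1 \ge s,\; T_2 = t).
\end{align*}
```

The unknown expectation over G cancels, the condition is linear in the baseline hazards, and multiplying every b_t by a common constant leaves it unchanged, which is why the baseline hazard is identified only up to scale.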

Q2. How does the Kaplan-Meier hazard relate to the baseline hazard, and what does this relationship imply about heterogeneity?

A2. The paper proves that the Kaplan-Meier hazard H_t equals b_t times E[θ|t], the mean frailty among spells surviving to duration t. Because higher-type products (those with a higher propensity to change prices) exit the pool of surviving spells earlier, E[θ|t] is strictly decreasing in t, a form of dynamic selection. The ratio H_t/b_t, normalized to 1 at the start, falls to approximately 0.4 by week 20 in the pooled IRI data and to approximately 0.3 after one year, documenting that a large share of the decline in the Kaplan-Meier hazard reflects heterogeneity rather than structural negative duration dependence.
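
Dynamic selection can be seen in a two-type example with a flat baseline hazard. The frailty values, population weights, and baseline level below are hypothetical and chosen only so that θ·b_t stays below one; the calculation is exact rather than simulated.

```python
def pooled_hazard(thetas, weights, baseline):
    """Return [(H_t, E[theta | survived to t])] for each duration t."""
    survival = [1.0] * len(thetas)          # share of each type still at risk
    path = []
    for b in baseline:
        mass = sum(w * s for w, s in zip(weights, survival))
        mean_theta = sum(w * s * th
                         for w, s, th in zip(weights, survival, thetas)) / mass
        path.append((b * mean_theta, mean_theta))   # H_t = b_t * E[theta | t]
        # Each type exits at its own hazard theta * b_t.
        survival = [s * (1.0 - th * b) for s, th in zip(survival, thetas)]
    return path

# Hypothetical values: two equally likely types and a constant baseline hazard.
path = pooled_hazard(thetas=[0.5, 1.5], weights=[0.5, 0.5], baseline=[0.3] * 8)
for t, (h, m) in enumerate(path, start=1):
    print(f"t={t}  H_t={h:.3f}  E[theta|t]={m:.3f}")
```

Although every product faces the same flat baseline hazard, the pooled hazard declines with duration because the high-θ type leaves the surviving pool faster.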

Q3. What does the estimated baseline hazard imply about structural models of price setting?

A3. A decreasing baseline hazard is inconsistent with the canonical state-dependent model of Golosov and Lucas (2007), in which a firm’s hazard of price change is increasing in the time since the last change, because larger deviations from the desired price accumulate with duration. The decreasing baseline hazard is instead consistent with time-dependent pricing models and with price-plan models where within-plan switches are costless. The mild spike at week 52 in the baseline hazard is consistent with Taylor-type annual pricing rules.

Q4. What is the approximate aggregation result for monetary policy, and how quantitatively accurate is it?

A4. In the time-dependent pricing model without strategic complementarity (α = 0), the impulse response of the aggregate price level to a monetary shock in a heterogeneous-firm economy is exactly the same as in a homogeneous-firm economy whose single firm uses the Kaplan-Meier survival function. This extends Carvalho and Schwartzman (2015) to an approximation in the case with strategic complementarity (α = 0.5 and α = 0.95). Numerically, the path of aggregate prices in the estimated MPH economy is close to that in the homogeneous-firm Kaplan-Meier economy, and substantially closer to it than to the Taylor-contract economy — the difference is most pronounced at horizons beyond about half a year when α = 0.95, where the Taylor economy shows notably slower initial convergence and faster later convergence relative to the MPH and homogeneous economies.

Q5. How do the paper’s results differ from those obtained using maximum likelihood estimation of the continuous-time MPH model?

A5. The GMM estimator recovers substantially more heterogeneity than maximum likelihood (MLE) applied to the continuous-time model with continuous records (assumed gamma frailty). The average type falls from 1 to 0.37 at six months under GMM, versus only 0.48 under MLE. The authors investigate two sources of this discrepancy: the assumed frailty distribution family (gamma) and time aggregation. They conclude that time aggregation is quantitatively more important in the IRI weekly data — that is, the continuous-time MLE approach fails to properly account for the discrete nature of the data-generating process, leading it to understate heterogeneity and recover a steeper baseline hazard.

Q6. How does the paper distinguish regular price changes from sales without directly observing a sales flag?

A6. The competing-risks extension classifies each spell by whether it starts with a price increase or decrease (observable characteristic χ ∈ {+, −}) and by whether it ends with a price increase or decrease (competing risk ρ ∈ {+, −}). Price trends — spells where the direction is the same at both the start and end (++ or −−) — are interpreted as regular price changes; price reversals (especially −+, i.e., price decrease followed by increase) are associated with sales. This approach is consistent with the statistical model used for estimation, avoids the bias from simply dropping suspected sales spells before estimation, and allows the MPH structure to hold only for the risks of interest even if it fails for others.

Q7. How well does the MPH model fit regular price changes versus sales?

A7. The J-test of overidentifying restrictions yields test statistics of J++ = 3,920 for consecutive price increases and J−− = 3,401 for consecutive price decreases, compared with J = 10,498 for the pooled model and J+− = 8,737 and J−+ = 7,910 for the reversal hazards. All of these statistics exceed the 5% critical value of 1,749, so the MPH restrictions are rejected at conventional significance levels in every case, but the rejection is substantially milder for price trends than for price reversals. For individual product categories, the model cannot be rejected for 8 categories (out of 30) for b++ and 21 categories for b−−, suggesting the MPH structure is a much better description of regular price changes than of sales.

Q8. What role do one-week price spells play in the data, and why are they excluded?

A8. In the IRI data, prices are measured as the ratio of weekly revenue to quantity, so a price change occurring mid-week generates a spurious price spell of duration one week. If all spells including one-week spells are retained, the autocorrelation of spell durations is only 0.029 in levels and even negative (−0.042) in logs, which is inconsistent with a mixture model. Once one-week spells are excluded, the autocorrelation rises to 0.235 in levels and 0.233 in logs, and is stable when two-week spells are also excluded (0.248 and 0.256). The paper therefore sets the lower duration bound at T̲ = 2 weeks.

Q9. What does the daily Online Micro Price data add relative to the weekly IRI data?

A9. The daily data reveal a sharp spike in the price-change hazard every seven days, suggesting that even when prices are observed daily, the decision to change prices is made at the weekly frequency. This justifies the use of a discrete-time model with a one-week period. The estimates from daily and weekly aggregations of the same data are broadly similar, though weekly data recovers somewhat less heterogeneity than daily data. Aggregating IRI weekly data to monthly frequency understates heterogeneity even more, confirming that frequency matters for measuring heterogeneity.

Q10. What are the computational advantages of the GMM estimator relative to maximum likelihood?

A10. Because the moment conditions are linear in the baseline hazard bt, the GMM estimator is obtained in closed form, making estimation fast and inference straightforward. On the pooled IRI sample, GMM estimation (including standard errors) required 70 minutes on a machine with 60 GB memory, whereas the maximum likelihood estimator required 15 hours on a machine with 256 GB memory and failed entirely on the 60 GB machine. The GMM approach also avoids the need to specify the frailty distribution family and guarantees a global solution (proved by the identification result), whereas the likelihood function is non-linear in bt and may have multiple local maxima.

Q11. What is the shape of the b++ baseline hazard for regular price increases, and what models does it support?

A11. The baseline hazard for spells starting and ending with a price increase (b++) is decreasing during the first 6 weeks — dropping by almost 50% — and then flat until approximately week 45, with a pronounced spike at around one year. This shape is consistent with price-plan models (Eichenbaum, Jaimovich, and Rebelo, 2011) with Calvo-type switching between plans, where within-plan changes are costless and the hazard of between-plan switching is approximately constant. The annual spike is consistent with Taylor-type pricing. Approximately 76.8% of complete spells starting after a price increase last at most 6 weeks.

Key Concepts

Baseline hazard (bt). The component of the MPH hazard that is common to all products and may vary arbitrarily with elapsed duration t. It represents structural duration dependence — the tendency for a given product to be more or less likely to change price as a function of how long its current spell has lasted — net of heterogeneity. It is identified only up to a multiplicative constant.

Frailty parameter (θ) / frailty distribution (G). The product-specific scaling factor in the MPH model, fixed over all spells for a given product, that captures permanent unobserved differences in price-change frequency across products. The paper treats G as a nuisance parameter and does not require a parametric assumption on its shape. A higher θ means the product has a higher baseline propensity to change its price.

Average type (E[θ|t]). The mean frailty parameter among spells that have survived to at least duration t. Because high-type products change price earlier and exit the pool of surviving spells first, the average type is provably strictly decreasing in t under the MPH model. It is measured as the ratio of the Kaplan-Meier hazard to the baseline hazard, and its rate of decline measures the importance of dynamic selection.

Kaplan-Meier hazard (Ht). The probability that a randomly drawn spell ends at duration t, conditional on having lasted at least t periods. It mixes together structural duration dependence (captured by bt) and dynamic selection (captured by changes in the average type). It can be estimated without imposing the MPH structure, requiring only stationarity of the duration process.

Competing risks. The framework in which a price spell can end for multiple distinct reasons — here, ending with a price increase or a price decrease — each with its own hazard function. The paper’s GMM approach allows the MPH structure to hold for only a subset of risks and observables, without imposing any structure on the remaining risks.

Price trends vs. price reversals. A classification of spells based on the direction of the surrounding price changes. Price trends are spells where the direction of the price change at the start and end of the spell is the same (++ or −−), interpreted as regular price changes. Price reversals are spells where the direction switches (e.g., −+, a price decrease followed by a price increase), associated with sales and other temporary price changes.

Strategic complementarity in pricing (α). The degree to which a firm’s target price responds to the average price set by other firms. Parameterized by α ∈ [0, 1), where α = 0 yields the exact aggregation result (only the Kaplan-Meier hazard matters) and higher α increases aggregate price stickiness by making firms reluctant to deviate from the average price when few others are adjusting.

Dynamic selection. The mechanism by which the composition of the pool of surviving price spells shifts toward lower-type (more price-sticky) products as duration increases, because higher-type products change price sooner and exit the pool. This is the source of the gap between the steeply declining Kaplan-Meier hazard and the more modestly declining baseline hazard.

Costs of Financing U.S. Federal Debt Under a Gold Standard: 1791-1933

Overview

This paper constructs a new dataset of US federal bond prices and uses it to estimate the full term structure of yields on gold-denominated US federal debt from 1791 to 1933 — the entire gold standard era. The core research question is how the costs of financing US federal debt evolved over this period and what monetary, fiscal, and financial policy changes drove that evolution, with the ultimate aim of understanding how the US built fiscal capacity and transformed its debt from a “junk bond” into a global “safe asset.”

Data and Methodology. The authors compile monthly prices, quantities, and descriptions of all US Treasury securities from 1776 to 1960 (the Hall et al. 2018 dataset). Bonds with less than one year to maturity are excluded from the main estimation due to liquidity premia. The primary estimation uses a Dynamic Nelson-Siegel (DNS) model with stochastic volatility (Diebold and Li 2006; Hautsch and Yang 2012), estimated by Bayesian MCMC. A key methodological innovation is the addition of bond-specific idiosyncratic pricing errors (Assumption 3), which allows the authors to include bonds with heterogeneous contract features — call options, indefinite maturities, conversion features — that characterize 19th-century US debt without either dropping them from the sample or having their idiosyncrasies distort the common yield curve. The data are “big” in the time-series dimension but sparse in the maturity (cross-sectional) dimension, frequently offering fewer than five price observations per month; the DNS framework pools information across time to address this sparsity.

For the greenback period (1862–1878), the authors extend the approach by modeling the greenback yield curve as a function of the gold yield curve and a time-varying VAR model of exchange rate expectations (Assumptions 4–5). Only nine greenback-denominated bonds exist in the sample, most of them short-term; the VAR is estimated jointly using exchange rate data and the relative prices of greenback and gold bonds.

Main Findings.

Long-run decline in yields. The 10-year gold-denominated zero-coupon yield fell from approximately 8% in 1800 to approximately 2% in 1900, consistent with the secular decline in yields worldwide, but the trajectory stabilized near 2% after 1900, suggesting that US debt began to play a distinctive “safe-asset” role from the turn of the 20th century.
War spikes were much larger than previously understood. The paper’s estimate of the 10-year gold yield reaches a peak of approximately 16% near the end of the Civil War. This is substantially higher than the Homer and Sylla (2004) peak of 6% at the start of the war. The discrepancy arises because Homer and Sylla used bonds trading at par — which did not exist during the Civil War — while this paper uses the full universe of bonds at monthly frequency.
Yield curve slope switched sign. The term spread (10-year minus 2-year gold yield) was typically negative before the Civil War (inverted yield curve) and turned persistently positive afterward. The authors link this switch to a change in long-run inflation predictability: inflation was relatively hard to forecast before the Civil War and easier to forecast after, consistent with a negative inflation-risk premium in the pre-war period.
Default risk premium disappeared around 1905. Comparing hypothetical gold-denominated US consols to UK consols (the 19th-century benchmark safe asset), US yields were persistently above UK yields until approximately 1905, when US yields fell below UK yields. This indicates that US federal debt acquired safe-asset characteristics well before World War I, foreshadowing the shift in global reserve asset status during and after Bretton Woods.
Nominal anchor during the Civil War. Despite a 60% depreciation of the greenback against gold during the Civil War (100 greenback dollars could be purchased for as few as 40 gold dollars in summer 1864), investors expected greenbacks to eventually return to gold parity. Estimated long-run exchange rate expectations remained anchored at one-for-one parity throughout the period. This kept greenback-denominated bond yields flat at approximately 6% — bonds traded around par — explaining the “Civil War yield puzzle” noted by Friedman and Schwartz (1963).
Short-rate disconnect. Short-maturity government bonds (less than one year) traded with a premium of approximately 0.25 to 0.5 percentage points relative to model-implied yields throughout most of the 19th century, reflecting scarcity of money-like assets. This premium effectively disappeared from the 1880s until World War I — coinciding with the National Banking Era — and then reappeared in the 1920s after the Federal Reserve created a secondary market for Certificates of Indebtedness.

In depth

Q1. Why does the paper restrict estimation to bonds with maturity greater than one year?

Short-maturity Treasury notes exhibited particularly large estimated bond-specific pricing errors in preliminary analysis, which the authors attribute to a liquidity premium: short-term government debt was used for transactions and thus commanded a money-like premium that a common discount function cannot accommodate. To keep this liquidity premium from distorting estimates of the longer end of the curve, these bonds are excluded from the main estimation. Short-maturity bonds are then studied separately as an “out-of-sample” exercise (the short-rate disconnect).

Q2. How does the Dynamic Nelson-Siegel model with stochastic volatility solve the cross-sectional sparsity problem?

The DNS model parameterizes the entire yield curve at each date using only three latent factors — level (L), slope (S), and curvature (C) — which follow a driftless random walk. The stochastic volatility component, captured in the covariance matrix Σt, governs how much information is pooled across adjacent time periods. When Σt → 0, the yield curve is assumed constant (full pooling); when Σt → ∞, estimates are date-by-date (no pooling). By allowing Σt to vary, the model pools more heavily in sparse periods and less during wars when yields change rapidly. The companion paper (Payne et al. 2023a) confirms via information criteria that stochastic volatility and correlated shocks improve fit without overfitting.
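For readers unfamiliar with the three-factor parameterization, here is a minimal sketch of the static Nelson-Siegel curve in the Diebold-Li form, which maps level, slope, and curvature into a yield at any maturity. The decay parameter `lam` and the factor values are hypothetical, and the sketch omits the random-walk dynamics, stochastic volatility, and Bayesian estimation that the paper adds.

```python
import math

def nelson_siegel_yield(tau, level, slope, curvature, lam=0.6):
    """Zero-coupon yield at maturity tau (years) in the Diebold-Li form.

    lam is a hypothetical decay parameter chosen only for illustration.
    """
    x = lam * tau
    slope_loading = -math.expm1(-x) / x        # (1 - e^{-x}) / x
    curvature_loading = slope_loading - math.exp(-x)
    return level + slope * slope_loading + curvature * curvature_loading

# Hypothetical factors: a positive slope factor raises short yields
# relative to long yields, i.e. an inverted curve.
level, slope, curvature = 0.06, 0.02, 0.0
y2 = nelson_siegel_yield(2.0, level, slope, curvature)
y10 = nelson_siegel_yield(10.0, level, slope, curvature)
print(round(y2, 4), round(y10, 4), round(y10 - y2, 4))
```

Because three numbers pin down the whole curve at each date, a month with only a handful of bond prices can still inform every maturity, which is the sense in which the parameterization copes with cross-sectional sparsity.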

Q3. What is the bond-specific pricing error and why is it essential for historical data?

Assumption 3 adds to each bond i a Gaussian pricing error with mean zero and bond-specific standard deviation σ(i)_m (scaled by Macaulay duration to approximate yield-space errors). This allows bonds with idiosyncratic contract features — call options, conversion clauses, ambiguous payment currency — to inform the common yield curve without unduly distorting it. Bonds with larger σ(i)_m receive less weight in estimation. In modern datasets, researchers pre-select homogeneous bonds and use time-specific pricing errors; the historical sparsity prevents that approach here.

Q4. How large were Civil War yields compared to prior estimates, and why does the discrepancy arise?

The paper’s posterior median for the 10-year gold zero-coupon yield peaks at approximately 16% near the end of the Civil War. Homer and Sylla (2004) report a peak of 6% at the start of the war. The discrepancy arises because Homer and Sylla used bonds trading close to par, but during the Civil War no federal bonds traded at par in gold terms. In summer 1864, when Lincoln’s re-election was uncertain, 100 greenback dollars could be purchased for 40 gold dollars, so 6% coupon bonds were priced at 40% of par, implying yields in excess of 15%. This paper uses the full universe of Treasury bonds at monthly frequency and allows all bonds, regardless of trading price, to inform the yield curve.
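The arithmetic behind "yields in excess of 15%" can be made explicit. A 6% coupon bond priced at 40% of par has a current yield of 6/40 = 15%, and any bond that is eventually redeemed at par yields more than that. The sketch below assumes a hypothetical 10-year bond with annual coupons (the maturity and coupon timing are illustrative, not taken from the paper) and solves for yield to maturity by bisection.

```python
def bond_price(ytm, coupon, face, years):
    """Present value of annual coupons plus redemption at face value."""
    coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return coupons + face / (1 + ytm) ** years

def yield_to_maturity(price, coupon, face, years, lo=0.0, hi=1.0):
    """Bisection on the yield; price is decreasing in the yield."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if bond_price(mid, coupon, face, years) > price:
            lo = mid          # model price too high, so the yield must be higher
        else:
            hi = mid
    return (lo + hi) / 2

current_yield = 6.0 / 40.0                              # coupon over price
ytm_10y = yield_to_maturity(40.0, 6.0, 100.0, 10)       # hypothetical 10-year bond
print(round(current_yield, 4), round(ytm_10y, 4))
```

The yield to maturity exceeds the current yield because the holder also earns the capital gain from 40 back to par.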

Q5. When did US debt cease to carry a default risk premium relative to UK debt, and how is this measured?

The authors compare yields-to-maturity on gold-denominated UK consols to those on hypothetical gold-denominated US consols promising the same coupon flows. Because both countries were on a gold standard for most of the period and UK consols were the 19th-century safe asset, the spread is interpreted as a risk premium on US debt. US yields fell below UK yields persistently after approximately 1905, indicating that US debt was priced as a safe asset well before World War I. US yields were temporarily close to UK yields in the 1820s but the spread re-widened after the Jacksonian era, state defaults in the 1840s, and the Civil War. The spread closed only after Civil War disruptions resolved, the National Banking System matured, and gold-greenback parity was restored in 1879.

Q6. What is the “nominal anchor” finding during the greenback era, and what econometric method uncovers it?

During 1862–1878, the federal government issued non-convertible greenback dollars alongside gold bonds. The greenback depreciated substantially (to 40 cents per gold dollar in 1864), yet greenback-paying bonds traded near par, implying greenback yields near 6%. The authors model the greenback yield curve as a product of the gold discount function and a “multiplier” z(j)_t capturing the expected future gold-to-greenback exchange rate at each horizon j (Assumption 4). The exchange rate expectations are estimated via a time-varying VAR(2) model of the gold-to-greenback and gold-to-goods exchange rates (Assumption 5), jointly constrained by the prices of greenback bonds via an interest-rate parity condition. The resulting estimates show that throughout the greenback era — even during large wartime depreciations — investors’ long-run expectations of the exchange rate remained anchored near gold parity, consistent with anticipated eventual resumption.
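A back-of-the-envelope parity calculation illustrates why near-par greenback bonds imply anchored expectations. This is not the paper's VAR: it simply assumes that a greenback bond and a gold bond must earn the same expected gold return over n years, so the expected future gold price of a greenback is S_n = S_0 × ((1 + y_gold) / (1 + y_greenback))^n. The inputs are the round numbers quoted in this summary (spot value 0.40, gold yield about 16%, greenback yield about 6%), which need not refer to the same month.

```python
def implied_future_gold_price(spot, gold_yield, greenback_yield, years):
    """Expected gold price of one greenback under a simple parity condition.

    Illustrative only: assumes flat yields and no risk premium.
    """
    return spot * ((1 + gold_yield) / (1 + greenback_yield)) ** years

implied_10y = implied_future_gold_price(0.40, 0.16, 0.06, 10)
annual_appreciation = (1 + 0.16) / (1 + 0.06) - 1
print(round(annual_appreciation, 4), round(implied_10y, 3))
```

Under these assumptions the 10-point yield gap corresponds to expected appreciation of roughly 9% per year, enough to carry the greenback from 40 cents back to approximately one gold dollar over a decade.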

Q7. How did political events affect exchange rate expectations during and after the Civil War?

The time-varying VAR captures shifts in exchange rate expectations associated with identifiable political events. Grant’s victory in 1869 (which resolved uncertainty about whether debts would be honored in gold) coincided with an increase in the price of greenbacks, a decrease in expected greenback appreciation, and a closing of the gap between greenback and gold 10-year yields. In the early 1870s, following the Panic of 1873 and uncertainty about resumption, investors came to expect that gold-greenback discrepancies would persist almost indefinitely, causing gold and greenback yields to converge. The Resumption Act of January 1875 then shifted 2-year and 10-year expectations back toward parity.

Q8. What is the short-rate disconnect and what does it reveal about the National Banking Era?

The short-rate disconnect is the difference between observed yields-to-maturity for bonds with less than one year to maturity and the yields-to-maturity implied by the model estimated on bonds with more than one year maturity. A positive disconnect means short-maturity bonds yielded less than long-maturity bonds conditional on the model — indicating a liquidity premium on short-term debt. The authors find a persistent premium of 0.25 to 0.5 percentage points through most of the 19th century, reflecting scarcity of money-like assets when state bank notes circulated at variable discounts. The premium disappeared from approximately the 1880s to World War I, coinciding with the mature National Banking Era after greenback-gold parity was restored in January 1879. The authors interpret this as evidence that the National Banking Acts (1862–1866), which allowed National Banks to issue standardized bank notes backed by long-term US government bonds, ultimately succeeded in supplying liquid assets and equalizing the pricing of short- and long-term federal debt — but only after the currency risk from the greenback period had been resolved.

Q9. How does the composite long-term yield series (Officer-Williamson / Homer-Sylla) distort historical narratives?

The composite series combines Homer and Sylla US federal yields (1798–1861), New England Municipal bond yields (1862–1899), and corporate bond yields (1900–1940). The paper shows that this composite series substantially underestimates the increase in US federal borrowing costs during Civil War deficits (peak of 6% vs. this paper’s 16%) and overstates post-Civil War borrowing costs by mixing in riskier private obligations. The authors argue that earlier findings of no strong association between 19th-century interest costs and deficits (Evans 1985, 1987) may reflect the composite series’ failure to accurately capture federal borrowing costs during large deficit episodes.

Q10. How did the yield curve slope change after the Civil War and what explains it?

The term spread (10-year minus 2-year gold yield) was typically negative before the Civil War and positive after the late 1870s. Major wars caused sharp temporary decreases (inversions). The authors connect the sign switch to a change in long-run inflation dynamics documented in a companion paper (Payne et al. 2023b): long-run inflation was hard to predict before the Civil War and easier to predict after. This suggests that gold bonds provided a better inflation hedge in the pre-war period and so carried a negative inflation-risk premium, which in asset pricing theory produces a downward-sloping yield curve. After the Civil War, as inflation became more predictable, the inflation-risk premium became positive and the yield curve turned upward-sloping.

Q11. What did the National Banking Acts seek to do and was the puzzle of bank note under-issuance resolved?

The National Banking Acts (1862, 1863, 1865, 1866) authorized federally chartered banks to issue bank notes up to 90% of the par or market value of eligible US Treasury bonds deposited as collateral, subject to a 1% annual tax on notes outstanding (0.5% after 1900), compared to a 10% tax on state bank notes. The intended goals were to increase the supply of short-term liquid assets and to increase bank demand for long-term federal debt, thereby lowering long-term yields and eliminating the short-rate disconnect. A long-standing puzzle (Friedman-Schwartz, Cagan, Champ, Calomiris-Mason) held that yields on eligible Treasuries did not fall enough to equal the note tax rate, implying under-issuance. The paper’s analysis of the short-rate disconnect offers a resolution: if one focuses on the disconnect rather than the yield-tax spread, the National Banking Acts appear to have largely achieved their goals by the 1880s — but only after greenback-gold parity was restored, suggesting that currency devaluation risk had initially restrained bank note issuance, as hypothesized by Cagan (1965).

Key Concepts

Dynamic Nelson-Siegel (DNS) model with stochastic volatility: A parametric yield curve model (Diebold-Li 2006) parameterizing zero-coupon yields at each date as a function of three latent factors — level (L), slope (S), curvature (C) — following a driftless random walk. The paper extends this with time-varying shock volatilities (stochastic volatility) to allow the degree of information pooling across time periods to vary with institutional and wartime disruptions. Used here to handle cross-sectional sparsity in historical bond data.

Bond-specific pricing error: A Gaussian pricing error with bond-specific standard deviation σ(i)_m (scaled by Macaulay duration) added to each bond’s observed price. Allows bonds with heterogeneous and idiosyncratic contract features (call options, conversion clauses) to inform a common discount function without distorting it, by automatically down-weighting “peculiar” bonds through higher estimated σ(i)_m.

Short-rate disconnect (liquidity premium): The systematic difference between observed yields-to-maturity on bonds with less than one year to maturity and yields implied by a pricing kernel fitted on bonds with more than one year to maturity. Interpreted as a money-like convenience yield (liquidity premium) on short-term debt: when money-like assets are scarce, short-term bonds are overpriced (lower yields) relative to the term structure implied by longer maturities. Measured here as an out-of-sample fit residual from the DNS model.

Denomination risk: The risk that the unit of account in which bond payments are promised may change in value relative to gold. During the greenback era (1862–1878), bonds denominated in greenbacks carried denomination risk because greenbacks could depreciate against gold. The paper distinguishes denomination risk from default risk by estimating separate gold and greenback yield curves and modeling exchange rate expectations.

Nominal anchor: The phenomenon in which long-run market expectations of the gold-to-greenback exchange rate remained anchored near gold parity (one-for-one) even during large short-run depreciations during the Civil War. Inferred from the observation that greenback-denominated bonds traded near par (yield ~6%) while the spot greenback depreciated by up to 60% against gold, implying investors anticipated eventual full appreciation.

Default risk premium (US-UK yield spread): The difference between yields on hypothetical gold-denominated US consols and yields on UK consols. Since both were on a gold standard (so inflation expectations are similar), and UK consols were the 19th-century benchmark safe asset, the spread is interpreted as the compensation investors demanded for the risk that the US might default or alter payment terms. Persistently positive until approximately 1905, then became negative.

Convenience yield: An implicit yield that accrues to holders of money-like or safe assets because of their use in transactions or as collateral. In this paper, it emerges as the spread between yields on US federal bonds and other low-risk bonds in the late 19th century, reflecting increased demand for Treasuries as reserves under the National Banking System. It is identified historically through the disappearance of the short-rate disconnect during the National Banking Era.

Customer accumulation, returns to scale, and secular trends

This paper asks how rising returns to scale in production contributed to three concurrent U.S. secular trends since 1980: declining business dynamism, rising markups, and growing firm expenditures on customer acquisition. The author constructs a firm dynamics model in the Hopenhayn (1992) tradition with endogenous entry and exit, heterogeneous markups, and customer accumulation grounded in directed search in the product market. Firms compete for customers through both prices and selling activities; larger firms gain a competitive edge when returns to scale rise because their marginal costs fall more than those of smaller firms—even though the technological shift is uniform across firms. This demand-based channel triggers winners-and-losers dynamics and the rise of superstar firms.

The empirical foundation rests on Compustat data for U.S. publicly traded firms (1977–2014) and Business Dynamics Statistics (BDS) for aggregate and sector-level dynamism measures. Production-function estimation using Ackerberg, Caves, and Frazer (2015) augmented with sales-share controls documents that aggregate returns to scale rose from approximately 1.0 in 1980 to approximately 1.05 by 2014—a within-sector increase, not a reallocation effect. Over the same period, the cost-weighted markup rose by 42%, the firm entry rate fell by 33%, the excess reallocation rate fell by 29%, and selling costs relative to production costs rose by 60%–90% depending on the measure used.

The model is calibrated to 1980 steady-state moments (firm life-cycle patterns, markups, entry and reallocation rates). A 5% increase in returns to scale, matching the empirical estimate, accounts for: a 15% rise in the average cost-weighted markup (vs. 42% in the data); a 33% decline in the entry rate (exactly matching the data); a 21% decline in the reallocation rate (vs. 29% in the data); and a 23% increase in selling costs relative to production costs (vs. 60%–90% in the data). The model also generates a 53% rise in the share of firms aged 11 years or older (vs. 50% in the data) and a 58% decline in the employment share of firms aged 5 years or younger (vs. 56% in the data), closely tracking the aging of the U.S. firm population. Firm-level responsiveness to productivity shocks declines by 0.08 in the model, versus about 0.01 in Compustat and 0.09 in Decker et al. (2020).

Sector-level panel regressions with sector fixed effects confirm the model’s directional predictions: within-sector increases in returns to scale are associated with lower entry rates (coefficient −2.89, significant at 1%), lower reallocation rates (−1.16, significant at 1%), higher markups (+3.15, significant at 1%), and higher selling costs relative to production costs (+1.85 for the advertising-based measure; +8.52 for adjusted SG&A).

A key scope condition is that the model yields a constrained-efficient allocation: directed search and full internalization of returns to scale imply decentralized equilibrium efficiency, making the paper a laboratory for assessing how far efficient firm responses to technological change can explain the secular trends without invoking market failures. The model fits the post-2000 transition dynamics better than the 1980s–1990s period, and explains a substantial but incomplete share of the trends, suggesting complementary—possibly inefficient—forces also contributed.

Q: What is the core mechanism through which rising returns to scale generate winners-and-losers dynamics?

A: The marginal cost of production under increasing returns to scale (alpha > 1) is MC(z,n) = l(n,z)^(1−alpha) × (1/alpha) × (W/e^z), which depends on firm size l(n,z). A uniform rise in alpha rotates the marginal cost schedule clockwise by firm size: larger firms see a proportionally larger cost reduction than smaller firms, even though the technological change is identical across all firms. Because firms compete for the same pool of customers, this asymmetric cost advantage allows large firms to offer lower prices while sustaining higher margins, attracting customers away from small firms. The result is a demand-based channel that generates winners-and-losers dynamics and increases market concentration.
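The size dependence of the cost reduction can be seen by evaluating the marginal cost formula above directly. In the sketch below the firm sizes, wage, and productivity are hypothetical, and size is held fixed when alpha changes, so it illustrates the rotation of the schedule rather than the model's equilibrium response.

```python
import math

def marginal_cost(l, z, alpha, W=1.0):
    """MC(z, n) = l^(1 - alpha) * (1 / alpha) * (W / e^z), with l the firm's size."""
    return l ** (1 - alpha) * (1 / alpha) * (W / math.exp(z))

# Hypothetical small and large firms with the same productivity z = 0.
for size in (10.0, 1000.0):
    before = marginal_cost(size, 0.0, 1.00)
    after = marginal_cost(size, 0.0, 1.05)
    print(size, round(after / before, 3))

ratio_small = marginal_cost(10.0, 0.0, 1.05) / marginal_cost(10.0, 0.0, 1.00)
ratio_large = marginal_cost(1000.0, 0.0, 1.05) / marginal_cost(1000.0, 0.0, 1.00)
```

At alpha = 1 marginal cost does not depend on size at all; at alpha = 1.05 the same technological shift cuts the large firm's marginal cost by a bigger proportion than the small firm's.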

Q: How does the model capture customer accumulation, and why is it central to the paper’s argument?

A: The model introduces directed search in the product market, where firms post advertisements and customers, including those already matched with a firm, choose which submarket to enter by trading off offered utility against matching probability. A constant-returns-to-scale matching function governs match creation; in a submarket with tightness theta, customers match with probability m(theta) = theta(1+theta)^(−1) and firms attract customers with probability q(theta) = (1+theta)^(−1). The customer accumulation motive creates an investment-harvest trade-off: firms can either post high promised utility (low prices) to grow their customer base or extract surplus through high prices. Rising returns to scale amplify large firms’ ability to resolve this trade-off favorably, linking the technological change directly to markup dynamics, entry incentives, and selling expenditures.
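The two matching probabilities are linked by the constant-returns property m(theta) = theta × q(theta). A short sketch, using arbitrary tightness values chosen only for illustration, shows the trade-off: as tightness rises, customers match more easily and each firm attracts a customer less often.

```python
def customer_match_prob(theta):
    """m(theta) = theta / (1 + theta)."""
    return theta / (1 + theta)

def firm_match_prob(theta):
    """q(theta) = 1 / (1 + theta)."""
    return 1 / (1 + theta)

for theta in (0.25, 1.0, 4.0):         # hypothetical tightness values
    m, q = customer_match_prob(theta), firm_match_prob(theta)
    print(theta, round(m, 3), round(q, 3), round(theta * q, 3))
```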

Q: What is the directed search framework’s role in ensuring equilibrium uniqueness and efficiency?

A: The author introduces firm-side commitment contracts—specifying price, separation probability, and continuation utility contingent on productivity realizations—combined with directed search. Because search is directed on both sides and firms fully internalize returns to scale, the decentralized equilibrium is constrained-efficient. This delivers uniquely determined heterogeneous prices in equilibrium (solving the indeterminacy problem common in customer-market models) and establishes the paper’s efficient-mechanism benchmark: it tests how far profit-maximizing firm responses to technological change—without any market failure—can account for the secular trends.

Q: How are prices structured in the model, and what life-cycle pattern do they generate?

A: Each firm charges two distinct prices in each period: one to incumbent customers (the same for all incumbents, since they are identical conditional on being attached to the same firm) and one to newly acquired customers (which varies based on the promised utility in the submarket searched). Firms that are expanding their customer base offer greater promised utility and therefore charge lower prices to attract customers; firms harvesting their existing base charge higher prices. Because firms enter small and grow, this dynamic generates a price life cycle: young firms invest via low prices and mature firms harvest through higher prices, which the model reproduces as a rising markup pattern over the firm life cycle—an untargeted moment the model fits well.

Q: What does the calibration target and what untargeted moments does the model reproduce?

A: The model is calibrated to 1980 using: the number of employees of entrant firms (pinning entry customer base n_e), employees of age-5 firms (pinning convex cost chi_1), share of firms aged 11+ years (pinning chi_2), average firm size (operating cost f), entry rate (entry cost kappa), excess reallocation rate (exit shock delta), and average cost-weighted markup (linear cost c). Untargeted moments reproduced include: a sales-weighted markup of 0.28 (vs. 0.25 in De Loecker et al. 2020), endogenous customer turnover of approximately 9% (vs. 15% in Gourio and Rudanko 2014), and an elasticity of customer base shrinkage to price of 0.08 (within the 0.01–0.16 range from Paciello et al. 2019). The model also matches markup and selling-cost life-cycle patterns that are typically overlooked.

Q: How large is the quantitative contribution of the 5% rise in returns to scale to each secular trend?

A: Comparing the 1980 steady state (alpha = 1) to the 2014 steady state (alpha = 1.05): the average cost-weighted markup rises by 15% in the model versus 42% in the data; the entry rate declines by 33% in the model, exactly matching the data; the reallocation rate declines by 21% in the model versus 29% in the data; and selling costs relative to production costs rise by 23% in the model versus 60%–90% in the data. The model thus explains a substantial share of each trend while leaving a residual requiring additional mechanisms.
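
One rough way to read these figures is as the share of each data trend that the model reproduces. The sketch below simply divides the model change by the data change; this ratio is an illustrative summary and not necessarily a statistic reported in the paper, and the variable names are assumptions. The 33% data value for the entry rate is inferred from the statement that the model matches the data exactly.

```python
# (model change in %, low end of data change in %, high end of data change in %)
trends = {
    "cost-weighted markup rise": (15.0, 42.0, 42.0),
    "entry rate decline": (33.0, 33.0, 33.0),
    "reallocation rate decline": (21.0, 29.0, 29.0),
    "selling/production cost ratio rise": (23.0, 60.0, 90.0),
}

def share_explained(model_change, data_change):
    """Model change as a fraction of the data change."""
    return model_change / data_change

shares = {}
for name, (model, data_lo, data_hi) in trends.items():
    # A larger data change implies a smaller explained share,
    # so the high end of the data range gives the lower bound.
    shares[name] = (share_explained(model, data_hi), share_explained(model, data_lo))
    lo, hi = shares[name]
    print(f"{name}: {lo:.0%} to {hi:.0%} of the data change")
```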

Q: How does the model explain the aging of U.S. firms, and how well does it match the data?

A: The winners-and-losers mechanism shifts activity toward larger, older firms, which mechanically ages the firm population. The model generates a 53% increase in the share of firms aged 11 years or older (vs. 50% in the data) and a 58% decline in the employment share of firms aged 5 years or younger (vs. 56% in the data). This aging arises because rising returns to scale increase the cost of customer acquisition, acting as a barrier to entry that disproportionately hurts new, small firms while allowing large incumbents to remain viable at lower productivity thresholds.

Q: What is the channel through which rising returns to scale reduce business dynamism specifically?

A: The unequal reduction in marginal costs intensifies competition for customers and raises customer acquisition costs. This operates through two simultaneous effects on the exit threshold: (i) lower marginal costs allow large firms to remain viable at lower productivity levels despite higher customer acquisition costs; and (ii) heightened competition forces smaller firms to require higher productivity to survive in a market that has become increasingly costly to operate in. Higher customer acquisition costs therefore function as an endogenous barrier to entry, reducing the entry rate and the reallocation of resources across firms.

Q: Does the model attribute the secular trends entirely to efficient firm behavior, and what does it conclude about residual explanations?

A: No. The model is explicitly designed as a constrained-efficient benchmark, and the paper finds that while rising returns to scale account for a substantial share of the trends—particularly in magnitude—the transition dynamics show a less accurate fit before the 2000s. The author concludes that complementary mechanisms, likely involving inefficiencies (such as market power from horizontal product differentiation or barriers to entry beyond those captured by the model), played a significant role in the earlier evolution of these trends and in the portion of the trends not explained by the efficient channel.

Q: What evidence supports the rising returns to scale finding, and what are its limitations?

A: Production-function estimation using the Ackerberg-Caves-Frazer method with sales-share controls on Compustat data shows returns to scale rising from approximately 1.0 in 1980 to approximately 1.05 by 2014, driven primarily by within-sector increases rather than reallocation toward high-returns sectors. A translog production function finds limited evidence of heterogeneous increases across firm sizes within Compustat. However, Compustat predominantly covers large publicly traded firms; smaller firms outside the sample may have experienced minimal or no increase in returns to scale. If technology adoption involves fixed costs, the aggregate impact could be larger than estimated, meaning the quantitative exercises likely represent a conservative lower bound.

Q: How does the paper relate to and extend the directed search literature in product markets?

A: The paper builds on Gourio and Rudanko (2014) and Roldan-Blanco and Gilbukh (2020), where customers are locked in once matched, by introducing labor-search tools from Schaal (2017) to allow: (i) incumbent customer switching between firms at rates of 10%–25% annually (Gourio and Rudanko 2014), and (ii) a non-zero price sensitivity of incumbent customers (Paciello et al. 2019). It also allows firms to invest in demand through selling expenditures, which prior directed search models in product markets typically abstracted from, making it possible to study how technological changes affect customer reallocation and firms’ cost structures jointly.

Customer capital: The stock of customers a firm has accumulated through prior selling and pricing decisions; treated as a state variable that firms invest in (by offering low prices and spending on advertisements) or harvest from (by charging high markups), with a customer turnover rate estimated at 10%–25% annually in the literature.

Directed search in the product market: A market structure in which both firms and customers choose which submarket (indexed by the promised utility level) to enter, trading off match probability against terms; delivers constrained-efficient equilibrium and uniquely determined heterogeneous prices.

Investment-harvest trade-off: The firm’s dynamic choice between offering high promised utility (low prices, low current markups) to grow the customer base versus extracting surplus through high prices from an existing customer base; shaped by the firm’s current size, productivity, and the cost structure implied by returns to scale.

Returns to scale (alpha): The curvature of the production function y = e^z × l^alpha; equals 1.0 under constant returns and approximately 1.05 by 2014 in the empirical estimates; the paper’s central technological change parameter, whose rise disproportionately reduces marginal costs for larger firms.

Winners-and-losers dynamics: The reallocation of customers and market share from small to large firms triggered by the asymmetric cost advantage large firms obtain when returns to scale rise; the demand-based channel through which superstar firms emerge.

Cost-weighted markup: The average markup aggregated using each firm’s costs as weights, as opposed to sales-weighted markup; the primary measure of market power used in the paper, rising by 42% in the data between 1980 and 2014.

Constrained-efficient allocation: An equilibrium outcome in which, given the frictions present (search-and-matching in the product market), no social planner operating under the same constraints could improve welfare; the paper uses this as a benchmark to assess how far efficient firm responses explain secular trends without invoking market failures.

Selling costs relative to production costs: The ratio of customer acquisition expenditures (advertising or adjusted SG&A) to cost of goods sold; rose by 60%–90% in the data between 1980 and 2014 and by 23% in the model’s steady-state comparison.

Default Options and Retirement Saving Dynamics

Overview

Research question. Does automatic enrollment (auto-enrollment) in retirement savings plans increase lifetime wealth accumulation and welfare? The prior literature established large short-run participation effects but had not traced the policy’s consequences over a full working life.

Data. The paper draws on two primary sources. First, a proprietary panel of 401(k) administrative records from nearly 600 U.S. firms, covering 159,216 first-year employees across 86 firms (for the “increasing default” fact) and 6,415 employees across 34 firms (for structural estimation), observed between December 2006 and December 2017. Second, 12 successive waves (2006–2017) of the U.K. Annual Survey of Hours and Earnings (ASHE), a 1% nationally representative panel of approximately 200,000 private-sector employees per year, including 37,120 job-switchers, used to exploit the phased rollout of the U.K. Pension Act of 2008.

Methodology. The paper proceeds in three steps. (1) Three empirical stylized facts are documented using quasi-experimental variation (comparing employees hired before versus after changes in the default contribution rate within the same firm, and exploiting the staggered employer-size-based rollout of U.K. auto-enrollment). (2) A structural lifecycle model is estimated via the Method of Simulated Moments, using three preference parameters—intertemporal discount factor (δ), elasticity of intertemporal substitution (σ), and opt-out cost (k)—identified from the within-firm default variation in 34 U.S. firms. (3) The estimated model is used for out-of-sample validation and counterfactual welfare analysis.

Three stylized facts.

Fact I — Increasing the default reduces participation. Among 159,216 first-year employees in 86 auto-enrollment firms, each percentage-point increase in the default contribution rate reduces 401(k) participation by approximately 1 percentage point and increases the share contributing strictly below the initial default by approximately 1 percentage point. When the default rose from 3% to 6%, workers were 3.2 percentage points more likely to contribute at 1% or 2% of salary. This “drop-out” pattern is consistent with an opt-out cost model but is inconsistent with loss-aversion and psychological-anchoring theories, both of which predict that raising the default should weakly reduce low-end contributions.

Fact II — Non-autoenrolled workers catch up within three years. In the estimation sample of 34 U.S. firms offering a 50% match up to 6% and an auto-enrollment default of 3%, median cumulative employee 401(k) contributions of non-autoenrolled workers equal those of autoenrolled workers after three years of tenure. Because non-autoenrolled workers compensate for initial non-participation by contributing more later—earning similar cumulative employer match and tax benefits over the full three-year horizon—a modest opt-out cost suffices to explain the observed inertia. Previous studies (which examined only the first year of tenure and did not allow future contribution adjustment) inferred opt-out costs of $1,000–$2,200 or more; the dynamic model implies a cost of only approximately $250.

Fact III — Prior auto-enrollment reduces saving in the next job. Using the phased U.K. policy rollout, workers who were auto-enrolled in their previous job and then move to a new employer that has not yet implemented auto-enrollment participate 12.8 percentage points less and contribute 0.55% of salary less in the new plan relative to otherwise similar job-switchers from non-auto-enrollment employers. When the new employer also has auto-enrollment, no statistically significant difference is observed. Placebo rollout tests confirm the effect is not a pre-existing selection pattern. This negative spillover contradicts a “savings habit” hypothesis and suggests that auto-enrollment’s short-run boost overstates lifetime savings effects.

Structural estimation results. The estimated quarterly discount factor is δ = 0.987 (approximately 0.949 annually), and the elasticity of intertemporal substitution is σ = 0.435, both standard in lifecycle models. The opt-out cost is estimated at $254 per contribution-rate change (standard error $11). Sensitivity exercises show that combining a short observation window (first year only), sticky contributions (no intra-job adjustment), no income uncertainty, immediate vesting, and penalty-free DC withdrawals yields an opt-out cost of $3,004—broadly matching the range in previous studies. The low baseline estimate is thus driven by the dynamic nature of decisions (ability to compensate later), the illiquidity of retirement accounts (which reduces their perceived value), and income uncertainty (which expands the inaction range).
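
The quarterly-to-annual conversion quoted above is plain compounding over four quarters. A minimal check follows; the variable names are illustrative, and the implied annual discount rate is a derived figure that the summary does not report.

```python
delta_quarterly = 0.987

# Four quarters compound into one year.
delta_annual = delta_quarterly ** 4

# Implied annual discount rate: a derived quantity, not quoted in the summary.
annual_discount_rate = 1.0 / delta_annual - 1.0

print(f"annual discount factor: {delta_annual:.3f}")
print(f"implied annual discount rate: {annual_discount_rate:.2%}")
```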

Long-run wealth effects. Simulating a universal 3% auto-enrollment policy, the model predicts that wealth at retirement changes by less than 2% for the top 7 income deciles. For individuals in the top two deciles, total wealth at age 65 is actually reduced by less than 1% because many high earners who would voluntarily contribute above 3% are pulled down to the default. At the bottom decile, however, auto-enrollment raises total retirement wealth by more than 12%; savings increases are concentrated in the first 20 years of working life and peak around age 45, where bottom-quintile workers hold an additional 20% of average annual lifetime earnings. Even at the bottom, approximately one-third of the early savings gains are offset by lower contributions after age 45, as the wealth effect dominates. Crowd-out of liquid savings is limited: for bottom-quintile individuals, 89% of the increase in retirement savings at age 65 passes through to total wealth; for middle-quintile individuals, 62% passes through.

Out-of-sample validation. The U.S.-estimated model is not rejected (at the 10% level) in 8 of 11 response moments in the 86-firm sample where defaults were raised between two positive rates, covering over 85% of workers. Recalibrated to U.K. institutions (using δ and σ from the U.S. and k = £160 via the average USD/GBP exchange rate), the model replicates the roughly 30-percentage-point increase in both participation and contributions at the 1% U.K. default. The model also predicts a 9.6-percentage-point drop in participation when workers move from an auto-enrollment to an opt-in employer, close to the empirical 12.8 percentage points.

Welfare and optimal policy. Under utilitarian preferences (policymaker shares individuals’ discount rate, no redistributive motive), the opt-in regime is always preferred to auto-enrollment regardless of policy incidence, because matching and tax incentives already induce over-saving relative to individuals’ revealed time preferences. Under paternalistic preferences (social discount factor = 1) or inequality-averse preferences (Pareto weights inversely proportional to income, with degree of inequality aversion ν = 1 following Saez 2002), an auto-enrollment default at or near the employer matching threshold (6% of income) maximizes social welfare. A 6% auto-enrollment default improves welfare by 0.3% in lifetime consumption-equivalent for the bottom decile even under a utilitarian policymaker when incidence is on employers. These optimal policy rankings are robust to whether the opt-out cost is treated as fully welfare-relevant (π = 1) or welfare-irrelevant (π = 0), and hold under three incidence scenarios (employer profit reduction, match-rate adjustment, wage adjustment).

In depth

Q1. What is the core mechanism by which non-autoenrolled workers “catch up” at the median, and why does this reduce the implied opt-out cost relative to prior estimates?

A: Non-autoenrolled workers who do not contribute in their first year are not permanently forgoing employer matching and tax benefits; they can contribute more later in the same job and earn similar cumulative benefits. The paper shows that at the median and 75th percentile, cumulative employee 401(k) contributions among opt-in workers equal those of autoenrolled workers after three years of tenure in 34 U.S. firms offering a 50%-up-to-6% match at a 3% default. This dynamic substitutability means the opportunity cost of initial non-participation is far smaller than one-period back-of-the-envelope calculations suggest. Previous studies, which implicitly or explicitly assumed static contribution decisions or examined only the first year, inferred opt-out costs of $1,000–$2,200; in a fully dynamic model the same inertia requires only ~$254.

Q2. Why does Fact I (higher default reduces participation) specifically rule out loss aversion and anchoring as the primary mechanism, and what does it support instead?

A: Under loss aversion, contributions above the default feel like losses while contributions below the default feel like gains. Raising the default shifts some contributions from the loss domain into the gain domain, making low contributions relatively less attractive. Proposition 2 demonstrates formally that loss-averse preferences predict a weakly lower fraction contributing below the new (higher) default, the opposite of what is observed. Similarly, Proposition 3 shows that psychological anchoring shifts preferences toward the new default, so it also predicts weakly fewer, not more, contributions at low rates when the default rises. Only the opt-out cost model (Proposition 1) predicts that a higher default causes some workers to incur the cost to switch away from the default and end up at lower contribution rates, matching the empirical finding that each 1-percentage-point rise in the default increases contributions strictly below the old default by approximately 1 percentage point.

Q3. What is the quantitative magnitude of the opt-out cost, and what modeling assumptions are responsible for it being much smaller than prior estimates?

A: The baseline estimate is $254 per contribution-rate change (s.e. $11), roughly an order of magnitude smaller than prior estimates of $1,000–$3,000+. Table 4 decomposes the sources of the difference: using only first-year data changes the estimate only slightly (to $226). Assuming contributions cannot be changed within a job (“sticky contributions”) raises the cost to $308 with four years of data or $712 with one year of data. Eliminating income uncertainty raises the estimate to $465. Assuming immediate vesting raises it to $344. Assuming penalty-free DC withdrawals raises it to $609. Combining all these restrictions simultaneously yields $3,004 — closely matching the prior literature. The three key drivers are thus: (1) the ability to adjust contributions over time within a job; (2) the illiquidity of the DC account (early-withdrawal penalties); and (3) income uncertainty widening the inaction range.
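
The decomposition is easier to compare when each estimate is expressed as a multiple of the baseline. The sketch below uses only the dollar figures quoted in this answer; the scenario labels and variable names are illustrative shorthand, not the paper's table headings.

```python
BASELINE = 254  # baseline opt-out cost estimate in dollars

# Dollar estimates quoted in the answer above.
estimates = {
    "baseline": 254,
    "first-year data only": 226,
    "sticky contributions, four years of data": 308,
    "sticky contributions, one year of data": 712,
    "no income uncertainty": 465,
    "immediate vesting": 344,
    "penalty-free DC withdrawals": 609,
    "all restrictions combined": 3004,
}

# Each estimate as a multiple of the baseline.
ratios = {name: value / BASELINE for name, value in estimates.items()}

for name, ratio in ratios.items():
    print(f"{name:45s} ${estimates[name]:>5d}  {ratio:5.2f}x baseline")
```

No single restriction raises the estimate by more than about a factor of three, while the combination raises it almost twelvefold, so the restrictions reinforce one another rather than adding up linearly.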

Q4. How does the paper validate the structural model out of sample, and what confidence does this provide in the long-run predictions?

A: Two out-of-sample exercises are reported. First, the model estimated on 34 U.S. firms (introduction of auto-enrollment from 0% to 3% default) is used to predict workers’ response when 86 other firms raised the default from one positive rate to a higher rate. The model prediction cannot be rejected at the 10% level in 8 of 11 response-moment cases, covering 71 of 86 firms and more than 85% of workers. Second, the model is re-calibrated to U.K. institutions (keeping U.S. preference estimates, setting k = £160 via exchange rate) and applied to the phased rollout of the U.K. Pension Act of 2008. The model replicates the roughly 30-percentage-point increase in both participation and contributions at the 1% default following the policy, and predicts a 9.6-percentage-point drop in participation when previously autoenrolled workers move to a new opt-in employer — compared with an empirical estimate of 12.8 percentage points (s.e. 5.5 pp).
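
To gauge how close 9.6 is to 12.8 given the reported standard error, a simple z-statistic can be computed. This is a back-of-the-envelope sketch that assumes a normal approximation and treats the model prediction as a fixed number; it is not necessarily the test the paper runs, and the function name is hypothetical.

```python
import math

def two_sided_p_value(estimate, prediction, std_error):
    """z-statistic and two-sided p-value under a normal approximation."""
    z = (estimate - prediction) / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Empirical drop of 12.8 pp (s.e. 5.5 pp) against the model's 9.6 pp.
z_stat, p_value = two_sided_p_value(12.8, 9.6, 5.5)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.2f}")
```

The gap is well under one standard error, so under these assumptions the model prediction is not rejected at conventional significance levels.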

Q5. What are the distributional implications of a universal 3% auto-enrollment policy for wealth at retirement?

A: The effect is concentrated at the bottom. For the top 7 income deciles, retirement wealth at age 65 changes by less than 2% relative to the opt-in counterfactual. For the top two deciles, total wealth at age 65 is actually reduced by less than 1% because high-earning workers who would voluntarily contribute above 3% are pulled down to the default. For the bottom decile, the policy raises total retirement wealth by more than 12%. Even at the bottom, roughly one-third of the early savings gains are later offset by lower contributions after age 45 as the wealth effect dominates, so even 20-year empirical follow-ups may overstate the policy’s lifetime effect at the bottom.

Q6. How large is crowd-out of liquid savings by auto-enrollment, and what explains the limited degree of substitution?

A: Crowd-out is modest. For bottom-quintile workers, 89% of the increase in retirement savings at age 65 translates into higher total wealth; for middle-quintile workers, 62% passes through. The limited crowd-out arises because liquid assets serve a precautionary motive and DC accounts serve a lifecycle motive, so the two assets are not close substitutes. Additionally, as in Kaplan and Violante (2014), the marginal propensity to consume out of liquid assets is high in the model, so autoenrolled workers reduce consumption rather than run down liquid balances. These predictions align with Beshears et al. (2021), who find no significant increase in unsecured debt after four years, and Chetty et al. (2014), who estimate an 80% pass-through to total savings for a different policy in Denmark.
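
Reading crowd-out as the complement of pass-through, the quoted figures translate as follows. Treating crowd-out as exactly one minus pass-through is an interpretive assumption, and the variable names are illustrative.

```python
# Pass-through of higher retirement savings into total wealth at age 65.
pass_through = {"bottom quintile": 0.89, "middle quintile": 0.62}

# Assumed definition: the part that does not pass through is crowded out.
crowd_out = {group: 1.0 - share for group, share in pass_through.items()}

for group, share in crowd_out.items():
    print(f"{group}: about {share:.0%} of the added retirement savings is offset")
```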

Q7. Why do previously autoenrolled workers contribute less when they switch to an opt-in employer, and how is this consistent with the model?

A: The most plausible explanation, and the one consistent with the model’s out-of-sample predictions, is a standard wealth effect: workers auto-enrolled early accumulate more retirement wealth and therefore have less incentive to contribute in a new job. The model predicts a 9.6-percentage-point participation drop for AE-to-non-AE movers, close to the empirical 12.8 pp. An alternative explanation — that previously autoenrolled workers rationally expect their new employer to soon adopt auto-enrollment and thus delay active enrollment — is partially ruled out by the finding that the empirical estimate is closer to the model prediction for job-switchers whose new employer is not expected to adopt auto-enrollment in the next 12 months.

Q8. What are the welfare implications of auto-enrollment under utilitarian, paternalistic, and inequality-averse policymakers, and how robust are these to the incidence assumption?

A: Under utilitarian preferences (policymaker shares individuals’ discount factor, no extra redistributive weight), the opt-in regime is always preferred regardless of whether the policy’s cost falls on employer profits, the match rate, or wages. The negative welfare effect is largest when incidence falls on wages (approximately 50% larger than under match-rate reduction). Under paternalistic preferences (social discount factor = 1), a 6% default (equal to the employer matching threshold) is optimal under all three incidence scenarios. Under inequality-averse preferences (ν = 1 Pareto weights), a 6% default is optimal when incidence falls on employers, and a 5% default when incidence falls on workers. These results are identical whether the opt-out cost is treated as fully welfare-relevant (π = 1) or welfare-irrelevant (π = 0). A 6% auto-enrollment default increases welfare by 0.3% in lifetime consumption-equivalent for the bottom income decile even under a utilitarian planner when incidence is on employers.

Q9. How does the paper address heterogeneity in default effects across age and income groups within a parsimonious homogeneous preference model?

A: The model has only three estimated preference parameters (δ, σ, k), yet it endogenously replicates empirical heterogeneity. Conditional on participating, workers in their 20s are approximately 20 percentage points more likely to stay at the 3% default than workers in their late 50s and early 60s; the model attributes this to the option value of waiting: young workers can compensate for current non-saving by contributing more later, so the cost of opting out is effectively smaller for them. The lowest-income workers are approximately 40 percentage points more likely to remain at the default than the highest-paid; the model explains this primarily because the fixed opt-out cost of $254 represents a larger share of earnings for low-income individuals (and secondarily because high-income workers have more to gain from active contribution decisions due to higher marginal tax rates and a lower Social Security replacement rate). All model-predicted coefficients fall within the 95% confidence intervals of the empirical estimates.

Q10. What does the paper conclude about the broader relevance of the “dynamic opt-out cost” framework beyond retirement saving?

A: The paper argues that wherever individuals can compensate for present inaction with future actions — as in retirement saving — the observed inertia at a default understates the freedom of choice preserved by the nudge, and short-run effects overstate long-term consequences. In contrast, in domains such as healthcare plan choice or school selection, future actions cannot easily offset present inertia; opt-out costs are likely to remain large; and the distinction between a nudge and a hard mandate collapses. The paper therefore argues that the appeal of “libertarian paternalism” (Thaler and Sunstein 2003) is domain-specific and is strongest precisely where intertemporal adjustment is possible.

Key Concepts

Opt-out cost (k). In this paper, a utility cost — estimated at $254 per contribution-rate change — that individuals must pay every time they choose a retirement contribution rate different from the current default. The cost is modeled as a consumption reduction and captures both real transaction costs (form-filling, adviser fees) and behavioral costs (cognitive cost of attention and optimal-choice search). It is fixed and homogeneous across individuals, and applies symmetrically in any direction of deviation from the default.

Auto-enrollment default contribution rate. The positive contribution rate at which new hires are automatically enrolled in a defined-contribution plan, with the option to opt out by incurring the opt-out cost. In the paper’s estimation sample, this is 3% of salary. The default is exogenous at the start of each new job but endogenous thereafter: once established, the default for subsequent periods equals the worker’s contribution rate in the previous period.
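
The default dynamics described in the two definitions above can be sketched as a simple bookkeeping loop. This is an illustration of the stated rules only (pay k when the chosen rate differs from the current default; next period's default is this period's rate), not the paper's lifecycle model. The function name, the contribution paths, and the use of whole-percent rates are assumptions.

```python
def simulate_defaults(initial_default, choices, k=254.0):
    """Track the default and the opt-out costs paid along a contribution path.

    Rates are whole percentages of salary (for example 3 for 3%).
    """
    default = initial_default
    total_cost = 0.0
    history = []
    for chosen in choices:
        # The cost is paid whenever the chosen rate differs from the current default.
        paid = k if chosen != default else 0.0
        total_cost += paid
        history.append((default, chosen, paid))
        # After the first period, the default is last period's contribution rate.
        default = chosen
    return history, total_cost

# Hypothetical paths: an auto-enrolled worker (3% default) and an opt-in worker (0%).
auto_history, auto_cost = simulate_defaults(3, [3, 3, 6, 6])
optin_history, optin_cost = simulate_defaults(0, [0, 0, 6, 6])
print("auto-enrolled:", auto_history, auto_cost)
print("opt-in:       ", optin_history, optin_cost)
```

In both hypothetical paths the worker pays the cost once, when moving to 6%, and then keeps that rate for free because it has become the new default.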

Default effect. The empirically observed tendency of workers to remain at the default contribution rate rather than actively choosing a different rate. In this paper, the default effect is explained by opt-out costs rather than loss aversion or psychological anchoring — a distinction identified through the novel prediction that raising the default from a positive rate to a higher positive rate reduces overall participation (the “drop-out” effect), a pattern consistent only with opt-out costs.

Drop-out effect. The paper’s term (following Caplin and Martin 2017) for the empirical finding that increasing the auto-enrollment default contribution rate causes some workers to stop contributing altogether or to contribute at rates strictly below the initial default. This effect is used as a discriminating test between competing theories of the default effect.

Dynamic opt-out cost framework. The paper’s core modeling insight: that opt-out costs must be estimated in a fully dynamic lifecycle model that allows workers to adjust contributions over time, to hold liquid assets and unsecured debt, and to face labor market risk. In a static or short-horizon model, the opportunity cost of initial non-participation appears large (because the worker permanently forgoes match and tax benefits), requiring large opt-out costs. In the dynamic model, the ability to compensate later shrinks the implied opportunity cost and hence the opt-out cost required to rationalize observed inertia.

Crowd-out of liquid savings. The extent to which higher DC retirement contributions induced by auto-enrollment reduce liquid asset holdings (or increase unsecured borrowing), rather than increasing total wealth. The paper estimates limited crowd-out (89% pass-through to total wealth for bottom-quintile workers, 62% for middle-quintile workers), attributable to the different roles of liquid assets (precautionary motive) and DC accounts (lifecycle motive) in the model.

Policy incidence. The channel through which employers balance their budget in response to higher matching costs created by auto-enrollment. The paper considers three scenarios: employers absorb costs through reduced profits; employers reduce the match rate; employers reduce wages. Optimal policy rankings and welfare magnitudes differ across these scenarios, but the qualitative conclusions (a utilitarian policymaker prefers opt-in; a paternalistic or inequality-averse policymaker prefers auto-enrollment at or near a 6% default) are robust across incidence assumptions.

Consumption-equivalent variation (γ). The welfare metric used in the paper: the proportional increase in consumption in every period and every state of the world that would make the policymaker indifferent between an auto-enrollment policy at default d and the opt-in regime. A 6% default increases welfare by 0.3% in consumption-equivalent for the bottom income decile under a utilitarian policymaker when incidence is on employers.
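
For intuition, a consumption-equivalent variation has a closed form when utility is time-separable and homothetic. The sketch below assumes CRRA utility with the estimated elasticity σ = 0.435 and discount factor δ = 0.987, deterministic consumption paths, and no opt-out cost term; these are simplifying assumptions for illustration and not the paper's welfare computation. All function names and consumption paths are hypothetical.

```python
def lifetime_utility(consumption, sigma=0.435, delta=0.987):
    """Discounted CRRA utility with elasticity of intertemporal substitution sigma.

    Assumes sigma != 1 (log utility would need a separate formula).
    """
    rho = 1.0 - 1.0 / sigma  # exponent on consumption
    return sum(delta ** t * c ** rho / rho for t, c in enumerate(consumption))

def consumption_equivalent(c_policy, c_baseline, sigma=0.435, delta=0.987):
    """gamma such that scaling baseline consumption by (1 + gamma) in every
    period gives the same lifetime utility as the policy path."""
    rho = 1.0 - 1.0 / sigma
    ratio = (lifetime_utility(c_policy, sigma, delta)
             / lifetime_utility(c_baseline, sigma, delta))
    # Homotheticity: U((1 + gamma) * c) = (1 + gamma)^rho * U(c).
    return ratio ** (1.0 / rho) - 1.0

# Hypothetical paths: the policy path is the baseline scaled up by 0.3%.
baseline = [1.0, 1.1, 0.9, 1.2]
policy = [1.003 * c for c in baseline]
gamma = consumption_equivalent(policy, baseline)
print(f"gamma = {gamma:.4f}")
```

When the policy path is a uniform 0.3% scaling of the baseline, the formula recovers γ = 0.003, the magnitude quoted above for the bottom income decile.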

Designing Disability Insurance Reforms: Tightening Eligibility Rules or Reducing Benefits?

This paper develops a sufficient statistics framework for the welfare analysis of disability insurance (DI) policy reforms and applies it to two reform episodes in Austria. The framework derives social optimality conditions for the two main DI policy instruments, eligibility rules and benefit levels, expressed in terms of estimable reduced-form objects (fiscal multipliers and insurance losses). The fiscal multiplier of a DI policy instrument is defined as the ratio of total fiscal cost savings to the mechanical (counterfactual-behavior-held-fixed) fiscal cost savings; it measures how much the program shrinks per dollar mechanically removed, and values above 1 indicate that behavioral responses, such as deterred applications and enrollment, add to the mechanical savings.

The paper then evaluates two Austrian reforms: (1) a 2013 increase in the Rehabilitation Stricter Assessment (RSA) age threshold from 57 to 58 (and separately to 59), which tightened eligibility for DI applicants aged 57 by requiring them to demonstrate inability to be retrained for alternative work; and (2) a 2003 reform that reduced DI benefit generosity for workers aged 30–60 as a side effect of a pension reform.

Using difference-in-differences with cohorts just above and below the relevant thresholds, the paper finds that the RSA reform generated a fiscal multiplier of 2.50 (RSA to 58) and 2.05 (RSA to 59), while the benefit reduction generated a multiplier of only 1.41 (ages 57–60) and 1.36 (ages 30–56). The large gap implies that for a given mechanical cost saving, tighter eligibility rules generate about 1.8 times as much total fiscal savings as benefit cuts. The paper further provides empirical evidence that the insurance losses associated with stricter eligibility rules are, in all likelihood, smaller than those from benefit reductions, strengthening the dominance of eligibility tightening over benefit cuts as a DI reform instrument.

In depth

Q1. What is the sufficient statistics framework and what does it deliver?

The paper derives social optimality conditions for two DI policy instruments — tighter eligibility rules and lower benefits — in terms of two sufficient statistics: the fiscal multiplier (total fiscal savings / mechanical fiscal savings) and the insurance loss (marginal utility of consumption of the affected recipients). An eligibility reform that tightens the threshold θ from θ* to θ* + dθ is welfare-improving if and only if the fiscal multiplier exceeds the social value of one dollar in the hands of the marginally excluded applicant; a benefit cut from b to b − db is welfare-improving if the multiplier exceeds the social value of one dollar held by the average current DI recipient. Because the fiscal multiplier is estimable from reduced-form variation and the insurance loss gives the welfare benchmark, the framework converts the welfare question into: “Is the multiplier large enough relative to the insurance value?”

Q2. How is the fiscal multiplier decomposed, and why does this decomposition matter?

The fiscal multiplier equals 1 + B/M where B is the behavioral fiscal effect (savings from deterred applications and enrollment) and M is the mechanical fiscal effect (savings from unchanged behavior on the affected population); the multiplier exceeds 1 whenever the behavioral response amplifies the direct savings. The decomposition matters because the behavioral effect operates through marginal applicants (who apply only under lenient rules) while the mechanical effect operates through always-applicants (who apply regardless). These groups have different characteristics: in the Austrian data, marginal applicants are more likely to be employed at age 56 (73% vs. 60% for always-applicants) and more likely to be blue-collar workers with musculoskeletal impairments, while always-applicants are more likely to be on sick leave — a proxy for genuine disability. Confusing the two groups would misidentify who bears the insurance loss.

Q3. How is the mechanical fiscal effect identified when marginal and always-applicants cannot be directly observed?

The paper uses previously rejected DI applicants (those who filed applications between ages 50 and 56 and were rejected) as a proxy group for always-applicants: these individuals qualify as always-applicants by revealed preference (they applied under strict rules), and the paper shows that their DI benefit receipt and net fiscal expenditures after a simulated age-57 application are statistically indistinguishable from those of all age-57 applicants in the whole population. The mechanical fiscal effect per capita in the age-57 population is estimated as M = 5,585 Euro × 0.070 = 391 Euro (the product of the mechanical effect among previously rejected applicants and the population share of always-applicants, πAA = 0.070). The behavioral fiscal effect is then computed as the residual: B = total fiscal savings − M = 976 − 391 = 585 Euro, yielding the multiplier of 1 + 585/391 = 2.50.
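
The arithmetic in this answer can be re-traced with a short sketch. It only recomputes the figures quoted above; the function and variable names are illustrative and are not taken from the paper.

```python
def fiscal_multiplier(total_savings: float, mechanical_savings: float) -> float:
    """Fiscal multiplier = total savings / mechanical savings = 1 + B/M."""
    return total_savings / mechanical_savings


# Figures quoted in the text (Euro).
mechanical_per_proxy_applicant = 5585.0  # mechanical effect among previously rejected applicants
share_always_applicants = 0.070          # pi_AA, population share of always-applicants
total_savings = 976.0                    # total net fiscal savings per capita

M = mechanical_per_proxy_applicant * share_always_applicants  # mechanical effect per capita
B = total_savings - M                                         # behavioral effect as the residual
multiplier = fiscal_multiplier(total_savings, M)

print(f"M = {M:.0f} Euro, B = {B:.0f} Euro, multiplier = {multiplier:.2f}")
```

Because B is defined as the residual, the ratio of total to mechanical savings and the expression 1 + B/M are the same number by construction.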

Q4. What are the reduced-form effects of the RSA reform on DI enrollment and employment?

A one-year increase in the RSA from age 57 to 58 reduces DI inflow by approximately 21 percentage points at age 57 (relative to a control-group mean), increases employment by roughly 15 percentage points, increases other benefit receipt (unemployment insurance and social assistance) by about 16 percentage points, and generates net fiscal cost savings of approximately 976 Euro per person per year in the two years after the reform. The pattern of employment and benefit substitution shows that the behavioral response is substantial: a large share of those deterred from DI enrollment at 57 transition to employment or other social benefits rather than remaining without any income support, which is why the fiscal multiplier of 2.50 substantially exceeds 1.

Q5. What are the effects of the 2003 DI benefit generosity reduction?

A 1-percentage-point reduction in DI benefit generosity for the ages 57–60 cohort produces a behavioral fiscal effect of 18.69 Euro per year (through reduced DI inflow and application), a mechanical fiscal effect of 45.16 Euro per year (1% of the pre-reform mean benefit expenditure of 4,516 Euro among those aged 57–60), and a total fiscal effect of 63.85 Euro — yielding a multiplier of 1.41. For the younger cohort (ages 30–56), the multiplier is 1.36, with a behavioral effect of 1.18 Euro and mechanical effect of 3.24 Euro per year. The lower multipliers for benefit cuts relative to eligibility tightening reflect the fact that benefit reductions affect all current recipients uniformly (generating large mechanical savings) rather than targeting a group with a strong behavioral response at the margin.
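
As a check on these numbers, the sketch below rebuilds both benefit-cut multipliers from their behavioral and mechanical components and compares the older cohort's multiplier with the RSA-to-58 multiplier of 2.50 reported elsewhere in this summary. Names are illustrative only.

```python
def multiplier_from_components(behavioral: float, mechanical: float) -> float:
    """Fiscal multiplier written as 1 + B/M."""
    return 1.0 + behavioral / mechanical


# Euro per year, per 1-percentage-point cut in benefit generosity (from the text).
older = multiplier_from_components(behavioral=18.69, mechanical=45.16)   # ages 57-60
younger = multiplier_from_components(behavioral=1.18, mechanical=3.24)   # ages 30-56

rsa_58 = 2.50  # eligibility-tightening multiplier quoted in the summary

print(f"ages 57-60: {older:.2f}")
print(f"ages 30-56: {younger:.2f}")
print(f"eligibility tightening vs. benefit cut: {rsa_58 / older:.1f}x")
```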

Q6. How does the paper compare the insurance losses of the two DI instruments?

The paper derives a sufficient condition under which the insurance loss from tighter eligibility rules is smaller than the insurance loss from benefit cuts: it requires that the per-dollar income loss borne by the marginally excluded applicant (upper-bounded by their DI benefit minus available social welfare benefits) is weakly smaller than the per-dollar income loss of current recipients (lower-bounded by the benefit reduction itself), evaluated within each income quintile. Implementing this condition empirically using the Austrian income data, the paper finds that the income losses borne by marginally excluded applicants fall short of those borne by current recipients at all income quintile comparisons — meaning tighter eligibility rules both generate higher fiscal multipliers and impose smaller insurance losses than benefit cuts, making eligibility tightening the dominant instrument when the goal is to reduce DI program costs.

Q7. What welfare conclusion follows from combining the fiscal multipliers with the insurance benchmark?

Combining the multiplier estimates with hand-to-mouth CRRA assumptions as an upper-bound calculation, the RSA increase to 58 is welfare-improving if the coefficient of relative risk aversion of affected DI recipients is below 2.8; the RSA increase to 59 is welfare-improving if risk aversion is below 2.2. The corresponding critical risk aversion level for the benefit cut (ages 57–60) is 1.1 — below the range typically estimated in the literature for low-income individuals — suggesting the benefit cut was likely welfare-reducing while the eligibility reform was likely welfare-improving. For a given dollar of mechanical budget reduction, stricter eligibility rules generate 1.8 times the total fiscal savings (= 2.50 / 1.41) relative to benefit cuts.

Q8. What is the complier analysis and what does it reveal about who is affected by each instrument?

Using the complier analysis method adapted for difference-in-differences settings, the paper partitions the age-57 population into three types with respect to the RSA-58 reform: marginal applicants (πMA = 0.014, who apply only under lenient rules), always-applicants (πAA = 0.070, who apply regardless), and never-applicants (πNA = 0.916, who never apply). Marginal applicants differ from always-applicants in that they are more likely to be employed at age 56 (73% vs. 60%) and less likely to be on sick leave; they are also more likely to apply with musculoskeletal impairments than with mental impairments. This is consistent with these workers facing the largest relaxation in disability eligibility when reaching the RSA and being on the borderline of eligibility under strict rules.
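
A small sketch of what the three quoted shares imply. The "share of applicants" figure is derived here from the quoted shares and is not a number reported in the summary; names are illustrative.

```python
# Population shares at age 57, as quoted in the text.
shares = {"marginal": 0.014, "always": 0.070, "never": 0.916}

# The three types partition the population, so the shares should sum to one.
total_share = sum(shares.values())

# Under lenient rules both marginal and always-applicants apply.
applicants_under_lenient_rules = shares["marginal"] + shares["always"]
marginal_share_of_applicants = shares["marginal"] / applicants_under_lenient_rules

print(f"shares sum to {total_share:.3f}")
print(f"marginal applicants among lenient-rule applicants: {marginal_share_of_applicants:.1%}")
```

On these shares, roughly one in six people who would apply under lenient rules is a marginal applicant; the remainder apply regardless of the regime.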

Key concepts

fiscal multiplier of a DI instrument : total fiscal cost savings divided by mechanical fiscal cost savings from that instrument; equals 1 + B/M where B is the behavioral savings (from deterred applications) and M is the mechanical savings (from unchanged behavior); values above 1 indicate behavioral crowd-out and are the policy-relevant benchmark against which insurance losses must be compared.

mechanical fiscal effect (M) : the fiscal cost savings that would accrue if DI application and enrollment behavior were held fixed at pre-reform levels; for eligibility tightening, this equals the DI benefits that would have been paid to always-applicants who are now rejected; identified using the subpopulation of previously-rejected DI applicants as a proxy for always-applicants.

behavioral fiscal effect (B) : the additional fiscal savings generated by deterred applications and enrollment that result from the reform; equals total fiscal savings minus the mechanical fiscal effect; operates through marginal applicants who adjust their application behavior in response to stricter rules or lower benefits.

always-applicants : individuals who apply for DI regardless of whether eligibility rules are strict or lenient; they bear the mechanical cost of eligibility tightening (being rejected under stricter rules); in the Austrian data, their population share at age 57 is estimated at 7.0%.

relaxed screening age (RSA) : the Austrian age threshold from which DI applicants are evaluated under more lenient standards that do not require them to demonstrate that they cannot be retrained for alternative work; increasing the RSA from 57 to 58 subjects the age-57 cohort to the stricter evaluation criteria.

insurance loss : the welfare cost to DI recipients or excluded applicants from the income reduction caused by a DI reform; the right-hand side of the social optimality condition; the paper bounds it using income losses by income quintile rather than requiring utility function assumptions.

Digital Distractions with Peer Influence

This paper estimates the causal effects of mobile app usage on college students’ academic performance, physical health, and labor market outcomes, while separately identifying behavioral (endogenous) and contextual (exogenous) peer effects in app usage, which it presents as the first study to do so within a unified empirical framework. The analysis draws on administrative data for three freshman cohorts (2018–2020) at a mid-tier Chinese university, linked to individual-level mobile phone usage records from a major telecommunications carrier covering 6,430 students over four years (excluding the COVID semester). High-frequency GPS data, hourly app usage records for the 2020 cohort, and two waves of university surveys supplement the main dataset.

The identification strategy addresses three challenges: endogeneity of own app usage, endogeneity of peer group formation, and the reflection problem in peer effects. For own usage, two instrumental variables are used: (1) a shift-share instrument interacting the September 2020 launch of the blockbuster game Yuanshen with students’ pre-college app usage intensity; and (2) China’s October 2019 minors’ game restriction policy (prohibiting under-18s from playing online games 10 p.m.–8 a.m. and capping weekday gaming at 90 minutes/day) interacted with the evolving number of underage pre-college friends. For peer effects, the university’s random dormitory assignment within gender-class units provides exogenous peer variation; behavioral peer effects are further isolated using the minors’ restriction policy interacted with roommates’ pre-college underage friend networks, an instrument that affects roommates but not the focal student. Contextual peer effects are recovered by subtracting the estimated behavioral component from reduced-form estimates.

The main findings are as follows. First, app usage is contagious: a one standard deviation (s.d.) increase in roommates’ in-college total app usage raises a student’s own usage by 5.8% (IV). Behavioral peer effects dominate: contextual peer effects are small and statistically insignificant. Second, own app usage severely harms academic performance: a one s.d. increase in total app usage reduces GPA for required courses by 36.2% of a within-cohort-major s.d. (IV), and a one s.d. increase in game app usage alone reduces GPA by 56.6% of a within-cohort-major s.d. Roommates’ app usage directly reduces GPA by a further 20.6% of a within-cohort-major s.d. through disruption; adding the indirect channel (behavioral contagion), the total roommate effect reaches 22.7% of a within-cohort-major s.d., more than 60% of the own-usage effect. Third, the effect on physical education (PE) scores is roughly four times as large as the effect on required-course GPA: a one s.d. increase in own app usage reduces PE scores by 2.74 points, while roommates’ app usage has no direct effect on PE. Fourth, a one s.d. increase in own in-college app usage reduces initial wages upon graduation by 2.3% (12.1% of within-cohort-major wage s.d.); a one s.d. increase in roommates’ usage reduces wages by 0.9% directly, with a total effect (including the contagion channel) of approximately 1.0% (5.3% of within-cohort-major s.d.). Controlling for cumulative GPA reduces the gaming-to-wage coefficient by roughly one-third, indicating that academic performance is an important but partial mediator.

A back-of-the-envelope policy simulation extending the minors’ gaming cap (3 hours/week) to college students — binding for 34.3% of student-month observations — projects an average wage increase of 0.9% at graduation, approximately half the wage premium from one additional year of work experience in developing countries.

Mechanism evidence from GPS data shows that Yuanshen’s launch caused students to arrive at study halls 18.2 minutes later and leave 23.4 minutes earlier per day. High-frequency sleep data show that a one s.d. increase in nighttime app usage reduces sleep duration by approximately 30 minutes and raises the probability of sleeping late by 34 percentage points. Survey evidence indicates that heavy app users recognize the addictive nature of gaming, pointing to self-control problems rather than lack of awareness.

The scope conditions are: a single mid-tier Chinese university; the 2018–2020 cohorts; outcomes through initial job placement only; a peer group restricted to dormitory roommates; and findings that rely on IV exclusion restrictions conditional on student and time fixed effects.

Q: What is the core research question? A: The paper asks how individual and peer mobile app usage affect college students’ academic performance, physical health, and early labor market outcomes, and it separately identifies the behavioral (endogenous) versus contextual (exogenous) components of peer influence in app usage. This is claimed as the first study to disentangle these two types of peer effects within a unified empirical framework.

Q: What data does the paper use? A: Administrative records for 7,479 undergraduates across three freshman cohorts (2018–2020) at a medium-sized mid-tier Chinese university are linked to monthly mobile app usage records from a telecommunications provider covering 75% of the provincial population; 6,430 students are matched. The dataset also includes GPS location data at 5-minute intervals, hourly app usage for the 2020 cohort (used to infer sleep), and two waves of voluntary annual surveys with 1,798 respondents (24% response rate). Labor market outcomes — employment status, wages, post-graduate admissions — are available for the 2018 and 2019 cohorts.

Q: How does the paper address the endogeneity of own app usage? A: Two sets of instruments are used. The first interacts the September 2020 launch of Yuanshen (the most popular game in China, with over 13 million Chinese users by 2021, the majority under age 25) with students’ pre-college app usage, forming a shift-share instrument under the assumption that the game launch is orthogonal to unobserved GPA determinants conditional on student fixed effects. The second interacts China’s October 2019 minors’ game restriction policy with the evolving count of a student’s underage pre-college friends; event studies confirm no pre-trends and a sharp, transitory drop in app usage post-policy that dissipates as friends age out of the restricted group.

Q: How does the paper solve the reflection problem and separate behavioral from contextual peer effects? A: The paper uses a three-step procedure: (1) random dormitory assignment within gender-class units yields reduced-form peer effect estimates using roommates’ pre-college app usage as the exogenous peer shifter; (2) behavioral peer effects are isolated via an IV using the minors’ restriction policy interacted with roommates’ (not the focal student’s) underage pre-college friend networks, an instrument that shifts roommates’ app usage but does not directly affect the focal student; (3) contextual peer effects are recovered as the residual from subtracting the estimated behavioral effect from the reduced-form estimate.

Q: How large and significant are the behavioral versus contextual peer effects in app usage? A: A one s.d. increase in roommates’ in-college total app usage raises own usage by 5.8% (IV estimate, significant). For game apps alone the behavioral spillover is 10.7%, and for games plus video it is 6.5%. Contextual peer effects (identified from roommates’ pre-college characteristics) are much smaller and statistically insignificant, indicating that peer influence operates primarily through the direct imitation of peers’ actions rather than their background traits.

Q: What is the effect of own app usage on GPA? A: The IV estimate shows a one s.d. increase in total in-college app usage reduces GPA for required courses by 0.716 points, equivalent to 36.2% of a within-cohort-major GPA s.d. (significant at 1%). For game apps alone, a one s.d. increase reduces GPA by 1.119 points, or 56.6% of a within-cohort-major s.d. OLS estimates are biased toward zero, likely because negative health shocks reduce both GPA and app usage simultaneously.

Q: How large is the total peer effect of roommates’ app usage on a student’s GPA? A: Roommates’ app usage directly lowers GPA by 0.408 points (20.6% of within-cohort-major s.d.) through disruption of the dormitory study environment or crowding out of group study. The behavioral contagion channel (5.8% increase in own usage per s.d. of roommates’ usage) adds an additional 0.042 points, bringing the total effect to approximately 0.450 points, or 22.7% of a within-cohort-major s.d. — over 60% of the own-usage effect.
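
The decomposition in this answer is direct effect plus contagion times own-usage effect. The sketch below reproduces it, treating the 5.8% contagion estimate as 0.058 s.d. of own usage per s.d. of roommates' usage, which is an assumption about units that appears to be how the quoted 0.042 points is obtained. The within-cohort-major s.d. is backed out from the quoted 0.716 points = 36.2% of an s.d.; names are illustrative.

```python
def total_roommate_effect(direct: float, contagion: float, own_effect: float) -> float:
    """Direct roommate effect plus the part transmitted through the student's own usage."""
    return direct + contagion * own_effect


own_gpa_effect = 0.716       # GPA points per s.d. of own usage (IV)
direct_gpa_effect = 0.408    # GPA points per s.d. of roommates' usage
contagion = 0.058            # assumed: s.d. of own usage per s.d. of roommates' usage

gpa_sd = own_gpa_effect / 0.362  # within-cohort-major GPA s.d. implied by the text

gpa_total = total_roommate_effect(direct_gpa_effect, contagion, own_gpa_effect)

print(f"total roommate effect: {gpa_total:.3f} GPA points")
print(f"as a share of the within-cohort-major s.d.: {gpa_total / gpa_sd:.1%}")
print(f"relative to the own-usage effect: {gpa_total / own_gpa_effect:.0%}")
```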

Q: What is the effect on physical education (PE) scores, and why does roommates’ app usage not matter there? A: A one s.d. increase in own total app usage reduces PE scores by 2.74 points (IV), approximately four times the magnitude of the effect on required-course GPA, consistent with health literature on excessive screen time. Roommates’ app usage has no statistically significant direct effect on PE, which the authors attribute to the irrelevance of dormitory noise and study disruptions for outdoor physical activity.

Q: What are the effects of app usage on wages at graduation? A: Doubling total app usage during college reduces initial wages by approximately 2% (IV). A one s.d. increase in own usage reduces wages by 2.3%, or 12.1% of a within-cohort-major wage s.d. A one s.d. increase in roommates’ usage directly reduces wages by 0.9% (4.8% of within-cohort-major s.d.); including the behavioral contagion channel, the total roommate effect is approximately 1.0% (5.3% of within-cohort-major s.d.). Controlling for cumulative GPA reduces the game-usage-to-wage coefficient by about one-third, implying GPA is a partial but not complete mediator.

Q: What does the policy simulation of the gaming cap say? A: Extending the minors’ game restriction (3 hours/week cap) to college students would bind for 34.3% of student-month observations, reducing average monthly gaming from 12.1 hours to 8 hours (a one-third decrease). Incorporating the behavioral peer multiplier for gaming (0.078), average gaming further converges to approximately 7.65 hours in steady state. The implied wage gain at graduation is 0.9%, approximately half the wage premium from one additional year of work experience in developing countries (Lagakos et al., 2019 estimate).
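
One reading that reproduces the quoted steady-state figure is a geometric peer feedback: the direct reduction in gaming lowers roommates' gaming, which lowers own gaming again, and so on. The sketch below is an assumption about how the 7.65 hours arises, not the paper's stated procedure; names are illustrative.

```python
def steady_state_reduction(direct_reduction: float, peer_multiplier: float) -> float:
    """Sum of the feedback series r + r*m + r*m**2 + ... = r / (1 - m)."""
    if not 0.0 <= peer_multiplier < 1.0:
        raise ValueError("peer multiplier must lie in [0, 1) for the series to converge")
    return direct_reduction / (1.0 - peer_multiplier)


baseline_hours = 12.1    # average monthly gaming before the cap (from the text)
capped_hours = 8.0       # average after the cap binds, before peer feedback
peer_multiplier = 0.078  # behavioral peer multiplier for gaming (from the text)

direct = baseline_hours - capped_hours
total = steady_state_reduction(direct, peer_multiplier)
steady_state_hours = baseline_hours - total

print(f"steady-state average gaming: {steady_state_hours:.2f} hours")
```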

Q: What does the GPS evidence show about time allocation? A: Following Yuanshen’s launch, the average student arrives at the study hall 18.2 minutes later and returns to the dormitory 23.4 minutes earlier per day. The minors’ restriction reverses this: students with the average number of minor friends arrive at study halls 17.4 minutes earlier and return to the dorm 19.8 minutes later. Both game shocks also shift tardiness and absence rates for major-required courses in the expected directions, and the effects intensify over time with Yuanshen’s growing popularity.

Q: What do the sleep data show? A: A one s.d. increase in nighttime app usage (9 p.m.–3 a.m.) is associated with roughly 30 minutes less sleep (7% of the mean), a 34 percentage point higher probability of sleeping late, and a 4.5 percentage point higher probability of waking up late. Daytime app usage (8 a.m.–9 p.m.) is also associated with 7.2 fewer minutes of sleep (1.8% of mean) and a 3.7 percentage point higher probability of late wake-up. These results are descriptive (from the 2020 cohort hourly data) rather than IV-based.

Q: What does the survey evidence show about mechanisms and self-awareness? A: Heavier app users report worse physical health and higher stress, are less likely to have obtained professional certifications by graduation, submit fewer job applications, and express lower satisfaction with job offers. Notably, heavier users are more likely to acknowledge the addictive nature of apps and games, suggesting a self-control problem rather than informational deficiency. They also report better relationships with roommates and greater likelihood of following roommates’ advice on post-graduation choices, a potential direct channel for peer labor market effects.

Q: How representative is the sample, and what are the key scope conditions? A: The university is a mid-tier institution in southern China with students predominantly from the 30th–80th College Entrance Examination (CEE) score percentile among provincial college-admitted applicants; its student body is less female (42% vs. 53% nationally) and more rural (40% vs. 27% nationally). Survey respondents over-represent less advantaged backgrounds and are re-weighted. Findings pertain to dormitory roommates as the peer group; labor market outcomes are measured only at graduation; the sample covers 2018–2021 with the COVID semester excluded. The peer effects estimates rest on random dormitory assignment, which the authors verify by showing no within-dorm correlation in pre-college characteristics.

Behavioral (endogenous) peer effects: The mechanism by which a peer’s actual behavior — here, contemporaneous app usage — directly influences a focal individual’s own behavior. In this paper, identified via IV using the minors’ game restriction policy interacted with roommates’ underage pre-college friend networks, which shifts roommates’ usage but not the focal student’s characteristics.

Contextual (exogenous) peer effects: The influence of peers’ pre-determined background characteristics (e.g., pre-college app usage, reflecting motivation, study habits, attitudes toward academics) on a focal individual’s outcomes, independent of peers’ actual in-college behavior. Recovered as the residual after subtracting estimated behavioral peer effects from reduced-form estimates; found to be small and insignificant in this setting.

Shift-share instrument (Yuanshen): A quasi-experimental instrument constructed by interacting the mid-sample launch date of the blockbuster game Yuanshen (September 2020) with students’ pre-college app usage intensity, under the assumption that pre-college usage predicts differential susceptibility to the shock while the launch itself is orthogonal to the university’s academic environment.

Minors’ game restriction policy: China’s October 2019 policy prohibiting individuals under 18 from playing online games between 10 p.m. and 8 a.m. and capping weekday gaming at 90 minutes per day (tightened to 3 hours/week in September 2021). Used both as an instrument for own app usage (via underage pre-college friends) and as an instrument for roommates’ usage (via roommates’ underage friends) to isolate behavioral peer effects.

Reflection problem: The identification challenge first articulated by Manski (1993) arising because an individual’s behavior both affects and is affected by peers simultaneously, making it impossible to separately identify the direction of influence from observational data without exogenous variation in peer behavior.

Within-cohort-major GPA standard deviation: The unit used to scale all GPA effect sizes, defined as the standard deviation of GPA within students of the same graduation cohort and declared major. This normalization accounts for systematic differences in grading across fields and years, making effect magnitudes comparable across specifications.

Disaggregated Economic Accounts

This paper develops and implements a system of disaggregated economic accounts that breaks down national accounting positions into bilateral flows between small groups of consumers, producers, the government, and the rest of the world. Standard national accounts document aggregate income and production plus input-output trade between producer industries; they contain no comprehensive data on which consumers buy from which producers or which producers pay income to which consumers. The paper fills this gap by measuring, for Denmark, all 36 positions in the UN System of National Accounts (SNA) — consumer spending, labor compensation, profit income, intermediates trade, government transfers and taxes, and foreign trade — as bilateral cell-to-cell flows, satisfying all national accounting identities at the level of individual cells and at the aggregate level. The data reveal systematic stylized facts about domestic spending shares, gravity of spending, urban bias, and assortative matching between consumer and producer characteristics. Combining the disaggregated accounts with a general equilibrium model with nominal wage rigidities, the paper shows that fiscal transfer multipliers vary substantially across consumer cells — from below 1 to above 2 — depending on the spending intensity of recipient cells on the slack (unemployed) portion of the economy. Applying the framework to a hypothetical U.S. tariff shock on Denmark (calibrated to July 2025 effective tariff levels on China), the paper demonstrates that the cells generating the highest multipliers are not those directly exposed to the shock or even those made slack, but those whose spending intensity on slack cells is high. The disaggregated accounts allow the government to select more effective fiscal policies: choosing transfers targeting high-spending-intensity cells saves approximately 0.4–0.7% of Danish GDP relative to programs targeting low-intensity cells, for the same GDP stimulus.

Measurement framework (Section II): The paper assigns every Danish adult to one of approximately 2,744 consumer cells, defined by the interaction of 98 municipalities (regions) and 28 industries (industry of main employment). Every production establishment is assigned to one of approximately 2,646 producer cells by region and industry. The median consumer cell contains 658 adults; the median producer cell contains 47 establishments. The circular flow includes: (i) consumer spending on domestic and foreign producers; (ii) labor compensation paid by producer cells to consumer cells; (iii) profit income (dividends, mixed income, owner-occupied housing surplus) from producers to consumers; (iv) intermediates trade between domestic producers; (v) foreign trade; (vi) government taxes, transfers, and spending. A “bottom-up” approach uses microdata (geocoded transaction records from Danske Bank, the largest Danish bank, and administrative government registers) to directly measure bilateral flows; a “top-down” approach distributes aggregate flows using assignment algorithms. The accounts are measured for 2018, and the data are available at disaggregatedaccounts.com.

Stylized facts (Section IV):

1. Domestic spending shares (§IV.B): The share of a consumer cell’s spending going to domestic rather than foreign producers ranges from 75% to almost 100% (average 92%). Rural (small-population) cells, older cells, and less college-educated cells have higher domestic spending shares. Population size, average age, and college share jointly explain about half of the cross-cell variation in domestic shares; the patterns hold within industry and within region. The majority of foreign spending goes to travel-related and specialized retail categories (hotels, airlines, food away from home, clothing).

2. Gravity (§IV.C): Consumer spending declines with distance (log-log gradient = −1.33, column 1 of Table II). On average, roughly 50% of spending stays in the home region and an additional 10% goes to regions within 25 km. The distance gradient is steeper for groceries and fuel (local, in-person purchases) and shallower for telecommunications, insurance, and hotels. Rural, older, and less college-educated consumers spend more locally (stronger distance gradient, consistent with higher domestic shares).

3. Urban bias (§IV.E): Consumer spending flows disproportionately toward large cities. The 15 largest regions receive 34% of national consumer spending while accounting for only 27% of consumers. Urban bias is absent for everyday purchases (groceries) and strong for irregular or remote purchases (telecommunications, specialized retail). Rural consumers also visit urban regions in person, so urban bias is present in card payments too.

4. Assortative spending (§IV.D): Consumers tend to spend on producer cells employing workers with similar characteristics. Age of consumers and average age of workers in receiving cells are positively correlated (β = 0.178); college share similarly (β = 0.120); domestic spending share similarly (β = 0.203). The slopes are well below 1 (consumers purchase from many cells), but mild assortative spending reinforces first-order domestic spending patterns through higher-order connections.

5. Triangular flows (§IV.F): A distinctive cross-regional pattern: consumer spending and intermediates trade flow on net from rural to urban regions (urban regions run a net internal trade surplus); rural regions run a net external surplus (rural manufacturers export; e.g., Novo Nordisk in Kalundborg, Vestas in Nakskov); urban regions import relatively more from abroad. This triangular flow arises from urban consumption amenities and urban business service concentration.
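
To give a sense of magnitude for the gravity fact in item 2, the sketch below reads the log-log gradient of −1.33 as a pure power law in distance, holding everything else fixed. That reading is a simplifying assumption for illustration; the function name is hypothetical.

```python
def relative_spending(distance_ratio: float, gradient: float = -1.33) -> float:
    """Spending at the farther location relative to the nearer one,
    when log spending is linear in log distance with the given gradient."""
    if distance_ratio <= 0:
        raise ValueError("distance ratio must be positive")
    return distance_ratio ** gradient


doubling = relative_spending(2.0)
tenfold = relative_spending(10.0)

print(f"doubling distance: spending falls to {doubling:.0%} of its previous level")
print(f"tenfold distance:  spending falls to {tenfold:.1%} of its previous level")
```

Under this reading, doubling the distance between a consumer cell and a producer cell cuts spending by roughly 60%.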

Spending intensity (§IV.G): The paper constructs a reduced-form measure capturing, for each consumer cell i, how much its spending contributes to the income of a target group of cells — accounting for all higher-order connections (the infinite sum over indirect spending chains). The domestic spending intensity of cell i is defined recursively as the sum over all domestic producer cells j of (spending share αji × domestic spending intensity of producer cell j). Values range from roughly 0.4 to 0.9. The measure is strictly greater than the direct domestic spending share because the recursive formula incorporates second- and higher-order domestic connections. Domestic spending intensity is higher for rural, older, and less college-educated cells (consistent with the stylized facts). A spending intensity on slack cells can be constructed in the same way by replacing the target group with cells experiencing demand-driven unemployment.

General equilibrium model (Sections V–VI): The model is a static small open economy with many consumer and producer cells. Consumer utility is Cobb-Douglas over goods from all producer cells and foreign goods. Each producer cell’s production function is Cobb-Douglas with decreasing returns to scale (equivalent to a fixed factor). The key friction is downward nominal wage rigidity: Wi ≥ (1−δ)W̄i. When demand for a cell’s labor falls sufficiently (by more than fraction δ), the wage rigidity binds and some workers in that cell become slack (unemployed, with employment determined by demand). A fiscal transfer to consumer cell i raises its income, which stimulates spending, which flows through the disaggregated network to raise labor demand across cells. The multiplier is higher when recipient spending flows disproportionately to slack cells, generating additional employment. The model is calibrated using the measured disaggregated accounts: spending shares αji, profit shares κij, labor shares λij, intermediates shares ωjj′, and tax rates are all taken directly from the disaggregated data. Baseline elasticity of substitution = 1 (Cobb-Douglas); robustness checks use short-run elasticities (< 1) and long-run elasticities (> 1), with no material change in conclusions.
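
The wage-floor mechanism can be illustrated with a deliberately simplified sketch. It assumes that a cell's labor income equals wage times employment and that labor supply is fixed at 1; these are simplifications for illustration, not the paper's full model. The value δ = 0.04 is chosen to match the 4% slack threshold mentioned in the tariff application.

```python
def labor_market_outcome(income_drop: float, delta: float) -> tuple[float, float]:
    """Return (wage, employment) relative to pre-shock levels of 1.0.

    Simplifying assumptions: labor income = wage * employment, and
    labor supply is fixed at 1, so the flexible wage equals labor income.
    """
    income = 1.0 - income_drop
    wage_floor = 1.0 - delta          # W_i >= (1 - delta) * W_bar_i
    if income >= wage_floor:
        return income, 1.0            # wage absorbs the shock, full employment
    # Floor binds: wage stays at the floor and employment is demand-determined.
    return wage_floor, income / wage_floor


mild = labor_market_outcome(income_drop=0.02, delta=0.04)
severe = labor_market_outcome(income_drop=0.10, delta=0.04)

print(f"2% income drop:  wage {mild[0]:.2f}, employment {mild[1]:.4f}")
print(f"10% income drop: wage {severe[0]:.2f}, employment {severe[1]:.4f}")
```

In the mild case the cell remains fully employed; in the severe case the floor binds and the cell becomes slack, which is the state in which additional spending raises employment rather than wages.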

Analytical result (Proposition 1): In an economy-wide recession (all cells slack), the vector of transfer multipliers is µ = ϕ′ · (I − M)⁻¹ · M · D((1 − τ̄ᵢ)⁻¹), where M is a transformed Leontief-style spending matrix incorporating the disaggregated accounts and τ̄ᵢ are fiscal externalities. The key insight is that the multiplier of cell i’s transfer is closely linked to its spending intensity on all other domestic cells, with all higher-order connections captured by the (I − M)⁻¹ M term. A cell’s multiplier is high when: (i) it spends domestically rather than on imports; (ii) it spends on producers that in turn employ domestic workers in slack cells; and (iii) these higher-order effects amplify through the circular flow.
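The claim that the (I − M)⁻¹M term captures all higher-order connections can be checked numerically: whenever the spending rounds die out, (I − M)⁻¹M equals the infinite sum M + M² + M³ + …. The sketch below uses a hypothetical 3-cell matrix with made-up entries and an assumed row orientation (row i is the spending cell); it is not calibrated to the paper and omits the ϕ and fiscal-externality terms.

```python
import numpy as np

# Hypothetical spending matrix: M[i, j] is the share of cell i's spending
# that becomes income of domestic cell j after one round. Row sums below 1
# stand in for leakage (imports, taxes, saving).
M = np.array([
    [0.30, 0.20, 0.10],
    [0.10, 0.40, 0.20],
    [0.05, 0.15, 0.30],
])

# Closed form (I - M)^{-1} M, computed with a linear solve.
closed_form = np.linalg.solve(np.eye(3) - M, M)

# The same object as an explicit sum over spending rounds.
series = np.zeros_like(M)
term = M.copy()
for _ in range(200):
    series += term
    term = term @ M

direct_share = M.sum(axis=1)             # first-round domestic share only
total_by_cell = closed_form.sum(axis=1)  # all rounds, summed over target cells

print(np.allclose(closed_form, series))
print(direct_share, total_by_cell)
```

Because every entry of M is non-negative, the all-rounds total for each cell exceeds its first-round share, which is the sense in which higher-order connections add to the direct effect. The toy totals are not comparable in scale to the paper's reported intensities.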

Economy-wide recession: quantitative multipliers (Table III):

Transfer policy	Multiplier	Cost to raise GDP by 5% (bn DKK)
Uniform (all adults)	1.04	96.08
Top 10% domestic spending intensity	1.21	81.99
2018 child tax credit	1.02	97.85
2022 inflation relief to elderly	1.13	88.11
2023 housing rent inflation support	1.03	96.45
Construction worker support	1.23	81.16
Consulting/IT worker support	0.95	105.22

High-multiplier policies (construction workers, 2022 elderly relief) target rural, older, less college-educated cells with high domestic spending intensity. Low-multiplier policies (consulting/IT workers, 2023 housing rent support, 2018 child tax credit) target urban, young, or college-educated cells with lower domestic intensity. Better targeting is reported to save roughly 15 bn DKK (≈ 2.4 bn USD), or 0.4–0.7% of Danish GDP, for the same aggregate GDP impact. In Table III this figure matches the gap between the uniform transfer (96.08 bn DKK) and construction worker support (81.16 bn DKK); the gap between the most expensive policy (consulting/IT worker support, 105.22 bn DKK) and the cheapest is about 24 bn DKK.
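The cost column of Table III can be compared directly. A minimal sketch, using only the figures in the table and treating the uniform transfer as the benchmark (the choice of benchmark is an assumption made here for illustration):

```python
# Cost (bn DKK) to raise GDP by 5%, copied from Table III.
cost_bn_dkk = {
    "Uniform (all adults)": 96.08,
    "Top 10% domestic spending intensity": 81.99,
    "2018 child tax credit": 97.85,
    "2022 inflation relief to elderly": 88.11,
    "2023 housing rent inflation support": 96.45,
    "Construction worker support": 81.16,
    "Consulting/IT worker support": 105.22,
}

benchmark = cost_bn_dkk["Uniform (all adults)"]

# Positive values mean the policy is cheaper than the uniform transfer.
saving_vs_uniform = {
    policy: round(benchmark - cost, 2) for policy, cost in cost_bn_dkk.items()
}

for policy, saving in sorted(saving_vs_uniform.items(), key=lambda kv: -kv[1]):
    print(f"{policy:40s} {saving:+7.2f}")
```

Construction worker support comes out about 14.9 bn DKK cheaper than the uniform transfer, while consulting/IT worker support is about 9.1 bn DKK more expensive.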

U.S. tariff shock application (Section VII): The paper analyzes a hypothetical U.S. tariff increase to 41.4% (the July 2025 effective U.S. tariff on China) on Danish exports, motivated by Greenland tensions. The shock reduces export revenue by 41.4% for each producer cell, with direct exposure varying by region: Billund (Lego headquarters), Kalundborg (pharmaceuticals), and a Copenhagen manufacturing hinterland face the largest direct declines — up to 8% of total regional sales. The shock propagates through the disaggregated network; cells whose income falls by more than 4% become slack. Key findings:

Regional slackness follows direct exposure but is also shaped by proximity to other exposed regions (urban bias propagates the shock to cities) and isolation (Billund has high direct exposure but low slackness relative to exposure because it is geographically isolated from other high-exposure cells)
Transfer multipliers for this heterogeneous recession (Proposition 2) depend on spending intensity on slack cells, not on direct exposure or own slackness
Table IV (R² for multiplier): a slack-cell indicator alone yields R² = 0.015; the direct spending share on slack cells raises R² to 0.366; spending intensity on slack cells raises R² to 0.769 (column 3); including both the spending share and the spending intensity on slack cells reaches R² = 0.840 (column 4)
Billund, despite high exposure, has a low multiplier because its spending (often local to a low-exposure vicinity) does not create labor demand for slack cells elsewhere
Some of the highest-multiplier regions are themselves non-slack but are surrounded by many slack cells, so their spending effectively employs slack workers

Dynamic model (Section VIII): The paper extends to a dynamic OLG (Blanchard-Yaari) model with heterogeneous marginal propensities to consume (MPCs) calibrated from a 2009 Danish fiscal policy. Key result: static and year-4 dynamic multipliers are closely correlated (slope ≈ 0.898). Long-run cumulative multipliers exactly equal static multipliers (formally proved in Appendix V.F): in the long run, all transfers are fully spent. MPCs and domestic spending intensity are complementary determinants of dynamic multipliers — targeting high-MPC cells amplifies short-horizon (year 0–2) multipliers, while targeting high-spending-intensity cells shapes both short- and long-run multipliers. The paper’s main mechanism (spending intensity on slack cells) is robust at all horizons.

Robustness (Section IX): (i) Counterfactual accounts with reversed stylized patterns (e.g., rural cells spending like urban cells) lead to substantially different multipliers — the specific measured patterns drive the results. (ii) Imposing standard simplifying assumptions (consumer spending flows only to local producers; spending flows across regions in proportion to intermediate trade) misses most of the multiplier variation. (iii) The mechanism is similarly important in less open economies. (iv) Low short-run and high long-run substitution elasticities (from the trade literature) produce similar multiplier rankings across cells.

Scope conditions: The implementation is a proof of concept for Denmark, using existing micro data from a single large bank and government registers; full coverage of all banks and complete data on within-firm flows would strengthen measurement. Capital-related transactions (saving, investment, financial assets) are aggregated into a single capital accumulation cell; disaggregating these would require different data. The model is intentionally static (with a dynamic extension), abstracting from price adjustment dynamics beyond the New Keynesian (NK) wage rigidity. The analysis is partial equilibrium in the sense that the monetary policy response is not modeled; the fixed exchange rate assumption is realistic for Denmark (whose currency is pegged to the euro) but may not transfer to economies with flexible rates. The proof of concept suggests that national statistical agencies could benefit substantially from measuring disaggregated flows through refined surveys.

In depth

Q1. What is missing from standard national accounts that this paper’s system provides?

Standard national accounts measure aggregate consumer spending, income, and output, plus intermediates trade among producer industries (input-output tables); what they do not measure is which specific consumer groups buy from which specific producer groups, or which specific producer groups pay labor and profit income to which specific consumer groups. This means that propagation of a shock through the circular flow — e.g., a tariff shock that reduces exports by rural manufacturers, which reduces income for rural workers, who then reduce spending on urban services, which reduces urban workers’ income — cannot be traced without simplifying assumptions (like “spending flows only to local producers”) that the disaggregated data shows to be empirically inaccurate. The paper provides a proof of concept demonstrating that measuring these bilateral consumer-to-producer and producer-to-consumer flows, while satisfying all national accounting identities, is feasible with existing micro data and yields policy-relevant variation in fiscal multipliers.

Q2. Why do rural, older, and less college-educated consumer cells have higher fiscal multipliers during an economy-wide recession?

These groups have higher domestic spending intensity — a higher fraction of their spending reaches domestic consumers rather than leaking abroad — because they spend less on international tourism, less on imported goods accessed through online retail or urban services, and more on local goods purchased in person. The gravity patterns (stronger distance gradient) and direct domestic spending shares document this directly: rural consumers allocate ~92–100% of spending to domestic producers versus ~75–80% for urban young college-educated consumers. When all cells are slack, a transfer to a high-domestic-intensity cell circulates more within the country, generating more rounds of domestic income and employment before leaking to imports. The mild assortative spending pattern further reinforces the first-order effect: spending by rural older consumers flows toward producer cells employing workers with similar characteristics, who also spend domestically, so higher-order connections amplify rather than dilute the domestic spending effect.

Q3. Why does targeting directly exposed or slack cells not guarantee a high transfer multiplier after the U.S. tariff shock?

A transfer raises GDP by increasing spending, which creates labor demand for other consumer cells; a transfer to a slack cell only generates a high multiplier if that cell’s spending flows toward other slack cells (directly or through indirect chains) — not if it flows toward non-slack cells or abroad. The tariff shock creates isolated pockets of slackness in rural manufacturing regions (e.g., Billund for Lego) that are geographically far from other slack regions; Billund consumers spend locally (gravity) and their locality is not itself a center of other slack cells. In contrast, regions near Copenhagen with moderate direct exposure may have high multipliers if they are close to many other slack manufacturing cells — their spending generates employment across the slack network. The R² decomposition confirms this: knowing a cell is slack explains only 1.5% of multiplier variation (R² = 0.015), while knowing its spending intensity on slack cells explains 76.9% (R² = 0.769).

Q4. How does the paper ensure that the disaggregated flows satisfy national accounting identities?

The system is designed so that every cell’s total inflows equal total outflows (a cell-level balance sheet constraint), and the sum of all cell-level flows equals the corresponding national aggregate from the SNA — both conditions are imposed by construction, not just approximated. For most positions, a bottom-up approach uses observed bilateral microdata (e.g., card payments from Danske Bank directly measure consumer spending by consumer cell i at producer cell j); for positions without direct microdata, a top-down algorithm distributes an aggregate total across cells using assignment rules grounded in the microdata. This dual approach ensures national comprehensiveness (the sum of disaggregated flows equals aggregate national accounts) and individual consistency (cell-level identities hold), unlike existing regional accounts or social accounting matrices that satisfy only one of these constraints.
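The top-down step can be sketched as a proportional allocation that reproduces the aggregate by construction. The proportional rule, the function name `top_down_allocate`, and all numbers are assumptions for illustration; the paper's assignment rules may be more elaborate.

```python
import numpy as np


def top_down_allocate(aggregate_total, weights):
    """Split an aggregate across cell pairs in proportion to microdata weights."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return aggregate_total * weights / weights.sum()


# Hypothetical microdata-based weights: 2 consumer cells x 3 producer cells.
weights = np.array([
    [4.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
])
aggregate = 500.0  # hypothetical national-accounts total for this position

flows = top_down_allocate(aggregate, weights)

print(flows)
print(flows.sum())        # matches the aggregate up to floating-point error
print(flows.sum(axis=1))  # implied totals per consumer cell
```

Scaling by the weight sum is what makes the disaggregated flows add up to the national aggregate; cell pairs with zero weight in the microdata receive zero flow.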

Q5. What is the relationship between spending intensity and the standard fiscal multiplier formula?

The cell-level multiplier (dGDP/dTi) in Proposition 1 equals approximately the cell’s spending intensity on domestic cells, corrected for fiscal externalities and price effects of the fixed factor. The formal difference is that the model multiplier involves the matrix (I − M)⁻¹M where M incorporates both spending and production shares (through which price changes for the fixed factor enter), while the reduced-form spending intensity uses only the spending matrix. Despite this difference, the two measures are highly correlated empirically: the regression of cell-level multipliers on domestic spending intensity has a slope of approximately 1.66 for static multipliers. The spending intensity can thus be calculated directly from the disaggregated accounts without solving the full general equilibrium model, making it a practical statistic for policy guidance.

Q6. How does the dynamic model reconcile the fact that rural, older, and less college-educated cells have high spending intensities but typically lower MPCs?

MPCs and spending intensities are complementary but distinct determinants of dynamic multipliers at short horizons: high-MPC cells spend the transfer quickly (year 0–1), generating a large immediate impact, while high-spending-intensity cells ensure that spending, whenever it occurs, circulates domestically and reaches slack labor markets. At long horizons (year 4+) the two effects converge because all cells eventually spend their full transfer (long-run MPC = 1) and the multiplier converges to the static model’s value, which depends only on spending intensity. The practical implication is that policies targeting rural/older/less-educated cells (high intensity, lower MPC) may have lower immediate multipliers than policies targeting high-MPC urban consumers, but converge to higher long-run multipliers. The year-4 cumulative multipliers from the dynamic model closely resemble the static model, suggesting a 3–5 year business cycle horizon is well captured by the static analysis.

Q7. What does the triangular flow pattern imply for understanding regional inequality and fiscal redistribution?

The triangular flow has three legs: rural regions receive net income from foreign exports; rural consumers spend their net inflows toward urban regions; and urban consumers spend net toward abroad. Rural regions’ incomes therefore depend on export competitiveness, while urban regions’ incomes depend on domestic consumption demand. Fiscal transfers to rural consumers thus have high domestic multipliers because their spending boosts urban income (via the rural-to-urban spending flow), which then circulates domestically before leaking abroad. This pattern is also consistent with the political economy finding that high-multiplier cells (rural, older, less educated) are more likely to vote for right-wing populists and feel politically disenfranchised: they are the “left behind” groups that economic research associates with exposure to globalization and automation, but whose spending patterns happen to generate large domestic multipliers during recessions.

Key concepts

disaggregated economic accounts : a system that breaks down all national accounting positions — consumer spending, labor and profit income, intermediates trade, government transactions, foreign trade — into bilateral flows between consistently defined region-by-industry consumer cells and producer cells, satisfying national accounting identities both at the cell level and in aggregate; the paper’s proof of concept is implemented for Denmark using 2,744 consumer cells and 2,646 producer cells in 2018.

spending intensity : a cell-level, reduced-form statistic capturing how much a consumer cell’s spending contributes to the income of a target group of cells (e.g., all domestic cells or all slack cells), accounting for all indirect higher-order connections through the circular flow; formally defined as a recursive sum that incorporates the full disaggregated network structure; ranges from 0.4 to 0.9 for domestic spending intensity and is systematically higher for rural, older, and less college-educated cells.

slack cell : in the paper’s NK model, a consumer cell for which demand-driven unemployment occurs because the nominal wage rigidity binds — labor supply exceeds demand when the cell’s income declines by more than a threshold δ due to a negative demand shock; fiscal transfers with high multipliers are those whose spending reaches slack cells (directly or through higher-order network connections).

triangular flows : the cross-regional spending pattern documented for Denmark in which net consumption spending flows from rural regions to urban regions (urban bias), net foreign export revenue flows to rural regions (rural manufacturing), and net foreign import spending flows from urban regions; implies that rural-to-urban spending flows act as an important transmission channel for fiscal stimulus targeted at rural consumers.

bottom-up vs top-down disaggregation : the two methodological approaches for constructing bilateral cell-to-cell flows; the bottom-up approach uses individual-level microdata (e.g., bank transaction records) to directly observe cell-to-cell payment flows; the top-down approach allocates an aggregate national accounting position across cells using assignment algorithms informed by microdata; both approaches are designed so that the resulting disaggregated flows sum to the corresponding SNA aggregate.

Disincentive effects of unemployment insurance benefits

This paper isolates the disincentive effects of pandemic unemployment insurance (UI) benefits on employment recovery, separating them from the simultaneously operating stimulative (demand) effects that previous studies conflate. The authors study the largest UI expansion in U.S. history — the CARES Act of March 2020 — which introduced three simultaneous provisions: a $600 weekly income supplement (FPUC) through end of July 2020, a 13-week extension of maximum benefit duration (PEUC), and expanded eligibility to workers previously ineligible for UI (PUA), together raising the median replacement rate to 145% and more than doubling the number of UI recipients.

The empirical strategy uses high-frequency establishment-level data from Homebase (HB), a scheduling and payroll provider covering approximately 140,000 small U.S. businesses — predominantly restaurants and retailers — matched to Yelp price-tier data and Safegraph foot-traffic and spending data. The final estimation sample is 4,595 businesses within 1,195 local-industry cells, observed at weekly frequency from January 2019 to December 2020.

The identification rests on comparing employment recovery of low-wage versus high-wage businesses within the same narrow local labor market (four-digit zip code), industry (two-digit NAICS), and price tier. Because neighboring businesses largely share the local demand stimulus from UI, differencing within local-industry cells removes common demand effects. The key variation is the expiration of the $600 supplement, which differentially compresses the replacement-rate gap between low- and high-wage businesses depending on local average wages — labor markets where the gap falls more sharply are the treated group.

The main empirical finding is that a 100 percentage point decline in the replacement rate gap is associated with a 5.7 percentage point rise in low-wage business employment recovery relative to high-wage business employment recovery at 12 weeks after the $600 expiration. For the average labor market, the expiration of the $600 supplement decreased the replacement rate gap by 46 percentage points, implying a 2.6 percentage point closing of the low-versus-high-wage employment gap within 12 weeks. Importantly, hours per employee and hourly wages grew faster in low-wage businesses over the same period, consistent with a labor supply rather than a demand mechanism. When the comparison is conducted at the U.S. state level rather than within local-industry cells (as in Finamor and Scott, 2021), the estimated coefficient flips sign and becomes statistically insignificant, illustrating how local demand effects obscure disincentive effects at broader geographic aggregations.
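The 2.6 percentage point figure follows from scaling the estimated coefficient by the average decline in the gap. A short worked calculation using only the two numbers quoted above (the variable names are illustrative):

```python
# Employment-gap response (pp) per 100 pp decline in the replacement rate gap.
coef_pp_per_100pp = 5.7
# Average decline in the replacement rate gap at the $600 expiration (pp).
avg_gap_decline_pp = 46.0

implied_effect_pp = coef_pp_per_100pp * avg_gap_decline_pp / 100.0
print(round(implied_effect_pp, 1))  # 2.6
```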

To quantify the aggregate employment impact, the authors build and calibrate a McCall-style labor search model with heterogeneous firm wages, a UI-eligible and non-UI unemployed pool, and equilibrium reservation wages. The model is extended to include a probability (calibrated at 16.5%) that workers lose UI eligibility upon refusing a job offer, which reconciles the model with the empirical estimates; without this feature the baseline model substantially overstates the differential employment effect of the $600 expiration.

The full model-implied aggregate employment loss from all CARES Act UI provisions combined is 3.4 percentage points on average between April and December 2020, representing approximately 20% of the average employment shortfall in the Leisure and Hospitality sector over that period. When each provision is implemented in isolation, the effects are modest ($600 supplement: 0.2 pp; extended duration: 0.2 pp; expanded eligibility: 1.0 pp), but their interaction generates the large combined effect. Expanded eligibility is identified as the most disruptive provision, particularly for low-wage businesses, because it depletes the pool of non-UI unemployed who are the primary source of hires for these firms. The unemployment duration elasticities implied by the model are modest and in line with the low-to-middle range of pre-pandemic estimates.

The paper’s scope is restricted to the disincentive channel and deliberately excludes the stimulative effects of UI; it studies small, in-person service sector businesses and the April–December 2020 recovery period only.

Q: What is the core identification challenge this paper addresses? A: Prior empirical studies find only modest net effects of pandemic UI on employment, but it is unclear whether this reflects small disincentive effects or the near-cancellation of two opposing forces — UI suppressing labor supply while simultaneously stimulating local consumer demand. Identifying the disincentive effect alone requires a design that neutralizes the demand channel. The authors accomplish this by comparing low-wage and high-wage businesses within the same narrow local market, industry, and price tier, so that common local demand shifts from UI are differenced out.

Q: What data does the empirical analysis use, and how is the sample constructed? A: The primary data source is Homebase, covering approximately 140,000 small U.S. businesses with daily employment, hourly wages, and hours worked. The estimation sample is restricted to 4,595 businesses present throughout 2019, matched to Yelp price-tier classification and Safegraph weekly foot traffic and credit-card spending. Businesses are grouped into 1,195 local-industry cells defined by four-digit zip code, two-digit NAICS industry, and Yelp price tier (inexpensive vs. expensive). Within each cell, businesses are classified as low-wage or high-wage, with high-wage businesses paying on average $1.80 per hour more — about 8% above the average hourly wage of $10.90.

Q: How is the replacement rate defined in the empirical framework? A: The business-specific replacement rate is the ratio of average UI receipts (state benefit plus the pandemic supplement, converted to hourly units) to the pre-pandemic average hourly wage of that business. Because the supplement is uniform across workers, businesses with lower pre-pandemic wages face higher replacement rates; the replacement rate gap between low- and high-wage businesses within a local market is therefore a function of both state benefit levels and the local wage dispersion.
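The definition can be made concrete with a small sketch. The 40-hour conversion, the $200 weekly state benefit, and the two hourly wages are hypothetical values chosen for illustration, and the state benefit is held fixed across businesses for simplicity; only the $600 supplement comes from the text.

```python
HOURS_PER_WEEK = 40.0  # assumed conversion from weekly benefits to hourly units


def replacement_rate(weekly_state_benefit, weekly_supplement, hourly_wage):
    """UI receipts per hour divided by the business's pre-pandemic hourly wage."""
    hourly_benefit = (weekly_state_benefit + weekly_supplement) / HOURS_PER_WEEK
    return hourly_benefit / hourly_wage


def replacement_rate_gap(weekly_state_benefit, weekly_supplement, low_wage, high_wage):
    """Low-wage minus high-wage replacement rate within one local market."""
    low = replacement_rate(weekly_state_benefit, weekly_supplement, low_wage)
    high = replacement_rate(weekly_state_benefit, weekly_supplement, high_wage)
    return low - high


# Hypothetical local market: state benefit $200/week, wages $10 and $12 per hour.
gap_with_supplement = replacement_rate_gap(200.0, 600.0, low_wage=10.0, high_wage=12.0)
gap_after_expiration = replacement_rate_gap(200.0, 0.0, low_wage=10.0, high_wage=12.0)
gap_decline = gap_with_supplement - gap_after_expiration

print(gap_with_supplement, gap_after_expiration, gap_decline)
```

Because the supplement is a flat dollar amount, removing it shrinks the gap; in this toy market the decline is 25 percentage points, and markets with wider wage dispersion would see larger declines.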

Q: What does the event-study analysis around the $600 expiration show? A: The event study exploits cross-labor-market variation in how much the replacement rate gap between low- and high-wage businesses declined when the $600 FPUC supplement expired at end of July 2020. Labor markets with a larger decline in the gap see faster relative recovery in low-wage business employment after expiration. A 100 percentage point decline in the replacement rate gap is associated with a 5.7 percentage point rise in the low-versus-high-wage employment recovery gap at 12 weeks post-expiration. For the average labor market, the $600 expiration reduced the replacement rate gap by 46 percentage points, implying a 2.6 percentage point narrowing of the employment recovery gap.

Q: Why does the estimated effect disappear when broader geographic aggregations are used? A: When businesses are compared within U.S. state borders rather than within local-industry cells, the estimated coefficient on the replacement rate gap turns positive and statistically insignificant. This occurs because at the state level, low-wage areas benefit disproportionately from the purchasing power increase that generous UI provides to local unemployed workers, so demand effects swamp and reverse the supply-side disincentive. This finding explains why Finamor and Scott (2021), using Homebase data with state fixed effects, find no negative association between replacement rates and labor market re-entry.

Q: What evidence supports a labor supply rather than demand interpretation of the differential recovery? A: During the period of the $600 supplement, hours per employee and hourly wages grew faster in low-wage businesses than in high-wage businesses, even as low-wage businesses lagged in employment levels. If the differential recovery reflected demand deficiencies at low-wage businesses, hours per employee and wages should have grown faster at high-wage businesses instead. The observed pattern is consistent with labor supply shortfalls at low-wage firms.

Q: What is the structure of the quantitative labor search model? A: The model features a unit measure of workers and a fixed measure of firms, each posting a constant idiosyncratic wage drawn from an exogenous distribution. Unemployed workers receive job offers at a rate determined by labor market tightness and accept offers above their reservation wage. Reservation wages are equilibrium objects because UI benefits depend on the worker’s previous wage. The unemployed are split into UI-eligible and non-UI pools; the non-UI pool accepts jobs from lower in the wage distribution and is the primary supply source for low-wage firms. The model is calibrated to pre-pandemic U.S. service sector averages, with a pre-pandemic UI replacement rate of 0.51, a UI recipiency probability of 14%, and a non-UI replacement rate of 0.15.
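How a higher benefit raises the reservation wage can be seen in a textbook McCall problem, which is far simpler than the paper's model. The sketch assumes one wage offer per period, permanent jobs, a constant benefit that does not depend on the previous wage, and a hypothetical uniform wage grid; the discount factor and the function name are also illustrative. Only the replacement rates (0.51 and 0.15) and the $10.90 average wage are taken from the summary.

```python
import numpy as np


def reservation_wage(benefit, wages, probs, beta=0.95, tol=1e-10, max_iter=10_000):
    """Solve w_bar = (1 - beta) * benefit + beta * E[max(w, w_bar)] by iteration.

    The right-hand side is a contraction with modulus beta, so the
    iteration converges from any starting point.
    """
    wages = np.asarray(wages, dtype=float)
    probs = np.asarray(probs, dtype=float)
    w_bar = float(benefit)
    for _ in range(max_iter):
        w_new = (1 - beta) * benefit + beta * np.sum(np.maximum(wages, w_bar) * probs)
        if abs(w_new - w_bar) < tol:
            return w_new
        w_bar = w_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical wage offer distribution: uniform on a grid from $8 to $16.
wages = np.linspace(8.0, 16.0, 9)
probs = np.full(wages.size, 1.0 / wages.size)

REFERENCE_WAGE = 10.90               # average hourly wage quoted in the summary
b_nonui = 0.15 * REFERENCE_WAGE      # non-UI replacement rate
b_ui = 0.51 * REFERENCE_WAGE         # pre-pandemic UI replacement rate

w_nonui = reservation_wage(b_nonui, wages, probs)
w_ui = reservation_wage(b_ui, wages, probs)
print(w_nonui, w_ui)
```

The UI-eligible worker has the higher reservation wage, so under these assumptions the non-UI worker accepts offers from lower in the wage distribution, which is the sorting the summary describes.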

Q: Why does the baseline model overstate the empirical effect, and how is this reconciled? A: The baseline model dramatically overstates the differential employment impact of the $600 expiration because the CARES Act’s expanded eligibility (modeled as a rise in the recipiency probability from 14% to 70%) nearly empties the non-UI unemployed pool, which is the dominant labor supply source for low-wage firms. In the data, the share of unemployed receiving UI nearly tripled for in-person leisure and hospitality workers, but not to the degree that the model’s implied employment collapse would require. The model is reconciled by introducing a 16.5% probability that a worker loses UI eligibility upon refusing a suitable job offer — consistent with UI law — which reduces the effective outside option and raises acceptance rates for low-wage firms.

Q: What are the aggregate employment losses implied by the model? A: When all three CARES Act provisions are implemented jointly, the model estimates that the disincentive effects held back aggregate employment recovery by 3.4 percentage points on average between April and December 2020 — approximately 20% of the average employment shortfall in the Leisure and Hospitality sector. Implemented in isolation, each provision generates only modest losses: the $600 supplement alone accounts for 0.2 percentage points, extended duration for 0.2 percentage points, and expanded eligibility for 1.0 percentage points. The large combined effect arises from the interaction of all three provisions, not from any single one.

Q: What are the conditional (interaction) effects of each provision when the other two are in place? A: Conditional on the other two provisions being active, the income supplement holds back employment recovery by 1.6 percentage points, the extended duration by 1.5 percentage points, and expanded eligibility by 2.9 percentage points. This interaction effect is the central quantitative finding: individually modest provisions combine to produce effects far exceeding their sum when implemented simultaneously.
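The size of the interaction can be read off the reported numbers. A short worked calculation, using only the isolated, conditional, and joint effects quoted in the summary (the dictionary keys are illustrative labels):

```python
# Employment losses in percentage points, as reported in the summary.
isolated_pp = {"supplement": 0.2, "extended_duration": 0.2, "expanded_eligibility": 1.0}
conditional_pp = {"supplement": 1.6, "extended_duration": 1.5, "expanded_eligibility": 2.9}
joint_pp = 3.4

sum_isolated_pp = round(sum(isolated_pp.values()), 1)
interaction_pp = round(joint_pp - sum_isolated_pp, 1)

# How much larger each provision's effect is when the other two are active.
amplification = {
    name: round(conditional_pp[name] / isolated_pp[name], 2) for name in isolated_pp
}

print(sum_isolated_pp, interaction_pp)
print(amplification)
```

The isolated effects add up to 1.4 percentage points, so 2.0 of the 3.4 percentage point joint effect is attributable to interactions among the provisions.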

Q: What are the implied unemployment duration elasticities, and how do they compare to the literature? A: The $600 supplement alone raises average unemployment duration by 8% against a 343% rise in the replacement rate, implying an elasticity of 0.02. Extended duration alone raises unemployment duration by 6% against a 150% increase in potential benefit duration, implying an elasticity of 0.03. Expanded eligibility alone raises unemployment duration by 19%, implying an elasticity of 0.04. When each provision is activated on top of the other two, the implied elasticities rise substantially: 0.24 for the $600 supplement, 0.43 for extended duration, and 0.28 for expanded eligibility. These are in the low-to-middle range of pre-pandemic estimates (Katz and Meyer, 1990: 0.3–0.5; Johnston and Mas, 2018: 0.4–0.8; Rothstein, 2011: 0.06; Farber and Valletta, 2015: 0.15).

Q: What is the role of expanded eligibility specifically? A: Expanded eligibility is identified as the most disruptive CARES Act provision, accounting for 1.0 percentage points of employment loss alone and 2.9 percentage points conditional on the other provisions. Mechanically, expanded eligibility converts non-UI unemployed workers into UI-eligible workers, draining the pool of workers willing to accept low-wage job offers. Because low-wage firms depend disproportionately on the non-UI pool for hiring, this provision disproportionately depresses their employment. Using CPS data, the authors document that the share of unemployed workers receiving UI in the in-person leisure and hospitality sector nearly tripled in 2020 relative to the pre-pandemic period.

Q: What are the scope conditions and limitations of the analysis? A: The empirical analysis is restricted to small, in-person service sector businesses (restaurants and retailers) in the Homebase sample, which may not be representative of the broader labor market. The quantitative model is explicitly focused on disincentive effects only and does not capture the stimulative or demand effects of UI. The model also abstracts from re-opening restrictions and other pandemic-specific confounders. The analysis covers April to December 2020; the 2021 pandemic UI extensions are not studied. The job-refusal probability (χ = 16.5%) is a reduced-form calibration target rather than a structurally identified parameter.

Replacement rate gap: The difference in business-specific UI replacement rates between low-wage and high-wage businesses within the same local labor market; defined as UI benefits (state benefit plus supplement) divided by the business’s pre-pandemic average hourly wage. Larger gaps indicate greater relative disincentive for workers to accept jobs at low-wage firms.

Disincentive effect: The negative impact of higher UI replacement rates on workers’ willingness to accept job offers and thus on business employment recovery, isolated from the simultaneous stimulative demand effect of UI spending.

Non-UI unemployed pool: Workers who are ineligible for or have exhausted UI benefits and therefore receive only social benefits at a lower replacement rate (calibrated at 0.15 in the model). This group has a lower reservation wage and constitutes the primary labor supply source for low-wage firms.

Local-industry cell: The paper’s unit of comparison — businesses sharing the same four-digit zip code (covering on average four neighboring zip codes), two-digit NAICS industry, and Yelp price tier. Within-cell differencing is the mechanism that removes common local demand effects.

Benefit recipiency probability: The probability that a newly separated worker enters the UI-eligible unemployed pool, combining UI eligibility and takeup. Pre-pandemic this is calibrated at 14%; under the CARES Act it rises to 70%, targeting the observed near-tripling of UI recipients in the CPS data.

Job-refusal eligibility loss: A probability (calibrated at 16.5%) that a UI-eligible worker who rejects a job offer loses UI status and transitions to the non-UI pool. Motivated by UI law prohibiting refusal of suitable work; reduces the effective outside option and reconciles the model’s predicted employment gap with the empirical estimate.

Equilibrium residual wage dispersion: The wage dispersion observed in equilibrium conditional on worker observables. The model generates realistic dispersion by calibrating the non-UI replacement rate to match the lower half of the wage distribution and the firm wage offer variance to match the upper half; the presence of the non-UI state substantially increases residual dispersion relative to standard search models.

Distorted prices and targeted taxes in the New Keynesian Network model

This paper asks how governments should optimally adjust sector-specific taxes in response to sectoral shocks when monetary policy cannot be tailored to individual sectors. The authors work within a variant of Rubbo’s (2023) New Keynesian Network (NKN) model, augmented to include time-varying sectoral sales taxes and production subsidies. The model features N sectors connected through input-output linkages, with Calvo-type price rigidity that is heterogeneous across sectors, and encompasses both sectoral productivity (supply) shocks and demand shocks.

The central finding, stated as Proposition 1, is that the first-best tax policy requires exactly 2N instruments—one sales tax and one production subsidy per sector—not just instruments in the shocked sector. The mechanism turns on a twofold distortion created by sticky prices. Because only a fraction of firms adjust prices at any time, relative prices are distorted both within sectors (price dispersion among firms) and across sectors (misalignment of relative prices). The production subsidy offsets the effect of shocks on marginal costs, incentivizing price-adjusting firms to leave seller prices unchanged and thereby eliminating within-sector dispersion. The sales tax—which applies to both household purchases and intermediate goods trade—steers demand across sectors so that market prices move as if fully flexible, closing sectoral output gaps even as seller prices remain constant. The optimal sales tax moves exactly one-for-one with the vector of natural prices. Crucially, budget neutrality holds to first order: the sales tax revenues fund the production subsidies.

The strength of each instrument’s response depends on network proximity rather than price rigidity. For supply shocks, adjustment propagates downstream (governed by the Leontief inverse), so sectors that intensively use inputs from the shocked sector require larger responses. For demand shocks, adjustment propagates upstream first and then back downstream, so upstream suppliers to the shocked sector face the largest responses.

Because the first-best policy requires observing sectoral shocks directly, the authors propose a simple 2N rule (Proposition 2) that responds only to observable sectoral seller-price inflation, with rule strength parameter ϕ_i per sector. As ϕ_i → ∞ the simple rule converges to the first-best. Crucially, the rule can be implemented by observing inflation only in the shocked sector and adjusting taxes and subsidies in other sectors proportionally to their input-output distance from that sector.

The quantitative assessment calibrates the model to the U.S. economy using BEA 2017 input-output accounts with N = 373 sectors at the 6-digit classification. Sectoral price flexibility is drawn from Antonova (2025), ranging from 0.052 to 0.989 with a median of 0.277 (implying a median price duration of roughly 4.3 months). Shocks follow AR(1) processes with persistence ρ = 0.97. Supply shocks hit 10 energy-related sectors (roughly 10% of total sales); demand shocks hit 22 service-related sectors (roughly 7% of total sales). The key quantitative finding is that the simple 2N policy—both subsidy and tax together—delivers substantially greater welfare improvement than a subsidy-only policy (N instruments), particularly for supply shocks. When the subsidy is not accompanied by the corresponding sales tax, welfare gains are much smaller.

The paper extends the analysis to an open economy in which import-price shocks act simultaneously as supply and demand shocks. The application to the 2022 Ukraine war energy crisis feeds in a 24% world-energy-price increase (IMF Global Energy Price index, 2022M1–2022M4) and contrasts high-dependence Europe (energy import share γ_EU = 0.63, substitution elasticity η_EU = 1) with the low-dependence U.S. (γ_US = 0.17, η_US = 4). In Europe, adverse supply effects dominate, so the domestic energy sector contracts; in the U.S., demand substitution effects dominate, so domestic energy expands. Simple 2N rules correlate 0.89 with the optimal policy across sectors for Europe and 0.94 for the U.S. A notable normative implication: the optimal policy raises sales taxes on energy to discourage consumption, in contrast to the actual European policy of subsidizing energy consumption during the 2022 crisis.

Q: Why can monetary policy not achieve the first-best allocation in the NKN model?

A: Monetary policy sets a single nominal interest rate that applies uniformly across all sectors, but sectoral shocks generate heterogeneous natural rates. Even if monetary policy stabilizes aggregate output, it cannot simultaneously close all sectoral output gaps and eliminate within-sector price dispersion. Rubbo (2023) shows that optimal monetary policy improves welfare but leaves a significant welfare loss remaining.

Q: What is the core tradeoff in each sector that motivates the 2N result?

A: With Calvo-type staggered pricing, adjusting a sector’s relative price to close its output gap creates price dispersion within the sector because not all firms adjust simultaneously; but holding seller prices constant to avoid dispersion leaves output gaps open due to the absence of relative price adjustment. Two instruments—production subsidy and sales tax—are required to address both sides of this distortion simultaneously, in keeping with the Tinbergen principle.

Q: How exactly do the production subsidy and sales tax each work under the optimal policy?

A: The production subsidy is paid to producers and affects the optimal seller price for a given marginal cost, incentivizing firms that can adjust prices to leave them unchanged. The sales tax is levied on buyers (households and downstream firms) and, because it is applied to both household consumption and intermediate goods trade, it steers demand across sectors to replicate the efficient allocation of expenditure. Under the optimal policy, seller prices are fully stabilized (p^s_t = 0) while buyer (market) prices move as p_t = τ^s_t = p^n_t, where p^n_t is the vector of natural prices, mimicking flexible-price outcomes.

Q: What determines which sectors receive larger optimal tax and subsidy responses?

A: For supply (productivity) shocks, responses are governed by the matrix L̄ = XL, where L is the Leontief inverse measuring downstream proximity; sectors that are more intensive downstream users of the shocked sector require larger responses. For demand shocks, the relevant matrix measures upstream proximity, so sectors that supply inputs to the shocked sector face stronger responses. Critically, the level of the policy response is independent of sector-specific price rigidity; only the network structure matters.
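The downstream-proximity idea can be illustrated with a minimal sketch, which is not the paper's calibration. It builds a Leontief inverse L = (I − A)^(-1) for a three-sector economy from a hypothetical cost-share matrix `A` (every number is invented for illustration), using the series I + A + A² + …, so that direct and indirect input use are both counted. The matrix X in L̄ = XL is not defined in this summary, so the sketch stops at L itself.

```python
# Illustrative only: sector names follow the summary's three-sector example,
# but every cost share below is a hypothetical number, not the paper's data.
SECTORS = ["Energy", "Manufacturing", "Services"]

# A[i][j] = assumed share of input j in sector i's costs (row i uses input j).
A = [
    [0.10, 0.05, 0.05],  # Energy
    [0.30, 0.20, 0.10],  # Manufacturing: energy-intensive
    [0.02, 0.10, 0.15],  # Services: little direct energy use
]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def leontief_inverse(shares, terms=200):
    """Approximate (I - A)^(-1) by the truncated series I + A + A^2 + ...

    The series converges when every row of A sums to less than one,
    which holds for the hypothetical matrix above.
    """
    n = len(shares)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms):
        power = mat_mul(power, shares)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


L = leontief_inverse(A)

# Column 0 of L: total (direct plus indirect) use of Energy by each sector.
energy = SECTORS.index("Energy")
exposure = {SECTORS[i]: L[i][energy] for i in range(len(SECTORS))}

for name, value in exposure.items():
    print(f"{name}: downstream exposure to Energy = {value:.3f}")
```

With these assumed shares, Manufacturing's exposure to Energy exceeds that of Services, which is the ordering the summary describes for supply shocks.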

Q: Is the optimal 2N policy budget-neutral, and why only approximately?

A: Budget neutrality holds to first order around the zero-profit steady state. The production subsidy applies to costs while the sales tax applies to sales; at the steady state these coincide, so the subsidy is exactly funded by the tax revenue. The approximation breaks down away from the zero-profit steady state because costs and sales diverge.

Q: What is the simple 2N rule and how does it relate to the first-best?

A: The simple rule sets s^p_t = I_ϕ · π^s_t and τ^s_t = s^p_t, where s^p_t is the vector of production subsidies, τ^s_t the vector of sales taxes, π^s_t the vector of sectoral seller-price inflation rates, and I_ϕ = diag{ϕ_i} a diagonal matrix of response coefficients. As ϕ_i → ∞ for all i, the allocation converges to first-best; larger ϕ_i produces a stronger commitment to stabilize sectoral inflation, resulting in muted inflation rather than large tax and subsidy levels. In practice, the rule can be implemented by observing inflation only in the shocked sector and scaling responses in other sectors by their input-output distance from that sector.
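As a rough sketch of the mapping this rule describes, the function below takes observed sectoral seller-price inflation and response coefficients and returns the subsidy and tax vectors. The function name, the inflation figures, and the coefficient values are all hypothetical. Inflation is treated as given here, whereas in the model it responds to the policy itself, so this shows only the rule's arithmetic and not its equilibrium effect.

```python
def simple_2n_rule(seller_inflation, phi):
    """Map seller-price inflation to (subsidies, sales_taxes).

    Implements subsidy_i = phi_i * inflation_i and tax_i = subsidy_i,
    i.e. multiplication by the diagonal matrix diag{phi_i}.
    """
    if len(seller_inflation) != len(phi):
        raise ValueError("need one response coefficient per sector")
    subsidies = [p * pi for p, pi in zip(phi, seller_inflation)]
    sales_taxes = list(subsidies)  # the rule sets the tax equal to the subsidy
    return subsidies, sales_taxes


# Hypothetical inputs: three sectors, inflation concentrated in the first.
inflation = [0.02, 0.005, 0.0]
phi = [5.0, 5.0, 5.0]

subsidies, sales_taxes = simple_2n_rule(inflation, phi)
for i, (s, tau) in enumerate(zip(subsidies, sales_taxes)):
    print(f"sector {i}: subsidy = {s:.4f}, sales tax = {tau:.4f}")
```

The rule uses 2N instruments but only N response coefficients, because each sector's tax mirrors its subsidy.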

Q: What does the three-sector example (Energy, Manufacturing, Services) illustrate about supply vs. demand shocks?

A: Under an adverse energy productivity shock, the optimal policy subsidizes Energy and Manufacturing (proportional to energy use in manufacturing) but not Services, since Services are not energy-intensive and thus not closely connected downstream. Under a positive manufacturing demand shock, the optimal policy subsidizes both Manufacturing and upstream Energy equally, reflecting that demand shocks propagate upstream first.

Q: What does the calibrated quantitative exercise show about the welfare gains from using both instruments versus one?

A: For both supply and demand shock scenarios, the simple 2N policy (subsidy plus tax) delivers substantially greater welfare improvement than using only monetary policy. When the subsidy is not accompanied by the corresponding sales tax, welfare gains are much smaller, confirming that both instruments together—not subsidies alone—are essential. This is identified as a key quantitative finding of the paper.

Q: How robust are results to decreasing returns to scale in production?

A: Under decreasing returns to scale, the optimal policy response is highly similar to the baseline: correlations between the two are 0.98 for supply shocks and 0.99 for demand shocks across sectors. The simple 2N rule continues to deliver significant welfare improvements. One difference is that demand shocks generate relatively higher welfare losses under decreasing returns, while productivity shocks lead to lower losses.

Q: How does the open-economy extension change the analysis for import-price shocks?

A: Import-price shocks enter the model as both supply shocks (raising input costs) and demand shocks (shifting expenditures toward domestic substitutes), so they require a policy response that accounts for both propagation channels simultaneously. The optimal open-economy policy is formally isomorphic to the closed-economy counterpart but with redefined upstream and downstream matrices and shock vectors. The relative importance of the supply versus demand channel depends on the economy’s import dependence and substitution elasticity.

Q: How does the 2022 energy crisis illustrate the difference between the optimal policy and actual European policy?

A: Using a 24% world-energy-price increase (IMF Global Energy Price index, 2022M1–2022M4), the model implies that with high European energy dependence (γ_EU = 0.63, η_EU = 1), adverse supply effects dominate and the optimal policy raises sales taxes on energy to discourage consumption and subsidizes domestic energy users proportional to downstream proximity. Actual European policy partly subsidized energy consumption, which the model identifies as welfare-reducing relative to the optimal response. For the low-dependence U.S. (γ_US = 0.17, η_US = 4), demand substitution toward domestic energy dominates, requiring additional subsidies to domestic energy producers.

Q: How does this paper relate to the Diamond-Mirrlees result on intermediate good taxation?

A: Diamond-Mirrlees (1971) recommends against taxing intermediate goods in an otherwise efficient economy to avoid introducing additional distortions. This paper considers an economy already subject to pricing frictions (Calvo staggered pricing), and shows that taxing intermediate goods through the sales tax—which applies to intermediate goods trade—is part of the optimal policy precisely because it corrects the pre-existing distortions. The paper thus does not contradict Diamond-Mirrlees but operates in a different setting where frictions are already present.

New Keynesian Network (NKN) model: A multi-sector general equilibrium framework with N sectors connected through input-output linkages, Calvo-type staggered price setting that is heterogeneous across sectors, and monopolistically competitive firms; provides the canonical system of sectoral IS curves and Phillips curves used in this paper.

2N policy: The paper’s central result that the first-best tax policy requires exactly two instruments per sector—one production subsidy and one sales tax—for a total of 2N instruments; characterized in Proposition 1 and named for this instrument count.

Production subsidy (s^p_{t,i}): A sector-specific transfer paid to producers that affects the optimal seller price for a given marginal cost; under the optimal policy it offsets the effect of shocks on marginal costs, incentivizing price-adjusting firms to leave seller prices unchanged and thereby eliminating within-sector price dispersion.

Sales tax (τ^s_{t,i}): A sector-specific tax levied on buyers (both households and downstream firms purchasing intermediate goods) such that the buyer (market) price equals (1 + τ^s_{t,i}) times the seller price; under the optimal policy it replicates the efficient allocation of expenditure across sectors even when seller prices are fully stabilized.

Downstream proximity (Leontief inverse L̄ = XL): A measure of the total direct and indirect use of a sector’s output by other sectors, governing the propagation and optimal policy response to supply (productivity) shocks; the ij-th element of L̄ captures how strongly a shock in sector j affects policy in sector i through downstream input-output linkages.

Upstream proximity: A measure of how closely a sector supplies inputs to another sector, governing the propagation of demand shocks; demand shocks propagate first upstream (to input suppliers) before feeding back downstream.

Budget neutrality: The property that the optimal 2N policy is self-financing to first order—sales tax revenues exactly fund the production subsidies around the zero-profit steady state—so the fiscal intervention does not require net government expenditure.

Simple 2N rule: A practically implementable approximation to the first-best policy that sets subsidies and taxes proportional to observed sectoral seller-price inflation with response coefficients ϕ_i; converges to the first-best as ϕ_i → ∞ and can be implemented using only the inflation rate of the shocked sector plus network-distance weights from the input-output table.

Distributional Growth Accounting: Education and the Reduction of Global Poverty, 1980–2019

Layer 1 — Core Argument

This paper constructs the first estimates of the aggregate and distributional effects of worldwide educational expansion since 1980 by developing a “distributional growth accounting” framework that isolates the contribution of schooling to economic growth by income group. The framework integrates the canonical labor supply-and-demand model of education and the wage structure (à la Goldin and Katz 2007) with standard growth accounting tools, applied to a new microdatabase covering household surveys in 150 countries and representative of approximately 95% of the world’s population, alongside new country-specific estimates of private returns to primary, secondary, and tertiary schooling. Under conservative assumptions — relying on standard Mincerian returns, assuming capital income is unaffected by schooling, and abstracting from human capital externalities — education can account for approximately 50% of global economic growth, 70% of income gains among the world’s poorest 20% of individuals, and 40% of extreme poverty reduction since 1980; it also explains over 50% of improvements in the share of labor income accruing to women. A key mechanism is imperfect substitutability between skill groups: as educational expansion raises the supply of skilled workers, their relative wage falls, redistributing income toward low-skilled workers and amplifying education’s equalizing effect at the bottom of the distribution — a channel that canonical cross-country growth accounting misses, causing it to underestimate education’s contribution to poverty reduction by a factor of approximately three. Combining these indirect investment benefits from education with direct government redistribution (from a companion paper) brings the total contribution of public policies to extreme poverty reduction to at least 50%.

In depth

Q1. Q: What is distributional growth accounting and how does it differ from standard growth accounting?

A: Standard growth accounting (as in Barro and Lee 2015) combines cross-country data on average years of schooling with a uniform return to derive a counterfactual average income absent educational progress. Distributional growth accounting instead starts from microdata on the joint distribution of income and education within 150 countries, constructs income-group-specific counterfactuals, and accounts for both direct wage effects on individuals whose education changed and general equilibrium supply effects that alter relative wages across all workers. The standard approach is found to underestimate education’s contribution to the poorest 20%’s income growth by a factor of roughly three (23% vs. 71% in the benchmark specification), because cross-country averages cannot accurately locate the world’s poorest individuals and because two key channels — labor income shares being greater at the bottom, and supply-side wage redistribution — are omitted.

Q2. Q: How is the counterfactual world income distribution constructed?

A: In five steps applied to the 150-country microdata. First, education levels are downgraded within each survey until matching the 1980 distribution of educational attainment (using the Barro–Lee database), prioritizing individuals closest to the target level. Second, the earnings of downgraded workers are reduced using the “true” return to schooling, which lies between the initial return (prevailing before expansion, computed from the CES production function using the 2019 elasticity) and the final return observed in 2019 — for plausible parameterizations, the true return weights initial returns at 50–70%. Third, relative wages are adjusted to reflect supply effects: the increase in skilled-worker supply lowers their relative wage by 1/σ log points per log-point increase in relative supply. Fourth, counterfactual labor income is combined with unchanged capital income to yield counterfactual total income. Fifth, the share of actual income growth attributable to education is computed as the gap between the actual and counterfactual growth rates, expressed as a fraction of actual growth.
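Steps three and five above reduce to simple arithmetic, sketched below under stated assumptions. The function names and all input values (the elasticity, the supply change, and both growth rates) are hypothetical; the summary does not report the paper's calibrated σ. The sketch covers only the supply-effect adjustment and the final share calculation, not the microdata downgrading in steps one and two.

```python
def relative_wage_change(log_supply_change, sigma):
    """Step 3: change in the log skilled/unskilled relative wage.

    A one log-point rise in relative skilled supply lowers the relative
    wage by 1/sigma log points.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return -log_supply_change / sigma


def education_share_of_growth(actual_growth, counterfactual_growth):
    """Step 5: gap between actual and counterfactual growth, over actual."""
    if actual_growth == 0:
        raise ValueError("actual growth must be non-zero")
    return (actual_growth - counterfactual_growth) / actual_growth


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
sigma = 4.0               # assumed elasticity of substitution
log_supply_change = 0.8   # assumed rise in log relative skilled supply

wage_change = relative_wage_change(log_supply_change, sigma)
share = education_share_of_growth(actual_growth=0.020,
                                  counterfactual_growth=0.010)

print(f"change in log relative wage: {wage_change:.3f}")
print(f"share of growth attributed to education: {share:.2f}")
```

A lower assumed σ makes the same supply increase compress relative wages more, which is why the results are sensitive to this parameter.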

Q3. Q: What role does imperfect skill substitution play, and how is σ calibrated?

A: Imperfect substitution between skill groups (elasticity σ in a CES production function) is the mechanism through which educational expansion redistributes income. When skilled-worker supply rises, their relative wage falls and low-skilled workers’ relative wage rises, so the income gains from education are shared more broadly than individual returns alone would suggest. With perfect substitutes (σ → ∞), supply effects vanish and education’s distributional impact is determined entirely by who directly received schooling. The elasticity is calibrated from the recent macroeconomics literature; in sensitivity analysis, the paper bounds the contribution of education to the poorest 20%’s income growth between 60% and 90% across plausible values of σ and private returns.
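The mechanism can be written out with a standard two-group CES production function. This form is assumed here for illustration, and the paper's exact specification may differ. H and L are skilled and unskilled labor, A_H and A_L their productivity terms, and w_H and w_L their wages.

```latex
% Assumed two-group CES technology with elasticity of substitution \sigma
Y = \left[ (A_L L)^{\frac{\sigma-1}{\sigma}} + (A_H H)^{\frac{\sigma-1}{\sigma}} \right]^{\frac{\sigma}{\sigma-1}}

% With wages equal to marginal products, the relative wage is
\frac{w_H}{w_L} = \left( \frac{A_H}{A_L} \right)^{\frac{\sigma-1}{\sigma}} \left( \frac{H}{L} \right)^{-\frac{1}{\sigma}}

% Taking logs separates the technology term from the supply term
\ln \frac{w_H}{w_L} = \frac{\sigma-1}{\sigma} \ln \frac{A_H}{A_L} - \frac{1}{\sigma} \ln \frac{H}{L}
```

The coefficient on relative supply is −1/σ, so the supply effect disappears as σ → ∞, matching the perfect-substitutes case described above. In this assumed form the skill premium rises with A_H/A_L only when σ > 1.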

Q4. Q: Why are the estimates described as conservative?

A: Three reasons, each biasing the estimates downward. First, standard Mincerian returns are used, which are systematically lower than causal estimates from natural experiments — a meta-analysis of 15 papers and the paper’s own quasi-experimental validation (India, Indonesia, United States) confirm this; if anything, the framework underestimates schooling’s benefits in those settings. Second, capital income is assumed unaffected by schooling, abstracting from potential effects on capital accumulation and returns. Third, human capital externalities — for which there is now substantial empirical evidence — are ignored entirely. These conservative choices are deliberate; relaxing them would increase all headline estimates.

Q5. Q: How does skill-biased technical change interact with the education contribution?

A: In the CES model, the return to schooling is increasing in the skill bias of technology (AH/AL): a higher skill bias raises the marginal product of skilled workers relative to unskilled, making schooling more profitable. The benchmark counterfactual holds technology fixed at its 2019 value and reduces education to its 1980 level. An alternative counterfactual would hold technology at its 1980 value and increase education to its 2019 level; the difference between these two exercises identifies the contribution of skill-biased technical change in amplifying the benefits of schooling. Because 1980 microdata on the world income distribution are unavailable, this decomposition can only be performed for the subsample of 33 countries with surveys around 2000; for that sample, skill-biased technical change accounts for 20–30% of the income benefits of schooling, meaning education would still have yielded large gains even absent technological progress.

Q6. Q: What do the quasi-experimental validations in India, Indonesia, and the United States show?

A: Three large-scale schooling policy interventions — a school construction program in India (studied in Khanna 2023), Indonesia’s INPRES program (Duflo 2001 and 2004), and US compulsory schooling laws (Acemoglu and Angrist 2000) — are used to externally validate the framework. Using regional variation in exposure to each program and rich microdata on the income distribution, the paper documents two findings: (1) educational expansion had large causal effects on aggregate regional incomes comparable in magnitude to individual returns estimated in the same contexts; and (2) all three policies disproportionately benefited low-income earners, substantially reducing inequality. The distributional growth accounting framework reproduces both findings with “a remarkable degree of accuracy,” and if anything underestimates the benefits of schooling, providing validation of the methodological foundation.

Q7. Q: How does the paper quantify education’s role in gender inequality reduction?

A: The framework is extended to gender by constructing a counterfactual for how large gender labor income gaps would be absent educational improvement since the early 1990s (the period for which female labor income share data are available). The counterfactual accounts for three gender-specific channels: differential educational expansion between men and women, heterogeneous returns to schooling by gender, and differential effects of schooling on female labor force participation. Comparing the counterfactual to actual trends in female labor income shares, education can explain 50–80% of the observed reductions in gender inequality, depending on specification and world region.

Q8. Q: How do public policies as a whole contribute to extreme poverty reduction?

A: The paper’s estimate of education’s indirect investment benefits (40% of extreme poverty reduction) is combined with a companion paper’s (Gethin 2023) estimates of direct government redistribution — cash and in-kind transfers together accounting for approximately 30% of global poverty reduction since 1980, with in-kind transfers alone accounting for approximately 20%. Because the two contributions overlap (e.g., public education spending is both an indirect investment benefit and an in-kind transfer), the combined lower bound is reported as “at least 50%” of extreme poverty reduction attributable to public policies.

Q9. Q: Why does the distributional approach yield such different results from the standard approach for the poorest 20%?

A: Two main reasons. First, cross-country data cannot accurately measure the incomes of the world’s poorest, because the poorest individuals are not all concentrated in the poorest countries — distributional accounting within countries is necessary to locate them precisely. Second, the standard approach misses two progressive channels: (a) labor income shares are higher at the bottom of the income distribution than average, so gains from schooling translate into larger income increases for the poor; and (b) supply effects redistribute schooling gains from high-skilled to low-skilled workers, a mechanism that is entirely absent from cross-country averages but directly captured in the microdata-based counterfactual.

Key Concepts

Distributional growth accounting: A framework, introduced in this paper, that combines a model of education and the wage structure with household microdata to construct income-group-specific counterfactuals, isolating the contribution of human capital accumulation to growth at each point of the income distribution rather than at the national-average level.

True return to schooling (r*): In the CES framework with imperfect skill substitution, the “true” aggregate return to schooling used in the counterfactual lies strictly between the initial return (prevailing before educational expansion, counterfactually higher because skilled-worker supply was lower) and the final return (observed after expansion, lower due to skill-supply pressure). The true return is the return that equates the model’s predicted output loss to the actual output loss from reducing education; for plausible parameters it weights initial returns at 50–70%.

Supply effects (general equilibrium effects of schooling): When the supply of skilled workers rises, their relative wage falls and the relative wage of unskilled workers rises. These wage adjustments are not captured by individual-level Mincerian returns but are modeled via the CES elasticity of substitution σ. Supply effects are central to education’s progressive distributional impact: they compress the skill premium and raise earnings at the bottom of the distribution.

Imperfect substitution between skill groups: The CES production specification in which skilled (H) and unskilled (L) labor are combined with elasticity σ < ∞. This governs the magnitude of general equilibrium wage effects: a lower σ means a larger wage compression per unit of skilled-supply increase, amplifying the redistributive role of education. The paper calibrates σ from the macroeconomics literature and bounds results over plausible ranges.

Skill-biased technical change (SBTC): Technology that raises the marginal product of skilled workers relative to unskilled (captured by the ratio AH/AL in the CES production function). SBTC amplifies returns to schooling; in the subsample of 33 countries with around-2000 surveys, SBTC accounts for 20–30% of schooling’s income benefits, but education would still have generated substantial income gains absent SBTC.

Conservative assumptions (scope condition): All headline quantitative results (50% of aggregate growth, 70% of poorest-20% income gains, 40% of extreme poverty reduction, >50% of gender inequality reduction) are explicitly conditioned on conservative assumptions: Mincerian rather than causal returns, no effect on capital income, and no human capital externalities. The paper argues these assumptions bias all estimates downward.

Summary based on HAL working paper (halshs-04423765v1, Working Paper 2023/25, November 2023). Period covered in working paper text: 1980–2022. AI-assisted, human review pending.

Diversifying Society's Leaders? Determinants and Causal Effects of Admission

This paper studies why children from high-income families are more likely to attend Ivy-Plus colleges (Ivy League, Stanford, MIT, Duke, Chicago; 12 colleges total) and whether attending these colleges causally improves post-college outcomes. The authors construct a de-identified panel dataset linking federal income tax records, Department of Education college attendance data, College Board and ACT test scores, and application and admissions records from several Ivy-Plus and flagship public colleges covering approximately 2.4 million students across entering classes from 1998 to 2015.

The central finding on the input side is that students from families in the top 1% of the income distribution (income above $611,000) are 2.3 times more likely to attend an Ivy-Plus college than middle-class students (defined as the 70th–80th percentiles of the national parental income distribution, approximately $91,000–$114,000) with comparable SAT/ACT scores. Two-thirds of this gap is attributable to higher admissions rates at Ivy-Plus colleges for high-income applicants; conditional on SAT/ACT scores, top-1% applicants are 58% more likely to be admitted than middle-class applicants. The remaining third splits between differences in application rates (roughly 20% of the total attendance gap) and matriculation rates (roughly 12%). In contrast, admissions rates at flagship public colleges are essentially uncorrelated with parental income conditional on test scores.

Three admissions practices drive the high-income admissions advantage at Ivy-Plus colleges. First, legacy preferences: legacy applicants from the top 1% are admitted at more than five times the rate of non-legacy applicants with comparable test scores, demographics, and admissions ratings; children of alumni of a given Ivy-Plus college are not more likely to be admitted to other Ivy-Plus colleges, confirming that legacy status is not merely a proxy for unobservable credentials. Legacy preferences account for 52 of the estimated 168 “extra” top-1% students per average Ivy-Plus class (enrollment ~1,650). Second, non-academic ratings: students from the top 1% have markedly stronger non-academic credentials (extracurricular activities, leadership ratings) partly because they disproportionately attend private high schools whose students receive higher non-academic ratings despite no higher academic ratings; this accounts for 35 additional extra top-1% students. Third, athletic recruitment: the share of recruited athletes rises from 5% among admitted students from the bottom 60% to 13% among those from the top 1%, accounting for 27 additional extra top-1% students.

On the output side, the authors estimate causal effects of attending an Ivy-Plus college using a new research design based on waitlisted applicants. The key identification assumption is that idiosyncratic variation in admissions decisions across waitlisted applicants at one Ivy-Plus college is uncorrelated with admissions decisions at other Ivy-Plus colleges — which the authors verify empirically. Under this assumption, comparisons of admitted vs. rejected waitlisted applicants identify causal effects for marginal students. The marginal student who attends an Ivy-Plus college instead of the average flagship public is approximately 50% more likely to reach the top 1% of the earnings distribution at age 33, nearly twice as likely to attend a highly-ranked graduate school, and 2.5 times as likely to work at a prestigious firm. Attending an Ivy-Plus college increases mean earnings by $101,000 at age 33 relative to a counterfactual mean of $143,000 at state flagships. Effects are concentrated in the upper tail of earnings — the impact on reaching the top quartile is small and statistically insignificant, while impacts on reaching the top 1% far exceed what a constant percentage treatment effect would predict. Effects are larger for students with weaker fallback options (i.e., whose home-state colleges channel fewer students to the top 1%).

Critically, the three credentials driving the high-income admissions advantage — legacy status, athletic recruitment, and high non-academic ratings — are uncorrelated with or negatively correlated with post-college success once the college attended is held constant. Academic credentials (SAT/ACT scores, academic ratings) remain highly predictive of outcomes.

Counterfactual simulations show that eliminating all three high-income admissions preferences and replacing those slots with students having the same test score distribution would increase enrollment from the bottom 95% of the parental income distribution by 8.8 percentage points — comparable in magnitude to the effect of race-based affirmative action on Black and Hispanic enrollment shares. Such a policy would have small effects on monetary leadership outcomes (e.g., Fortune 500 CEO share from bottom-95% families rises by only 0.4 pp, because Ivy-Plus graduates are a small fraction of all top earners) but larger effects on non-monetary leadership positions: the share of senators from the bottom 95% would rise by 1.7 pp and the share of Supreme Court justices by 5.4 pp. With need-affirmative policies (giving low-income students preferences comparable to those currently given to legacy applicants), the share of Supreme Court justices from families in the bottom 60% would rise by 17.5 pp. These predictions assume that the causal share of Ivy-Plus attendance in explaining observational differences in leadership outcomes is the same as that estimated for early-career outcomes, and they ignore general equilibrium effects.

Q: How much more likely are top-1% students to attend an Ivy-Plus college than middle-class students with the same test scores? A: Students from families in the top 1% (income above $611,000) are 2.3 times more likely to attend an Ivy-Plus college than students from the 70th–80th percentile of the parental income distribution (approximately $91,000–$114,000) with comparable SAT/ACT scores. This “missing middle” pattern is stable across entering classes from 1998 to 2018 and persists after controlling for race and ethnicity.

Q: How is the overall attendance gap decomposed into application, admissions, and matriculation? A: Differences in admissions rates explain two-thirds of the gap in Ivy-Plus attendance between top-1% and middle-class students conditional on test scores. Of the estimated 168 “extra” top-1% students per average Ivy-Plus class, 87 come from higher admissions rates for applicants who are not recruited athletes and 27 from athletic recruitment. The remainder comes from differences in application rates (roughly 20% of the overall attendance gap) and matriculation rates (roughly 12%).

Q: How large is the admissions advantage for top-1% applicants at Ivy-Plus colleges? A: Conditional on SAT/ACT scores, applicants from the top 1% are 58% more likely to be admitted to Ivy-Plus colleges than middle-class applicants. Students from the top 0.1% are 2.5 times more likely to be admitted than middle-class applicants with comparable test scores. At flagship public colleges, admissions rates are essentially constant across the income distribution conditional on test scores.

Q: What is the magnitude of legacy preferences and how is it established that legacy is not just a proxy for other credentials? A: Legacy applicants from the top 1% are admitted at more than five times the rate of otherwise comparable non-legacy applicants at the college their parents attended. The paper isolates the legacy effect by showing that children of alumni at a given Ivy-Plus college are only slightly more likely to be admitted at other Ivy-Plus colleges — and the predicted counterfactual admissions rate for legacy students at other colleges closely matches their actual admissions rate — confirming that legacy status is not merely a proxy for other unobservable credentials. Legacy applicants constitute 2.5% of the overall applicant pool but over 9% of top-1% applicants.

Q: How do non-academic credentials differ by parental income, and what drives the difference? A: Top-1% applicants have markedly stronger non-academic ratings (measuring extracurricular participation and leadership traits) compared with other applicants, while the share achieving high academic ratings is essentially constant across the income distribution. Students from the top 1% are much more likely to have attended private high schools, whose applicants receive substantially higher non-academic ratings than students from public high schools with the same SAT/ACT scores. Non-academic ratings account for 35 of the estimated 168 extra top-1% students per Ivy-Plus class.

Q: What is the research design for estimating causal effects, and what is the key identification assumption? A: The authors focus on applicants who are waitlisted at a given Ivy-Plus college and compare those ultimately admitted with those rejected from the waitlist. The key identification assumption is that different colleges’ admissions committees make correlated assessments of underlying student merit but uncorrelated idiosyncratic admissions errors. If so, residual variation in admissions outcomes for waitlisted applicants at one college is orthogonal to students’ long-run potential. The authors validate this empirically by showing that waitlist admission at one Ivy-Plus college is uncorrelated with admissions decisions and internal ratings at other Ivy-Plus colleges.

Q: What are the causal effects of attending an Ivy-Plus college on post-college outcomes? A: For the marginal student (one who attends an Ivy-Plus college instead of the average flagship public), attending an Ivy-Plus college increases the probability of reaching the top 1% of the earnings distribution at age 33 by approximately 50%, nearly doubles the probability of attending an elite graduate school, and increases the probability of working at a prestigious firm by approximately 2.5 times. Mean earnings at age 33 increase by $101,000 (relative to a counterfactual mean of $143,000 at state flagships). Effects on reaching the top quartile of earnings are small and statistically insignificant, while effects at the very top tail are disproportionately large.

Q: Why do the findings differ from Dale and Krueger (2002) and related studies finding little effect of selective college attendance on earnings? A: The authors replicate the matriculation design of Dale and Krueger (comparing outcomes conditional on the set of colleges to which students were admitted) and obtain estimates statistically indistinguishable from their waitlist design — the research designs are not the source of disagreement. Instead, the differences arise because (1) the authors have direct college fixed effects rather than relying on average test scores as a proxy for college quality, and (2) the authors focus on upper-tail outcomes (top 1% earnings, elite graduate schools, prestigious firms) rather than log mean earnings, where Ivy-Plus colleges have their largest effects.

Q: Are the credentials that drive the high-income admissions advantage — legacy, athlete status, high non-academic ratings — predictive of better post-college outcomes? A: No. Recruited athletes, students with higher non-academic ratings, and legacy students have equivalent or lower chances of reaching the upper tail of the income distribution, attending an elite graduate school, or working at a prestigious firm than comparable Ivy-Plus applicants once the college attended is held constant. By contrast, SAT/ACT scores and academic ratings are highly positively predictive of all three post-college outcome measures.

Q: How much could changing admissions practices diversify Ivy-Plus enrollment and subsequently society’s leadership? A: Eliminating legacy preferences, non-academic rating weights, and the differential recruitment of high-income athletes — and filling those slots with students having the same test score distribution as the current class — would increase enrollment from families in the bottom 95% of the parental income distribution by 8.8 percentage points, a magnitude comparable to race-based affirmative action’s effect on Black and Hispanic enrollment shares. For leadership positions, predicted effects are small for monetary outcomes (Fortune 500 CEOs from the bottom 95% would increase by only 0.4 pp) but larger for positions where Ivy-Plus graduates are a larger share: senators from the bottom 95% would increase by 1.7 pp and Supreme Court justices by 5.4 pp. A stronger need-affirmative policy (giving low-income students preferences equivalent to current legacy preferences) would increase the share of Supreme Court justices from the bottom 60% by 17.5 pp.

Q: How are “elite” and “prestigious” employers defined in this study? A: Elite firms are defined as those that disproportionately employ Ivy-Plus graduates relative to flagship public graduates, pulling firms from the top of that ratio ranking until 25% of Ivy-Plus attendee employment is accounted for. Prestigious employers are defined by the residual of that ratio after controlling for the firm’s predicted top-1% income probability — they are firms that disproportionately employ Ivy-Plus graduates conditional on their salaries, capturing high-status jobs that do not necessarily lead to the highest earnings. The paper validates this algorithmic approach against external rankings (Vault.com for law and consulting firms; Scimagoir for hospitals), finding substantial overlap.
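
The elite-firm selection rule described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Firm` record, the employee counts, and the function name `select_elite_firms` are all hypothetical, and the sketch covers only the elite-firm ranking step (not the residual-based definition of prestigious employers).

```python
from typing import NamedTuple


class Firm(NamedTuple):
    name: str
    ivy_plus_employees: int   # hypothetical count of Ivy-Plus attendees employed
    flagship_employees: int   # hypothetical count of flagship public attendees


def select_elite_firms(firms, coverage=0.25):
    """Rank firms by the ratio of their Ivy-Plus employment share to their
    flagship employment share, then take firms from the top of the ranking
    until `coverage` of all Ivy-Plus employment is accounted for."""
    total_ivy = sum(f.ivy_plus_employees for f in firms)
    total_flag = sum(f.flagship_employees for f in firms)

    def ratio(f):
        ivy_share = f.ivy_plus_employees / total_ivy
        flag_share = f.flagship_employees / total_flag
        # A firm with no flagship employees sits at the top of the ranking.
        return ivy_share / flag_share if flag_share > 0 else float("inf")

    selected, covered = [], 0
    for f in sorted(firms, key=ratio, reverse=True):
        if covered >= coverage * total_ivy:
            break
        selected.append(f)
        covered += f.ivy_plus_employees
    return selected


# Made-up data: four firms, 100 Ivy-Plus and 200 flagship employees in total.
firms = [
    Firm("A", 30, 10),
    Firm("B", 20, 20),
    Firm("C", 10, 50),
    Firm("D", 40, 120),
]
elite = [f.name for f in select_elite_firms(firms)]
print(elite)
```

In this toy data, firm A has the highest ratio and alone covers 30% of Ivy-Plus employment, so the 25% threshold is met after one firm.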

Q: How are treatment effect estimates adjusted for heterogeneity in students’ fallback options? A: Causal effects of Ivy-Plus attendance are much larger for students with weaker fallback options — specifically, students whose home-state flagship colleges channel fewer students to the top 1% of earnings. The authors exploit this heterogeneity to estimate the treatment effect for the marginal student who actually switches from a flagship public to an Ivy-Plus college. This heterogeneity also implies that the average causal effect across all admitted students may differ from the effect for the marginal admitted student.

Q: What share of the overrepresentation of top-1% families at Ivy-Plus colleges is attributable to pre-application factors versus admissions practices? A: Of the 245 “extra” top-1% students in an average Ivy-Plus class relative to an unconditionally income-neutral benchmark, 77 (31%) are attributable to the higher test scores of top-1% students (a pre-application factor). The remaining 168 (69%) reflect higher attendance rates conditional on test scores, of which the large majority is attributable to admissions practices (legacy, non-academic ratings, athletic recruitment) rather than application or matriculation rate differences.
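
The shares in this decomposition can be checked with a few lines of arithmetic. The sketch below uses only the student counts quoted in this summary; the variable names are illustrative.

```python
# Counts quoted in the summary (students per average Ivy-Plus class).
extra_total = 245        # extra top-1% students vs. an income-neutral benchmark
from_test_scores = 77    # attributable to higher test scores (pre-application)
conditional_gap = extra_total - from_test_scores  # gap conditional on test scores

# Admissions components of the conditional gap, from the decomposition answer.
admissions_non_athletes = 87
athletic_recruitment = 27

share_test_scores = from_test_scores / extra_total
share_conditional = conditional_gap / extra_total
share_admissions = (admissions_non_athletes + athletic_recruitment) / conditional_gap

print(f"test scores: {share_test_scores:.0%}")                    # 31%
print(f"conditional on test scores: {share_conditional:.0%}")     # 69%
print(f"admissions share of conditional gap: {share_admissions:.0%}")  # 68%
```

The last figure, about 68%, is the "two-thirds" attributed to admissions; the application (roughly 20%) and matriculation (roughly 12%) shares account for the rest.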

Ivy-Plus colleges: The twelve highly selective private colleges comprising the eight Ivy League institutions plus Stanford, MIT, Duke, and the University of Chicago — the focus group of the study, which together account for more than 10% of Fortune 500 CEOs, a quarter of U.S. senators, and three-fourths of Supreme Court justices appointed in the last half century despite enrolling less than 0.5% of Americans.

Missing middle: The pattern by which attendance rates at Ivy-Plus colleges conditional on SAT/ACT scores are lowest for students from the middle class (70th–80th percentile of the parental income distribution, approximately $91,000–$114,000) — lower than both the top 1% and, slightly, the bottom 40% — producing a non-monotone income gradient in attendance.

Legacy preference: An admissions advantage given to applicants whose parent(s) obtained an undergraduate degree from the college to which the student is applying. In the paper’s data, legacy applicants from the top 1% are admitted at more than five times the rate of non-legacy applicants with comparable test scores, demographics, and admissions ratings; the preference is college-specific (children of alumni are only slightly more likely to be admitted at other Ivy-Plus colleges).

Waitlist research design: The paper’s primary identification strategy for causal effects, which exploits idiosyncratic variation in admissions decisions among waitlisted applicants. The design’s validity rests on the empirical finding that waitlist admissions at one Ivy-Plus college are uncorrelated with admissions decisions and internal ratings at other Ivy-Plus colleges, implying that residual variation conditional on being on the waitlist is orthogonal to students’ long-run potential outcomes.

Prestigious employers: Firms defined by the paper’s algorithm as disproportionately employing Ivy-Plus graduates conditional on those firms’ predicted top-1% income probability — capturing high-status employment that does not necessarily lead to the highest earnings (e.g., prominent law firms, consulting firms, elite hospitals). Validated against external rankings (Vault.com, Scimagoir).

Non-academic ratings: Numerical scores assigned by admissions officers measuring aspects of an application outside academic achievement, such as extracurricular activities and leadership traits. In the paper’s data, non-academic ratings differ substantially by parental income — particularly because top-1% applicants disproportionately attend private high schools whose students receive higher non-academic ratings — while academic ratings do not differ across the income distribution.

Surrogate index: A prediction of later earnings outcomes (specifically, probability of reaching the top 1% at age 33 and mean income rank) constructed from individuals’ graduate school attendance and employer fixed effects at ages 22–25, used to extend the outcome window for cohorts observed only early in their careers. The approach follows the terminology and methodology of Athey et al. (2019).

Do Credit Conditions Move House Prices?

Overview

Research Question. To what extent did an expansion and contraction of credit drive the 2000s housing boom and bust? The existing literature offers sharply divergent answers — ranging from credit explaining virtually none of the boom (Kaplan, Mitman, and Violante 2020) to credit explaining the majority of it (Favilukis, Ludvigson, and Van Nieuwerburgh 2017, who find credit alone explains 60% of the rise in price-to-rent ratios). Greenwald and Guren argue that the source of these divergent findings is a single structural assumption: the degree to which credit-insensitive agents (landlords and unconstrained savers) can absorb credit-driven demand for housing, which in turn depends on the degree of segmentation between the owner-occupied and rental housing markets.

Key Mechanism. The paper organizes the literature around a “tenure supply” curve, defined in price-rent ratio versus homeownership rate space. A perfectly inelastic (vertical) supply curve — corresponding to perfect segmentation, in which housing cannot move between the owner-occupied and rental sectors — implies that credit expansion bids up house prices with no change in the homeownership rate. A perfectly elastic (horizontal) supply curve — corresponding to a frictionless rental market with deep-pocketed landlords who price at the present value of rents — implies that credit expansion raises the homeownership rate but not the price-rent ratio, because landlord reservation prices are unaffected by credit. Intermediate degrees of segmentation produce intermediate outcomes: credit raises both the price-rent ratio and the homeownership rate, with the relative magnitudes determined by the slope of the tenure supply curve.

Empirical Strategy. To measure where reality falls on this spectrum, the authors estimate the relative elasticity of the price-rent ratio to an identified credit supply shock, compared to the elasticity of the homeownership rate to the same shock. This ratio is a sufficient statistic for the slope of the tenure supply curve. They use three distinct identification strategies from prior literature — (1) Loutskina and Strahan (2015), instrumenting for local credit supply using differential city-level exposure to changes in the conforming loan limit (CLL); (2) Di Maggio and Kermani (2017), exploiting the 2004 OCC preemption of state anti-predatory-lending laws for national banks; and (3) Mian and Sufi (2019), using differential city-level exposure to the 2003 private label securitization (PLS) expansion through bank funding composition. Regressions are estimated on annual CBSA-level panels using local projection IV (LP-IV) or event-study reduced-form methods. Key data include the CoreLogic repeat-sales house price index, the CBRE Torto-Wheaton same-store rent index (a repeat-rent index for multi-unit apartment buildings, constructed from newly-leased units), and Census Housing Vacancy Survey homeownership rates.

Main Empirical Findings. All three instruments consistently find that credit supply shocks generate a significant increase in house prices and the price-rent ratio but a much smaller, rarely statistically significant, effect on the homeownership rate. Under the Loutskina-Strahan (LS) LP-IV, the price-rent ratio response peaks at 0.471, while the homeownership rate response reaches only 0.037 at the 2-year horizon and peaks at 0.101 after 5 years. The ratio of price-rent to homeownership responses ranges from 3 to infinity across the three instruments and horizons. These estimates imply a substantial degree of segmentation: the no-segmentation model falls far outside the 95% confidence intervals at all horizons.

Structural Model and Calibration. The authors construct a general equilibrium model featuring a representative borrower, landlord, and saver, with long-term fixed-rate mortgages subject to loan-to-value (LTV) and payment-to-income (PTI) limits following Greenwald (2018). The key modeling innovation is within-type heterogeneity in the benefit of owning versus renting, captured by logistic distributions for both borrowers and landlords. The dispersion parameter of the landlord distribution (σω,L) governs the slope of the tenure supply curve and is calibrated to minimize weighted distance to the LS empirical impulse responses. The resulting benchmark calibration yields σω,L = 2.877, with the benchmark model’s price-rent-to-homeownership ratio between 6.98 and 9.31 depending on the horizon — consistent with the empirical estimates.

Quantitative Results on the 2000s Boom. The paper then uses the calibrated model to simulate a credit standard relaxation (LTV limits relaxed from 85% to 99%, PTI limits from 36% to 65%) from 1998 Q1 through 2007 Q1, with a reversion at the start of the bust. This credit relaxation alone explains 34% of the peak rise in price-rent ratios observed in the boom, with a lower bound of 26% accounting for parameter uncertainty. In contrast, the no-segmentation model explains -1%, while the full segmentation model explains 38%. Adding a 2 percentage point permanent decline in mortgage spreads alongside the credit standard relaxation allows the benchmark model to explain 72% of the observed rise in price-rent ratios and 80% of the rise in loan-to-income ratios, compared to only 4% in the no-segmentation model. In a “full boom” scenario where additional demand and supply shocks are added to match the entire boom in price-rent ratios and homeownership, removing the credit relaxation reduces the rise in price-rent ratios by 55% in the benchmark economy — larger than the 34% explained in isolation due to nonlinear interactions — compared to only 5% in the no-segmentation economy.

Scope Conditions and Extensions. These results apply to the benchmark calibration in which landlords do not use credit and saver housing demand is fixed. When landlords are allowed to use credit (LTV limit of 65% relaxed to 85% during the boom), the role of credit is strengthened: the recalibrated model explains 80% of the rise in price-rent ratios from combined credit and rate changes, suggesting the benchmark is a lower bound. When savers are allowed to frictionlessly trade housing with borrowers, credit explains 54% of the rise in price-rent ratios even after recalibration. That is 18 percentage points, or roughly 25% in relative terms, below the benchmark 72%. The authors characterize this as an extreme lower bound, given that saver housing markets are in practice substantially segmented due to indivisibility, quality, and location differences.

Policy Implications. The findings imply that macroprudential policies tightening LTV and PTI ratios can be effective at restraining house price growth, but only in the presence of the significant rental market segmentation found in the benchmark economy. In the no-segmentation economy, removing the credit relaxation from the full boom reduces price-rent ratio growth by only 5%.

In depth

Q1. What is the core theoretical insight that reconciles the divergent findings in the prior literature on credit and house prices?

The key difference is the degree to which credit-insensitive agents — specifically landlords and unconstrained savers — can absorb credit-driven demand for housing. Models with perfectly segmented rental markets (no rental sector or fixed homeownership rate) feature borrowers competing only with each other for a fixed stock, so credit expansion bids up prices. Models with frictionless rental markets feature deep-pocketed landlords who supply housing at a price equal to the present value of rents, which is unaffected by credit; credit expansion then raises the homeownership rate rather than prices. Intermediate degrees of frictions produce intermediate outcomes. This mechanism had not been recognized as the source of the literature’s divergence before this paper.

Q2. What is the “tenure supply curve” and why is its slope the key empirical object?

The tenure supply curve describes the menu of price-rent ratios at which landlords are willing to supply varying amounts of owner-occupied housing (given total housing stock), traced out in price-rent ratio versus homeownership rate space. Its slope determines how the equilibrium responds to a credit-induced demand shift: a steep (inelastic) supply curve translates credit expansion primarily into price-rent ratio increases; a flat (elastic) supply curve translates it primarily into homeownership rate increases. Identifying this slope empirically is therefore sufficient to discipline any macro-housing model’s predictions about the role of credit in price dynamics, for arbitrary underlying shocks.
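
A stylized linear supply-demand system illustrates why the slope matters. This is only a sketch with made-up intercepts and slopes, not the paper's structural model: a credit shock is represented as an upward shift of the demand curve, and the ratio of the price-rent response to the homeownership response equals the supply slope.

```python
def equilibrium(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Solve a linear tenure market in (homeownership h, price-rent ratio pr) space.

    Demand: pr = demand_intercept - demand_slope * h
    Supply: pr = supply_intercept + supply_slope * h
    """
    h = (demand_intercept - supply_intercept) / (demand_slope + supply_slope)
    pr = supply_intercept + supply_slope * h
    return pr, h


def response_to_credit_shock(supply_slope, shock=0.2):
    """Change in (pr, h) when a credit shock shifts demand up by `shock`.
    All parameter values here are hypothetical."""
    pr0, h0 = equilibrium(5.0, 2.0, 4.0, supply_slope)
    pr1, h1 = equilibrium(5.0 + shock, 2.0, 4.0, supply_slope)
    return pr1 - pr0, h1 - h0


# Flat supply (no segmentation): only the homeownership rate moves.
d_pr_flat, d_h_flat = response_to_credit_shock(supply_slope=0.0)

# Steep supply (substantial segmentation): mostly the price-rent ratio moves.
d_pr_steep, d_h_steep = response_to_credit_shock(supply_slope=8.0)

print(d_pr_flat, d_h_flat)
print(d_pr_steep / d_h_steep)  # approximately the supply slope
```

A perfectly vertical supply curve is the limit as the slope grows without bound, in which case the homeownership response goes to zero.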

Q3. How do the authors identify the slope of the tenure supply curve empirically?

They estimate the slope as the ratio of the causal elasticity of the price-rent ratio to that of the homeownership rate, with respect to an identified credit supply shock. Three instruments are used: (1) the Loutskina-Strahan shift-share instrument based on differential exposure to changes in the conforming loan limit, estimated by LP-IV on an unbalanced panel of 62 CBSAs from 1992 to 2016; (2) the Di Maggio-Kermani event study based on the 2004 OCC preemption of state anti-predatory-lending laws, covering 262 CBSAs for house prices and 82 CBSAs for homeownership from 2001 to 2010; and (3) the Mian-Sufi event study based on differential exposure to the 2003 PLS expansion via non-core deposit share, covering 245 CBSAs using ACS and FHFA data. In practice, they estimate the inverse slope (ratio of homeownership to price-rent response) because the first stage is far stronger using price-rent ratios as the endogenous variable.

Q4. What are the empirical results on the relative price-rent and homeownership responses?

Across all three instruments, credit supply shocks significantly raise the price-rent ratio but have a much smaller, rarely statistically significant effect on the homeownership rate. Under the Loutskina-Strahan (LS) LP-IV, the price-rent ratio response peaks at 0.471 after 2 years, while the homeownership rate response reaches only 0.037 at 2 years and peaks at 0.101 at 5 years. The naive point-estimate ratios range from 2.93 to 12.83 at horizons 2 through 5, with the 4-year estimate negative (implying an infinite slope). The directly estimated inverse slope coefficients are small (0.05 to 0.24) and never statistically different from zero. The Di Maggio-Kermani (DK) instrument yields slopes of 6.72 in 2005, 3.67 in 2006, and 3.40 in 2007. The Mian-Sufi (MS) instrument yields a slope of approximately 4.49 in both 2006 and 2007. Across horizons, the 95% confidence intervals imply lower bounds on the slope of 1.8 to 8.4.

Q5. What is the key modeling contribution on the structural side?

The key innovation is the introduction of within-type heterogeneity in ownership preferences for both borrowers and landlords, modeled as logistic distributions. This heterogeneity allows the model to generate a fractional and time-varying homeownership rate — a feature absent from most prior macro-housing models — and maps directly into the slopes of the demand and tenure supply curves. The dispersion in landlord ownership costs (σω,L) governs the supply curve slope and is calibrated to match the empirical impulse responses. Without this heterogeneity, the model would produce corner solutions with all housing owned by one type.

Q6. How is the landlord dispersion parameter σω,L calibrated, and what is the estimated value?

The calibration minimizes a weighted sum of squared deviations between model and data impulse responses for the price-rent ratio and homeownership rate, using the LS LP-IV estimates. Deviations are weighted by the inverse of empirical standard errors. Because model impulse responses jump on impact while empirical responses are hump-shaped (due to search frictions), the calibration uses only horizons 2 through 5 years. The minimum-distance estimate yields σω,L = 2.877, alongside a mortgage spread shock persistence of 0.965 and a shock size of -0.041 (corresponding to an annualized CLL subsidy of approximately 17 basis points, within the 10-24bp range found in prior literature). The benchmark model’s implied price-rent-to-homeownership response ratio ranges from 6.98 to 9.31, consistent with the empirical estimates.
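
The minimum-distance step can be sketched as follows. Everything here is an assumption made for illustration: `toy_model_irf` is a hypothetical stand-in for the structural model, the data and standard errors are synthetic, and a simple grid search over one parameter replaces whatever optimizer the authors use (the paper also calibrates the shock persistence and size, which this sketch omits). Weighting is interpreted as dividing each deviation by its standard error before squaring.

```python
# Hypothetical stand-in for the structural model: NOT the paper's model.
# It returns flat (jump-on-impact) responses from a linear supply-demand
# system in which sigma plays the role of the tenure supply slope.
def toy_model_irf(sigma, horizons, shock=1.0, demand_slope=2.0):
    pr = shock * sigma / (demand_slope + sigma)  # price-rent ratio response
    ho = shock / (demand_slope + sigma)          # homeownership rate response
    return [pr] * len(horizons), [ho] * len(horizons)


def weighted_distance(sigma, data, horizons):
    """Sum of squared model-data deviations, each scaled by 1 / standard error."""
    model_pr, model_ho = toy_model_irf(sigma, horizons)
    total = 0.0
    for i in range(len(horizons)):
        total += ((model_pr[i] - data["pr"][i]) / data["se_pr"][i]) ** 2
        total += ((model_ho[i] - data["ho"][i]) / data["se_ho"][i]) ** 2
    return total


def calibrate_sigma(data, horizons, grid):
    """Grid search for the sigma that minimizes the weighted distance."""
    return min(grid, key=lambda s: weighted_distance(s, data, horizons))


horizons = [2, 3, 4, 5]  # years; the impact and 1-year horizons are excluded
true_sigma = 3.0         # used only to generate the synthetic "data"
pr, ho = toy_model_irf(true_sigma, horizons)
synthetic = {
    "pr": pr,
    "ho": ho,
    "se_pr": [0.1, 0.1, 0.2, 0.2],      # made-up standard errors
    "se_ho": [0.05, 0.05, 0.05, 0.05],  # made-up standard errors
}
grid = [i / 100 for i in range(1, 1001)]  # 0.01, 0.02, ..., 10.00
sigma_hat = calibrate_sigma(synthetic, horizons, grid)
print(sigma_hat)
```

Because the synthetic data are generated by the toy model itself, the search recovers the value used to generate them; with real impulse responses the minimized distance would be positive.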

Q7. What lower bound does the paper derive for σω,L, and how does the no-segmentation model compare?

A credible set for σω,L is derived by targeting the upper and lower bounds of the 95% confidence interval for the estimated inverse slope. The lower bound for σω,L (targeting the top of the confidence interval) is 0.810; the upper bound targets the bottom of the confidence interval and is best matched by the full segmentation case (σω,L → ∞). The no-segmentation economy (σω,L = 0) produces inverse ratios between 4 and 32 times the empirical upper bound, placing it far outside the credible set.

Q8. What is the model’s quantitative finding on the role of credit standard relaxation in isolation?

A credit standard relaxation (LTV from 85% to 99%, PTI from 36% to 65%) implemented from 1998 Q1 to 2007 Q1 and then reverted explains 34% of the peak rise in price-rent ratios in the benchmark model, with a lower bound of 26% conditional on parameter uncertainty. In the full segmentation model, the same relaxation explains 38%, while in the no-segmentation model it explains -1%. Credit standard relaxation also explains 51% of the rise in loan-to-income ratios in the benchmark, compared to 31% in the no-segmentation model.

Q9. What does adding a decline in mortgage rates contribute?

Adding a permanent 2 percentage point decline in mortgage spreads alongside the credit standard relaxation increases the benchmark model’s explained share of the price-rent ratio boom from 34% to 72%, and the loan-to-income ratio share from 51% to 80%. The no-segmentation model explains only 4% of the price-rent ratio boom and 38% of the loan-to-income ratio boom under the same combined experiment.

Q10. How does the “full boom” counterfactual estimate the marginal contribution of credit?

The full boom experiment adds exogenous demand shocks (shifts to µω,B) and supply shocks (shifts to µω,L) on top of the credit relaxation and rate decline, calibrated to exactly reproduce the observed peak increase in both the price-rent ratio and the homeownership rate during the boom. Removing the credit relaxation from this full boom scenario reduces the rise in price-rent ratios by 55% and the rise in loan-to-income ratios by 74% in the benchmark economy. This exceeds the 34% figure from the credit-alone experiment due to strong nonlinear interactions: without the credit relaxation, binding PTI limits constrain households’ ability to finance properties even when ownership preferences rise, dampening both price and credit growth. In the no-segmentation economy, removing the credit relaxation reduces price-rent ratio growth by only 5%.

Q11. What are the implications of allowing landlords to use credit?

When landlords face an LTV limit of 65% relaxed to 85% during the boom, the credit expansion also shifts the tenure supply curve upward (as in Panel (d) of the supply-demand framework), leading to a larger price-rent ratio response and a smaller homeownership rate response than in the baseline. Without recalibration, this model explains 81% of the price-rent ratio rise. After recalibration of σω,L (which is required because landlord credit changes the mapping from empirical moments to structural parameters), the model explains 80% of the price-rent ratio rise. This implies the benchmark results are a lower bound on the role of credit in driving house prices.

Q12. What are the implications of allowing savers to frictionlessly trade housing with borrowers?

When savers are allowed to frictionlessly adjust their housing demand (purchasing housing from or selling to borrowers as credit conditions change), the price-rent ratio response is dampened because savers absorb excess borrower demand. After recalibrating σω,L, the combined credit-and-rate experiment explains 54% of the price-rent ratio boom, which is 18 percentage points (roughly 25% in relative terms) below the benchmark 72%. The authors regard this as an extreme lower bound because in practice saver and borrower housing markets are substantially segmented due to indivisibility, location, and quality differences.

Q13. What are the implications for macroprudential policy?

Macroprudential policies that tighten LTV and PTI limits are effective at slowing house price growth in the benchmark economy, where rental market frictions are substantial. In the full boom counterfactual, tightening credit standards reduces the rise in price-rent ratios by 55%. However, in the no-segmentation economy, the same tightening reduces price-rent ratio growth by only 5%, because landlords readily absorb credit-driven demand and pin prices to the present value of rents. The effectiveness of macroprudential policies therefore depends heavily on the degree of rental market segmentation.

Q14. Why do the authors prefer the CBRE Torto-Wheaton rent index over typical rent measures?

The TW index uses a repeat-rent methodology on newly-leased multi-unit apartments, which better captures current market conditions than median rent measures, which are biased by composition changes and are sticky due to long-term lease contracts. Since the price-rent ratio is meant to capture the rent a unit could command if leased instead of sold, newly-leased apartment rents are more appropriate for constructing this ratio. The TW index is available for 53 CBSAs from 1989 and 62 CBSAs from 1994.

Q15. Why do the authors estimate the inverse slope rather than the slope directly?

The first stage for the homeownership rate response is very weak — the estimated coefficients are small and imprecise, so using the homeownership rate as an endogenous variable would suffer severe weak instrument problems. Instead, the authors use the price-rent ratio as the endogenous variable (with a much stronger first stage) and the homeownership rate as the outcome, obtaining the inverse slope (homeownership response per unit price-rent ratio response). The upper bounds of the 95% confidence intervals for the inverse slope range from 0.12 to 0.56 across horizons, corresponding to lower bounds on the slope of 1.8 to 8.4.
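
The mapping from an upper bound on the inverse slope to a lower bound on the slope is a simple reciprocal, valid when the bound is positive. The sketch below applies it to the rounded bounds quoted above; the function name is illustrative.

```python
def slope_lower_bound(inverse_slope_upper_bound):
    """Convert an upper bound on the inverse slope (homeownership response per
    unit of price-rent response) into a lower bound on the slope itself."""
    if inverse_slope_upper_bound <= 0:
        raise ValueError("the reciprocal mapping requires a positive bound")
    return 1.0 / inverse_slope_upper_bound


# Rounded upper bounds of the 95% confidence intervals quoted in the text.
for upper in (0.56, 0.12):
    print(upper, round(slope_lower_bound(upper), 2))
```

With these rounded inputs the bounds come out at about 1.8 and 8.3; the reported 8.4 presumably reflects the unrounded estimates.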

Key Concepts

Tenure Supply Curve. The menu of price-rent ratios at which landlords are willing to supply varying quantities of owner-occupied housing (i.e., sell rental units to potential homeowners) at a given total housing stock. Defined in price-rent ratio versus homeownership rate space. Distinct from the absolute supply of housing via the construction sector; shifts in the construction margin affect absolute quantities and prices but not necessarily the price-rent ratio or the ownership share. The slope of this curve — not the level — is the central empirical and structural object of the paper.

Market Segmentation (in the paper’s sense). The degree to which credit-insensitive agents (landlords, unconstrained savers) cannot absorb credit-driven demand from constrained borrowers. Perfect segmentation means owner-occupied and rental housing are entirely non-fungible, so all credit-driven demand falls on a fixed supply of owned units. Zero segmentation means landlords (or savers) can frictionlessly convert between owned and rented housing at a price tied to present discounted rents. In this paper, segmentation is measured continuously by the slope of the tenure supply curve.

Sufficient Statistic (for segmentation). The ratio of the causal elasticity of the price-rent ratio to the causal elasticity of the homeownership rate, both with respect to the same identified credit supply shock. This ratio identifies the slope of the tenure supply curve and is sufficient to calibrate a structural model to recover the role of credit in driving house prices for arbitrary combinations of shocks, even when those shocks differ from the identifying variation.

Ownership Benefit Heterogeneity. An additional idiosyncratic utility flow (positive or negative) that borrowers or landlords receive from owning versus renting a given unit, modeled as a logistic distribution. This within-type heterogeneity generates a fractional and time-varying homeownership rate in the model and maps directly into the slope of the demand and tenure supply curves. The dispersion parameter σ_{ω,L} for landlords governs the slope of the tenure supply curve; higher dispersion implies a steeper (more segmented) supply curve and larger price-rent ratio responses to credit shocks.

Marginal Collateral Value (C_{B,t}). The shadow value to borrowers of the additional credit that can be collateralized by an additional dollar of housing value, equal to µ_{B,t} × FLTV × θLTV in the model. A relaxation of credit standards (raising θLTV or θPTI) or a decline in credit costs raises C_{B,t}, increasing borrower reservation prices and shifting the housing demand curve outward. This is the channel through which credit conditions enter house price dynamics.

Local Projection IV (LP-IV). A generalization of Jordà (2005) local projections to instrumental variables settings, as in Ramey (2016) and Ramey and Zubairy (2018), extended to a panel context with CBSA and time fixed effects. Used to estimate impulse responses of price-rent ratios, house prices, and homeownership rates to credit supply shocks at horizons 0 through 5 years, instrumenting for endogenous credit growth using the conforming loan limit shift-share instrument.
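
A stripped-down sketch of the estimator's core may help fix ideas. It is not the authors' code. It assumes a balanced panel and a single instrument, and it omits controls, lags, and standard errors. The function names `two_way_demean` and `lp_iv` and the toy panel are hypothetical. Unit and time fixed effects are removed by two-way demeaning, and the horizon-h coefficient is the just-identified IV ratio.

```python
def two_way_demean(m):
    """Remove unit and time means from a balanced panel (one row per unit)."""
    n, t = len(m), len(m[0])
    row = [sum(r) / t for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + grand for j in range(t)] for i in range(n)]


def lp_iv(y, x, z, horizon):
    """Just-identified IV estimate of the horizon-h response of y to x.

    y, x, z are balanced panels indexed [unit][period]. The outcome is the
    cumulative change y[t + h] - y[t - 1]; x[t] is instrumented by z[t].
    """
    periods = range(1, len(y[0]) - horizon)
    lhs = two_way_demean([[r[t + horizon] - r[t - 1] for t in periods] for r in y])
    rhs = two_way_demean([[r[t] for t in periods] for r in x])
    ins = two_way_demean([[r[t] for t in periods] for r in z])
    num = sum(a * b for r1, r2 in zip(ins, lhs) for a, b in zip(r1, r2))
    den = sum(a * b for r1, r2 in zip(ins, rhs) for a, b in zip(r1, r2))
    return num / den


# Hypothetical 3-unit, 5-period panel built so that the impact response is 2.
x = [[0.0, 1.0, 0.0, 2.0, 1.0], [0.0, 0.0, 3.0, 1.0, 0.0], [0.0, 2.0, 1.0, 0.0, 4.0]]
z = [[0.0, 1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 2.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 3.0]]
unit_fe, time_fe = [0.5, -1.0, 0.25], [0.0, 0.1, -0.2, 0.3, 0.0]
y = []
for i in range(3):
    level, path = 0.0, []
    for t in range(5):
        level += 2.0 * x[i][t] + unit_fe[i] + time_fe[t]
        path.append(level)
    y.append(path)

print(round(lp_iv(y, x, z, horizon=0), 6))
```

Looping `horizon` over 0 through 5 on real data would trace out the impulse response, one regression per horizon.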

Conforming Loan Limit (CLL) Instrument. A shift-share instrument for local credit supply constructed by interacting the share of mortgage originations in the prior year falling within 5% of the current year’s CLL with the percentage change in the national CLL. Cities where a larger fraction of loans cluster near the CLL threshold experience a larger credit supply shock when the CLL increases, because more loans shift from unsubsidized to GSE-subsidized rates. The instrument is constructed using the change in the national CLL only to avoid endogeneity from high-cost area adjustments.
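
The construction described above is a product of a local exposure share and a national shift. A minimal sketch follows, with hypothetical city names and numbers that are not taken from the paper; the function name is illustrative.

```python
def cll_instrument(share_near_cll_prev_year: float, national_cll_pct_change: float) -> float:
    """Shift-share exposure: local share of loans near the limit times the national shift."""
    if not 0.0 <= share_near_cll_prev_year <= 1.0:
        raise ValueError("share must lie in [0, 1]")
    return share_near_cll_prev_year * national_cll_pct_change


# Hypothetical inputs for illustration only.
national_change = 5.0  # percent change in the national CLL
shares = {"city_A": 0.02, "city_B": 0.08}  # prior-year share of originations within 5% of the CLL

instrument = {city: cll_instrument(s, national_change) for city, s in shares.items()}
print(instrument)
```

Both cities face the same national change, so all cross-sectional variation in the instrument comes from the predetermined share of loans near the limit.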

Do Financial Concerns Make Workers Less Productive?

Mon, 01 Jan 0001 00:00:00 +0000

Do Financial Concerns Make Workers Less Productive?

Research Question

The paper tests whether financial concerns distract workers sufficiently to meaningfully reduce their productivity, and whether receiving cash — by alleviating those concerns — can raise output even when total compensation is held fixed.

Setting and Sample

The experiment involves 408 low-income male agricultural casual laborers in rural Odisha, India, recruited from 47 villages across five worksites in four districts. The study takes place during the lean agricultural season (March–June 2017 and 2018), when formal employment is scarce (workers found paid wage work on only 1.9 days per week on average). During this period, 86% of workers reported being “worried” or “very worried” about their finances, 68–71% carried outstanding loans, and 64–66% said they would have difficulty coming up with Rs. 1,000 (roughly four days of wages) in an emergency. Workers bring these burdens to the job: on a given day, approximately one in two workers reported thinking about financial worries while working.

Experimental Design

Workers were employed for twelve days in a piece-rate manufacturing task — stitching sal tree leaves into disposable plates for restaurants. The payment-timing manipulation is the core of the identification strategy. Control workers received all accrued earnings as a lump sum on the final day (day 12). Treatment workers received their earnings in two installments: an interim payment of earnings to date on day 8 or 9 (randomly staggered across waves), with the balance paid on day 12. Total compensation was held constant across groups; only the timing of receipt differed. On day 5 (the “announcement day”), each worker learned his payment schedule individually. The design thus separates the announcement period (days 5 through the interim payment day, when workers know their schedule but have not yet received cash) from the post-pay period (days after the interim payment until the contract end). This enables the authors to test whether productivity effects arise from information about impending cash, or only once cash is physically in hand.
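
The three windows of the design can be written as a small classifier. This is an illustrative sketch, not the authors' code. It assumes the interim payment day itself counts as part of the announcement period, since the summary describes that period as running through the payment day and the post-pay period as the days after it. The function name and labels are hypothetical.

```python
def study_period(day: int, interim_pay_day: int) -> str:
    """Classify a work day (1-12) relative to the announcement and interim payment."""
    if not 1 <= day <= 12:
        raise ValueError("the contract runs from day 1 to day 12")
    if interim_pay_day not in (8, 9):
        raise ValueError("interim payments occur on day 8 or day 9")
    if day < 5:
        return "pre-announcement"
    if day <= interim_pay_day:
        # Schedule is known, but the interim cash has not yet been worked with.
        return "announcement"
    return "post-pay"


for day in (4, 5, 8, 9, 12):
    print(day, study_period(day, interim_pay_day=8), study_period(day, interim_pay_day=9))
```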

First Stage: Effects on Financial Strain

Within three days of receiving the interim payment, treated workers increased loan repayments by Rs. 271, a 287% increase relative to the control group mean (p < 0.001), and were 40 percentage points (222%) more likely to repay any loan (p < 0.001). The majority of repayments occurred on the same evening as the cash disbursement — a 746% single-day increase in loan payments. Household expenditures on food, clothing, and essentials rose by 40% (Rs. 150) over three days (p < 0.001). Treatment workers also reported feeling more focused on the work task (11.5 percentage points more likely, p = 0.032) and were less likely to report thinking about financial worries while making plates (13.7 percentage points, p = 0.044).

Main Productivity Results

In the post-pay period, treated workers increased output by 0.109 SD (6.9%) relative to the control group (p = 0.020). No treatment effect emerged during the announcement period (0.014 SD, p = 0.685); the post-pay and announcement-period effects are statistically distinguishable (p = 0.008). Because work hours are fixed and daily attendance is 98.3% with no treatment effect on attendance, these gains reflect higher output per hour worked rather than more time on the job.

Effects are concentrated among workers with below-median baseline wealth (fewer assets, less liquidity): for this subgroup, the interim payment increases output by 0.204 SD (13.0%, p = 0.003). For workers with above-median wealth, the effect is close to zero and statistically insignificant (p = 0.819).
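
Both headline effects are stated in standard deviations and in percent, which allows a quick consistency check. The sketch assumes each percentage is taken relative to the control mean and each SD is the control-group SD of the same output measure. The summary does not spell this out, so treat it as an assumption; the function name is illustrative.

```python
def implied_sd_to_mean_ratio(effect_pct: float, effect_sd: float) -> float:
    """Ratio of the outcome's SD to its mean implied by one effect stated in both units."""
    return (effect_pct / 100.0) / effect_sd


full_sample = implied_sd_to_mean_ratio(6.9, 0.109)
poorer_half = implied_sd_to_mean_ratio(13.0, 0.204)
print(f"full sample: {full_sample:.3f}, below-median wealth: {poorer_half:.3f}")
```

The two implied ratios are nearly identical, which is what one would expect if both effects are scaled by the same output distribution.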

Attentiveness Results

Beyond total output, the authors measure attentiveness through three markers embedded in the finished plates: the number of “double holes” (paired stitching holes indicating a removed mistaken stitch), the number of leaves used, and the number of stitches used. These measures are collected unbeknownst to workers and combined into an “attentiveness index.” After receiving the interim payment, treated workers’ attentiveness index increased by 0.077 SD across all workers (p = 0.092); among poorer workers, attentiveness increased by 0.17 SD (p = 0.041). This improvement occurred simultaneously with higher output speed — workers were producing plates faster while also making fewer mistakes, suggesting improved cognitive engagement rather than mere effort intensification.

Piece-Rate Comparison

In separate supplementary rounds with 150 experienced workers, the authors varied piece rates (Rs. 2, 3, or 4) while holding overall earnings constant. Each one-rupee increase in the piece rate raised output by 0.020 SD (p = 0.042). Critically, piece-rate increases produced no detectable change in the attentiveness index (point estimate negative, statistically insignificant), and the piece-rate effect on output differs significantly from its effect on attentiveness (p = 0.001). This indicates that conscious effort and automatic attentiveness can move independently: higher incentives increase pace but do not reduce attentional lapses, whereas financial relief increases both pace and attentiveness.

Alternative Explanations Ruled Out

The authors systematically address gift exchange/fairness, trust, nutrition, and sleep. Fairness and gift-exchange stories are inconsistent with: (i) no detectable announcement-period effect; (ii) no decline in control-worker effort when treatment workers are paid before them; (iii) the pattern of effects being concentrated among poorer workers; and (iv) attentiveness being affected when it is not a sanctioned quality dimension for payment. Nutritional channels are inconsistent with overnight effect onset (nutritional stock changes are too slow biologically), no treatment effect on breakfast consumption patterns, and productivity effects persisting through the end of each workday. Sleep channels are inconsistent with no treatment effect on hours or quality of sleep.

Scope Conditions and Implications

The effect operates through the actual arrival of cash, not its anticipation, consistent with a model in which automatic cognitive inputs — unlike consciously chosen effort — respond to current financial strain rather than expected future income. Effects are concentrated among more financially constrained workers within an already-poor sample. The authors do not identify the specific psychological mechanism (worry, anxiety, affect, or rumination) but interpret results as evidence that financial strain, at least partly through psychological channels, reduces earnings exactly when money is most needed.

In depth

Q1. Why does the experiment focus on payment timing rather than an outright transfer of additional money?

Varying only payment timing — not total pay — holds constant both the piece-rate incentive and total wealth across treatment and control. An outright cash transfer would raise total lifetime income, potentially reducing effort through a neoclassical income effect (more lifetime wealth lowers the marginal utility of current consumption). By holding total compensation fixed and only shifting when it arrives, the design isolates the effect of financial strain per se, separable from any wealth or incentive effect.

Q2. Why is there no treatment effect during the announcement period, and why does this matter?

Between day 5 (when workers learn their payment schedule) and the interim payment date, treated workers know cash is coming but have not yet received it. Output in this window shows no treatment effect (0.014 SD, p = 0.685), and the announcement effect is significantly smaller than the post-pay effect (p = 0.008). This matters because it rules out mechanisms that should operate on information alone — including gift exchange, trust updating, or effort responses to higher discounted expected income — and is consistent with a model in which financial strain falls only when cash is physically received (e.g., moneylenders do not relent until the loan is actually repaid).

Q3. What is the attentiveness index and how was it constructed?

The attentiveness index averages three plate-level markers: (i) number of “double holes” — pairs of stitching holes indicating a mistaken stitch was removed; (ii) number of leaves used; and (iii) number of stitches used. Each component was normalized using the control group’s post-pay mean and standard deviation, then averaged and reverse-coded so that higher values denote better attentiveness (fewer mistakes, fewer leaves, fewer stitches). Workers were unaware these dimensions were being measured. The index thus captures the number of unforced steps a worker took to complete a plate — a behavioral trace of cognitive lapses.
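
The construction described above (standardize each marker against the control group, average, reverse-code) can be sketched as follows. The plate data and field names are hypothetical, and the use of the population standard deviation is an assumption, since the summary does not say which variant the authors use.

```python
from statistics import mean, pstdev


def attentiveness_index(plate, control_plates):
    """Reverse-coded average of z-scores for double holes, leaves, and stitches."""
    components = ("double_holes", "leaves", "stitches")
    z_scores = []
    for name in components:
        control_values = [p[name] for p in control_plates]
        mu, sd = mean(control_values), pstdev(control_values)
        z_scores.append((plate[name] - mu) / sd)
    # Reverse-code so that fewer mistakes, leaves, and stitches give a higher index.
    return -mean(z_scores)


# Hypothetical control-group plates from the post-pay period.
control = [
    {"double_holes": 2, "leaves": 10, "stitches": 30},
    {"double_holes": 4, "leaves": 12, "stitches": 34},
    {"double_holes": 3, "leaves": 11, "stitches": 32},
]
careful = {"double_holes": 2, "leaves": 10, "stitches": 30}
sloppy = {"double_holes": 5, "leaves": 13, "stitches": 36}

print(attentiveness_index(careful, control), attentiveness_index(sloppy, control))
```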

Q4. How do the piece-rate rounds demonstrate that effort and attentiveness are separable?

In supplementary rounds (150 workers, 2019), piece rates were experimentally varied among Rs. 2, 3, and 4 per plate with the base wage adjusted to hold total earnings constant, so financial strain was unchanged. A one-rupee increase in the piece rate raises output by 0.020 SD (p = 0.042), consistent with a standard effort response. The same increase produces no discernible change in the attentiveness index (point estimate: negative but not significant), and the output and attentiveness effects are significantly different from each other (p = 0.001). This shows that workers can speed up via conscious effort without reducing attentional lapses, whereas the cash infusion raises both pace and attentiveness simultaneously — a pattern inconsistent with pure motivation as the mechanism.

Q5. What does the staggered timing within the treatment group (Wave A vs. Wave B) contribute to identification?

Treatment workers were randomized to receive their interim payment on day 8 (Wave A) or day 9 (Wave B). On day 9, Wave B workers have not yet been paid while Wave A workers have. If fairness concerns drove control workers to reduce effort upon seeing colleagues paid first, control workers on day 9 — having observed Wave A payments the evening before — should work less hard relative to Wave B treatment workers (who have also not yet been paid). The authors find no such pattern: the triple interaction (Cash × Payment Day × Wave B) is close to zero and insignificant, ruling out effort reductions from seeing peers paid earlier.

Q6. What are the magnitudes and timing of the spending response to the cash infusion?

Within three days of the interim payment, treatment workers spent Rs. 900 in total — roughly two-thirds of the average interim payment of over Rs. 1,400. On the day of the payment itself, loan repayments rose by Rs. 169 (746% increase), and household expenditures rose by Rs. 70 (68% increase). Over three days, loan repayments increased by Rs. 271 (287%), the probability of repaying any loan rose by 40 percentage points (222%), and total household spending rose by 65% (Rs. 371). These patterns indicate that the two main sources of financial stress cited by workers — outstanding debt and inability to meet household essentials — were directly addressed, suggesting a meaningful reduction in financial strain.
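
Because each increase is reported both in rupees and in percent, the figures jointly imply the baseline levels they are measured against. The sketch below backs these out, assuming each percentage is expressed relative to the control mean over the same window (an assumption; the function name and dictionary labels are illustrative).

```python
def implied_baseline(increase_rs, increase_pct):
    """Baseline level implied by an increase stated both in rupees and in percent."""
    return increase_rs / (increase_pct / 100.0)


reported = {
    "loan repayment, payment day": (169, 746),
    "household spending, payment day": (70, 68),
    "loan repayment, three days": (271, 287),
    "household spending, three days": (371, 65),
}
baselines = {name: implied_baseline(rs, pct) for name, (rs, pct) in reported.items()}
for name, value in baselines.items():
    print(f"{name}: implied baseline of about Rs. {value:.0f}")

# Share of the interim payment spent within three days (Rs. 900 of roughly Rs. 1,400).
spent_share = 900 / 1400
print(f"share spent: {spent_share:.2f}")
```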

Q7. Why are the productivity effects concentrated among poorer workers, and what are the two interpretations?

Workers with below-median baseline wealth (fewer assets, lower liquidity) show a 0.204 SD (13.0%) productivity gain, while workers above the median wealth threshold show essentially no effect. The authors offer two interpretations. First, poorer workers may start from a higher level of financial strain, giving the intervention more scope to reduce it. Second, since all workers in the sample are objectively poor and report similar baseline financial worries and loan levels, the more likely explanation is that the interim payment is larger relative to the wealth and income buffer of poorer workers, making the same nominal cash infusion more meaningful for them. Both richer and poorer workers in the sample use the interim payment to repay loans and cover household needs.

Q8. How do the authors rule out nutritional channels?

Two tests address nutrition. First, workers were not at subsistence — 94% reported missing no meals the prior week — and increased food spending cannot change the nutritional stock overnight (the medical literature indicates nutritional-stock effects on cognition operate over longer time horizons). Second, and more precisely, all food consumed at the worksite during the workday was provided by the researchers, so differential pre-worksite breakfast consumption is the only plausible same-day biological channel. The authors find no treatment effect on breakfast consumption (whether workers had breakfast, how much, or what they ate). Further, if blood sugar or satiety drove effects, they should attenuate over the workday as all workers are given the same afternoon meal; instead, treatment effects persist and if anything increase through the final hours of the workday.

Q9. What does the self-report evidence on focus and worry show, and why is it treated as suggestive rather than primary?

Two days after the interim payment, workers were asked an open-ended question about what they were thinking about while working. Treatment workers were 11.5 percentage points (15.5%) more likely to report feeling focused on the task (p = 0.032) and 13.7 percentage points (32.7%) less likely to report thinking about financial worries (p = 0.044). A supplementary test showed treated workers were 10 percentage points (31%) more likely to generate explanations for a low-income person’s negative affect that were unrelated to financial concerns (p < 0.05), suggesting a broadening of cognitive scope. These measures are treated as suggestive because they are self-reported and were collected at only a single point in time; the primary evidence rests on the production data, which are objective and recorded at fine hourly resolution throughout the post-pay period.

Q10. What does the paper say about optimal payment frequency as a policy implication?

The authors are cautious in drawing a direct policy inference about paying workers more frequently. While the positive productivity effect of early payment points toward more frequent paydays reducing financial strain, this must be weighed against workers’ self-control problems in consumption. In settings where workers face lumpy expenditure needs (e.g., monthly rent), more frequent payments could cause under-saving and worsen strain at the time of lumpy bills. The authors suggest payment frequency or size that matches expenditure needs, or more generally financial products that allow workers to time income receipts to coincide with expenses, as potentially more robust solutions — noting that such products appear largely absent in these markets.

Key Concepts

Financial strain (as used in the paper): A psychological burden arising from pressing present needs for resources — defined in the authors’ model as increasing in both the current marginal utility of consumption (i.e., how valuable an additional rupee would be today) and the level of outstanding debt (including lender harassment pressure). Strain is present-oriented: it responds to current cash-on-hand and debt levels, not to expected future income, which is why anticipating a payment does not fully relieve it.

Automatic input (a): In the authors’ behavioral model, one of two inputs into production. Unlike “effortful” input (e), which the worker consciously controls (speed of hands, consciously directed attention), the automatic input captures cognitive functions that are beyond the worker’s full control — background attentional processes that can be degraded by financial strain even when a worker is motivated and exerting high effort. The key behavioral assumption is that a falls when financial strain is high, independently of chosen effort.

Attentiveness index: A composite measure constructed from three unincentivized physical markers embedded in completed leaf plates: (i) number of double holes (pairs indicating a stitch was removed to correct a mistake); (ii) number of leaves used; (iii) number of stitches used. The index is normalized to the control group’s post-pay distribution and reverse-coded so higher values denote better attentiveness. Workers were unaware these dimensions were measured. The index captures attentional lapses — unforced errors that increase the number of steps and time needed to complete each plate.

Announcement period: The days between when workers are individually informed of their payment schedule (day 5) and when the interim payment is actually disbursed (day 8 or 9). This window serves as a within-experiment control: if effects arose from information about impending cash (e.g., through discounting, gift exchange, or trust), they should appear here. The consistent absence of treatment effects during this period is a key identification result.

Post-pay period: The days from the interim payment until the contract end (day 12). The main productivity and attentiveness treatment effects are estimated in this window, comparing treatment workers (who have received cash) to control workers (who have not yet been paid).

Lean season: The months outside the peak agricultural planting and harvesting periods (roughly six to eight months per year in the study area) during which agricultural workers seek intermittent casual employment in manufacturing, construction, and other sectors. Employment rates are low (1.9 paid days per week on average), income is low and variable, and financial strain is correspondingly high. The experiment is intentionally conducted during this period to maximize baseline levels of financial concern.

Piece-rate elasticity of effort: The responsiveness of output to changes in the marginal return per unit produced (the piece rate), holding financial strain constant. In the supplementary rounds, a one-rupee increase in the piece rate raises output by 0.020 SD. The authors interpret this as the upper bound on how much pure motivational effort can move output in this task, and use it to benchmark the cash infusion effects, which are roughly five times larger per unit of treatment variation and additionally move attentiveness (which piece-rate changes do not).

Dollar Dominance and the Transmission of Monetary Policy

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Summary

An emerging view in international macroeconomics contends that dollar invoicing of exports renders monetary policy ineffective for non-U.S. countries: because export prices are allegedly sticky in dollars, exchange rate depreciations cannot shift expenditure toward domestic goods, muting the classical Mundell-Fleming channel. McLeay and Tenreyro argue that this view rests on empirical assumptions that are not borne out by the data: goods priced in dollars tend to have more flexible prices and higher elasticities of substitution, not the monopoly power and sticky dollar prices assumed in dominant currency pricing (DCP) models. They propose a mixed currency pricing (MCP) framework that incorporates heterogeneous price flexibility and intra-sector international competition, and show that even with dollar pricing, depreciating the currency by loosening monetary policy can still boost exports and activity materially. The limit to any expansion is not demand, but supply capacity: after a depreciation, domestic dollar costs fall, flexible-price exporters lower prices slightly and gain large market share due to high demand elasticities, and the expansion runs until rising marginal costs offset the initial depreciation — producing limited reduced-form dollar pass-through as an equilibrium result rather than evidence of nominal stickiness. Empirical tests using monetary policy shocks in a sample of emerging and developing economies, case studies of Canada and Chile as commodity exporters, and three large devaluation episodes all find significant, material increases in exports and aggregate activity following exchange-rate depreciations, consistent with the MCP model’s predictions.

In depth

Q1. What is the specific empirical claim that DCP models rest on, and how do McLeay and Tenreyro challenge it?

DCP models (e.g., Gopinath et al. 2020) posit that exporters invoicing in dollars have monopoly power and face nominal rigidities that keep their dollar export prices sticky. The observable implication used to motivate this assumption was limited exchange rate pass-through to dollar export prices. McLeay and Tenreyro show that low pass-through is equally consistent with a flexible-price, high-elasticity equilibrium. When demand elasticities are high, firms optimally absorb exchange rate changes through quantities rather than prices; the reduced-form pass-through coefficient is small even without any nominal friction. Low pass-through is therefore not informative about the degree of nominal rigidities, and using it to calibrate sticky-price DCP models and draw normative conclusions about exchange rate policy is unwarranted.

Q2. What are the three empirical facts that motivate the MCP framework’s assumptions?

Fact 1: Homogeneous products (commodities and commodity-like goods traded on organized exchanges or reference-priced, following Rauch 1999) represent a large share of goods exports, exceeding 70% for developing economies, around 60% for emerging economies, and around 35% for advanced economies; Sub-Saharan Africa, Latin America, and the Middle East all have shares above 50%. Fact 2: Homogeneous and more competitively produced goods have more flexible prices, documented across multiple countries — for instance, Nakamura and Steinsson (2008) find a median monthly price-change frequency of 10.8% for finished-good producer prices but 98.9% for crude materials. Fact 3: Dollar (vehicle currency) invoicing is most prevalent precisely in these homogeneous, competitive-good sectors; classical work by McKinnon (1979) and Magee and Rao (1980) emphasized that vehicle-currency invoicing facilitates continuous price comparability in competitive markets, and panel regressions corroborate a positive relationship between the share of exports invoiced in dollars and the homogeneous-goods share of exports.

Q3. What is the mechanism through which depreciation boosts exports in the MCP model, and why does this generate low observed pass-through?

With sticky wages (representing non-tradable input price stickiness more broadly), a monetary policy-induced depreciation lowers the domestic cost of production when expressed in dollars. For competitive exporters facing highly elastic demand, even a small reduction in the dollar price translates into a substantial gain in export quantities. Firms therefore lower their dollar prices slightly, trading some profit margin for a large increase in market share. As exports expand, domestic marginal costs rise (firms move up the upward-sloping marginal cost curve), partially offsetting the depreciation’s effect on dollar costs. In equilibrium, the net dollar price movement is small, which produces the observed limited pass-through, but the quantity response is large. In the perfectly competitive limit (relevant for commodity exporters), the dollar price is set by the world market and does not change at all, and the entire adjustment comes through an expansion of export volumes until rising domestic marginal costs absorb the depreciation. For prices, the observable implication is identical to that of a sticky-price model, but “the implications for export quantities are diametrically opposed.”
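
A stylized partial-equilibrium sketch can make this mechanism concrete. It is a simplification for illustration, not the paper's model. Assume an exporter facing constant-elasticity demand in dollars (elasticity ε, called `demand_elasticity` below), a constant markup, a sticky domestic wage, and marginal cost that rises with output with elasticity η (`mc_slope`). Log-differentiating the pricing rule and the demand curve gives a dollar price response of -1/(1 + ηε) and a volume response of ε/(1 + ηε) per 1% depreciation. All names and parameter values are assumptions.

```python
def depreciation_responses(demand_elasticity: float, mc_slope: float) -> tuple:
    """Log-linear responses to a 1% depreciation with sticky domestic wages.

    Returns (percent change in the dollar export price,
             percent change in export volume).
    """
    if demand_elasticity <= 1 or mc_slope < 0:
        raise ValueError("need demand_elasticity > 1 and mc_slope >= 0")
    dampening = 1.0 + mc_slope * demand_elasticity
    price_change = -1.0 / dampening
    volume_change = demand_elasticity / dampening
    return price_change, volume_change


# Higher demand elasticities: smaller price response, larger volume response.
for eps in (2.0, 10.0, 100.0):
    price, volume = depreciation_responses(eps, mc_slope=0.5)
    print(f"elasticity {eps:>5.0f}: dollar price {price:+.3f}%, volume {volume:+.3f}%")
```

As the elasticity grows, the price response approaches zero while the volume response approaches 1/η, so in this sketch the slope of the marginal cost curve, not demand, caps the expansion.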

Q4. How does the MCP model nest existing frameworks, and what does it add relative to the DCP and PCP benchmarks?

The MCP (mixed currency pricing) framework nests sticky-price DCP as a special case (by setting demand elasticities low and allowing full price stickiness) and produces behavior close to PCP (producer currency pricing) in the flexible-price, high-elasticity limit — restoring the allocative properties of the exchange rate from Obstfeld and Rogoff (1995). The distinctive addition is intra-sector international competition: domestic exporters face competition from international competitors producing highly substitutable varieties of the same good, so substitution elasticities can be high at the variety level even when macro-level elasticities between goods remain low. This follows a bottom-up approach to elasticities as in Feenstra et al. (2018). The model also allows heterogeneous nominal rigidities across producers, with exporters of dollar-invoiced homogeneous goods having flexible prices while non-tradable input prices (wages) remain sticky — the source of monetary non-neutrality and the mechanism for real exchange rate effects.

Q5. What is the role of supply capacity, and why is it “the limit” rather than demand?

In the sticky-price DCP model, the constraint on the export response is on the demand side: dollar prices do not move, so demand is unchanged, and there is no export response at all. In the MCP model, demand responds immediately to the cost reduction — the constraint that eventually stops the expansion is supply capacity, captured by the slope of the marginal cost curve and macroeconomic constraints on non-tradable inputs. With a flat marginal cost curve (plentiful supply capacity), exports expand materially; with a steep curve or hard capacity constraints, the increase in marginal cost fully offsets the depreciation before much quantity adjustment occurs. This supply-side framing reorients the policy question: the limiting factor for monetary policy’s external effectiveness is not whether dollar prices can move, but whether the domestic economy has the productive capacity to expand tradable output. This also connects the paper to the Salter-Swan two-good framework and to Schmitt-Grohé and Uribe (2021).

Q6. What do the macroeconomic empirical tests find, and how do they distinguish the MCP from sticky-price DCP?

The paper uses three empirical exercises. First, in a sample of developing and emerging economies, monetary policy expansions that generate exchange rate depreciations cause significant increases in both exports and aggregate economic activity, consistent with the MCP model’s material export response and inconsistent with the DCP prediction of no export channel. Second, focusing on Canada and Chile as commodity exporters where the MCP assumptions (competitive markets, flexible export prices) are especially applicable, the aggregate results are corroborated and sectoral evidence provides additional support. Third, case studies of three large devaluations in the sample show that they were followed by material increases in exports relative to trend. In all exercises, the direction and magnitude of export and output responses are consistent with a functioning expenditure-switching channel, even where exports are priced in dollars.

Q7. How does the paper reinterpret the pass-through evidence that motivated sticky-price DCP models, and what does this imply for normative conclusions?

Standard reduced-form pass-through regressions relate the change in dollar export prices to changes in the exchange rate. These regressions typically omit or fail to fully capture movements in marginal cost. In the MCP model, flexible-price firms fully pass through changes in marginal cost; the observed limited pass-through to export prices is an equilibrium result of the offsetting rise in marginal costs as export volumes expand, not evidence of a nominal friction. Because the standard regressions omit marginal cost dynamics, they risk attributing a quantity-driven equilibrium outcome to a pricing friction. This has direct normative implications: the case made by the IMF (2019, 2020) that dollar invoicing worsens the cost-benefit calculation for flexible exchange rates, and may bolster the case for capital controls, rests on interpreting low pass-through as evidence of stickiness. If low pass-through instead reflects high demand elasticities and supply-side adjustment, the normative argument for constraining exchange rate flexibility is weakened.

Q8. How does the paper relate to the purchasing power parity puzzle and the Mussa puzzle?

The MCP framework offers explanations for two classic international macro puzzles without assuming nominal rigidities in export prices. On the PPP puzzle (the volatility and persistence of the real exchange rate, Rogoff 1996): in the MCP model, exporters’ optimal reset prices move very little after exchange rate changes — not because of stickiness, but because demand is elastic and marginal costs rise quickly. This predicts limited movement in relative export prices, consistent with empirical evidence in Blanco and Cravino (2020) and Itskhoki and Mukhin (2025). On the Mussa puzzle (the large jump in nominal and real exchange rate volatility after the Bretton Woods collapse): the model’s mechanism via sticky wages is consistent with evidence that depreciations produce slow adjustment of non-tradable prices (Burstein, Eichenbaum, and Rebelo 2005), generating real exchange rate movements despite limited response in traded-good dollar prices.

Key Concepts

Dominant currency pricing (DCP): A framework in which non-U.S. exporters set and maintain prices in U.S. dollars, with sticky dollar prices. As formulated by Gopinath et al. (2020), DCP predicts that exchange rate depreciations by non-U.S. countries do not reduce dollar export prices and therefore do not stimulate export demand — muting the expenditure-switching channel of monetary policy.

Mixed currency pricing (MCP): The framework introduced in this paper. It allows heterogeneous price flexibility and market structure across export sectors, nesting both sticky-price DCP and flexible-price PCP as special cases. Dollar-priced exports face elastic demand from international competition, have flexible prices, and respond to depreciations through quantities rather than prices. Non-traded inputs (wages) remain sticky, providing the source of monetary non-neutrality.

Expenditure-switching channel: The mechanism by which exchange rate depreciations redirect spending toward domestically produced goods, boosting exports and aggregate demand. In PCP models, this works through a fall in relative export prices. In the MCP model, it works through an expansion in export quantities even when dollar prices change little.

Exchange rate pass-through (to export prices): The elasticity of dollar export prices with respect to the nominal exchange rate. In sticky-price DCP models, low pass-through reflects a nominal friction (prices cannot adjust). In the MCP model, low pass-through reflects high demand elasticities and offsetting marginal cost increases: it is an equilibrium outcome, not a friction, and therefore does not imply that export volumes are unresponsive.

Intra-sector international competition: The market structure feature central to the MCP framework. Domestic exporters of a given good compete with foreign suppliers of highly substitutable varieties, making their demand elastic at the variety level even if aggregate elasticities across different goods categories are low. This follows Armington (1969) as implemented by Feenstra et al. (2018).

Supply capacity constraint: In the MCP model, the binding constraint on how much a depreciation can boost exports. With high demand elasticities, demand for domestic exports expands freely; the limit is set by how quickly rising domestic marginal costs absorb the improvement in export profitability. The supply constraint replaces the demand constraint that operates (mechanically, via zero price response) in sticky-price DCP models.

Homogeneous goods (Rauch 1999 classification): Goods traded on organized commodity exchanges or reference-priced in trade publications, as opposed to differentiated goods. McLeay and Tenreyro use this classification to establish that dollar-invoiced exports are disproportionately homogeneous, competitive, and flexible-priced — contrary to the DCP assumption of monopoly power and price stickiness.

Downward Rigidity in the Wage for New Hires

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Summary

Hazell and Taska use wages posted on online job vacancies — matched to job titles and establishment identifiers from Burning Glass Technologies — to measure the wage for new hires at the job level (same job title and establishment) over 2010Q1–2020Q2. They find that this measure of the wage for new hires is rigid downward and flexible upward. At the job level, the nominal posted wage changes infrequently — on average once every 5–6 quarters — and conditional on changing, is four times more likely to rise than to fall. In the cyclical dimension, job-level posted wages rise strongly when state unemployment falls but do not fall when state unemployment rises; real wages exhibit the same asymmetric pattern. These results do not appear in the average wage for new hires (which aggregates across all job types), because time-varying job composition inflates the variance of average wages and raises standard errors roughly twentyfold relative to job-level regressions — explaining why prior work using worker-level survey data found no evidence of downward rigidity. A Heckman (1979) selection correction for firms’ selection into vacancy posting suggests that selection bias in the job-level regression is moderate. The findings provide direct empirical support for models in which downward wage rigidity for new hires — specifically at the job level — amplifies unemployment fluctuations and generates asymmetric unemployment dynamics.

In depth

Q1. Q: What is the central empirical claim of the paper?

A: At the job level — defined as the same job title within the same establishment — the wage posted for new hires is rigid downward and flexible upward. It changes infrequently and, conditional on changing, rises far more often than it falls; and it responds to falls in unemployment but not to rises in unemployment.

Q2. Q: What data does the paper use, and what defines a “job”?

A: The paper uses the Burning Glass Technologies dataset of wages posted on online vacancies, covering January 2010 to June 2020. A “job” is a job title within an establishment whose wages are paid at a given frequency (e.g., hourly or annual). The data come from the near-universe of online job postings — roughly 40,000 sources — and the main regression sample consists of jobs that post wages, have job title and establishment information, and post vacancies in multiple quarters, yielding approximately 3.05 million vacancies, representing about 0.8% of total US vacancies.

Q3. Q: How do the authors validate that posted wages measure the wage for new hires?

A: They construct a measure of the wage for new hires from the Current Population Survey (CPS) — workers switching jobs or entering from unemployment — at the state, industry, and occupation level. Regressing log CPS wages on log Burning Glass wages (using an IV split-sample procedure to correct for attenuation bias) yields a coefficient close to 1 across specifications and levels of aggregation, indicating that average posted wages move roughly one-for-one with average wages for new hires in representative survey data.
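The attenuation problem and the split-sample fix can be illustrated with a small simulation. This is a generic sketch under stated assumptions, not the authors' procedure: the noise levels, the sample size, and every variable name are hypothetical, and the two "halves" stand for posted-wage averages built from two random subsamples of postings that share the same latent wage but carry independent sampling noise.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


random.seed(0)
n = 20_000

# Latent log wage for new hires in each cell (hypothetical units).
latent = [random.gauss(0.0, 1.0) for _ in range(n)]
# Survey-based wage: latent wage plus its own measurement noise (the outcome).
survey = [w + random.gauss(0.0, 0.5) for w in latent]
# Posted-wage averages from two random halves of the postings in each cell.
posted_a = [w + random.gauss(0.0, 1.0) for w in latent]
posted_b = [w + random.gauss(0.0, 1.0) for w in latent]

# OLS of the survey wage on one noisy posted-wage average: biased toward zero.
ols = cov(survey, posted_a) / cov(posted_a, posted_a)
# Split-sample IV: instrument half A with half B, whose noise is independent.
iv = cov(survey, posted_b) / cov(posted_a, posted_b)

print(f"OLS slope: {ols:.3f}")
print(f"Split-sample IV slope: {iv:.3f}")
```

With these assumed noise levels the OLS slope is pulled well below one, while the instrumented slope stays close to one, which is the pattern the validation exercise relies on.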

Q4. Q: How is the frequency of wage change estimated?

A: Because wages are not observed in quarters without a vacancy posting, the authors adapt a constant-hazard model from the price-setting literature (following Nakamura–Steinsson and Klenow–Kryvtsov). The latent wage evolves stochastically between postings; the observed wage is treated as a draw from this process. The quarterly probability of wage change is estimated at 0.17–0.19 across specifications, implying that wages remain unchanged for roughly 5–6 quarters on average (the reciprocal of the quarterly hazard).
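As a rough check on the duration arithmetic: under a constant quarterly hazard p, the expected spell between wage changes is 1/p quarters. The sketch below assumes that geometric reading of the model; the function name is illustrative and does not come from the paper.

```python
def implied_duration(quarterly_prob: float) -> float:
    """Expected quarters between wage changes under a constant hazard.

    Assumes each quarter carries the same independent probability of a
    change, so the spell length is geometric with mean 1 / p.
    """
    if not 0.0 < quarterly_prob <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return 1.0 / quarterly_prob


# Range of quarterly change probabilities reported in the answer above.
for p in (0.17, 0.19):
    print(f"p = {p:.2f} -> about {implied_duration(p):.1f} quarters")
```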

Q5. Q: What is the asymmetry in the direction of wage changes?

A: In the unweighted baseline, the quarterly probability of a wage decrease is 0.04, whereas the probability of a wage increase is 0.12 — roughly a three-to-one ratio in probabilities, summarized in the paper’s abstract as wages being “four times more likely to rise than to fall.” The distribution of non-zero wage changes also shows a pronounced pile-up of small positive changes relative to small negative changes, consistent with a downward constraint on wage setting.

Q6. Q: What is the first piece of cyclical evidence for downward rigidity?

A: A binned scatterplot (Figure 1) of job-level wage growth against state-level quarterly changes in unemployment shows a strong, roughly linear relationship when unemployment is falling — wages rise with falls in unemployment, both for small and large declines. When unemployment rises, however, wages do not fall — neither for small nor for large increases in unemployment. This asymmetry is robust to regression-based analysis and to identified labor demand shocks.

Q7. Q: Are real wages also rigid downward?

A: Yes. The paper reports that real wages (nominal posted wages deflated) are also rigid downward and flexible upward, mirroring the pattern for nominal wages.

Q8. Q: What is the job-composition problem, and why does it matter?

A: The average wage for new hires — the object measured in most prior work — aggregates across all job types that are actively hiring. If the composition of jobs hiring shifts over the business cycle (e.g., the share of lower-wage jobs rises in recessions), then average wages can fall even if no individual job cuts its wage, and can stay flat or rise even if every job cuts its wage. Job composition therefore confounds cyclicality estimates based on average wages. By tracking the same job title at the same establishment across successive vacancies, the authors purge wage changes driven by shifting composition.
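A two-job example makes the composition effect concrete. All numbers below are hypothetical and chosen only for illustration: both jobs keep their posted wage fixed, yet the vacancy-weighted average falls because hiring shifts toward the lower-wage job.

```python
def average_wage(jobs):
    """Vacancy-weighted mean wage; jobs is a list of (wage, vacancies)."""
    total_vacancies = sum(vacancies for _, vacancies in jobs)
    return sum(wage * vacancies for wage, vacancies in jobs) / total_vacancies


# Hypothetical data: (hourly posted wage, vacancies posted) for two jobs.
expansion = [(30.0, 50), (15.0, 50)]
recession = [(30.0, 20), (15.0, 80)]  # same wages, more low-wage hiring

avg_expansion = average_wage(expansion)
avg_recession = average_wage(recession)

# Job-level wage changes are zero by construction.
job_level_changes = [r[0] - e[0] for e, r in zip(expansion, recession)]

print("average wage, expansion:", avg_expansion)
print("average wage, recession:", avg_recession)
print("job-level wage changes:", job_level_changes)
```

The average moves even though no job changes its wage, which is why a job-level comparison is needed to separate wage setting from shifts in the hiring mix.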

Q9. Q: Why did prior work find no evidence of downward rigidity for new hires?

A: Prior work used worker-level survey data (e.g., Bils 1985; Pissarides 2009 survey) that controls for worker characteristics but averages across jobs — the average wage for new hires. The volatility of job composition inflates the variance of this average measure. In the Burning Glass data, standard errors from regressions using average wages are roughly twenty times larger than those from job-level regressions, making it impossible to detect downward rigidity even if it exists. Point estimates in prior work suggested procyclicality but were too imprecise to exclude downward rigidity.

Q10. Q: How does this paper relate to Gertler, Huckfeldt, and Trigari (2020) and Grigsby, Hurst, and Yildirmaz (2021)?

A: Both papers attempt to control for job composition at the worker level. Gertler et al. focus on wages of workers hired from unemployment (less affected by composition than all new hires) and find weakly procyclical wages. Grigsby et al. use rich payroll data and worker-level matching to control for composition and also find weakly procyclical wages. The present paper complements these by using job-level data that directly purges composition without relying on worker characteristics, and adds evidence on the asymmetry of rigidity (not just average procyclicality).

Q11. Q: What is the role of the Heckman selection correction?

A: If firms select into vacancy posting depending on business-cycle conditions, the sample of observed posted wages may be non-random, biasing job-level wage-cyclicality estimates. The authors implement a standard Heckman (1979) two-step selection correction. The correction suggests that selection bias in the job-level regression is moderate — it does not overturn the finding of downward rigidity.

Q12. Q: What are the four main caveats the authors acknowledge?

A: (1) The main sample is small — 0.8% of US vacancies — though the authors show it is broadly representative on observables and that wages track representative survey data. (2) The paper measures rigidity only for jobs that post wages; jobs that do not post wages might be more flexible, though the share of vacancies posting wages does not decline during contractions. (3) Posted wages may differ from realized (bargained) wages; however, wages are rigid even in occupations where bargaining is uncommon. (4) The Pandemic Recession is the main contractionary episode in the sample, and it involved labor supply shocks as well as demand shocks; the authors address this through identified labor demand shock regressions and by ending the sample in June 2020.

Q13. Q: What are the implications for models of unemployment fluctuations?

A: In the Diamond–Mortensen–Pissarides search model, Pissarides (2009) emphasizes that the wage for newly hired workers — not continuing workers — is the relevant margin for unemployment fluctuations. Shimer (2005) showed the standard calibration produces too-small unemployment fluctuations; wage rigidity for new hires can resolve this. The paper’s finding of downward-but-not-upward rigidity additionally supports models (e.g., Dupraz, Nakamura, and Steinsson, 2020) in which this asymmetry generates asymmetric unemployment dynamics — unemployment rises sharply in contractions but falls more slowly in expansions.

Q14. Q: How do wages for new hires compare with wages for continuing workers in terms of rigidity?

A: The paper finds approximate parity. The implied duration of unchanged wages from the job-level posted wage data (5–6 quarters, the reciprocal of the estimated quarterly change probability of 0.17–0.19) is similar to estimates for continuing workers in the prior literature. This is perhaps surprising because wages could in principle be more flexible for new hires than for continuing workers: firms might cut wages for new hires even while insuring continuing workers (Beaudry and DiNardo, 1991). The results instead suggest that internal equity concerns (Bewley, 2002) or other forces produce similar rigidity for both groups.

Key Concepts

Job level wage: The wage across successive vacancies posted by the same job title at the same establishment. This is the unit of observation in the paper’s main analysis and the object for which downward rigidity is documented. Distinct from the average wage for new hires (which aggregates across all job types).

Downward rigidity (as used in this paper): An empirical pattern in which wages at the job level do not fall during contractions — they do not respond to rising unemployment — while rising during expansions in response to falling unemployment. The claim is descriptive: the data show wages do not fall; the paper does not structurally identify the mechanism enforcing this floor.

Job composition problem: The bias introduced when measuring cyclicality of the average wage for new hires using data that aggregates across different types of jobs. If the mix of job types hiring shifts with the business cycle, average wages can change even when no individual job changes its wage, and can mask individual-job wage changes. Job-level data resolve this by holding the job fixed.

Burning Glass Technologies dataset: A database of wages posted on online job vacancies, drawn from approximately 40,000 online sources (job boards and company websites), covering the near-universe of US online vacancies. The paper’s main regression sample uses the subset with posted wages, job title, establishment identifiers, and multiple quarters of postings, spanning January 2010 to June 2020.

Constant hazard model (wage change frequency): An estimation procedure adapted from the price-setting literature to recover the quarterly probability of wage change from a dataset in which wages are only observed when a vacancy is posted. The latent wage evolves with a constant hazard of change between observations; observed wage changes identify the hazard rates for increases and decreases separately.

Average wage for new hires: The mean wage across all workers newly entering employment (or across all new-hire jobs), used in prior work (Bils 1985 and related). Does not control for job composition. Shown in this paper to exhibit no detectable downward rigidity, with standard errors roughly twenty times larger than in job-level specifications — because job composition variance inflates the residual variance.

Heckman selection correction: A two-step procedure (Heckman 1979) to correct for the possibility that firms that post vacancies — and post wages — are a selected sample that differs systematically across the business cycle. The paper applies this to assess whether selection into vacancy posting biases the job-level wage-cyclicality estimates; the correction suggests bias is moderate.

Dynamic Concern for Misspecification

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

This paper asks how an agent who fears that none of their probabilistic models is the correct description of the data-generating process (DGP) should update that fear as evidence accumulates, and what long-run behavior such an agent exhibits. The central contribution is making the concern for misspecification endogenous: the better the agent’s structured models explain past observations, the less concerned the agent becomes.

Decision Criterion

The agent posits a finite-dimensional parametric set of structured models Θ, holds a prior µ over Θ, and evaluates each action according to an average robust control criterion. This criterion takes a weighted average (over models) of robust control assessments, where each assessment penalizes expected utility for probability distributions that deviate from the structured model in terms of relative entropy, scaled by a misspecification concern parameter λ > 0. A standard subjective expected utility maximizer is the limiting case as λ → 0 (no concern), and a maxmin agent is approached as λ → ∞.

Endogenous Misspecification Concern

The concern parameter λ is updated each period as a function of the likelihood ratio test (LRT) statistic of the structured models against unstructured alternatives, scaled by a time-normalizing sequence βₜ: λ(hₜ) = LRT(hₜ, Θ) / (2βₜ). The sequence βₜ determines how demanding the agent is in evaluating model fit.
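The update rule can be sketched for the simplest case of a single action with finitely many outcomes. The sketch assumes the textbook form of the statistic, twice the gap between the best unstructured and best structured log-likelihoods; the candidate models, the observed frequencies, and the function names are hypothetical, and each model is assumed to have full support.

```python
import math


def lrt_statistic(counts, models):
    """2 * (best unstructured log-likelihood - best structured log-likelihood).

    counts: outcome counts observed so far.
    models: probability vectors forming the structured set.
    The unstructured maximum is attained at the empirical frequencies.
    """
    t = sum(counts)
    freq = [n / t for n in counts]
    unstructured = sum(n * math.log(f) for n, f in zip(counts, freq) if n > 0)
    structured = max(
        sum(n * math.log(q) for n, q in zip(counts, model) if n > 0)
        for model in models
    )
    return 2.0 * (unstructured - structured)


def concern(counts, models, beta_t):
    """lambda(h_t) = LRT(h_t, Theta) / (2 * beta_t)."""
    return lrt_statistic(counts, models) / (2.0 * beta_t)


# Hypothetical structured set: two candidate outcome distributions.
models = [(0.5, 0.5), (0.3, 0.7)]
c = 1.0

for t in (100, 10_000, 1_000_000):
    counts = [6 * t // 10, 4 * t // 10]  # frequencies held at (0.6, 0.4)
    linear = concern(counts, models, beta_t=c * t)      # beta_t = c * t
    faster = concern(counts, models, beta_t=t ** 1.5)   # t = o(beta_t)
    slower = concern(counts, models, beta_t=t ** 0.5)   # beta_t = o(t)
    print(t, round(linear, 4), round(faster, 6), round(slower, 3))
```

Because neither candidate matches the observed frequencies, the statistic grows in proportion to t: linear scaling holds the concern steady, faster scaling drives it toward zero, and slower scaling lets it grow without bound.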

Taxonomy of Agent Types

Three types emerge based on the speed of βₜ:

Statistician type (βₜ = ct, linear): applies a time scaling that keeps the LRT asymptotically informative about the degree of misspecification. This is the unique type satisfying both safety (long-run average payoff at least ε-close to the maxmin guarantee, almost surely) and consistency under almost correct specification (no ε-regret when misspecification is small).
Lenient type (t = o(βₜ)): attributes unexplained evidence to sampling variability; corresponds to the Law of Large Numbers intuition.
Demanding type (βₜ = o(t)): overly penalizes small discrepancies, analogous to the Law of Small Numbers fallacy (Tversky and Kahneman, 1971).

Standard SEU maximization fails safety; robust control with an invariant λ (Hansen and Sargent, 2001; 2022) fails consistency under almost correct specification.

Long-Run Convergence Results (Theorem 1)

For a misspecified agent (no θ ∈ Θ with qθ_{a*} = p*_{a*}), the nature of the limit action a* depends on the agent type:

Lenient type: a* is a Berk-Nash equilibrium — an SEU best reply to beliefs supported on the models with minimum relative entropy from the true DGP.
Demanding type: a* is a maxmin equilibrium — a worst-case best reply to all models absolutely continuous with respect to the true DGP.
Statistician type: if behavior converges, a* is a c-robust equilibrium: a robust control best reply to beliefs on the relative entropy minimizers, with the concern for misspecification endogenously set at minθ R(p*_{a*} || qθ_{a*}) / c.

For a correctly specified agent (Proposition 2), every limit action is a self-confirming equilibrium, regardless of the agent type.

Cycles and Limit Frequency (Section 4, Theorem 2)

The statistician type’s behavior need not converge. In natural settings, the agent cycles between actions: playing a “safe” action whose consequences are well-explained by Θ reduces concern for misspecification, eventually leading to a riskier action whose poorly-explained consequences raise concern again, inducing a return to the safe action. The paper proves that every limit frequency (empirical distribution over actions) is a mixed c-robust equilibrium — a generalization that allows mixing while tying the concern for misspecification to the frequency-weighted average relative entropy of each action.

Empirical Applications

Monetary policy cycles (Sargent 1999, 2008): In a central bank model where the true DGP includes increased inflation variability under aggressive policy (a feature absent from the bank’s structured models), no pure c-robust equilibrium exists for small c. The model predicts persistent cycles between conservative and aggressive policy. The frequency of the conservative policy is increasing in the strength of the exploitable inflation-unemployment trade-off (θ₁π + θ₁a).
Labor supply under complex tax schedules (Rees-Jones and Taubinsky, 2020): Agents with a “schmeduling” heuristic (linearizing the tax schedule) are misspecified. Berk-Nash equilibrium predicts these agents exert excess effort, with the bias increasing in the complexity (convexity) of the tax code. The c-robust equilibrium attenuates this bias: in equilibrium, minθ R(p*_a || qθ_a) > 0, so agents maintain positive concern for misspecification and pull back from the biased recommendation. The paper rationalizes the empirical finding that approximately 40% of agents hold the schmeduling belief but about 20% fewer agents act on it, consistent with endogenous concern reducing the behavioral impact of the biased model.

Axiomatization (Section 5)

The paper axiomatizes the static average robust control criterion (Theorem 3) using: a Variational Axiom (from Maccheroni, Marinacci, and Rustichini, 2006a), a Structured Savage axiom (Sure-Thing Principle for bets on the model identity), an Intramodel Sure-Thing Principle (STP for bets conditional on the model), and Uniform Misspecification Concern (the agent is equally concerned about misspecification regardless of which model is identified as best-fitting). Three additional dynamic axioms characterize preference evolution: Constant Preference Invariance (utility index stable over time), Dynamic Consistency over Models (Bayesian updating over structured models), and Q-Likelihood (misspecification concern increases in the LRT). A novel Asymptotic Frequentism axiom characterizes the statistician type: preferences must become arbitrarily similar (in a precise quantitative sense) after sufficiently long histories with the same outcome frequency.

In depth

Q1. What is the average robust control criterion and how does it generalize prior decision criteria?

A: An agent evaluates action a by averaging over structured models θ a robust control assessment: for each θ, minimize expected utility over probability distributions within relative entropy distance (penalized by 1/λ) of qθ_a, then integrate over θ with prior µ. This nests SEU (λ → 0, perfect trust in models), standard robust control of Hansen and Sargent (2001) (µ is Dirac, single benchmark model), and maxmin expected utility of Gilboa and Schmeidler (λ → ∞). The key extension is allowing µ to be nondegenerate, so the agent is simultaneously uncertain about the best-fitting model and about whether any model is exact.
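For finite outcomes, the inner minimization has a well-known closed form (the entropic duality used throughout the robust control literature): the minimum over p of E_p[u] + (1/λ) R(p || q) equals −(1/λ) log E_q[exp(−λ u)]. The sketch below uses that identity, which this summary does not itself state, to compute the criterion and check its two limits. The utilities, models, and prior are hypothetical, and models are assumed to have full support.

```python
import math


def robust_value(utilities, model, lam):
    """min over p of E_p[u] + (1/lam) * R(p || model), via the closed form."""
    # Shift by the largest exponent so the sum stays numerically stable.
    top = max(-lam * u for u in utilities)
    total = sum(q * math.exp(-lam * u - top) for q, u in zip(model, utilities))
    return -(top + math.log(total)) / lam


def average_robust_control(utilities, models, prior, lam):
    """Prior-weighted average of the robust control assessments."""
    return sum(w * robust_value(utilities, m, lam) for w, m in zip(prior, models))


# Hypothetical inputs: one action, two outcomes, two structured models.
utilities = [1.0, 0.0]
models = [(0.5, 0.5), (0.8, 0.2)]
prior = [0.5, 0.5]

for lam in (1e-6, 1.0, 1000.0):
    print(lam, average_robust_control(utilities, models, prior, lam))
```

For very small λ the value approaches the prior-weighted expected utility, and for very large λ it approaches the worst outcome in the support, matching the SEU and maxmin limits described above.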

Q2. What is the role of the likelihood ratio test statistic in driving misspecification concern?

A: The LRT statistic compares the maximum likelihood of the structured models against the best unstructured alternative. It diverges almost surely when the agent is misspecified, regardless of how close the structured models are to the true DGP. The concern parameter λ(hₜ) = LRT(hₜ, Θ) / (2βₜ) uses a time-scaling sequence βₜ to keep this statistic interpretable. Without scaling, a misspecified agent’s concern would always explode to infinity.

Q3. Why does linear time scaling (βₜ = ct) uniquely characterize the statistician type as rational?

A: Proposition 1 establishes two properties: (1) ε-safety: every policy that is optimal under βₜ = ct achieves a long-run average payoff of at least the maxmin guarantee minus ε, almost surely; (2) ε-consistency under almost correct specification: for DGPs sufficiently close to Θ, the agent avoids long-run regret. Part 2 of Proposition 1 shows that no βₜ with βₜ = o(t) or t = o(βₜ) satisfies both properties simultaneously. SEU fails safety; invariant-λ robust control fails consistency.

Q4. What is a c-robust equilibrium and how does it differ from a Berk-Nash equilibrium?

A: A Berk-Nash equilibrium (Esponda and Pouzo, 2016) requires the action to be an SEU best reply to beliefs supported on the relative entropy minimizers of the true DGP. A c-robust equilibrium requires the same support condition but with the best reply taken under the average robust control criterion, where the concern for misspecification λ equals minθ R(p*_{a*} || qθ_{a*}) / c, that is, the minimum relative entropy scaled by 1/c. The endogenous λ is positive whenever the agent is misspecified, so the agent does not fully trust even the best-fitting model.

Q5. How does the paper explain that misspecified lenient types converge to Berk-Nash while demanding types converge to maxmin?

A: For the lenient type (t = o(βₜ)), βₜ outpaces the roughly linear growth of the LRT under misspecification, so the concern for misspecification λ = LRT / (2βₜ) converges to 0 and the agent effectively behaves as an SEU maximizer with beliefs on the KL-minimizing models: the Berk-Nash condition. For the demanding type (βₜ = o(t)), the LRT diverges relative to βₜ, so λ → ∞ and the agent’s preferences converge to worst-case evaluation over all models absolutely continuous with respect to the true DGP: the maxmin condition. These are Theorem 1, parts 1 and 2.

Q6. Why does the statistician type exhibit cycles rather than convergence?

A: Section 4 and Corollary 1 show in the monetary policy application that no pure c-robust equilibrium exists for small c. Intuitively, the conservative policy (a=0) is a best reply to a high misspecification concern, but it produces outcomes well-explained by Θ, which drives concern down. The aggressive policy (a=1) is a best reply to a low concern, but it generates increased inflation variability not captured in Θ, which drives concern up sharply. There is no fixed point that is self-sustaining, so the agent cycles. Theorem 2 shows that the empirical frequency of actions still converges to a mixed c-robust equilibrium.
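The fixed-point logic can be illustrated with a toy two-action problem. This is not the paper's monetary model: the actions, payoffs, distributions, and function names are hypothetical, the structured set contains a single full-support model, and beliefs are taken to be uniform on the relative entropy minimizers. The safe action is perfectly explained by the model, while the risky action is not.

```python
import math


def kl(p, q):
    """Relative entropy R(p || q) for finite distributions (q > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def robust_value(utils, q, lam):
    """min_p E_p[u] + (1/lam) R(p || q); expected utility when lam == 0."""
    if lam == 0:
        return sum(qi * u for qi, u in zip(q, utils))
    top = max(-lam * u for u in utils)
    total = sum(qi * math.exp(-lam * u - top) for qi, u in zip(q, utils))
    return -(top + math.log(total)) / lam


def pure_c_robust_equilibria(true_dgp, models, utility, c):
    """Actions passing the fixed-point check (uniform beliefs on minimizers)."""
    equilibria = []
    for a in true_dgp:
        gaps = [kl(true_dgp[a], model[a]) for model in models]
        r_min = min(gaps)
        best = [m for m, g in zip(models, gaps)
                if math.isclose(g, r_min, abs_tol=1e-12)]
        lam = r_min / c  # endogenous concern implied by playing a forever

        def value(b):
            return sum(robust_value(utility[b], m[b], lam) for m in best) / len(best)

        if all(value(a) >= value(b) - 1e-12 for b in true_dgp):
            equilibria.append(a)
    return equilibria


# Hypothetical primitives: outcome distributions over (low, high).
true_dgp = {"safe": (0.5, 0.5), "risky": (0.8, 0.2)}
models = [{"safe": (0.5, 0.5), "risky": (0.5, 0.5)}]  # optimistic about risky
utility = {"safe": (1.0, 1.0), "risky": (0.0, 3.0)}

for c in (1.0, 0.1):
    print(c, pure_c_robust_equilibria(true_dgp, models, utility, c))
```

In this toy setup the safe action is never self-sustaining, because playing it removes all concern and makes the risky action look better. The risky action is self-sustaining only when c is large enough that the concern it generates stays mild. For small c neither action passes the check, which is the situation in which cycling arises.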

Q7. What are the quantitative comparative statics for the monetary policy cycles?

A: Corollary 1 establishes that there exists a threshold c̄ > 0 such that for all c ≤ c̄: (1) no pure c-robust equilibrium exists; (2) a mixed c-robust equilibrium exists; and (3) in the maximal and minimal equilibria, the frequency of the conservative policy α*(0) is increasing in θ₁π + θ₁a, so the size of the exploitable trade-off between inflation and unemployment governs how time is split between the conservative and aggressive policies.

Q8. How does the model rationalize the Rees-Jones and Taubinsky (2020) labor supply finding?

A: Rees-Jones and Taubinsky (2020) find that approximately 40% of agents have incentive-compatible beliefs consistent with the schmeduling heuristic (linearizing a convex tax schedule), but approximately 20% fewer agents act according to that heuristic. In a Berk-Nash equilibrium, the schmeduling agent exerts excess effort relative to the optimum; the more convex the tax code, the larger the excess. In a c-robust equilibrium, the agent retains a positive misspecification concern proportional to the deviation between the convex tax schedule and the linear approximation. Higher effort levels are more exposed to uncertainty in the marginal rate (the misspecified term θ+ε multiplies a higher average income z), so the concern for misspecification provides a natural force that reduces effort below the Berk-Nash prediction. The paper notes this finding is also consistent with an alternative interpretation in Rees-Jones and Taubinsky where all agents hold schmeduling beliefs but under-respond behaviorally.

Q9. What is the mixed c-robust equilibrium and why does it always exist?

A: A mixed c-robust equilibrium is a mixed action α* ∈ Δ(A) such that beliefs ν are supported on the relative entropy minimizers Θ(α*) — computed as the parameter minimizing the α*-weighted average relative entropy across actions — and every action in the support of α* is a best reply under the average robust control criterion with λ = minθ Σ_a α*(a) R(p*_a || qθ_a) / c. Proposition 3 proves existence by mapping this fixed-point condition to a Nash equilibrium in an auxiliary game between the agent and two adversarial Nature players, then invoking Reny (1999) on that game. A pure c-robust equilibrium need not exist, but mixing over actions allows the concern for misspecification to be calibrated to the frequency of poorly-explained actions.

Q10. How does Theorem 2 formally connect cycles to mixed c-robust equilibria?

A: Theorem 2 states that if βₜ = ct for all t and α* is a βₜ-limit frequency (i.e., the empirical action distribution converges to α* with positive probability under some optimal policy), then α* is a mixed c-robust equilibrium. The intuition is that when α* places weight on both a well-explained action and a poorly-explained action, the time-averaged relative entropy stabilizes at a fixed level, producing a stable endogenous concern for misspecification that makes the agent asymptotically indifferent between the actions in the support — sharply reducing the incentive to break the cycle.

Q11. What does the axiomatization contribute beyond the learning results?

A: The axiomatization (Section 5, Theorem 3) provides behavioral foundations observable from choices, without assuming the internal LRT mechanism. Two primary axioms pin down the average robust control criterion within the variational class: Structured Savage (Sure-Thing Principle for bets over model identity) and Uniform Misspecification Concern (equal concern for misspecification regardless of which model is revealed as best-fitting). Dynamic Consistency over Models pins down Bayesian updating. Q-Likelihood axiomatizes that the concern for misspecification is ordinally increasing in the LRT. The novel Asymptotic Frequentism axiom (Axiom 9) pins down the quantitative speed of adjustment: long histories with the same empirical frequency must induce asymptotically similar preferences, and Proposition 5 shows this implies λ_{hₜ} / (LRT(hₜ, Q) / (2t)) converges to a finite limit, which is exactly the statistician type’s linear scaling.

Q12. What is the correlation between behavioral biases that the model predicts?

A: The paper derives three novel empirical predictions about the cross-sectional and time-series correlation of uncertainty attitudes: (1) long-run uncertainty aversion positively correlates with initial misspecification and with belief in the Law of Small Numbers; (2) these correlations are causal — repeated model failures and overly demanding evaluation induce a shift toward cautious behavior; (3) even holding misspecification and probability reasoning fixed, limit uncertainty attitudes are stochastic, depending on whether the limit action’s outcomes are well-explained by the structured models.

Q13. How does Example 2 (Correlation Neglect) show that endogenous concern can amplify rather than attenuate biases?

A: In a double auction, a buyer who mistakenly treats their own valuation and the ask price as independent (Correlation Neglect, Esponda, 2008) bids below the optimum in Berk-Nash equilibrium. In a c-robust equilibrium, the positive correlation between valuations and prices produces a strictly positive minθ R(p*_{a*} || qθ_{a*}), so the agent maintains misspecification concern. Since lower bids are accepted with lower probability (and thus are less sensitive to model misspecification), the endogenous concern drives the agent to bid even lower, amplifying the bias rather than attenuating it. This example illustrates that the direction of the correction depends on the geometry of how the misspecification interacts with the payoff structure.

Key Concepts

Average Robust Control Criterion: The decision criterion proposed in the paper. An agent evaluates action a by taking the expectation over structured models θ (with prior µ) of min_{p_a ∈ Δ(Y)} [E_{p_a}[u(a,y)] + (1/λ) R(p_a || qθ_a)]. This is a weighted average of robust control assessments, each penalizing distributions that deviate from a structured model in relative entropy. The parameter λ > 0 governs the intensity of misspecification concern, with SEU as the limit at λ → 0 and maxmin at λ → ∞.

Endogenous Misspecification Concern: Unlike prior robust control models where λ is fixed or set externally, here λ(hₜ) = LRT(hₜ, Θ) / (2βₜ) is a function of how well the structured models explain the observed history hₜ via the likelihood ratio test statistic. The better the models explain past data, the smaller λ becomes and the less the agent hedges.

Statistician Type: An agent who scales the likelihood ratio test statistic with a linear time sequence βₜ = ct for some c > 0. This is the unique agent type satisfying both ε-safety (guaranteed long-run average payoff above the maxmin guarantee minus ε) and ε-consistency under almost correct specification (no long-run regret when misspecification is small). The statistician type’s linear scaling is the only one for which the LRT statistic retains asymptotic informativeness about the degree of misspecification.

c-Robust Equilibrium: A fixed-point concept for the long-run behavior of the statistician type. Action a* is a c-robust equilibrium if it is an average robust control best reply to beliefs supported on Θ(a*) = argmin_θ R(p*_{a*} || qθ_{a*}), with misspecification concern λ = minθ R(p*_{a*} || qθ_{a*}) / c. This generalizes Berk-Nash equilibrium by incorporating an endogenous hedging motive proportional to the minimum relative entropy between the true DGP and the best structured model.

Mixed c-Robust Equilibrium: A generalization of c-robust equilibrium to mixed actions α* ∈ Δ(A) for environments where no pure equilibrium exists. The beliefs are supported on the models minimizing the α*-weighted average relative entropy, and the misspecification concern is tied to that average entropy. Every βₜ-limit frequency is a mixed c-robust equilibrium (Theorem 2). This concept characterizes the long-run time-average behavior when the statistician type cycles.

Law of Small Numbers (LSN) Type / Demanding Type: An agent for whom βₜ = o(t), meaning the time scaling grows sub-linearly. This agent is excessively sensitive to early model failures (analogously to the Law of Small Numbers fallacy of Tversky and Kahneman, 1971, where short-run frequencies are treated as the long-run norm). The long-run behavior of such a type converges to maxmin behavior rather than robust control.

Asymptotic Frequentism (Axiom 9): A novel axiom requiring that conditional preferences after sufficiently long histories with the same empirical outcome frequency must be arbitrarily similar (in a quantitative sense defined by measuring rods x, y, E) to a limiting preference. This axiom pins down the statistician type’s linear time scaling: it implies that the ratio λ_{hₜ} / (LRT(hₜ, Q) / (2t)) converges to a finite, positive constant, which is exactly the linear scaling βₜ = ct.

Berk-Nash Equilibrium: The equilibrium concept (Esponda and Pouzo, 2016) that describes the long-run behavior of lenient (SEU) agents learning under misspecification. An action a* is a Berk-Nash equilibrium if it is an SEU best reply to beliefs supported on Θ(a*) — the KL-minimizing models — without any additional hedging against misspecification. The current paper shows that lenient types converge to Berk-Nash equilibria, while statistician types converge to c-robust equilibria that differ by incorporating a positive misspecification concern.

Dynamics of the Long-Term Housing Yield: Evidence from Natural Experiments


Each month a fraction of UK property leases are extended by 90 years or more, creating thousands of natural experiments in which the same property’s rent and capital value are revealed simultaneously. This paper uses these lease extensions — and Massachusetts and Cambridge rent-control removals as a second identification strategy — to estimate the expected long-term housing yield (annual rent-to-price ratio) and decompose its dynamics into rent-growth expectations and discount-rate components. The central finding is that housing yield movements are dominated by discount-rate shocks: variation in required returns on housing explains the overwhelming majority of yield variance, while expected rent growth contributes less than 10 percent. Housing booms are therefore primarily driven by falling required returns, not by rational expectations of higher future rents. The yield responds to real long-term interest rates with a slope significantly below one, consistent with a non-pecuniary convenience yield on housing that is not fully displaced by interest rate changes.

In depth

Q1. What do the natural experiments identify?

Lease extensions reveal the market’s valuation of the same physical dwelling at two points — just before and just after the 90-year extension — with the extension itself creating a clean variation in the remaining lease term (and hence in the present value of ownership) without changing the property’s rent-generating characteristics. This design separates the rent and price components of the yield at the property level, allowing identification of discount-rate and rent-growth contributions free of compositional differences across properties.

Q2. Why do discount rates dominate yield variation?

A present-value decomposition of the housing yield into expected rent growth and the discount rate assigns more than 90 percent of variance to the discount rate component, implying that periods of low housing yields (high prices relative to rent) reflect primarily that investors demand a lower return on housing — not that they expect rents to rise faster. This result mirrors Campbell-Shiller findings for equity markets but is especially striking for housing, where naive narratives often attribute booms to expected rent appreciation.
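
A variance decomposition of this kind can be sketched on synthetic data. The example assumes the simple Gordon-growth relation yield = required return − expected rent growth, under which Var(y) = Cov(y, r) − Cov(y, g) holds as an identity. The series and volatilities below are made up to illustrate the arithmetic; they are not the paper's data or its estimation procedure.

```python
import random

def covariance(x, y):
    """Sample covariance of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

random.seed(0)
n = 5000
# Synthetic series (illustration only): volatile required returns, stable rent growth.
r = [0.05 + random.gauss(0, 0.010) for _ in range(n)]   # required return on housing
g = [0.02 + random.gauss(0, 0.003) for _ in range(n)]   # expected rent growth
y = [ri - gi for ri, gi in zip(r, g)]                   # yield under the assumed relation

var_y = covariance(y, y)
discount_rate_share = covariance(y, r) / var_y    # share attributed to required returns
rent_growth_share = -covariance(y, g) / var_y     # share attributed to rent growth
```

The two shares sum to one by construction. With the made-up volatilities above, most of the yield variance loads on the discount-rate term, which is the qualitative pattern the paper reports.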

Q3. What does the convenience yield interpretation imply?

Housing yields respond less than one-for-one to real interest rate movements — a slope well below one in the yield-rate regression — implying that housing carries a non-pecuniary convenience yield (liquidity, collateral value, direct utility of ownership) that buffers the required return on housing against interest rate changes. When real rates rise, housing yields rise by less, so price-to-rent ratios decline by less than a frictionless model would predict.

Key concepts

housing yield : the annual rent-to-price ratio on residential property; the paper’s central object, decomposed into discount-rate and rent-growth components.

discount-rate channel : the dominant source of housing yield variation in this paper; movements in investors’ required return on housing, not expected rent growth, drive the observed yield dynamics.

convenience yield : the non-pecuniary value of housing ownership (liquidity, collateral, direct utility) that drives a wedge between the housing yield and the risk-free real interest rate; explains the less-than-one slope in the yield-rate relationship.

Efficiency Criteria, Income Taxation, and Heterogeneous Elasticities


Overview

Research Question. Can income tax schedules be justified as utilitarian-optimal without adopting extreme normative assumptions about how household welfare should be measured? The paper proposes a welfare criterion strictly stronger than Pareto efficiency—called rationalizability with bounded curvature—and asks whether observed US income taxes satisfy it.

Starting Point. Any Pareto-efficient nonlinear income tax schedule can, in principle, be rationalized as utilitarian-optimal under some cardinalization of household utilities (i.e., some choice of how to measure the cardinal scale of each household’s well-being). However, the paper shows that rationalizing Pareto-efficient taxes in this way often requires cardinalizations under which there is no population upper bound on the curvature of utility with respect to consumption. Equivalently, a utilitarian planner’s marginal willingness to transfer resources to households must fall arbitrarily quickly with the size of those transfers—an extreme form of status quo bias violated by virtually all quantitative optimal-tax exercises.

The Proposed Criterion. The authors restrict attention to cardinalizations with locally bounded curvature: there exists a finite (though potentially arbitrarily large) upper bound on the coefficient of relative risk aversion across the population. This admits two interpretations: (i) ex post, it requires that the social value of transfers not change arbitrarily quickly with transfer size; (ii) ex ante, it corresponds to a decision-maker behind a veil of ignorance with bounded risk aversion.

Main Theoretical Result. Within a standard Mirrlees model of nonlinear income taxation with arbitrary preference heterogeneity and intensive-margin labor supply, the paper proves that a tax schedule can be rationalized with bounded curvature if and only if government revenues are both decreasing and concave (not merely decreasing) with respect to a class of narrowly targeted “two-bracket” reforms: reforms that raise retention by one dollar at incomes local to some level $z$ and by zero elsewhere. This contrasts with Pareto efficiency, which requires only that revenues be decreasing in these reforms (Bierbrauer, Boyer, and Hansen 2023). The additional requirement of revenue concavity is what distinguishes the bounded-curvature criterion from pure Pareto efficiency.

Sufficient Statistics. The paper derives explicit sufficient-statistics expressions for the first- and second-order derivatives of tax revenue with respect to these targeted reforms. The second derivative depends on higher moments of the elasticity distribution, specifically the income-conditional variance of compensated elasticities of taxable income (ETIs). Revenue convexity—which causes the second-order condition to fail—arises when income-conditional ETI variance is sufficiently high, even holding the mean ETI fixed. The economic mechanism is a “sort-and-extort” dynamic: a small tax reform sorts higher-elasticity households into income brackets where marginal taxes fall and lower-elasticity households into brackets where marginal taxes rise; repeating the reform then exploits this sorting by differentially taxing households by elasticity, as if applying group-specific tax schedules within a uniform income tax.

Empirical Findings. Using the NBER panel of US tax returns from 1979 to 1990, the paper estimates income-conditional mean ETIs of approximately 0.2–0.3 at most income levels. Crucially, it estimates a lower bound on income-conditional ETI variance by comparing elasticities of light versus heavy itemizers (defined by whether a household claims above or below the mean value of deductions in its income bracket). The low-elasticity group has an ETI of approximately zero and the high-elasticity group has an ETI of approximately one, implying a lower bound on ETI variance of roughly 0.2 at most incomes and approximately 0.25 at the top of the distribution. This lower bound is close to—and under plausible assumptions above—the threshold required for the second-order condition to fail. The authors conclude that the US income tax schedule in 1990 was likely Pareto efficient but likely not rationalizable with bounded curvature.

Quantitative Welfare Gains. In a calibrated model with a 50% top marginal tax rate, Pareto-tail shape of 2.5, mean ETI of 0.3, and ETI standard deviation of 0.75 (50% above the estimated lower bound), the planner gains significant welfare from either raising or lowering top marginal taxes. The welfare-maximizing top rate below the baseline is 13.3%, generating social value equivalent to a transfer of $1,966 per top earner. The welfare-maximizing top rate above the baseline is 71.2%, generating social value equivalent to a transfer of $972 per top earner. The revenue-maximizing rate is 80.9% under the baseline calibration, ranging from 74.6% to 86.8% as ETI standard deviation varies by ±25% of the lower bound.

Scope Conditions. The theoretical analysis is restricted to intensive-margin labor supply (abstracting from extensive-margin decisions); the empirical application focuses on top incomes where extensive-margin effects are likely small. The empirical period is 1979–1990, covering major federal and state tax reforms. Results concern local efficiency of the tax schedule, not global optimization.

In depth

Q1. What exactly is “rationalizability with bounded curvature” and how does it differ from Pareto efficiency?

A: Pareto efficiency requires that no small reform makes someone better off without making anyone worse off. Rationalizability (with any cardinalization) is equivalent to Pareto efficiency in this setting. Rationalizability with bounded curvature additionally restricts the cardinalization: there must exist a finite upper bound on the coefficient of relative risk aversion (or equivalently, on the curvature of utility with respect to consumption) across the population. This is a strictly stronger criterion than Pareto efficiency. A schedule can be Pareto efficient but not rationalizable with bounded curvature if the only cardinalizations that rationalize it require unbounded consumption utility curvature.

Q2. Why do “extreme” cardinalizations with unbounded curvature arise when rationalizing Pareto-efficient taxes?

A: When a Pareto-efficient schedule is rationalized as utilitarian, the cardinalization must make the set of feasible, recardinalized utilities convex so it can be separated from the set of Pareto-improving allocations. The paper constructs such a cardinalization explicitly: it takes the form of a function whose second derivative approaches negative infinity as utility approaches its baseline value. This implies the planner’s marginal value of transfers to a household falls precipitously as the household is made even slightly better off—an extreme status quo bias. Theorem 2.b establishes that all cardinalizations rationalizing a schedule with convex revenues must share this pathology.

Q3. What is the “sort-and-extort” mechanism and how does it generate revenue convexity?

A: When elasticities of taxable income (ETIs) are heterogeneous within an income level and the income density is declining steeply, a reform that lowers marginal taxes around income $z$ brings more households into the local bracket (because there are more households just below $z$ than above). Crucially, it disproportionately attracts households with higher ETIs, since they respond more strongly to the marginal tax cut and relocate from further away, where the density differs more. Repeating the reform therefore faces a higher-elasticity composition at $z$, generating larger positive behavioral effects—making revenues convex in the size of the reform. The second step (“extort”) involves raising taxes on the now-concentrated low-elasticity households at adjacent brackets, achieving as-if group-specific taxation within a single income tax schedule.

Q4. What is the precise relationship between revenue convexity and ETI variance?

A: The paper shows (Theorem 4) that the second-order revenue derivative with respect to a narrow two-bracket reform around income $z$ equals a positive function of the income density times the expression $-[1-R'_0(z)]\varepsilon(z) + [1-R'_0(z)]\alpha(z)[\varepsilon^2(z) + \text{var}_h[\varepsilon^h \mid z^h_0=z]]$. The first term is always negative (pushing toward revenue concavity). The second term, which includes the income-conditional variance of ETIs, can dominate and create revenue convexity when ETI variance is sufficiently large. In the benchmark case with a single household type at each income (no within-income heterogeneity), the variance term vanishes and revenues are always concave whenever decreasing.

Q5. What is the sufficient statistics test for rationalizability at the top of the income distribution?

A: At top incomes (assuming no income effects, no super-elasticities, and CES preferences), taxes are Pareto efficient if and only if $\tau_\text{top} < \frac{1}{1+\alpha_\text{top}\varepsilon_\text{top}}$, and they are rationalizable with bounded curvature if and only if additionally $\tau_\text{top} < \frac{2}{1+\alpha_\text{top}(\varepsilon_\text{top} + \sigma^2_\text{top}/\varepsilon_\text{top})}$, where $\tau_\text{top}$ is the top marginal tax rate, $\alpha_\text{top}$ is the Pareto tail shape, $\varepsilon_\text{top}$ is the mean ETI at the top, and $\sigma^2_\text{top}$ is the income-conditional ETI variance at the top.

Q6. How does the paper estimate a lower bound on income-conditional ETI variance?

A: The authors divide households at each income level into “heavy” and “light” itemizers based on whether their total deductions exceed the local income-bracket mean. They then estimate group-specific ETIs using local polynomial regressions of log income changes on log marginal retention changes, interacting tax changes with heavy-itemizer indicators. The within-year difference in elasticities between groups provides a lower bound on within-income ETI variance, since the two-group decomposition captures only a fraction of true variance. The interaction coefficient is allowed to vary by year to isolate within-year, within-income variation in elasticities rather than between-year compositional changes.

Q7. What are the estimated magnitudes of mean and variance of ETIs?

A: Income-conditional average ETIs are estimated at between 0.2 and 0.3 at most income levels, consistent with but somewhat below prior literature estimates. The low-elasticity group (light itemizers) has an ETI of approximately zero, while the high-elasticity group (heavy itemizers) has an ETI of approximately one. Given roughly equal group sizes, this implies a lower bound on ETI variance of approximately 0.2 at most incomes and approximately 0.25 at the ninety-fifth percentile. Subdividing the high-elasticity group into two, three, and four subgroups yields a lower bound of approximately 0.25 for variance at the top.
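
The two-group bound follows from the law of total variance: the variance of group means can never exceed the full within-income variance. A minimal sketch of the arithmetic, using the rounded group ETIs quoted above and assuming exactly equal group sizes; the three-group split at the end is a hypothetical refinement added only to show that finer groupings cannot lower the bound.

```python
def between_group_variance(shares, etis):
    """Variance of group-mean ETIs, a lower bound on the total ETI variance."""
    mean = sum(s * e for s, e in zip(shares, etis))
    return sum(s * (e - mean) ** 2 for s, e in zip(shares, etis))

# Light itemizers (ETI about 0) and heavy itemizers (ETI about 1), equal shares assumed.
two_group_bound = between_group_variance([0.5, 0.5], [0.0, 1.0])

# Hypothetical refinement: split the heavy group into two subgroups with ETIs 0.8 and 1.2.
three_group_bound = between_group_variance([0.5, 0.25, 0.25], [0.0, 0.8, 1.2])
```

Here the two-group bound is 0.25 (a standard deviation of 0.5), and the hypothetical refinement raises it to 0.27.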

Q8. How does the back-of-the-envelope calculation work to assess whether the second-order test fails?

A: With $\tau_\text{top} \approx 0.5$, $\alpha_\text{top} \approx 2.5$, and $\varepsilon_\text{top} \approx 0.3$ (from prior literature), the second-order condition fails if and only if ETI variance exceeds approximately 0.27. The authors’ lower bound estimate of ETI variance is already approximately 0.25 (standard deviation approximately 0.5), just below this threshold. The authors note that if the true standard deviation exceeds the lower bound by more than 4%, the second-order condition fails, making it empirically likely that the 1990 US tax schedule was not rationalizable with bounded curvature.
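
The back-of-the-envelope numbers can be reproduced directly from the two top-income tests stated in Q5. The sketch below rearranges the second-order inequality for the variance threshold; the function names are hypothetical, and the parameter values are the ones quoted in the text.

```python
import math

def pareto_efficient_top(tau, alpha, eps):
    """First-order test: tau < 1 / (1 + alpha * eps)."""
    return tau < 1.0 / (1.0 + alpha * eps)

def bounded_curvature_top(tau, alpha, eps, var):
    """Second-order test: tau < 2 / (1 + alpha * (eps + var / eps))."""
    return tau < 2.0 / (1.0 + alpha * (eps + var / eps))

def variance_threshold(tau, alpha, eps):
    """ETI variance at which the second-order test holds with equality."""
    return eps * ((2.0 / tau - 1.0) / alpha - eps)

tau, alpha, eps = 0.5, 2.5, 0.3
var_star = variance_threshold(tau, alpha, eps)   # about 0.27
sd_star = math.sqrt(var_star)                    # about 0.52
sd_lower_bound = 0.5                             # estimated lower bound on the ETI s.d.
headroom = sd_star / sd_lower_bound - 1.0        # about 0.04

passes_first_order = pareto_efficient_top(tau, alpha, eps)
passes_at_lower_bound = bounded_curvature_top(tau, alpha, eps, sd_lower_bound ** 2)
passes_at_calibration = bounded_curvature_top(tau, alpha, eps, 0.75 ** 2)
```

At the lower-bound variance of 0.25 the second-order test still passes, but only narrowly. At the calibrated standard deviation of 0.75 used in the quantitative exercise it fails, while the first-order (Pareto efficiency) test passes throughout.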

Q9. Why does the paper focus on the top of the income distribution for the empirical test?

A: The second-order condition is most likely to fail at high incomes for three reasons simultaneously: (i) the marginal tax rate is highest, (ii) ETI means are somewhat higher there, and (iii) the Pareto parameter $\alpha(z)$ is largest (income density falls steeply), which amplifies the sort-and-extort mechanism. The authors also note that extensive-margin labor supply responses—which are abstracted away in the theory—are likely small at high incomes.

Q10. What does the calibrated quantitative application reveal about optimal top tax policy?

A: Calibrated with a 50% initial top marginal tax rate, Pareto tail shape of 2.5, mean ETI of 0.3, and ETI standard deviation of 0.75 (50% above the estimated lower bound), the model finds welfare gains in both directions of reform. The welfare-maximizing rate below the baseline is 13.3%, yielding equivalent welfare gains of $1,966 per top earner. The welfare-maximizing rate above the baseline is 71.2%, yielding equivalent gains of $972 per top earner. The revenue-maximizing rate is 80.9%, ranging from 74.6% to 86.8% when ETI standard deviation varies by ±25% of the lower bound. This sensitivity highlights that the optimal direction and magnitude of reform depend substantially on the uncertain degree of ETI heterogeneity.

Q11. How does the paper relate to the “inverse optimum” literature?

A: The inverse optimum approach (Bourguignon and Spadaro 2012; Hendren 2020) infers the first-order welfare trade-offs implicit in an observed tax schedule. This paper goes further by inferring from second-order empirical moments—specifically the income-conditional ETI variance—whether taxes are consistent with minimal requirements on how sensitive the planner’s trade-offs are to household welfare levels. Rather than assuming a welfare function, it tests whether any welfare function with bounded curvature can rationalize the observed schedule.

Q12. Is revenue convexity possible without within-income heterogeneity in preferences?

A: Yes, but only under more specific conditions. The paper provides two supplemental examples. In the first, all households have constant-elasticity labor disutility but differ in both productivity and elasticity across income levels; when lower-income households have higher elasticities, a reform reducing marginal taxes at $z$ attracts higher-elasticity households and raises the average elasticity, leading to convex revenues. In the second, all households have the same initial elasticity but individual elasticities change in response to reforms. However, with the standard additively separable CES preferences and no within-income heterogeneity, revenues are always concave when decreasing—consistent with Werning’s (2007) observation that the Pareto planner’s problem is convex in this case.

Q13. What is the role of random tax reforms in the paper’s logic?

A: Random tax reforms serve as an expository bridge. The paper shows that if the second-order revenue effect of a two-bracket reform is positive at some income $z$, then a “randomized” reform that applies the reform with equal probability in positive and negative directions generates an expected Pareto improvement—because the convexity of revenues implies expected revenues rise, while for any household with bounded risk aversion the reform’s second-order utility effect is also positive when the reform is sufficiently narrow. This establishes that revenue convexity implies random Pareto inefficiency under bounded risk aversion, and then the paper shows the analogous deterministic result for rationalizability.

Q14. What scope conditions attach to the sufficient conditions for rationalizability (Theorem 3)?

A: Theorem 3 requires Assumptions 1 and 3 plus two boundary conditions: the ratio $\delta\text{Rev}(z)/(zg(z))$ must remain bounded away from zero as income approaches 0 or infinity, and at all incomes there must exist households with low enough compensated elasticities. Assumption 1 requires that average and marginal taxes have upper bounds below one, that marginal taxes have a lower bound, and that $zg(z)$ converges to zero at the boundaries. Assumption 3 is a regularity condition on how conditional moments of the elasticity distribution vary with income. These conditions ensure that the narrow, self-financing reforms considered in the necessity proof cannot generate welfare improvements once revenues are both decreasing and concave.

Key Concepts

Rationalizability with Bounded Curvature. The property that a tax schedule is utilitarian-optimal under some cardinalization of household utilities in which there exists a finite (though potentially arbitrarily large) upper bound on the curvature of utility with respect to consumption across the population. Formally, there exists a continuous function $\bar{\rho}$ such that, for all households, the absolute value of $[w_h \circ u_h]_{cc} / [w_h \circ u_h]_c$ is bounded by $\bar{\rho}$ evaluated at the household’s income. This criterion is strictly stronger than Pareto efficiency and strictly weaker than utilitarian optimality under a fixed cardinalization.

Two-Bracket Reform. A targeted tax reform that increases retention (post-tax income) by one dollar at incomes local to some level $z$ over a small bracket of width $\ell$, and by zero elsewhere (smoothed at the edges). As $\ell \to 0$, this becomes an infinitesimally narrow reform. The first- and second-order revenue effects of these reforms, denoted $\delta\text{Rev}(z)$ and $\delta^2\text{Rev}(z)$, are the paper’s key objects: Pareto efficiency requires $\delta\text{Rev}(z) < 0$ for all $z$, and rationalizability with bounded curvature additionally requires $\delta^2\text{Rev}(z) \leq 0$ for all $z$.

Income-Conditional ETI Variance. The variance of compensated elasticities of taxable income (ETIs) among households with the same income level, $\text{var}_h[\varepsilon^h | z^h_0 = z]$. This is the paper’s primary empirical object of interest and the key determinant of whether revenues are convex or concave in the size of targeted reforms. Unlike the literature’s focus on mean ETIs by income bracket, this within-income variance captures heterogeneity among households sharing the same pre-reform income.

Sort-and-Extort Mechanism. The two-step economic mechanism underlying revenue convexity from ETI heterogeneity. In the first step (“sort”), a marginal tax cut around income $z$ disproportionately attracts higher-ETI households from lower incomes (because they respond more strongly and relocate from further away), shifting the elasticity composition at $z$ upward. In the second step (“extort”), repeating the reform finds higher-elasticity households concentrated where marginal taxes fall and lower-elasticity households where taxes rise, effectively applying differential tax treatment by elasticity within a single income tax schedule.

Local Pareto Parameter $\alpha(z)$. Defined as $-d\log(zg(z))/d\log z$, where $g(z)$ is the income density. This captures the rate at which the income density is falling in income locally at $z$, and governs the strength of the sort-and-extort mechanism. High $\alpha(z)$ at top incomes (reflecting a steeply declining Pareto-type density) amplifies revenue convexity from ETI heterogeneity.
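
For an exact Pareto density $g(z) \propto z^{-(a+1)}$, the definition gives $\alpha(z) = a$ at every income, which is why $\alpha$ is read as a local tail shape. A minimal numerical check, using a central difference in log income; the helper names and the evaluation point are illustrative choices.

```python
import math

def local_pareto_parameter(g, z, h=1e-4):
    """Central-difference estimate of -d log(z * g(z)) / d log z."""
    up, down = z * math.exp(h), z * math.exp(-h)
    return -(math.log(up * g(up)) - math.log(down * g(down))) / (2.0 * h)

def pareto_density(a, z_min=1.0):
    """Density of a Pareto distribution with shape a and scale z_min."""
    return lambda z: a * z_min ** a / z ** (a + 1)

# Shape 2.5 matches the tail parameter used in the paper's calibration.
alpha_hat = local_pareto_parameter(pareto_density(2.5), z=10.0)
```

The estimate recovers the shape parameter 2.5 up to floating-point error, independent of the income level at which it is evaluated.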

Super-Elasticity. A concept that captures how a household’s compensated ETI would change if its income were different, holding preferences fixed. Formally, it is the derivative of the household’s elasticity with respect to its log income, decomposing into effects from changes in preference curvature and changes in the local curvature of the tax schedule. Super-elasticities are zero in the benchmark case of additively separable CES preferences and locally CES retention schedules but contribute additional terms to the second-order revenue expression in the general case.

Cardinalizing Function. A strictly increasing function $w_h$ that maps household $h$’s indirect utility $V_h$ to a cardinalized utility level $w_h(V_h)$. The social planner maximizes the expectation of cardinalized utilities. Different choices of $\{w_h\}_h$ correspond to different stances on interpersonal comparisons, including unbounded curvature (rationalizing any Pareto-efficient schedule) or bounded curvature (the paper’s proposed restriction). Rawlsian social welfare is a limit of utilitarian welfare with increasingly concave cardinalizing functions.

Endogenous Production Networks Under Supply Chain Uncertainty


This paper studies how firms’ optimal technique choices under productivity uncertainty endogenously shape the structure of production networks and aggregate macroeconomic outcomes. Each sector chooses input shares (production techniques) before observing sector-specific TFP realizations. Techniques are selected to maximize a risk-adjusted expected log GDP measure — expected log GDP minus a risk-aversion-scaled variance term — with endogenous productivity shifters that favor balanced use of inputs. When uncertainty about sector TFP rises, firms shift toward suppliers with lower expected productivity but lower variance — a “flight to safety” in input sourcing. The key aggregation result is that the contribution of each sector to aggregate welfare depends on its endogenous Domar weight (expenditure share times adjustment factor), which itself responds to changes in beliefs. The paper establishes propositions characterizing how Domar weights respond to changes in mean (μ) and variance (Σ) of TFP beliefs: higher mean raises a sector’s Domar weight; higher variance lowers it when inputs are gross substitutes, but can lower it even with complementary inputs through belief adjustment. A basic calibration to 37 US BEA sectors (1948–2020) finds that the flexible-network economy has expected log GDP 2.1% higher than a fixed-network alternative. During the Great Recession, elevated uncertainty caused firms to shift toward safer, lower-productivity suppliers, reducing expected log GDP by 0.25% but reducing GDP variance by 2.4% and improving actual realized GDP outcomes by 2.7% relative to a “no uncertainty” benchmark.

In depth

Q1. What is the model’s core setup and how does technique choice generate an endogenous network?

Each of n sectors chooses input shares αi = (αi0, αi1, …, αin), where αi0 is the labor share, before observing sectoral TFP realizations εt, subject to a convex cost function Ai(αi) that penalizes deviation from fixed-proportion baseline techniques. The equilibrium network α is then the solution to a social planner’s problem that maximizes expected welfare W = E[y] − (ρ/2)V[y], where y is log GDP and ρ is the coefficient of relative risk aversion. Because cost functions Ai are jointly determined by the input shares chosen and by Hessian matrices Hi that govern substitutability/complementarity of inputs, sectors can substitute or complement in the production of any given good, and the equilibrium network balances expected log GDP gains from choosing more productive suppliers against the variance reduction from choosing safer suppliers.

Q2. What role do Domar weights play, and how do they generalize to the endogenous-network case?

In the standard fixed-network case (Hulten’s theorem), each sector i’s contribution to aggregate log GDP equals its Domar weight ωi = (expenditure on sector i’s output)/(total GDP), a sufficient statistic for first-order productivity effects. The paper extends this: with an endogenous network, the social planner’s optimality conditions imply that equilibrium Domar weights equal the shadow value of relaxing each sector’s resource constraint, and these shadow values respond to changes in beliefs (μ, Σ) through the induced changes in α. Lemmas 3–5 characterize these responses: a sector’s Domar weight increases in its expected log TFP (μi), and changes in variance Σij propagate through the network via the adjustment terms in the first-order conditions, so that a single sector’s volatility change affects the Domar weights of all connected sectors.
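
The fixed-network benchmark can be illustrated with a two-sector example. The sketch assumes a Cobb-Douglas economy with constant returns, in which the Domar weights are ω' = β'(I − A)⁻¹ for consumption shares β and input-share matrix A; this is the textbook fixed-network case, not the paper's endogenous-network model. All numbers are made up.

```python
def domar_weights(beta, A, n_iter=500):
    """omega' = beta' (I - A)^{-1}, computed as the series beta' (I + A + A^2 + ...)."""
    n = len(beta)
    term = list(beta)
    omega = list(beta)
    for _ in range(n_iter):
        # Row vector times matrix: term_j <- sum_i term_i * A[i][j]
        term = [sum(term[i] * A[i][j] for i in range(n)) for j in range(n)]
        omega = [o + t for o, t in zip(omega, term)]
    return omega

# Illustration values only: A[i][j] is the share of input j in sector i's costs,
# so each sector's labor share is 1 minus its row sum (0.5 in both rows here).
A = [[0.2, 0.3],
     [0.1, 0.4]]
beta = [0.5, 0.5]            # household consumption shares

omega = domar_weights(beta, A)

# Hulten's theorem: a 1% TFP gain in sector 0 raises log GDP by omega[0] * 0.01.
d_eps = [0.01, 0.0]
d_log_gdp = sum(w * e for w, e in zip(omega, d_eps))
```

The weights come out as 7/9 and 11/9 and sum to 2, the inverse of the common labor share, which shows how intermediate-input use pushes total sales above GDP.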

Q3. What are the key propositions about how beliefs affect aggregate welfare and GDP?

Proposition 6 (monotone welfare response to mean beliefs): welfare W is increasing in each sector’s mean log TFP μi, and the marginal effect equals the sector’s Domar weight; this holds even though expected log GDP E[y] may non-monotonically respond to μi when inputs are gross substitutes, because the variance-reduction benefit of adjusting away from the now-more-productive but higher-variance sector can temporarily dominate. Proposition 7 (variance increases hurt expected log GDP): for substitutable inputs, a rise in Σii decreases E[y] because firms shift away from the more volatile sector toward less productive alternatives; for complementary inputs, the same shift also reduces E[y] because complementary inputs move together. Corollary 4 shows that welfare W always falls when uncertainty rises, combining these effects.

Q4. How does the flight-to-safety mechanism work in a multi-sector economy?

When uncertainty about sector i’s productivity rises, the optimal technique response is to reduce αji for every downstream sector j that can replace sector i’s output with other inputs, and to increase labor shares or the shares of less volatile inputs; since sectors with lower μ but lower Σ become relatively more attractive on a risk-adjusted basis, the network reconfigures toward “safer” but typically less productive suppliers. The cascading link-destruction example (Section 7) illustrates this: when an industry’s production becomes uncertain, the endogenous deletion of risky links propagates across the network as complementary and substitute linkages amplify or dampen the flight to safety, with the direction depending on whether inputs are gross complements or substitutes in the Hessian Hi.
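
The flight-to-safety logic can be seen in a deliberately stripped-down example with one buyer and two independent suppliers. The sketch assumes log GDP is a share-weighted sum of the two suppliers' log TFP and maximizes W = E[y] − (ρ/2)V[y]; it drops the cost functions Ai and the network structure entirely, so it is a toy illustration rather than the paper's model. Means and variances are made up; ρ = 4.3 is the calibrated value reported in the paper.

```python
def optimal_risky_share(mu_a, mu_b, var_a, var_b, rho):
    """Share on supplier A that maximizes s*mu_a + (1-s)*mu_b - (rho/2)*Var, clipped to [0, 1]."""
    s = ((mu_a - mu_b) + rho * var_b) / (rho * (var_a + var_b))
    return min(1.0, max(0.0, s))

def outcomes(s, mu_a, mu_b, var_a, var_b, rho):
    """Return (expected log GDP, variance of log GDP, welfare) for share s on supplier A."""
    mean = s * mu_a + (1.0 - s) * mu_b
    var = s ** 2 * var_a + (1.0 - s) ** 2 * var_b
    return mean, var, mean - 0.5 * rho * var

rho = 4.3                                # calibrated risk aversion from the paper
mu_a, mu_b, var_b = 0.03, 0.01, 0.01     # A is more productive, B is the safe option (made up)

s_calm = optimal_risky_share(mu_a, mu_b, 0.02, var_b, rho)
s_turbulent = optimal_risky_share(mu_a, mu_b, 0.08, var_b, rho)

mean_calm, _, w_calm = outcomes(s_calm, mu_a, mu_b, 0.02, var_b, rho)
mean_turb, _, w_turb = outcomes(s_turbulent, mu_a, mu_b, 0.08, var_b, rho)

# Fixed-network counterfactual: keep the calm-period share after uncertainty rises.
_, _, w_fixed = outcomes(s_calm, mu_a, mu_b, 0.08, var_b, rho)
```

When the variance of the productive supplier rises, the optimal share on it falls, expected log GDP falls, and welfare falls, but welfare falls by less than it would if the input mix were held fixed.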

Q5. What does the calibration to US data find about the quantitative importance of the endogenous network?

The calibrated model with 37 BEA sectors (1948–2020) achieves a cross-sectional correlation between model and data Domar weights of 0.96 (though the model average Domar weight of 0.03 is below the data’s 0.05) and matches the data correlations Corr(ωjt, μjt) = 0.1 and Corr(ωjt, Σjjt) = −0.4 closely (model delivers 0.1 and −0.3 respectively). Comparing the flexible-network baseline to a fixed-network alternative, expected log GDP is 2.1% lower in the fixed-network economy, and welfare differs by a similar 2.1%. This suggests the endogenous reallocation of input shares over the sample period — as some sectors became persistently more productive — delivered substantial gains relative to a static network.

Q6. What happens during high-uncertainty episodes such as the Great Recession?

During the Great Recession (2007–2009), the estimated uncertainty measure Σt spiked sharply; firms responded by shifting techniques toward safer suppliers, resulting in expected log GDP that is about 0.25% lower in the baseline than in a “no uncertainty” (Σ = 0) economy and GDP variance that is about 2.4% lower. The insurance paid off in terms of realized outcomes: realized log GDP in the baseline economy was approximately 2.7% higher than in the “as-if Σ = 0” economy in 2009, because firms had taken out insurance against exactly the kind of bad TFP draws that materialized during the crisis. The perfect-foresight economy (where εt is known before technique choice) outperforms the baseline by up to 3% in realized GDP during the Great Recession — the maximum value of uncertainty resolution.

Q7. What is the key distinction between the effects of mean and variance changes for welfare vs. expected GDP?

Changes in mean beliefs μi and welfare W are co-monotone (Proposition 6), but changes in μi and expected log GDP E[y] can be non-monotone when inputs are substitutes: a small increase in μi for a less productive sector can actually lower E[y] in the short run because firms shift toward that sector at the expense of more productive alternatives, even though this shift reduces variance and raises welfare. The divergence between E[y] and W is the key mechanism: when ρ > 0 (risk-averse households), reducing variance has positive welfare value even when it lowers the level of expected GDP, so the production network adjusts in directions that appear sub-optimal for average productivity but are optimal for welfare. The calibrated relative risk aversion parameter ρ̂ = 4.3 indicates meaningful risk aversion that makes these variance-mean trade-offs quantitatively relevant.

Q8. What is the role of input complementarity versus substitutability in determining network responses?

When inputs i and j are gross substitutes (Hessian element [Hi]ij < 0), an increase in sector j’s uncertainty Σjj induces sectors that use both i and j to shift away from j and toward i, reducing j’s Domar weight and increasing i’s, so the network becomes more concentrated in safer inputs. When inputs are gross complements ([Hi]ij > 0), an increase in Σjj also reduces the demand for the complementary input i, because both inputs must be used together and the safe input i becomes less attractive when paired with volatile j. This can cause both E[y] and V[y] to fall simultaneously. Corollary 4 ensures that welfare falls on net, but how the change is split between E[y] and V[y] depends on the complementarity structure, and how the two components are weighted depends on ρ.

Key concepts

technique choice : a sector’s endogenous selection of input shares αij prior to observing TFP realizations; the key margin of adjustment in the model through which uncertainty shapes the production network; characterized by convex cost functions Ai that favor balanced input use around baseline shares α°.

endogenous Domar weight : the share of total expenditure on a sector’s output in aggregate nominal GDP, computed in the model’s equilibrium; equals the shadow value of the sector’s resource constraint and responds to changes in beliefs (μ, Σ); in the fixed-network case reduces to the standard Hulten-theorem Domar weight.

flight to safety : the equilibrium response in which firms shift their input shares away from high-mean, high-variance suppliers toward lower-mean, lower-variance alternatives when aggregate uncertainty rises; generates the prediction that network restructuring during recessions reduces GDP volatility while raising expected production costs.

risk-adjusted expected welfare (W) : the social planner’s objective, defined as E[y] − (ρ/2)V[y] where y is log GDP and ρ is the coefficient of relative risk aversion; this non-separable objective function generates the trade-off between expected productivity and risk reduction that drives endogenous network formation.

cascading link destruction : the propagation of reduced sectoral linkages through the network when one sector’s uncertainty rises; in examples with complementary inputs, the reduced demand for a volatile sector also reduces demand for its complements, potentially amplifying the flight to safety beyond the directly affected sector.

Enlightenment Ideals and Belief in Progress in the Run-up to the Industrial Revolution

Mon, 01 Jan 0001 00:00:00 +0000

This paper tests Joel Mokyr’s claim that Britain’s industrialization was preceded and enabled by a cultural shift — specifically, that Enlightenment ideals produced a “progress-oriented” view of science that diffused to artisans and craftsmen. The central research question is whether and when the language of science became more progress-oriented in the build-up to the Industrial Revolution, and whether this shift was concentrated in volumes directly linked to industrial production.

The authors assemble 173,031 unique volumes printed in England and written in English between 1500 and 1900, drawn from the HathiTrust Digital Library (HDL). Because copyright law prohibits downloading full text, they use HDL’s Extracted-Features “bag of words” dataset. After removing duplicates and Latin-language volumes from an initial set of 420,081, they apply Latent Dirichlet Allocation (LDA) with cross-validated perplexity minimization to identify an optimal T=60 topics. Topic-pair co-occurrence analysis identifies three categories (science, religion, and political economy), each anchored by three defining topics. Volume-level category weights are derived by multiplying each topic’s weight by its category coefficient. The resulting classification yields 50,090 science volumes, 102,565 political economy volumes, and 14,124 religion volumes.

Progressive sentiment is measured using a seven-word dictionary (progress, improvement, stride, betterment, advance, rise, amelioration) assembled from thesaurus synonyms for “progress,” manually vetted by all four authors, and restricted to words attested in the Oxford English Dictionary before 1643 (Newton’s birth year). Sentiment for each volume equals the count of progress-dictionary words divided by total word count. An analogous optimism-sentiment placebo dictionary is constructed separately.
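The sentiment measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the toy volume, and the exact lowercase matching rule are assumptions, while the seven dictionary words are the ones listed in the text.

```python
from collections import Counter

# The seven dictionary words listed in the text.
PROGRESS_WORDS = {
    "progress", "improvement", "stride", "betterment",
    "advance", "rise", "amelioration",
}

def progress_sentiment(word_counts):
    """Share of a volume's words that belong to the progress dictionary.

    `word_counts` maps each word to its count in one volume (a bag of words).
    Exact lowercase matching is an assumption of this sketch.
    """
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for word, n in word_counts.items()
               if word.lower() in PROGRESS_WORDS)
    return hits / total

# Hypothetical toy volume, not taken from the corpus.
toy_volume = Counter("the progress of the arts and the improvement of trade".split())
score = progress_sentiment(toy_volume)  # 2 dictionary hits out of 10 words
```

Because the input is a word-count mapping, the same function works directly on bag-of-words data, where word order is unavailable.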

Industrial relevance is scored using the digitized indexes of all five volumes of Appleby’s Illustrated Handbook of Machinery (1877–1903); the top industrial root words are crane (weight 51), electr (42), weight (37), rope (27), and cost (27). Each volume receives an industry score equal to the weighted occurrence of industrial root words normalized by volume length.

Three main findings emerge. First, the language of science and religion showed little overlap beginning in the 17th century — that is, the secularization of science predates the onset of industrialization. Science volumes shifted from approximately 40 percent religious content around 1700 to only about 10 percent by 1850, with scientific content rising correspondingly from roughly 40 percent to over 60 percent. This trend was stable from 1650 through 1900.

Second, while scientific volumes became more progress-oriented during the Enlightenment, this progressive shift was concentrated in volumes at the nexus of science and political economy. Volumes of “pure” science were largely neutral with respect to progress sentiment, and those at the science-religion nexus had on average negative progress sentiment. The marginal effect of scientific content on progress sentiment was greatest for volumes mixing science and political economy, and most of the increase in predicted sentiment at that nexus occurred during the 18th century, remaining stable thereafter. A placebo test using optimism sentiment finds the opposite pattern: volumes at the science-political economy nexus were among the least optimistic, while the most optimistic language appeared at the religion-political economy nexus. This rules out the interpretation that the measured shift reflects a general increase in positive affect rather than specifically progress-oriented language.

Third, volumes employing industrial terminology that also sat at the science-political economy nexus were distinctively progressive beginning in the mid-18th century. At the 90th percentile of industry score, predicted progress sentiment at the science-political economy nexus was positive throughout the sample; at zero industry score, it was negative until the mid-18th century. Volumes at the religion-political economy nexus showed modestly positive and time-stable progress sentiment regardless of industry score.

The paper concludes that it was the pragmatic, applied volumes — those bridging science and political economy, written for artisans and a broader literate public rather than for the human-capital elite alone — that embodied the cultural values Mokyr identifies as central to Britain’s industrialization.

Q: What gap in the existing literature does this paper address?

A: Prior work on the cultural deep roots of economic growth rarely tracks how culture changes over time, relying instead on cross-sectional variation or qualitative case studies. Quantitative evidence that the language of science itself became more progress-oriented, and that this change reached beyond elite thinkers to artisans and craftsmen, had not been marshaled before. The paper provides the first such evidence by analyzing 173,031 volumes spanning four centuries.

Q: Why does the paper restrict the progress-sentiment dictionary to words attested before 1643?

A: Words that entered English only after 1643 (Newton’s birth year) could not have appeared in volumes from the early Enlightenment, so including them would bias sentiment scores toward the later part of the sample. The restriction ensures the dictionary is applicable and unbiased across the full 1500–1900 period. The final retained words are: progress, improvement, stride, betterment, advance, rise, amelioration.

Q: How does LDA classify volumes, and how is T=60 selected?

A: LDA treats each volume as a bag of words and models it as a mixture of latent topics: each word is generated by first drawing a topic from the volume’s topic mixture (which has a Dirichlet prior) and then drawing a word from that topic’s distribution over the vocabulary. The number of topics T is selected by minimizing perplexity on held-out data via 4-fold cross-validation, rotating training and test sets across folds; this procedure yields T=60 as optimal. Each volume is then represented as a mixture over those 60 topics.
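The selection loop can be sketched as follows. This shows only the fold rotation and the choice of T; it does not fit LDA. The scorer `heldout_perplexity(train, test, T)` is a hypothetical placeholder for "fit a T-topic model on the training folds and return its perplexity on the held-out fold", and `toy_perplexity` is a synthetic stand-in used purely to make the example runnable.

```python
def k_fold_splits(docs, k=4):
    """Yield (train, test) pairs, rotating which fold is held out."""
    folds = [docs[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [d for i, fold in enumerate(folds) if i != held_out for d in fold]
        yield train, test

def select_num_topics(docs, candidates, heldout_perplexity, k=4):
    """Return the candidate T with the lowest mean held-out perplexity."""
    mean_scores = {}
    for T in candidates:
        scores = [heldout_perplexity(train, test, T)
                  for train, test in k_fold_splits(docs, k)]
        mean_scores[T] = sum(scores) / len(scores)
    best_T = min(mean_scores, key=mean_scores.get)
    return best_T, mean_scores

def toy_perplexity(train, test, T):
    """Synthetic scorer (NOT a topic model): minimized at T = 3 by construction."""
    return 100.0 + (T - 3) ** 2

docs = [f"doc{i}" for i in range(12)]  # hypothetical document identifiers
best_T, mean_scores = select_num_topics(docs, [2, 3, 4, 5], toy_perplexity)
```

In practice the placeholder would be replaced by a real topic-model fit, and the candidate grid would span the range of T under consideration.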

Q: What are the three categories and their anchor topics?

A: Political Economy is anchored by topics on law/public opinion, governance/parliament, and trade/price/labour. Religion is anchored by topics on church/Christian doctrine, God/faith/sin, and virtue/fame/religion. Science is anchored by topics on engineering/steam/electricity, chemistry/acid/heat, and geometry/equations/trigonometry. These three sets of topics were selected for high corpus-wide importance and mutual independence.

Q: What does the finding on science-religion separation imply for timing?

A: The separation of scientific and religious language was already visible by 1600 and firmly established by the mid-17th century, well before the Industrial Revolution conventionally dated to the mid-18th century. This supports Mokyr’s argument that the secularization of science was an Enlightenment-era precursor to industrialization rather than a product of it. The trend remained stable from 1650 through 1900.

Q: How does the progressive sentiment differ between pure science and the science-political economy nexus?

A: Volumes of pure science were largely neutral with respect to progress-oriented language and in some periods showed slightly negative predicted progress sentiment. The science-religion nexus showed consistently negative progress sentiment. By contrast, volumes at the science-political economy nexus showed the highest level of progressive sentiment beginning in the mid-18th century; most of the growth in predicted sentiment at that nexus occurred during the 18th century, after which it remained stable.

Q: What does the placebo optimism test show?

A: The optimism sentiment scores are nearly the mirror opposite of the progress scores: the most optimistic language appears at the religion-political economy nexus, while volumes at the science-political economy nexus are among the least optimistic. This dissociation rules out the interpretation that the measured progress-sentiment rise reflects a general shift toward positive language rather than a specific cultural embrace of science as a tool for improving human welfare.

Q: How is the industrial score constructed and what are the most heavily weighted terms?

A: The authors digitized the detailed indexes of all five volumes of Appleby’s Illustrated Handbook of Machinery (1877–1903), restricted to words attested before 1643, and weighted each industrial root word by its index frequency. Each corpus volume’s industry score equals the sum of (word count × index weight) across all industrial words, normalized by volume length, yielding a score between 0 and 1. The top-weighted terms are crane (51), electr (42), weight (37), rope (27), and cost (27).
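A rough sketch of this scoring rule is given below, using only the five top-weighted roots quoted above (the full index-derived list is longer). Two choices are assumptions of the sketch rather than details from the paper: a word matches a root when it starts with that root, and weights are divided by the largest weight so that the score stays between 0 and 1.

```python
# Top five root words and index weights quoted in the text.
INDEX_WEIGHTS = {"crane": 51, "electr": 42, "weight": 37, "rope": 27, "cost": 27}

def industry_score(word_counts, weights=INDEX_WEIGHTS):
    """Weighted count of industrial root words, normalized by volume length.

    Prefix matching and rescaling by the maximum weight are assumptions
    made for illustration only.
    """
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    max_weight = max(weights.values())
    weighted = 0.0
    for word, n in word_counts.items():
        matched = [w for root, w in weights.items()
                   if word.lower().startswith(root)]
        if matched:
            weighted += n * max(matched) / max_weight
    return weighted / total

# Hypothetical toy volume: 10 words in total.
toy_counts = {"crane": 2, "electricity": 1, "the": 7}
score = industry_score(toy_counts)
```

Prefix matching is crude (for example, it would also count "costume" under "cost"), so a real implementation would need a more careful stemming step.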

Q: What is the key result linking industrial scores to progressive sentiment?

A: At the science-political economy nexus, volumes with industry scores at the 90th percentile had persistently positive predicted progress sentiment throughout the sample, while volumes at that nexus with zero industry score had negative predicted sentiment until the mid-18th century. The shift to positive sentiment for high-industry volumes at this nexus occurred in the mid-18th century — roughly coinciding with the onset of Britain’s industrialization — and those volumes remained the most progress-oriented in the corpus thereafter.

Q: What is the paper’s interpretation of the science-political economy nexus finding in relation to Mokyr?

A: The authors interpret volumes at the science-political economy nexus as pragmatic, applied works aimed at a broader literate audience including artisans and craftsmen, not exclusively the human-capital elite. These are precisely the volumes Mokyr’s “Industrial Enlightenment” thesis predicts would carry progress-oriented cultural values into the mechanical and artisanal pursuits that drove industrialization. The finding that pure-science volumes were not especially progressive, while applied volumes bridging science and political economy were, is consistent with Mokyr’s argument that it was the diffusion of Enlightenment ideals to skilled practitioners — not just to elite scientists — that mattered.

Q: What qualitative examples support the quantitative findings?

A: Martin Clare’s The Motion of Fluids (1735) explicitly addresses “the Unlearned” and states in its preface that the work is meant to be “of singular Use and Benefit to Mankind” — a direct expression of the progress-oriented language the algorithm detects. George Stephenson’s 1831 railway report argues that rail infrastructure would allow Ireland to “reciprocate with England and with other nations, the products of industry,” exemplifying how progress-oriented language pervaded industrial writing by the early 19th century. These examples confirm that the high progress-sentiment scores for industrial volumes at the science-political economy nexus reflect genuine rhetorical content, not measurement artifacts.

Q: What are the paper’s limitations regarding early sample periods?

A: The corpus is thin in earlier eras, particularly around 1550, so results from the earliest decades must be interpreted with caution. The HDL data derive from digitized scans with OCR output of very old books, introducing errors such as the “long-S” misread (e.g., “juftice” for “justice”) that require manual correction. Additionally, the bag-of-words model discards word order, which may obscure some semantic distinctions.

Q: What future research directions do the authors identify?

A: The authors propose applying the same textual analysis techniques to test whether English-language volumes began reflecting greater freedom of expression in the run-up to Britain’s economic takeoff, connecting to the literature on European political fragmentation and the marketplace of ideas. They also suggest applying the approach to corpora in other languages — Dutch (following McCloskey’s argument about bourgeois values) and Spanish (to examine whether the Counter-Reformation and Spain’s economic lag are reflected in cultural attitudes toward progress and science).

LDA (Latent Dirichlet Allocation): An unsupervised generative statistical model that treats each document as a bag of words and extracts latent topics as multinomial distributions over vocabulary; used here to reduce 173,031 volumes to mixtures of 60 topics without imposing prior scholarly interpretations.

Progressive Sentiment Score: The number of words in a volume belonging to a seven-word dictionary of progress synonyms (progress, improvement, stride, betterment, advance, rise, amelioration), divided by the volume’s total word count; measures the cultural orientation toward the betterment of humankind as embedded in text.

Industrial Score: A volume-level measure equal to the weighted count of industrial root words — derived from the indexes of Appleby’s Illustrated Handbook of Machinery (1877–1903) — normalized by volume length; captures the degree to which a volume’s vocabulary overlaps with industrial production terminology.

Science-Political Economy Nexus: The region of the topic simplex where volumes carry substantial weight in both the science and political economy categories but low weight in religion; the paper finds this is where progress-oriented language was most concentrated from the mid-18th century onward, interpreted as applied science aimed at artisans and a broader literate public.

Industrial Enlightenment: Joel Mokyr’s (2009) concept describing the diffusion of Enlightenment ideals about the practical utility of science into the mechanical and artisanal pursuits that drove Britain’s industrialization; the paper provides quantitative support for this thesis by showing that industrial volumes at the science-political economy nexus were distinctively progress-oriented.

Culture of Growth: Mokyr’s (2016) broader argument that a pan-European network of elite intellectuals fostered a progress-oriented view of science — the idea that scientific understanding could improve the human condition — and that this cultural norm, in combination with Britain’s stock of skilled craftsmen, made industrialization possible.

Bag of Words: A representation of text that records only word frequencies within a document, discarding word order; used here both because HDL copyright restrictions prevent full-text download and because it is the input format required by LDA.

Equal Pay for Similar Work

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

This paper studies the labor market effects of “Equal Pay for Similar Work” (EPSW) policies — laws that require firms to pay equal wages to workers of different protected-class identities (e.g., different genders) who perform “similar” work within a firm. EPSW has become increasingly prevalent: as of January 2023, more of the U.S. workforce falls under state EPSW laws than state “Equal Pay for Equal Work” (EPEW) laws. Despite this spread, the equilibrium consequences of EPSW were previously unknown.

Theoretical Framework

The authors develop two theoretical models. The first is a static cooperative game (whose outcomes coincide with the Nash equilibria of a non-cooperative simultaneous-wage-offer game). Homogeneous firms with constant-returns-to-scale production compete for a continuum of heterogeneous workers. Workers belong to one of two groups, A or B (e.g., men and women), with group A the weak majority: there are β ≥ 1 A-group workers per B-group worker. Each worker’s productivity v is drawn from a group-specific distribution (FA or FB); a firm’s willingness to pay for a worker equals that worker’s productivity, which can embed taste-based discrimination. The analysis is framed as applying “within job” in a local labor market, that is, only to workers performing “similar” work in the eyes of the law.

The second model is a dynamic search-and-bargaining framework with an arbitrary number of firms, search frictions, reallocation frictions, and Nash-in-Nash bargaining. EPSW is introduced as a surprise, and constrained firms choose whether to segregate for one group or remain desegregated (paying a common wage to all workers).

Main Theoretical Findings

Without EPSW, Bertrand competition among firms drives every worker’s wage to equal her productivity; any wage gap between groups A and B exactly reflects the difference in average productivities (EA(v) − EB(v)), whether or not those productivity differences stem from discrimination.

With EPSW, the equilibrium is qualitatively transformed. In the static model (Proposition 2), firms generically fully segregate their workforces: one firm hires all A-group workers and the other hires all B-group workers. EPSW functions as an enforcement mechanism for this segregation analogous to location choices in Hotelling’s model — poaching a worker from the competing firm is costly because EPSW then requires the poaching firm to pay equal wages to all workers it employs. In the core with EPSW (Proposition 3), the wage gap moves in favor of the majority group (A-group, β > 1) in the sense that all core outcomes except one strictly increase the A-group wage advantage. Moreover, firm profits and the magnitude of the wage gap co-move: firms benefit from selecting equilibria with larger wage gaps. The directional conclusion — EPSW benefits the majority group — holds regardless of the distributions of the two groups’ productivities, conditional only on β > 1 for the wage gap; for the log wage gap the additional regularity condition βEA[v] > EB[v] is required.

In the dynamic search model (Proposition 4), all firms eventually segregate under any equilibrium, with the long-run wage ratio moving in favor of the group toward which more firms segregate. Under equitable search and sufficiently low reallocation frictions (Proposition 5), more firms segregate toward the majority group when βEA[v] > EB[v]. Firms that are nearly segregated at the time of EPSW enactment segregate sooner than others (Proposition 6).

Empirical Setting and Design

The authors test these predictions using Chile’s 2009 EPSW (Law 20.348), the country’s first equal pay law, which prohibited paying women less than men (or vice versa) for similar work. Firms with 10 or more long-term workers at the time of announcement (June 2009) face formal grievance procedures and financial penalties (69–1,384 USD per worker-month of violation); firms below this threshold face no financial penalty, providing a clean threshold-based treatment assignment.

The data are matched employer-employee administrative records from the Chilean unemployment insurance system covering January 2005 – December 2013, for a random sample of approximately 4% of all firms, stratified by size. The main estimation sample is restricted to firms with 6–13 total workers at announcement (41% of active firms), and the design is a difference-in-differences (event study) comparing treated (≥ 10 long-term workers) to control (< 10 long-term workers) firms. The identifying assumption is parallel trends between similarly sized firms.
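The logic of the design can be illustrated with a bare two-period, two-group comparison. This is a simplified sketch, not the paper's event-study regression: it omits fixed effects, controls, and standard errors, and the outcome values are hypothetical. Only the 10-worker threshold comes from the text.

```python
from statistics import fmean

THRESHOLD = 10  # long-term workers at announcement (from the text)

def is_treated(long_term_workers):
    """Treatment assignment by firm size at announcement."""
    return long_term_workers >= THRESHOLD

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    return (fmean(treated_post) - fmean(treated_pre)) - (
        fmean(control_post) - fmean(control_pre)
    )

# Hypothetical outcome: 1 if a firm is fully segregated, 0 otherwise.
treated_pre, treated_post = [0, 0, 1, 0], [1, 0, 1, 0]
control_pre, control_post = [0, 1, 0, 0], [0, 1, 0, 0]
effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
```

Under parallel trends, the control group's change stands in for what would have happened to treated firms without the policy, so the difference of differences isolates the policy effect.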

Main Empirical Findings

First, EPSW increases full gender segregation across firms. The share of fully gender-segregated firms increases by 4.4 percentage points (baseline: 34.3% of firms were fully segregated at announcement). Simultaneously, the share of nearly-but-not-fully segregated firms (majority gender share ∈ [0.8, 1)) declines by 4.0 percentage points — a “missing mass” of near-segregated firms consistent with the search model’s prediction that firms on the margin of full segregation segregate most readily (e.g., by separating the sole worker of the “wrong” gender). Moreover, firms that are nearly segregated at announcement experience an 8.7 percentage point increase in full segregation post-EPSW, compared to 2.8 percentage points for firms not nearly segregated at announcement.
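The segregation categories used above can be made concrete with a small classifier. The thresholds (majority share equal to 1 for full segregation, and in [0.8, 1) for near segregation) come from the text; the function name and the label "mixed" for all remaining firms are assumptions of this sketch.

```python
def segregation_status(n_men, n_women):
    """Classify a firm by the employment share of its majority gender.

    Integer arithmetic is used so the 0.8 cutoff is not affected by
    floating-point rounding.
    """
    total = n_men + n_women
    if total <= 0:
        raise ValueError("firm must employ at least one worker")
    majority = max(n_men, n_women)
    if majority == total:
        return "fully segregated"
    if 5 * majority >= 4 * total:  # majority share >= 0.8
        return "nearly segregated"
    return "mixed"  # hypothetical label for all other firms

examples = {
    (8, 0): segregation_status(8, 0),
    (9, 1): segregation_status(9, 1),
    (2, 8): segregation_status(2, 8),
    (5, 5): segregation_status(5, 5),
}
```

A firm such as (9, 1) moves from "nearly segregated" to "fully segregated" by separating a single worker, which is the margin behind the "missing mass" pattern.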

Second, EPSW shifts the gender wage gap in favor of the local labor market majority group. In male-majority local labor markets (defined by industry × county), EPSW increases the gender wage gap in favor of men by 4.3 percentage points. In female-majority local labor markets, EPSW decreases the gender wage gap (i.e., in favor of women) by 6.2 percentage points. The wage gap change is primarily driven by reductions in minority-group wages: women’s average wages in male-majority markets fall by 3.3 percentage points, and men’s average wages in female-majority markets fall by 4.5 percentage points; there are no statistically significant changes in majority-group wages. Because men dominate Chile’s overall labor market (approximately 5/6 of all workers are employed in majority-male local labor markets), the overall effect of EPSW is to increase the gender wage gap (in favor of men) by 2.7 percentage points. Pre-treatment coefficients are statistically indistinguishable from zero across all specifications, supporting the parallel trends assumption. These findings are robust across six alternative specifications covering different samples, fixed-effect structures, and controls.

Scope Conditions

Theoretical results apply within a set of “similar” workers in a given local labor market — the paper does not predict differential effects across job types within a firm (e.g., custodians vs. lawyers) that do not perform similar work. Empirical results are identified for firms with 6–13 workers and pertain to Chile’s formal sector (informal labor share ~25% in 2009). Predictions on the wage ratio (log wage gap) require the additional regularity condition βEA[v] > EB[v], which is consistent with the Chilean data.

In depth

Q1. What is the core mechanism by which EPSW leads firms to fully segregate in the static model?

A: EPSW makes cross-group poaching prohibitively costly. If a firm that hires only A-group workers were to hire even a positive measure of B-group workers, EPSW would, by transitivity, require it to pay the same wage to all workers. This eliminates the firm’s ability to exploit productivity heterogeneity across workers; it would have to raise all wages to match that of its highest-paid worker, destroying profit. As a result, firms segregate in equilibrium to avoid the bite of EPSW entirely: each firm caters to one group, and the within-group wage schedule remains unconstrained. The mechanism is analogous to location choice in Hotelling’s model: EPSW acts as the enforcement device that sustains segregation, because poaching across groups triggers the equal-pay constraint.

Q2. How does the equal profit condition generate a wage gap in favor of the majority group?

A: In any core outcome under EPSW (Proposition 3), the Equal Profit Condition requires both firms to earn the same total profit. When there are β > 1 A-group workers (more than B-group workers), the firm serving A-group workers must pay higher average wages per worker to extract the same total profit from a larger pool, relative to the firm serving a smaller B-group. This mechanically raises A-group average wages relative to B-group average wages. Crucially, this directional conclusion — EPSW widens the majority-group wage advantage — holds regardless of the shapes of FA and FB, meaning it is robust to any underlying discriminatory or non-discriminatory productivity differences.
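A stylized version of this argument can be written out under assumptions that are not taken from the paper's proof: the B-group has mass 1 and the A-group mass β, \bar{w}_A and \bar{w}_B denote the average wages paid by the two segregated firms, and each firm's profit is the total productivity of its workers minus its wage bill. Writing π for the common profit:

```latex
\begin{align*}
\pi &= \beta\,\bigl(E_A[v] - \bar{w}_A\bigr) = E_B[v] - \bar{w}_B \\
\bar{w}_A &= E_A[v] - \frac{\pi}{\beta}, \qquad \bar{w}_B = E_B[v] - \pi \\
\bar{w}_A - \bar{w}_B &= \bigl(E_A[v] - E_B[v]\bigr) + \pi\Bigl(1 - \frac{1}{\beta}\Bigr)
\end{align*}
```

With β > 1 the last term is positive whenever π > 0 and grows with π. In this sketch the gap therefore exceeds EA(v) − EB(v) in every outcome except the zero-profit one, and the wage gap and firm profits move together, in line with the statements summarized from Proposition 3.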

Q3. What is the baseline (without-EPSW) wage gap, and how does EPSW change it?

A: Without EPSW, Proposition 1 establishes that every worker is paid exactly her productivity in any core outcome (full employment, wages = productivity). Therefore, the wage gap equals EA(v) − EB(v) and the wage ratio equals EA(v)/EB(v): any gap reflects only productivity differences (including discrimination embedded in willingness to pay). Under EPSW, Proposition 3 shows that all core outcomes except a single (measure-zero) one strictly widen the wage gap beyond this level. The wage ratio result (Proposition 3, Part 4) requires the additional condition βEA[v] > EB[v] — that the majority group is not sufficiently less productive or more discriminated against to reverse the direction.
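The role of the extra condition for the wage ratio can be checked numerically in a stylized setting. The assumptions here are illustrative and not the paper's proof: the A-group has mass β and the B-group mass 1, each segregated firm's profit equals total productivity minus its wage bill, and both firms earn the same profit. All parameter values are hypothetical.

```python
def average_wages(E_A, E_B, beta, profit):
    """Average wages implied by equal profits:
    beta * (E_A - w_A) = E_B - w_B = profit."""
    w_A = E_A - profit / beta
    w_B = E_B - profit
    return w_A, w_B

def gap_and_ratio(E_A, E_B, beta, profit):
    """Return the wage gap (w_A - w_B) and the wage ratio (w_A / w_B)."""
    w_A, w_B = average_wages(E_A, E_B, beta, profit)
    return w_A - w_B, w_A / w_B

# Case 1: beta * E_A > E_B holds (2.0 > 1.0).
gap0, ratio0 = gap_and_ratio(E_A=1.0, E_B=1.0, beta=2.0, profit=0.0)
gap1, ratio1 = gap_and_ratio(E_A=1.0, E_B=1.0, beta=2.0, profit=0.2)

# Case 2: beta * E_A > E_B fails (1.2 < 2.0).
gap2, ratio2 = gap_and_ratio(E_A=1.0, E_B=2.0, beta=1.2, profit=0.0)
gap3, ratio3 = gap_and_ratio(E_A=1.0, E_B=2.0, beta=1.2, profit=0.6)
```

In both cases the gap in levels moves in favor of group A as profit rises, but the ratio does so only in the first case. This matches the pattern described above, where the level result needs only β > 1 while the ratio result also needs βEA[v] > EB[v].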

Q4. How does the dynamic search model modify the static predictions?

A: In the dynamic model (Proposition 4), full segregation is achieved in finite time T in any equilibrium, not instantaneously. Prior to T, firms make sequential segregation decisions; workers displaced by firms’ segregation choices are replaced at rate ρ ∈ [0,1]. The long-run wage ratio is determined by the ratio nA/nB, where nA and nB are the numbers of firms segregating toward groups A and B. If nA > nB, the long-run wage ratio moves in favor of A; if nA = nB, the policy has no long-run effect on the wage ratio. The key departure from the static model is that this outcome depends not only on the majority group size but also on search intensities and reallocation frictions: with long worker tenure (low d), segregating toward the majority can be costly if the firm already employs many minority-group workers.

Q5. Under what conditions does the dynamic model predict that more firms segregate toward the majority group?

A: Proposition 5 states that for sufficiently large d (fast worker turnover / low reallocation frictions) and equitable search (equal search intensity across firms within a group), the number of firms segregating toward A satisfies nA ∈ [xA−1, xA+1], where xA is defined by an equal-profit condition. Moreover, if βEA[v] > EB[v] (the majority group is collectively more valuable), then nA ≥ nB. Without equitable search, the conclusion holds under more stringent conditions: for any search intensity vector r, there exist d* and β* such that for d > d* and β > β*, any equilibrium yields nA > nB. Empirically, 94% of local-labor-market-by-month units in Chile exhibit more firms segregating toward the majority gender post-EPSW, consistent with these conditions being met.

Q6. Why do firms that are nearly segregated at announcement respond most strongly to EPSW?

A: Proposition 6 establishes that firms with a low ratio of minority-group to majority-group search intensity (i.e., nearly segregated in employment) segregate earliest, provided the discount rate is sufficiently low. The intuition is that for a nearly segregated firm, the cost of segregating (separating the few minority-group workers) is small relative to the costs of remaining desegregated (paying a common wage that compresses profit, and being unable to poach new workers). Empirically, firms nearly segregated at announcement (majority gender share ∈ [0.8, 1)) show an 8.7 percentage point increase in full segregation post-EPSW, roughly three times the 2.8 percentage point effect for firms not nearly segregated at announcement. This “missing mass” pattern (a decline in near-segregation matched by an increase in full segregation) is also consistent with Proposition 6.

Q7. What is the heterogeneous effect of EPSW on the wage gap by local labor market type?

A: The empirical design allows the wage gap effect to differ by local labor market (LLM) majority type (male vs. female). In male-majority LLMs (firm industry × county pairs where men make up more than 50% of workers in June 2009), EPSW increases the gender wage gap in favor of men by 4.3 percentage points (SE = 0.0116). In female-majority LLMs, EPSW decreases the gender wage gap (in favor of women) by 6.2 percentage points (SE = 0.0234). These findings are consistent with the theoretical prediction that EPSW benefits whichever group is in the majority of the local labor market. The dynamic event studies show no pre-trends in either subsample; effects begin at announcement (τ = 0) and grow over time.
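As a quick check on the precision of these two estimates, the ratio of each estimate to its standard error and an approximate 95% interval can be computed. This sketch assumes the standard errors are reported on a 0 to 1 scale (so 4.3 percentage points corresponds to 0.043), signs the gap as male minus female, and uses a normal approximation; none of these conventions is confirmed by the text.

```python
def t_stat_and_ci(estimate, se, z=1.96):
    """Estimate-to-SE ratio and a normal-approximation 95% interval."""
    return estimate / se, (estimate - z * se, estimate + z * se)

# Male-majority LLMs: gap widens in favor of men.
t_male, ci_male = t_stat_and_ci(0.043, 0.0116)

# Female-majority LLMs: gap moves in favor of women (negative sign here).
t_female, ci_female = t_stat_and_ci(-0.062, 0.0234)
```

Under these assumptions both ratios exceed 1.96 in absolute value and neither interval contains zero.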

Q8. What drives the wage gap change — majority wages rising or minority wages falling?

A: The change is primarily driven by a reduction in the minority group’s average wages, not an increase in majority wages. Women’s average wages in male-majority labor markets fall by 3.29 percentage points (SE = 0.0111) in treated versus control firms post-EPSW. Men’s average wages in female-majority labor markets fall by 4.45 percentage points (SE = 0.0178) in treated versus control firms post-EPSW. There are no statistically significant changes in the average wages of the majority group of workers within any LLM type. This is consistent with the model’s mechanism: segregation reduces competition for minority-group workers (fewer firms competing for them), depressing their wages.

Q9. What is the aggregate (economy-wide) effect of EPSW on the gender wage gap in Chile?

A: Because approximately 5/6 of all Chilean workers are employed in male-majority local labor markets (men have higher labor force participation, with female labor force participation at roughly 30% in 2009), the overall effect of EPSW is to increase the gender wage gap in favor of men by 2.74 percentage points (SE = 0.0102). This is a net effect that averages the positive (pro-male) gap increase in male-majority markets and the negative (pro-female) gap decrease in female-majority markets, weighted by market sizes.
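
As a rough cross-check of this aggregation logic, the market-level estimates from Q7 can be combined using the approximate 5/6 employment share. This is a back-of-envelope sketch, not the paper's estimator: the function name and the use of exactly 5/6 as the weight are assumptions made for illustration.

```python
# Back-of-envelope aggregation of the market-level wage gap effects.
# Assumption: employment weights are exactly 5/6 and 1/6. The paper's
# aggregate estimate comes from its own regression, not from this formula.
male_majority_share = 5 / 6
effect_male_majority = 4.3     # pp, gap moves in favor of men
effect_female_majority = -6.2  # pp, gap moves in favor of women


def weighted_effect(share, effect_major, effect_minor):
    """Employment-weighted average of two market-level effects."""
    return share * effect_major + (1 - share) * effect_minor


aggregate = weighted_effect(
    male_majority_share, effect_male_majority, effect_female_majority
)
print(f"{aggregate:.2f} pp")
```

The weighted average comes out near 2.55 percentage points, the same sign and order of magnitude as the reported 2.74. Some difference is expected, since 5/6 is only an approximate weight and the aggregate effect is estimated directly.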

Q10. How does the identification strategy deal with anticipation and compositional changes?

A: Treatment status is assigned based on firm size at the time of policy announcement (June 2009) rather than enactment (November 2009), creating an intent-to-treat framework: some “treated” firms may fall below the threshold by enactment, and some “control” firms may rise above it, both attenuating the estimates (implying estimated effects are plausible lower bounds). The no-anticipation assumption is supported by the absence of statistically significant pre-trends in either the segregation or wage-gap specifications. To address compositional changes in worker characteristics across LLMs induced by EPSW itself, the wage regressions include time fixed effects interacted with human capital dimensions (education, contract type, age decade) and firm comparison groups, controlling for observable composition shifts. Placebo tests at alternative firm-size thresholds find no statistically or economically meaningful effects, supporting the causal interpretation.

Q11. How does EPSW in Chile compare to EPEW theoretically and in the literature?

A: EPEW requires equal pay only for workers doing exactly equal work, which creates an easily exploitable loophole: firms can proliferate job titles or marginally differentiate duties to avoid compliance. EPSW closes this loophole by requiring equal pay across a coarser “similar work” category, making evasion harder. Theoretically, the prior EPEW literature (Bhaskar et al. 2002, Kaas 2009, Lagerlöf 2020, Lanning 2014) generated ambiguous directional predictions: equal pay laws could either increase or decrease wage disparities, even within a single paper. The authors attribute this ambiguity to EPEW models’ requirement that workers be exactly equally productive. By contrast, EPSW applies across workers with heterogeneous productivities, and the authors derive unambiguous predictions: full segregation and a wage gap shift toward the majority group, both of which are confirmed empirically.

Q12. What is the analogy to “best-price guarantees” in product markets?

A: The paper draws a methodological parallel to most-favored-customer (MFC) clauses in product markets. MFC clauses commit firms to rebating past consumers if prices fall, which directly equalizes payments across buyers but unintentionally raises firm market power. In the EPSW setting, the policy plays the role of a best-wage guarantee — but because firms compete for workers, the constraint binds off the equilibrium path. Firms segregate so that no firm is ever exposed to the equal-pay constraint in equilibrium, yet the threat of the constraint (if a firm deviates and hires from both groups) effectively differentiates labor costs across groups, driving the unintended wage effects. This is related to “artificial” switching costs that create local market power in consumer markets (Klemperer, 1987).

Key Concepts

Equal Pay for Similar Work (EPSW): A legal constraint requiring that within a firm, workers belonging to different protected-class identities (e.g., different genders) who perform “similar” work receive equal wages. Distinguished from “Equal Pay for Equal Work” (EPEW) by its coarser similarity standard, which cannot be evaded by minor job-title differentiation. In the model, this constraint is formalized as: a firm cannot hire positive measures of workers from two different groups such that all workers in one group receive strictly higher wages than all workers in the other group; by transitivity, a firm hiring from both groups must pay almost all workers the same wage.

Core Outcome: The solution concept used in the static model, drawing on cooperative game theory (Shapley–Shubik assignment game). An outcome (specifying which firm hires each worker and at what wage) is in the core if no firm and subset of workers can form a blocking coalition that makes both the firm and each worker in the coalition strictly better off. The paper uses this concept because the pure-strategy Nash equilibrium outcomes of the associated non-cooperative simultaneous wage-offer game coincide exactly with the core outcomes under the restriction that firms pay the same wage to all workers of the same type.

Full Segregation: A labor market outcome in which each firm employs workers from only one group (all A-group workers at one firm, all B-group workers at the other). The paper proves (Proposition 2) that EPSW generically forces full segregation in equilibrium, because any deviation to hire from both groups exposes the firm to the equal-pay constraint. Empirically measured as a binary indicator for whether all workers at a given firm in a given month are of the same gender.

Near Segregation: A firm-level state in which the majority gender constitutes 80–99% of the firm’s workforce (the majority gender share is in [0.8, 1)). The paper uses this as a complementary outcome to full segregation; theory (Proposition 6) predicts a decline in near segregation post-EPSW because firms in this state face the lowest cost of transitioning to full segregation. Empirically, the near-segregation share falls by 4.0 percentage points post-EPSW, mirroring the 4.4 percentage point rise in full segregation.

Local Labor Market (LLM): Defined in the empirical analysis as a firm’s geographic county interacted with its industry code, creating 321 × 21 potential cells. The LLM is classified as male-majority or female-majority based on the share of female workers across all firms in the industry-county pair in June 2009. This is the unit at which the “majority group” for Proposition 3’s wage gap prediction is defined, and the level at which the heterogeneous wage effects of EPSW are estimated.

Equal Profit Condition: A necessary condition of any core outcome (with or without EPSW): both firms must earn the same total profit in equilibrium. Under EPSW with full segregation, this condition determines the relative average wages of the two groups — because firm sizes differ (β A-group workers vs. 1 B-group worker), equal profit requires the firm serving the larger group to pay higher average wages, mechanically moving the wage gap in favor of the majority group.

Nash-in-Nash Bargaining: The bargaining protocol used in the dynamic search model, following Horn and Wolinsky (1988). Each bilateral worker-firm bargain splits the available surplus in proportion to exogenous bargaining power parameter Δ ∈ (0,1), taking as given the outcome of all other bilateral bargains. A worker’s disagreement point is the wage she would receive from bargaining with the next firm in her search order. This generates the result that a worker’s realized payoff is increasing in the number of segregated (non-EPSW-constrained) firms competing for her, connecting firm segregation decisions to wage determination.
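
The link between the number of competing firms and the worker's payoff can be illustrated with a stylized recursion. This is only a sketch under strong assumptions that are not taken from the paper: every firm values the worker at the same productivity `p`, `delta` is the worker's share of the surplus, and the disagreement payoff after the last firm in the search order is zero.

```python
# Illustrative recursion for sequential bilateral bargaining.
# Assumptions (not from the paper): identical productivity p at every firm,
# delta is the worker's surplus share, and the disagreement payoff after
# the last firm in the search order is 0.
def bargained_wage(p, delta, n_firms):
    """Wage at the first firm when n_firms are lined up in the search order."""
    wage = 0.0  # disagreement payoff beyond the last firm
    for _ in range(n_firms):
        # The worker keeps her disagreement payoff plus a share delta
        # of the surplus (p - disagreement payoff).
        wage = wage + delta * (p - wage)
    return wage


wages = [bargained_wage(p=1.0, delta=0.5, n_firms=n) for n in range(1, 5)]
print(wages)
```

Under these assumptions the wage equals p·(1 − (1 − delta)^n), which rises with the number of firms n. That is the qualitative property described above.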

Reallocation Friction: In the dynamic search model, represented by a low departure probability d ∈ (0,1) for existing employees. When d is low, firms retain a large fraction of their workforce across periods, making segregation costly because the firm must separate from any existing workers of the “wrong” group. The paper shows (Proposition 5) that for sufficiently large d (low frictions), the equal-profit condition approximately pins down the number of firms segregating toward each group, and for d above a threshold, the majority group attracts weakly more segregating firms.

Exorbitant Privilege Gained and Lost: Fiscal Implications

Layer 1: Overview

This paper studies three centuries of U.K. fiscal history to understand the fiscal implications of safe asset supplier status — what the authors call “exorbitant privilege” — and how it can be gained and lost. Using the discounted cash flow approach to fiscal capacity developed in Jiang, Lustig, Van Nieuwerburgh, and Xiaolan (2019), the paper measures the present discounted value of expected future primary surpluses (inclusive of convenience yield seigniorage) and compares it to the observed market value of outstanding government debt. The central finding is a sharp historical discontinuity: before World War I, when the U.K. was the world’s dominant safe asset supplier and its gilts served as the global reserve asset, roughly only three-quarters of U.K. debt was backed by future surpluses even after accounting for convenience yields earned from global safe asset demand. After World War II, when the U.K. lost its safe asset supplier status to the U.S., the U.K.’s debt became fully backed by surpluses and fiscal capacity became closely tied to its own macro fundamentals. By contrast, the U.S. after World War II shows a pattern similar to the pre-war U.K. but more extreme: less than one-third of outstanding U.S. Treasury debt is backed by future surpluses according to the paper’s estimates, with the gap between debt and estimated fiscal capacity growing sharply over recent decades.

In depth

Q1. How does the paper measure fiscal capacity?

The paper follows the Jiang-Lustig-Van Nieuwerburgh-Xiaolan (2019) methodology, expressing the market value of outstanding government debt as the present risk-adjusted discounted value of expected future primary surpluses under the government’s intertemporal budget constraint — the no-arbitrage condition that rules out rational debt bubbles. The market value of the government debt portfolio equals the present value of tax revenues minus the present value of government spending. A Vector AutoRegression (VAR) imposing cointegration of GDP with tax revenues and government spending is used to forecast the joint dynamics of the surplus. The paper uses the market or output risk premium as the discount rate, imputing the risk properties of GDP to spending and tax revenue claims.
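
To make the discounting logic concrete, the sketch below computes the present value of surpluses in a deliberately simplified setting. It is not the paper's VAR-based procedure: it assumes a constant surplus share of GDP `s`, a constant GDP growth rate `g`, and a constant discount rate `r` above `g`, and all three values are hypothetical.

```python
# Stylized present value of primary surpluses, as a share of current GDP.
# Assumptions (hypothetical, not the paper's VAR): the surplus is a constant
# share s of GDP, GDP grows at a constant rate g, and cash flows are
# discounted at a constant rate r > g (risk-free rate plus output risk premium).
def pv_surplus_share(s, g, r, horizon):
    """Sum of discounted surplus/GDP over `horizon` years."""
    return sum(s * ((1 + g) / (1 + r)) ** j for j in range(1, horizon + 1))


def pv_surplus_share_limit(s, g, r):
    """Closed-form infinite-horizon limit, valid only when r > g."""
    return s * (1 + g) / (r - g)


s, g, r = 0.02, 0.03, 0.06  # illustrative values only
finite = pv_surplus_share(s, g, r, 2000)
limit = pv_surplus_share_limit(s, g, r)
print(finite, limit)
```

In this simplified setting, a higher discount rate `r` lowers measured fiscal capacity. This is why the choice of the output risk premium as the discount rate matters for the estimates.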

A key methodological challenge is handling structural breaks: before World War I, U.K. fiscal policy was pre-Keynesian — acyclical spending and taxes (except during wars) — so spending and tax revenue as shares of output inherit the risk properties of output, and the market risk premium is the appropriate discount rate. After World War II, spending becomes counter-cyclical and taxes pro-cyclical in the Keynesian framework; the paper argues that applying the market risk premium in this regime produces an upper bound on the PDV of surpluses. For the U.K., this methodology is validated: the correlation of fiscal capacity with the debt/output ratio is 0.90 in the pre-WW-I sample and remains high after WW-II, despite the fiscal regime change.

Q2. What are the quantitative findings for the U.K.?

The paper finds that before World War I, U.K. fiscal capacity fell systematically short of the observed market value of U.K. debt: the average debt/GDP ratio was 87.06% while the estimated fiscal capacity was only 69.32%, with the ratio of fiscal capacity to debt averaging 74.32% — implying roughly 26% of U.K. debt was not backed by future surpluses even after including convenience yield seigniorage. The U.K. earned average long-term convenience yields of approximately 100 basis points per annum from 1873 to 1931 (translating to approximately 0.47% of GDP in annual seigniorage), reflecting its dominant position as the world’s safe asset supplier and the quasi-monopoly position of gilts in global securities markets (U.K. national debt accounted for more than half of the world’s traded securities around 1815). Despite these convenience yields, the gap between fiscal capacity and debt persisted throughout the 19th and early 20th century.

After World War II, the picture reverses: the U.K.’s average post-war fiscal capacity of 82.03% of GDP exceeds its average debt/GDP ratio of 53.42%. On these averages, capacity exceeds debt by more than 50% (82.03/53.42 ≈ 1.54), which leaves roughly one-third of fiscal capacity unborrowed. Fiscal capacity still tracks debt dynamics, but the sign of the gap reverses: U.K. borrowing is now constrained by its own macro fundamentals rather than extended by global coordination.
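
The sample averages quoted in this answer can be compared directly. The sketch below uses only the reported numbers; the helper names are illustrative. It shows that a ratio of averages need not equal the average of annual ratios, which is presumably why 69.32/87.06 differs from the reported 74.32%.

```python
# Arithmetic on the reported sample averages (percent of GDP).
# Note: a ratio of averages need not equal the average of annual ratios.
pre_ww1 = {"debt": 87.06, "capacity": 69.32}
post_ww2 = {"debt": 53.42, "capacity": 82.03}


def backing_ratio(row):
    """Fiscal capacity as a fraction of debt."""
    return row["capacity"] / row["debt"]


def unborrowed_share(row):
    """Fraction of fiscal capacity not matched by outstanding debt."""
    return 1 - row["debt"] / row["capacity"]


print(f"pre-WW-I capacity/debt:  {backing_ratio(pre_ww1):.3f}")
print(f"post-WW-II capacity/debt: {backing_ratio(post_ww2):.3f}")
print(f"post-WW-II unborrowed share: {unborrowed_share(post_ww2):.3f}")
```

On these averages, post-war capacity exceeds debt by a little over 50 percent, which corresponds to roughly a third of capacity being unborrowed.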

Q3. What do the authors find for the United States?

After World War II, the U.S. experience mirrors that of the pre-war U.K., but with much larger magnitudes: the paper’s estimates indicate that less than one-third (32.20%) of post-war U.S. Treasury debt is backed by future surpluses, with the gap between fiscal capacity and debt growing sharply toward the end of the sample to exceed U.S. GDP. Before World War I, the U.S. did not earn convenience yields (it was forced to borrow at higher rates than the U.K. despite having lower debt-to-output ratios), and its fiscal capacity exceeded its debt, with the ratio of capacity to debt averaging 169.36%. After World War II, when the U.S. became the global safe asset supplier under the Bretton Woods architecture, the relationship inverted: average U.S. fiscal capacity of 13.20% of GDP represents only 32.20% of outstanding debt. The gap has widened in recent decades as U.S. debt has grown while surplus projections have not expanded commensurately.

Q4. Why does safe asset supplier status allow a country to borrow beyond its fiscal capacity?

The paper argues that global investors coordinate on a single safe asset issuer based on relative macro fundamentals; this coordination is self-reinforcing because each additional investor holding the asset reduces rollover risk and renders the debt safer for all others, creating strategic complementarities that concentrate global fiscal capacity in one country beyond what its own surpluses would warrant. Unlike domestic convenience yields (arising from household demand for safe assets to insure idiosyncratic risks), the global safe asset effect creates a form of extra-fiscal capacity that depends on investors’ common belief about which country is the hegemon. The measured seigniorage from convenience yields — about 0.47% of U.K. GDP before WW-I and 0.36% per year for the U.S. post-war — does not fully capture this coordination benefit; the remaining gap between fiscal capacity and debt reflects the additional “license to borrow” that comes with global hegemon status.

The transition from U.K. to U.S. hegemony illustrates the mechanism: as U.K. macro fundamentals deteriorated relative to U.S. fundamentals after the world wars, investors shifted the concentration of fiscal capacity toward the U.S. The U.K. lost its license to borrow beyond its fundamentals; the U.S. gained it. The paper notes that the U.K. debt/output ratio exceeded 200% after WW-II — a level associated with the loss of hegemony.

Q5. What historical data and institutional context does the analysis use?

The paper uses annual data for the U.K. from 1729 to 2020 (from the Bank of England’s “A Millennium of Macroeconomic Data” dataset and the Ellison-Scott dataset on individual bond market values from 1694 onward) and for the U.S. from 1791 to 2020 (from Hall-Sargent and CRSP), constructing primary surpluses, tax revenues, spending, GDP, and convenience yields consistently over nearly three centuries. U.K. convenience yields before WW-I are measured as the interest rate differential between U.K. government securities and otherwise comparable bonds from countries on the gold standard; the sample average is approximately 147 basis points at the short end and 110 basis points at the long end, with the spread declining at longer maturities (the opposite of what default risk would predict), providing evidence that convenience yield rather than residual default risk drives the differential. U.S. post-war convenience yields are constructed from the spread between the 3-month Treasury yield and the 3-month CD rate (or bankers’ acceptance rate before 1964), averaging 36 basis points per year from 1947 to 2020.

Q6. What are the implications for models of fiscal capacity and debt sustainability?

The results favor models in which the safe asset supplier’s fiscal capacity is determined partly by relative macro fundamentals (which country the global financial system coordinates on) rather than solely by absolute fundamentals (its own surpluses), and challenge models that treat the transversality condition as a binding constraint at all times for all countries. The finding that a large fraction of U.S. Treasury debt is not backed by future surpluses — even when the market risk premium is used to discount — has implications for debt sustainability analyses: standard present-value-of-surpluses calculations will understate the true fiscal capacity of the safe asset supplier, while overstating it for others. The paper’s framework suggests this extra capacity depends on maintaining relative macro fundamentals and global investor coordination, and can be lost — as the U.K. experience demonstrates — when relative fundamentals deteriorate.

Key Concepts

fiscal capacity: the present risk-adjusted discounted value of a government’s expected future primary surpluses, computed from the government’s intertemporal budget constraint under no-arbitrage; in this paper, inclusive of seigniorage revenue from convenience yields earned on government debt.
exorbitant privilege: the ability of the safe asset supplier country to borrow at below-market interest rates due to global demand for its government debt as a safe asset; quantified in this paper as the gap between the market value of debt and estimated fiscal capacity, with debt exceeding fiscal capacity for the pre-WW-I U.K. and post-WW-II U.S.
convenience yield: the yield reduction (below comparable risky borrowing rates) that the safe asset supplier earns from global safe asset demand; measured as approximately 100 bps long-term for the pre-WW-I U.K. and approximately 36 bps on average for the post-WW-II U.S., contributing 0.47% and 0.36% of GDP annually in seigniorage revenue respectively.
transversality condition (TVC): the condition ruling out rational government debt bubbles, requiring the expected discounted value of outstanding debt to approach zero at long horizons; the paper imposes the TVC and finds that for the pre-WW-I U.K. and post-WW-II U.S., the observed debt level exceeds fiscal capacity even under this constraint, interpreted as evidence of the extra borrowing license conferred by safe asset supplier status.

Expectation-driven term structure of equity and bond yields

Overview

Research Question. What drives the joint historical dynamics of the term structure of equity yields and nominal bond yields — and can a single unified equilibrium model explain the procyclical equity yield slope, the switch in bond-stock correlation from positive to negative after the late 1990s, the maturity-declining predictability of dividend strip returns, and standard aggregate stock market puzzles?

Key Departure from Prior Literature. Existing equilibrium models (habit formation, long-run risk, disaster risk) rely on time-varying risk premia to explain asset prices. Recent survey evidence challenges this: De La O and Myers (2021) show that most aggregate stock price movements are driven by cash-flow growth expectations rather than return expectations, and Van Binsbergen et al. (2013) show that equity yields are driven mainly by dividend growth expectations. This paper constructs an equilibrium model in which equity (bond) yield variation is attributable to subjective dividend growth (GDP growth) expectations, with a constant subjective risk premium implied by CRRA utility.

Model Architecture. The representative agent has CRRA utility with risk-aversion coefficient γ = 4 and subjective discount factor β = 1.0065 (calibrated to the average 10-year equity yield). The agent departs from rational expectations by having the “belief in the law of small numbers” (Tversky and Kahneman 1971): she perceives small samples to represent their population as well as large samples, leading to subjective learning gains that differ from the rational Kalman gain. The subjective belief updating rule is a modified Kalman filter in which the likelihood is exaggerated by factor (1+θ), producing a subjective learning gain ν that exceeds the Kalman gain K when overreaction applies and falls below it when underreaction applies.

The model has three blocks of fundamentals, each decomposed into a stable and a transitory component. (1) Real GDP growth is decomposed into PCE growth (stable, with a random-walk trend state µ_g) and a volatile gap component (stationary state x_g, persistence ρ_g = 0.941). (2) Inflation is decomposed into core inflation (stable, with trend state µ_π) and a volatile gap (persistence ρ_π = 0.932). (3) Real aggregate dividend is decomposed into a long-duration dividend component d^l (levered on log real GDP with leverage λ = 3) and the share of long-duration dividend d^s (stationary with persistence ρ_d = 0.94). This cross-sectional decomposition uses firm-level long-term earnings growth (LTG) forecasts from IBES as a model-free equity duration measure.

Estimation. State-space parameters are estimated by maximum likelihood with the Kalman filter on data from NYSE/NASDAQ/AMEX firms (CRSP/Compustat), quarterly, from 1987Q4 to 2019Q4. Subjective learning gains are estimated by minimizing RMSE between model-implied expectations and consensus forecasts: 1-year real GDP growth and inflation from the Survey of Professional Forecasters (SPF, 1981Q3–2019Q4), and 1-year aggregate dividend growth extended from De La O and Myers (2021) to 2019Q4. Equity yield data are from Giglio et al. (2021); bond yields are end-of-quarter zero-coupon nominal yields from Gürkaynak et al. (2007).

Main Findings.

Equity Term Structure Dynamics. The model’s subjective dividend growth expectations drive equity yields. The 1-year model-implied equity yield correlates 0.68 with data; the 10-year correlates 0.79; the 10Y–1Y slope correlates 0.59 with data. Consistent with “belief in the law of small numbers,” the agent overreacts to dividend growth news. The estimated dividend-level learning gains (ν^l_d = 0.166 and ν^s_d = 0.458) are both below their Kalman gains, which under the level-to-growth translation implies overreaction to dividend growth news. This is confirmed by negative slope coefficients in the Coibion and Gorodnichenko (2015, hereafter CG(2015)) regression: −0.69 at 1Y and −0.97 at 5Y.
Procyclical Equity Yield Slope. During recessions, the average equity yield slope (10Y–1Y) in the model is −3.77%; during expansions it is +3.96%, matching the data (−5.50% in recessions, +3.93% in expansions). The sign reversal is driven primarily by the dividend-specific component of the decomposition: in recessions, short-run dividend growth expectations fall much more sharply than long-run expectations.
Bond Pricing. The model’s 1-year and 10-year nominal bond yields achieve correlations of 0.92 and 0.95 with their data counterparts, inheriting the explanatory power of Zhao (2020) for the bond market. The agent underreacts to GDP growth and inflation news (estimated learning gains well below Kalman gains, confirmed by positive CG(2015) slope coefficients of +2.08 at 1Y for GDP growth and +1.01 at 1Y for inflation).
Bond-Stock Correlation Switch. In the data, the correlation between 10-year bond returns and 5-year dividend strip returns goes from +0.46 before 2000 to −0.49 after 2000. The model produces +0.14 before and −0.56 after (for the 5Y strip). Decomposing the change in bond-stock return covariance: the “inflation real effect” (correlation between expected inflation and real growth) accounts for approximately 27–31% of total changes (for 5Y to 10Y strips); the “real growth correlation” channel (stronger co-movement between real GDP and real dividend growth expectations after 2000) accounts for approximately 89–95% of total changes. These two shares sum to more than 100% because a third component, the volatility of shocks to expected inflation and real growth, contributes negatively. The paper identifies this real bond hedging channel as the dominant and previously unexamined driver.
Dividend Strip Return Predictability. The price-dividend ratio predicts annual market excess returns with R² of 10.3% (data) vs. 9.0% (model). Strip return predictability is downward-sloping by maturity: in data, the R² is 20.2% for 5-year strips and 14.5% for 10-year strips; the model generates 14.2% and 10.4% respectively. This is decomposed into three sources: bond return predictability (small contribution), dividend forecast error predictability (dominant for short maturities), and forecast revision predictability (negative contribution that offsets). The downward slope occurs because current news has smaller impact on long-term dividend expectations.
Aggregate Market Puzzles. The model-implied log dividend-price ratio correlates 0.86 with data, with AR(1) coefficient 0.96 (data: 0.95). Model-implied average market return is 9% (data: 8%); annualized return volatility 12% (data: 16%). The model replicates the switch of the bond-stock aggregate return correlation from +0.13 before 2000 to −0.46 after 2000 (data: +0.39 to −0.64).

Scope Conditions. Results apply to U.S. equity and bond markets over 1987Q4–2019Q4 (with bond learning using data back to 1959Q1). The model assumes a representative agent with CRRA utility and constant subjective risk premium. It is silent on the term structure of expected returns in the statistical sense (which requires identification of latent states under the physical measure). The aggregate market results require a reduced-form specification for stochastic equity duration H_t linked to the value-weighted LTG average.

In depth

Q1. What is the core psychological mechanism generating subjective beliefs, and how does it differ from the diagnostic expectations approach?

The agent has the “belief in the law of small numbers” (Tversky and Kahneman 1971): she treats small samples as equally representative of their population as large samples. Formally, this is embedded by exaggerating the likelihood in the Bayesian update: p(x_t|I_t) ∝ p(y_t|x_t)^{1+θ} × p(x_t|I_{t-1}), where θ captures the magnitude of cognitive bias. The resulting subjective learning gain ν = (1+θ)P̃ / [(1+θ)P̃ + σ²_ε] exceeds the Kalman gain K when θ > 0 (overreaction) and falls below it when θ < 0 (underreaction); θ = 0 recovers the standard Bayesian update. This differs from diagnostic expectations (Bordalo et al. 2019, 2020a,b), which are based on the representativeness heuristic; the paper notes the two notions of news are highly correlated in simulation (Table IA.2) and that both can imply overreaction.
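
The gain formula can be explored numerically. The sketch below is an illustration rather than the paper's state-space system: it assumes a scalar local-level model (a random-walk state with innovation variance `q`, observed with noise variance `sigma2_eps`) and iterates the prior-variance recursion implied by raising a Gaussian likelihood to the power 1 + θ.

```python
# Steady-state learning gain when the likelihood is raised to the power 1 + theta.
# Assumptions (for illustration only): scalar local-level model,
#   x_t = x_{t-1} + w_t with Var(w) = q, and y_t = x_t + eps_t with
#   Var(eps) = sigma2_eps. Requires theta > -1.
def subjective_gain(theta, q=1.0, sigma2_eps=1.0, iterations=500):
    """Iterate the prior-variance recursion and return the steady-state gain."""
    prior_var = q  # starting guess for the prior variance (P-tilde)
    for _ in range(iterations):
        gain = (1 + theta) * prior_var / ((1 + theta) * prior_var + sigma2_eps)
        posterior_var = (1 - gain) * prior_var  # Gaussian posterior variance
        prior_var = posterior_var + q           # propagate one period ahead
    return (1 + theta) * prior_var / ((1 + theta) * prior_var + sigma2_eps)


kalman_gain = subjective_gain(theta=0.0)
for theta in (-0.5, 0.0, 0.5):
    print(theta, round(subjective_gain(theta), 3))
```

In this sketch θ = 0 reproduces the standard Kalman gain, a positive θ raises the gain above it, and a negative θ lowers it.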

Q2. Why does the model generate overreaction to dividend growth news even though the dividend-level learning gains are smaller than the Kalman gains?

The model separates dividend learning into level and growth. Section 2.2 shows that underreaction to dividend level news (ν^l_d < K^l_d, ν^s_d < K^s_d, estimated values 0.166 and 0.458 against Kalman gains 0.19 and 0.49 respectively) translates into overreaction to dividend growth news. This is confirmed by the CG(2015) rationality test: regressing forecast errors on lagged forecast revisions yields slope coefficients of −0.69 (1Y) and −0.97 (5Y) for real dividend growth, both statistically significant (t-statistics −3.63 and −3.22). In contrast, the same test yields positive slope coefficients for GDP growth (2.08 at 1Y) and inflation (1.01 at 1Y), confirming underreaction for these series.
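
The sign logic of this test can be illustrated with a small simulation. The sketch below does not reproduce the paper's level-to-growth translation or its data. It assumes a local-level model with unit variances (for which the Kalman gain is about 0.618) and an agent who updates with a fixed gain `nu`, then regresses next-period forecast errors on current forecast revisions.

```python
import random


# Simulated regression of forecast errors on forecast revisions.
# Assumptions (illustrative): local-level model with unit variances, so the
# Kalman gain is about 0.618; the agent updates with a fixed gain nu instead.
def cg_slope(nu, periods=20000, seed=0):
    rng = random.Random(seed)
    state, forecast = 0.0, 0.0
    revisions, errors = [], []
    prev_revision = None
    for _ in range(periods):
        state += rng.gauss(0.0, 1.0)          # persistent component
        signal = state + rng.gauss(0.0, 1.0)  # noisy observation
        if prev_revision is not None:
            revisions.append(prev_revision)
            errors.append(signal - forecast)  # error of last period's forecast
        new_forecast = forecast + nu * (signal - forecast)
        prev_revision = new_forecast - forecast
        forecast = new_forecast
    mean_r = sum(revisions) / len(revisions)
    mean_e = sum(errors) / len(errors)
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(revisions, errors))
    var = sum((r - mean_r) ** 2 for r in revisions)
    return cov / var  # OLS slope of errors on revisions


for nu in (0.3, 0.618, 0.9):
    print(nu, round(cg_slope(nu), 2))
```

A gain below the Kalman gain produces a positive slope (underreaction), a gain above it produces a negative slope (overreaction), and a gain close to the Kalman gain produces a slope near zero.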

Q3. How well does the model match subjective dividend growth expectations in the survey data?

The model-implied 1-year subjective dividend growth forecast is estimated by minimizing RMSE against the consensus dividend growth forecast series (extended from De La O and Myers 2021 to 2019Q4, with a replication correlation of 0.92 over the overlapping sample). The unconditional correlation between model-implied and data 1-year forecasts is 0.80. Although only 1-year forecasts are used in estimation, the model also achieves a correlation of 0.80 for 2-year forecasts, providing an out-of-sample validation.

Q4. What explains the higher volatility of short-term equity yields relative to long-term equity yields?

Short-term subjective dividend growth expectations are more volatile because the agent’s short-run expectation mean-reverts toward the less volatile long-run (levered) GDP growth expectation. In the model’s two-component dividend structure, the transitory dividend-share component x_d has persistence ρ_d = 0.94 and its effect on equity yields decays as maturity increases (via the factor (1 − ρ_d^n)/n). Similarly, the effect of the transitory GDP growth state x_g decays with maturity. Long-term equity yields are thus anchored by the slower-moving trend components µ_g and µ_d. In the data from Giglio et al. (2021), 1-year yields have a standard deviation of 8.89% annualized vs. 2.70% for 10-year yields; the model generates 8.22% and 1.89% respectively.
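
The maturity decay can be seen by evaluating the quoted factor at a few horizons. This sketch assumes that n counts periods at the same frequency as the persistence parameter; the paper's full loading may include further constants that this summary does not report.

```python
# Maturity decay of the loading on a transitory AR(1) state.
# Assumption: n counts periods at the same frequency as rho. The paper's
# full loading may include additional constants not shown in this summary.
def transitory_loading(rho, n):
    """The factor (1 - rho**n) / n for maturity n."""
    return (1 - rho ** n) / n


rho_d = 0.94
maturities = (1, 4, 20, 40)
loadings = [transitory_loading(rho_d, n) for n in maturities]
for n, value in zip(maturities, loadings):
    print(n, round(value, 4))
```

The factor declines monotonically with n, so a given shock to the transitory state moves long-maturity yields less than short-maturity yields.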

Q5. What is the quantitative importance of the “real growth correlation” channel vs. the “inflation real effect” channel in explaining the bond-stock correlation switch?

For the switch in bond-stock return correlation (using the 10-year nominal bond and various maturity dividend strips), the decomposition in Table 4 shows that the “real growth correlation” channel accounts for 89.1% (5Y strip), 92.1% (7Y strip), and 94.8% (10Y strip) of total bond-stock covariance changes, while the “inflation real effect” (correlation between expected inflation and expected real growth) accounts for 27.3%, 29.3%, and 31.1% respectively. The “volatility of shocks to expected inflation and real growth” makes a negative contribution (−16.4%, −21.4%, −25.9%), mostly attributable to more volatile beliefs during the 2008 global financial crisis. The real growth correlation channel reflects that after 2000, real bonds provide a better hedge to aggregate real dividend risks because real GDP growth expectations and real dividend growth expectations became more positively correlated.
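
Because two of the channels each exceed a quarter of the total and one is negative, it helps to confirm that the three reported shares add up. The sketch below uses only the percentages quoted above; the dictionary keys are illustrative labels.

```python
# Shares of the change in bond-stock return covariance (percent), as quoted above.
decomposition = {
    "5Y": {"real_growth_corr": 89.1, "inflation_real_effect": 27.3, "shock_volatility": -16.4},
    "7Y": {"real_growth_corr": 92.1, "inflation_real_effect": 29.3, "shock_volatility": -21.4},
    "10Y": {"real_growth_corr": 94.8, "inflation_real_effect": 31.1, "shock_volatility": -25.9},
}

# Sum the three channels for each strip maturity, rounded to one decimal.
totals = {strip: round(sum(parts.values()), 1) for strip, parts in decomposition.items()}
print(totals)
```

Each maturity sums to 100%, so the three channels account for the full covariance change, with the volatility term offsetting part of the other two.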

Q6. Does the same real growth correlation story hold for the “Fed model” (bond-stock yield correlation)?

Yes, but with a quantitatively different balance. For yield correlations (Table 5), the “real growth correlation” channel accounts for 72.4%–80.1% of bond-stock yield covariance changes (5Y to 10Y strip), while the “inflation real effect” now accounts for 41.2%–43.9%. The inflation real effect is proportionally larger for yield levels because persistent expected inflation correlates strongly with the level of expected real GDP growth — even though inflation expectations do not move fast enough at high frequency to explain return correlation, they co-move strongly with expected growth at low frequency.

Q7. How does the model generate a downward-sloping term structure of return predictability?

The strip excess return is decomposed into three components (Equation 44): maturity-matched bond excess return (Bond), dividend forecast error within the holding period (FE), and forecast revision regarding dividend growth after the holding period (FR). For short maturities, bond predictability contributes little (R² ≈ 6.7% for 5Y strip), while FE predictability (R² ≈ 31.5%) and FR predictability (R² ≈ 35.6%) dominate. As maturity increases, the current news has smaller impact on long-term dividend expectations, reducing the predictability of FE (R² ≈ 26.6% for 10Y) and FR (R² ≈ 26.5% for 10Y). Taken together, total model-implied strip R² declines from 14.2% (5Y) to 10.4% (10Y), matching the data pattern (20.2% to 14.5%). The paper identifies forecast revision predictability as a new channel not previously documented.

Q8. Why do forecast errors and forecast revisions have opposite signs in the predictability regressions?

Bad news (high equity yields, i.e., low current stock prices) triggers excessively pessimistic subjective dividend growth expectations because the agent overreacts to dividend news. These overly pessimistic forecasts tend to be disappointed in the future — actual dividend realizations exceed the forecast — producing positive subsequent forecast errors (FE is positively predicted by high yields, with R² ≈ 31.5% for 5Y strips). However, as dividend levels mean-revert, higher subsequent realizations cause the agent to revise down the forecast for dividend growth thereafter, leading to negative forecast revisions (FR is negatively predicted by high yields, with R² ≈ 35.6% for 5Y strips, opposite sign from FE). The net effect on return predictability is thus a combination of positive (FE) and negative (FR) contributions.

Q9. How does the model handle the aggregate market dividend-price ratio and its persistence?

The aggregate stock price is modeled as the sum of dividend strip prices up to a stochastic horizon H_t, which is parameterized as a linear function of the value-weighted average of LTG forecasts: H_t = a + b·LTG_t. Parameters a and b are estimated by minimizing RMSE between model-implied and data log dividend-price ratio. The model-implied ratio achieves a correlation of 0.86 with data, an AR(1) coefficient of 0.96 (data: 0.95), and an annualized volatility of 26% (data: 30%). The time-variation is driven entirely by strip yield variations and exogenous LTG movements.

Q10. Is the overreaction to dividend news and underreaction to GDP/inflation news consistent in a single framework?

Yes. The model’s subjective learning framework (based on “belief in the law of small numbers”) generates both over- and underreaction depending on the estimated subjective learning gain relative to the Kalman gain. For GDP growth and inflation, the learning gains (ν*_g = 0.012, νgap_g = 0.065; ν*_π = 0.049, νgap_π = 0.228) are below their Kalman gains (0.29 and 0.67 for GDP components; 0.67 and 0.48 for inflation components), producing underreaction. The paper hypothesizes this is related to the Fed’s dual mandate: agents rationally assign lower weight to GDP and inflation shocks expecting the Fed will stabilize them. For dividend growth, a level-to-growth translation converts level underreaction into growth overreaction.

Q11. What are the robustness checks, and what do they show?

The paper checks three alternative equity duration measures: those from Dechow et al. (2004), Weber (2018), and Gonçalves (2021b), as well as the book-to-market ratio following Lettau and Wachter (2007). Table IA.1 shows that replacing LTG with these measures still produces model-implied equity yields that replicate key data moments with high time-series correlations. Changing the cross-sectional breakpoint for long-duration dividends from the median LTG to the 40th or 60th percentile leaves results similar. The paper also presents an Internet Appendix extension in which the agent has ambiguity about real GDP and dividend growth (model misspecification fear), yielding equity yields and returns even closer to data.

Q12. What is the paper’s contribution to the bond market relative to Zhao (2020)?

The bond pricing block closely follows Zhao (2020), inheriting its explanatory power for bond market stylized facts. The model’s 1-year and 10-year nominal bond yields achieve correlations of 0.92 and 0.95 with data, respectively. The new contribution is the joint model covering both equity and bond markets simultaneously, enabling the decomposition of bond-stock covariance and the identification of the real growth correlation as the dominant driver of the bond-stock correlation switch — a channel not addressed by Zhao (2020), which focused on bond market puzzles alone.

Key Concepts

Equity Yield (Dividend Strip Yield). Defined as ey^(n)_t = (1/n)(d$_t − p^(n)_t), where p^(n)_t is the log price of the n-period dividend strip (a claim to the nominal dividend n periods ahead) and d$_t is the log nominal aggregate dividend. It decomposes into the bond yield, a subjective dividend growth component, and a (constant) risk premium component.

Belief in the Law of Small Numbers. A cognitive bias (Tversky and Kahneman 1971) in which the agent perceives small samples to represent their population as well as large samples. Modeled by exaggerating the likelihood in Bayesian updating: p(x_t|I_t) ∝ p(y_t|x_t)^{1+θ} × p(x_t|I_{t-1}). This generates a subjective learning gain ν that can exceed the Kalman gain (overreaction) or fall below it (underreaction) depending on θ and the signal-to-noise ratio.

Subjective Learning Gain. The coefficient ν in the subjective Kalman filter update ẽ_t x_t = ρẽ_{t-1}x_{t-1} + ν(y_t − ρẽ_{t-1}x_{t-1}). It equals (1+θ)P̃ / [(1+θ)P̃ + σ²_ε], where P̃ is the subjective predictive variance. When ν > K (the rational Kalman gain), the agent overreacts to news; when ν < K, the agent underreacts.
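A minimal Python sketch of the gain formula and one filter step may help fix ideas. It assumes the subjective predictive variance P̃ is simply given as a number (in the model it is itself determined by the filter), and the function names and numerical inputs are hypothetical, chosen only for illustration.

```python
def learning_gain(theta: float, pred_var: float, noise_var: float) -> float:
    """nu = (1 + theta) * P / ((1 + theta) * P + sigma_eps^2)."""
    scaled = (1.0 + theta) * pred_var
    return scaled / (scaled + noise_var)


def update(prev_est: float, y: float, rho: float, nu: float) -> float:
    """One filter step: prior forecast plus gain times the surprise."""
    prior = rho * prev_est
    return prior + nu * (y - prior)


# Hypothetical inputs (not taken from the paper).
P, SIGMA2, RHO = 0.5, 1.0, 0.9
gain_standard = learning_gain(0.0, P, SIGMA2)   # theta = 0: standard gain for this P
gain_inflated = learning_gain(1.0, P, SIGMA2)   # theta > 0: more weight on the new signal
new_estimate = update(prev_est=1.0, y=2.0, rho=RHO, nu=gain_inflated)
```

Holding P̃ fixed, θ = 0 reproduces the standard gain P̃/(P̃ + σ²_ε), and θ > 0 raises the gain above it, so the same surprise moves the estimate further.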

Long-Duration Dividend Component. The portion of aggregate real dividend (dl_t) attributable to “long-duration” firms — those with above-median analyst LTG forecasts in CRSP/Compustat/IBES data. Levered on log real GDP with leverage parameter λ = 3, it carries aggregate risk. The complementary short-duration dividend share ds_t is stationary and carries no aggregate risk. The decomposition allows the model to exploit cross-sectional cash-flow duration information when learning about future aggregate dividend growth.

Real Growth Correlation Channel. A bond-stock covariance component defined as Cov(RGDP^(N), RDIV^(n)), where RGDP^(N) is the real GDP growth expectation component of 10-year nominal bond returns and RDIV^(n) is the real dividend growth expectation component of n-period strip returns. This channel captures whether real bonds hedge aggregate real dividend risks. The paper shows this channel accounts for approximately 89–95% of the post-2000 bond-stock covariance change for dividend strips.

Inflation Real Effect. The covariance component Cov(INFL^(N)_B, RGDP^(n) + RDIV^(n)), which captures the comovement between shocks to expected inflation (embedded in nominal bond returns) and shocks to expected real growth (in strip returns). In the paper’s framework this is distinct from the standard inflation risk premium story, as it concerns the correlation between subjective beliefs rather than realized covariances under the physical measure.

Forecast Error (FE) and Forecast Revision (FR) Predictability. Two of three components of realized strip excess return (Equation 44). FE = ∆d$_{t+1:t+h} − ẽ_t∆d$_{t+1:t+h} is the realized dividend growth forecast error within the holding period; FR = (ẽ_{t+h} − ẽ_t)∆d$_{t+h+1:t+n} is the forecast revision for dividend growth beyond the holding period. Because the agent overreacts to dividend news, bad news triggers overly pessimistic forecasts (positive subsequent FE) and, as dividends mean-revert, downward forecast revisions (negative FR). These two have opposite signs in predictive regressions, generating the downward-sloping term structure of return predictability.

Fed Model. The empirical positive correlation between equity yields (real) and nominal bond yield levels. The paper shows that this yield-level correlation switched from strongly positive (≈ 0.85 before 2000) to significantly negative (≈ −0.60 to −0.62 after 2000) for 5Y–10Y dividend strips, and that the same real growth correlation and inflation real effect decomposition applies, albeit with the inflation real effect proportionally larger (≈ 40%) for yield levels than for returns (≈ 30%) because persistent inflation expectations co-move with the level of expected real GDP growth.

Explicit consumption functions with borrowing constraints: A continuous-time approach

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

Research question. The paper asks whether an explicit, global, closed-form solution exists for the consumption function in the standard income fluctuation problem with a borrowing constraint and constant income, a problem that has resisted closed-form solution since at least Schechtman (1976). All prior continuous-time work (Park 2006, Holm 2018, Fischer 2024) produced only implicit expressions; Achdou et al. (2022) produced explicit expressions valid only locally, near zero assets or as assets diverge to infinity, and only for r > 0.

Model. A single agent with CRRA utility (coefficient of relative risk aversion γ > 0) maximizes discounted utility over an infinite horizon, subject to the flow budget constraint da/dt = ra + y − c, with a borrowing constraint a(t) ≥ 0. The agent receives a constant, deterministic income stream y ≥ 0 and discounts at rate ρ, with the impatience condition ρ > r maintained throughout. The paper takes a continuous-time formulation arrived at by letting the discrete period length Δ → 0, nesting Helpman (1981)’s discrete-time analysis as a special case.

Key analytical device. A one-to-one mapping exists between initial assets a and the time T it takes for the consumer to fully run down her assets. This map, denoted T = h(a; y), is well-defined, strictly increasing, and strictly concave in a (established in Proposition 1 via the Hadamard-Lévy theorem). Because optimal consumption declines at the constant rate (ρ − r)/γ until it reaches y at time T, consumption at t = 0 is c*(a; y) = y · exp((ρ − r)h(a;y)/γ), which equals y · exp(ρh(a;y)/γ) when r = 0. The problem thus reduces to explicitly inverting the transcendental equation relating a to T.

Main result (r = 0). For the case of a zero net real interest rate, the transcendental equation can be solved explicitly using the second branch W₋₁(·) of the Lambert W function. The closed-form consumption function is (Theorem 2 and Corollary 2.1):

c*(a; y) = y · exp(ρ h(a;y) / γ), where h(a; y) = −(a/y + γ/ρ) − (γ/ρ) W₋₁(f(a;y)), and f(a;y) = −exp(−b(a + γy/ρ)/y), b := ρ/γ.
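The formula can be evaluated numerically. The sketch below is a pure-Python illustration for the exact r = 0 case, using the illustrative γ, ρ and y quoted in the text. The helper `lambert_wm1` is a hypothetical hand-rolled routine (not the paper's code): it substitutes w = −u and x = −exp(−k), so that w·e^w = x becomes u − ln u = k, and solves that by Newton's method. A library implementation of the lower Lambert W branch could be used instead.

```python
import math


def lambert_wm1(x: float) -> float:
    """Lower real branch W_{-1}(x) for -1/e <= x < 0 (w <= -1, w*exp(w) = x)."""
    if x >= 0.0:
        raise ValueError("W_{-1} is real only for -1/e <= x < 0")
    k = -math.log(-x)                      # x = -exp(-k), so k >= 1 on the domain
    if k < 1.0 - 1e-12:
        raise ValueError("x is below the branch point -1/e")
    excess = max(k - 1.0, 0.0)
    if excess < 1e-10:
        # Series around the branch point, where Newton's derivative vanishes.
        p = math.sqrt(2.0 * excess)
        return -1.0 - p - p * p / 3.0
    u = 2.0 * k                            # start to the right of the root
    for _ in range(100):
        step = (u - math.log(u) - k) / (1.0 - 1.0 / u)
        u -= step
        if abs(step) <= 1e-14 * u:
            break
    return -u


def depletion_time(a: float, y: float, gamma: float, rho: float) -> float:
    """h(a; y) = -(a/y + gamma/rho) - (gamma/rho) * W_{-1}(f(a; y)), for r = 0."""
    b = rho / gamma
    f = -math.exp(-b * (a + gamma * y / rho) / y)
    return -(a / y + gamma / rho) - (gamma / rho) * lambert_wm1(f)


def consumption(a: float, y: float, gamma: float, rho: float) -> float:
    """c*(a; y) = y * exp(rho * h(a; y) / gamma), for r = 0."""
    return y * math.exp(rho * depletion_time(a, y, gamma, rho) / gamma)


# gamma, rho, y as quoted in the text; r is set to 0 so the formula is exact.
Y, GAMMA, RHO = 3.0, 0.5, 0.08
c_at_zero = consumption(0.0, Y, GAMMA, RHO)    # with no assets the agent consumes y
c_at_ten = consumption(10.0, Y, GAMMA, RHO)
```

With zero assets the depletion time is zero and consumption equals income; for a > 0 consumption exceeds y and rises with a.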

This is a global solution (valid for all a ≥ 0), in contrast to the local solutions in prior work. For the illustrative parameter values r = 0.01, γ = 0.5, ρ = 0.08, y = 3 (with r = 0.01 described as broadly consistent with the average U.S. real interest rate in 2025), the paper notes a visually sizable gap between the constrained and unconstrained consumption functions at finite asset levels; the two converge only as a → ∞ (in line with the asymptotic linearity result of Benhabib et al. 2015).

Main result (r > 0). For positive interest rates, the Lambert W function cannot invert a sum of exponentials with different exponents (an open mathematical problem). The paper instead derives a global closed-form approximation valid for r ∼ 0, by expanding e^(−rT) ≈ 1 − rT to first order and applying the same Lambert W inversion. The approximating consumption function has the same structural form but with modified coefficients b_r, c_r, d_r that collapse to their r = 0 counterparts as r → 0 (Proposition 2). Numerical comparison against the implicit-expression solution of Park (2006) confirms the approximation is close for small r.

Characterization of the MPC and supermodularity (Section 3). Leveraging the explicit expression, the paper derives the full Jacobian vector and Hessian matrix of c*(a; y) in closed form (Propositions 3 and 4). Key findings, all proved formally and holding under the impatience condition ρ > r:

Consumption is increasing in both assets and permanent income (both entries of the Jacobian are strictly positive — Corollary 2.2). The second result (∂c*/∂y > 0 for all a) is new for the borrowing-constrained setting; Achdou et al. (2022) provided only suggestive evidence for the limiting case a ∼ 0.
Consumption is strictly concave in both assets and permanent income (both diagonal entries of the Hessian are strictly negative — Corollary 2.3). Concavity in assets was known (Carroll and Kimball 1996); concavity in permanent income under borrowing constraints is new.
The consumption function is supermodular: the cross-derivative ∂²c*/∂a∂y is strictly positive (Corollary 2.3). This means assets and permanent income are complements in generating consumption. Equivalently, the MPC out of permanent income is strictly increasing in the level of initial assets — a counter-intuitive result, since high MPCs are usually associated with poor (low-asset) agents. An identical result was obtained by Commault (2025) for a life-cycle model without borrowing constraints; the current paper confirms it holds in the presence of a borrowing constraint. By symmetry of the Hessian, the MPC out of assets is also strictly increasing in permanent income.

Intuition for supermodularity. When assets are low, an increase in permanent income produces little additional consumption because the risk of hitting the borrowing constraint is high. When assets are higher, the agent has buffer savings, faces a lower constraint-risk, and can smooth the higher future income stream into current consumption.

Scope conditions. Results are derived under CRRA utility, constant (deterministic) income, no stochastic variation, and the impatience condition ρ > r. The exact closed form applies to r = 0; the approximation is characterized as valid for r ∼ 0 and is not a local expansion in assets.

In depth

Q1. What is the longstanding gap in the literature that this paper addresses?

A: Since Zeldes (1989) noted that no closed-form solution exists for the consumption function with stochastic income and CRRA utility, researchers settled for numerical solutions or local analytical approximations. In the constant-income/borrowing-constraint version studied here, Park (2006), Holm (2018), and Fischer (2024) derived only implicit continuous-time expressions. Achdou et al. (2022) gave explicit local solutions valid near a ∼ 0 or a → ∞ under r > 0. No prior work produced an explicit, global closed-form for any case.

Q2. Why does moving to continuous time enable progress that discrete time did not?

A: In discrete time, the consumption function is piecewise linear (Helpman 1981), with kinks at the sequence of asset thresholds µ(T) for T = 0, Δ, 2Δ, …. As Δ → 0, the piecewise-linear function converges to a smooth function whose governing ODE can be solved analytically. This convergence to smoothness, illustrated in Figure 1, is what enables the application of the Lambert W function to invert the resulting transcendental equation.

Q3. What is the role of the Lambert W function, specifically its second branch W₋₁?

A: For r = 0, the optimal asset-depletion time T satisfies a transcendental equation that can be written as e^(bT) = bT + bc/y, with b := ρ/γ and c := a + γy/ρ (here c denotes a constant, not consumption), which cannot be solved with elementary functions. Via the change of variables z := −bT − bc/y, the equation reduces to ze^z = α with α = −exp(−bc/y) = f(a;y), whose solution is z = W(α). The argument α lies in (−1/e, 0) for a ∈ (0, +∞), and it is precisely on this interval that the Lambert W function is double-valued; the relevant branch is W₋₁ (the second, lower branch), which is well-defined and strictly less than −1 on (−1/e, 0). The properties of W₋₁ on this domain, specifically that 1 + W₋₁(α) < 0, drive the sign conclusions for the Hessian.
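For reference, the reduction can be sketched from the primitives stated in the overview (CRRA utility, r = 0, budget constraint da/dt = y − c, and consumption equal to y once assets are exhausted). The normalization below is one convenient choice and may differ from the paper's own numbered equation; the symbol k is introduced here only for illustration.

```latex
\begin{align*}
c(t) &= y\,e^{b(T-t)}, \qquad b := \rho/\gamma
  && \text{(Euler equation with } r = 0 \text{ and } c(T) = y\text{)} \\
a &= \int_0^T \bigl(c(t) - y\bigr)\,dt
   = \frac{y}{b}\bigl(e^{bT} - 1\bigr) - yT
  && \text{(assets finance consumption above income)} \\
e^{bT} &= bT + k, \qquad
  k := 1 + \frac{b\,a}{y} = \frac{b\,(a + \gamma y/\rho)}{y} \\
z\,e^{z} &= -e^{-k} = f(a;y), \qquad z := -bT - k \\
T &= -\frac{z + k}{b}
   = -\Bigl(\frac{a}{y} + \frac{\gamma}{\rho}\Bigr)
     - \frac{\gamma}{\rho}\,W_{-1}\bigl(f(a;y)\bigr)
\end{align*}
```

Since T ≥ 0 and k ≥ 1, the solution satisfies z = −bT − k ≤ −1, which is why the lower branch W₋₁ is the relevant one. The last line reproduces the expression for h(a; y) given in the overview.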

Q4. Why does the Lambert W approach fail for r > 0, and what is the approximation strategy?

A: For r > 0, Equation (8) contains two exponentials with different exponents — e^((ρ−r)T/γ) and e^(−rT) — and their sum cannot be inverted by the Lambert W function, which handles only a linear-plus-single-exponential structure. Inverting a sum of exponentials with different exponents is stated in the paper to be an open problem. The approximation strategy exploits the fact that for r ∼ 0, e^(−rT) ≈ 1 − rT + o(r), reducing the equation to a single-exponential transcendental form (Equation 15) with modified coefficients b_r, d_r, c_r, all of which converge to their r = 0 analogues as r → 0.

Q5. What does Proposition 1 establish, and why is it necessary before stating the main theorem?

A: Proposition 1 establishes that the mapping µ(T) from depletion time T to initial assets a is smooth (infinitely differentiable), bijective (one-to-one and onto) on ℝ₊, and strictly convex. The Hadamard-Lévy theorem then guarantees that its inverse h(a;y) = µ⁻¹(a) exists, is unique, is strictly increasing, and is strictly concave in a. This is a necessary prerequisite for Theorem 2 because h(a;y) is the central object in the closed-form consumption function; without establishing its existence and uniqueness, Theorem 2 would have no well-defined object.

Q6. What does the Jacobian characterization (Proposition 3 and Corollary 2.2) contribute?

A: Proposition 3 gives explicit formulas for ∂c*/∂a = (ρ/γ) · w/(1+w) and ∂c*/∂y in terms of w = W₋₁(f(a;y)). Corollary 2.2 proves both are strictly positive using the property w < −1 on (−1/e, 0), which ensures w/(1+w) > 0 and that the bracketed term in the expression for ∂c*/∂y is strictly positive. The contribution is that the positivity of ∂c*/∂y for all a was previously unproven in a borrowing-constrained setting with constant income.

Q7. What is the structure of the Hessian matrix and what signs do its entries take?

A: All four entries of Hc are proportional to w/(1+w)³. Since w < −1, we have 1 + w < 0, so (1+w)³ < 0, making w/(1+w)³ > 0. The diagonal elements ∂²c*/∂a² = −(ρ²/γ²y) · w/(1+w)³ and ∂²c*/∂y² = −(ρ²a²/γ²y³) · w/(1+w)³ are both strictly negative (concavity). The off-diagonal elements ∂²c*/∂a∂y = (aρ²/γ²y²) · w/(1+w)³ are strictly positive (supermodularity/complementarity).
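These sign claims can be checked numerically. The sketch below is a hypothetical illustration for r = 0 and a > 0: it obtains w = W₋₁(f(a;y)) by bisection on the equivalent equation u − ln u = k (with u = −w and k = 1 + ρa/(γy)), evaluates the three distinct Hessian entries from the formulas above, and uses the simplification c* = −y·w, which follows from exp(ρh/γ) = −w when r = 0. Function names and the evaluation point are assumptions made for the example.

```python
import math


def w_lower(a: float, y: float, gamma: float, rho: float) -> float:
    """w = W_{-1}(f(a; y)), via bisection on u - ln(u) = k with u = -w >= 1."""
    k = 1.0 + (rho / gamma) * a / y
    lo, hi = 1.0, 2.0 * k              # g(1) = 1 - k <= 0 and g(2k) = k - ln(2k) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) < k:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


def consumption(a: float, y: float, gamma: float, rho: float) -> float:
    """c*(a; y) for r = 0, using c* = -y * w."""
    return -y * w_lower(a, y, gamma, rho)


def hessian(a: float, y: float, gamma: float, rho: float):
    """Return (c_aa, c_ay, c_yy) from the closed-form expressions; requires a > 0."""
    w = w_lower(a, y, gamma, rho)
    s = w / (1.0 + w) ** 3             # positive because w < -1
    b2 = (rho / gamma) ** 2
    c_aa = -b2 / y * s
    c_ay = b2 * a / y ** 2 * s
    c_yy = -b2 * a ** 2 / y ** 3 * s
    return c_aa, c_ay, c_yy


# Evaluation point chosen for illustration; gamma, rho, y as quoted in the text.
c_aa, c_ay, c_yy = hessian(a=10.0, y=3.0, gamma=0.5, rho=0.08)
signs_ok = c_aa < 0 and c_yy < 0 and c_ay > 0
```

A finite-difference approximation of the second derivatives of `consumption` gives an independent check on the closed-form entries.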

Q8. What is the precise counter-intuitive implication of supermodularity for MPC heterogeneity?

A: Supermodularity (∂²c*/∂a∂y > 0) means that the MPC out of permanent income is increasing in the level of initial assets. This runs against the conventional narrative that high MPCs are a hallmark of poor (low-asset) agents. The paper’s intuition is that low-asset agents face high risk of hitting the constraint, suppressing their consumption response to income news, while high-asset agents can freely smooth the increased income stream. The same supermodularity implies, by the symmetry of the Hessian, that the MPC out of assets is also increasing in permanent income.

Q9. How does this result relate to Commault (2025)?

A: Commault (2025) proved, in a life-cycle model with a permanent/transitory stochastic income process but without borrowing constraints, that the MPC out of permanent income is increasing in assets. The current paper obtains the same qualitative finding in the opposite environment — constant income with a borrowing constraint. The paper treats these as complementary, noting that the result thus appears robust to these different modeling choices.

Q10. What does concavity in permanent income (∂²c*/∂y² < 0) add that was not previously known?

A: Carroll and Kimball (1996) established concavity of the consumption function in assets for a broad utility class. Concavity in permanent income, meaning that the marginal consumption response to an increase in y is diminishing, had been proved by Commault (2025) only in the absence of borrowing constraints. The current paper provides the first formal proof of this property in a setting with a borrowing constraint (albeit for constant, deterministic income and CRRA utility in continuous time).

Q11. What is the potential use of these closed-form results for numerical methods?

A: The paper notes in the conclusion that the closed-form solutions for r = 0 and the approximation for r ∼ 0 can serve as benchmarks for assessing the reliability of continuous-time numerical methods when computing objects such as the MPC out of assets. Because the exact solution is known analytically, numerical implementations can be compared against it to detect discretization errors or convergence failures.

Q12. What parameter values are used to illustrate the consumption function, and what do they imply?

A: The paper uses r = 0.01, γ = 0.5, ρ = 0.08, y = 3, where r = 0.01 is described as roughly in line with the average real interest rate in the U.S. in 2025. With these values, Figure 1 shows a visually sizable gap between the constrained and unconstrained consumption functions at low to moderate asset levels, with the two converging as a → ∞ as guaranteed by asymptotic linearity (Benhabib et al. 2015).

Key Concepts

Income fluctuation problem (with borrowing constraint): The standard infinite-horizon single-agent savings problem in which the agent faces a non-negativity constraint on assets (a(t) ≥ 0), so that the agent cannot borrow. In the paper’s formulation: maximize ∫ e^(−ρt)u(c(t))dt subject to da/dt = ra + y − c and a(t) ≥ 0, with constant income y and CRRA utility. The borrowing constraint creates the concavity of the consumption function and was the source of intractability in prior closed-form attempts.

Lambert W function (second branch W₋₁): A special transcendental function defined as the solution to we^w = x. It is double-valued on (−1/e, 0); the second branch W₋₁ takes values strictly less than −1 on this interval. In this paper, the transcendental equation linking initial assets to asset-depletion time is reduced to the form ze^z = α, enabling explicit inversion via W₋₁. The property that 1 + W₋₁(α) < 0 on (−1/e, 0) is the algebraic engine driving all sign results in the Hessian.

Asset-depletion time T = h(a; y): The time it takes for the optimal consumer to fully run down her initial assets before settling into perpetual income consumption of y. The paper establishes a bijective mapping from initial assets a to depletion time T (Proposition 1); the closed-form solution is obtained by explicitly inverting this mapping. In the paper’s formulation, h(a; y) = µ⁻¹(a) where µ(T) is derived from the ODE governing the consumption path.

Supermodularity of the consumption function: The property that the cross-derivative ∂²c*/∂a∂y is strictly positive, meaning assets a and permanent income y act as complements in generating consumption. This is an equilibrium property of the consumption function (not an assumption on the utility function), and the paper identifies it as new to the income fluctuation literature. It implies the MPC out of permanent income is increasing in a, and the MPC out of assets is increasing in y.

MPC out of permanent income (∂c*/∂y): The marginal increase in current consumption per unit increase in the constant income stream y, holding initial assets constant. This object is less studied than the MPC out of a transient asset windfall. In the paper’s setting, it is shown to be strictly positive for all a (Corollary 2.2) and, counter-intuitively, strictly increasing in a (supermodularity).

Global vs. local closed-form solution: A global solution holds for all values of the state variable (here, all a ≥ 0), while a local solution is valid only in the neighborhood of a particular value (e.g., a ∼ 0 or a → ∞). Achdou et al. (2022) produced local closed-form expressions; the current paper’s Theorem 2 (r = 0) is the first global explicit closed-form for this class of problems.

Piecewise-linear consumption function (discrete time): In Helpman (1981)’s discrete-time formulation with period length Δ = 1, the optimal consumption function is piecewise linear in assets, with slope changes at the asset thresholds µ(T) for integer T. As Δ → 0, this becomes a smooth function, enabling the passage to the continuous-time closed form derived in the paper.

Failing Banks

Mon, 01 Jan 0001 00:00:00 +0000

Correia, Luck, and Verner ask a foundational question in banking: why do banks fail? Specifically, they seek to adjudicate between two theoretical views — the solvency view (failures caused by deteriorating asset quality and insolvency) and the bank runs view (failures caused by depositor coordination failure that can bring down otherwise solvent banks) — using the longest micro-level panel of U.S. commercial bank balance sheets assembled to date.

The authors construct a panel covering approximately 37,000 distinct banks across two samples: a historical sample of all national banks from 1863 to 1941 (sourced from OCC Annual Reports, digitized via OCR) and a modern sample of all commercial banks from 1959 to 2024 (from FFIEC Call Reports merged with the FDIC failure list). More than 5,000 banks fail across the full sample, with 2,887 failures before 1935 and 2,233 after 1959. The sample spans institutional regimes before and after the Federal Reserve (founded 1913) and the FDIC (founded 1933/1934).

Three sets of findings emerge. First, failing banks are characterized by deteriorating fundamentals well before failure: rising non-performing loans and declining solvency (equity-to-assets falls by 8 percentage points in the five years before failure in the modern sample), increasing reliance on expensive noncore funding (rising by 18% of assets in the decade before modern-era failures), and a boom-bust pattern in real assets (expanding by 34% from ten years to three years before failure before contracting). These patterns are consistent across the pre-FDIC and modern eras.

Second, bank failures are highly predictable from publicly available accounting data. Using simple regression models with insolvency risk, noncore funding reliance, and asset growth as predictors, the area under the ROC curve (AUC) for predicting failure within one year reaches 86% in the historical sample and 90–95% in the modern sample. Pseudo-out-of-sample performance is nearly as strong as in-sample performance. A bank in the top 5 percent of both insolvency risk and noncore funding vulnerability faces a three-year failure probability of 27% in both the historical and modern samples, compared to unconditional rates of 2.5% (historical) and 1% (modern), which is roughly an 11-fold (historical) to 27-fold (modern) increase.
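As a reminder of what the AUC statistic measures, the sketch below computes it through the rank-sum identity: the AUC is the probability that a randomly chosen failing bank receives a higher risk score than a randomly chosen survivor, with ties counted as one half. The scores and labels are hypothetical toy values, not data from the paper, and the function is a generic illustration rather than the authors' estimation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count one half).

    scores: predicted failure risk, higher means riskier
    labels: 1 if the bank failed, 0 if it survived
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one failure and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: six banks, two of which fail.
toy_scores = [0.90, 0.80, 0.60, 0.35, 0.20, 0.10]
toy_labels = [1, 0, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to a score with no discriminating power and 1.0 to perfect ranking, which is the scale on which the 86% and 90–95% figures should be read.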

Third, while large deposit outflows consistent with bank runs were common in pre-FDIC failures — deposits declined on average by 14% immediately before failure in 1880–1934, and by 21% in the period before the banking holiday — failures with runs are as predictable as failures without runs, and they occur in banks with similarly weak fundamentals. Recovery rates on failed banks’ assets averaged only 52% of book value in pre-FDIC failures. Using a framework comparing recovery rates to leverage, the majority of pre-FDIC failed banks appear to have been fundamentally insolvent. Even under the extreme assumption of zero value destruction from failure, runs on banks that were not fundamentally insolvent account for fewer than 8% of pre-FDIC failures; under an assumption of 20% value destruction from failure, this share rises to 22%.

OCC bank examiners classified fewer than 2% of pre-FDIC failures as caused by runs or liquidity issues; most were attributed to losses, fraud, or external shocks. The aggregate failure rate is also largely predictable: regressing the actual bank failure rate on predicted aggregate failure risk yields an R-squared of 40%.

Scope conditions: the historical sample covers only national banks (market share ranging from ~80% in the 1870s to ~45% in the 1930s); the modern sample excludes de novo banks (younger than three years); deposit outflow data for the historical period begin in 1880; and FDIC failure transaction data for the modern period begin in 1993.

Q: What are the two main theoretical views the paper evaluates, and how does the paper distinguish between them? A: The solvency view holds that bank failures are caused by deteriorating asset quality and insolvency, with the runnable nature of liabilities playing no essential causal role. The bank runs view holds that the runnable nature of demandable deposits is central, with depositor coordination failure capable of bringing down otherwise solvent banks (Diamond and Dybvig, 1983) or weak-but-solvent banks (Goldstein and Pauzner, 2005). The paper distinguishes between them using three empirical tests: predictability of failures from fundamentals, deposit outflows before failure, and asset recovery rates in failure.

Q: How predictable are bank failures, and what does predictability imply for the bank runs view? A: In the historical pre-FDIC sample (1863–1934), the in-sample AUC for predicting failure within one year is 86%; in the modern sample (1959–2024) it is 90–95%. Pseudo-out-of-sample AUC is nearly as strong as in-sample AUC. High predictability is consistent with the solvency view and fundamental-based panic run models, but is inconsistent with non-fundamental self-fulfilling runs (Diamond and Dybvig, 1983), which should strike randomly. Predictability also cuts against the assumption of rational, forward-looking depositors in fundamental-run models, since attentive depositors would act on observable signals and accelerate failure, reducing predictability.

Q: What is the boom-bust pattern in failing banks’ assets? A: Failing banks’ real total assets expand by 34% from ten years to three years before failure, then contract over the final two years. The boom-and-bust pattern is present in both the historical and modern samples but is more pronounced in the modern period. The boom is driven primarily by loan growth (particularly real estate lending and C&I lending in the modern sample) rather than by growth in liquid assets, consistent with the view that rapid credit expansion produces future credit losses.

Q: How does noncore funding behave in failing banks, and why does it matter? A: In failing banks in the modern sample, noncore funding (time deposits plus wholesale funding) rises by 18% of assets over the decade before failure, while demand deposits decline as a share of assets. In the historical sample, noncore (wholesale) funding also rises gradually. Noncore funding is a signal of failure for multiple reasons: it is more expensive than core deposits, eroding profitability; it can finance risky asset growth; it reflects realized losses being funded at the margin; and it increases funding fragility, making banks more vulnerable to shocks.

Q: How strong is the joint signal from insolvency and noncore funding? A: A bank in the top 5 percent of both insolvency risk and noncore funding vulnerability faces a three-year failure probability of 27% in both the historical and the modern sample. The unconditional three-year failure probability is 2.5% in the historical sample and 1% in the modern sample. This amounts to roughly an 11-fold (historical) to 27-fold (modern) increase in failure probability, illustrating that the combination of solvency and funding weakness is a powerful joint predictor.
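The implied multiples follow directly from the probabilities quoted in the answer; the short calculation below uses only those three numbers (variable names are illustrative).

```python
# Three-year failure probabilities quoted in the text.
joint_risk = 0.27
unconditional = {"historical": 0.025, "modern": 0.01}

# Ratio of conditional to unconditional failure probability per sample.
fold_increase = {sample: joint_risk / base for sample, base in unconditional.items()}
```

The ratio is larger in the modern sample only because the unconditional failure rate there is lower; the conditional probability is the same in both samples.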

Q: Were deposit outflows common before the FDIC, and did they decline after its introduction? A: In the 1880–1934 historical sample, deposits in failing banks declined on average by 14% between the last call report and failure, with 25% of pre-FDIC failures preceded by outflows exceeding 20%; during the period before the banking holiday the average deposit decline was 21%. In contrast, in the modern sample (1993–2024), average pre-failure deposit outflows were only 2.5%, and outflows exceeding 20% occurred in only 3% of failures, consistent with deposit insurance insulating most depositors.

Q: Are failures with large deposit outflows (runs) less connected to weak fundamentals than other failures? A: No. The paper finds that failures with large deposit outflows are as predictable as failures without large deposit outflows. The relationship between insolvency risk or noncore funding and three-year failure probability is similar for failures with and without large deposit outflows. This implies that runs did not disproportionately strike banks with otherwise strong fundamentals.

Q: What do asset recovery rates reveal about the insolvency status of pre-FDIC failed banks? A: Recovery rates on pre-FDIC failed banks averaged 52% of book value of assets. Under the extreme assumption that receivership destroys zero bank value, runs on non-fundamentally-insolvent (weak but solvent) banks account for fewer than 8% of pre-FDIC failures. Under the equally extreme assumption that failure destroys 20% of bank value, this share rises to 22%. The majority of pre-FDIC failed banks therefore appear to have been fundamentally insolvent.
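One hypothetical way to read the recovery-rate comparison is sketched below; the paper's exact definitions may differ. The assumption made here is that a bank counts as solvent absent failure if the value recovered in receivership, scaled up by the share of value assumed to be destroyed by failure itself, would have covered its debt. The function name, the decision rule, and the debt-to-assets inputs are illustrative; only the 52% average recovery rate and the 0% and 20% destruction scenarios come from the text.

```python
def solvent_absent_failure(recovery_rate: float,
                           debt_to_assets: float,
                           value_destruction: float) -> bool:
    """True if assets would have covered debt had the bank not failed.

    recovery_rate: recovered value / book assets (observed in receivership)
    debt_to_assets: debt claims / book assets (leverage)
    value_destruction: assumed share of asset value lost because of failure
    """
    if not 0.0 <= value_destruction < 1.0:
        raise ValueError("value_destruction must lie in [0, 1)")
    value_absent_failure = recovery_rate / (1.0 - value_destruction)
    return value_absent_failure >= debt_to_assets


# Average recovery rate from the text, with a hypothetical leverage of 0.75.
typical_no_destruction = solvent_absent_failure(0.52, 0.75, 0.0)
typical_with_destruction = solvent_absent_failure(0.52, 0.75, 0.2)

# Hypothetical bank with a higher recovery rate (0.70) and leverage of 0.80.
strong_no_destruction = solvent_absent_failure(0.70, 0.80, 0.0)
strong_with_destruction = solvent_absent_failure(0.70, 0.80, 0.2)
```

Under this reading, allowing for value destruction can only move banks from the insolvent to the solvent group, which is why the share of failures attributable to runs on solvent banks rises from the 0% scenario to the 20% scenario.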

Q: What did contemporary OCC bank examiners attribute as the causes of bank failures? A: OCC bank examiners classified most pre-FDIC failures as caused by losses, fraud, or external economic shocks. Runs and liquidity issues together account for fewer than 2% of OCC-classified failures, notwithstanding the common occurrence of large deposit outflows before many of these failures. This examiner evidence supports the solvency view.

Q: Can bank-level fundamentals predict systemic banking crises and aggregate failure waves? A: Yes. The authors aggregate out-of-sample predicted failure probabilities to construct a predicted aggregate bank failure rate. The R-squared from regressing the actual aggregate bank failure rate on this predicted rate is 40%, indicating that spikes in bank failures during systemic crises are substantially accounted for by the prior deterioration of bank-level fundamentals.

Q: Why is predictability higher in the modern sample than in the historical sample? A: The authors identify several reasons. Accounting data quality is higher in the modern sample. Historical national banks operated as unit banks (single offices without branch networks) with less geographic diversification, making idiosyncratic shocks more important and harder to predict. Modern-era failures are preceded by larger lending booms that produce more predictable downstream losses. Additionally, in the modern context bank failures are largely supervisory decisions, and frictions in the supervisory process may delay closure and thereby increase predictability.

Q: What role do the authors assign to depositor inattention? A: The high predictability of failures combined with the finding that many failing banks had high predicted failure probabilities before actually failing suggests that depositors were often slow to react to observable signals of bank weakness. The authors note this points to behavioral frictions such as neglect of downside risk (Gennaioli et al., 2012) and sleepy or inattentive depositors (Hanson et al., 2015; Jiang et al., 2023), rather than the rational, forward-looking depositor assumption embedded in standard bank run models.

Q: What is the paper’s overall interpretive conclusion about the relative importance of solvency versus runs? A: The primary cause of bank failures is almost always and everywhere a deterioration of bank solvency. Runs were more common in the historical pre-FDIC data as a mechanism triggering failure, but they typically closed banks that were already fundamentally insolvent. Non-fundamental, self-fulfilling runs on otherwise healthy banks appear to be an uncommon cause of bank failures. Under the solvency view, even when runs occur, they are the trigger and final mechanism rather than the root cause.

Insolvency risk: A bank’s proximity to default, proxied in the historical sample by surplus profits relative to equity (capturing profitability and capitalization) and in the modern sample by net income to assets. High insolvency risk reflects declining profitability and eroding capital buffers.

Noncore funding: Expensive, risk-sensitive funding sources outside core demand deposits, including time deposits, wholesale funding (bills payable, rediscounts), and non-deposit wholesale borrowings. Banks relying heavily on noncore funding face higher funding costs, reduced profitability, and greater fragility to funding shocks.

Fundamental run: A run triggered when bank fundamentals are so weak (theta at or below the lower threshold in the Goldstein-Pauzner framework) that all depositors have an incentive to withdraw regardless of others’ actions — the bank is effectively insolvent and failure is inevitable.

Panic-based run: A run triggered when bank fundamentals are moderately weak (above the lower threshold but below the equilibrium run threshold in Goldstein-Pauzner) but the bank would have been able to pay all creditors absent the run; the run itself destroys value and causes failure.

Non-fundamental (self-fulfilling) run: A run on an otherwise solvent bank driven purely by depositor coordination failure, as in Diamond and Dybvig (1983); failure arises from one of two equilibria and is not predicted by fundamentals.

Recovery rate: Funds ultimately collected by the receiver throughout receivership proceedings divided by the book value of assets at suspension; used as a proxy for the degree of fundamental insolvency at failure. Pre-FDIC recovery rates averaged 52% of book value.

Area Under the ROC Curve (AUC): A measure of binary classification performance used to quantify the predictability of bank failures; an uninformative predictor has an AUC of 0.5, while an AUC of 1.0 indicates perfect classification. In this paper, AUC ranges from 0.86 (historical, one-year horizon) to 0.95 (modern).

Boom-bust pattern: The systematic tendency of failing banks to experience rapid loan-driven asset growth in the years preceding failure followed by asset contraction in the final two years before failure — present in both the historical and modern samples, more pronounced in the latter, with real assets expanding by 34% from ten to three years before failure.
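
The AUC entry in the glossary above can be made concrete with a small sketch. It uses the standard pairwise-ranking interpretation of AUC (the probability that a randomly chosen failing bank receives a higher predicted score than a randomly chosen survivor, with ties counted as one half). The scores and labels are hypothetical and are not taken from the paper.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of failing (label 1) and surviving (label 0) banks."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one failure and one survivor")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted failure probabilities and realized outcomes.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
example_auc = auc(scores, labels)  # 5 of 6 failure/survivor pairs are ordered correctly
```

The pairwise form is quadratic in sample size, so it is suited to illustration rather than to panels with millions of bank-year observations.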

Financial Frictions: Micro versus Macro Volatility


Overview

Research Question. How do consumer credit spreads — the gap between household borrowing rates and deposit rates — affect aggregate business cycle dynamics and the distribution of consumption across the wealth distribution? And what is the welfare trade-off between macroeconomic stabilization and household-level consumption volatility when bank capital requirements are tightened?

Data and Empirical Approach. The empirical analysis draws on Danish administrative register data for 2003–2018, combining approximately 15.5 million household-year observations. Income tax return data, which capture housing wealth, portfolio wealth, bank deposits, and bank and mortgage debt, are merged with bank-level reporting of interest rates submitted to Danmarks Nationalbank (MFI data). Household-specific credit spreads are constructed as the difference between the loan rate at a household’s primary loan bank and the deposit rate at its primary deposit bank in a given year. Consumption is imputed from household balance sheets following the method of Crawley and Kuchler (2023). The empirical specifications include household and time fixed effects, and quantile regressions are run across bins of the net wealth distribution.

Model. The authors develop a Heterogeneous Agent New Keynesian (HANK) model with explicit banking intermediation. Banks, subject to an agency friction following Gertler and Karadi (2011) — in which bankers can divert a fraction λ = 0.381 of assets — combine household deposits with net worth to invest in corporate equity and consumer loans. This leverage constraint generates an endogenous, countercyclical spread between borrowing and saving rates. Households face idiosyncratic income risk and a kink in their budget constraint at zero net worth due to the spread. The supply side features New Keynesian sticky prices (Rotemberg quadratic adjustment costs) and a Taylor rule. Aggregate shocks include monetary policy surprises, total factor productivity (TFP), and capital quality shocks (affecting bank net worth). The model is solved by first-order perturbation using the method of Bayer and Luetticke (2020) and calibrated to Danish macro and micro moments for 2003–2018.

Main Empirical Findings.

The average consumer credit spread in Denmark is strongly countercyclical, with a cross-correlation with HP-filtered output of −0.44 in the data (−0.31 in the model).
Higher credit spreads increase the transition rate into the zero net wealth state for households with moderately positive wealth at the beginning of the year, and reduce the outflow rate for households already at zero net wealth.
Pooled OLS (with household and time fixed effects) finds that a higher spread is negatively associated with consumption (coefficient −0.266), and the interaction between spread and log income is positive (coefficient 1.366), indicating that higher spreads raise income sensitivity of consumption. For below-median wealth households, the income–consumption link is stronger and the negative spread effect on consumption is larger.
The consumption-income elasticity derived from quantile regression estimates has a standard deviation of 2.4 percent and a cross-correlation with output of −0.53 when spread variation is incorporated; holding spreads constant roughly halves the volatility (to 1.3 percent) and reduces the countercyclicality (cross-correlation −0.31).

Model Aggregate Findings.

Consumer credit is procyclical (cross-correlation with output 0.56 in data, 0.67 in model) and more than twice as volatile as output (standard deviation ratio 2.11 in data, 1.51 in model).
Capital quality shocks and monetary policy shocks are amplified at the aggregate level through a financial accelerator working through endogenous spread movements. TFP shocks generate little spread amplification because households’ labor supply responses partially insulate banks’ net worth.
A 1 percentage point contractionary monetary policy shock leads to a sharp, persistent decline in aggregate output and investment, and is amplified relative to a constant-spread HANK benchmark.

Distributional Findings.

In response to a contractionary monetary policy shock, consumption of households at the 10th percentile of the consumption distribution (who are indebted) falls sharply in the short run, while consumption of the 90th percentile (wealthy households) rises in the short run due to higher returns on savings. The responses converge across the distribution in the medium run as spreads normalize.
When the consumer credit spread is held constant, consumption paths move in parallel across the wealth distribution, demonstrating that endogenous spread movements are the key driver of distributional effects for monetary policy and capital quality shocks.
The MPC is countercyclical in the model, with an unconditional cross-correlation with output of −0.60, compared with −0.53 for the empirically estimated consumption-income elasticity. In the model, the consumption-income elasticity and the MPC are correlated at 90 percent at annual frequency.

Macroprudential Regulation.

A tightening of bank capital requirements reducing leverage by 10 percent (diversion parameter λ rising from 0.381 to 0.445) reduces output volatility by 5.5 percent and investment volatility by 10.1 percent, and does so at apparently no long-run aggregate cost in the HANK setting (precautionary savings stimulate output and consumption in the stationary equilibrium).
However, the regulation increases the annual consumer credit spread by 40 basis points, raises household consumption volatility across the wealth distribution (from about 8 percent to 10 percent for the poorest households under idiosyncratic shocks alone), and generates welfare losses across all deciles equivalent to 0.24–4.28 percent of consumption (with aggregate welfare loss of 0.79 percent).
When aggregate shocks are included, the lower cyclical sensitivity of spreads partially mitigates welfare losses for the poorest 80 percent of the population, but the overall welfare effect remains negative with an aggregate loss equivalent to 0.58 percent of consumption. The paper thus documents a trade-off between macro volatility (stabilized) and micro volatility (increased).
Results are robust to the extension of the model to three assets (including illiquid assets), which provides a better fit to micro data without materially changing the welfare conclusions.

In depth

Q1. What is the specific Danish dataset used, and how is consumption constructed?

A: The dataset covers 2003–2018 from Statistics Denmark administrative registers, combining income tax return data (which report end-of-year balances on all bank accounts, housing wealth, portfolio wealth, bank deposits, bank loans, and mortgage debt) with bank-level MFI interest rate reporting submitted to Danmarks Nationalbank. The total sample is approximately 15.5 million household-year observations (about 1.76–1.97 million households per year). Consumption is imputed as after-tax labor income plus after-tax financial income minus the change in end-of-year net worth, following Crawley and Kuchler (2023). Households with self-employment, housing transactions in the current or prior year, negative imputed consumption, or in the bottom and top 1 percent of wealth or income distributions are excluded.
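
The imputation identity in this answer can be written out as a short sketch. The function and argument names are hypothetical, and the amounts in the example are invented for illustration.

```python
def imputed_consumption(labor_income_after_tax, financial_income_after_tax,
                        net_worth_end, net_worth_prev_end):
    """Consumption = after-tax labor income + after-tax financial income
    minus the change in end-of-year net worth."""
    saving = net_worth_end - net_worth_prev_end
    return labor_income_after_tax + financial_income_after_tax - saving

def keep_observation(consumption):
    """One of the sample filters named in the text: drop negative imputed consumption."""
    return consumption >= 0

# Hypothetical household: net worth rises by 20,000, so 20,000 of income is saved.
c = imputed_consumption(300_000, 5_000, net_worth_end=120_000, net_worth_prev_end=100_000)
```

The other exclusions listed above (self-employment, housing transactions, top and bottom 1 percent of wealth or income) would be applied as additional filters on the household-year panel.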

Q2. How are household-specific credit spreads constructed from the administrative data?

A: Each household’s primary loan bank is defined as the bank where it holds the largest loan balance at end of calendar year, and the primary deposit bank as the one holding the largest deposit balance. The household-specific spread is the difference between the loan rate applied by the primary loan bank and the deposit rate applied by the primary deposit bank, both measured as averages over the calendar year. If a household has no loans, the loan rate of the primary deposit bank is used. This construction yields a household-level interest rate spread that moves countercyclically at the aggregate level (cross-correlation with HP-filtered output of −0.44).
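
A rough sketch of the spread construction described in this answer follows. The data layout (dictionaries keyed by bank identifier) and all names are assumptions made for illustration. The handling of a household with no deposits is not specified in the summary, so the sketch simply returns None in that case.

```python
def primary_bank(balances):
    """Bank holding the household's largest positive end-of-year balance, or None."""
    positive = {bank: bal for bank, bal in balances.items() if bal > 0}
    return max(positive, key=positive.get) if positive else None

def household_spread(loan_balances, deposit_balances, loan_rate, deposit_rate):
    """Loan rate at the primary loan bank minus deposit rate at the primary deposit bank.

    loan_rate and deposit_rate map bank id -> calendar-year average rate.
    """
    deposit_bank = primary_bank(deposit_balances)
    if deposit_bank is None:
        return None  # assumption: case not covered in the text
    # Fallback stated in the text: with no loans, use the primary deposit bank's loan rate.
    loan_bank = primary_bank(loan_balances) or deposit_bank
    return loan_rate[loan_bank] - deposit_rate[deposit_bank]

# Hypothetical rates and balances.
loan_rate = {"A": 0.07, "B": 0.08, "C": 0.09}
deposit_rate = {"A": 0.01, "B": 0.02, "C": 0.015}
with_loans = household_spread({"A": 50, "B": 200}, {"A": 30, "C": 10}, loan_rate, deposit_rate)
no_loans = household_spread({}, {"A": 30, "C": 10}, loan_rate, deposit_rate)
```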

Q3. What do the empirical results say about the relationship between spreads and the probability of a household reaching zero net wealth?

A: Equation (2) is estimated as a linear probability model for the transition to zero net wealth (defined as net assets within plus or minus two times the 2007 median weekly income). Higher spreads significantly increase the transition rate into zero net wealth for households with moderately positive net wealth at the beginning of the year (those in the third to sixth net wealth bins), and reduce the outflow rate from zero net wealth for households already in that state. Higher spreads also appear to increase debt repayments for indebted households (third to fifth bins), making it more difficult for them to accumulate wealth. Households at the extremes of the wealth distribution (very poor or very wealthy) show essentially no sensitivity of transition rates to spread movements.

Q4. What do the consumption regressions in Table 1 find, and what is the key identification caveat?

A: The pooled regression (column 1) finds a positive income–consumption coefficient of 0.372, a negative spread coefficient of −0.266, and a positive income–spread interaction of 1.366, all statistically significant with standard errors clustered at the household level (15,610,327 observations, R² = 0.591). When interacted with below-median wealth (column 2), the income coefficient is larger (0.397 versus 0.335 for above-median), the spread effect is more negative for below-median wealth (−0.362 versus −0.101 for above-median), and the income–spread interaction is stronger for below-median wealth (1.640 versus 0.875). The authors explicitly note that these results should not be given a causal interpretation, as income and consumption are likely jointly determined. Institutional features of the Danish mortgage market (covered bonds, competitive market, rates independent of borrower credit situation) minimize confounding from mortgage rate correlation with consumer credit spreads.

Q5. How do the quantile regression results and the derived consumption-income elasticity demonstrate countercyclical MPC?

A: Quantile regressions across five-percent bins of the net wealth distribution show that income coefficients decline with wealth (from nearly 0.5 for the poorest to about 0.35 for the wealthiest households), spread coefficients are negative for households with negative, zero, and moderately positive wealth and positive for significantly wealthy households, and the income–spread interaction term is positive for all but the richest households (largest near zero net wealth). The consumption-income elasticity is computed as β₀,ⱼ + β₂,ⱼ × spread at the household level, then averaged cross-sectionally. When only wealth distribution shifts are allowed, the elasticity’s standard deviation is 1.3 percent and its cross-correlation with HP-filtered output is −0.31. When spread variation is also incorporated, standard deviation rises to 2.4 percent and the cross-correlation becomes −0.53. This measure is highly correlated (90 percent) with the model MPC, supporting the inference that the MPC is countercyclical.
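
The elasticity calculation described in this answer can be sketched as follows. The coefficient values and the two-bin layout are hypothetical placeholders (the paper uses five-percent wealth bins and its own estimates); only the formula, the bin-level income coefficient plus the bin-level interaction coefficient times the household spread, comes from the text.

```python
def average_elasticity(households, beta_income, beta_interaction):
    """Cross-sectional mean of beta_income[j] + beta_interaction[j] * spread.

    households: list of (wealth_bin, spread) pairs for one year.
    """
    values = [beta_income[j] + beta_interaction[j] * spread for j, spread in households]
    return sum(values) / len(values)

# Hypothetical coefficients for a poor bin (0) and a wealthy bin (1).
beta_income = {0: 0.50, 1: 0.35}
beta_interaction = {0: 1.5, 1: 0.5}

calm_year = [(0, 0.04), (1, 0.02)]
stress_year = [(0, 0.06), (1, 0.03)]   # same households, higher spreads

e_calm = average_elasticity(calm_year, beta_income, beta_interaction)
e_stress = average_elasticity(stress_year, beta_income, beta_interaction)
```

With positive interaction coefficients, a year with higher spreads produces a higher average elasticity, which is the channel behind the stronger countercyclicality reported when spread variation is included.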

Q6. What is the structure of the banking sector in the HANK model, and how does the agency friction generate a countercyclical spread?

A: A continuum of banks combines household deposits with net worth to invest in corporate equity and consumer loans. Bankers can divert a fraction λ = 0.381 of assets, and if they do so, depositors can recover only the remaining fraction (1 − λ). This threat of diversion limits the deposits that households are willing to supply, so banks must earn excess returns on their assets relative to the deposit rate: E_t(R_{K,t+1} − R_{S,t+1}) > 0. The leverage ratio is bounded above by ϱ_t/λ, where ϱ_t is a value multiplier that depends on current and expected future excess returns. When an adverse shock (a capital quality shock or monetary tightening) reduces banking sector net worth, the leverage constraint tightens, banks reduce asset supply, and the spread between the return on capital and the deposit rate rises. The consumer loan rate is proportional to R_K, with markup ω_B = 0.0075, so the consumer credit spread rises as well. This generates the observed countercyclical credit spread.
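
The leverage bound described above can be illustrated with a small numerical sketch. Only λ = 0.381 comes from the text; the multiplier values are hypothetical, and in the model the multiplier is an equilibrium object rather than a fixed input, so this shows the direction of the mechanism and not the model's quantitative response.

```python
def max_leverage(varrho, lam):
    """Upper bound on assets over net worth: value multiplier divided by the divertible share."""
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    return varrho / lam

def max_assets(net_worth, varrho, lam):
    """Largest balance sheet the incentive constraint allows for a given net worth."""
    return max_leverage(varrho, lam) * net_worth

LAMBDA = 0.381                            # calibrated diversion share from the text
cap_normal = max_leverage(1.6, LAMBDA)    # hypothetical multiplier in normal times
cap_stress = max_leverage(1.4, LAMBDA)    # hypothetical lower multiplier

# Asset capacity falls when either net worth or the multiplier declines.
assets_before = max_assets(10.0, 1.6, LAMBDA)
assets_after = max_assets(9.0, 1.4, LAMBDA)
```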

Q7. In the model, how do aggregate shocks affect the distribution of consumption, and why is the monetary policy shock particularly distributional?

A: A one-percent capital quality shock reduces both wages and bank net worth, causing spreads to rise. In the baseline economy, rising borrowing rates lead to a large reduction in consumption for indebted households (10th percentile) while the constant spread model shows near-parallel movements across the distribution. A one-percentage-point monetary policy shock reduces equity returns, depressing bank net worth and (with a lag) raising spreads. Indebted households face both lower labor income and higher borrowing costs, producing a sharp consumption decline at the 10th percentile; wealthy households gain from higher returns on savings, so their consumption rises in the short run. Responses converge as spreads return to normal over the medium run. This matches empirical evidence from Holm, Paul, and Tischbirek (2021) for Norway. For TFP shocks, banks’ net worth is less affected because households’ higher labor supply partially offsets the productivity decline, so spreads move little and distributional effects are smaller (driven mainly by wage effects across the distribution).

Q8. How does the financial accelerator in the HANK model compare to the RANK version?

A: In response to capital quality shocks and monetary policy shocks, the HANK model with banking frictions generates amplification relative to a constant-spread HANK benchmark, confirming the presence of a financial accelerator. However, relative to the RANK model, the incomplete markets model implies slightly less amplification of aggregate investment and consumption. This is because, in the HANK model, households facing higher credit spreads increase their labor supply (precautionary motive), which partially stabilizes aggregate income and moderates the financial accelerator. The finding that heterogeneous agent aspects are less important at the aggregate level is consistent with Berger, Bocola, and Dovis (2020). For TFP shocks, the financial accelerator through spreads is largely absent in both HANK and RANK, as spread changes are minor.

Q9. What are the long-run aggregate effects of tightening bank capital requirements (reducing leverage by 10 percent) in the HANK versus RANK model?

A: In the RANK model, higher capital requirements increase the annual spread between the return on capital and the deposit rate by 25 basis points, reduce the aggregate capital stock by 2.4 percent, output by 0.5 percent, and aggregate consumption by 0.8 percent. In the HANK model, the spread increases by 40 basis points annually, but the mechanism differs: much of the spread change is absorbed by a reduction in the deposit rate (from 3.81 percent to 3.54 percent annually) rather than an increase in the capital return. Households respond to the lower deposit rate and higher credit costs by increasing precautionary savings and labor supply, so aggregate output and consumption actually rise slightly in the HANK stationary equilibrium. The capital requirements thus appear costless at the aggregate level in the HANK model — but this masks welfare costs that operate through the idiosyncratic risk channel.

Q10. What are the quantitative welfare costs of macroprudential regulation, and how do they vary across the wealth distribution and between idiosyncratic and aggregate shocks?

A: Welfare is measured as the fraction of lifetime consumption households are willing to give up to stay in the unregulated baseline. In the face of idiosyncratic shocks only, welfare losses range from 0.24 to 0.43 percent of consumption for the first seven wealth deciles, and reach 4.28 percent for the richest decile (primarily because of the reduction in the return on their savings), with an average welfare loss of 0.79 percent. When aggregate shocks are added, the losses are substantially reduced for the poorest 80 percent (due to lower cyclical sensitivity of spreads), but remain large for the wealthiest decile (4.23 percent) and in aggregate (0.58 percent). These results are robust to the three-asset model extension, where the poorest households are approximately welfare-neutral under the regulation when aggregate shocks are included (0.00 percent), but aggregate welfare losses remain at 0.75 percent.

Q11. How does the three-asset model extension (with illiquid assets) affect the key results?

A: In the three-asset extension, households can hold illiquid capital (calibrated with an adjustment probability of φ_k = 0.0025 per quarter, targeting the Danish ratio of bank deposits to output of 34 percent), creating wealthy hand-to-mouth households who have illiquid assets but no liquid assets. The consumption impulse responses across the wealth distribution remain very similar to the two-asset baseline: endogenous spread movements generate heterogeneous consumption dynamics in response to capital quality and monetary shocks, while constant-spread models produce near-parallel responses. The three-asset model provides a better fit to the micro data (consumption-spread-income relationship across the wealth distribution), but the welfare conclusions from macroprudential regulation are essentially unchanged: welfare losses across the distribution in the stationary equilibrium, partially mitigated when aggregate shocks are added, with losses concentrated in the richest decile.

Q12. What robustness checks are reported for the empirical consumption regressions?

A: Three robustness exercises are reported. First, capitalizing car purchases using their official tax value (rather than treating car purchases as current expenditure) yields coefficients similar to the baseline (Table 10). Second, excluding households who purchase a car in the current or prior year (reducing the sample to 13.24 million observations) also leaves results unchanged. Third, first-differenced specifications (equation 42, with and without household fixed effects) produce results similar to the levels specification; the main exception is the spread effect for above-median wealth households when household fixed effects are omitted from the differenced specification (Table 11). The income–spread interaction is consistently positive and significant across all robustness checks.

Q13. What evidence does the paper provide that the model’s MPC is countercyclical and that credit spreads are the primary driver?

A: Figure 7 shows impulse response functions of the average MPC to each of the three aggregate shocks. In all three cases, the MPC rises in recessions (countercyclical). The key mechanism is that adverse shocks cause spreads to rise, increasing the mass of households at the kink in the budget constraint (zero liquid assets), where MPCs are highest. When the consumer credit spread is held constant, the MPC remains countercyclical but close to constant, indicating that spread movements account for most of the cyclical variation in MPC. Eliminating the spread altogether implies an acyclical MPC (Table 12, Appendix D). The unconditional cross-correlation of the model MPC with output is −0.60, compared with −0.53 for the empirically estimated consumption-income elasticity in the Danish data.

Key Concepts

Consumer credit spread (borrowing-saving spread): In the paper, this is the difference between the gross real interest rate on consumer loans (R_{L,t}) charged by banks and the gross real return on deposits (R_{S,t}) received by savers. It is not an abstract measure of credit conditions but a household-specific, bank-derived rate gap that moves countercyclically due to banking agency frictions and creates a kink in households’ budget constraints at zero net worth. It is distinct from mortgage spreads, which in Denmark are market-determined and independent of borrower credit conditions.

Kink in the budget constraint: The household budget constraint has a kink at zero net assets because borrowers face R_{L,t} > R_{S,t}; households at exactly zero liquid assets (type IV in the paper’s taxonomy) face a discrete jump in the cost of additional borrowing. This kink creates a mass point in the wealth distribution at zero net wealth, and households at this kink have higher MPCs than unconstrained savers or borrowers. The size of the mass point increases when the spread rises.

Financial accelerator (in the HANK-with-banking context): The amplification mechanism in which shocks that reduce banking sector net worth tighten banks’ leverage constraints, raise credit spreads, reduce asset supply to both the corporate sector and households, and further depress investment and consumption — which in turn reduces bank net worth further. In this paper, the accelerator operates through the consumer credit spread channel in addition to the standard corporate lending channel, and is present for capital quality and monetary policy shocks but not materially for TFP shocks.

Countercyclical MPC: The MPC — defined as the response of consumption to a small transitory income shock — rises during recessions and falls during expansions in this model. The mechanism is that recessions are associated with higher consumer credit spreads, which expand the mass of households at or near the zero net wealth kink (high MPC), and contract the mass of unconstrained savers (low MPC). This is a distinct source of MPC cyclicality from the wealth distribution channel alone.

Agency friction (diversion problem): Banks can divert a fraction λ of their assets; if they do so, depositors can recover only the fraction (1 − λ) and the bank is liquidated. This threat limits depositors’ willingness to supply funds, resulting in an incentive-compatibility constraint on bank leverage: the ratio of assets to net worth cannot exceed ϱ_t/λ (where ϱ_t is the bank’s franchise value multiplier). When ϱ_t declines (because expected excess returns fall), the constraint binds more tightly and the spread between the return on assets and the deposit rate must be positive to sustain bank participation.

Macro versus micro volatility trade-off: The paper uses this phrase to describe the finding that tighter bank capital requirements (restricting leverage) reduce the cyclical volatility of aggregate output and investment (macro volatility falls) while simultaneously increasing the volatility of individual household consumption streams due to higher credit spreads and lower deposit returns (micro volatility rises). Welfare costs from increased micro volatility outweigh the aggregate stabilization benefits.

Consumption-income elasticity (d log c / d log y): A time-varying cross-sectional average measure derived from quantile regression parameter estimates, equal to β₀,ⱼ + β₂,ⱼ × RS_{i,t} for household i in wealth bin j, where RS_{i,t} is the household’s credit spread in year t. It is used in the paper as an empirical proxy for the MPC (not a direct estimate), and is shown to be highly correlated with the model MPC (correlation of 90 percent at annual frequency). Its cyclicality is stronger when spread variation is incorporated (standard deviation 2.4 percent, cross-correlation with output −0.53) than when spreads are held fixed (standard deviation 1.3 percent, cross-correlation −0.31).

Financial shocks and leverage of financial institutions: When do they matter?


This paper investigates the role of leverage of financial institutions in amplifying the transmission of financial shocks to the macroeconomy, with particular attention to whether that amplification differs across economic regimes. The authors develop a new endogenous regime-switching structural vector autoregression (RS-SVAR) model with time-varying transition probabilities, in which the probability of switching regime depends on the state of the economy (endogenous switching). The model extends the Sims and Zha (2006) and Sims, Waggoner, and Zha (2008) Markov-switching SVAR framework by: (1) incorporating a time-varying transition matrix in which the probability of staying in a regime is a logistic function of lagged endogenous variables; and (2) introducing new identification techniques for RS-SVARs, including non-recursive zero restrictions, sign restrictions, and narrative sign restrictions, which can in some cases uniquely identify structural shocks rather than merely set-identify them.

The leverage measure is market-based — book assets divided by market equity — constructed from CRSP/Compustat institution-level data covering publicly listed depository institutions, bank holding companies, and nonbank financial institutions. The sample runs monthly from December 1988 to December 2019. The five-variable VAR includes industrial production growth, core CPI inflation, the 2-year Treasury rate, market leverage of financial institutions, and the Chicago Fed’s National Financial Conditions Index (NFCI). The authors estimate three model variants that substitute in turn the leverage of: (i) all depository institutions, (ii) Global Systemically Important Banks (GSIBs), and (iii) securities brokers and dealers.

The model identifies two coefficient regimes — a “financial constraint” regime and “normal times” — using the criterion that the first regime has higher smoothed probability during September 2008 to August 2009. The financial constraint regime covers the end of the Savings and Loan crisis, the 1990/91 recession, the Russian debt default, the Global Financial Crisis (GFC), and the European sovereign debt crisis.

The core finding is that real effects of financial shocks are amplified in the financial constraint regime but not in normal times. In the financial constraint regime, the output response to a financial shock is significantly negative, large, and protracted; GSIB leverage initially rises sharply (as falling asset prices erode equity) and then declines as institutions deleverage. In normal times, the output growth response is negative but non-persistent, and market leverage remains insignificant over the entire horizon.

The counterfactual experiment holding GSIB market leverage constant as of October 2008 is the sharpest quantitative result: if GSIB leverage had not risen further at the onset of the GFC, the decline in industrial production growth would have been approximately 20 percentage points smaller, with a faster subsequent recovery in output growth and inflation and higher short-term interest rates. The counterfactual probability of staying in the financial constraint regime would have fallen as low as 0.1 for some draws, whereas the actual probability remained elevated. By contrast, for a system using depository institution leverage, the lower-bound counterfactual probability of staying in the constraint regime does not fall below 0.90, indicating that leverage plays a substantially weaker role for the broader depository sector than for GSIBs.

Securities brokers and dealers show leverage that rises more on impact than at other institutions and then declines immediately. This is consistent with their willingness to expand balance sheets going into the crisis, which amplified losses and forced a sharp post-crisis contraction.

A separate counterfactual holding the NFCI constant (rather than leverage) shows that the probability of staying in the constraint regime does not decline, confirming that market leverage and the financial conditions index provide distinct characterizations of the financial system and have different implications for shock propagation and regime persistence. Results are robust to substituting the GZ corporate spread for the NFCI and to imposing narrative restrictions for shock identification.

Q: What is the central research question? A: The paper asks whether and how the leverage of financial institutions amplifies the transmission of financial shocks to the real economy, and whether this amplification differs between a financial constraint regime and normal times. A secondary question concerns heterogeneity: do GSIBs, depository institutions broadly, and nonbank securities dealers transmit shocks differently?

Q: What is novel about the econometric framework? A: The RS-SVAR model allows the probability of remaining in a given coefficient regime to vary over time as a logistic function of lagged endogenous variables, so regime switching is endogenous to the state of the economy rather than governed by a fixed transition matrix. The paper also introduces sign restrictions, zero restrictions, and narrative sign restrictions into the RS-SVAR class, enabling identification of both structural shocks and regimes within a single framework; in roughly 20 percent of posterior draws these sign restrictions uniquely identify the financial shock.

Q: Why does the paper use market leverage rather than book leverage? A: Market leverage (book assets divided by market equity) is argued to be more timely than book leverage because book equity incorporates losses with a delay, giving institutions time to adjust book leverage to avoid regulatory limits. Market capitalization reflects market participants’ assessment of an institution’s creditworthiness, and low market-to-book ratios signal that institutions are more leveraged than their books indicate. Market leverage is therefore a more informative early-warning indicator of financial fragility and the need for rapid deleveraging.
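The distinction between the two leverage measures can be illustrated with a small numerical sketch. The institution and all figures below are invented; only the definition (book assets divided by equity, with market equity for market leverage) comes from the text.

```python
def leverage(book_assets: float, equity: float) -> float:
    """Leverage ratio: book assets divided by the chosen equity measure."""
    if equity <= 0:
        raise ValueError("equity must be positive")
    return book_assets / equity


# Hypothetical institution (all figures invented for illustration).
book_assets = 1000.0
book_equity = 100.0
market_equity = 50.0  # market-to-book ratio of 0.5

book_lev = leverage(book_assets, book_equity)      # uses book equity
market_lev = leverage(book_assets, market_equity)  # uses market equity

# A share-price fall halves market equity at once; book values are unchanged,
# so only the market measure moves.
stressed_market_lev = leverage(book_assets, market_equity * 0.5)

print(book_lev, market_lev, stressed_market_lev)
```

With a market-to-book ratio below one, the market measure shows more leverage than the books do, and it reacts immediately when market equity falls.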

Q: How are the two regimes identified? A: For each estimated regime, the authors count the number of months between September 2008 and August 2009 (inclusive) for which the smoothed probability of being in that regime exceeds 0.70; the regime with the higher count is labeled “financial constraint” and ordered first. Shock identification uses sign restrictions: in the financial constraint regime, a positive financial shock must have a contemporaneously negative effect on output, inflation, and the short-term interest rate, but positive effects on the financial conditions index and leverage; in normal times, only the financial conditions index is required to respond positively on impact.
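The labeling rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the toy probabilities are hypothetical; only the 0.70 threshold and the September 2008 to August 2009 window come from the text.

```python
from typing import Dict, List

THRESHOLD = 0.70  # cutoff stated in the text


def label_constraint_regime(smoothed: Dict[str, List[float]]) -> str:
    """Return the key of the regime whose smoothed probability exceeds
    THRESHOLD in the most months of the labeling window.

    `smoothed` maps a (hypothetical) regime key to its smoothed
    probabilities for the twelve window months, Sep 2008 through Aug 2009.
    """
    counts = {
        regime: sum(p > THRESHOLD for p in probs)
        for regime, probs in smoothed.items()
    }
    return max(counts, key=counts.get)


# Toy smoothed probabilities for the 12 window months (invented).
regime_a = [0.95, 1.0, 1.0, 1.0, 1.0, 0.9, 0.85, 0.8, 0.75, 0.6, 0.5, 0.4]
regime_b = [1.0 - p for p in regime_a]  # two regimes, so probabilities sum to 1

constraint = label_constraint_regime({"regime_a": regime_a, "regime_b": regime_b})
print(constraint)
```

In the toy data, regime_a clears the threshold in nine months and regime_b in none, so regime_a would be labeled the financial constraint regime and ordered first.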

Q: What regimes does the model assign historically? A: The smoothed probability of the financial constraint regime is elevated during the end of the Savings and Loan crisis, the 1990/91 recession, the Russian debt default, the GFC and associated recession (where the probability reaches 1.0 at end-2008 and beginning-2009 before declining sharply to approximately 0.6 percent in 2009/2010), and the European sovereign debt crisis.

Q: What do the impulse responses show in the financial constraint regime? A: In the financial constraint regime, the output response to a positive financial shock (tightening) is significantly negative, large, and protracted. GSIB leverage initially rises due to a sharp decline in asset prices eroding market equity, then falls as GSIBs deleverage in response. The authors interpret this pattern as evidence that deleveraging produces procyclical financial amplification effects with adverse real consequences.

Q: What do the impulse responses show in normal times? A: In normal times, the output growth response is large and negative but non-persistent, in contrast to the financial constraint regime. Market leverage remains statistically insignificant across the entire horizon in normal times, indicating that the leverage amplification channel is inactive outside of financial constraint episodes.

Q: What does the GSIB leverage counterfactual show quantitatively? A: Holding GSIB market leverage constant as of October 2008 implies a decline in industrial production growth that is approximately 20 percentage points smaller than actually occurred, along with a faster recovery in output growth and inflation and higher short-term interest rates. The counterfactual probability of staying in the financial constraint regime declines to as low as 0.1 for some posterior draws, compared to remaining elevated in the actual data.

Q: How do depository institutions compare to GSIBs in the counterfactual? A: For the model using broad depository institution leverage, the lower-bound counterfactual probability of staying in the financial constraint regime does not fall below 0.90, compared to as low as 0.1 for the GSIB specification. This implies that GSIB deleveraging has substantially more detrimental macroeconomic effects and a much larger effect on regime persistence than the broader depository sector.

Q: What is distinctive about securities brokers and dealers? A: Broker-dealer market leverage rises more on impact than leverage of other financial institutions following a financial shock, and then immediately declines due to rapid deleveraging. The authors interpret this as reflecting that dealers’ willingness to expand balance sheets ahead of the crisis amplified growth and losses, followed by a sharp post-crisis contraction — a pattern consistent with the procyclical leverage mechanism described in Adrian and Shin (2014).

Q: How do the authors distinguish the role of market leverage from the financial conditions index? A: A counterfactual holding the NFCI constant (rather than leverage) as of October 2008 shows that the probability of staying in the financial constraint regime does not decline, unlike the leverage counterfactual. This demonstrates that market leverage and the NFCI provide distinct characterizations of financial conditions and have different implications for the propagation of shocks and the persistence of the constraint regime.

Q: How robust are the results? A: Substituting the GZ corporate bond spread for the NFCI yields very similar results, specifically that the probability of staying in the constraint regime declines much more in the counterfactual than in the actual data, suggesting the findings are not driven by the choice of financial conditions proxy. Imposing narrative restrictions for shock identification (exploiting the known high-stress period around Lehman’s failure in September 2008) yields results that are “rather robust” relative to the baseline sign-restriction identification.

Q: What are the policy implications? A: The results confirm the leverage ratio as a useful financial stability indicator, with particular emphasis on market leverage as providing timely information for monitoring. The heterogeneity findings suggest that regulatory attention to GSIB leverage is especially warranted, since GSIB deleveraging can have substantially more detrimental macroeconomic effects and a much larger influence on the persistence of financial constraint regimes than deleveraging by the broader depository sector. The leverage ratio is characterized as complementary to the risk-weighted capital ratio as a regulatory tool.

Market leverage: Measured as book assets divided by market equity (not book equity), constructed from CRSP/Compustat institution-level data at monthly frequency. The paper argues market leverage is more timely than book leverage because market equity immediately reflects losses, preventing institutions from masking fragility through delayed book adjustments.

Financial constraint regime: One of two identified coefficient regimes in the RS-SVAR, characterized by a significantly negative, large, and protracted output response to financial shocks and by active leverage amplification. Identified empirically as the regime whose smoothed probability exceeds 0.70 in the larger number of months between September 2008 and August 2009.

Endogenous regime switching: A modeling approach in which the probability of transitioning between regimes depends on lagged values of the endogenous variables themselves (via a logistic function), rather than being governed by a fixed constant transition matrix. This allows regime dynamics to respond to the state of the economy.

Time-varying transition probabilities: The diagonal elements of the coefficient-regime transition matrix follow a logistic transformation of a linear function of lagged endogenous variables, so the probability of remaining in any given regime changes each period as a function of current financial and macroeconomic conditions.
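A minimal sketch of such a transition matrix for two regimes follows. The coefficient values, the choice of lagged variables, and the function names are assumptions made for illustration; the summary states only that the diagonal elements are a logistic transformation of a linear function of lagged endogenous variables.

```python
import math
from typing import List, Sequence, Tuple


def stay_probability(lagged: Sequence[float],
                     coeffs: Sequence[float],
                     intercept: float) -> float:
    """Logistic transform of a linear index in the lagged variables."""
    index = intercept + sum(c * x for c, x in zip(coeffs, lagged))
    return 1.0 / (1.0 + math.exp(-index))


def transition_matrix(lagged: Sequence[float],
                      params: Sequence[Tuple[float, Sequence[float]]]
                      ) -> List[List[float]]:
    """2x2 transition matrix; `params` holds one (intercept, coeffs)
    pair per regime. Each row sums to one by construction."""
    p11 = stay_probability(lagged, params[0][1], params[0][0])
    p22 = stay_probability(lagged, params[1][1], params[1][0])
    return [[p11, 1.0 - p11],
            [1.0 - p22, p22]]


# Hypothetical lagged values and coefficients (not estimates from the paper).
lagged_values = [0.5, -1.0]
params = [(1.0, [2.0, 0.0]), (2.0, [-1.0, 0.5])]

P = transition_matrix(lagged_values, params)
print(P)
```

Because the index depends on lagged data, the matrix is recomputed every period, which is what distinguishes this setup from a fixed transition matrix.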

Procyclical financial amplification: The mechanism by which financial institution deleveraging in response to falling asset prices further tightens financial conditions and reduces real output, generating a feedback loop. The paper provides empirical evidence for this channel operating specifically in financial constraint regimes.

Heterogeneity of financial institutions: The finding that GSIBs, broad depository institutions, and securities brokers and dealers differ substantially in how their leverage affects the transmission of financial shocks. GSIB deleveraging is shown to have much more detrimental macroeconomic effects and a much larger influence on the probability of remaining in the financial constraint regime than depository institution deleveraging more broadly.

Narrative sign restrictions in RS-SVARs: An identification technique extended from Antolin-Diaz and Rubio-Ramirez (2018) to the regime-switching context, which uses known historical episodes (here, the Lehman failure in September 2008) to impose restrictions on which regime the economy was in or on the sign of structural shocks at particular dates, thereby aiding identification of both shocks and regimes.

Firm Accommodation After Workplace Disability: Labor Market Impacts and Implications for Subsidy Design

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

This paper studies (1) how firm accommodation decisions respond to financial incentives in the context of workplace disability under workers’ compensation, (2) what the causal effect of accommodation is on workers’ subsequent labor market outcomes, and (3) whether the equilibrium level of accommodation is socially efficient, and what the welfare implications of wage subsidies for accommodation are.

Empirical Context and Data

The analysis uses the universe of Oregon workers’ compensation claims from 2005 through 2017 — over 131,000 disabling claims — linked to longitudinal quarterly earnings records from the Oregon Employment Department. The setting exploits Oregon’s Employer at Injury Program (EAIP), which subsidizes employers who provide “transitional work” accommodations (primarily through wage subsidies) to workers with temporary workplace disabilities. EAIP accounts for roughly 25 percent of claims on average, with the wage subsidy component representing over 96 percent of EAIP expenses.

Identification Strategy

The authors exploit a policy change in July 2013 that reduced the EAIP wage subsidy rate from 50 percent to 45 percent. They construct a firm-level “exposure” measure — the fraction of a firm’s claims that used EAIP in a baseline period (2005–2009) — and estimate a continuous difference-in-differences specification in which the interaction of exposure and a post-2013 indicator instruments for accommodation. The identifying assumption is strong parallel trends: firms with low baseline exposure are unlikely to respond to the subsidy reduction, while high-exposure firms respond more, generating cross-firm variation in accommodation rates after 2013. An MTE framework (Heckman and Vytlacil 2005) is then used to explore heterogeneous treatment effects along an unobserved resistance-to-treatment dimension.

Main Empirical Findings

The subsidy reduction from 50% to 45% decreased accommodation rates by 2.9 percentage points (9.3 percent) for claims in firms with average exposure, implying a subsidy elasticity of accommodation of 0.9.
The policy change led to a 0.95 percentage point decrease in employment and a $120 decrease in quarterly earnings four quarters after disability for claims in average-exposure firms (roughly 1.3–1.5 percent declines relative to means), with no significant effect on worker turnover to other firms.
IV estimates of the effect of accommodation itself (using predicted EAIP as instrument) show accommodation increases the probability of employment four quarters after disability by 33 percentage points and increases quarterly earnings by approximately $4,100.
The MTE analysis reveals negative selection on gains: workers with workplace disabilities who are least likely to receive accommodation have the highest potential gains from it, driven largely by severe disabilities with high accommodation costs.
Descriptive and IV evidence is consistent with accommodation operating primarily as general human capital investment: accommodation has no statistically significant effect on the probability of moving to a new firm, and earnings gains are not systematically lower for workers who change employers after accommodation.
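The elasticity and the relative declines quoted above follow from simple arithmetic, which can be checked as below. The variable names are illustrative; the sample means (72 percent employment, $7,807 quarterly earnings) are the ones reported alongside the IV estimates later in this summary.

```python
# Subsidy cut: 50% -> 45%, a 10 percent relative reduction.
subsidy_before, subsidy_after = 0.50, 0.45
pct_change_subsidy = (subsidy_after - subsidy_before) / subsidy_before

# Accommodation fell by 9.3 percent (2.9 percentage points) at average exposure.
pct_change_accommodation = -0.093

# Elasticity = relative change in accommodation / relative change in subsidy.
elasticity = pct_change_accommodation / pct_change_subsidy

# Reduced-form declines relative to the reported sample means.
employment_decline = 0.95 / 72.0   # percentage points over mean employment rate
earnings_decline = 120.0 / 7807.0  # dollars over mean quarterly earnings

print(round(elasticity, 2))
print(round(100 * employment_decline, 1), round(100 * earnings_decline, 1))
```

The ratio is about 0.93, which rounds to the reported 0.9, and the two relative declines come out at roughly 1.3 and 1.5 percent.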

Structural Model and Counterfactual Findings

A two-period frictional labor market model with risk-averse workers, risk-neutral firms, Nash bargaining, imperfect experience rating in workers’ compensation, and firm accommodation as human capital investment is developed and estimated. Two inefficiency sources are identified: (1) a human capital externality — because accommodation builds general human capital, firms cannot capture the full surplus when workers separate, reducing accommodation incentives; and (2) a fiscal externality — imperfectly experience-rated firms do not fully internalize the workers’ compensation cost savings from accommodation, further depressing it below the efficient level. Counterfactual simulations show:

Eliminating wage subsidies (from 50% to 0%) reduces accommodation rates from 33% to 11%, leading to a 7% decline in post-disability employment and a 15% decline in post-disability quarterly wages (roughly $1,358).
A revenue-neutral reform eliminating wage subsidies reduces average welfare and the welfare of more than 90% of workers.
Welfare gains from the subsidy are larger for low-skilled workers than high-skilled workers.
Conditional on experiencing disability, eliminating wage subsidies decreases welfare by about 10%, while increasing the subsidy to 100% raises welfare for disabled workers by around 30%.
Firm profit is maximized at a subsidy rate around 80%, after which higher taxes offset accommodation gains.

In depth

Q1. What is the Employer at Injury Program (EAIP), and how does it differ from standard workers’ compensation?

A1: EAIP is an optional component of Oregon’s workers’ compensation system that subsidizes employers for the costs of accommodating workers with temporary disabilities during a transitional return-to-work period. Unlike standard workers’ compensation premiums (which are experience-rated at the firm level), EAIP is funded through a flat payroll tax on all firms that is not experience-rated — meaning firms that use EAIP do not pay higher premiums. The wage subsidy component accounts for over 96 percent of EAIP expenses; other reimbursable costs (worksite modifications up to $5,000, retraining up to $1,000, clothing up to $400) are rarely used. Eligible employers must be the employer at which the disability occurred, and accommodation is limited to a transitional period during which workers cannot simultaneously receive time-loss benefits.

Q2. How is firm-level “exposure” constructed, and what is the rationale for using it as an instrument?

A2: Exposure is the fraction of a firm’s workers’ compensation claims that used EAIP during a five-year baseline period from 2005 to 2009 — a separate historical period chosen to reduce volatility and avoid mean-reversion. The rationale draws on prior work (Aizawa et al., 2022) showing that firm fixed effects account for nearly 25 percent of variation in accommodation, far more than worker or disability characteristics (1 and 3 percent, respectively), suggesting permanent firm-level heterogeneity in the relative benefits and costs of accommodation. Firms with zero historical exposure are unlikely to change accommodation behavior in response to a subsidy reduction, while high-exposure firms respond more, creating differential quasi-experimental variation in accommodation rates after July 2013.
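The construction of exposure and of the difference-in-differences instrument can be sketched as follows. The record layout, function names, and toy claims are hypothetical, and firms without baseline claims are simply not handled here, since the summary does not say how the paper treats them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical claim record: (firm_id, year, month, used_eaip)
Claim = Tuple[str, int, int, bool]


def baseline_exposure(claims: Iterable[Claim],
                      start: int = 2005, end: int = 2009) -> Dict[str, float]:
    """Share of each firm's baseline-period claims that used EAIP."""
    used: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for firm, year, _month, used_eaip in claims:
        if start <= year <= end:
            total[firm] += 1
            used[firm] += int(used_eaip)
    return {firm: used[firm] / total[firm] for firm in total}


def did_instrument(claim: Claim, exposure: Dict[str, float]) -> float:
    """Exposure interacted with a post-July-2013 indicator."""
    firm, year, month, _used = claim
    post = (year, month) >= (2013, 7)
    return exposure[firm] if post else 0.0


# Toy claims (invented for illustration).
claims = [
    ("A", 2005, 3, True), ("A", 2006, 8, False),
    ("A", 2008, 1, True), ("A", 2009, 12, False),
    ("B", 2007, 5, False), ("B", 2009, 2, False),
    ("A", 2012, 4, True),   # outside the baseline window, ignored
    ("A", 2014, 1, True),   # post-policy claim
]
exposure = baseline_exposure(claims)
print(exposure)
print(did_instrument(("A", 2014, 1, True), exposure))
```

Firm B has zero exposure, so its instrument value is zero both before and after the policy change; it contributes no first-stage variation, in line with the rationale given above.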

Q3. What are the first-stage and reduced-form results from the DID specification?

A3: The first-stage DID coefficient shows that a ten-percentage-point increase in exposure is associated with a one-percentage-point decrease in EAIP take-up after 2013, implying a 2.9 percentage point decrease for claims in firms with average exposure (mean 0.27). The corresponding reduced-form results show a 0.35 percentage point decrease in employment four quarters post-disability and a $45 decrease in quarterly earnings for every ten-percentage-point increase in exposure, scaling to 0.95 percentage points and $120 at average exposure. There is no statistically significant effect on the probability of moving to a new firm. Pre-trend tests show parallel accommodation trends across exposure terciles prior to 2013, supporting the identifying assumption.

Q4. What do the IV estimates imply about the causal effect of accommodation on labor market outcomes?

A4: Under the exclusion restriction that the subsidy change affects labor market outcomes only through accommodation, the IV estimates imply that receipt of accommodation increases the probability of employment four quarters after disability by 33 percentage points (against a mean of 72 percent) and increases quarterly earnings by approximately $4,100 (against a mean of $7,807). There is no significant effect on the probability of working at a new firm four quarters later. The authors note these large estimates reflect local average treatment effects for compliers — workers whose accommodation status was changed by the instrument — who disproportionately have high unobserved resistance to treatment and high accommodation returns, explaining the magnitude.
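As a rough consistency check, the IV magnitudes can be approximated by the ratio of the reduced-form effect to the first-stage effect at average exposure. This is a back-of-envelope sketch using the rounded figures quoted in this summary; the paper's estimates come from the full IV specification, so exact agreement is not expected.

```python
# Effects of the subsidy cut for claims in average-exposure firms.
first_stage = -0.029             # change in accommodation rate (-2.9 pp)
reduced_form_employment = -0.0095  # change in employment rate (-0.95 pp)
reduced_form_earnings = -120.0     # change in quarterly earnings (dollars)

# Ratio of reduced form to first stage (Wald-style ratio).
wald_employment = reduced_form_employment / first_stage
wald_earnings = reduced_form_earnings / first_stage

print(round(100 * wald_employment, 1))  # in percentage points
print(round(wald_earnings))             # in dollars
```

Both ratios land close to the reported 33 percentage points and roughly $4,100; the small gaps are what one would expect from rounding in the inputs.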

Q5. What does the MTE framework reveal about the distribution of accommodation effects and selection?

A5: The MTE curves show that workers with the highest unobserved resistance to treatment (least likely to receive accommodation) have the highest potential employment and earnings gains from accommodation. This negative selection on gains arises because these workers tend to have worse employment outcomes in the untreated state, consistent with more severe disabilities commanding higher accommodation costs. IV weights are concentrated at high-resistance values, explaining the large IV estimates. Negative selection on gains is also found along observable dimensions: workers in self-insured firms, healthcare support occupations, women, and those with wounds/cuts/burns show larger gains but lower likelihood of receiving accommodation.

Q6. What evidence supports characterizing firm accommodation as general rather than firm-specific human capital investment?

A6: Three pieces of evidence point toward general human capital. First, the IV estimate shows accommodation has no statistically significant effect on the probability of working at a new firm four quarters after disability. Second, a triple-interaction specification (DID interacted with new-firm indicator) yields suggestive evidence of even larger earnings gains for workers who move to a new firm post-accommodation, though this is not statistically significant; such a pattern is inconsistent with firm-specific human capital. Third, the subset of claims that receive non-wage EAIP benefits (worksite modifications, retraining) does show lower mobility, but it comprises fewer than 5 percent of the sample, meaning the predominant form of investment in this context is general in nature.

Q7. What are the two sources of market inefficiency in accommodation identified in the model?

A7: The first is a human capital externality operating through worker turnover. Because accommodation builds general human capital that workers carry to new employers, a firm accommodating a worker does not capture the portion of future surplus that accrues to future employers upon separation. In a Nash bargaining framework with lack of commitment, this dynamic inefficiency is larger when industry-wide turnover rates are higher — consistent with the descriptive finding that accommodation rates are strongly negatively associated with industry separation rates. The second is a fiscal externality from imperfect experience rating: firms whose workers’ compensation premiums are not fully linked to their own claim costs do not fully internalize the cost-savings from accommodation (i.e., reduced time-loss benefit payments), leading them to accommodate at inefficiently low rates.

Q8. How is heterogeneity incorporated in the structural estimation, and what do the estimated parameters show?

A8: The model incorporates observed heterogeneity (firm insurance status, worker skill type — measured by pre-disability wages — firm baseline exposure, and pre/post policy change) and unobserved heterogeneity mapped to the MTE framework’s unobserved resistance to treatment. Indirect inference matches cross-sectional accommodation rates, earnings by subgroup, and the DID coefficients. Key findings: net output during the disability period is negative (accommodation is a costly short-run investment), while post-disability output is higher for accommodated workers. Low-skilled workers experience larger productivity gains from accommodation than high-skilled workers. Accommodation cost shock variance is lower for higher unobserved types, meaning high-gain workers are also more sensitive to subsidy changes, consistent with the large IV estimates. The model fits the DID coefficients for accommodation, employment, and wages well.

Q9. What do the counterfactual simulations show about the welfare effects of varying the subsidy rate?

A9: Eliminating wage subsidies from the current 50% rate reduces the accommodation rate from 33% to 11% and lowers post-disability employment by 7 percentage points and post-disability quarterly wages by 15% ($1,358). From a welfare perspective, eliminating subsidies in a revenue-neutral reform reduces average ex-ante worker welfare and lowers welfare for more than 90% of workers. Conditional on experiencing disability, eliminating subsidies reduces welfare by about 10% while raising the subsidy to 100% increases welfare of disabled workers by around 30%. Firm profit is increasing in the subsidy rate up to about 80%, then decreases. Ex-ante worker welfare gains from the current 50% subsidy relative to no subsidy are modest in consumption-equivalent terms (at most 0.6% increase in consumption), partly because the disability probability is low (2.2%) and because unaccommodated workers still receive two-thirds wage replacement through time-loss benefits.

Q10. What distributional implications do wage subsidies have across worker and firm types?

A10: Welfare gains from higher wage subsidies are larger for low-skilled workers than high-skilled workers, so the subsidy has a redistributive dimension beyond efficiency correction. Welfare gains are also larger for workers in imperfectly experience-rated firms, where the fiscal externality creates the greater wedge from the efficient level. Self-insured firms, which already internalize workers’ compensation cost savings and thus accommodate closer to the optimal rate, benefit less from the subsidy and can even be made worse off if subsidies are set very high (since they bear higher flat payroll taxes with smaller marginal accommodation gains). The fraction of worker-firm matches experiencing welfare gains exceeds 90% under the benchmark subsidy level, indicating broad rather than narrowly concentrated gains.

Q11. How do the experience-rating channel and the worker-turnover channel interact in comparative statics?

A11: Model comparative statics show that reducing the job-to-job transition rate of workers with disabilities to one-quarter of its estimated value substantially raises accommodation rates, and this effect is more pronounced for imperfectly experience-rated firms than for self-insured firms. This occurs because self-insured firms already have a strong incentive to accommodate (they bear their own workers’ compensation claim costs in full), so turnover is less marginal for them. Forcing all firms to be self-insured (perfect experience rating) would substantially increase accommodation rates in currently imperfectly rated firms. Lowering the accommodation cost during the disability period (increasing net output during the disability period) also raises accommodation rates for both firm types.

Key Concepts

Firm Accommodation (EAIP): In this paper’s specific sense, accommodation refers to a firm’s decision to offer a worker with a temporary workplace disability “transitional work” — alternative tasks, modified duties, or flexible arrangements — during their recovery period, funded in part through Oregon’s Employer at Injury Program wage subsidy. Accommodation is distinct from simple early return to work; it functions as a form of human capital investment by potentially providing skill development opportunities and preventing human capital depreciation.

Exposure (Instrument): A firm-level continuous measure defined as the fraction of a firm’s workers’ compensation claims that used EAIP during a five-year baseline period (2005–2009). Exposure captures permanent, time-invariant firm-level propensity to accommodate, and is used to construct a difference-in-differences instrument for the causal effect of accommodation by interacting exposure with a post-2013 indicator (when the subsidy rate was cut from 50% to 45%).

Imperfect Experience Rating: Experience rating is the degree to which a firm’s workers’ compensation insurance premium adjusts to reflect that firm’s own claims costs, rather than being set at an industry average; it is imperfect when that adjustment is only partial. Fully experience-rated (self-insured) firms internalize 100% of claim costs and thus have strong incentives to accommodate. Partially experience-rated firms face a fiscal externality: because their premiums do not fully reflect their own time-loss benefit expenditures, they do not capture all the cost savings from accommodating workers, leading to under-accommodation relative to the social optimum.

Human Capital Externality (Dynamic Inefficiency in Accommodation): The mechanism — analogous to Acemoglu and Pischke (1999) and Fang and Gavazza (2011) — by which worker turnover reduces firms’ incentives to invest in general human capital (here, accommodation). When accommodation raises workers’ general productivity, part of the future surplus from this investment accrues to future employers upon job-to-job separation. With Nash bargaining and lack of commitment (re-bargaining in the second period), the accommodating firm cannot capture this surplus, creating a dynamic inefficiency that is more severe in high-turnover industries.

Negative Selection on Gains: The empirical finding, established via the MTE framework, that workers with workplace disabilities who are least likely to receive accommodation (highest unobserved resistance to treatment) have the largest potential employment and earnings gains from accommodation. This pattern arises because workers with more severe disabilities have high accommodation costs (making firms unwilling to accommodate them) but also face far worse counterfactual labor market outcomes without accommodation, creating large potential gains.

Marginal Treatment Effect (MTE): Following Heckman and Vytlacil (2005), the treatment effect of accommodation evaluated at a specific quantile of unobserved resistance to treatment — defined here as the propensity score value at which a worker is indifferent between treatment and non-treatment. The MTE curve maps out the full distribution of treatment effects and reveals who benefits (and by how much), how IV estimates are weighted averages over this distribution, and which compliers drive the large IV estimates.

General vs. Firm-Specific Human Capital (in Accommodation Context): Accommodation is characterized as general human capital investment if the productivity and earnings gains it produces are transferable across employers — i.e., if accommodated workers who move to new firms retain their wage gains. It is firm-specific if gains are tied to the current match. In this paper, general human capital is supported by the null effect of accommodation on new-firm employment probability, suggestive evidence of non-lower (possibly larger) earnings gains for new-firm movers, and the observation that fewer than 5% of claims use non-wage EAIP benefits associated with firm-specific investment.

Revenue-Neutral Counterfactual: A counterfactual policy experiment in which the wage subsidy rate for accommodation is varied while imposing that both the time-loss benefit program and the EAIP wage subsidy program remain budget-balanced. Higher subsidy rates raise firm accommodation, reduce time-loss benefit payouts (lowering base premiums for imperfectly experience-rated firms), but require a higher flat EAIP payroll tax on all firms, some of which is passed through to workers via lower first-period wages.

Firm dynamics and random search over the business cycle

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question

How do aggregate economic fluctuations reallocate workers across the firm productivity distribution over the business cycle? In particular, to what extent do recessions impede workers’ movement up the job ladder toward more productive firms?

Model and Methodology

The paper develops a tractable random search model combining three features that had not previously been integrated in a single quantitative framework: (i) firm dynamics driven by idiosyncratic productivity shocks, with endogenous entry and exit; (ii) on-the-job search, generating a job ladder in which workers gradually move toward more productive firms; and (iii) aggregate productivity shocks. Multi-worker firms post employment contracts, choose hiring rates, and decide whether to continue or exit. The key tractability result, called “size-independence” (Result 1), shows that, under a constant-returns hiring cost technology, firms’ optimal policies (contract value, hiring rate, exit decision) are all independent of firm size, so the relevant state space reduces from the full joint distribution of firm productivity and size to the employment-weighted distribution of firm productivity alone. A further result (“rank-monotonic equilibrium,” Result 2) guarantees, under a sufficient convexity condition on hiring costs (h·c''(h)/c'(h) ≥ 1), that the optimal employment contract is increasing in firm productivity, so the job ladder maps one-for-one onto the firm productivity ladder. The optimal wage contract then admits a closed-form solution.

The model is calibrated to British data for 1997–2018. Worker-level transition rates (unemployment-to-employment, employment-to-unemployment, and job-to-job) are drawn from the British Household Panel Survey (BHPS). Firm-level data on labor productivity (value added per worker) and employment costs per worker come from the Annual Respondents Database (ARD) and Annual Business Survey (ABS), merged with the Business Structure Database (BSD). The numerical solution adapts ideas from Krusell and Smith (1998), approximating the employment-weighted productivity distribution by a small set of moments and parameterizing value functions as polynomials in the aggregate state; standard linearization methods are inapplicable because endogenous firm entry and exit introduces a discontinuity in value functions.

Main Findings

Model validation via the OP decomposition. The paper’s central validation exercise uses the Olley-Pakes (OP) decomposition of a labor productivity index constructed from firm-level data. The aggregate employment-weighted labor productivity index is decomposed into (a) the unweighted average firm productivity and (b) an interaction term (the “OP term”), which captures the covariance between employment shares and productivity — i.e., how well workers are allocated to productive firms. In the British firm-level data, approximately 20 percent of the variance of the aggregate labor productivity index is accounted for by this interaction (OP) term, with the remaining ~80 percent attributable to the unweighted average of firm productivity. The baseline model, with this moment untargeted, successfully replicates this 80/20 split. By contrast, the leading benchmark model of Moscarini and Postel-Vinay (2016) (MPV2016), calibrated to the same British data, attributes nearly all of the variance of labor productivity to the OP/worker reallocation term, grossly overstating the importance of job-ladder dynamics.
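The Olley-Pakes identity behind this exercise can be sketched for a single cross-section of firms. The function name and toy data are illustrative, and the sketch works with whatever productivity measure is passed in; how the paper builds its index and apportions variance between the two terms over time is not detailed in this summary.

```python
from typing import Sequence, Tuple


def op_decomposition(productivity: Sequence[float],
                     employment: Sequence[float]
                     ) -> Tuple[float, float, float]:
    """Return (aggregate, unweighted_mean, op_term) where
    aggregate = unweighted_mean + op_term."""
    n = len(productivity)
    total_emp = sum(employment)
    shares = [e / total_emp for e in employment]
    mean_p = sum(productivity) / n
    mean_s = 1.0 / n  # shares sum to one, so their mean is 1/n
    # Employment-weighted aggregate productivity.
    aggregate = sum(s * p for s, p in zip(shares, productivity))
    # Cross-term between employment shares and productivity.
    op_term = sum((s - mean_s) * (p - mean_p)
                  for s, p in zip(shares, productivity))
    return aggregate, mean_p, op_term


# Toy firms (invented): more productive firms employ more workers.
agg, mean_p, op = op_decomposition([1.0, 2.0, 3.0], [10.0, 30.0, 60.0])
print(agg, mean_p, op)
```

In the toy data the OP term is positive because employment is concentrated in the more productive firms; a recession that pushes workers down the job ladder would shrink it.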

Structural decomposition of labor productivity. When the calibrated baseline model is used to decompose the variance of aggregate labor productivity over the post-war British business cycle (“GDP shocks” going back to 1955), it attributes approximately 30 percent to the direct effect of the aggregate productivity shock, approximately 50 percent to changes in the distribution of active firms (the “firm ladder” or firm selection component), and approximately 20 percent to the worker reallocation component (the OP interaction term). This result is robust to an alternative calibration with a lower curvature of the hiring cost function (c1 = 1).

Persistence and mechanisms. The impact of recessions on the job ladder is persistent: while the aggregate productivity shock is typically close to its pre-recession value four years after a typical recession onset, the overall allocation of workers to firms remains clearly worse relative to the pre-recession level at that same horizon. The Great Recession, viewed through the lens of the model, is a large but not unusually large recession.

Firm selection with multiple aggregate shocks. An unexpected finding concerns the direction of firm selection. With a single aggregate productivity shock, the model generates a standard “cleansing” mechanism: negative shocks raise the firm exit threshold, so surviving firms are on average more productive. However, when additional shocks to the exogenous separation rate (δ) and hiring cost scale (c0) are included — as required to match the volatility of labor market flows — firm selection instead amplifies the decline in labor productivity. The mechanism is a general equilibrium one: a higher separation rate lowers the optimal wage contract (since greater separation risk is passed on to workers), which in turn lowers the entry-exit threshold. Less productive firms become viable because their employees face higher unemployment risk and therefore accept lower wages; moreover, a larger pool of unemployed workers makes it easier for low-productivity firms to recruit.

Wage flexibility tension. The model implies a pass-through elasticity of wages to productivity shocks of approximately 0.7, well above the 0.05–0.2 range typically found empirically.

Scope Conditions

All calibration and quantitative results pertain to Britain for the period 1997–2018 (firm-level data) and 1955–2018 (GDP-based aggregate shocks). The model abstracts from decreasing returns to scale in production and from nominal rigidities. The tractability results rely on specific assumptions about the hiring cost function; the rank-monotonicity condition requires sufficient convexity (hc’’(h)/c’(h) ≥ 1).

In depth

Q1. What is the central tractability result and why does it matter for computational feasibility?

A: Result 1 (“size-independence”) shows that, because both the production technology and the hiring cost function are constant returns to scale, the firm’s present discounted value of profits is linear in employment. As a result, per-worker profits are independent of firm size, and optimal firm policies — the hiring rate, the contract value offered to workers, and the continuation/exit decision — all depend only on the firm’s current productivity, not on its size. This collapses the state space from the full joint distribution of firm productivity and employment size to the employment-weighted measure of firm productivity Lt(p), a uni-dimensional object. Without this result, the model would require tracking the entire joint firm distribution, making it computationally intractable.

Q2. What is a rank-monotonic equilibrium (RME) and what conditions guarantee it?

A: An RME is a recursive equilibrium in which the optimal contract offered by a firm is weakly increasing in that firm’s current productivity realization, for all aggregate states. Result 2 provides sufficient conditions: (i) the Markov process for firm-specific productivity satisfies first-order stochastic dominance (more productive firms today are more likely to be more productive tomorrow), (ii) the distribution of offered contracts is everywhere differentiable (ruling out mass points), and (iii) the hiring cost function satisfies the convexity condition hc’’(h)/c’(h) ≥ 1. The economic interpretation of the convexity condition is that new hiring must become sufficiently costly at the margin, relative to retention through higher wages, that more productive firms optimally choose to use the wage margin to limit quits. The baseline calibration yields c1 ≈ 5.9 (so costs are highly convex in the hiring rate), though results are also reported for the minimum permissible c1 = 1.

Q3. What does the optimal employment contract look like in a rank-monotonic equilibrium, and what does it reveal about rent extraction?

A: In an RME, the optimal contract V(p,ω,L) is a weighted average of the value of unemployment U(ω,L) and the firm-workers’ joint surplus S(p,ω,L), where the weights are determined endogenously by the employment-weighted measure of firm productivity L. Specifically, the contract integrates the surplus of all firms with productivity below p, weighted by the share of employed workers at those firms, and divided by the mass of job seekers willing to accept the contract. As the employed workers’ relative search intensity s approaches zero, the contract converges to the value of unemployment — workers receive no rents. The endogenous bargaining weight evolves with the aggregate state over the business cycle, unlike standard Nash bargaining models with a fixed exogenous weight.

Q4. What firm-level moments are used to calibrate the steady-state model, and what is the logic behind the parameter-moment mapping?

A: Eight moments are targeted. From the BHPS worker data: the average UE rate (0.058) pins down the scale of hiring costs c0; the average EU rate (0.003) pins down the exogenous separation rate δ; and the average EE (job-to-job) rate (0.016) pins down the relative search intensity s. From the firm-level ARD/BSD data: average firm size (12.1 employees) pins down the entry probability µ; the share of job destruction from firm exits (0.526) disciplines the flow value of unemployment b; the autocorrelation of firm employment ln(n) (0.949 annually) disciplines the persistence of idiosyncratic productivity ρp; the interquartile range of firm-level labor productivity (1.129 log points) disciplines the volatility of idiosyncratic shocks σp; and the regression coefficient of firm employment growth on lagged labor productivity (0.136) disciplines the curvature of hiring costs c1. The baseline calibration fits all eight moments closely.

Q5. How does the calibrated model match non-targeted moments, and what does this establish?

A: The model generates several realistic features not targeted in calibration. It produces a realistic Pareto tail for the employment-size distribution (Pareto tail exponent of 1.033 in the model vs. 1.066 in the data), which arises from the combination of size-independent growth rates and firm entry and exit — conditions identified in the literature as generating power law distributions. The model also matches the dispersion of employment costs per worker across firms (capturing about 70 percent of the interquartile range of ECi,t), the slope of a regression of employment costs on labor productivity (model: 0.685 vs. data: 0.704), and the slope of a regression of employment growth on employment costs (model: 0.162 vs. data: 0.131). These non-targeted matches provide independent validation of the model’s wage-determination mechanism.

Q6. Why is a single aggregate productivity shock insufficient to match labor market fluctuations, and what additional shocks are needed?

A: With a single aggregate productivity shock calibrated to match the autocorrelation and standard deviation of log GDP, the model generates labor market fluctuations that are roughly an order of magnitude smaller than in the data. For example, the standard deviation of the EU transition rate is 4.1×10⁻⁴ in the single-shock model versus 2.3×10⁻³ in the data. Adding a discount rate shock (ω,r) partially helps but still leaves the job-finding rate (UE) more than 50 percent too smooth. Adding a separation rate shock (ω,δ) substantially increases EU and UE volatility but generates insufficient EE (job-to-job) volatility. The combination (ω,δ,c0) — adding a shock to the scale of hiring costs c0 — brings the standard deviations of EU and UE close to the data (2.0×10⁻³ and 4.0×10⁻⁴ vs. data 2.3×10⁻³ and 2.7×10⁻⁴), though the model still generates slightly under half the observed volatility in EE rates. This combination is the baseline for the quantitative analysis.

Q7. What is the OP decomposition, how is it computed from the firm-level data, and what does it measure in the model?

A: The aggregate labor productivity index LPt is constructed from firm-level data as the employment-share-weighted average of log value added per worker across firms. The OP decomposition writes this as LPt = LPt_bar + OPt, where LPt_bar is the unweighted (simple) average of firm-level productivity and OPt is the covariance between employment shares and labor productivity (the “interaction term”). In the data, OPt increases when workers are disproportionately employed at above-average-productivity firms. In the model, LPt_bar maps onto the average (log) productivity of active firms — the support of the job ladder — while OPt maps onto the difference between the employment-weighted and the unweighted averages of firm productivity, directly measuring how high up the ladder workers are located relative to the set of active firms. Around 20 percent of the variance of LPt in the British data is accounted for by OPt, and the model replicates this.
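
The identity LPt = LPt_bar + OPt described above can be checked numerically. The following is a minimal sketch, not the paper's code: the function name and the three-firm numbers are hypothetical, and the OP term is written in the standard form as the sum over firms of (employment share minus 1/N) times (log productivity minus the unweighted mean).

```python
def op_decomposition(employment, log_productivity):
    """Split the employment-weighted mean of log productivity into
    the unweighted mean and the Olley-Pakes interaction term."""
    n = len(employment)
    total = sum(employment)
    shares = [e / total for e in employment]  # employment shares, sum to 1
    lp_weighted = sum(s * x for s, x in zip(shares, log_productivity))
    lp_bar = sum(log_productivity) / n  # unweighted (simple) average
    # OP term: sum of (share - 1/n) * (productivity - unweighted mean)
    op = sum((s - 1.0 / n) * (x - lp_bar)
             for s, x in zip(shares, log_productivity))
    return lp_weighted, lp_bar, op


# Hypothetical three-firm example (illustrative numbers, not from the paper):
# the most productive firm also employs the most workers, so OP is positive.
lp, lp_bar, op = op_decomposition([10, 30, 60], [0.0, 0.5, 1.0])
```

In this example the weighted index exceeds the unweighted average because employment is concentrated at the high-productivity firm, which is exactly what a positive OP term records.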

Q8. How does the Great Recession appear in the OP decomposition, and does the model fit the decomposition during this episode?

A: During the Great Recession (2008q2–2009q3 in the UK), around 20 percent of the overall fall in the labor productivity index is accounted for by the fall in the OP interaction term, with the remaining 80 percent coming from the fall in the unweighted average firm productivity. The model, even though it does not target this decomposition in calibration, successfully matches both the average firm productivity component and the interaction (OP) component during the Great Recession. This matching holds both in the baseline calibration (c1 ≈ 5.9) and in the alternative calibration with c1 = 1. The model also matches the analogous decomposition for employment costs per worker (ECt), an additional non-targeted validation.

Q9. Why does firm selection amplify rather than cleanse in the baseline multi-shock calibration?

A: In the single-shock (productivity ω only) model, a negative productivity shock lowers surplus at all firms, raising the exit threshold pE and thus selecting out low-productivity firms — the standard “cleansing” mechanism. In the multi-shock baseline, the additional separation rate shock (δ) generates a less intuitive mechanism. A higher δ lowers the optimal wage contract (since increased separation risk is passed on to workers: ∂V/∂δ ≤ 0), which reduces the value of continued employment. This lowers the joint firm-worker surplus threshold for exit, making it viable for low-productivity firms to remain active. Moreover, the larger pool of unemployed workers (generated by the δ shock) depresses the outside option of workers and makes it easier for low-productivity firms to recruit. As a result, the entry-exit threshold pE,t falls — the set of active firms becomes less productive on average — producing a negative firm selection contribution to labor productivity and a positive (amplifying rather than cleansing) contribution to the variance of LPt.

Q10. What is the structural variance decomposition of labor productivity in the baseline model?

A: Simulating the baseline model over the post-war British business cycle (1955–2020, GDP shocks), the variance of aggregate labor productivity LPt decomposes into three structural terms: approximately 30 percent (0.296) from the direct effect of the aggregate productivity shock ln(ωt); approximately 50 percent (0.541) from changes in the average productivity of active firms E[KP bar_t(ln p)] — the “firm ladder” or firm selection component; and approximately 20 percent (0.163) from the worker reallocation component OPt = E[LP bar_t(ln p)] − E[KP bar_t(ln p)]. This decomposition implies that roughly 70 percent of fluctuations in labor productivity are driven by worker reallocation broadly defined (the firm ladder plus the interaction term), with the firm selection component being the largest single driver. The result is robust to the alternative c1 = 1 calibration (30/49/22 percent split).
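
The summary does not state how the three shares are computed. One common convention that yields shares summing to one, as the reported 0.296, 0.541, and 0.163 do, assigns each additive component its covariance with the total divided by the variance of the total. The sketch below illustrates that convention under this assumption; the function names and the short series are hypothetical.

```python
from statistics import mean


def cov(x, y):
    """Population covariance of two equally long series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def variance_shares(components):
    """components: dict mapping a name to a time series.
    The total is the period-by-period sum of all components.
    Each share is cov(component, total) / var(total)."""
    total = [sum(vals) for vals in zip(*components.values())]
    var_total = cov(total, total)
    return {name: cov(series, total) / var_total
            for name, series in components.items()}


# Hypothetical series (illustrative only, not model output)
shares = variance_shares({
    "aggregate_shock": [0.0, 0.10, -0.20, 0.05],
    "firm_ladder": [0.0, 0.20, -0.30, 0.10],
    "worker_reallocation": [0.0, 0.05, -0.10, 0.00],
})
```

Because covariance is linear, the shares add up to one by construction, whatever the correlation between components.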

Q11. How does the baseline model compare to MPV2016 in the variance decomposition?

A: In the multi-shock calibration (ω,δ,c0), the MPV2016 model calibrated to the same British data attributes approximately 97.7 percent (0.977) of the variance of LPt to the worker reallocation (OP) term, with essentially none attributed to a firm selection term (since there is no firm entry and exit in MPV2016). This is nearly five times the 20 percent share attributed to worker reallocation in the data and in the baseline model. In the single-shock (ω) calibration, both models attribute a more modest share to worker reallocation (7.2 percent for the baseline model, 0.1 percent for MPV2016 with c1=5), and the difference narrows considerably. The contrast thus stems from the interaction of firm dynamics with multiple aggregate shocks: allowing for endogenous firm entry and exit is critical to prevent the model from overstating the role of the job ladder.

Q12. How persistent is the impact of recessions on the job ladder, based on the model simulations?

A: The paper simulates the structural decomposition of labor productivity starting from each of seven post-war British recessions (defined by two consecutive quarters of negative GDP growth). On average across these recessions, the aggregate productivity shock ln(ωt) is close to its pre-recession level by four years after the recession onset. However, the overall employment-weighted average productivity E[LP bar_t(ln p)] — reflecting workers’ position on the job ladder — remains clearly below its pre-recession value at the four-year horizon, indicating persistent misallocation. The OP interaction term accounts for approximately 20 percent of the total drop in the employment-weighted productivity measure three years after a typical recession onset. Through the model’s lens, the Great Recession is a large recession but not an outlier relative to the historical distribution.
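
The recession definition used above (two consecutive quarters of negative GDP growth) is simple to apply to a quarterly growth series. The sketch below is one possible reading of that rule, with a hypothetical function name and made-up growth rates; how the paper treats the end of a recession is an assumption here (a recession ends at the first quarter of non-negative growth).

```python
def recession_onsets(gdp_growth):
    """Return the indices of quarters that begin a run of at least
    two consecutive quarters of negative GDP growth."""
    onsets = []
    in_recession = False
    for t in range(len(gdp_growth) - 1):
        negative_pair = gdp_growth[t] < 0 and gdp_growth[t + 1] < 0
        if negative_pair and not in_recession:
            onsets.append(t)
            in_recession = True
        elif gdp_growth[t] >= 0:
            # Assumption: non-negative growth ends the current recession
            in_recession = False
    return onsets


# Hypothetical quarterly growth rates (percent)
growth = [0.5, -0.2, -0.3, -0.1, 0.4, -0.1, 0.3, -0.2, -0.4]
onsets = recession_onsets(growth)
```

The isolated negative quarter at index 5 does not count as a recession, while the runs starting at indices 1 and 7 do.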

Q13. What does the counterfactual with countercyclical unemployment benefits reveal about the tradeoff between firm selection and worker reallocation?

A: When the flow value of unemployment is made countercyclical (rising in recessions and falling in expansions, mimicking US unemployment insurance extension programs), the model generates a sign reversal in the firm selection (“firm ladder”) component. With countercyclical b, the unemployment value rises in recessions, which raises the lowest wage that firms must offer and raises the exit threshold pE,t: fewer low-productivity firms survive, improving the composition of active firms. However, countercyclical benefits also amplify the slowdown in job-to-job reallocation: the higher value of unemployment reduces workers’ willingness to accept job offers, and all firms cut recruitment since optimal wage contracts must rise. The OP interaction term therefore falls more sharply than in the baseline model. The counterfactual with ϵb,ω ∈ {−100, −50} finds that the positive “firm ladder” effect dominates on net, so the overall allocation of workers to firms improves relative to the baseline after a typical recession under countercyclical unemployment benefits.

Q14. What is the numerical solution method, and why are standard linearization approaches inapplicable?

A: The model is solved in two steps. First, aggregate shocks are shut down and the steady-state rank-monotonic equilibrium is solved numerically by discretizing the firm productivity process (401 grid points via Tauchen’s method) and iterating on the value function and the employment-weighted productivity measure until convergence. Second, aggregate shocks are reintroduced using a simulation-based approach adapted from Krusell and Smith (1998): the employment-weighted distribution of productivity is summarized by Nm = 2 moments (plus the unemployment rate), and the value functions are parameterized as polynomials in the aggregate state, with coefficients updated by regression until convergence. Standard linearization methods (Reiter 2009) are inapplicable because the endogenous entry-exit decision creates a kink (discontinuity) in value functions at the productivity threshold pE, making first-order approximations around the steady state inaccurate. Accuracy tests based on den Haan (2010) show that the polynomial approximation generates errors of at most 0.065 percent for value functions and at most 1 percentage point for the unemployment rate across simulation paths.
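
For the first step, the discretization can be illustrated with a textbook version of Tauchen's method for an AR(1) process z' = ρz + ε with ε ~ N(0, σ²). This is a minimal sketch and not the paper's implementation: only the 401 grid points come from the summary, while the values of ρ and σ and the grid width m (in unconditional standard deviations) are placeholder assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tauchen(n, rho, sigma, m=3.0):
    """Discretize z' = rho*z + eps, eps ~ N(0, sigma^2), on n evenly
    spaced points spanning +/- m unconditional standard deviations."""
    sigma_z = sigma / math.sqrt(1.0 - rho ** 2)  # unconditional std dev
    z_max = m * sigma_z
    step = 2.0 * z_max / (n - 1)
    grid = [-z_max + i * step for i in range(n)]
    P = []
    for zi in grid:
        row = []
        for j, zj in enumerate(grid):
            upper = (zj - rho * zi + step / 2.0) / sigma
            lower = (zj - rho * zi - step / 2.0) / sigma
            if j == 0:
                row.append(norm_cdf(upper))        # mass below first midpoint
            elif j == n - 1:
                row.append(1.0 - norm_cdf(lower))  # mass above last midpoint
            else:
                row.append(norm_cdf(upper) - norm_cdf(lower))
        P.append(row)
    return grid, P


# 401 points as in the summary; rho and sigma are illustrative placeholders
grid, P = tauchen(n=401, rho=0.95, sigma=0.1)
```

Each row of P is a probability distribution over next-period grid points, which is what the value function iteration in the first step would take as input.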

Key Concepts

1. Rank-Monotonic Equilibrium (RME) A recursive equilibrium in which the optimal state-contingent employment contract V(p,ω,L) offered by a firm is weakly increasing in the firm’s current productivity realization p, for all aggregate states (ω,L). This property implies that the job ladder maps one-for-one onto the firm productivity ladder: workers always prefer to work at more productive firms. The paper shows this property holds under a sufficient convexity condition on hiring costs (hc’’(h)/c’(h) ≥ 1) and first-order stochastic dominance of the productivity process.

2. Size-Independence The property that a firm’s optimal policies — the hiring rate h(p), the employment contract V(p), and the entry/exit decision χ(p) — are all independent of the firm’s current employment size n. This follows from constant returns to scale in production and hiring, which implies that firm profits are linear in employment. Size-independence reduces the model’s relevant state space to the employment-weighted distribution of firm productivity, enabling tractability.

3. Employment-Weighted Distribution of Firm Productivity (L_t(p)) The measure recording, for each productivity level p, the total employment at firms with productivity at most p. This is the sufficient statistic for the state of the job ladder at any point in time: combined with the aggregate shock ω, it determines all equilibrium policy functions and value functions. In the model, it replaces the full joint distribution of firm productivity and employment size that would otherwise be required.
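
A small example may help show why this measure is a lower-dimensional object than the joint distribution. The sketch below, with a hypothetical function name and made-up firms, computes total employment at firms with productivity at most p; two economies with different firm-size configurations produce the same measure.

```python
def employment_weighted_measure(firms, p):
    """Total employment at firms with productivity at most p.
    firms: list of (productivity, employment) pairs."""
    return sum(n for prod, n in firms if prod <= p)


# Two hypothetical economies with different firm-size distributions
economy_a = [(1.0, 10), (1.0, 30), (2.0, 60)]
economy_b = [(1.0, 40), (2.0, 20), (2.0, 40)]

measure_a = [employment_weighted_measure(economy_a, p) for p in (1.0, 2.0)]
measure_b = [employment_weighted_measure(economy_b, p) for p in (1.0, 2.0)]
```

Both economies yield the same values of the measure, so under size-independence they would share the same aggregate state even though individual firm sizes differ.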

4. OP Decomposition (Olley-Pakes Decomposition) The decomposition of the aggregate employment-weighted labor productivity index LPt into: (a) the unweighted average firm productivity LPt_bar, which summarizes the productivity of active firms (the support of the job ladder); and (b) an interaction term OPt, the covariance between employment shares and firm-level productivity, which measures how well workers are allocated across the productivity distribution (i.e., how high up the ladder workers sit given the set of active firms). In the model, (a) maps to E[KP bar_t(ln p)] and (b) maps to OPt = E[LP bar_t(ln p)] − E[KP bar_t(ln p)].

5. Contract Posting The wage-setting protocol in which each firm commits upon entry to a full state-contingent employment contract — a schedule mapping each future realization of aggregate and idiosyncratic productivity to a wage and continuation decision — and is bound by an equal treatment constraint to offer the same contract to all employees. Workers cannot renegotiate based on outside offers. This protocol produces a well-defined closed-form for the optimal contract in an RME and differs from alternating-offer bargaining (Nash bargaining) in that the bargaining weights are endogenous rather than fixed.

6. Firm-Workers’ Joint Surplus (S_t(p)) The total present discounted value accruing to the firm-worker pair: firm profits per worker plus the contract value promised to workers. Because utility is transferable (risk neutrality) and the firm fully commits to its contract, this surplus depends only on the firm’s current productivity and the aggregate state — not on the promised contract value V. The surplus S_t(p) is the key object determining firm entry/exit (the firm continues if and only if S_t(p) ≥ U_t) and optimal hiring (the marginal return to an additional hire equals S_t(p) − V(p)).

7. Cleansing vs. Anti-Cleansing Firm Selection In models with endogenous firm entry and exit, a negative aggregate shock can either raise or lower the productivity threshold for firm survival. “Cleansing” refers to the standard mechanism where a negative productivity shock raises the exit threshold, selecting out low-productivity firms and improving the average quality of survivors. “Anti-cleansing” (as in the baseline multi-shock calibration) occurs when separation rate or hiring cost shocks lower the optimal wage contract and reduce the exit threshold, allowing less productive firms to survive and worsening average firm productivity.

Firm idiosyncratic risk and productivity investment: Macroeconomic implications

Mon, 01 Jan 0001 00:00:00 +0000

This paper quantifies how idiosyncratic firm-level risk affects aggregate output, TFP, and firm life-cycle growth in an environment where firm productivity evolves endogenously through risky investment. The paper embeds endogenous productivity investment into a Lucas span-of-control model with risk-averse firm owners and endogenous entry and exit, and studies the effects of mean-preserving increases in the variance of returns to productivity investment. A mean-preserving increase in the variance of firm productivity shocks that raises the firm exit rate by 10% (from 0.10 to 0.11) is estimated to cause a 0.73% decline in output, a 0.38% decline in measured TFP, and a 3.69% decline in firm productivity investment; these elasticities remain approximately constant in the empirically relevant range. The driving force is that risk-averse firm owners reduce their risky productivity investment as variance rises; if capital financing constraints are present—as is common in developing economies—these effects are amplified and the increase in uncertainty may also slow firm life-cycle growth. Previously circulated as “Uncertainty, Firm Lifecycle Growth, and Aggregate Productivity.”

In depth

Q1. What distinguishes this paper from standard models of firm misallocation?

Unlike the bulk of firm misallocation literature (Hsieh-Klenow 2009; Gopinath et al. 2017; Sraer-Thesmar 2023), which takes firm productivity as exogenous, this paper models productivity as an endogenous outcome of risky investment, so that idiosyncratic uncertainty affects allocative efficiency not only through selection effects but also through its discouragement of productivity investment by risk-averse owners. The paper incorporates endogenous productivity investment into a standard Lucas span-of-control model, allowing the model to capture how higher uncertainty reduces the incentive to invest in productivity, on top of any selection effects from the exit option.

Q2. What are the two opposing effects of higher idiosyncratic risk?

Higher idiosyncratic firm-level risk has two opposing effects on aggregate productivity: (i) a selection effect—a mean-preserving increase in variance leads to stronger selection and raises the productivity of survivors while reallocating exiters to alternative productive uses—that tends to raise average productivity; and (ii) a productivity investment effect—risk-averse owners reduce risky productivity investment in response to higher variance—that tends to reduce aggregate productivity and firm life-cycle growth. The paper shows quantitatively that the productivity investment effect dominates in the baseline calibration, so that higher idiosyncratic risk reduces output and TFP despite positive selection effects.
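
The notion of a mean-preserving increase in variance, and why a risk-averse owner dislikes it, can be illustrated with a toy calculation. This is a sketch under loud assumptions: the three equally likely returns, the stretch factor, and the use of log utility are all hypothetical choices for illustration, since the summary does not give the paper's functional forms.

```python
import math
from statistics import mean, pvariance


def mean_preserving_spread(outcomes, k):
    """Stretch equally likely outcomes around their mean by factor k.
    The mean is unchanged and the variance is multiplied by k**2."""
    mu = mean(outcomes)
    return [mu + k * (x - mu) for x in outcomes]


def expected_log_utility(outcomes):
    """Expected utility of equally likely outcomes under log utility
    (an assumed stand-in for risk-averse preferences)."""
    return mean(math.log(x) for x in outcomes)


returns = [0.8, 1.0, 1.2]  # hypothetical gross returns
riskier = mean_preserving_spread(returns, 1.5)
```

The expected return is identical across the two lotteries, but expected log utility is lower for the riskier one, which is the sense in which higher variance alone discourages risky investment by a risk-averse owner.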

Q3. What are the main quantitative findings?

A mean-preserving increase in the variance of firm productivity shocks calibrated to raise the firm exit rate by 10% (from 0.10 to 0.11) results in a 0.73% decline in output, a 0.38% decline in measured TFP, and a 3.69% decline in firm productivity investment; these elasticities remain approximately constant in the empirically relevant range. The exit-rate increase from 0.10 to 0.11 is also associated with a 7.5% increase in the job destruction rate and a 14.6% increase in the standard deviation of firm growth rates—the latter is less than one-fifth of the increases in these risk measures observed when comparing India or Mexico to the U.S.

Q4. How do capital financing constraints interact with the results?

When firms face capital financing constraints—as is common in developing economies—the negative effects of higher idiosyncratic risk are amplified and the increase in uncertainty may also slow firm life-cycle growth. The mechanism is that constrained firms must rely more heavily on internal financing, making risk-averse owners even more sensitive to increases in variance. The paper implies that the macro-financial implications of idiosyncratic risk are more severe in developing economies where both idiosyncratic risk levels and financing constraints are greater—consistent with cross-country patterns of firm growth dynamics.

Key concepts

productivity investment : endogenous spending by firms on activities that shift their productivity process; in the model, this investment exposes firm owners to idiosyncratic risk via the innovation in the productivity process; the key margin through which higher uncertainty reduces aggregate productivity and output.

mean-preserving increase in variance : a statistical experiment that increases the spread of the distribution of returns to productivity investment while leaving the mean unchanged; used here to isolate the pure risk effect on firm behavior and aggregate outcomes from any change in expected returns.

span-of-control model : the Lucas (1978) model of firm size distribution with decreasing returns to scale in the entrepreneurial input; used as the production environment; extended here by adding endogenous productivity investment and endogenous entry and exit.

Firm Responses and Wage Effects of Foreign Demand Shocks with Fixed Labor Costs and Monopsony

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1 — Overview

Research Question. The paper asks three related questions in the context of Belgium, a small open economy: (1) What do firms’ responses to demand shocks reveal about their cost structures? (2) What are the worker and wage impacts of foreign demand shocks? (3) How sensitive are the aggregate wage effects of foreign demand shifts to firms’ cost structures and imperfect competition in the labor market?

Data. The analysis combines administrative micro-data from Belgium for 2002–2014, provided by the National Bank of Belgium. The linked dataset covers 995,739 firm-year observations from private, non-financial firms with at least one FTE employee, and integrates: (a) a Business-to-Business (B2B) VAT transactions registry capturing all annual domestic firm-to-firm sales above €250; (b) customs records and intra-EU declarations for imports and exports at the 8-digit product level; (c) annual accounts containing data on sales, labor costs, intermediate inputs, capital, and firm characteristics; and (d) employer-employee matched data from the Belgian social security administration (BCSS) for a random sample of 500,000 workers in firms with 10 or more FTE employees, covering 2003–2014.

Identification Strategy. To isolate variation in firms’ sales driven by foreign demand rather than supply-side factors, the authors construct a firm-specific foreign demand instrument following Hummels et al. (2014) and Dhyne et al. (2021). The instrument is the weighted average of changes in world import demand facing a firm, using lagged export shares as weights and excluding Belgian imports from the world import measure. Crucially, the instrument captures both direct foreign demand exposure (for exporters) and indirect exposure through the domestic production network — including the foreign demand shocks passing through to upstream domestic suppliers via buyer-supplier links. Firm and industry-year fixed effects control for time-invariant heterogeneity and industry-level trends.

Key Empirical Facts. Within-firm analysis over four-year windows finds that intermediate input purchases respond nearly proportionally to changes in sales (slope coefficient 0.82), while labor costs respond less than proportionally (slope coefficient 0.57). The less-than-proportional response of labor costs — with the employment slope of 0.48 and the average wage slope of 0.09 — is consistent with sizable fixed overhead costs in labor inputs and upward-sloping labor supply curves. Output prices co-move more with input prices than with average wages, consistent with labor constituting a smaller share of variable costs than intermediate inputs.

IV Estimates of Firm Responses. In response to a foreign demand shock inducing a 10 percent instantaneous increase in a firm’s sales, the firm’s cumulative sales over four years increase by approximately 7.6 percent (balanced panel). Over the same four-year horizon, total input purchases increase by about 7.0–7.8 percent, while labor costs increase by only 3.5–4.1 percent — a substantially less-than-proportional response. Roughly one-quarter of the labor cost change comes from changes in average wages rather than employment changes. Domestic input purchases increase by 5.3–6.0 percent, indicating that firms pass on a large share of foreign demand shocks to their domestic suppliers.

Structural Parameters. The implied IV estimate of the labor cost elasticity with respect to sales is 0.53 (standard error 0.08), statistically significantly below one. The implied elasticity of total input purchases is 1.05 (standard error 0.15), close to one, so the fixed share of intermediate inputs is approximately zero. The labor supply elasticity estimated from the ratio of wage and employment responses is approximately 3.9 in the full sample and 2.3 in the stayer subsample; the implied wage markdown is 21 percent and 30 percent respectively. Incorporating upward-sloping labor supply into equation (15), the estimated share of total labor inputs that is fixed overhead is approximately 53 percent. By comparison, the fixed share of total costs (labor and intermediate inputs combined) is approximately 29 percent in Belgium — higher than the 18–22 percent found in U.S. data (De Loecker et al. 2020) and the 20 percent found in U.S. manufacturing plants (Ederhof et al. 2021).
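
The link between the labor supply elasticity and the wage markdown can be made explicit. The sketch below uses the textbook monopsony formula, in which a firm facing elasticity ε pays a wage equal to ε/(1+ε) of the marginal revenue product, so the markdown is 1/(1+ε). Treating this as the formula behind the reported figures is an assumption; the summary does not state the exact expression or the unrounded elasticities.

```python
def wage_markdown(elasticity):
    """Textbook monopsony markdown: the proportional gap between the
    marginal revenue product of labor and the wage, 1 / (1 + elasticity)."""
    return 1.0 / (1.0 + elasticity)


markdown_full = wage_markdown(3.9)     # full-sample elasticity
markdown_stayers = wage_markdown(2.3)  # stayer-subsample elasticity
```

With the rounded elasticities this gives about 20 percent and 30 percent, close to the 21 and 30 percent reported above.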

General Equilibrium Counterfactuals. The authors parameterize and solve a small open economy general equilibrium model with monopsonistic competition in labor markets, monopolistic competition in product markets, and fixed and variable labor and intermediate input costs. Using the Dekle-Eaton-Kortum (2007) “hat algebra” technique, they simulate a 5 percent increase in foreign tariffs on all Belgian exports and compare four counterfactual economies: (1) baseline Belgium with fixed costs and imperfect labor market competition (ε = 3.9); (2) fixed costs and perfectly elastic labor supply (ε = ∞); (3) no fixed costs with imperfect competition; (4) no fixed costs and perfectly competitive labor markets.

Main Findings on Wages. In the baseline Belgian economy, a 5 percent increase in foreign tariffs produces a 4.9 percent fall in the average real wage. With fixed costs but perfectly elastic labor supply, the real wage falls by 4.8 percent, nearly the same. With upward-sloping labor supply but no fixed costs, the real wage falls by only 3.0 percent; without fixed costs and with perfectly competitive labor markets, the fall is only 2.8 percent. The paper concludes that fixed overhead costs in labor substantially amplify real wage declines, while incorporating upward-sloping labor supply appears quantitatively less consequential for aggregate wage outcomes. Standard models that assume no fixed costs and perfectly elastic labor supply (the typical modeling choice in the trade literature) may therefore substantially understate the aggregate wage decline from a negative foreign demand shock: they imply a 2.8 percent fall instead of 4.9 percent, a gap equal to roughly 43 percent of the baseline effect, or equivalently 75 percent of the standard-model estimate.

Mechanism. Fixed overhead costs reduce labor’s share of variable costs. When labor is a smaller share of variable costs, output prices are less sensitive to changes in wages. With a fixed aggregate labor supply, the economy must lower prices through wage reductions to restore equilibrium after a negative demand shock; the required wage decline is larger when fixed labor costs are taken into account. The findings are robust to adjustment cost specifications, a nested logit extension of the labor market model, and controlling for location-year fixed effects and import price changes.

In depth

Q1. What two motivating empirical facts about Belgian firms does the paper establish?

A1: First, within-firm four-year changes show that intermediate input purchases respond nearly proportionally to changes in sales (slope coefficient 0.82), while labor costs respond less than proportionally (slope coefficient 0.57). The labor cost response decomposes into an employment slope of 0.48 and a wage slope of 0.09. Second, output prices co-move more strongly with input (intermediate goods) prices than with average wages, consistent with labor constituting a smaller share of variable costs than intermediate inputs.

Q2. How does the instrument for foreign demand shocks capture indirect exposure through production networks?

A2: The instrument for firm k is a weighted average of changes in world import demand, where the weights reflect both the firm’s own direct export shares across countries and products and the firm’s indirect export exposure through its domestic buyers’ export shares. The term H̃_{kn,t-1} captures the share of firm k’s total sales purchased by firm n directly and indirectly through all upstream chains. This means even non-exporting firms receive a non-zero instrument through their sales to directly-exporting firms. In fact, non-directly-exporting firms sell on average nearly 10 percent of their output indirectly to foreign markets.

Q3. What is the estimated magnitude of the labor supply elasticity facing Belgian firms, and what does it imply for wage markdowns?

A3: In the full main estimation sample (balanced panel), the IV estimate of the firm-specific labor supply elasticity is approximately 3.9, implying a wage markdown of about 21 percent relative to the marginal revenue product of labor. In the stayer subsample (incumbent workers only, holding workforce composition fixed), the estimated labor supply elasticity is approximately 2.3, implying a markdown of about 30 percent. The paper can reject perfect competition (infinite elasticity, zero markdown) at a significance level of 0.06 in the full sample and 0.001 in the stayer sample using the closure method.

Q4. What is the estimated labor cost elasticity with respect to demand-driven sales changes, and what does it imply about fixed labor costs?

A4: The IV estimate of the labor cost elasticity with respect to sales is 0.528 (standard error 0.085), statistically significantly below one. If labor supply were perfectly elastic, this would directly imply a fixed labor cost share of approximately 47 percent. Incorporating the estimated upward-sloping labor supply curve through equation (15), the model implies that approximately 53 percent of total labor inputs are fixed overhead. For context, occupational data from Belgium’s 2014 Structure of Earnings Survey shows that clerical support workers and managers together account for 21 percent of total earnings, and adding professionals raises this to 51 percent — broadly consistent with the estimated fixed share.
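The arithmetic behind the perfectly elastic benchmark in A4 can be sketched in a few lines of Python. This is only an illustration using the point estimate and standard error quoted above; it does not reproduce equation (15), and the variable names are hypothetical. When wages do not respond, the labor cost elasticity equals the variable share of labor, so the fixed share is one minus the elasticity.

```python
# Point estimate and standard error quoted in the text.
elasticity = 0.528
std_error = 0.085

# Benchmark with perfectly elastic labor supply: wages do not move, so the
# labor cost elasticity equals the variable share of labor inputs.
fixed_share_elastic_supply = 1.0 - elasticity

# z-statistic for the null hypothesis that the elasticity equals one
# (all labor variable). A large negative value rejects that null.
z_stat = (elasticity - 1.0) / std_error

print(f"fixed labor share (elastic-supply benchmark): {fixed_share_elastic_supply:.3f}")
print(f"z-statistic for H0 elasticity = 1: {z_stat:.2f}")
```

The 53 percent figure that accounts for upward-sloping labor supply requires equation (15) and is not recomputed here.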

Q5. What does the estimated elasticity of input purchases with respect to sales imply about fixed intermediate input costs?

A5: The IV estimate of the elasticity of total input purchases with respect to sales is 1.050 (standard error 0.150), close to one. The implied fixed share of total intermediate inputs is therefore approximately zero. However, there is substantial heterogeneity by input type: purchases from the manufacturing sector (roughly half of all input purchases) have an elasticity close to one, whereas service-sector inputs (roughly 30 percent of total input purchases) have an implied fixed cost share of approximately 36 percent, with a size-weighted average cumulative response of 4.3 percent against a total cumulative sales increase of 6.7 percent.

Q6. How does the paper rule out alternative explanations for the less-than-proportional response of labor costs?

A6: The paper considers three main alternatives. First, adjustment costs: even in the presence of labor adjustment costs, under a homothetic constant-returns production function a permanent shock should eventually produce a proportional labor response. The paper focuses on four-year cumulative responses where firm responses change little after the first couple of years, and shows identification of fixed costs holds even in models with quadratic or Calvo-style adjustment costs. Second, a non-homothetic CES production function without fixed costs: Appendix B.3 shows that such a specification predicts that if the labor cost elasticity is below one, the input purchase elasticity must be above one. This is at odds with the data, which show an input purchase elasticity close to one alongside a labor cost elasticity well below one. Third, variable markups: a uniform markup change would reduce both elasticities proportionally, not create the large gap between labor cost and input purchase elasticities observed.

Q7. Why are firms’ domestic suppliers affected by foreign demand shocks, and how large are the pass-through effects?

A7: Firms pass on foreign demand shocks to their domestic suppliers through buyer-supplier production network links. When a foreign demand shock increases a firm’s sales by 10 percent instantaneously, its domestic input purchases increase cumulatively by approximately 5.3–6.0 percent over four years. Total input purchases increase by 7.0–7.8 percent over the same period; the difference between total and domestic input purchases reflects service inputs (which have smaller responses) and the composition of imported versus domestic inputs.

Q8. What is the aggregate real wage effect of a 5 percent increase in foreign tariffs on Belgian exports in the baseline model?

A8: In the baseline counterfactual representing the actual Belgian economy (with fixed overhead costs and labor supply elasticity ε = 3.9), a uniform 5 percent increase in foreign tariffs on all Belgian exports produces a 4.9 percent fall in the average real wage. The median firm reduces output by 3.8 percent, marginal costs by 4.8 percent, and wages by 7.9 percent. The fall in wages is driven by a general equilibrium mechanism: since the foreign price is exogenous and trade balance must hold, wages are the key adjusting margin.

Q9. How much does the modeling of fixed overhead costs versus imperfect labor market competition matter for the aggregate wage counterfactual?

A9: Fixed overhead costs account for nearly all of the amplification relative to the standard model. With fixed costs but perfectly elastic labor supply, the real wage falls 4.8 percent, almost identical to the 4.9 percent in the baseline. Without fixed costs but with the estimated upward-sloping labor supply, the fall is only 3.0 percent. Without either, the fall is 2.8 percent. Thus, incorporating fixed overhead costs in labor raises the estimated wage decline by approximately 1.9 to 2.0 percentage points (depending on the labor supply assumption), while incorporating imperfect labor market competition adds only about 0.1 to 0.2 percentage points. The paper concludes that fixed overhead costs, not monopsony, are the essential feature for accurately predicting tariff impacts on wages.
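The two-by-two comparison in A9 can be laid out explicitly. The sketch below uses only the four real wage declines reported in the text; the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
# Real wage declines (percent) in the four counterfactual economies.
declines = {
    ("fixed", "monopsony"): 4.9,       # baseline Belgium
    ("fixed", "competitive"): 4.8,
    ("no_fixed", "monopsony"): 3.0,
    ("no_fixed", "competitive"): 2.8,  # standard trade model
}

def effect_of_fixed_costs(labor_market: str) -> float:
    """Extra wage decline from adding fixed costs, holding the labor market fixed."""
    diff = declines[("fixed", labor_market)] - declines[("no_fixed", labor_market)]
    return round(diff, 1)

def effect_of_monopsony(cost_structure: str) -> float:
    """Extra wage decline from adding monopsony, holding the cost structure fixed."""
    diff = declines[(cost_structure, "monopsony")] - declines[(cost_structure, "competitive")]
    return round(diff, 1)

# Share of the baseline decline that the standard model misses.
understatement = (declines[("fixed", "monopsony")] - declines[("no_fixed", "competitive")]) / declines[("fixed", "monopsony")]

print("fixed costs:", effect_of_fixed_costs("monopsony"), effect_of_fixed_costs("competitive"))
print("monopsony:  ", effect_of_monopsony("fixed"), effect_of_monopsony("no_fixed"))
print(f"standard model understates the baseline by {understatement:.0%}")
```

Holding either margin fixed, the fixed-cost contribution is roughly ten times the monopsony contribution, which is the comparison the answer draws.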

Q10. What is the mechanism by which fixed overhead costs amplify the aggregate wage decline from a negative demand shock?

A10: Fixed overhead costs reduce the share of labor in firms’ total variable costs. When labor constitutes a smaller fraction of variable costs, output prices are less sensitive to changes in wages. With aggregate labor supply fixed, the economy restores equilibrium after a negative demand shock by reducing prices through wage cuts. To achieve the same magnitude of price reduction when labor is a smaller fraction of variable costs, wages must fall by a larger amount — amplifying the aggregate wage impact. Fixed overhead costs in labor also make foreign inputs relatively more important in variable costs, as shown empirically in Appendix D.1.
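A stylized first-order illustration of this mechanism follows. It assumes a constant markup and holds all other input prices fixed, so that the percentage change in the output price is the labor share of variable cost times the percentage change in the wage. The paper's general equilibrium model is richer than this, and the shares used below are hypothetical numbers, not estimates from the paper.

```python
def required_wage_change(target_price_change: float, labor_share_of_variable_cost: float) -> float:
    """Wage change (percent) needed to move the output price by target_price_change percent.

    First-order approximation: d ln p = labor_share * d ln w, with a constant
    markup and other input prices held fixed.
    """
    if not 0.0 < labor_share_of_variable_cost <= 1.0:
        raise ValueError("labor share of variable cost must lie in (0, 1]")
    return target_price_change / labor_share_of_variable_cost

# Hypothetical shares: moving part of labor into fixed overhead lowers
# labor's share of variable cost, here from 0.40 to 0.25.
for share in (0.40, 0.25):
    change = required_wage_change(-2.0, share)
    print(f"labor share of variable cost {share:.2f}: wage change {change:.1f} percent")
```

The same 2 percent price reduction requires a larger wage cut when labor is a smaller fraction of variable cost, which is the amplification described in A10.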

Q11. Is the conclusion about the relative importance of fixed costs versus labor market imperfections robust to alternative specifications of the labor market?

A11: Yes. The paper extends the model to a nested logit structure for worker preferences (following Lamadon et al. 2022), which allows Belgium to contain multiple labor markets (defined as industry-region nests), permits heterogeneous markdowns across markets, and is still identified from the data. Empirically, incorporating multiple labor markets and heterogeneous markdowns does not quantitatively alter the aggregate counterfactual predictions for the wage effects of foreign demand shocks.

Q12. Are heterogeneous responses to the foreign demand shock observed across exporters, importers, and domestic-only firms?

A12: The paper finds no systematic differences in the elasticities of labor cost and input purchases between firms that trade internationally and those that do not. This implies that exporters and importers have higher absolute fixed costs (consistent with fixed export and import costs) but comparable fixed cost shares — since these firms tend to be larger and thus spread higher absolute fixed costs over larger output volumes.

Q13. Do the findings about fixed overhead costs extend beyond foreign demand shocks?

A13: Yes. The paper shows in Appendix D.4 that a uniform 5 percent reduction in the productivity of all Belgian manufacturing firms generates qualitatively and quantitatively similar conclusions: fixed overhead costs amplify the predicted wage effects of domestic productivity shocks, while imperfect competition in the labor market matters to a lesser but still meaningful extent.

Key Concepts

Fixed Overhead Costs (Fixed Labor Costs / Fixed Intermediate Input Costs): In the paper’s model, each firm has firm-specific fixed overhead input requirements for labor (denoted ℓ̄_k^f) and intermediate inputs (denoted q̄_k^f) that must be satisfied regardless of the firm’s output level. These fixed requirements are separate from the variable inputs used in production. Fixed labor costs may reflect administration, worker management, facility maintenance, and other tasks that do not directly translate into output. Fixed intermediate input costs include waste management, accounting services, and electricity payments that occur irrespective of sales. The share of total labor inputs that is fixed is identified by how much less than proportionally labor costs respond to demand-driven changes in sales.

Monopsonistic Competition in the Labor Market: The paper models each firm as facing an upward-sloping firm-specific labor supply curve arising from workers’ heterogeneous idiosyncratic preferences over non-wage firm attributes (amenities). Because workers’ idiosyncratic tastes are private information, firms cannot price-discriminate and thus face an increasing marginal cost of labor. Each firm is infinitesimal within the aggregate labor market but has wage-setting power at the firm level. This gives rise to a constant-elasticity firm-level labor supply curve ℓ_k = A_k w_k^ε, where ε is the labor supply elasticity facing the firm.

Wage Markdown: The firm’s equilibrium wage is marked down relative to the marginal revenue product of labor by the factor ε/(1+ε), which is less than one when ε is finite. With a labor supply elasticity of 3.9, the implied markdown is approximately 21 percent; with a supply elasticity of 2.3 (stayer sample), the markdown is approximately 30 percent. Perfect competition corresponds to ε = ∞ and a markdown of zero.
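The markdown formula can be evaluated directly. The sketch below is a minimal illustration with hypothetical function names. Plugging in the rounded elasticity of 3.9 gives a markdown of about 20 percent, close to the 21 percent quoted above; the small gap presumably reflects rounding in the reported elasticity, which is an assumption on my part rather than something stated in the summary.

```python
def wage_to_mrpl_ratio(eps: float) -> float:
    """Ratio of the wage to the marginal revenue product of labor: eps / (1 + eps)."""
    return eps / (1.0 + eps)

def markdown(eps: float) -> float:
    """Markdown as a share of the marginal revenue product: 1 / (1 + eps)."""
    return 1.0 - wage_to_mrpl_ratio(eps)

# Full sample, stayer sample, and a very large elasticity that
# approximates perfect competition.
for eps in (3.9, 2.3, 1e9):
    print(f"eps = {eps:g}: wage/MRPL = {wage_to_mrpl_ratio(eps):.3f}, markdown = {markdown(eps):.3f}")
```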

Labor Cost Elasticity: The elasticity of a firm’s total labor cost with respect to a demand-driven change in the firm’s sales, as derived from the model’s comparative statics (equation 15). This elasticity depends on both the variable share of labor inputs (ℓ_k^v / ℓ_k) and the labor supply elasticity ε. It lies strictly between zero (all labor fixed) and one (all labor variable), and is declining in ε for a given variable share. The paper estimates this elasticity at 0.528 via IV, implying substantial fixed overhead in labor.

Total Foreign Demand Shock: The firm-level measure of foreign demand used as an instrument, defined as the weighted average of changes in world import demand (excluding Belgium) across country-product pairs, where the weights reflect both the firm’s own lagged direct export shares and its indirect exposure through the domestic production network (via the Leontief inverse matrix H̃). This measure captures both direct exporter exposure and indirect upstream exposure for non-exporting firms that supply to exporters.

Indirect Export Exposure: The share of a firm’s output that reaches foreign markets indirectly through sales to domestic buyers who subsequently export. Defined recursively: the total export share of firm k equals its direct export revenue share plus the sum over all domestic buyers of the product of k’s revenue share from that buyer and the buyer’s own total export share. Even non-direct-exporting firms sell on average approximately 10 percent of their output indirectly to foreign markets in the Belgian data.
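The recursive definition can be solved by fixed-point iteration. The following is a minimal sketch on a hypothetical three-firm economy; the function name, the numbers, and the iteration scheme are illustrative assumptions, not the paper's implementation. Convergence relies on each firm selling less than all of its output to other domestic firms.

```python
def total_export_shares(direct, sales_shares, tol=1e-12, max_iter=10_000):
    """Solve total[k] = direct[k] + sum_n sales_shares[k][n] * total[n] by iteration.

    direct[k]          : share of firm k's revenue from its own exports
    sales_shares[k][n] : share of firm k's revenue from sales to domestic buyer n
    """
    n = len(direct)
    total = list(direct)
    for _ in range(max_iter):
        updated = [
            direct[k] + sum(sales_shares[k][j] * total[j] for j in range(n))
            for k in range(n)
        ]
        if max(abs(u - t) for u, t in zip(updated, total)) < tol:
            return updated
        total = updated
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical economy: firm 0 does not export directly but sells half of its
# output to firm 1, which exports 40 percent of its own output and sells
# 20 percent to firm 2, which exports 10 percent.
direct = [0.0, 0.4, 0.1]
sales_shares = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
shares = total_export_shares(direct, sales_shares)
print([round(s, 4) for s in shares])
```

In this example the non-exporting firm 0 ends up with a positive total export share purely through its buyer, which is the sense in which non-exporters receive a non-zero instrument.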

Dekle-Eaton-Kortum Hat Algebra: A technique for solving general equilibrium counterfactuals in trade models by expressing all outcomes as proportional changes (“hats”) relative to the observed equilibrium, without needing to recover the underlying structural parameters. The paper uses this approach to compute counterfactual wages under alternative tariff scenarios, holding fixed the observed firm-level expenditure shares from the reference year (2012) while allowing parameters such as productivity and technology weights to vary across counterfactual economies to rationalize identical observed firm-level observables.
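To illustrate the idea in its simplest form, the sketch below applies exact hat algebra to a single CES demand system: given observed expenditure shares and proportional price changes, it returns the counterfactual shares and the change in the price index without using any taste or productivity levels. This is a textbook building block, not the paper's full model; the elasticity value and the shares are hypothetical.

```python
def price_index_hat(shares, price_hats, sigma):
    """Proportional change in the CES price index, for sigma != 1."""
    total = sum(s * p ** (1.0 - sigma) for s, p in zip(shares, price_hats))
    return total ** (1.0 / (1.0 - sigma))

def updated_ces_shares(shares, price_hats, sigma):
    """Counterfactual expenditure shares: share_i * p_hat_i**(1 - sigma), renormalized."""
    weights = [s * p ** (1.0 - sigma) for s, p in zip(shares, price_hats)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: 70 percent of spending on domestic goods, 30 percent
# on foreign goods; the foreign price rises by 5 percent; sigma = 4.
observed = [0.7, 0.3]
p_hat = [1.0, 1.05]
new_shares = updated_ces_shares(observed, p_hat, sigma=4.0)
print([round(s, 4) for s in new_shares], round(price_index_hat(observed, p_hat, 4.0), 4))
```

Only the observed shares and the elasticity enter the calculation, which is why the approach can be run directly on reference-year expenditure data.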

Worker Rents: In the monopsony model, inframarginal workers earn rents defined as the excess return over what would be required to make them indifferent between employers. These rents arise because firms cannot price-discriminate across workers with heterogeneous amenity valuations. The additional rents accruing to workers from a demand-driven increase in firm sales decompose into: (1) wage increases for incumbent workers multiplied by current employment, (2) rents for new hires (the excess of their wage bill over the amount required to induce them to switch to the expanding firm), and (3) a correction term related to the fraction of the labor cost increase borne by expanding employment rather than wages.

FraNK: Fragmentation in the NK Model

Mon, 01 Jan 0001 00:00:00 +0000

Moro and Nispi Landi develop FraNK, a multi-country New Keynesian model designed to study geoeconomic fragmentation — defined, following Aiyar et al. (2023), as a policy-driven reversal of economic integration guided by strategic considerations. The model extends Gali and Monacelli (2005) along three dimensions: it is multi-country rather than small-open-economy; it assumes incomplete international financial markets, relaxing perfect risk sharing; and it incorporates commodities as intermediate inputs in production, capturing both domestic and imported commodity sourcing. A fragmentation shock is modeled as a simultaneous increase in three tax rates imposed on rival countries: a tax on imports of final goods, a tax on imports of commodities, and a tax on the purchase of foreign bonds (capital controls).

The paper proceeds in two stages. First, under a symmetric two-bloc calibration, closed-form analytical results establish the distinct macroeconomic channels of each tax. The goods import tax operates through both demand (households reduce consumption of foreign goods) and supply (firms face higher real marginal costs), with the demand channel dominating: output falls unambiguously and PPI inflation decreases, though CPI inflation rises on impact due to the direct pass-through of import prices. The commodity import tax operates exclusively through supply, by raising intermediate input costs, so output and PPI inflation move in opposite directions: output falls and PPI inflation rises. The bond tax is neutral under symmetric calibration: because each country’s net foreign asset position is unchanged (each country reduces its holdings of rival-bloc bonds by exactly as much as it reduces its own issuance), output and inflation are unaffected.

Second, the model is calibrated to four asymmetric regions: the United States (US), US-allied countries including the European Union (WE), the China-Russia-aligned bloc (CR), and a neutral rest of the world (NE). Bloc assignment follows Den Besten et al. (2023), using a political alignment index combining sanctions data, military imports, Belt and Road Initiative participation, and UNGA voting on Russia’s invasion of Ukraine. The US and WE impose all three taxes on CR, and vice versa; NE neither imposes nor receives taxes.

Five main findings emerge from the asymmetric simulation. First, fragmentation predominantly affects CR and WE: both experience substantial declines in consumption and production across all three tax scenarios, with CR most affected when goods or asset taxes are applied. Second, the US is largely insulated: its lower trade and financial exposure to the rival bloc relative to WE limits the pass-through of fragmentation. Third, spillovers to neutral NE are nearly negligible: the expenditure-switching channel (which raises demand for untaxed NE goods) and the global income channel (which reduces demand for all goods as the world becomes poorer) roughly cancel each other out. Fourth, fragmentation is not necessarily inflationary: whether PPI inflation rises or falls depends on the relative weight of commodities in production and the mix of taxes applied — a goods tax lowers PPI inflation, while a commodity tax raises it. Fifth, the bilateral exchange rates most affected are those of the CR bloc, which appreciate under goods and asset taxes and depreciate under commodity taxes.

Sensitivity analyses confirm robustness across a higher elasticity of substitution between domestic and foreign goods (eta raised from 1.5 to 5), a lower elasticity of substitution between labor and commodities (xi lowered from 0.4 to 0.1), lower financial market integration (bond transaction costs multiplied by 5), and permanent shocks (persistence rho raised from 0.9 to 1). Under permanent shocks, the goods-tax effect on PPI inflation approaches zero, consistent with the closed-form result, while commodity-tax effects on production become larger and more persistent.

Q: What is the core research question of FraNK? A: The paper asks how geoeconomic fragmentation — modeled as policy-driven increases in taxes on rival countries’ goods, commodities, and bonds — affects output, inflation, exchange rates, and capital flows at both the global and country level. It also asks whether different sources of fragmentation (real versus financial) have distinct macroeconomic implications, and whether neutral countries experience meaningful spillovers.

Q: How does the model depart from the Gali-Monacelli (2005) benchmark? A: Three departures are made. The model is multi-country (N countries) rather than a single small open economy facing the rest of the world. Financial markets are incomplete, so international risk sharing is imperfect — a realistic assumption in a fragmented world. And intermediate-good production uses a CES bundle of labor and a commodity bundle that includes both domestic and imported commodities, which is essential for capturing commodity market disruptions such as those following Russia’s invasion of Ukraine.

Q: What are the three tax instruments and what does each represent? A: The goods import tax (tau_ijt) is a tariff on final goods imports, representing trade barriers. The commodity import tax (tau_O_ijt) is a tariff on imported commodity inputs, representing sanctions or restrictions on energy and raw material trade. The bond tax (theta_ijt) is a capital control discouraging purchases of bonds issued by rival countries, representing financial fragmentation or sanctions on financial assets.

Q: What does the closed-form symmetric-calibration result establish about output? A: Under the symmetric calibration, both the goods import tax and the commodity import tax reduce output unambiguously (Proposition 3.3). The bond tax is neutral for output under symmetry because each country’s net foreign asset position is unchanged — any reduction in holdings of rival-bloc bonds is exactly matched by a reduction in own-bond issuance, leaving net positions and aggregate demand unaffected (Proposition 3.4).

Q: Why does the goods import tax reduce PPI inflation while the commodity import tax raises it? A: The goods import tax operates through two opposing channels: a demand channel (households substitute away from foreign goods, reducing aggregate demand) and a supply channel (import taxes raise firms’ real marginal costs). The closed-form solution establishes that the demand channel dominates, so PPI inflation falls. The commodity import tax operates only through the supply channel — raising the cost of intermediate inputs directly — so PPI inflation rises unambiguously. CPI inflation rises on impact under the goods tax because import prices are directly included in the CPI even as PPI falls.

Q: Under what condition does simultaneous fragmentation (goods and commodity taxes together) produce PPI inflation? A: When both taxes are imposed simultaneously, the net effect on PPI inflation is ambiguous. The paper shows analytically that PPI inflation rises if and only if omega * gamma_O_tilde > gamma_tilde * (phi/sigma), where omega is the commodity weight in production, gamma_O_tilde captures commodity import weights, and gamma_tilde captures goods import weights. That is, fragmentation is more likely to be stagflationary the larger the weight of commodities in the production function, consistent with the empirical finding in Caldara et al. (2024) of stagflationary effects from elevated geopolitical risk.
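The condition can be written as a one-line check. The sketch below simply encodes the inequality as stated; the function name is hypothetical, and the parameter values in the usage example are made-up numbers chosen to show the comparative static in omega, not the paper's calibration.

```python
def fragmentation_raises_ppi(omega, gamma_o_tilde, gamma_tilde, phi, sigma):
    """True when omega * gamma_O_tilde > gamma_tilde * (phi / sigma)."""
    return omega * gamma_o_tilde > gamma_tilde * (phi / sigma)

# Hypothetical values: only the commodity weight omega differs across cases.
common = dict(gamma_o_tilde=0.5, gamma_tilde=0.2, phi=1.0, sigma=2.0)
for omega in (0.1, 0.3):
    print(omega, fragmentation_raises_ppi(omega, **common))
```

With these illustrative numbers the right-hand side is 0.1, so the low commodity weight fails the condition (0.05 on the left) and the higher one meets it (0.15 on the left).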

Q: Why is the US more shielded from fragmentation than its WE allies? A: The US has relatively lower trade and financial exposure to the CR bloc compared to WE. Because the trade and financial weights calibrated from UN Comtrade, IMF CPIS, BIS LBS, and IMF CDIS data place WE in closer economic relationships with CR countries, a tax on CR imports or assets falls more heavily on WE than on the US. This asymmetry is a direct consequence of the calibration: no structural or strategic advantage of the US is assumed beyond its actual pattern of trade and financial linkages.

Q: What happens to the CR bloc’s exchange rate under each tax scenario? A: Under the goods import tax, the CR exchange rate appreciates: CR’s own tax reduces demand for US/WE goods, increasing domestic demand relative to the rest of the world, and the reduced demand for CR bonds from abroad raises CR interest rates, further attracting capital. Under the commodity import tax, the CR exchange rate depreciates: lower commodity demand reduces CR commodity prices and production, shifting labor toward goods, increasing goods supply, and lowering the CR price level relative to trading partners. Under the bond tax, the CR exchange rate also appreciates, as reduced CR demand for US/WE bonds is interpreted by markets as a shift in capital flows favoring CR assets.

Q: What explains the near-zero spillovers to neutral countries? A: Two forces operate on NE in opposite directions. The expenditure-switching channel raises demand for NE goods and commodities, as taxing countries divert purchases away from taxed rival goods toward untaxed NE products — a positive demand shock for NE. The global income channel reduces demand for all goods, including NE’s, as the taxing and taxed regions become poorer and reduce imports from everywhere. In the calibration these two forces approximately cancel, leaving NE macroeconomic variables nearly unchanged.

Q: How is the commodity sector modeled, and why does this matter for the commodity tax result? A: Each country has a representative commodity firm using a linear production function (Y_iOt = A_iO * H_iOt), where A_iO is interpretable as a per-capita endowment of natural resources. Intermediate-good firms use a CES bundle of labor and commodities (domestic and imported) with elasticity xi=0.4 between the two. When the commodity import tax is imposed, firms face higher commodity input costs, raising real marginal costs and PPI inflation while depressing production. The asymmetry between commodity exporters (CR, NE) and importers (WE) under this tax is the main source of differential regional effects.
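The cost-push effect of the commodity tax can be illustrated with the unit cost function dual to a CES bundle. This is a sketch under assumptions: the weight convention, the labor weight of 0.8, the normalized prices, and the 10 percent tax are all hypothetical; only the elasticity xi = 0.4 comes from the text.

```python
def ces_unit_cost(wage, commodity_price, labor_weight, xi):
    """Unit cost of a CES bundle of labor and commodities, for xi != 1."""
    a = labor_weight
    inner = a * wage ** (1.0 - xi) + (1.0 - a) * commodity_price ** (1.0 - xi)
    return inner ** (1.0 / (1.0 - xi))

def commodity_cost_share(wage, commodity_price, labor_weight, xi):
    """Share of commodities in the cost of the bundle."""
    a = labor_weight
    commodity_term = (1.0 - a) * commodity_price ** (1.0 - xi)
    return commodity_term / (a * wage ** (1.0 - xi) + commodity_term)

xi = 0.4            # elasticity between labor and commodities, from the text
labor_weight = 0.8  # hypothetical
for tax in (0.0, 0.10):
    price = 1.0 * (1.0 + tax)  # tax-inclusive commodity price, pre-tax price normalized to 1
    cost = ces_unit_cost(1.0, price, labor_weight, xi)
    share = commodity_cost_share(1.0, price, labor_weight, xi)
    print(f"tax {tax:.0%}: unit cost {cost:.4f}, commodity cost share {share:.4f}")
```

Because xi is below one, firms substitute away from commodities only weakly, so the commodity cost share rises with the tax and unit costs increase.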

Q: How are financial openness differences across country pairs captured, and what effect do they have? A: Bond transaction costs psi_ijF differ across pairs: psi_12F = psi_21F = 0.01 for the US-WE pair (reflecting high financial integration), while all other pairs have psi_ijF = 1 — one hundred times higher — reflecting limited cross-bloc financial integration. The sensitivity analysis multiplies all psi_ijF by 5 (less open financial markets) and finds that bond position volatility falls but qualitative results are unchanged, confirming that the financial openness calibration does not drive the main results.

Q: What are the main caveats acknowledged by the authors? A: The model omits capital accumulation, so investment dynamics are absent. Cross-country production networks (global value chains) are not modeled, which the authors acknowledge limits the richness of the production structure relative to Baqaee-Farhi (2024) style models. Domestic financial markets are assumed frictionless. The model has no role for dollar dominance in the global economy, which may matter for exchange rate and capital flow dynamics in reality. These are flagged as directions for future research.

Q: What is the key result for permanent (rho=1) versus temporary (rho=0.9) fragmentation shocks? A: Under permanent shocks, output reductions become permanent rather than transitory. For the goods import tax, the effect on PPI inflation approaches zero in the permanent case, consistent with the closed-form prediction that the demand channel effect on PPI vanishes when the tax persists indefinitely (households no longer have an intertemporal substitution motive). The permanent commodity tax shock induces a larger and more persistent fall (rise) in production for commodity importers (exporters). The permanent bond tax shock has larger effects but is otherwise qualitatively similar to the temporary case.

Q: How does FraNK relate to the existing DSGE literature on sanctions and trade wars? A: The paper positions FraNK as providing a unified framework covering all three forms of fragmentation (goods, commodity, and financial) simultaneously, with nominal rigidities allowing for inflation analysis, closed-form analytical results for transparency, and a multi-country setup rather than a small-open-economy one. Ghironi et al. (2024) study sanctions in a three-country model but without nominal rigidities. Itskhoki and Mukhin (2022) analyze sanctions on Russia but in a small-open-economy setting. Attinasi et al. (2023) and Conteduca et al. (2024b) use richer production networks (Baqaee-Farhi) but are static and exclude financial fragmentation. FraNK trades production network richness for dynamics, nominal rigidities, financial fragmentation, and analytical tractability.

Geoeconomic fragmentation: A policy-driven reversal of economic integration, often guided by strategic or geopolitical considerations, operationalized in FraNK as simultaneous increases in taxes on rival countries’ goods imports, commodity imports, and bond purchases.

Fragmentation shock: A simultaneous increase in three tax rates — goods import tax (tau), commodity import tax (tau_O), and bond tax (theta) — applied by each bloc against the other, representing the policy instruments through which integration is reversed.

Demand channel (goods tax): The mechanism by which a goods import tax reduces aggregate demand, as households substitute away from now-more-expensive foreign goods, reducing output and — because this channel dominates the supply channel — lowering PPI inflation.

Supply channel (commodity tax): The mechanism by which a commodity import tax raises intermediate input costs for firms, increasing real marginal costs and PPI inflation while reducing output — a purely cost-push effect with no offsetting demand-side force.

Bond tax neutrality: Under symmetric calibration, capital controls on rival-bloc bonds are macroeconomically neutral because each country’s net foreign asset position is unchanged: the reduction in holdings of rival bonds is exactly matched by a reduction in own-bond issuance, leaving the IS curve and Phillips curve unaffected.

Expenditure-switching channel: The force by which fragmentation between two blocs diverts import demand toward untaxed third-country (neutral) goods, generating a positive demand spillover for NE countries that roughly offsets the global income channel.

Global income channel: The negative spillover to neutral countries arising from the reduction in world income caused by fragmentation between the taxing blocs, which reduces demand for all goods including those of neutral producers, approximately canceling the expenditure-switching channel.

Growth Experiences and Trust in Government

Mon, 01 Jan 0001 00:00:00 +0000

This paper investigates whether individuals who have experienced stronger GDP growth over their lifetimes are more likely to trust their national government. The authors — Besley, Dann, and Dray — assemble a newly harmonized global dataset comprising approximately 3.3 million respondents across 166 countries since 1990, drawn from 11 major opinion surveys (Afrobarometer, Americasbarometer, Arabarometer, Asiabarometer, European Social Survey, Gallup World Poll, Integrated Values Survey, Latinobarometer, Life in Transition Survey, South Asia Barometer, and World Justice Project). They supplement this with longer-run U.S. evidence from the American National Election Studies (ANES) going back to 1958, covering respondents born as early as the 1880s, and longitudinal Swiss evidence from the Swiss Household Panel (SHP) which allows individual fixed-effects estimation.

The core methodological contribution is the exploitation of country-cohort variation in lifetime GDP growth experiences. Following Malmendier and Nagel (2011), the authors construct a weighted average of past growth realizations across an individual’s lifetime, with weights decaying linearly over time (lambda = 1), so that more recent growth receives greater weight. The baseline specification includes country fixed effects, cohort-by-subcontinent fixed effects, survey-by-survey-year fixed effects, controls for log GDP per capita at year of birth, and individual characteristics (sex, marital status, education, religious denomination). More demanding specifications add country-by-survey-year and country-by-age fixed effects. For Switzerland, individual fixed effects are included, fully absorbing time-invariant personal characteristics.
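A minimal sketch of this weighting scheme follows, assuming the Malmendier and Nagel (2011) functional form in which growth experienced k years ago by a person of a given age receives weight proportional to (age - k) raised to the power lambda. The function names and the growth history are hypothetical, and the authors' exact construction may differ in details such as the starting age.

```python
def experience_weights(age, lam):
    """Normalized weights on growth k = 1, ..., age - 1 years ago: (age - k)**lam."""
    if age < 2:
        raise ValueError("age must be at least 2")
    raw = [(age - k) ** lam for k in range(1, age)]
    total = sum(raw)
    return [w / total for w in raw]

def lifetime_growth_experience(growth_history, lam=1.0):
    """Weighted average of past growth.

    growth_history[0] is last year's growth rate and growth_history[-1] is
    the growth rate in the first year of life.
    """
    age = len(growth_history) + 1
    weights = experience_weights(age, lam)
    return sum(w * g for w, g in zip(weights, growth_history))

# Hypothetical growth rates (percent), most recent first.
history = [4.0, 2.0, 2.0, 0.0]
print(experience_weights(5, 1))                  # linearly decaying weights
print(lifetime_growth_experience(history, 1.0))  # recent growth weighted more
print(lifetime_growth_experience(history, 0.0))  # unweighted lifetime average
```

With lambda = 1 the most recent year receives the largest weight and the weight declines linearly toward zero at birth; with lambda = 0 every year counts equally.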

The main finding is that a one standard deviation increase in lifetime GDP growth experience — corresponding to approximately 2 percentage points of additional growth — is associated with a 2.1 percentage point increase in the probability of trusting the national government, significant at the 1 percent level. This corresponds to roughly 0.042 standard deviations of the trust outcome and approximately 5 percent of the global mean trust in government. The effect is quantitatively meaningful: it approximates between one-quarter and one-half of the difference in average trust between older and younger cohorts in India and Italy, respectively. For the U.S. ANES sample, a one standard deviation increase in growth experience (about 0.2 percentage points) increases trust in the federal government by 2.4 percentage points, explaining more than two-thirds of the average trust gap between Baby Boomers (born 1946–1964) and Millennials (born 1981–1996).

Several scope conditions and heterogeneity findings sharpen the interpretation. First, the growth-trust link is specific to government institutions: there is no statistically significant effect of growth experience on interpersonal trust or trust in religious organizations, indicating the channel runs through perceptions of state performance rather than generalized social capital. Second, a recency heuristic operates: the linearly decaying weighting function (lambda = 1) outperforms both an unweighted lifetime average (lambda = 0) and a formative-years weighting. Growth experienced during formative years (ages 18–25) or before birth has no detectable effect on trust in government; the pre-birth result serves as a placebo test. Third, the positive growth-trust relationship is stronger in democracies than in autocracies, which the authors interpret as democracies producing citizens more responsive to government performance signals. Fourth, a “trust paradox” emerges: unconditionally, average trust in government is lower in democracies than in autocracies, and longer democratic experience is associated with lower trust, which the authors attribute to democratic institutions generating greater citizen skepticism about government performance. Fifth, core results are robust to controlling for other lifetime politico-economic experiences including inflation, banking and currency crises, epidemics, political unrest, executive turnover, stock market returns, and income inequality. The Swiss evidence further shows that private income growth experience does not drive the result — only aggregate macroeconomic growth does.

Q: What is the paper’s core quantitative finding on the growth-trust relationship? A: Using the global harmonized dataset of 3.3 million respondents across 166 countries, a one standard deviation increase in lifetime GDP growth experience (corresponding to approximately 2 percentage points of additional growth) is associated with a 2.1 percentage point increase in the probability of trusting the national government, significant at the 1 percent level. Using only the Gallup World Poll subsample (roughly half the observations), the estimated effect is somewhat larger at 3.6 percentage points per standard deviation increase. These estimates remain statistically significant under more demanding specifications with country-by-survey-year and country-by-age fixed effects, though the magnitudes decrease as these interacted fixed effects absorb variation in recent growth experiences.

Q: How do the authors measure individual lifetime growth experience? A: The growth experience variable is a weighted average of all past annual GDP per capita growth rates since an individual’s birth, with weights that decay linearly over time (lambda = 1 in the Malmendier-Nagel framework). Under this parameterization, the measure simplifies to how much recent economic performance (in the year prior to the survey) exceeds the long-run mean over the respondent’s lifetime, scaled by the respondent’s midpoint of life. This implies younger individuals are more sensitive to recent growth outcomes because their shorter life histories give recent events relatively greater weight. The authors validate this lambda = 1 choice via a grid search over alternative weighting structures using minimum residual sum of squares as the criterion.
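A minimal sketch of the weighting scheme may help. It follows the Malmendier and Nagel (2011) form in which growth from k years ago receives weight (age - k)^lambda, normalized to sum to one. The function name, the input layout, and the exact indexing of age are assumptions made for illustration and may differ from the paper's implementation.

```python
def growth_experience(growth_rates, lam=1.0):
    """Weighted average of past annual growth rates.

    growth_rates: annual growth rates ordered from the first year of life
                  to the year before the survey (assumed layout).
    lam:          weighting parameter; 1 gives linearly decaying weights,
                  0 gives an equal-weighted lifetime average.
    """
    age = len(growth_rates) + 1
    lags = range(1, age)                      # k = 1 is the most recent year
    weights = [(age - k) ** lam for k in lags]
    weighted_sum = sum(w * growth_rates[-k] for k, w in zip(lags, weights))
    return weighted_sum / sum(weights)


# Hypothetical history: two flat years followed by one year of 3% growth.
history = [0.0, 0.0, 0.03]
recent_weighted = growth_experience(history, lam=1.0)
equal_weighted = growth_experience(history, lam=0.0)
print(recent_weighted, equal_weighted)
```

With lambda = 1 the most recent year carries half of the total weight in this three-year history, so the same growth episode counts for more than under equal weighting.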

Q: How is reverse causality addressed? A: The empirical strategy identifies the relationship using past, cumulative growth experiences measured prior to the survey, so current trust in government cannot cause past growth. Survey-year fixed effects absorb all aggregate time trends simultaneously affecting trust and growth. The authors also conduct a placebo test showing that GDP growth occurring before an individual’s birth has a precisely estimated null effect on their trust in government, which would not be the case if unobserved societal trends were jointly driving both growth histories and political perceptions.

Q: Does growth experience affect interpersonal trust or trust in non-state institutions? A: No. The estimated coefficient on lifetime growth experience is statistically insignificant at conventional levels when interpersonal trust replaces trust in government as the dependent variable, with narrow confidence intervals indicating a precisely estimated null. Similarly, growth experience has no systematic effect on trust in religious organizations such as churches or mosques. The authors interpret these null results as evidence against the alternative explanation that broad modernizing social changes are jointly driving both growth experiences and political trust.

Q: What do the U.S. ANES results add? A: The ANES data, which extend back to 1958 and capture cohorts born as early as the 1880s, provide a within-country test controlling for state fixed effects, generation dummies, and rich individual characteristics including partisan affiliation and partisan strength. A one standard deviation increase in U.S. growth experience (approximately 0.2 percentage points) raises trust in the federal government by 2.4 percentage points, significant at the 1 percent level. This estimate is quantitatively large enough to explain more than two-thirds of the average trust gap between Baby Boomers and Millennials. Results are robust to adding state-by-survey-year fixed effects and birth-state-by-generation fixed effects, and hold for a broader “trust in government index” covering beliefs about waste, corruption, and responsiveness of the federal government.

Q: What do the Swiss Household Panel results contribute? A: The SHP allows individual fixed-effects estimation, exploiting within-person changes in growth experience and trust over time from 1999 onward, which absorbs all time-invariant individual characteristics that could confound the global and U.S. cross-cohort results. The growth experience coefficient remains positive and significant, with a one standard deviation increase yielding a 1.9 percentage point increase in trust in the Swiss federal government (significant at the 1 percent level). The Swiss data also uniquely allow the authors to test whether personal income growth experience drives the result; they find no significant effect of private income growth experience on trust in government, so only aggregate macroeconomic growth matters.

Q: Does the recency heuristic hold — does growth in formative years matter? A: No. The authors find no detectable effect of growth experienced specifically during formative years (ages 18–25) on trust in government. Additionally, in a grid-search exercise assessing model fit across different lambda values, the linearly decaying weighting scheme (lambda = 1, giving more weight to recent growth) outperforms both equal-weighted lifetime averages (lambda = 0) and weighting schemes that emphasize earlier life experiences (lambda less than 0). The pre-birth placebo result (null effect) and the absence of a formative-years effect together indicate that the operative mechanism is about evaluating current government performance based on recent macroeconomic experience, not the imprinting of long-lasting political dispositions during youth.
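The grid-search logic can be sketched as follows: for each candidate lambda, build the experience measure, fit the outcome on it, and keep the lambda with the smallest residual sum of squares. This is a deliberately simplified illustration. It uses a bivariate least-squares fit with no fixed effects or controls, and the growth histories and outcomes are synthetic toy values constructed so that lambda = 1 is the true weighting; none of it is the paper's data or code.

```python
def experience(rates, lam):
    """Malmendier-Nagel style weighted average; rates run oldest to most recent."""
    age = len(rates) + 1
    lags = range(1, age)
    weights = [(age - k) ** lam for k in lags]
    return sum(w * rates[-k] for k, w in zip(lags, weights)) / sum(weights)


def rss_of_fit(x, y):
    """Residual sum of squares from a least-squares fit of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def best_lambda(histories, outcomes, grid):
    """Return the grid value whose experience measure gives the lowest RSS."""
    return min(
        grid,
        key=lambda lam: rss_of_fit([experience(h, lam) for h in histories], outcomes),
    )


# Synthetic growth histories (hypothetical values, oldest year first).
histories = [
    [0.01, 0.02, 0.03, 0.04],
    [0.04, 0.01, 0.00, 0.02, 0.05],
    [0.03, 0.03, -0.01],
    [0.00, 0.05, 0.02, 0.01, 0.03, 0.02],
    [0.02, -0.02, 0.04, 0.01],
]
# Toy outcomes generated exactly from the lambda = 1 measure.
outcomes = [0.4 + experience(h, 1.0) for h in histories]

grid = [-1.0, 0.0, 1.0, 2.0, 3.0]
best = best_lambda(histories, outcomes, grid)
print(best)
```

Because the toy outcomes are generated without noise, the search recovers the weighting used to build them; with real survey data the comparison would rest on differences in fit across the grid rather than an exact match.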

Q: What is the “trust paradox” and how is it documented? A: The trust paradox refers to the empirical finding that average trust in government is lower in democracies than in autocracies at the cross-country level, and that longer experience with democratic institutions within countries is associated with lower levels of trust in government in the micro data. This is counterintuitive given the standard view that good institutions should foster confidence in government. The authors suggest the paradox likely reflects democracies cultivating greater citizen skepticism and more critical judgment of government performance, rather than indicating that democratic governance actually performs worse. Importantly, the positive effect of growth experience on trust remains present in democracies, and the growth-trust relationship is actually stronger in democratic regimes, consistent with citizens in democracies being more responsive to government performance signals.

Q: How is the growth-trust finding related to corruption perceptions and living standards? A: Using the Gallup World Poll, the authors find that stronger lifetime growth experience is associated with lower perceived corruption in government, greater satisfaction with personal living standards, and higher likelihood of feeling one lives comfortably on one’s present income. These results are consistent with citizens attributing economic success to government competence and integrity, and with growth translating into perceptions of improved personal circumstances through both direct income effects and indirect public goods provision.

Q: Are the results robust to controlling for other lifetime politico-economic experiences? A: Yes. When the authors include lifetime experience measures for political unrest, executive turnover, epidemic exposure, banking crises, currency crises, and inflation (both levels and volatility) simultaneously in equation (3), the growth experience coefficient remains consistently positive, stable, and significant across all specifications. Among the other experience variables, only lifetime unrest and epidemic exposure are independently negative and statistically significant at conventional levels. F-tests reject the null hypothesis that the crisis and growth experience coefficients are equal in magnitude. The U.S. results are also robust to adding lifetime experiences with S&P 500 returns, unemployment, and top-income-share inequality measures.

Q: What are the policy implications of the findings? A: The authors note that sustained economic growth may itself be a mechanism for building political trust, with positive downstream effects for policy compliance. They point out that this connection was relevant during the COVID-19 pandemic, when higher-trust societies showed lower mobility during lockdowns and higher vaccine acceptance. The growth-trust channel could have implications for increasing compliance across a range of policy domains including climate action and tax morale. Governments that deliver sustained economic growth can expect citizens to update their trust upward, particularly in democracies where citizens are more performance-responsive, while governments that preside over stagnation or contraction face predictable erosion of political legitimacy across cohorts.

Growth experience: A weighted average of all past annual GDP per capita growth realizations since an individual’s birth, with weights that decay linearly over time following Malmendier and Nagel (2011), so that more recent growth receives greater weight. Under the paper’s preferred parameterization (lambda = 1), the measure equals how much last year’s GDP per capita growth exceeds the respondent’s lifetime mean growth, scaled by the respondent’s midpoint of life.

Trust in government: A binary dummy variable equal to one if a survey respondent expresses “a great deal” or “quite a lot” of trust or confidence in the national government, constructed from harmonized responses across 11 major opinion surveys. The paper treats this as reflecting respondents’ perceptions of government performance rather than a deep interpersonal trust relationship.

Trust paradox: The empirical regularity documented in the paper whereby average trust in government is unconditionally lower in democracies than in autocracies at the cross-country level, and whereby longer democratic experience within countries is associated with lower individual trust in government. The authors attribute this to democratic institutions generating more critical citizen judgment of government performance.

Recency heuristic: The finding that more recent growth experiences carry greater weight in forming trust in government, as captured by the linear decay weighting scheme (lambda = 1) outperforming equal-weighted or early-life-weighted alternatives. Growth before birth and growth during formative years (ages 18–25) have no detectable effect, while recent macroeconomic performance is the operative signal.

Cohort-level variation: The within-country differences in lifetime growth experiences across birth cohorts that form the paper’s primary identification strategy. Because different cohorts in the same country have lived through different sequences of growth episodes, differences in trust across cohorts within a country can be attributed to differential growth exposure rather than time-invariant country characteristics.

Formative years effect: The hypothesis, tested and rejected in the paper, that economic experiences during ages 18–25 have a lasting imprint on political attitudes analogous to formative-years effects found in other political behavior literatures. The paper finds no statistically significant association between growth experienced during these years and trust in government.

Source text origin: In the pipeline context relevant to this paper’s acquisition, this refers to whether a summary was generated from full working paper text (“pdf” or “oa-html”) versus abstract only (which is hard-blocked). The working paper was obtained from LSE Research Online (eprint 129614), classified as published version under CC BY 4.0.

Health Shocks, Health Insurance, Human Capital, and the Dynamics of Earnings and Health

Capatina and Keane build and calibrate a life-cycle model of labor supply and savings for U.S. men that incorporates health shocks, endogenous human capital accumulation via learning-by-doing, employer-sponsored health insurance (ESHI), means-tested social insurance, and endogenous medical treatment decisions. The model is calibrated to White males using the Medical Expenditure Panel Survey (MEPS) for 2000–2013, supplemented by CPS, HRS, and PSID data; separate calibrations are presented for Black and Hispanic men with high school or less education.

The paper’s central research question is how health shocks affect labor supply, earnings, and earnings inequality over the life cycle, and through which mechanisms. Four channels are identified and quantified: (1) the direct labor supply effect — sick days and reduced tastes for work caused by health shocks; (2) the human capital effect — reduced work experience from health-shock-induced employment exits, which deteriorates future job and wage offers in a snowball dynamic; (3) the health-productivity effect — reduced functional health directly lowering wage offers; and (4) the behavioral effect — anticipation of health risk induces low-skill workers lacking ESHI to curtail labor supply to maintain means-tested transfer eligibility.

The key quantitative findings from eliminating serious health shocks for working-age men (ages 25–64) are: the expected present value of lifetime earnings (PVE) for White men rises by 11% on average, and inequality in PVE falls by 12% (coefficient of variation). For White men with high school or less education the increase in PVE is 17.9%. For the typical White male the four channels contribute 5.7%, 2.7%, 1.4%, and 0.8% respectively. For low-skill White high school men the same channels contribute 10.7%, 14.8%, 1.3%, and 9.8% — with the human capital and behavioral effects dramatically larger for the low-skill group. For comparison, a severe health shock at age 40 reduces the present value of remaining lifetime earnings by 5.6% (approximately $53.9k) for a typical college man and by 11.5% (approximately $55.0k) for a typical high school man.
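The age-40 comparison pairs a percentage loss with a dollar loss for each education group, so the implied baseline present value of remaining earnings can be backed out by division. The sketch below does this. The baselines are inferences from the rounded figures quoted above, not numbers reported in this summary.

```python
# Percentage and dollar losses from a severe health shock at age 40, as quoted above.
cases = {
    "college": {"loss_share": 0.056, "loss_dollars": 53_900},
    "high school": {"loss_share": 0.115, "loss_dollars": 55_000},
}

# Implied present value of remaining lifetime earnings before the shock.
implied_baseline = {
    group: values["loss_dollars"] / values["loss_share"]
    for group, values in cases.items()
}

for group, baseline in implied_baseline.items():
    print(f"{group}: implied baseline of about ${baseline:,.0f}")
```

The dollar losses are nearly identical across groups, so the roughly twofold gap in percentage losses reflects a high school baseline that is about half the college baseline.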

Human capital amplification operates through employment persistence: a major health shock causes full-time employment to drop by 12 percentage points one year after the shock for the average man, and by 20 percentage points for high school men, with recovery still incomplete eight years later (employment remains 7.8 pp and 10 pp below baseline, respectively). Holding human capital fixed as in the pre-shock baseline causes employment to recover quickly, confirming that persistent wage-offer deterioration is the mechanism.

On health insurance policy, the model evaluates providing public insurance to all workers lacking ESHI. This substantially increases medical utilization, improves health and life expectancy, reduces Medicaid and free-care costs, and raises labor supply among low-skill workers by weakening means-tested transfer incentives. (As a related benchmark, eliminating health shocks altogether raises survival to age 65 from 82% to 87%.) The net program cost in a balanced budget simulation is modest, and all agent types are ex ante better off. By contrast, expanding Medicaid access creates perverse labor supply disincentives (workers reduce labor supply to maintain eligibility), does little to improve health, and makes almost all agents worse off in a balanced budget scenario.

Scope conditions: the primary calibration covers non-institutionalized civilian White males; results for Blacks and Hispanics are presented only for the high school or less education group due to small samples. The model period ends at 2013, before ACA implementation.

Q: What is the model’s overall estimate of how much health shocks reduce lifetime earnings for White men? A: Eliminating serious health shocks at working ages (25–64) would increase the expected present value of lifetime earnings (PVE) for the average White male by 11% and reduce inequality in PVE by 12% as measured by the coefficient of variation. For White men with high school or less education the PVE gain is larger at 17.9%.

Q: What are the four channels through which health shocks affect earnings, and how large is each for the average White male versus a low-skill high school male? A: The four channels are (1) direct labor supply via sick days and reduced tastes for work, (2) human capital deterioration from lost work experience worsening future job/wage offers, (3) reduced health productivity lowering wage offers, and (4) behavioral responses to health risk reducing labor supply to preserve transfer eligibility. For the average White male the contributions to PVE are 5.7%, 2.7%, 1.4%, and 0.8%, respectively. For low-skill White high school men the same channels contribute 10.7%, 14.8%, 1.3%, and 9.8% — the human capital and behavioral effects are roughly five to twelve times larger for the low-skill group.
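The "five to twelve times larger" statement follows directly from the channel contributions. A short check, using only the percentages quoted above (the dictionary labels are shorthand chosen for this sketch):

```python
# Channel contributions to PVE, in percent, as quoted in the answer above.
average_white_male = {
    "direct labor supply": 5.7,
    "human capital": 2.7,
    "health productivity": 1.4,
    "behavioral": 0.8,
}
low_skill_high_school = {
    "direct labor supply": 10.7,
    "human capital": 14.8,
    "health productivity": 1.3,
    "behavioral": 9.8,
}

# Ratio of the low-skill contribution to the average contribution, per channel.
ratios = {
    channel: low_skill_high_school[channel] / average_white_male[channel]
    for channel in average_white_male
}

# Sum of the four channels for the average White male.
total_average = sum(average_white_male.values())

for channel, ratio in ratios.items():
    print(f"{channel}: {ratio:.1f}x")
print(f"sum of channels, average White male: {total_average:.1f}%")
```

The human capital ratio is about 5.5 and the behavioral ratio about 12, while the direct and health-productivity channels differ far less across the two groups. The four channels for the average White male sum to 10.6%, close to the 11% total PVE effect.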

Q: Why is the human capital effect so much larger for low-skill high school men than for college men? A: Low-skill high school men are much more likely to exit full-time employment following a major health shock and are slow to return. Lifetime work years decline by 1.89 for the typical high school man versus only 0.84 for the typical college man following a major shock at age 40. Because job offer probabilities depend on lagged employment, absence from the labor market creates a snowball effect that persistently depresses offer quality; human capital accounts for 42% of the earnings decline for high school men versus 34% for college men.

Q: How does the paper characterize the persistent employment effects of a major health shock? A: For the average man, full-time employment drops by 12 percentage points one year after a severe shock and remains 7.8 pp below baseline after eight years. For high school men the initial drop is 20 pp, still 10 pp below baseline after eight years; for college men the figures are 7 pp and 3 pp. When human capital is held fixed at the pre-shock baseline — so wage and job offers do not deteriorate due to lost experience — employment recovers quickly for workers of all skill levels, confirming the human capital mechanism drives the persistence.
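One rough way to summarize the persistence is to ask what constant annual decay rate would take the employment gap from its year-one value to its year-eight value. The sketch below computes that rate. Geometric decay is an assumption made here for illustration only; the model generates these paths from its human capital dynamics, not from a fixed decay rate.

```python
# Employment gaps in percentage points: (one year after the shock, eight years after).
gaps = {
    "average": (12.0, 7.8),
    "high school": (20.0, 10.0),
    "college": (7.0, 3.0),
}
years_between = 7  # from year 1 to year 8

# Annual persistence rho such that gap_year8 = gap_year1 * rho ** years_between.
persistence = {
    group: (late / early) ** (1 / years_between)
    for group, (early, late) in gaps.items()
}

for group, rho in persistence.items():
    print(f"{group}: implied annual persistence of about {rho:.3f}")
```

Under this simplification the gap shrinks by only about 6 to 11 percent per year, which is another way of seeing how slowly employment recovers when lost experience depresses future offers.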

Q: How does the behavioral effect operate for low-skill workers? A: Workers without ESHI who face health risk have an incentive to maintain sufficiently low income and assets to qualify for means-tested social insurance, which provides a consumption floor approximating Medicaid, Food Stamps, SSDI, and SSI. This perverse incentive leads low-skill workers to curtail labor supply preemptively. When health risk is eliminated, this incentive disappears and labor supply rises, generating the behavioral effect of 9.8% of PVE for low-skill high school men versus only 0.8% for the average White male.

Q: How does the paper correct for under-reporting of health shocks among the uninsured? A: The measurement model assumes health shocks are correctly measured for the treated, but uninsured workers who do not seek treatment only record a shock with a shock-specific probability less than one. A key identifying assumption is that, conditional on health status, risk factors, age, and education, the true frequency of health shocks does not differ by insurance status per se — ruling out ex ante moral hazard. The measurement model parameters are calibrated to match observed frequencies of health shocks and high risk in MEPS for the uninsured.
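The logic of the correction can be sketched with a two-line measurement equation: treated shocks are always recorded, while untreated shocks are recorded only with some probability below one. The function names and the numbers in the example are hypothetical and serve only to show how an observed frequency maps back to a true one; the paper calibrates shock-specific reporting probabilities rather than using a single value.

```python
def observed_rate(true_rate, treated_share, report_prob):
    """Shock frequency that appears in the data.

    true_rate:     true probability of a health shock
    treated_share: share of shocks that are treated (always recorded)
    report_prob:   probability an untreated shock is recorded (less than one)
    """
    return true_rate * (treated_share + (1 - treated_share) * report_prob)


def implied_true_rate(observed, treated_share, report_prob):
    """Invert the measurement equation to recover the true shock frequency."""
    return observed / (treated_share + (1 - treated_share) * report_prob)


# Hypothetical example: half of shocks treated, untreated shocks recorded 40% of the time.
seen = observed_rate(true_rate=0.10, treated_share=0.5, report_prob=0.4)
recovered = implied_true_rate(seen, treated_share=0.5, report_prob=0.4)
print(seen, recovered)
```

The identifying assumption described above (no difference in true shock frequency by insurance status, conditional on observables) is what allows the true rate for the uninsured to be pinned down from the insured.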

Q: What does the model estimate regarding the effect of a severe health shock on cumulative earnings relative to existing reduced-form evidence? A: The model predicts an average cumulative (non-discounted) earnings loss of $42.8k over ten years following a severe shock for men aged 50, compared with Smith’s (2004) estimate of $37k from the HRS. The paper argues Smith’s estimate identifies effects on workers who actually experience shocks, who are a selected sample with low baseline earnings (as untreated shocks are more likely to be severe, and non-treaters tend to have low earnings). The model’s “average effect” — comparing a world where everyone experiences the shock to one where no one does — yields a substantially higher loss of $59.8k.

Q: What are the key findings from the public insurance experiment (providing insurance to the uninsured)? A: Providing public insurance to all workers lacking ESHI substantially increases medical utilization among the previously uninsured, who are intrinsically less healthy. This improves health and life expectancy, raising Social Security costs. However, it also generates positive labor supply incentives for low-skill workers (reducing their reliance on means-tested transfers), substantially reduces Medicaid and free-care costs, and increases tax revenue. On balance, the net program cost in a balanced budget simulation is modest, and all types of workers are ex ante better off.

Q: Why does expanding Medicaid access produce perverse results in contrast to providing public insurance? A: Medicaid is means-tested, so expanded access requires workers to maintain sufficiently low income and assets to remain eligible. This creates disincentives to work and save — workers reduce labor supply to preserve eligibility. The result is reduced earnings, lower tax revenue, little improvement in health (as access to care depends on maintaining low income), and almost all agents being worse off in a balanced budget scenario.

Q: What role does insurance play beyond consumption smoothing in this model? A: Beyond lowering out-of-pocket (OOP) costs and smoothing consumption, insurance grants access to care: in the US system, proof of insurance is often required before treatment, so uninsured workers may not have the option to treat at all. The model captures three distinct option sets for the uninsured — all options available, treatment not available, or default not available — each motivated by different real-world contexts. Non-treatment worsens health transition probabilities, so the access-granting role of insurance independently affects health trajectories beyond its cost-reducing role.

Q: What explains the observed positive association between education, income, insurance, and health transitions in the data, and how does the model generate this without education entering the health production function directly? A: The association between education and health is largely driven by the positive correlation between education and latent health types; controlling for latent health type in a descriptive logit largely eliminates the education coefficient. The association between insurance and health transitions is driven by the fact that the insured are more likely to receive treatment; controlling for treatment and true shocks eliminates the insurance coefficient. Education affects health indirectly through its effects on treatment decisions — via wages, job offers with ESHI, and consumption capacity — without appearing as a direct argument in the health production function.

Q: How large are the effects of health shocks on key population health statistics according to the model? A: Eliminating serious health shocks at working ages would increase the fraction of working-age men in good health from 60% to 75% and raise the probability of survival to age 65 from 82% to 87%. Average annual sick days of 16.42 would be eliminated, implying a 6% increase in work days for employed workers and an employment rate increase from 88% to 91%. Average annual medical costs would fall from $4,618 to $1,132.

Q: How do the results for Black and Hispanic men compare to White men? A: The results are qualitatively similar, but the magnitudes for Black men are somewhat larger. Eliminating health shocks would raise PVE for Whites, Blacks, and Hispanics with high school or less education by 17.9%, 23.7%, and 17.7%, respectively. Separate access-to-care probabilities are calibrated for each group, reflecting racial disparities in access that explain part of the observed differences in health outcomes and treatment rates.

Q: What is the role of the consumption floor (means-tested social insurance) in shaping equilibrium outcomes for low-skill workers? A: The consumption floor guarantees a minimum household consumption level approximating Medicaid, Food Stamps, SSDI, and SSI. It shields low-skill workers from the full cost of health shocks, reducing both the consumption-smoothing value of ESHI and precautionary saving incentives. However, it also creates a powerful disincentive for low-skill workers without ESHI to work, as earning above the eligibility threshold would eliminate benefits. This mechanism amplifies earnings inequality by generating perverse labor supply behavior concentrated among low-skill, uninsured workers.

Functional Health (H): A discrete stock variable (Poor, Fair, or Good) measuring aspects of health that directly affect worker productivity and tastes for work; distinguished from asymptomatic health risk. Transitions depend on lagged health, latent health type, age, persistent health shocks, and whether shocks are treated.

Asymptomatic Health Risk (R): A binary state (low or high) capturing risk factors such as obesity, high cholesterol, and hypertension that increase the probability of future health shocks but do not affect current productivity.

Human Capital Effect: The channel by which health shocks reduce lifetime earnings not directly but indirectly — by causing employment exits that slow work experience accumulation, which in turn deteriorates future job offer probabilities and wage offers in a persistent, self-reinforcing (snowball) dynamic.

Behavioral Effect: The reduction in labor supply — and associated earnings loss — that occurs because workers facing health risk and lacking ESHI have an incentive to keep income and assets low enough to maintain eligibility for means-tested social insurance, even absent any contemporaneous health shock.

Tied Wage-Hours-Insurance Offer: The model’s labor market structure in which employment offers jointly specify a wage rate, hours (no offer, part-time, or full-time), and whether the offer includes ESHI; workers accept or reject the bundle rather than choosing hours and insurance independently.

Treatment/Payment Options: The set of decisions available to a worker after a health shock occurs — whether to seek treatment and, if treated, whether to pay the out-of-pocket cost or default on bills. The available choice set differs by insurance status and context: the uninsured may face denial of access (option to treat unavailable) or required prepayment (default unavailable), or may have all options including free care.

Latent Health Type: An unobserved permanent individual characteristic capturing innate biological resilience and pre-age-25 health investments; determines baseline transition probabilities for functional health conditional on shocks. Positively correlated with latent skill type within education groups.

Hedge funds and the Treasury cash-futures basis trade

The U.S. Treasury market is the deepest and most liquid fixed-income market in the world, yet in March 2020 it experienced unprecedented dysfunction—widening bid-ask spreads, skyrocketing repo rates, and diverging arbitrage spreads that prompted massive Federal Reserve intervention. This paper documents the rise and near-collapse of the Treasury cash-futures basis trade—an arbitrage strategy among hedge funds exploiting a persistent disconnect between cash Treasury prices and futures prices—as a central feature of that episode. Using regulatory datasets on hedge fund exposures and repo transactions, the authors show that at its peak the basis trade accounted for an estimated $400–$500 billion in positions, constituting more than 60% of total hedge fund Treasury exposure, more than 70% of hedge fund repo borrowing, and more than 25% of primary dealers’ repo lending. A model and empirical evidence link the trade’s growth after 2016 to broader Treasury market developments, and show how the trade’s reliance on short-term repo financing creates both margin risk and rollover risk. In March 2020 many of these risks materialized, though the unwinding of basis positions was likely a consequence rather than the primary cause of the stress; prompt Federal Reserve intervention may have prevented a liquidity spiral.

In depth

Q1. What is the Treasury cash-futures basis trade and why did it become popular?

The basis trade exploits the arbitrage relationship P_{t,τ} = Σ_s B_{t,s} c_s + B_{t,T} F_{t,τ,T}: when futures prices are too high relative to the present value of the deliverable bond, traders go “long the basis” by buying the cash bond and shorting the futures, financing the long position in the overnight repo market. The trade became popular following 2016 as demand for long Treasury futures positions grew (from institutional investors seeking leveraged duration exposure) while the supply of warehousing capacity from dealers contracted under post-crisis regulatory constraints. Hedge funds stepped in as the marginal warehouser, exploiting the resulting premium embedded in futures prices. The trade is nearly zero net-cash but requires continuous repo rollover.
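The arbitrage relationship can be read as a pricing check: the cash price of the bond should equal the discounted coupons paid before delivery plus the discounted futures price. The sketch below computes the futures-implied cash price and the gap relative to the observed cash price. All numbers are hypothetical, and the sketch ignores conversion factors, accrued interest, and delivery options, which matter in practice.

```python
def futures_implied_price(coupons, discount_factors, futures_price, delivery_discount):
    """Cash price implied by the futures price.

    coupons:           coupon payments c_s due before futures delivery
    discount_factors:  discount factors B_{t,s} matching each coupon
    futures_price:     futures price F for delivery at T
    delivery_discount: discount factor B_{t,T} to the delivery date
    """
    pv_coupons = sum(b * c for b, c in zip(discount_factors, coupons))
    return pv_coupons + delivery_discount * futures_price


def basis_gap(cash_price, implied_price):
    """Positive when futures are rich relative to cash (long-the-basis is attractive)."""
    return implied_price - cash_price


# Hypothetical inputs: one coupon before delivery.
implied = futures_implied_price(
    coupons=[1.0],
    discount_factors=[0.995],
    futures_price=100.5,
    delivery_discount=0.99,
)
gap = basis_gap(cash_price=100.30, implied_price=implied)
print(round(implied, 4), round(gap, 4))
```

A positive gap corresponds to the case described above: the trader buys the cash bond, shorts the futures, and earns the gap at delivery, provided the repo financing can be rolled over until then.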

Q2. How large did the trade become and how was its size estimated?

Using regulatory data, specifically CFTC Form 40 (hedge fund futures positions), SEC Form PF (AUM and derivatives exposures), and FR 2004 (primary dealer repo data), the authors estimate basis trade positions peaked at $400–$500 billion, comprising more than 60% of hedge fund Treasury exposure, more than 70% of hedge fund repo borrowing, and more than 25% of primary dealers’ repo lending. The data allow the authors to identify basis positions directly, distinguishing them from outright long Treasury positions, by matching the simultaneous long cash / short futures pattern that defines the trade. The estimates underscore that hedge funds had become systemically important participants in Treasury market intermediation.

Q3. What financial stability risks does the basis trade create?

The basis trade creates two interrelated risks: margin risk (variation margin calls on futures positions can force immediate liquidation) and rollover risk (if repo lenders withdraw funding, the cash Treasury position must be sold). The paper’s model formalizes how limits to arbitrage—specifically repo market illiquidity and margin requirements—impair risk-sharing between dealers and holders of long futures positions. These constraints mean that even a moderate adverse price move can trigger a self-reinforcing cycle: higher basis volatility → margin calls → forced sales → further basis widening → further margin calls.

Q4. What happened in March 2020 and what was the Federal Reserve’s role?

Beginning in early March 2020, the COVID-19 pandemic triggered a “dash for cash” that disrupted Treasury market functioning: bid-ask spreads widened dramatically, repo rates spiked, and the cash-futures basis moved sharply against basis traders, generating large margin calls. The authors find that while Treasury market disruptions spurred hedge funds to sell Treasuries, the unwinding of the basis trade was likely a consequence rather than a primary cause of the stress. The Federal Reserve intervened by dramatically expanding Treasury purchases from dealers and offering unlimited repo and reverse repo facilities, which likely prevented a liquidity spiral by removing the constraint on dealer intermediation capacity. The paper argues this episode highlights structural vulnerabilities in Treasury market intermediation arising from the shift of warehousing capacity to hedge funds.

Key concepts

Treasury cash-futures basis trade: an arbitrage strategy in which a trader simultaneously holds a long position in cash Treasury bonds (funded via repo) and a short position in Treasury futures, profiting from the convergence of cash and futures prices at delivery.

warehousing role of hedge funds: the function of holding Treasury bonds on behalf of institutional investors who want long futures exposure, financed in the repo market; this creates a link between Treasury, futures, and repo markets and exposes the system to repo rollover and margin risk.

rollover risk: the risk that short-term repo lenders decline to roll over funding at maturity, forcing the borrower to sell the collateral asset (Treasury bonds) at potentially distressed prices.

Heterogeneous innovations and growth under imperfect technology spillovers

Layer 1 — Overview

Research Question. Jo and Kim ask two related questions: (1) How do firms use different types of innovation when learning others’ technology takes time? (2) How does this process alter the aggregate implications of firm innovation, particularly in the context of increasing competition?

Model. The paper develops a discrete-time infinite-horizon endogenous growth model with multi-product firms pursuing two types of innovation — “own-innovation” (improving existing product quality) and “creative destruction” (entering new product markets by displacing incumbents) — subject to a novel friction called “imperfect technology spillovers.” The friction takes the specific form of lagged learning: creative destruction builds on the one-period-lagged technology of the target market’s incumbent, while only the incumbent can observe the current frontier technology level. This one-period lag creates a technology gap (Δ = q_t / q_{t−1}) between the incumbent’s frontier and the level available to rivals. Four possible technology gap values arise in equilibrium: Δ₁ = 1 (no gap), Δ₂ = λ (one successful own-innovation), Δ₃ = η (one successful creative destruction), and Δ₄ = η/λ. The step sizes satisfy λ² > η > λ, meaning a single creative destruction improves quality more than a single own-innovation, but two consecutive own-innovations dominate a single creative destruction.
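The four gap values and their ordering can be checked with a short Python sketch. The values λ = 1.04 and η = 1.075 are an assumption: they are inferred from the step size ratio 0.075/0.04 quoted in the quantitative analysis, read as net step sizes, and are not confirmed as the paper's gross step sizes. The sketch also traces one way the gap η/λ can arise under the definitions above, assuming the incumbent does not innovate again in the takeover period.

```python
# Assumed gross step sizes (inferred, not confirmed by the summary).
lam = 1.04    # own-innovation step
eta = 1.075   # creative-destruction step

# Restriction stated in the text: lambda^2 > eta > lambda.
assert lam**2 > eta > lam

gaps = {
    "Delta_1": 1.0,        # no gap
    "Delta_2": lam,        # one successful own-innovation
    "Delta_3": eta,        # one successful creative destruction
    "Delta_4": eta / lam,  # creative destruction on a product that was just own-innovated
}

# How eta/lambda can arise: the rival builds on the lagged quality q_lag,
# while the incumbent's current quality already includes one own-innovation.
q_lag = 1.0
q_incumbent = lam * q_lag   # incumbent's frontier after own-innovation
q_rival = eta * q_lag       # rival's quality after creative destruction on q_lag
gap_after_takeover = q_rival / q_incumbent

# Sort the gap labels from smallest to largest.
ordered = sorted(gaps, key=gaps.get)
for name in ordered:
    print(f"{name}: {gaps[name]:.4f}")
```

Because λ² > η, the gap η/λ is smaller than λ, so under these assumed values the gaps rank as Δ₁ < Δ₄ < Δ₂ < Δ₃ even though the labels are numbered differently.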

Key Mechanisms. The learning friction generates two novel mechanisms. First, the “market-protection effect”: incumbents with a technology advantage (Δ > 1) intensify own-innovation to widen the gap and protect their product lines when competitive pressure rises. Formally, own-innovation probability is highest for Δ₂ products, with the ordering z₂ > z₃ > z₄ > z₁, and ∂z₂/∂x > ∂z₃/∂x > 0 while ∂z₁/∂x < 0, conditional on value coefficients. Second, the “technological barrier effect”: higher overall own-innovation and creative destruction intensity widens the average technology gap across products, reducing rivals’ conditional probability of successfully taking over a product market. This is distinct from the standard Schumpeterian effect (lower expected future profits) and from the escape-competition effect in step-by-step models (which applies only to neck-and-neck, single-product firms).

Data and Empirical Strategy. The empirical analysis combines the USPTO PatentsView database, the Longitudinal Business Database (LBD), the Longitudinal Firm Trade Transactions Database (LFTTD), the Census of Manufactures (CMF), Compustat, and NBER-CES data, covering the universe of U.S. patenting firms from 1976 to 2016, with main analyses from 1982 to 2007. Own-innovation is proxied by the self-citation ratio of patents (the ratio of self-citations to total backward citations); creative destruction by new products added and low-self-citation patents. Exogenous competitive pressure comes from China’s WTO accession in 2001, instrumented by the industry-level NTR tariff gap (the gap between non-NTR and NTR rates in 1999) following Pierce and Schott (2016).

Empirical Findings. Pre-shock (1982–1999): patents with lower self-citation ratios (closer to creative destruction) have significantly longer backward citation gaps (coefficient −2.29 to −2.59, p < 0.01 across specifications), confirming that learning others’ technology takes more time. Creative-destruction-type patents also have higher market value (Kogan et al. stock return measure) and scientific value (forward citations), with self-citation ratio negatively associated with both (e.g., coefficient on self-citation for market value: −0.289 without firm FE; −0.110 with firm FE, p < 0.01). Conditional on patenting, higher self-citation ratios are negatively associated with employment growth (coefficient −0.256, p < 0.05), number of industries added (−0.158, p < 0.05), and products added (−0.274, p < 0.01).

Post-shock (DID): foreign competition had no statistically significant effect on overall patent counts, but firms with above-average innovation intensity in industries with high NTR gaps significantly increased their self-citation ratio — indicating a shift toward own-innovation. The triple-interaction coefficient is 0.795 (p < 0.01) with baseline controls. For a firm with average lagged innovation intensity (0.18) in an industry with an average NTR gap (0.291), this corresponds to a 4.2 percentage point increase in the seven-year growth rate of the self-citation ratio, representing a 15.0% increase relative to the average growth rate of 28.2 percentage points. Consistent with the technological barrier effect, firm entry rates are lower in industries with higher TFPR-skewness-based technological barriers (coefficient −0.012 to −0.016, p < 0.05).
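The magnitude quoted above can be reproduced approximately from the rounded figures in the text. The short Python sketch below multiplies the triple-interaction coefficient by the average innovation intensity and the average NTR gap, then expresses the result relative to the average growth rate. The variable names are illustrative, and treating the effect as a simple product of these three numbers is an assumption about how the paper computes it.

```python
# Rounded inputs as reported in the summary.
triple_interaction_coef = 0.795   # NTR gap x Post x innovation intensity
avg_innovation_intensity = 0.18   # average lagged innovation intensity
avg_ntr_gap = 0.291               # average industry NTR gap
avg_growth_pp = 28.2              # average seven-year growth of the self-citation ratio, in pp

# Implied effect in percentage points (assumed to be a simple product).
effect_pp = triple_interaction_coef * avg_innovation_intensity * avg_ntr_gap * 100

# Effect relative to the average growth rate.
relative_increase = effect_pp / avg_growth_pp

print(f"effect: {effect_pp:.2f} pp, relative to average growth: {relative_increase:.1%}")
```

The product comes to about 4.2 percentage points and a relative increase a little under 15%, close to the reported 15.0%; the small difference presumably reflects rounding in the published inputs.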

Quantitative Analysis. Calibrated to the U.S. manufacturing sector in 1992, the model matches six target moments: average number of products (2.3), products added (0.3), firm entry rate (7.6%), average productivity growth (1.9%), high-growth-firm employment growth (22.5%), and import penetration (15.3%). Creative destruction contributes about 1.88 times as much to growth per successful innovation as own-innovation (step size ratio 0.075/0.04 ≈ 1.88). The aggregate R&D-to-sales ratio (untargeted) is 4.6% in the model vs. 4.1% in data.

A counterfactual increasing outside entrants by 83% (matching the rise in import penetration from 15.3% to 25.1% between 1992 and 2007) generates a 1.51% increase in aggregate creative destruction arrival rate x, but firm-level creative destruction probability falls 1.33% and startup creative destruction also falls 1.33%. The aggregate R&D-to-sales ratio falls 1.6% and creative destruction R&D intensity falls 1.2%. Average domestic productivity growth declines 11.0%, with growth from creative destruction falling 13.0% and growth from domestic startups falling 1.7%. The total mass of domestic firms falls 6.4%.

In economies with creative destruction costs 80 times higher than the U.S. baseline, the same competitive pressure shock raises rather than lowers total R&D (by 1.0%), but domestic growth still falls 9.7%, because the marginal decline in creative destruction impedes the growth contribution and firm entry even when aggregate innovation spending rises.

In depth

Q1. What is the key friction that distinguishes this model from the existing multi-product firm literature (e.g., Klette and Kortum 2004; Akcigit and Kerr 2018)?

A: The key friction is “imperfect technology spillovers,” modeled as lagged learning: creative destruction can only build on the one-period-lagged technology of the target product (q_{j,t−1}), while the product’s current owner observes the frontier technology (q_{j,t}). In models without this friction — such as Akcigit and Kerr (2018) — rivals can instantly learn and copy frontier technology, so firms have no technological advantage and cannot protect their markets. In the current model, own-innovation by the incumbent widens the gap between q_{j,t} and q_{j,t−1}, creating a barrier that a rival must overcome even after successful creative destruction. This makes own-innovation an endogenous function of the technology gap, a feature absent from existing multi-product firm frameworks.

Q2. Why does the model predict that own-innovation increases with the technology gap up to a point, then decreases?

A: From Corollary 1, the ordering z₂ > z₃ > z₄ > z₁ reflects competing forces. Products with gap Δ₂ = λ gain the most from additional own-innovation in terms of reducing the probability of losing the product line (equation 2), so own-innovation is highest there. Products with Δ₃ = η or Δ₄ = η/λ already have substantial technological advantages from prior creative destruction, so the marginal value of own-innovation in reducing market loss probability is lower. Products with Δ₁ = 1 have no advantage at all: if a rival succeeds in creative destruction, the incumbent loses the product regardless of own-innovation (equation 1), so z₁ is lowest. Beyond a certain gap level, the incumbent is sufficiently protected that additional own-innovation has diminishing returns in deterrence.

Q3. What is the market-protection effect formally, and for which products is it strongest?

A: The market-protection effect (Corollary 2) is the positive response of a firm’s own-innovation to an increase in the aggregate creative destruction arrival rate x, conditional on the value coefficients A₁ and A₂ being fixed. It is strongest for products with Δ₂ = λ (∂z₂/∂x is the largest and positive), positive but weaker for Δ₃ = η (∂z₃/∂x > 0), of ambiguous sign for Δ₄ = η/λ, and negative for Δ₁ = 1 (∂z₁/∂x < 0). The asymmetry reflects the asymmetric payoff to own-innovation across gap levels: for Δ₂ products, successful own-innovation can turn a losing situation into a winning one because it shifts the technology gap from Δ₁ to Δ₂ from the rival’s perspective, effectively defeating the rival’s creative destruction attempt. This mechanism provides a micro-foundation for why frontier firms (like Google or NVIDIA) keep innovating intensely despite their technological leads, a pattern the standard step-by-step model cannot explain.

Q4. What is the technological barrier effect and how does it differ from the Schumpeterian effect?

A: The technological barrier effect refers to the reduction in rivals’ incentive for creative destruction caused by an increase in the average technology gap across product lines. When incumbents do more own-innovation or when outside firms do more creative destruction, the distribution of technology gaps shifts rightward (density at Δ₁ falls; density at Δ₂, Δ₃, Δ₄ rises). This raises the average technology barrier rivals must overcome to successfully take over a product market, reducing the conditional takeover probability x^{takeover} and the expected value of creative destruction B. In the U.S. counterfactual, the technological barrier effect accounts for 17.0% of the total change in the aggregate creative destruction rate x and 15.0% of the change in startup creative destruction x_e. In contrast, the Schumpeterian effect refers to the reduction in expected future profits from owning a product due to increased displacement risk (through the value coefficient A₂), a mechanism present in standard quality-ladder models. Both operate simultaneously but the technological barrier effect is a novel feature of this framework.

Q5. How is own-innovation vs. creative destruction measured empirically, and what validates this measure?

A: The self-citation ratio (the share of a patent’s backward citations that cite the same assignee’s earlier patents) is used as the primary measure: a higher ratio indicates greater reliance on the firm’s own prior knowledge, hence a higher probability that the innovation improves an existing product line (own-innovation). This is validated empirically in three ways. First, patents with lower self-citation ratios have significantly larger backward citation gaps (coefficient −2.29 to −2.59 across fixed-effect specifications on 728,721 observations), consistent with creative destruction requiring more time to learn others’ technology. Second, lower self-citation patents have higher market value and scientific value (forward citations), consistent with η > λ (creative destruction contributes more per event to quality). Third, firm-level regressions show that lower self-citation ratios are associated with higher employment growth, more products added, and more industries entered, consistent with creative destruction contributing more to firm expansion.
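As a minimal illustration of the measure, the Python sketch below computes a self-citation ratio from a patent's assignee and the assignees of the patents it cites. The function name, the input layout, and the choice to return `None` for a patent with no backward citations are assumptions for illustration; the paper's actual data construction from PatentsView is not described in this summary.

```python
def self_citation_ratio(assignee, cited_assignees):
    """Share of backward citations that point to the same assignee.

    Returns None when the patent has no backward citations (an assumed convention).
    """
    if not cited_assignees:
        return None
    own = sum(1 for cited in cited_assignees if cited == assignee)
    return own / len(cited_assignees)


# Hypothetical patent by FirmA citing four earlier patents, two of them its own.
ratio = self_citation_ratio("FirmA", ["FirmA", "FirmB", "FirmA", "FirmC"])
print(ratio)
```

A ratio near 1 would be read as closer to own-innovation and a ratio near 0 as closer to creative destruction, following the interpretation given in the text.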

Q6. How does the DID identification strategy work, and what are the main results?

A: The identification exploits the removal of trade policy uncertainty (TPU) after China’s WTO accession in 2001. The treatment variable is the industry-level NTR gap (the gap between non-NTR and NTR tariff rates in 1999): industries with larger gaps experienced a larger reduction in uncertainty and thus a greater increase in Chinese import competition. The DID compares patenting firms across periods (1992–1999 vs. 2000–2007) and across high- vs. low-NTR-gap industries, with a triple interaction for firm-level innovation intensity (lagged five-year average patents per employee, normalized within two-digit NAICS). The main finding (Table 4): the NTR gap × Post interaction has no significant effect on overall patent counts (coefficient 0.238 without controls, standard error 0.237), but the triple interaction (NTR gap × Post × innovation intensity) has a positive and significant effect on the growth rate of the self-citation ratio (0.732 without controls, p < 0.05; 0.795 with baseline controls, p < 0.01). This implies that innovation-intensive firms in high-competition industries shifted their composition toward own-innovation, while overall patenting was unchanged — consistent with an offsetting rise in own-innovation and fall in creative destruction.

Q7. What are the aggregate growth effects of increasing competitive pressure in the calibrated model?

A: Using an 83% increase in outside entrants (matching the 1992–2007 rise in import penetration from 15.3% to 25.1%), average domestic productivity growth falls 11.0%. Decomposing: growth from domestic own-innovation falls 11.4%, growth from domestic creative destruction falls 13.0%, and growth from domestic startups falls 1.7% (Table 9). The aggregate R&D-to-sales ratio falls 1.6% and the creative destruction R&D intensity falls 1.2%, indicating that the decline in creative destruction R&D outweighs the rise in own-innovation R&D. The total mass of domestic firms falls 6.4% and the average number of products per firm falls 5.5%.

Q8. How do results differ in economies with high creative destruction costs vs. the U.S.?

A: When creative destruction costs (χ̃) are set 80 times higher than the U.S. baseline, the initial equilibrium has much lower creative destruction: R&D-to-sales ratio is 1.39% (vs. 4.58% in U.S.), creative destruction R&D intensity is 8.6% (vs. 63.9%), average number of products is 1.0 (vs. 2.3), and average domestic productivity growth is 1.4% (vs. 1.9%). Under the same competition shock, total R&D actually rises by 1.0% in this high-CD-cost economy (because own-innovation increases more than creative destruction falls, given the already low baseline of creative destruction), in contrast to the −1.6% in the U.S. However, domestic growth still falls 9.7% even in this economy, driven by reductions in creative destruction by incumbents and startups combined with a decline in the mass of domestic incumbents. This result holds even with a fixed firm mass (Table E5), confirming the mechanism is not solely due to entry/exit dynamics.

Q9. What is the technological barrier effect’s quantitative contribution to the decline in creative destruction?

A: In the U.S. counterfactual (Table 8 and associated decomposition), 17.0% of the total change in the aggregate creative destruction arrival rate x and 15.0% of the total change in startup creative destruction x_e are attributable specifically to the technological barrier effect, that is, to the shift in the technology gap distribution μ(Δ_ℓ) holding all else equal. The conditional takeover probability x^{takeover} declines from 73.2% to 73.0%. The density at Δ₁ (the easiest gap to overcome) falls 0.4%, while densities at Δ₃ and Δ₄ rise 1.1% and 1.4% respectively, driven by increased creative destruction by outside firms and intensified own-innovation by incumbents.

Q10. What are the policy implications the paper draws from its framework?

A: The paper argues that policies evaluating innovation should account for composition, not just aggregate R&D levels or patent counts. Increased overall innovation driven by defensive own-innovation contributes less to economic growth than creative destruction and restricts firm entry — so it is less beneficial than it appears. In low-creativity economies (e.g., European economies with high regulatory barriers to creative destruction), increased foreign competition may raise aggregate R&D while still lowering domestic growth, misleading policymakers who track only total innovation spending. The model also suggests that the mixed empirical findings in the competition-innovation literature (Aghion et al. 2005; Bloom et al. 2016; Autor et al. 2020) can be reconciled by accounting for compositional shifts: the net effect of competition on total innovation is ambiguous because it raises own-innovation for technologically advantaged firms while reducing creative destruction for all firms.

Key Concepts

Imperfect Technology Spillovers: The novel friction introduced in this paper, modeled as lagged learning: firms attempting creative destruction can only access the one-period-lagged technology of the target product market (q_{j,t−1}), while the incumbent product owner observes and can improve from the current frontier (q_{j,t}). This asymmetry creates a persistent technological advantage for incumbents and enables strategic defensive innovation.

Own-Innovation: R&D investment by a firm to improve the quality of its existing product lines. Successful own-innovation raises product quality by a step size λ > 1. Own-innovation does not require learning others’ technology and, in the model, constitutes the incumbents’ defensive margin against creative destruction. At the aggregate level, it contributes more to total growth than creative destruction because it succeeds more frequently, but per successful event it contributes less (λ < η).

Creative Destruction: R&D investment enabling a firm to enter a new product market by displacing the incumbent. Successful creative destruction improves the lagged quality of the target product by a step size η > λ, where λ² > η > λ. It requires learning the incumbent’s one-period-lagged technology, takes longer to develop (evidenced empirically by longer backward citation gaps), and contributes more to firm growth and product expansion per event than own-innovation.

Technology Gap (Δ): The ratio of a product’s current-period technology to its previous-period technology (Δ_{j,t} = q_{j,t}/q_{j,t−1}). This gap summarizes the technological advantage the incumbent holds in a product market under imperfect spillovers. Four values are possible in equilibrium: Δ₁ = 1, Δ₂ = λ, Δ₃ = η, Δ₄ = η/λ. The gap determines both the incumbent’s own-innovation incentive and the rival’s probability of successfully completing a product takeover conditional on creative destruction.

Market-Protection Effect: The mechanism by which incumbents with a technological advantage (Δ > 1) increase own-innovation in response to heightened competitive pressure (an increase in the aggregate creative destruction arrival rate x). This effect is maximized for products with Δ₂ = λ and positive but diminishing for Δ₃. It is absent for Δ₁ = 1 products (where own-innovation cannot prevent displacement) and is formally distinct from the escape-competition effect in step-by-step innovation models, which applies only to neck-and-neck single-product firms.

Technological Barrier Effect: The reduction in rivals’ incentive for creative destruction caused by an increase in the average technology gap across the economy’s product lines. When incumbents intensify own-innovation and/or when outside creative destruction increases, the distribution of technology gaps shifts toward higher Δ values, reducing the conditional probability that a rival successfully takes over any given product market. This feedback mechanism endogenously suppresses creative destruction and firm entry beyond what the Schumpeterian effect alone would predict.

Self-Citation Ratio: The share of a patent’s backward citations that cite patents previously owned by the same firm. Used in the paper as a continuous proxy for the likelihood that a patent represents own-innovation vs. creative destruction: a ratio of 1 (100% self-citations) implies 100% probability of own-innovation; a ratio of 0 implies 100% probability of creative destruction. This measure follows Akcigit and Kerr (2018) and is validated in the paper against learning time, quality, and firm growth outcomes.

NTR Gap (Trade Policy Uncertainty Shock): The industry-level difference between non-NTR (column 2) and NTR (column 1) U.S. tariff rates in 1999, used as an instrument for the exogenous increase in Chinese competitive pressure following China’s WTO accession and the U.S. granting of Permanent Normal Trade Relations (PNTR) in 2002. Industries with larger NTR gaps experienced a greater reduction in trade policy uncertainty and thus a larger increase in competitive pressure from foreign firms.

Homeownership, Polarization, and Inequality

This paper asks why job polarization and income inequality are higher in large U.S. cities, and proposes a novel housing-market mechanism that operates independently of — but interacts with — the skill-biased technical change (SBTC) explanations dominant in the existing literature.

The core argument is that large cities have experienced faster growth in house prices relative to both wages (price-wage ratio) and rents (price-rent ratio) since 1980. This excess price growth has priced middle-income households out of homeownership in expensive cities. Because low-income households cannot afford to own anywhere and high-income households can afford to own everywhere, it is specifically middle-income (middle-skilled) households whose location choice becomes entangled with their tenure choice. These households increasingly sort toward smaller, more affordable cities where they can purchase a home. This selective out-migration hollows out the middle of the income distribution in large cities, producing greater employment polarization and income inequality there.

Empirically, the paper uses Census and ACS data from 1980 to 2019 covering 465 commuting zones (CZs). Polarization is measured following Autor and Dorn (2013) by assigning 3-digit occupations to income percentiles fixed at 1980 levels; inequality is measured by the Gini coefficient and variance of log annual wages. Housing costs are captured by hedonic price and rent indices and three derived ratios. OLS and IV results (instrumented using the interaction of land unavailability and long-run changes in real interest rates) show that a doubling of house prices is associated with a 1 percentage point decline in the middle-skilled employment share; a doubling of the price-rent ratio is associated with an 11.3 percentage point decline; and a doubling of the price-wage ratio with a 5.3 percentage point decline. Inequality follows the same pattern: doubling prices raises 100x the variance of log wages by 2.3 points; doubling the price-rent ratio raises it by 11.7 points; doubling the price-wage ratio by 7.7 points.
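The two inequality measures named above can be sketched in a few lines of Python. This is a minimal unweighted version using the standard sorted-data formula for the Gini coefficient and the population variance of log wages scaled by 100; the paper presumably applies survey weights and sample restrictions that are not reproduced here, and the wage vectors are hypothetical.

```python
import math


def gini(values):
    """Unweighted Gini coefficient via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


def var_log_x100(wages):
    """100 times the population variance of log wages."""
    logs = [math.log(w) for w in wages]
    mean = sum(logs) / len(logs)
    return 100 * sum((v - mean) ** 2 for v in logs) / len(logs)


# Hypothetical city wage distributions (thousands of dollars).
with_middle = [20, 50, 100]
hollowed_out = [20, 100]   # same extremes, middle earners gone

print(gini(with_middle), gini(hollowed_out))
print(var_log_x100(with_middle), var_log_x100(hollowed_out))
```

In this toy comparison, removing the middle earner raises the variance of log wages, which is the direction of the hollowing-out effect the paper describes for large cities.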

The migration mechanism is documented using 2001–2019 CPS ASEC data, which — uniquely among available sources — reports reasons for moving. A doubling of the price index, price-wage ratio, or price-rent ratio in the origin state relative to the destination raises the probability that a middle-income (2nd–4th quintile) household moves for housing-related reasons by approximately 5–10 percentage points in absolute terms, implying a 50–80% relative increase compared with low- or high-income households making a housing-related move.

The theoretical framework extends the standard spatial equilibrium (Rosen-Roback) model with two additions: skill heterogeneity and housing tenure choice. Households face a minimum house size constraint and a payment-to-income (PTI) constraint (calibrated at λ = 0.308). These constraints create distinct skill thresholds for homeownership that vary by city; the interaction between location and tenure choices applies only to middle-skilled households who can afford ownership in cheap but not expensive cities.
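A stylized Python sketch shows how a minimum house size combined with a payment-to-income cap produces city-specific income thresholds for ownership. Only the PTI limit of 0.308 comes from the summary. The payment rule (a fixed annual payment rate times the price of a minimum-size house), the prices, the incomes, and the function name are hypothetical simplifications, not the paper's model.

```python
PTI_LIMIT = 0.308  # payment-to-income cap reported in the summary


def min_income_to_own(house_price, annual_payment_rate):
    """Lowest income at which the payment on a minimum-size house meets the PTI cap.

    Assumes the annual payment is a fixed fraction of the house price (hypothetical rule).
    """
    annual_payment = annual_payment_rate * house_price
    return annual_payment / PTI_LIMIT


# Hypothetical prices of a minimum-size house in a cheap and an expensive city.
cheap_threshold = min_income_to_own(house_price=100_000, annual_payment_rate=0.06)
expensive_threshold = min_income_to_own(house_price=300_000, annual_payment_rate=0.06)

# Hypothetical annual incomes for three skill groups.
incomes = {"low": 15_000, "middle": 40_000, "high": 90_000}

# For each group: (can own in cheap city, can own in expensive city).
can_own = {
    group: (income >= cheap_threshold, income >= expensive_threshold)
    for group, income in incomes.items()
}
for group, (cheap_ok, expensive_ok) in can_own.items():
    print(f"{group}: own in cheap city={cheap_ok}, own in expensive city={expensive_ok}")
```

With these made-up numbers only the middle group's ownership status differs across cities, which is why only its location choice is tied to its tenure choice in the argument above.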

In the quantitative model, calibrated separately for 1980 and 2019 with two locations (top 30 CZs vs. the rest), counterfactual experiments show that holding price-wage ratios at their 1980 levels reduces the excess polarization gap between large and small CZs by 93% and the excess inequality gap by 40%. Holding price-rent ratios constant reduces the polarization gap by 96% and the inequality gap by 27%. By contrast, shutting down SBTC entirely reduces the polarization gap by only 54% and the inequality gap by 73%. These results establish that while SBTC is an important driver, its effect on polarization and inequality is substantially amplified by faster house price growth in large cities; without the housing affordability channel, the effect of SBTC on disproportionate polarization would be 63–81% smaller and on the inequality gap 18–36% smaller.

Q: What is the paper’s central research question? A: The paper asks why job polarization and income inequality are systematically higher in large U.S. cities than in small ones. Prior literature attributed this to skill-biased technical change, external labor demand shocks, or IT-driven displacement of routine jobs; this paper proposes a complementary, housing-market-based explanation that does not rely on features of the production technology.

Q: What is the core mechanism linking house prices to polarization? A: When price-wage and price-rent ratios are higher in large cities, middle-income households face binding minimum-size and payment-to-income constraints that prevent them from owning a home there but not in cheaper cities. Because homeownership carries financial advantages, these households sort toward smaller, more affordable cities. Low-income households cannot afford ownership anywhere and high-income households can afford it anywhere, so only the middle group’s location choice is distorted by tenure considerations. This selective out-migration hollows out the middle of the income distribution in expensive large cities.

Q: What empirical patterns in CZ-level data motivate the paper? A: Doubling CZ size is associated with a 1.9 percentage point greater fall in the middle-skilled employment share and a 2.7 point higher growth in 100x the variance of log wages from 1980 to 2019. Larger CZs also experienced 3.4% higher price growth, 3.1% higher price-wage ratio growth, and a 10% greater increase in price-rent ratios. These associations persist after controlling for initial CZ size and other characteristics.

Q: What do the OLS and IV results show about house prices and polarization? A: A doubling of house prices is associated with a 1 percentage point decline in the middle-skilled share; a doubling of the price-rent ratio with an 11.3 percentage point decline; and a doubling of the price-wage ratio with a 5.3 percentage point decline. IV results using the interaction of land unavailability and the change in real interest rates as an instrument confirm the negative relationship remains statistically significant, suggesting a causal interpretation is plausible.

Q: What do the OLS and IV results show about house prices and income inequality? A: A doubling of prices is associated with a 2.3 point increase in 100x the variance of log wages; a doubling of the price-rent ratio with an 11.7 point increase; and a doubling of the price-wage ratio with a 7.7 point increase. IV results suggest a causal relationship between price growth and income inequality at the CZ level.

Q: What evidence does the paper provide for the migration mechanism? A: Using 2001–2019 CPS ASEC data (which reports stated reasons for moving, unlike the ACS), the paper estimates logit regressions of interstate migration for housing-related reasons. A doubling of the price index in the origin state relative to the destination raises the probability of a housing-related move for middle-income (2nd–4th quintile) households by 5–6 percentage points; a doubling of the price-wage ratio raises it by 6–7 percentage points; and a doubling of the price-rent ratio raises it by 7–10 percentage points. These effects imply a 50–80% relative increase in housing-related migration probability for the middle quintiles compared with the bottom or top quintile. Housing-related movers constitute over 12% of all interstate migrants in the sample.

Q: What is the key finding about homeownership rates? A: There is no statistically significant relationship between the change in homeownership rates and the growth in prices, price-rent, or price-wage ratios from 1980 to 2019. This is consistent with the model’s mechanism, in which middle-income households who cannot afford ownership in large cities move away rather than simply switching to renting there — so aggregate local ownership rates need not fall.

Q: How does the theoretical model generate the polarization result? A: The model extends the Rosen-Roback spatial equilibrium framework with skill heterogeneity and housing tenure choice. Two skill thresholds — one for minimum-size-constrained ownership and one for unconstrained ownership — interact with the price-wage and price-rent ratios of each city. Proposition 1 proves that a city with higher price-wage and price-rent ratios will have a lower middle-skilled share, because middle-skilled workers (those who can afford to own in cheap but not expensive cities) are drawn to cheaper locations. Proposition 2 shows that in a world with only renters or only owners, skill shares would be identical across cities regardless of price differences — the polarization result requires heterogeneity in tenure choice.

Q: What does the no-SBTC counterfactual show? A: Holding the parameters governing local returns to skills at their 1980 levels (shutting down skill-biased technical change) reduces the difference in the decline in the middle-skilled share between large and small CZs by 54% and the gap in the increase in the variance of log wages by 73%. This is broadly consistent with prior literature attributing the bulk of disproportionate polarization and inequality in big cities to SBTC.

Q: What do the constant price-ratio counterfactuals show? A: When price-wage ratios are held at 1980 levels (but SBTC is allowed to operate), the excess polarization gap between large and small CZs falls by 93% and the excess inequality gap by 40%. When price-rent ratios are held at 1980 levels, the polarization gap falls by 96% and the inequality gap by 27%. When both are held constant simultaneously, the polarization gap falls by 89% and the inequality gap by 27%. These results show that the effect of SBTC on polarization would be 63–81% smaller in the absence of the housing affordability amplification channel.

Q: Who are the largest losers from rising price-wage ratios in large cities? A: The counterfactual welfare analysis identifies middle-skilled workers with skill levels between approximately 0.29 and 0.80 as the primary losers. In the counterfactual with fixed price-wage ratios, workers with skills from 0.29 to 0.57 who previously could not afford ownership in large cities are now able to own there, and those with skills from 0.57 to 0.80 spend a smaller share of income on housing. This group either lost homeownership opportunities or was induced to move to less productive CZs by the actual price growth that occurred.

Q: How is the quantitative model calibrated and structured? A: The model is calibrated separately for 1980 and 2019 as two stationary spatial equilibria. It features two locations (the top 30 CZs, which account for 49.3% of employment, and the remaining CZs). Key parameters include a Frechet elasticity of 6.1, an agglomeration externality of 0.04, a PTI constraint of 0.308, and an annual discount factor of 0.96. Land shares differ between large and small CZs (0.3965 vs. 0.2239). The model finds that the price-rent ratio was relatively stable in large cities but fell in small ones, while the price-wage ratio increased much more in large CZs — both indicators point to purchasing a home becoming relatively more expensive in large CZs.

Q: What are the paper’s policy implications? A: Zoning reforms and other policies that increase housing supply in large, unaffordable cities could produce a more efficient spatial allocation of labor, greater aggregate productivity, and more economically diverse — less polarized and less unequal — cities, while also reducing the wealth gap between owners and renters. Policies that promote homeownership by reducing the cost of owning without raising housing supply may reduce local polarization and inequality but could lower aggregate output and do not necessarily increase homeownership rates.

Q: How does this paper relate to existing explanations for city-level polarization? A: The paper’s housing-market mechanism is explicitly complementary to SBTC-based explanations (Baum-Snow, Freedman, and Pavan, 2018; Cerina et al., 2023), external demand shock explanations (Davis, Mengus, and Michalski, 2020), and IT-displacement explanations (Eeckhout, Hedtrich, and Pinheiro, 2024). The paper’s key added contribution is that even if SBTC were the primary driver of disproportionate polarization, its measured effect would be substantially smaller in the absence of faster house price growth in large cities — the housing market amplifies rather than replaces the technology channel.

Job polarization (city-level): The hollowing out of middle-income employment shares in a commuting zone, measured as the change in the share of workers in occupations assigned to the 21st–80th income percentile (using the 1980 occupation-to-percentile mapping fixed over time). In this paper, polarization is greater in cities where price-wage and price-rent ratios grew faster, attributed to selective out-migration of middle-skilled households.
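
As a rough illustration of this measure, the sketch below computes the middle-percentile employment share for one commuting zone in two years and takes the difference. The occupation names, percentile assignments, and employment counts are invented for the example; only the 21st to 80th percentile band and the fixed 1980 mapping come from the definition above.

```python
# Hypothetical 1980 occupation-to-income-percentile mapping, held fixed over time.
PERCENTILE_1980 = {"cashier": 10, "clerk": 45, "machinist": 50, "engineer": 90}


def middle_share(employment, low=21, high=80):
    """Share of workers in occupations whose 1980 percentile lies in [low, high]."""
    total = sum(employment.values())
    middle = sum(
        count
        for occupation, count in employment.items()
        if low <= PERCENTILE_1980[occupation] <= high
    )
    return middle / total


def polarization(emp_start, emp_end):
    """Change in the middle share; a more negative value means more hollowing out."""
    return middle_share(emp_end) - middle_share(emp_start)


# Toy employment counts for a single commuting zone (made up for illustration).
emp_1980 = {"cashier": 20, "clerk": 30, "machinist": 30, "engineer": 20}
emp_2019 = {"cashier": 30, "clerk": 20, "machinist": 15, "engineer": 35}

# The middle share falls from 0.60 to 0.35 in this toy example.
change = polarization(emp_1980, emp_2019)
```

Because the percentile mapping is frozen at 1980, the measure picks up shifts in employment across occupations rather than changes in how occupations rank by income.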

Price-wage ratio: The ratio of hedonic house prices to median annual wages in a commuting zone, constructed from Census and ACS data. A higher price-wage ratio tightens the payment-to-income constraint on potential homebuyers and is the primary driver of the skill threshold for homeownership in the model.

Price-rent ratio: The ratio of hedonic house prices to rents in a commuting zone. In the model, a higher price-rent ratio reduces the financial advantage of owning over renting, raising the skill threshold at which ownership becomes optimal. The paper treats price-rent and price-wage ratios as distinct channels that both independently amplify polarization.

Housing tenure choice: The household decision to own or rent, modeled as a discrete choice made at the start of life that interacts with location choice. Ownership requires satisfying both a minimum house size constraint and a payment-to-income (PTI) constraint (lambda = 0.308). The interaction between tenure and location choices is the paper’s key model innovation; it exists only for middle-skilled workers whose income is sufficient for ownership in cheap but not expensive cities.

Skill threshold for homeownership (s*_i): The minimum skill level at which a worker in city i chooses to own rather than rent, defined by Lemma 2. This threshold is decreasing in local labor productivity and increasing in price-wage and price-rent ratios. Workers with skill below s*_i in all cities always rent; those with skill above s*_i in all cities always own; those in between face city-dependent tenure choice that distorts their location decision.

Skill-biased technical change (SBTC): In the paper’s quantitative model, SBTC is represented by faster growth in the skill dispersion parameter (alpha_it) in large CZs, reflecting differential productivity growth concentrated at the top of the skill distribution. The paper finds SBTC accounts for 54% of the polarization gap and 73% of the inequality gap in its counterfactual, but argues its effect is amplified 4–5x by the housing affordability channel.

Payment-to-income (PTI) constraint: The constraint that a homebuyer cannot spend more than a fraction lambda (calibrated at 0.308) of annual labor earnings on the annual housing payment (user cost times price times quantity). This constraint, together with the minimum house size, determines the income threshold for ownership and makes location and tenure choices interdependent for middle-skilled workers.
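
A minimal sketch of the constraint as described, assuming the annual payment is simply user cost times price times quantity. The function names and all numbers other than lambda = 0.308 are hypothetical. It shows how the same earnings can satisfy the constraint in a cheap city and fail it in an expensive one, which is what ties tenure choice to location choice.

```python
PTI_LIMIT = 0.308  # lambda, the calibrated value reported in the summary


def annual_payment(user_cost, price_per_unit, quantity):
    """Annual housing payment: user cost times price times quantity."""
    return user_cost * price_per_unit * quantity


def satisfies_pti(earnings, user_cost, price_per_unit, quantity, lam=PTI_LIMIT):
    """True if the annual payment is at most a fraction lam of annual earnings."""
    return annual_payment(user_cost, price_per_unit, quantity) <= lam * earnings


def earnings_threshold(user_cost, price_per_unit, min_size, lam=PTI_LIMIT):
    """Lowest earnings at which the minimum-size house passes the PTI test."""
    return annual_payment(user_cost, price_per_unit, min_size) / lam


# Hypothetical household and cities: identical earnings, different house prices.
earnings = 60_000
owns_in_cheap_city = satisfies_pti(earnings, 0.05, 300_000, 1.0)
owns_in_expensive_city = satisfies_pti(earnings, 0.05, 500_000, 1.0)
```

In this toy case the household clears the constraint only in the cheaper city, so it must choose between renting in the expensive city and owning elsewhere.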

How Do You Identify a Good Manager?

This paper develops a novel experimental method to identify the causal contribution of managers to team performance, and uses it to evaluate which characteristics predict managerial effectiveness and how manager selection mechanisms affect organizational outcomes.

The core identification challenge is that managers are not randomly assigned to teams in the field, and field managers are a highly non-random sample, making it difficult to infer which traits genuinely predict managerial performance. The authors address this by repeatedly randomly assigning managers to multiple teams in a controlled laboratory experiment, then estimating each manager’s average causal contribution to group output after conditioning on group members’ individual productive skills. The intuition is that a good manager is someone who consistently causes their team to produce more than the sum of its parts.

The experiment was conducted at the University of Essex lab with 555 participants (46% female, mean age 25, ethnically diverse) forming 728 groups of three across four rounds. Each group consisted of one manager and two workers who performed a Collaborative Production Task requiring coordination across three problem-solving modules (numerical, spatial, and analytical reasoning). The team score was the minimum module score — a weakest-link structure making coordination essential. Prior to group testing, all participants completed individual assessments of task-specific skill, fluid intelligence (CFIT), emotional perceptiveness (Reading the Mind in the Eyes Test, RMET), economic decision-making skill (the Assignment Game, which measures resource allocation under comparative advantage), Big 5 personality, and demographic characteristics. Manager selection was randomly varied at the session level: in 20 sessions, the participant with the strongest preference for leadership became manager (self-promotion); in 19 sessions, managers were assigned by lottery.

The main quantitative findings are as follows. First, there are large, stable, and statistically significant manager effects: a manager one standard deviation above average improves team performance by approximately 0.23 standard deviations (p = 0.04). This estimate is roughly 90% of the size of the combined productive skill coefficient for the two workers (approximately 0.26 sd), indicating that a good manager is nearly twice as valuable as a good individual worker. Manager contributions predict out-of-sample group performance in a leave-one-out procedure (p < 0.01).

Second, among randomly assigned managers, only two predictors significantly explain managerial performance: fluid intelligence (CFIT) and economic decision-making skill (Assignment Game scores), both significant at below the 1% level. Gender, age, and ethnicity do not predict managerial performance.

Third, self-promoted managers perform substantially worse than lottery-assigned managers, by approximately 0.10 standard deviations — roughly equivalent to being assigned a manager with fluid intelligence one full standard deviation below average. The mechanism is overconfidence: people who strongly prefer management roles are significantly more overconfident (d = 0.41 sd, p < 0.01) and exhibit a strong negative correlation between self-reported social skills and actual emotional perceptiveness on the RMET (r = -0.37, p < 0.001). Among self-promoted managers, self-reported extraversion and political skill are negatively correlated with managerial performance (rho = -0.24 and -0.26, p < 0.05); no such negative relationship appears among lottery managers.

Fourth, selecting managers on economic decision-making skill rather than self-promotion improves average manager quality by 0.6 standard deviations — equivalent to replacing an average worker in every group with a worker at the 99th percentile of individual productivity.

The three mechanisms through which good managers improve performance are: (1) monitoring — good managers (1 sd above average) cut monitoring errors from 16% to 8%; (2) optimal task allocation according to comparative advantage — groups with optimally assigned workers score 0.52 sd higher (p < 0.01); (3) worker motivation in late-stage effort — teams led by a 1-sd-above-average manager solve 0.6 more problems in the final two minutes versus only 0.3 more in the first two minutes.

The experiment was conducted in a university lab in the UK, and the sample skews toward graduate students with limited work experience. Generalizability to field settings is supported by prior evidence that peer productivity spillover experiments yield similar magnitudes in lab and field settings, and by the similarity of the estimated manager effects to the Lazear et al. (2015) estimates from a large employer dataset.

Q: What is the core methodological innovation of this paper? A: The method relies on repeated random assignment of managers to multiple teams, combined with controls for individual productive skill measured prior to group work. This allows identification of each manager’s average causal contribution to group output, rather than confounding management quality with team composition or individual worker ability. The key estimand is the standard deviation of individual manager effects (sigma_alpha), interpreted as the impact of having a manager one standard deviation above average.
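
The logic of this design can be sketched with simulated data. The example below is not the authors' estimator: it assumes a linear model in which group output equals a manager effect plus a coefficient on summed member skill plus noise, residualizes output on skill by ordinary least squares, and averages the residuals by manager. All parameter values and function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate_groups(n_managers=60, rounds=4, skill_coef=0.5, noise_sd=0.5, seed=0):
    """Simulate group output under random assignment (all parameters are made up)."""
    rng = random.Random(seed)
    true_effect = {m: rng.gauss(0, 1) for m in range(n_managers)}
    rows = []
    for m in range(n_managers):
        for _ in range(rounds):
            # Summed pre-measured skill of the group members; it is independent
            # of manager quality because assignment is random.
            skill = sum(rng.gauss(0, 1) for _ in range(3))
            output = true_effect[m] + skill_coef * skill + rng.gauss(0, noise_sd)
            rows.append((m, skill, output))
    return true_effect, rows


def estimate_manager_effects(rows):
    """Residualize output on skill (one-regressor OLS), then average by manager."""
    xs = [skill for _, skill, _ in rows]
    ys = [output for _, _, output in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    residuals = defaultdict(list)
    for m, skill, output in rows:
        residuals[m].append(output - y_bar - slope * (skill - x_bar))
    return {m: mean(values) for m, values in residuals.items()}


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


true_effect, rows = simulate_groups()
estimated = estimate_manager_effects(rows)
ids = sorted(true_effect)
recovery = correlation([true_effect[m] for m in ids], [estimated[m] for m in ids])
```

Random assignment is what makes the per-manager average residual informative: member skill is unrelated to manager quality by construction, so the average is not contaminated by team composition. The raw spread of these estimates would overstate sigma_alpha because it also contains sampling noise.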

Q: How large is the estimated manager effect, and how does it compare to worker effects? A: A manager one standard deviation above average improves team performance by approximately 0.23 standard deviations (p = 0.04 by randomization inference). This is roughly 90% of the size of the combined productive skill effect of both workers together (approximately 0.26 sd), implying a good manager is nearly twice as valuable as a good individual worker. Without conditioning on production skills, the manager effect rises to 0.29 sd.

Q: What characteristics predict managerial performance among randomly assigned managers? A: Only two measures predict managerial performance in the lottery arm: fluid intelligence (CFIT) and economic decision-making skill (scores on the Assignment Game), both significant at below the 1% level. These predictors are robust to controls for demographics, education, work experience, emotional perceptiveness, and personality traits. Gender, age, and ethnicity do not predict managerial performance.

Q: What is the “Assignment Game” and why is it a strong predictor? A: The Assignment Game (Caplin et al., 2024) places participants in a simulated managerial role where they must assign fictional workers to tasks. Performing well requires understanding comparative advantage intuitively, managing an attentionally demanding numerical environment, and avoiding biases such as anchoring. The paper argues its strong predictive power reflects that good managers excel at allocating workers according to comparative advantage — which the experiment directly identifies as a key mechanism.

Q: How do self-promoted managers perform relative to lottery-assigned managers? A: Self-promoted managers perform approximately 0.10 standard deviations below lottery managers, and this gap is robust across model specifications. The performance deficit is roughly equivalent to being assigned a manager whose fluid intelligence is one full standard deviation below average. This finding implies that the common organizational practice of selecting managers partly via self-nomination actively reduces team productivity.

Q: Why do self-promoted managers underperform? A: The paper attributes underperformance primarily to overconfidence. People strongly preferring management roles are significantly more overconfident than those without strong preferences (d = 0.41 sd, p < 0.01). Self-promoted managers specifically overestimate their social skills: among them, self-reported people skills are strongly negatively correlated with actual emotional perceptiveness on the RMET (r = -0.37, p < 0.001), and self-reported extraversion and political skill are negatively correlated with managerial performance (rho = -0.24 and -0.26, p < 0.05). None of these negative relationships appear among lottery managers.

Q: Who wants to be a manager, and does it differ by gender? A: The three variables most strongly correlated with wanting to be in charge are extraversion, risk appetite, and being male. The relationship between high extraversion and preference for management is driven largely by men. Women are much less likely to nominate themselves for leadership roles despite being equally or more effective on average — a finding consistent with broader experimental evidence on gender and leadership self-selection.

Q: How large are the potential gains from skill-based manager selection? A: Compared to self-promotion, selecting managers based on economic decision-making skill yields managers who are 0.6 standard deviations better in terms of estimated manager effects. In terms of group performance, this is equivalent to replacing an average worker in every group with a worker at the 99th percentile of individual productivity. Selecting on both economic decision-making and fluid intelligence outperforms random assignment, selection on social skills, or selection on worker task performance (the Peter Principle).

Q: What are the three mechanisms through which good managers improve team performance? A: First, monitoring: good managers (1 sd above average) reduce monitoring errors — defined as having a worker on a module substantially above the minimum score at task end — from 16% to 8% (bivariate correlation with manager performance = -0.40, p < 0.001). Second, optimal task allocation: the probability of finding the optimal comparative-advantage-based assignment is positively associated with manager performance (rho = 0.19, p < 0.01), and groups with always-optimal starting assignments score 0.52 sd higher than those with never-optimal assignments (p < 0.01). Third, worker motivation: team performance in the final two-minute period is about 50% more influential for overall outcomes than the first two minutes (p = 0.038), and 1-sd-above-average managers generate 0.6 more problems solved in the final period versus 0.3 in the first, consistent with differential motivational effects emerging over time.

Q: What is the Peter Principle, and how does this paper relate to it? A: The Peter Principle refers to the practice of promoting employees based on their performance as line workers rather than their suitability for management — promoting individuals to their level of incompetence. Benson et al. (2019) document this selection pattern empirically. This paper shows that selecting managers on worker task skill is inferior to selecting on economic decision-making skill or fluid intelligence, confirming that task skill is not the right criterion for manager selection even if it predicts individual worker output.

Q: How does the paper validate that manager effects are real and not noise? A: The paper uses randomization inference with 5,000 simulated allocations to compute p-values, obtaining p = 0.04 for the main manager effect. Robustness checks include controlling for pre-existing social relationships, manager risk appetite, variance of individual scores, and granular skill measures — all yielding estimates near 0.22 sd. A leave-one-out out-of-sample prediction test confirms manager contributions significantly predict held-out group performance (p < 0.01), while the analogous worker out-of-sample estimate is less than half the magnitude and not statistically significant.
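
The general idea of randomization inference can be sketched as follows. The summary does not spell out the paper's test statistic or re-randomization scheme, so this example assumes a simple one: the variance of manager-level mean outcomes, compared against the same statistic under reshuffled manager labels. Function names and data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def spread_of_manager_means(manager_ids, outcomes):
    """Variance across managers of their mean group outcome."""
    by_manager = defaultdict(list)
    for manager, outcome in zip(manager_ids, outcomes):
        by_manager[manager].append(outcome)
    return pvariance([mean(values) for values in by_manager.values()])


def randomization_p_value(manager_ids, outcomes, n_draws=5000, seed=0):
    """Share of reshuffled allocations whose spread is at least the observed one."""
    rng = random.Random(seed)
    observed = spread_of_manager_means(manager_ids, outcomes)
    labels = list(manager_ids)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(labels)  # a simulated allocation with no true manager effect
        if spread_of_manager_means(labels, outcomes) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_draws + 1)


# Toy data with strong manager effects: 10 managers, 4 groups each.
managers = [m for m in range(10) for _ in range(4)]
outcomes = [m + 0.1 * k for m in range(10) for k in range(4)]
p_value = randomization_p_value(managers, outcomes, n_draws=500)
```

If managers did not matter, relabeling which manager led which group would leave the spread of manager means roughly unchanged, so an observed spread in the far tail of the reshuffled distribution is evidence of real manager effects.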

Q: What are the scope conditions on the experimental results? A: The experiment is conducted in a university lab in the UK with graduate students averaging 25 years of age and two years of work experience, limiting direct generalizability to experienced workers or senior management. The task lasts approximately 15 minutes, which may not capture longer-run managerial dynamics. Compensation equalized average earnings between managers and workers, which differs from most real-world settings. The authors note their effect-size estimates closely match Lazear et al. (2015) from a large employer, and that Herbst and Mas (2015) find lab peer-productivity experiments generalize to the field.

Manager Effect (sigma_alpha): The standard deviation of individual managers’ average causal contributions to group performance, estimated via repeated random assignment and conditioning on individual productive skill. Represents the impact of having a manager one standard deviation above average, estimated at approximately 0.23 standard deviations of group output.

Collaborative Production Task: A novel lab group task in which a manager and two workers solve problems across three modules (numerical, spatial, analytical reasoning), with team score defined as the minimum module score (weakest-link structure). Managers are responsible for worker assignment, monitoring, and motivation; workers face no financial performance incentives.

Economic Decision-Making Skill: Defined by Caplin et al. (2024) as the ability to make good resource allocation decisions, assessed via the Assignment Game in which participants must optimally assign workers to tasks under comparative advantage. The single strongest predictor of managerial performance in the lottery arm.

Monitoring Failure: Defined in the paper as having any group member working on a module at task end whose score is substantially greater (e.g., 10 points higher) than the minimum module score — meaning the worker’s effort is not contributing to the group score. Occurs in 16% of groups overall; managers one sd above average reduce this to 8%.
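
A small sketch of how the weakest-link score and this monitoring-failure flag could be computed. The data layout (a dict of module scores and a dict of final positions) and the member names are assumptions made for illustration; the 10-point gap follows the example in the definition above.

```python
def team_score(module_scores):
    """Weakest-link rule: the team score is the minimum module score."""
    return min(module_scores.values())


def monitoring_failure(module_scores, final_positions, gap=10):
    """True if anyone ends the task on a module at least `gap` above the minimum."""
    floor = team_score(module_scores)
    return any(
        module_scores[module] - floor >= gap
        for module in final_positions.values()
    )


# Hypothetical end-of-task state for one group.
scores = {"numerical": 42, "spatial": 30, "analytical": 31}
misallocated = {"manager": "spatial", "worker_a": "numerical", "worker_b": "analytical"}
well_monitored = {"manager": "spatial", "worker_a": "spatial", "worker_b": "analytical"}

failed = monitoring_failure(scores, misallocated)
clean = monitoring_failure(scores, well_monitored)
```

In the first allocation a worker is still adding points to a module that is already 12 above the binding minimum, so that effort cannot raise the team score.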

Self-Promotion (as selection mechanism): A treatment condition in which the participant with the strongest stated preference for being manager (on a 1-10 scale) is assigned the managerial role. Contrasted with lottery assignment; self-promoted managers perform approximately 0.10 sd worse than lottery managers.

Overconfidence (in managerial context): The gap between self-assessed skill (particularly social/interpersonal skill) and objectively measured skill (e.g., RMET score). Self-promoters are significantly more overconfident (d = 0.41 sd), and overconfidence is strongly negatively correlated with actual emotional perceptiveness (r = -0.33, p < 0.001).

Comparative Advantage Allocation: The practice of assigning each worker to the module in which they have the highest relative (not absolute) performance advantage. Captured via whether a manager selects the optimal one-to-one assignment given pre-measured individual module scores; groups with always-optimal allocation score 0.52 sd higher.
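
The check for an optimal one-to-one assignment can be sketched by brute force over the six possible allocations of three people to three modules. The objective used here, the sum of pre-measured scores, is an assumption; the summary does not state the paper's exact criterion. Member names and scores are invented.

```python
from itertools import permutations

MODULES = ("numerical", "spatial", "analytical")


def assignment_total(assignment, scores):
    """Sum of each member's pre-measured score on the module they are assigned."""
    return sum(scores[member][module] for member, module in assignment.items())


def best_assignment(scores):
    """Search all one-to-one assignments and return the one with the highest total."""
    members = list(scores)
    candidates = [dict(zip(members, order)) for order in permutations(MODULES)]
    best = max(candidates, key=lambda a: assignment_total(a, scores))
    return best, assignment_total(best, scores)


def is_optimal(chosen, scores):
    """True if the chosen assignment attains the best achievable total."""
    return assignment_total(chosen, scores) == best_assignment(scores)[1]


# Hypothetical pre-measured module scores for three group members.
SCORES = {
    "ana": {"numerical": 9, "spatial": 8, "analytical": 7},
    "ben": {"numerical": 6, "spatial": 7, "analytical": 3},
    "cat": {"numerical": 5, "spatial": 4, "analytical": 6},
}

best, best_total = best_assignment(SCORES)
```

Here ana has the highest absolute score on every module, but each person can take only one, so the best allocation depends on relative strengths. Swapping the objective to the minimum assigned score, in the spirit of the weakest-link rule, would be a one-line change to `assignment_total`.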

Ideas Have Consequences: The Impact of Law and Economics on American Justice

This paper quantifies the effect of the Manne Economics Institute for Federal Judges — an intensive two-week economics training program run by the Law and Economics Center from 1976 to 1998 — on the decision-making of U.S. federal judges. The research question is whether exposure to a coherent set of economic ideas can directly shift the policy decisions of sitting policymakers, as distinct from effects operating through partisan affiliation or formal legal rules.

The program trained nearly half of all federal judges over its two decades of operation. By 1990, forty percent of federal judges had attended; by the late 1990s, roughly half of circuit court cases had a Manne-trained judge on the panel. Instructors included Milton Friedman, Armen Alchian, Harold Demsetz, Martin Feldstein, Paul Samuelson, and Orley Ashenfelter, covering supply-and-demand theory, the Coase Theorem, externalities, property rights, and criminal deterrence following Becker (1968). The program was funded by pro-business foundations and had a recognized conservative-leaning orientation, though it invited both Republican- and Democrat-appointed judges and was popular across party lines.

The identification strategy is a differences-in-differences design exploiting staggered attendance timing. Because the program was oversubscribed and admitted judges on a first-come-first-served basis — with applicants bumped to later cohorts when capacity was reached — the timing of attendance within the ever-attending population has a quasi-random component. The preferred control group consists exclusively of other ever-attending judges who had not yet attended, rather than never-attenders, because never-attenders differ systematically on observables and show a pre-existing positive trend in economics language use, likely from ambient diffusion through clerks, law schools, and organizations such as the Federalist Society. Judge fixed effects and circuit-by-year (or courthouse-by-year) fixed effects absorb time-invariant judge characteristics and court-level time trends. Elastic-net-selected covariates predicting attendance timing, fully interacted with year fixed effects, are added as robustness controls. Standard errors are clustered by judge.
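
One way to write the specification described above is the following two-way fixed effects regression. The notation is an assumption made for this summary, not necessarily the authors' own: i indexes cases, j judges, c circuits, and t years.

```latex
y_{ijct} = \beta \, \mathrm{PostManne}_{jt} + \alpha_j + \gamma_{ct} + X_j' \lambda_t + \varepsilon_{ijct}
```

Here \alpha_j is the judge fixed effect, \gamma_{ct} is the circuit-by-year fixed effect, and X_j' \lambda_t stands for the elastic-net-selected covariates interacted with year effects, which enter only in the robustness specification. The coefficient \beta is identified by comparing judges who have already attended with ever-attenders who have not yet attended, and standard errors are clustered by judge.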

The data cover approximately 200,000 published circuit court opinions (1970–2005) from Bloomberg Law, a 5% random sample of circuit cases hand-coded for ideological direction from the Songer-Auburn database, machine-coded regulatory agency outcomes, a newly collected antitrust case dataset, and approximately 1.03 million district court criminal sentencing records (1992–2003) from TRAC.

The main findings are as follows. First, after attending the Manne program, judges increase their use of economics language in written opinions by approximately one-third of a standard deviation, measured via word-embedding similarity to an economics lexicon; this effect is statistically significant in the short-run event-study window but does not persist over the full career.

Second, Manne attendance raises conservative voting in economics-related cases (labor and regulation) by approximately one-quarter of a standard deviation, which corresponds to judges deciding in the conservative direction about 20 percent more often relative to the mean, with no significant effect on non-economics cases; the interaction effect is robust across specifications, including those with never-attenders.

Third, post-Manne judges vote more frequently against federal labor and environmental regulatory agencies, a result that is statistically significant and economically meaningful with no detectable pre-trends.

Fourth, post-Manne judges impose longer and more frequent prison sentences. There is no increase in sentencing harshness for drug crimes, consistent with Manne instructors having explicitly advocated drug legalization, and the harshness gap between Manne and non-Manne judges widens after the 2005 Booker decision expanded judicial sentencing discretion.

Fifth, there is some evidence of increased voting against antitrust enforcement, though this result is more sensitive to specification.

Persuasion rates computed following DellaVigna and Gentzkow (2010) are slightly larger than those estimated for partisan media interventions such as Fox News and are closest to the effect of a 10-week Washington Post subscription on Democratic governor vote share. Neither the legalist model (judges follow statutes mechanically) nor the attitudinal model (judges follow party affiliation) can explain these within-judge, within-party shifts.

Q: What is the central identification challenge and how do the authors address it? A: The key threat is that judges who chose to attend the Manne program, or who attended at a particular time, may differ systematically from non-attenders in ways correlated with their decision trajectories. The authors address this in two steps. First, they restrict the control group to other ever-attending judges who had not yet attended, exploiting the first-come-first-served oversubscription rule that created quasi-random variation in timing among applicants. Second, they use judge fixed effects plus circuit-by-year fixed effects, and add elastic-net-selected biographical covariates (e.g., birth cohort indicators) interacted with year fixed effects as a robustness check. Republican affiliation, the most salient ideological predictor of attendance, is not a statistically significant predictor of attendance timing, which supports treating the timing of attendance as quasi-random.

Q: Why are never-attenders excluded from the preferred control group? A: Never-attenders differ from attenders on observables including political party and show a positively trending use of economics language in their opinions even before any treatment, suggesting ambient diffusion of economics ideas through law clerks, law school curricula, and organizations such as the Federalist Society. Including never-attenders in the control group produces a near-zero coefficient on the language outcome, which the authors interpret as reflecting spillovers rather than a true null effect; the coefficient on conservative voting in the interaction specification, however, remains positive and significant even when never-attenders are included.

Q: What is the magnitude of the effect on economics language use? A: The within-judge effect of Manne attendance on the word-embedding similarity between judicial opinions and an economics lexicon is approximately one-third of a standard deviation, statistically significant in the short-run event-study window (covering six years before and after attendance). The effect shrinks and becomes non-significant when the full career of Manne judges is examined (rather than just the event-study window), consistent with broad diffusion of economics language across the judiciary over time rather than a persistent individual-level treatment effect.

Q: How large is the effect on conservative voting, and is it concentrated in particular case types? A: Post-Manne attendance raises conservative voting in economics-related cases (labor and regulation) by approximately one-quarter of a standard deviation, corresponding to judges deciding in the conservative direction about 20 percent more often relative to the mean liberal-conservative decision rate. There is no statistically significant effect on non-economics cases. The interaction coefficient — the differential effect on economics versus non-economics cases — is positive and significant across all specifications including the full sample with never-attenders, making this the most robust directional result in the paper.

Q: What is the effect on regulatory agency voting? A: Post-Manne judges vote more frequently against federal labor agencies (National Labor Relations Board, OSHA, Department of Labor, Federal Labor Relations Authority, Office of Workers’ Compensation Programs) and the Environmental Protection Agency. The event study shows a positive and significant increase that persists across the event-study window with no detectable pre-trends. This result is robust to both the baseline specification and the elastic-net-controls specification.

Q: What is the effect on criminal sentencing, and what heterogeneity is found? A: Post-Manne judges impose both more frequent prison sentences and longer sentences, consistent with Becker’s deterrence framework taught in the program’s criminal law curriculum. The sentencing effects are absent for drug crimes, consistent with Manne instructors — including Milton Friedman — having explicitly advocated against the drug war and for drug legalization. The gap in sentencing harshness between Manne and non-Manne judges widens after the 2005 United States v. Booker decision, which made the Federal Sentencing Guidelines advisory rather than mandatory; this is consistent with the program having shaped latent judicial preferences that are expressed more fully when formal constraints are relaxed.

Q: How do the persuasion rates compare to benchmark media studies? A: The persuasion rates computed following DellaVigna and Gentzkow (2010) are slightly larger than those estimated for partisan media interventions such as Fox News (DellaVigna and Kaplan, 2007) and are closest to the persuasion rates implied by a 10-week subscription to the Washington Post on Democratic governor vote share (Gerber et al. 2009). The comparison contextualizes the Manne program as a moderately high-intensity ideational intervention relative to documented cases of political persuasion.
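
The persuasion rate of DellaVigna and Gentzkow (2010) rescales a treatment effect by the difference in exposure to the message and by the share of the audience not already persuaded. The sketch below implements that formula. The input numbers are hypothetical and are not taken from the paper, and using the control-group rate as the baseline is an assumption of this example.

```python
def persuasion_rate(y_treated, y_control, e_treated, e_control, y_baseline):
    """Persuasion rate in percent.

    f = 100 * (y_T - y_C) / (e_T - e_C) * 1 / (1 - y_0)

    y_*: share taking the behavior in each group
    e_*: share of each group exposed to the message
    y_baseline: share that would take the behavior absent the message
    """
    if e_treated == e_control:
        raise ValueError("exposure must differ between treated and control groups")
    if y_baseline >= 1:
        raise ValueError("baseline share must be below 1")
    effect_per_exposed = (y_treated - y_control) / (e_treated - e_control)
    return 100 * effect_per_exposed / (1 - y_baseline)


# Hypothetical inputs: a behavior chosen 30% of the time without exposure and
# 36% with it, with full exposure in the treated group and none in the control.
rate = persuasion_rate(
    y_treated=0.36, y_control=0.30, e_treated=1.0, e_control=0.0, y_baseline=0.30
)  # about 8.6 percent of those who could be persuaded
```

Dividing by one minus the baseline share is what makes rates comparable across settings: the same six-point shift counts for more when fewer people were left to persuade.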

Q: What do the results imply for theories of judicial behavior? A: The findings are inconsistent with both the legalist/formalist model, under which judges apply statutes and precedent without regard to extra-legal factors (predicting zero effect), and the attitudinal model, under which judges simply follow partisan preferences (also predicting zero effect, since judges of both parties attended the program). The within-judge, within-party shifts point to a third channel: judicial worldviews and economic ideas, independent of formal law and partisan affiliation, shape high-stakes precedent-setting decisions.

Q: Can the authors distinguish between a pedagogical (informational) and an ideological persuasion mechanism? A: They cannot definitively distinguish between the two. Both mechanisms predict increased economics language, more conservative rulings in economics cases, deregulatory voting, and harsher non-drug sentences. The drug-crime heterogeneity is somewhat more consistent with a nuanced pedagogical channel, since Manne instructors explicitly discussed drug legalization, but this pattern is also consistent with complex ideological effects. Evidence on decision quality (citation rates, judicial promotion) is mixed and not robust, providing no clean test of the informational mechanism.

Q: What does the antitrust evidence show? A: Post-Manne judges tend to vote against antitrust claimants (i.e., in favor of less antitrust enforcement), but this result is more sensitive to specification than the regulatory agency and sentencing results and is not always statistically significant across specifications. The authors treat it as suggestive rather than conclusive.

Q: How does this paper relate to the literature on economics education and normative beliefs? A: Prior work finds that economics students are less redistributive (Selten and Ockenfels 1998), view surge prices more favorably (Frey and Meier 2005), and favor profit maximization (Rubinstein 2006), and that economics professors are less ideologically liberal than other social scientists (Jelveh et al. 2018). The present paper extends this literature by studying established professionals (judges) making high-stakes real-world decisions, and by documenting a direct policy impact rather than a change in survey responses or experimental choices.

Q: What is the scope of the dataset and the program coverage? A: The circuit court dataset covers approximately 200,000 published opinions from 1970 through 2005. The district court sentencing dataset covers approximately 1.03 million cases from 1992 through 2003 (event study sample). The Manne program ran from 1976 to 1998, with roughly twenty judges per cohort; by 1990 forty percent of federal judges had attended, and by the late 1990s roughly half of circuit court cases had a Manne-trained panelist. Biographical information comes from the Federal Judicial Center; program attendance lists come from Butler (1999) supplemented by FOIA-obtained annual reports.

Manne Economics Institute for Federal Judges: An intensive two-week economics training program for sitting U.S. federal judges, run by the Law and Economics Center from 1976 to 1998, covering supply-and-demand theory, the Coase Theorem, externalities, property rights, deterrence theory, and related topics; funded by pro-business foundations; admitted judges on a first-come-first-served basis and trained nearly half of all federal judges over its operation.

Word-embedding economics language measure: A continuous measure of how closely a judicial opinion’s vocabulary aligns with a lexicon of law-and-economics phrases, constructed using word2vec embeddings (Mikolov et al. 2013) trained on the corpus of judicial opinions; measures the semantic proximity of opinion text to the Ellickson (2000) economics lexicon in embedding space, capturing implicit and contextual use of economics reasoning rather than raw phrase counts.
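
As a rough illustration of how a proximity measure of this kind could be computed, the sketch below averages word vectors for an opinion and for a lexicon and takes their cosine similarity. The toy three-dimensional vectors, the word lists, and the function names are all hypothetical, and the paper's exact construction may differ.

```python
import numpy as np

# Toy 3-d embeddings (hypothetical values). Real word2vec vectors would be
# trained on the opinion corpus and have many more dimensions.
EMBEDDINGS = {
    "efficiency": np.array([0.9, 0.1, 0.0]),
    "deterrence": np.array([0.8, 0.2, 0.1]),
    "cost": np.array([0.7, 0.3, 0.0]),
    "mercy": np.array([0.0, 0.2, 0.9]),
    "dignity": np.array([0.1, 0.1, 0.8]),
}


def mean_vector(words):
    """Average the vectors of the words that have an embedding."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return np.mean(vectors, axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def economics_score(opinion_words, lexicon_words):
    """Similarity between an opinion's mean vector and the lexicon's mean vector."""
    return cosine_similarity(mean_vector(opinion_words), mean_vector(lexicon_words))


lexicon = ["efficiency", "deterrence"]  # stand-in for an economics lexicon
score_econ = economics_score(["cost", "deterrence"], lexicon)
score_other = economics_score(["mercy", "dignity"], lexicon)
```

In this toy setup the opinion using economics vocabulary scores much higher than the other one, even though only one of its words appears in the lexicon; this is the sense in which an embedding measure captures contextual use rather than raw phrase counts.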

Deterrence theory (Becker model): The framework, drawn from Becker (1968), taught in the Manne program’s criminal law curriculum, which holds that optimal crime deterrence requires setting the expected penalty — the economic cost of punishment times the probability of detection — high enough to outweigh the expected benefits of crime; treated in the paper as the theoretical basis for predicting harsher sentencing among post-Manne judges, and contrasted with retribution- or rehabilitation-based sentencing rationales that dominated before its diffusion.

Conservative judicial decision (economics cases): In the paper’s usage, a ruling against the liberal/pro-plaintiff position in a case involving labor or regulation, as hand-coded by the Songer-Auburn database; includes ruling against a labor agency, rejecting a regulatory claimant, or voting against antitrust enforcement; the paper finds Manne attendance shifts judges in this direction in economics cases but not in non-economics cases.

First-come-first-served oversubscription: The admission rule of the Manne program during its oversubscribed heyday (from the second cohort in 1977 through the late 1980s), under which applicants who did not secure a spot were bumped to the next year’s cohort; the authors argue this rule generates quasi-random variation in the timing of attendance among ever-attending judges, conditional on applying, providing the identifying variation for the differences-in-differences design.

Persuasion rate: A summary statistic, following DellaVigna and Gentzkow (2010), measuring the fraction of the “persuadable” population that is convinced by a treatment; used in the paper to benchmark the Manne program’s effect size against documented media persuasion interventions such as Fox News and Washington Post subscriptions.
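
A minimal sketch of the DellaVigna and Gentzkow (2010) persuasion-rate formula follows. The input values are hypothetical and are not taken from the paper; the function name and argument names are chosen here for illustration.

```python
def persuasion_rate(y_treat, y_control, e_treat, e_control, y_baseline):
    """Percent of persuadable receivers who change behavior.

    y_treat, y_control: share adopting the behavior in each group
    e_treat, e_control: share exposed to the message in each group
    y_baseline: share who would adopt the behavior without the message
    """
    exposure_gap = e_treat - e_control
    return 100.0 * (y_treat - y_control) / exposure_gap / (1.0 - y_baseline)


# Hypothetical example: full exposure in the treated group, none in the
# control group, and half the population adopting the behavior anyway.
rate = persuasion_rate(0.55, 0.50, 1.0, 0.0, 0.50)
```

With these made-up inputs the rate is about 10 percent: a 5-point shift in behavior among a population in which only half could still be persuaded.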

Identifying Preference for Early Resolution from Asset Prices

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

This paper develops a revealed-preference theory that uses asset-market data to identify whether investors have a preference for early resolution of uncertainty (PER), a property of non-expected utility preferences that is distinct from risk aversion. The central theorem shows that, under a condition called generalized risk sensitivity (GRS), the representative agent prefers early resolution if and only if claims to future stock market volatility earn a positive premium during the period in which the informativeness of upcoming macroeconomic announcements is resolved — a window the authors call the Resolution of Information Quality (ROIQ) period. Using S&P 500 index option data from 1996 to 2019, the paper identifies the ROIQ period as the five weekdays before FOMC announcements, demonstrates that the inverse slope of the implied-volatility term structure (9-day/90-day VIX ratio) significantly predicts the informativeness of upcoming announcements, and finds a statistically significant positive ROIQ premium on synthetic variance claims (beta = 1.085, t = 2.44) and on at-the-money straddles (beta = 0.428, t = 2.25). The evidence supports Epstein-Zin recursive utility with the intertemporal elasticity of substitution exceeding the reciprocal of risk aversion, and hence is consistent with the Bansal-Yaron long-run risk framework. Crucially, this identification requires no parametric calibration of the full asset pricing model.

In depth

Q1. What is preference for early resolution (PER) and why is it hard to identify?

PER means that an agent with a given distribution over future outcomes strictly prefers to learn the outcome sooner rather than later, as formalized by Kreps and Porteus (1978); under Epstein-Zin recursive utility, PER is equivalent to risk aversion exceeding the reciprocal of the IES (or IES > 1/risk aversion). In standard applied asset pricing models with constant-elasticity recursive utility, PER is intertwined with risk aversion and the IES, so that the separate role of the timing of resolution is obscured. Existing papers either test joint implications of the full calibrated model (conflating PER with other preference properties) or use thought-experiment willingness-to-pay calculations without market-data grounding. The authors’ goal is to provide a necessary and sufficient condition for PER directly from asset prices, independent of a fully specified model.
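
The Epstein-Zin condition stated above can be written as a one-line check. The sketch below is illustrative only; the function name and the parameter values are assumptions, not calibrations from the paper.

```python
def timing_preference(risk_aversion, ies):
    """Classify Epstein-Zin preferences by comparing risk aversion with 1/IES."""
    threshold = 1.0 / ies
    if risk_aversion > threshold:
        return "early"        # preference for early resolution (PER)
    if risk_aversion < threshold:
        return "late"         # preference for late resolution
    return "indifferent"      # risk aversion equals 1/IES: timing does not matter


# Illustrative parameter values (hypothetical).
case_per = timing_preference(risk_aversion=10.0, ies=1.5)
case_indifferent = timing_preference(risk_aversion=2.0, ies=0.5)
case_late = timing_preference(risk_aversion=2.0, ies=0.25)
```

The middle case, where risk aversion equals the reciprocal of the IES, is the standard expected-utility benchmark in which the agent is indifferent to the timing of resolution.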

Q2. What is the role of Generalized Risk Sensitivity (GRS) in the identification theorem?

GRS — the condition that the certainty-equivalent functional I is increasing in second-order stochastic dominance — provides the bridge between the unobservable ranking of utility levels across states and the observable ranking of marginal utilities (stochastic discount factors) across those states. The authors prove that under GRS (Theorem 1), the vector of partial derivatives of I with respect to continuation utility is strictly negatively comonotone with the level of continuation utility: higher utility states have lower marginal utility. This inversion is what allows asset prices to reveal the ordering of utility levels. GRS itself is empirically supported by the well-documented fact that assets earn positive announcement premia around scheduled macroeconomic releases (Savor and Wilson, 2013).

Q3. How does the main theorem (Theorem 2) identify PER from a single asset class?

Theorem 2 establishes that, under strict GRS, the premium earned by any asset comonotone with the informativeness of upcoming macroeconomic announcements during the ROIQ period is strictly positive if and only if the agent has PER; a negative ROIQ premium would indicate preference for late resolution. The intuition is that if the agent prefers early resolution, she assigns higher continuation utility to the early-resolution state (0E) than to the late-resolution state (0L); under strict GRS, higher continuation utility maps to lower marginal utility, meaning assets paying off more in the early-resolution state are negatively correlated with the SDF and therefore carry a positive risk premium. Claims to stock market return variance serve as the test asset because expected variance is high before informative announcements (early resolution) and low before uninformative ones (late resolution).
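
The sign logic of this argument can be checked in a two-state example. The sketch below assumes equal state probabilities and made-up values for the stochastic discount factor (SDF) and the payoff; none of the numbers come from the paper.

```python
def expected_excess_return(probs, sdf, payoff):
    """Expected return of an asset minus the risk-free return.

    The asset is priced as E[m * x]; the gross risk-free return is 1 / E[m].
    """
    price = sum(p * m * x for p, m, x in zip(probs, sdf, payoff))
    risk_free = 1.0 / sum(p * m for p, m in zip(probs, sdf))
    expected_payoff = sum(p * x for p, x in zip(probs, payoff))
    return expected_payoff / price - risk_free


probs = [0.5, 0.5]    # states: (early resolution, late resolution)
payoff = [1.2, 0.8]   # variance-like claim pays more in the early state

# PER plus GRS: the early state has higher utility, hence lower marginal utility.
premium_if_per = expected_excess_return(probs, sdf=[0.9, 1.0], payoff=payoff)

# Preference for late resolution reverses the SDF ordering.
premium_if_late = expected_excess_return(probs, sdf=[1.0, 0.9], payoff=payoff)
```

The premium is positive in the first case and negative in the second, which mirrors the if-and-only-if statement of the theorem in this simplified setting.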

Q4. How do the authors operationalize the ROIQ period empirically?

The ROIQ period is identified as the five weekdays before FOMC announcements, during which market attention to the Fed (measured by RavenPack Fed-related news intensity) is significantly positively correlated with the change in the inverse slope of the implied-volatility term structure (coefficient = 1.076, t = 4.09), while no such correlation exists in the windows 6–10 days before or after the announcement. This correlation arises because, during those five days, investors regularly update their expectations about whether the upcoming FOMC statement will be informative; more expected informativeness raises the demand for short-dated options (driving up the 9-day VIX relative to the 90-day VIX) and simultaneously raises Fed-related news coverage. Outside the ROIQ window, the two series are uncorrelated (coefficient = −0.242, t = −1.13 unconditionally), confirming that the window is the correct testing period.
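
To make the window definition concrete, the sketch below lists the five weekdays preceding a given announcement date. It is a simplification that ignores market holidays, and the example date is used purely for illustration; the function name is an assumption.

```python
from datetime import date, timedelta


def roiq_window(announcement_day, n_days=5):
    """Return the n_days weekdays immediately before the announcement day.

    Simplifying assumption: exchange holidays are ignored, so only
    Saturdays and Sundays are skipped.
    """
    days = []
    current = announcement_day
    while len(days) < n_days:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
    return sorted(days)


window = roiq_window(date(2019, 3, 20))  # example date, a Wednesday
```

For a Wednesday announcement the window runs from the previous Wednesday through the Tuesday just before the announcement, skipping the weekend in between.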

Q5. What is the empirical evidence for a positive ROIQ premium, and how is it constructed?

Synthetic variance claims constructed as option portfolios following Bakshi, Kapadia, and Madan (2003) earn a ROIQ premium (coefficient beta in the panel regression) of 1.085 percentage points per day (t = 2.44) above their average daily return; at-the-money straddles earn 0.428 pp/day (t = 2.25), both significantly positive. The panel regression controls for maturity fixed effects (11 dummies for weeks to expiration), FOMC-day effects, and day-of-week effects. Crucially, the market itself earns approximately 8 basis points less than average during the ROIQ period, and the market loading on variance claims does not increase during the ROIQ window (Table 5), ruling out an interpretation in which the premium simply reflects a higher market beta at announcement times.
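
A stripped-down version of such a regression, with only a constant and a ROIQ dummy, can be sketched as follows. The return series is invented for illustration, and the maturity, FOMC-day, and day-of-week controls described above are omitted, so this is not the paper's specification.

```python
import numpy as np

# Hypothetical daily returns (percent) on a variance claim; not data from the paper.
returns = np.array([0.0, 0.2, -0.2, 0.1, -0.1, 1.0, 1.2])
is_roiq = np.array([0, 0, 0, 0, 0, 1, 1])  # 1 on ROIQ days, 0 otherwise

# Regress returns on a constant and the ROIQ dummy by ordinary least squares.
X = np.column_stack([np.ones(len(returns)), is_roiq])
(alpha, beta), *_ = np.linalg.lstsq(X, returns, rcond=None)

# With a single dummy regressor, beta equals the difference in mean returns.
mean_gap = returns[is_roiq == 1].mean() - returns[is_roiq == 0].mean()
```

In this simple case the dummy coefficient is just the gap between average returns on ROIQ days and on other days, which is how the ROIQ premium is read off the regression.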

Q6. How does the paper rule out alternative explanations for the ROIQ premium?

A placebo test using VIX futures — which pay the forward-looking VIX level (expected volatility over the next 30 days after expiry) rather than realized variance over the announcement — shows no significant ROIQ premium, confirming that the effect operates specifically through exposure to volatility during the announcement itself rather than through general volatility-level exposure. The paper also shows that controlling for the Fama-French three factors does not appreciably change the ROIQ coefficient. An additional test using individual stock options (5 weekdays before earnings announcements) also yields positive ROIQ premiums, extending the result beyond FOMC to firm-level announcements.

Q7. What does the finding imply for macroeconomic preference modeling and policy?

The finding of a positive ROIQ premium, and hence of PER, is obtained without assuming any particular utility functional form. It confirms the central calibration assumption of Bansal-Yaron long-run risk models (risk aversion > 1/IES) and provides the market-based evidence that Epstein, Farhi, and Strzalecki (2014) stated was unavailable. The approach matters for macro modeling because it establishes PER from minimal assumptions (GRS and monotonicity of preferences): the result holds across deviations from expected utility, including robust control, smooth ambiguity, and disappointment aversion preferences, as long as they satisfy GRS. This makes it a broadly applicable empirical anchor for calibrating non-expected utility models.

Q8. What are the identification limitations and scope conditions?

The identification relies on three maintained conditions: (i) GRS holds for the representative agent, (ii) FOMC announcements genuinely resolve macro uncertainty (so that the ROIQ window is correctly specified), and (iii) the pre-announcement period does not contain price-relevant news (so that market return premia during the ROIQ period are not confounded with the news content of the announcement itself). The empirical support for condition (iii) comes from the fact that the market does not earn positive abnormal returns during the ROIQ period (returns are below average, rather than above average as the announcement drift literature would predict), and from the lack of a ROIQ premium for VIX futures, whose payoff reflects expected volatility after expiry rather than realized variance over the announcement. The framework abstracts from heterogeneous agents and assumes a representative-agent economy, which is standard but may not fully capture distributional effects.

Key Concepts

preference for early resolution of uncertainty (PER) : the property of a dynamic preference that the agent strictly prefers to learn the realization of a future uncertain outcome earlier rather than later, holding the distribution unchanged; equivalent in Epstein-Zin recursive utility to risk aversion exceeding the reciprocal of the IES.

generalized risk sensitivity (GRS) : the condition that the certainty-equivalent functional I is strictly increasing in second-order stochastic dominance; equivalent to the existence of strictly positive announcement premia for all assets comonotone with continuation utility; the paper’s key maintained assumption connecting utility levels to asset prices.

resolution of information quality (ROIQ) period : the period during which investors learn whether the upcoming macroeconomic announcement will be informative; empirically identified as the five weekdays before FOMC meetings, during which Fed-related news intensity co-moves with the inverse slope of the VIX term structure.

ROIQ premium : the excess return earned by a claim to market volatility (synthetic variance claim or straddle) during the ROIQ period over its average daily return on non-ROIQ days; the paper’s operational test for PER; estimated at 1.085 percentage points per day for variance claims.

inverse slope of the implied-volatility term structure : the ratio IV9/IV90 (9-day CBOE VIX divided by 90-day CBOE VIX); the paper’s market-based predictor of FOMC announcement informativeness; a higher ratio reflects investor anticipation of large announcement-day volatility relative to long-run baseline uncertainty.

Inequality and asset prices during Sudden Stops

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

Research Question

This paper studies the cross-sectional dimension of Fisher’s (1933) debt-deflation mechanism as it operates during Sudden Stop crises — episodes characterized by large, abrupt reversals in the current account. The central question is how the distribution of wealth and leverage across households shapes the macroeconomic dynamics of financial crises, and whether greater inequality makes Sudden Stops more or less severe.

Data and Methodology

The empirical analysis uses panel microdata from the Mexican Family Life Survey (MxFLS) across three waves (2002, 2005, 2009), covering a representative sample of approximately 8,400 households in 150 localities. The 2009 wave captures a Sudden Stop in which Mexico’s current account reversed by 1.5 percentage points of GDP, per capita consumption fell 7 percent, and housing prices fell 4 percent below pre-crisis trend by 2010. Households are sorted by net wealth and leverage ratio — defined as total debt divided by total assets — to identify how balance sheet heterogeneity drove differentiated asset-holding dynamics during the crisis.

The theoretical framework is a Bewley small open economy model with heterogeneous agents, incomplete markets, aggregate risk (simultaneous shocks to the international interest rate and total factor productivity), and an occasionally-binding loan-to-value (LtV) collateral constraint. Households hold two assets: a one-period risk-free international bond and a risky domestic collateralizable asset (land). Households face persistent non-insurable idiosyncratic risk in both labor income and dividend returns; the latter creates an endogenous risk-wealth tradeoff, since larger asset holdings raise future income volatility while simultaneously expanding debt capacity. The model is calibrated to Mexican data — matching the leverage ratio distribution in 2005 (10 percent of households financially constrained) and a net foreign asset position of −35 percent of GDP — and solved using the FiPIt algorithm combined with the Krusell-Smith stochastic-simulation approach.

Main Findings with Quantitative Magnitudes

The empirical evidence from Mexico’s 2009 crisis reveals sharply divergent asset dynamics across the household balance sheet distribution. Wealthy households (top net-wealth decile) with low leverage increased their real estate holdings by 61.4 percent (annualized, relative to the average) between 2005 and 2009, consistent with a crisis-dampening effect whereby unconstrained agents absorb fire-sales. Wealthy households in the top decile of both net wealth and leverage ratio — financially constrained — reduced their real estate holdings by 36.6 percent, consistent with a crisis-amplifying effect. Cross-country descriptive evidence shows that Sudden Stop episodes are associated with significantly larger contractions in consumption and GDP in more unequal economies (Gini index, World Bank data, 58 Sudden Stop episodes identified by Bianchi and Mendoza 2020).

In the calibrated model, the crisis-dampening effect dominates relative to the representative agent baseline: the heterogeneous-agents economy produces a smaller decline in asset prices (−0.99 percent vs. −2.57 percent in the representative agent model during crisis episodes), but a larger and more persistent consumption decline (−2.97 percent vs. −1.17 percent) and larger current account reversals (1.56 percentage points vs. 0.09 percentage points). The wealth Gini index generated by the calibrated model is 0.61, close to the untargeted 2005 Mexican estimate of 0.73. The aggregate equity premium generated is 5.1 percent, close to the data estimate of 6.5 percent; of this, 55.3 percent is attributable to the risk component, 35.9 percent to the persistence effect, and 8.6 percent to the constraint effect.

When comparing the baseline emerging economy (wealth Gini 0.61) to an advanced economy calibration in which idiosyncratic dividend risk is set to zero (wealth Gini 0.29), crises are milder and less frequent in the more equal economy: consumption drops 1.0 percentage point less, asset prices drop 0.2 percentage points less, and the net foreign asset position is 6.2 percentage points of GDP higher (i.e., less foreign debt). The implied slope coefficient from the model relating consumption declines during Sudden Stops to the income Gini (−11.1) closely matches the cross-country empirical estimate (−11.5). An economy with an income Gini index 0.10 points lower experiences a decline in consumption 1.1 percentage points smaller during a crisis.
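
The last figure follows directly from the slope coefficient, as the short calculation below shows. The variable and function names are chosen here for illustration.

```python
MODEL_SLOPE = -11.1  # model-implied slope of consumption change on income Gini
DATA_SLOPE = -11.5   # cross-country empirical estimate


def consumption_change_difference(gini_difference, slope):
    """Difference in the crisis consumption change (percentage points)
    implied by a difference in the income Gini index."""
    return slope * gini_difference


# A Gini index that is 0.10 lower implies a consumption change that is
# about 1.1 percentage points less negative, i.e. a smaller decline.
model_effect = consumption_change_difference(-0.10, MODEL_SLOPE)
data_effect = consumption_change_difference(-0.10, DATA_SLOPE)
```

Using the empirical slope instead gives a value of about 1.15 percentage points, so the model and the data imply nearly the same sensitivity.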

An impulse response to a two-standard-deviation aggregate shock confirms that, conditional on starting from a perfectly equal (symmetric) initial distribution via complete redistribution, declines in consumption and asset prices are approximately 0.5 percentage points smaller than in the baseline economy with the stationary ergodic distribution as initial condition.

Redistributive Dividend Tax

A flat 30 percent dividend income tax, redistributed as lump-sum transfers, reduces Sudden Stop severity by lowering average asset prices by 9.6 percent relative to the benchmark, which shrinks effective debt capacity and limits bond adjustment during crises. The average current account reversal during a crisis falls by 0.54 percentage points, and aggregate consumption falls by 0.63 percentage points less than in the benchmark. Crisis probability under the benchmark threshold falls from 4.3 to 1.83 percent (less than half its benchmark level). Average welfare rises by the equivalent of 2.8 percent of consumption. However, 26.7 percent of households (those more leveraged and three times wealthier than the beneficiaries) experience welfare losses averaging 6.8 percent of consumption, due to asset price declines and tighter financial conditions.

Overall Conclusion

Both the empirical evidence and the model suggest that economies with lower inequality, whether due to reduced idiosyncratic risk (as in advanced versus emerging economy calibrations) or wealth redistribution across agents with identical idiosyncratic risk processes, experience less severe Sudden Stop crises.

In depth

Q1. What are the two cross-sectional channels through which household heterogeneity affects the debt-deflation mechanism, and in which direction do they move asset prices?

A1: The dampening effect operates when unconstrained wealthy households — who hold diversified portfolios and have precautionary savings in bonds — purchase fire-sold assets from constrained households, relieving downward pressure on asset prices. The amplifying effect operates when highly leveraged households, once pushed into binding credit constraints by declining asset prices, must further liquidate asset positions, deepening the price decline and tightening the collateral constraint for additional households via the pecuniary externality. These two effects move in opposite directions, so the net effect of inequality on crisis severity is theoretically ambiguous and depends on calibration.

Q2. What specific empirical evidence from Mexico’s 2009 Sudden Stop supports both cross-sectional effects?

A2: Using MxFLS microdata, Table 1 in the paper shows that wealthy households (top net-wealth decile) with low leverage (deciles I–VII of leverage) increased their real estate holdings by 61.4 percent between 2005 and 2009 — evidence for the dampening effect. Wealthy households in the top decile of both net wealth and leverage reduced their real estate holdings by 36.6 percent — evidence for the amplifying effect. Between 2005 and 2009, the share of financially constrained households (leverage ratio above 0.168, the 90th percentile) increased by 1.7 percentage points, while the share of financial savers dropped by 5.0 percentage points. The pre-crisis period (2002–2005) shows no comparable divergence, ruling out a mechanical mean-reversion explanation.

Q3. What is the risk-wealth tradeoff, and why is it central to generating a realistic wealth and leverage distribution in the model?

A3: The risk-wealth tradeoff arises because idiosyncratic dividend risk is endogenous to asset holdings: holding more risky domestic assets increases debt capacity (relaxing borrowing constraints) but also raises future income volatility, since the variance of household flow income is convex in asset holdings. For households earning high dividend realizations, there exists a threshold beyond which precautionary savings motives — driven by rising income risk — dominate the benefit from expanded debt capacity, causing these households to begin accumulating bonds and eventually become net savers. This mechanism generates an empirically plausible distribution in which some households are financially constrained at the LtV limit, others are unconstrained borrowers, and a fraction are net savers holding both domestic assets and positive international bonds.
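
A stylized calculation illustrates why the tradeoff eventually tips toward saving. It assumes, purely for illustration, that flow income is labor income plus a dividend yield times asset holdings, with independent shocks; the standard deviations and the normalized asset price below are hypothetical, and only the collateral fraction 0.168 comes from the text.

```python
SIGMA_LABOR = 0.10     # std. dev. of labor income (hypothetical)
SIGMA_DIVIDEND = 0.05  # std. dev. of the dividend yield (hypothetical)
KAPPA = 0.168          # collateral fraction reported in the calibration
ASSET_PRICE = 1.0      # asset price, normalized (hypothetical)


def income_variance(assets):
    """Variance of flow income under the independence assumption:
    grows with the square of asset holdings."""
    return SIGMA_LABOR ** 2 + (assets * SIGMA_DIVIDEND) ** 2


def debt_capacity(assets):
    """Borrowing limit under the LtV constraint: linear in asset holdings."""
    return KAPPA * ASSET_PRICE * assets


# Each row: (asset holdings, debt capacity, income variance).
rows = [(a, debt_capacity(a), income_variance(a)) for a in (1.0, 2.0, 4.0)]
```

Doubling asset holdings doubles debt capacity but quadruples the dividend component of income variance, which is the sense in which the benefit is linear while the risk cost is convex.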

Q4. How does the model calibration match the stationary distribution of Mexican households?

A4: Three parameters governing the dividend income risk process (average dividend yield, autocorrelation, and standard deviation) are jointly calibrated to match three statistics from the MxFLS 2005 distribution of households: 14.1 percent financial savers (data: 14.2 percent), 75.9 percent unconstrained indebted (data: 75.8 percent), and 10.0 percent financially constrained (data: 10.0 percent). The collateral fraction κ = 0.168 is set equal to the 90th percentile of the leverage ratio distribution in 2005, reflecting that the average delinquency rate for commercial bank household credit was 10.3 percent between 2004 and 2008. The discount factor β = 0.90 matches the average net foreign asset position relative to GDP of −35 percent for Mexico.

Q5. How does the heterogeneous-agents model compare to the representative agent model in terms of crisis dynamics?

A5: In the heterogeneous-agents benchmark, the average current account reversal during a Sudden Stop is 1.56 percentage points, consumption falls 2.97 percent, and asset prices fall 0.99 percent below the steady state. In the representative agent model with the same average leverage ratio (κ = 0.12), the current account reversal is only 0.09 percentage points, consumption falls 1.17 percent, and asset prices fall 2.57 percent. The crisis-dampening effect in the heterogeneous economy produces a smaller asset price drop but a larger consumption decline, because leveraged households must make larger consumption adjustments when hit by negative idiosyncratic shocks in addition to the aggregate shock. Impulse response analysis shows the heterogeneous-agents economy generates current account reversals 1.9 percentage points larger than in the representative agent model, and consumption responses approximately four times larger.

Q6. What is the mechanism by which comparing emerging and advanced economy calibrations shows that lower inequality leads to less severe crises?

A6: The advanced economy calibration sets idiosyncratic dividend risk to zero, eliminating the risk-wealth tradeoff and resulting in a wealth Gini of 0.29 (compared to 0.61 in the baseline). Without dividend risk, households have weaker incentives to accumulate assets as a precautionary buffer against income volatility, so they hold less debt on average, and the long-run net foreign asset position relative to GDP is 6.2 percentage points higher (i.e., less debt). During a Sudden Stop under this calibration, consumption drops 1.0 percentage point less, asset prices drop 0.2 percentage points less, and the economy is less frequently in crisis. The model-implied slope of consumption decline on income Gini is −11.1, matching the cross-country empirical estimate of −11.5.

Q7. What does the impulse response analysis reveal about the effect of wealth redistribution on crisis severity, holding idiosyncratic risk constant?

A7: The impulse response analysis compares the baseline heterogeneous-agents economy (with the stationary ergodic distribution as the initial condition) against a version in which all households are given a perfectly symmetric initial distribution — identical bond and asset holdings equal to long-run averages — while retaining the same idiosyncratic risk processes. The symmetric initial condition corresponds to a complete redistribution of wealth without changing fundamentals. In the first three periods after a two-standard-deviation aggregate shock, the symmetric economy shows declines in consumption and asset prices approximately 0.5 percentage points smaller than the baseline. This demonstrates that even holding the risk environment constant, reducing wealth dispersion mitigates crisis severity.

Q8. How does the equity premium decomposition work in the heterogeneous-agents model, and which components are quantitatively most important?

A8: The aggregate equity premium is decomposed into five components (Equation 7 in the paper): a constraint effect (positive, increasing in the measure and intensity of constrained households), a risk effect (positive, from the negative covariance between the individual stochastic discount factor and individual equity return, weighted more heavily on constrained households), a persistence effect (positive, from the covariance between idiosyncratic dividend return and asset holdings, since high-dividend households accumulate more assets), a trading cost effect (approximately zero in aggregate), and a no-short-sales effect (negative, since households at the short-sales constraint add to asset demand without increasing the marginal benefit of saving). In the calibrated model, the equity premium is 5.1 percent; the risk effect accounts for 55.3 percent, the persistence effect for 35.9 percent, and the constraint effect for 8.6 percent.
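
The reported shares can be converted into percentage-point contributions with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
EQUITY_PREMIUM = 5.1  # percent, from the calibrated model

# Shares of the premium attributed to each component, in percent.
SHARES = {"risk": 55.3, "persistence": 35.9, "constraint": 8.6}

# Contribution of each component in percentage points of the premium.
contributions = {
    name: EQUITY_PREMIUM * share / 100.0 for name, share in SHARES.items()
}

# Share not attributed to the three components listed above.
residual_share = 100.0 - sum(SHARES.values())
```

The three components account for roughly 2.8, 1.8, and 0.4 percentage points respectively. The residual of about 0.2 percent of the premium is what remains for the trading cost and no-short-sales effects, or for rounding; the summary does not say which.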

Q9. What is the mechanism by which the dividend income tax reduces crisis severity?

A9: A flat 30 percent dividend income tax lowers average after-tax dividend returns, reducing households’ incentive for precautionary accumulation of domestic assets and weakening the risk-wealth tradeoff. As a result, households demand fewer domestic assets and fewer international bonds in normal times. The reduced demand for the domestic asset lowers the equilibrium asset price by 9.6 percent on average relative to the benchmark, which — through the pecuniary externality embedded in the LtV constraint — tightens borrowing constraints, raising the share of financially constrained households from 5.6 to 7.8 percent. Nevertheless, the reduction in equilibrium debt positions means that during a crisis, bond adjustments and consumption drops are more limited: the average current account reversal during crises falls by 0.54 percentage points, and aggregate consumption falls by 0.63 percentage points less than in the benchmark. Crisis probability under the benchmark threshold falls from 4.3 to 1.83 percent.

Q10. Who benefits and who loses from the dividend income tax, and by how much?

A10: Among the simulated population, 73.3 percent of households experience welfare gains averaging 6.2 percent of consumption in consumption-equivalent terms, while 26.7 percent experience welfare losses averaging 6.8 percent of consumption. The average welfare gain across all households is equivalent to 2.8 percent of consumption. The households experiencing losses are more leveraged and three times wealthier on average than those that benefit; the policy reduces their net worth through lower asset prices and tightens their financial constraints. The welfare analysis accounts for the transition to the new tax policy.
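
As a consistency check on these figures, the population-weighted average of the group gains and losses can be computed directly. This treats the overall average as a simple weighted mean of the two group averages, which is an assumption about how the paper aggregates.

```python
GAIN_SHARE, AVG_GAIN = 0.733, 6.2  # households that gain; average gain (percent)
LOSS_SHARE, AVG_LOSS = 0.267, 6.8  # households that lose; average loss (percent)

# Weighted mean of consumption-equivalent welfare changes across both groups.
implied_average = GAIN_SHARE * AVG_GAIN - LOSS_SHARE * AVG_LOSS
```

The result is about 2.7 percent, close to the reported 2.8 percent; a gap of this size is within what rounding of the group shares and averages could produce.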

Q11. Why does the representative agent model miss the cross-sectional effects that are central to the paper’s mechanism?

A11: In the representative agent model, all households behave identically and collectively want either to buy or to sell assets, but since there is no one to trade with domestically, actual asset holdings remain unchanged by cross-sectional forces. Additionally, the average debt constraint multiplier in the representative agent model equals the single household’s multiplier, whereas in the heterogeneous model a small fraction of highly constrained households can have much larger individual multipliers, amplifying the aggregate debt-deflation effect. In the calibrated stationary model, the 10 percent of households that are constrained own 7.7 percent of assets and have a consumption share of 9.0 percent, while the 75.9 percent that are unconstrained and indebted hold 88.1 percent of assets with a consumption share of 78.1 percent, distributional features invisible to a representative agent.

Q12. What robustness does the model validation provide for the quantitative results?

A12: The model reproduces the untargeted net wealth and asset distributions across deciles from MxFLS 2005 closely, with slight underestimation at the top deciles; the exception is the bottom decile of debt (where the model cannot generate households with negative net wealth since default is not modeled). The aggregate law of motion for the Krusell-Smith algorithm fits with R² = 0.99 for bond position and R² = 0.93 for asset price, and Den Haan (2010) accuracy checks show maximum forecast errors of 2.8 (current account) and 1.1 (asset price). The model replicates the untargeted magnitude of current account reversals observed in Mexican Sudden Stops. The wealth Gini of 0.61 is close to the untargeted 2005 Mexican estimate of 0.73, and the equity premium of 5.1 percent is close to the data estimate of 6.5 percent.

Key Concepts

Sudden Stop: An episode characterized by a large, abrupt reversal in the current account, typically triggered by a sudden halt in foreign capital inflows. In this paper, Sudden Stops are modeled as endogenous crises that arise from the interaction of a negative aggregate shock (simultaneous rise in the international interest rate and decline in total factor productivity) with an occasionally-binding LtV collateral constraint. The paper follows Bianchi and Mendoza (2020) in identifying 58 such episodes over the past four decades.

Debt-deflation mechanism (cross-sectional dimension): The paper studies Fisher’s (1933) debt-deflation spiral — in which declining asset prices tighten credit constraints, forcing further asset sales, further depressing prices — through the lens of household heterogeneity. The cross-sectional dimension refers to the fact that different households (wealthy unconstrained vs. highly leveraged constrained) respond differently to price declines, generating two opposing effects: dampening (wealthy buyers absorb fire-sales) and amplifying (constrained households fire-sell additional assets).

Risk-wealth tradeoff: A novel feature of the model in which holding more risky domestic assets simultaneously (a) expands debt capacity by relaxing the LtV constraint and (b) increases future income volatility through higher exposure to idiosyncratic dividend risk, since the variance of household flow income is convex in asset holdings. This tradeoff generates the endogenous transition of households from indebted to net-saver status and gives rise to the empirically plausible distribution of savers, unconstrained borrowers, and constrained households.
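
The convexity claim can be illustrated in one line under a simplifying assumption that is not taken from the paper: a household holding $a$ units of the asset receives idiosyncratic dividend income $a\,d$, with $\operatorname{Var}(d) = \sigma_d^{2}$ and no other stochastic income. The symbols $a$, $d$, and $\sigma_d$ are illustrative.

```latex
\operatorname{Var}(a\,d) = a^{2}\sigma_d^{2},
\qquad
\frac{\partial^{2}}{\partial a^{2}}\operatorname{Var}(a\,d) = 2\sigma_d^{2} > 0 .
```

In this stylized case, doubling asset holdings quadruples the variance of dividend income, while the debt capacity granted by the LtV constraint grows only linearly in holdings at a given asset price.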

Loan-to-value (LtV) collateral constraint: A borrowing limit requiring that households’ international debt (negative bond holdings) cannot exceed a fixed fraction κ of the market value of their domestic asset holdings. In the paper, κ = 0.168 (the 90th percentile of the Mexican leverage ratio distribution in 2005). The constraint is occasionally binding and generates a pecuniary externality: households fail to internalize that their individual portfolio choices affect the aggregate asset price, which in turn determines the borrowing limits of all other households.
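
A minimal sketch of the constraint as described above, treating debt as negative bond holdings and comparing it with κ times the market value of assets. Only κ = 0.168 comes from the text; the function names, argument names, and the example numbers are illustrative assumptions, not the paper's notation or calibration.

```python
KAPPA = 0.168  # LtV limit reported in the text


def borrowing_limit(asset_price: float, asset_holdings: float,
                    kappa: float = KAPPA) -> float:
    """Maximum debt allowed: a fraction kappa of the market value of assets."""
    return kappa * asset_price * asset_holdings


def is_constrained(bonds: float, asset_price: float, asset_holdings: float,
                   kappa: float = KAPPA) -> bool:
    """True when debt (negative bond holdings) reaches or exceeds the limit."""
    debt = max(-bonds, 0.0)
    return debt >= borrowing_limit(asset_price, asset_holdings, kappa)


# Hypothetical household: debt of 1.0, holding 10 units of the asset.
# A fall in the asset price from 1.0 to 0.5 shrinks the limit and makes
# the same debt position binding, which is the channel behind fire-sales.
before = is_constrained(bonds=-1.0, asset_price=1.0, asset_holdings=10.0)
after = is_constrained(bonds=-1.0, asset_price=0.5, asset_holdings=10.0)
print(before, after)
```

Because the limit depends on the market price, one household's sales lower the limit faced by every other household, which is the pecuniary externality the entry refers to.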

Pecuniary externality: The externality arising from the LtV constraint in which each household’s choice of asset holdings affects the equilibrium asset price, thereby changing the borrowing limits of all households simultaneously. This externality drives the debt-deflation spiral and is the source of Sudden Stop crises in the model: no single household internalizes the aggregate impact of its fire-sales on credit conditions.

Fire-sale: In the context of this paper, the forced liquidation of domestic asset holdings by financially constrained households during a crisis. Fire-sales are triggered when the LtV constraint becomes binding, forcing households to sell assets to reduce debt; the resulting price decline tightens the constraint further, producing additional fire-sales. The paper documents that, during Mexico’s 2009 Sudden Stop, wealthy constrained households (top decile of both net wealth and leverage) reduced real estate holdings by 36.6 percent, while wealthy unconstrained households increased holdings by 61.4 percent.

Dampening and amplifying effects: Two opposing cross-sectional effects on asset prices during a crisis. The dampening effect: unconstrained wealthy households purchase depressed assets fire-sold by constrained households, relieving downward pressure on prices and weakening the debt-deflation spiral. The amplifying effect: highly leveraged households that are pushed into binding constraints by falling prices must also fire-sell assets, further depressing prices and tightening financial conditions. The net impact on crisis severity depends on which effect dominates; the paper shows, both empirically and quantitatively, that this depends on the degree of inequality.

Equity premium decomposition: A decomposition derived in the paper (Equation 7) that expresses the aggregate excess return on the risky domestic asset as the sum of five components: a constraint effect (positive, from the measure and intensity of binding LtV constraints), a risk effect (positive, from the covariance of individual stochastic discount factors with individual equity returns), a persistence effect (positive, from the covariance of idiosyncratic dividend returns with asset holdings due to return persistence), a trading cost effect (approximately zero in aggregate), and a no-short-sales effect (negative). In the calibrated model, the risk and persistence effects account for 91 percent of the 5.1 percent equity premium.

Insuring Peace: Index-Based Livestock Insurance, Droughts, and Conflict

Mon, 01 Jan 0001 00:00:00 +0000

This paper provides quasi-experimental evidence that Index-Based Livestock Insurance (IBLI) — a remote-sensing-triggered, automated payout scheme for pastoralists — substantially reduces drought-induced conflict in Kenya over the 2001–2020 period.

The research question is whether a market-based financial instrument can mitigate the causal chain running from drought shocks to violent conflict between nomadic pastoralists and sedentary farmers and other land users. The authors motivate the study by documenting that droughts force pastoralists out of their traditional grazing grounds and into mixed-land-use areas (farms, ranches, urban settlements, nature reserves), where miscoordination with other land users escalates into violence. A case study of the Samburu-Laikipia-Isiolo-Meru region in central Kenya — drawing on georeferenced survey data from Lengoiboni et al. (2010) and ACLED conflict events — validates this spatial mechanism: during droughts, roughly 60–90% of non-pastoral land users report encounters with pastoralists, and conflicts accumulate precisely where drought migration routes cross into non-pastoral land.

The empirical design combines two sources of variation: (1) plausibly exogenous changes in rainfall deficits at the 0.1 × 0.1-degree grid-cell level (roughly 10 × 10 km), derived from NASA GPM satellite data; and (2) the staggered, five-wave rollout of IBLI across 146 insurance districts in Kenya from 2010 onward, which the authors argue was driven primarily by technical challenges rather than pre-existing conflict or drought patterns. The unit of observation is 94,300 cell-periods. Because conflicts due to pastoralist drought migration occur in the neighborhood of affected areas rather than within them, both drought and IBLI coverage are measured as inverse-distance-weighted averages over surrounding cells. The estimating equation is a linear probability model with cell and period fixed effects, interacting neighborhood rainfall deficit with neighborhood IBLI coverage; the coefficient on this interaction term (delta3) is the parameter of interest.
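
The specification described above can be written schematically as follows. This is a sketch: only the roles of delta1 (rainfall deficit) and delta3 (interaction) are stated in the summary, while the label delta2 for the coverage term, the variable names, and the omission of any further controls are assumptions.

```latex
\text{Conflict}_{c,t}
  = \delta_1 \,\text{Deficit}^{N}_{c,t}
  + \delta_2 \,\text{IBLI}^{N}_{c,t}
  + \delta_3 \,\bigl(\text{Deficit}^{N}_{c,t} \times \text{IBLI}^{N}_{c,t}\bigr)
  + \alpha_c + \gamma_t + \varepsilon_{c,t}
```

Here $c$ indexes grid cells and $t$ periods, the superscript $N$ marks inverse-distance-weighted neighborhood averages, and $\alpha_c$ and $\gamma_t$ are the cell and period fixed effects. A negative $\delta_3$ means that higher neighborhood coverage weakens the link between drought and conflict.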

The main finding is that a one-standard-deviation increase in neighborhood IBLI coverage reduces the semi-elasticity of conflict probability with respect to the neighborhood rainfall deficit by approximately 23%. In absolute terms, a one-percentage-point increase in the rainfall deficit raises the probability of conflict by 6.92 percentage points at average IBLI coverage; with one additional standard deviation of neighborhood IBLI, that same deficit raises conflict probability by only 5.34 percentage points, a reduction of 1.58 percentage points against a baseline conflict probability of roughly 2.5%.

Scope conditions: the effect is estimated for Kenya specifically, over a pastoralist-heavy population of approximately 8.8 million out of 53 million Kenyans, during 2001–2020. The conflict-mitigating effect is approximately four times larger in mixed-land-use areas (nine times when rollout-cluster-times-period fixed effects are included), consistent with the theoretical expectation that IBLI matters most where pastoralists are most likely to encounter other land users during drought migration.

Two mechanisms are identified. First, IBLI reduces migratory pressure: when pastoral homelands have IBLI coverage, the distance between the ethnic homeland centroid and conflict events involving that group decreases, indicating reduced drought migration. Second, IBLI smooths incomes — corroborated with Afrobarometer geo-coded data — raising the opportunity cost of fighting. An instrumental-variable specification finds that actual IBLI payouts in the neighborhood reduce conflict probability by approximately 150% relative to the baseline risk.

A cost-effectiveness analysis finds that even using conservative World Health Organization or World Bank estimates of the value of statistical life, IBLI delivers fatality savings of between 10 and 22 cents per dollar spent on government subsidies for the program, making it a cost-effective complement to political and institutional conflict-mitigation approaches.

Q: What is the core causal mechanism linking droughts to conflict that IBLI interrupts?

A: Droughts deplete forage in pastoralists’ traditional grazing grounds, forcing them to migrate into mixed-land-use areas — farms, ranches, urban settlements, and nature reserves — where encounters with other land users are more likely to escalate into violence. Without insurance, pastoralists hold excess livestock as precautionary savings, amplifying the extent of necessary migration during dry periods. IBLI payouts allow pastoralists to purchase forage locally, reducing migration distance and intensity, and also smooth income, raising the opportunity cost of engaging in violence.

Q: How does IBLI work technically, and why does it overcome problems of traditional livestock insurance?

A: IBLI uses satellite remote sensing to calculate whether a district-specific drought threshold has been crossed; if so, automated payments are triggered immediately without requiring direct loss assessment or field inspections. This design eliminates moral hazard and adverse selection problems inherent in traditional indemnity insurance, reduces monitoring costs, and enables fast delivery via mobile payment platforms such as M-Pesa even to remote households. The Kenyan government rebranded the program as the Kenya Livestock Insurance Program (KLIP) in 2015 and fully subsidizes coverage for up to five tropical livestock units per household.

Q: What is the magnitude of the main conflict-mitigation result?

A: A one-standard-deviation increase in neighborhood IBLI coverage reduces the semi-elasticity of conflict probability with respect to the neighborhood rainfall deficit by approximately 23% (delta3/delta1 = -0.0158/0.0692). In absolute terms, the increase in conflict probability per one-percentage-point rainfall deficit falls from 6.92 to 5.34 percentage points, a decline of 1.58 percentage points against a mean conflict probability of roughly 2.5%.
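
The arithmetic behind these magnitudes can be checked directly from the two rounded coefficients quoted above. The variable names are illustrative; the coefficient values are the ones reported in the answer.

```python
# Coefficients as quoted in the text (rounded).
delta1 = 0.0692   # effect of the neighborhood rainfall deficit at average IBLI coverage
delta3 = -0.0158  # interaction of rainfall deficit with neighborhood IBLI coverage

# Relative change in the drought-conflict semi-elasticity for +1 SD of coverage.
relative_change = delta3 / delta1

# Semi-elasticity with one additional standard deviation of coverage.
effect_with_more_ibli = delta1 + delta3

print(f"relative change: {relative_change:.1%}")
print(f"effect with +1 SD coverage: {effect_with_more_ibli * 100:.2f} percentage points")
```

With the rounded inputs the ratio comes out at about 23%, in line with the figure reported in the summary.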

Q: Why do the authors use a neighborhood rather than cell-level treatment measure?

A: Drought-induced pastoralist conflicts occur primarily not in the pastoral home areas themselves but in neighboring regions where drought migration routes cross into non-pastoral land. The case study documents this pattern directly: ACLED conflict events accumulate where migration routes from Namelok, Lodungokwe, and Ngaremara communities intersect urban or agricultural areas, not within the pastoral zones. The neighborhood approach, using inverse-distance-weighted averages, captures both the probability of migration from surrounding cells and the declining probability of migration with distance.
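
A minimal sketch of an inverse-distance-weighted neighborhood average. It assumes planar grid coordinates, Euclidean distance, and exclusion of the cell's own value; the paper's exact distance metric and any cutoff are not given in this summary, and the function and parameter names are hypothetical.

```python
import math


def neighborhood_average(cell, values, power=1.0):
    """Inverse-distance-weighted mean of `values` over all cells except `cell`.

    cell:   (x, y) coordinates of the focal grid cell
    values: dict mapping (x, y) coordinates to a cell-level measure
            (for example rainfall deficit or IBLI coverage)
    power:  decay exponent; weights are distance ** (-power)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, value in values.items():
        if other == cell:
            continue  # the focal cell itself does not enter its neighborhood
        distance = math.dist(cell, other)
        weight = distance ** (-power)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else float("nan")


# Toy grid with made-up values: nearer cells count more than distant ones.
toy = {(0, 0): 5.0, (1, 0): 1.0, (2, 0): 0.0}
print(neighborhood_average((0, 0), toy))             # decay exponent 1
print(neighborhood_average((0, 0), toy, power=1.5))  # steeper decay
```

Exposing the exponent as a parameter makes it easy to rerun the measure under alternative spatial decay functions.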

Q: What is the main identification concern and how do the authors address it?

A: The main concern is that the timing of the IBLI rollout is endogenously determined — areas with a higher latent drought-conflict elasticity might receive coverage earlier or later, biasing the interaction coefficient. The authors show that the pre-treatment drought-conflict elasticity has no systematic correlation with either IBLI eligibility or the timing of coverage receipt. Placebo tests interacting the neighborhood rainfall deficit with pre-treatment eligibility or eventual coverage indicators yield positive, statistically insignificant coefficients, suggesting any bias would run in the direction of underestimating the mitigation effect. A permutation test randomly reassigning IBLI coverage across the six rollout clusters finds the actual point estimate is in the bottom 2.2% of the simulated distribution, indicating it is unlikely to arise from cluster-level confounders.
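
The logic of such a permutation test can be sketched as follows. This is a generic illustration, not the authors' procedure: `estimate_fn` is a hypothetical stand-in for re-estimating the interaction coefficient under a reshuffled assignment of coverage to rollout clusters, and the toy inputs are made up.

```python
import random


def permutation_share_below(actual_estimate, estimate_fn, cluster_treatment,
                            n_draws=1000, seed=0):
    """Share of placebo estimates at or below the actual estimate.

    cluster_treatment: dict mapping cluster id -> treatment attribute
                       (for example the period in which coverage starts)
    estimate_fn:       callable taking a reshuffled dict and returning
                       the placebo point estimate
    """
    rng = random.Random(seed)
    clusters = list(cluster_treatment)
    treatments = [cluster_treatment[c] for c in clusters]
    at_or_below = 0
    for _ in range(n_draws):
        shuffled = treatments[:]
        rng.shuffle(shuffled)  # reassign treatment across clusters
        placebo = estimate_fn(dict(zip(clusters, shuffled)))
        if placebo <= actual_estimate:
            at_or_below += 1
    return at_or_below / n_draws


# Toy use: the "estimate" is just the start period assigned to cluster "A".
toy_assignment = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
share = permutation_share_below(1, lambda a: a["A"], toy_assignment, n_draws=200)
print(share)
```

A small share indicates that the actual estimate lies in the lower tail of the placebo distribution, which is how the 2.2% figure above is to be read.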

Q: How do the authors rule out that other programs — cash transfers or development aid — explain the result?

A: The authors control for cell-level and neighborhood-level coverage of Kenya’s Hunger Safety Net Programme (HSNP), which provides unconditional cash transfers to vulnerable households and covers most IBLI-eligible areas, as well as for World Bank agricultural aid projects. Across these specifications, the estimated conflict mitigation ranges from -19.16% to -42.24%, with the baseline estimate of -22.79% remaining robust, indicating neither HSNP nor development aid is a plausible alternative explanation.

Q: What is the alternative identification strategy using within-rollout-cluster variation?

A: The authors exploit pre-determined (1984 government land-use map) variation in mixed-land-use status across cells within the same IBLI rollout cluster-period, including rollout-cluster-times-period fixed effects that absorb any omitted variable related to the potentially endogenous rollout steps. The conflict-mitigating effect of IBLI is approximately four times larger in mixed-land-use cells, and approximately nine times larger in the most restrictive specification with rollout-cluster-times-period fixed effects, consistent with the prediction that IBLI matters most where pastoralists encounter other land users.

Q: How do the authors establish the migratory pressure mechanism?

A: Following Eberle et al. (2023), the authors match conflict actors to ethnic homelands using Murdock (1967) boundaries and test whether IBLI coverage in a homeland reduces the distance between the homeland centroid and conflict events involving that group. They find that it does, indicating that IBLI coverage reduces the spatial range of pastoralist drought migration and thus the probability of conflict-generating encounters with other land users.

Q: How do the authors establish the income-smoothing mechanism?

A: Using geo-coded Afrobarometer survey data, the authors show that IBLI coverage is associated with higher reported incomes among pastoralist households, consistent with Jensen et al. (2017). Higher incomes raise the opportunity cost of fighting (following Grossman, 1991), contributing to the overall conflict-mitigating effect alongside reduced migratory pressure.

Q: What does the instrumental variable specification find?

A: The authors instrument inverse-distance-weighted IBLI payouts in the neighborhood with the interaction of neighborhood rainfall deficit and neighborhood IBLI coverage. The first stage confirms that rainfall deficits trigger payouts conditional on coverage. The second stage finds that the occurrence of payouts in the neighborhood reduces the probability of conflict by approximately 150% relative to the baseline risk, corroborating the reduced-form results.

Q: How do the authors assess cost-effectiveness?

A: The authors predict plausible drought-induced conflict fatalities in Kenya over the pre-treatment period and calculate yearly lives saved from the main estimates, then compare the monetary value of saved lives to government subsidy expenditures on IBLI. Using conservative VSL estimates from the WHO and World Bank, IBLI delivers between 10 and 22 cents of pure fatality savings per dollar of public subsidy expenditure.

Q: How robust are the results to alternative drought and conflict measures?

A: Results are qualitatively similar using an Aridity Index or Dry Matter Productivity (DMP) as drought proxies instead of rainfall deficit. The estimated interaction effect maintains a t-statistic above two for spatial decay functions ranging from distance^-0.5 to distance^-1.5 and for Conley standard error cutoffs from 200 km up to 400 km. Results also hold when restricting to conflict events not involving the government, or to battles, riots, and violence against civilians only, and when excluding the pre-IBLI period (2000–2009) entirely.

Q: What are the policy implications regarding scalability?

A: Pastoralism covers 43% of the African landmass across 36 countries, supporting approximately 268 million people (FAO, 2018). The World Bank and private equity were planning to invest close to 900 million dollars in East African pastoralist programs over 2023–2027. The authors argue that IBLI’s cost structure — high fixed costs of technology and setup but low marginal costs of expansion — gives it a scalability advantage over cash transfer programs or public works schemes that require sustained state capacity. Market-based IBLI complements rather than substitutes for political and institutional reforms.

Key Concepts

Index-Based Livestock Insurance (IBLI): A financial instrument that uses satellite remote sensing to automatically trigger preemptive cash payouts to pastoralists when a pre-determined district-specific drought threshold is crossed, bypassing direct loss assessment and thereby eliminating moral hazard and adverse selection problems inherent in traditional indemnity insurance.

Drought-conflict semi-elasticity: The percentage-point change in the probability of conflict associated with a one-percentage-point increase in the rainfall deficit; the paper’s main outcome quantity, estimated at 6.92 percentage points at mean IBLI coverage, reduced by 23% for a one-standard-deviation increase in neighborhood IBLI coverage.

Neighborhood approach: An empirical strategy that measures both drought severity and IBLI coverage as inverse-distance-weighted averages over all surrounding grid cells, reflecting the authors’ finding that pastoralist drought-migration generates conflicts not in the pastoral home area but in neighboring mixed-land-use zones where migration routes intersect other land users.

Migratory pressure: The mechanism by which drought forces pastoralists — who hold excess livestock as precautionary savings in the absence of insurance — to migrate farther from traditional grazing grounds into mixed-land-use areas, increasing the probability of encounters and violent miscoordination with farmers, urban dwellers, and protected-area managers.

Mixed land use: Areas, designated using a 1984 Kenyan government land-use map, where pastoral grazing zones are proximate to farms, ranches, urban settlements, or nature reserves; the paper identifies these as the locations with the highest expected treatment intensity, where IBLI coverage reduces drought-induced conflict approximately four to nine times more than elsewhere.

Tropical Livestock Unit (TLU): The standard unit of account for IBLI contracts in Kenya; one TLU corresponds to one head of cattle or ten goats or sheep; the Kenyan government fully subsidizes IBLI for up to five TLUs per household.

Rollout-cluster-times-period fixed effects: A restrictive set of fixed effects included in the alternative identification strategy that absorbs all omitted variables varying at the level of the six IBLI spatial rollout clusters over time, allowing the authors to identify the conflict-mitigating effect purely from within-cluster variation in mixed-land-use exposure.

International trade and macroeconomic dynamics with sanctions

Mon, 01 Jan 0001 00:00:00 +0000

Sanctions are increasingly used as an instrument of economic statecraft, yet their macroeconomic consequences—especially the transitional dynamics and their effects on business cycles—are poorly understood. This paper develops a micro-founded framework combining the intertemporal general-equilibrium structure of standard open-economy macro models with the rich trade-theoretic microfoundations of modern trade theory to study sanctions systematically. In a two-country, two-sector model where Home specializes in differentiated consumption goods (heterogeneous firms, endogenous entry, Melitz-style) and Foreign specializes in homogeneous intermediate goods (Cournot oligopoly in extraction), sanctions—modeled as trade bans and financial restrictions excluding particular Foreign agents—reallocate resources across and within countries, affect production, exchange rates, and welfare, and are shown to inflict larger welfare losses when they target sectors of comparative disadvantage. A central finding is that focusing only on long-run outcomes and overlooking initial transitional dynamics substantially misdirects welfare assessments; sanctions weaken international comovement and fragment markets but, contrary to some claims, leave the structure of business cycles largely intact.

In depth

Q1. What is the model structure and what types of sanctions does it capture?

The model is a two-country, two-sector economy: Home has a comparative advantage in differentiated consumption goods (produced by heterogeneous firms with endogenous entry under monopolistic competition, as in Melitz 2003), while Foreign specializes in homogeneous intermediate goods (produced via Cournot competition among a fixed number of upstream firms and processed by a representative distributor). This structure is motivated by the pattern in which Western economies specialize in high-value, firm-entry-intensive industries while sanctioned countries often specialize in commodity production (energy, natural gas). Sanctions are modeled as two distinct instruments: trade bans (restrictions on commerce in goods) and financial restrictions (excluding particular Foreign agents from capital markets). The model accommodates both Ricardian and Melitz-type comparative advantage.

Q2. What are the key results on welfare effects and the role of transitional dynamics?

Sanctions reallocate resources across and within countries, with welfare losses larger when sanctions target sectors of comparative disadvantage rather than sectors of comparative advantage; and focusing only on long-run welfare ignores significant transitional costs that substantially change the total welfare assessment. The model implies that initial transitional dynamics—disruptions to trade flows, entry and exit of firms, exchange rate movements, and adjustment of resource allocation—can be quantitatively important and even dominate long-run effects in welfare calculations. Assessments based solely on steady-state comparisons may therefore produce seriously misleading conclusions about whether and how severely sanctions harm the imposing or target economy.

Q3. How do sanctions affect international comovement and business cycles?

Sanctions weaken international business cycle comovement and fragment markets, reducing the degree to which shocks in one country transmit to the other, but leave the structural properties of business cycles within each country largely intact, in the sense that the cyclical dynamics of output, consumption, and investment retain their qualitative features. This result has policy implications: sanctions can reduce the dependence of the sanctioned country’s business cycle on the rest of the world (which could be either beneficial or harmful depending on the source of shocks), but cannot fundamentally restructure the domestic cycle.

Q4. What is the baseline application and what general lessons emerge?

While the model’s comparative advantage structure is explicitly motivated by the 2022 Western sanctions against Russia—with intermediate goods interpretable as energy—the framework is designed to be more broadly applicable to other geopolitical conflicts involving sanctioned commodity producers facing differentiated-goods exporters, such as US-China trade tensions. The general lessons are: (1) the sector targeted by sanctions matters greatly for their welfare costs; (2) transitional dynamics are not second-order; (3) financial sanctions operate through different channels than trade bans and should be modeled separately.

Key concepts

comparative disadvantage in sanctions: the paper’s finding that sanctions are more costly when they target goods in which the targeted country has a comparative disadvantage (sectors the country cannot efficiently produce domestically), because those are the sectors where trade provides the highest value and substitution is hardest.

financial restrictions: one of the two types of sanctions modeled, in which particular Foreign agents (firms or sovereign entities) are excluded from international capital markets, distinct from trade bans that restrict commerce in goods.

International Trade Responses to Labor Market Regulations

Mon, 01 Jan 0001 00:00:00 +0000

Overview

Research Question. This paper asks whether differences in labor market regulations — specifically payroll taxes and minimum wages — shape countries’ comparative advantage in the cross-border provision of labor-intensive services. The question has broad policy relevance: if lower labor standards confer a systematic trade advantage, countries may face pressure to race to the bottom in labor protections, and political support for economic integration may erode.

Setting and Identification. The paper exploits the EU “posting policy,” a large trade program established in 1959 that allows firms in one EU member state to temporarily send their employees to perform service contracts in another member state. In 2017, posting accounted for roughly one-third of all within-EU trade in services (approximately 2% of EU GDP), involving about 2 million workers (in full-time equivalents) in 2019. The setting is analytically attractive because competing foreign and domestic firms serve the same customers at the same physical location using shared capital, holding most determinants of comparative advantage constant while labor market regulations vary by the firm’s country of origin.

Under posting rules, payroll taxes are generally origin-based (exporting firms pay their home country’s tax rate) but become destination-based when contracts exceed a regulatory duration threshold (12 months pre-2010, 24 months from 2010–2020, 18 months from 2020 onward). Minimum wages are destination-based: foreign firms must match the importing country’s statutory minimum wage floor when it exceeds the workers’ home-country wage level. This generates the paper’s key identifying variation — payroll taxes and minimum wages vary across countries, over time, and within countries across sectors.
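
The duration rule for payroll taxes can be sketched as a small lookup. The thresholds are the ones stated above; assigning the boundary year 2020 to the 18-month regime, the function names, and the example rates are assumptions made for illustration.

```python
def duration_threshold_months(year: int) -> int:
    """Regulatory duration threshold, in months, for a contract starting in `year`."""
    if year < 2010:
        return 12
    if year < 2020:
        return 24
    return 18  # assumption: 2020 itself falls under the 18-month rule


def applicable_payroll_tax(duration_months: float, year: int,
                           origin_rate: float, destination_rate: float) -> float:
    """Origin-country rate up to the threshold, destination-country rate beyond it."""
    if duration_months > duration_threshold_months(year):
        return destination_rate
    return origin_rate


# Illustrative rates only: a low-tax origin (15%) posting to a high-tax destination (40%).
print(applicable_payroll_tax(23, 2015, origin_rate=0.15, destination_rate=0.40))
print(applicable_payroll_tax(25, 2015, origin_rate=0.15, destination_rate=0.40))
```

The discrete jump in the applicable rate at the threshold is what creates an incentive to keep contracts just below it.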

Data. The author uses administrative A1 social security forms filed for every EU posting contract from 2007–2018, collected from 25 EU member states, supplemented by micro-level national posting registries in Belgium (LIMOSA), France (SIPSI), and Luxembourg (matched employer-employee data). Labor cost data (wages, payroll tax rates, minimum wages) come from Eurostat and the OECD Taxing Wages Dataset.

Methodology. The paper proceeds in three steps. First, it documents steady-state cross-sectional correlations between bilateral posting flows and labor cost differentials. Second, it estimates difference-in-differences (DiD) elasticities from four quasi-natural experiments. Third, it estimates a theory-consistent gravity model using all sources of variation across 25 EU countries from 2009–2018.

Main Findings.

Steady-state correlation: A strong negative relationship exists between bilateral posting flows and labor cost differentials, with a cross-sectional elasticity of approximately –0.58 (SE 0.08). In sharp contrast, the relationship between bilateral goods trade and labor cost differentials is weak and if anything marginally positive (point estimate +0.13), confirming that labor cost differences are a distinctive driver of trade specifically in labor-intensive services rather than goods.
Belgian tax shift (2016–2018): When Belgium cut employers’ social security contributions from 33% to 25%, imports of posting services into Belgium slowed relative to France (a neighboring control country on parallel pre-reform trends). The reduced-form elasticity of posting imports with respect to the payroll tax rate is 1.45 (SE 0.3).
Luxembourg EU regulation reform (2010): A new EU regulation required temporary employment agencies in border regions to pay destination-based payroll taxes, raising statutory rates faced by Luxembourgish exporters from 15% to 44%. Posting exports from Luxembourg’s temporary employment sector fell by 40% relative to the pre-reform level and relative to the domestic (control) sector, while the sheltered road transportation sector showed no response. The reduced-form elasticity with respect to the statutory payroll tax rate is –1.55 (SE 0.24), and the triple-difference estimate is –1.37 (SE 0.08).
Bunching at duration thresholds: The distribution of posting contract lengths in France (which has the EU’s highest payroll taxes) shows a sharp spike just below the 24-month payroll tax threshold. When the threshold was moved to 18 months in 2020, excess mass migrated to the new threshold, confirming that bunching reflects behavioral responses to the tax notch rather than reference-point effects. This documents that payroll tax differentials shape not only the quantity (extensive margin) but also the length (intensive margin) of posting contracts.
German minimum wage reform (2015): Germany’s introduction of a national minimum wage of €8.50 per hour — which was already binding on construction workers through a sectoral minimum, but not on foreign firms providing non-construction services — caused postings to Germany in manufacturing to fall by approximately 60% relative to the construction (control) sector. The reduced-form elasticity is –1.34 (SE 0.43). Heterogeneity analysis shows that export declines were monotonically larger for low-wage origin countries where the new minimum wage was binding, and placebo estimates using Germany’s high-wage neighboring countries (where minimum wage requirements did not change) are statistically indistinguishable from zero.
Gravity estimates: The preferred specification (PPML with origin-year, destination-year, and pair fixed effects, exploiting bilateral variation in minimum wage bindingness across origin countries) yields a model-implied trade elasticity θ of –1.2 (SE 0.2). The range across specifications is –1.2 to –2.4. These estimates are smaller than the goods trade elasticity (typically estimated around 5) and below the medium-run reduced-form elasticities from the DiD case studies, consistent with short-run gravity estimates capturing only partial adjustment while DiD designs measure longer-run equilibrium responses.
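
For orientation, a PPML gravity specification with the fixed effects listed in the gravity estimates can be written schematically as below. This is a generic textbook form under assumed notation, not the paper's exact equation: $X_{ijt}$ stands for posting flows from origin $i$ to destination $j$ in year $t$, and $c_{ijt}$ for the labor cost faced by origin-$i$ firms serving $j$.

```latex
X_{ijt} = \exp\!\bigl(\theta \,\ln c_{ijt} + \alpha_{it} + \gamma_{jt} + \mu_{ij}\bigr)\,\varepsilon_{ijt}
```

Here $\alpha_{it}$, $\gamma_{jt}$, and $\mu_{ij}$ are the origin-year, destination-year, and pair fixed effects, so $\theta$ is identified from bilateral variation over time, such as changes in how binding the destination minimum wage is for a given origin.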

Policy Counterfactual. The paper’s estimates imply that the Bolkestein Directive, which proposed exempting foreign firms from all destination-country labor regulations, would have doubled exports of physical services from Eastern European countries (upper bound), as their cost advantage would have been dramatically amplified by removal of minimum wage requirements. As the counterpart to this export boom, average posted workers’ wages would have fallen by approximately 16%, since workers would lose their entitlement to destination-country minimum wages. The paper documents that the Bolkestein controversy, sparked by the “Polish plumber” debate in early 2005, coincided with a sharp and persistent drop in French voter support for the EU constitutional treaty, which was subsequently rejected.

Scope Conditions. Results apply specifically to trade in physical (labor-intensive) services provided via temporary worker posting within the EU, where productivity differences across countries for these tasks are plausibly small (Balassa-Samuelson), making institutional factors a primary driver of wage differences. The paper estimates intent-to-treat effects, assuming perfect compliance by exporting firms. The paper does not perform a comprehensive welfare analysis covering consumer price effects or general equilibrium wage and trade-balance responses.

In depth

Q1. What is the EU posting policy and why does it provide an unusually clean setting for identifying the causal effect of labor regulations on trade?

The EU posting policy, established in 1959, allows firms in one EU member state to temporarily send employees to perform service contracts in another member state. The policy keeps most determinants of comparative advantage constant — competing foreign and domestic firms serve the same customers at the same physical location using shared capital — while labor market regulations vary by the firm’s country of origin. Productivity differences for physical services across countries are also plausibly limited (Balassa-Samuelson), making institutional wage differences the primary cost driver. Enforcement is facilitated by the on-site nature of the service, and administrative A1 forms create a direct measure of the number of workers involved in cross-border transactions without a minimum reporting threshold.

Q2. What are the three sources of labor cost differences the paper identifies and quantifies?

Foreign firms competing for posting contracts face different costs through three channels: (i) equilibrium gross wages differ across origin countries, reflecting both productivity differences and institutional/information frictions that allow wage discrimination between posted and domestic workers; (ii) payroll tax rates are origin-based and differ substantially across countries (for example, France’s employer payroll tax is approximately 40% versus approximately 15% for Luxembourg before the 2010 reform); and (iii) destination-specific minimum wages impose a “posting allowance” on firms from countries with lower wages, equal to the shortfall between the firm’s home-country wage and the importing country’s minimum wage floor. Micro-level wage data from France confirm that most posted workers from low-wage countries are paid exactly at the French minimum wage, demonstrating the bindingness of the third channel, while French workers performing the same tasks receive wages near the French average (approximately €21.1 per hour versus a minimum wage of approximately €10 per hour in 2018).
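The three channels can be combined in a stylized per-hour cost calculation. The sketch below is an illustration, not the paper's formula: it assumes the firm pays the higher of the home-country wage and the destination minimum wage, and that the origin-country payroll tax applies to the full amount paid. The function names and the home wage and tax rate in the usage lines are hypothetical.

```python
def posting_allowance(home_wage: float, dest_min_wage: float) -> float:
    """Shortfall between the destination minimum wage and the home wage (never negative)."""
    return max(0.0, dest_min_wage - home_wage)


def hourly_labor_cost(home_wage: float, dest_min_wage: float, origin_payroll_tax: float) -> float:
    """Stylized hourly cost: (home wage + posting allowance) * (1 + origin payroll tax).

    Assumption (not from the paper): the origin payroll tax is levied on the
    allowance as well as on the home wage.
    """
    gross = home_wage + posting_allowance(home_wage, dest_min_wage)
    return gross * (1.0 + origin_payroll_tax)


# Hypothetical low-wage exporter: home wage 4 EUR/hour, 20% origin payroll tax,
# posting into a destination with a 10 EUR/hour minimum wage.
cost = hourly_labor_cost(home_wage=4.0, dest_min_wage=10.0, origin_payroll_tax=0.20)
```

Under these assumptions the allowance is zero for any firm whose home wage already exceeds the destination minimum, so for such firms only the first two channels matter.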

Q3. What does the cross-sectional evidence show about the relationship between labor cost differentials and posting flows, and how does this compare to goods trade?

Bilateral posting flows and bilateral labor cost differentials have a tight negative cross-sectional relationship with an estimated elasticity of –0.58 (SE 0.08), indicating that countries export more posting services when their labor costs are substantially below those of the destination country. The same exercise applied to bilateral goods trade yields a coefficient of +0.13 (SE 0.07) — weak and marginally positive — consistent with goods trade being driven by capital, technology, and scale rather than labor cost differentials. The gap confirms that labor cost differences are a distinctive comparative advantage mechanism for labor-intensive services but not for less labor-intensive goods.

Q4. What does the Belgian tax shift reform demonstrate, and how is identification established?

Belgium cut employer social security contributions from 33% to 25% between 2016 and 2018 in a revenue-neutral reform (financed by VAT, excise duties, and dividend taxes). The DiD compares posting imports into Belgium with those into France (a neighboring, similarly sized importer on parallel pre-reform trends). Belgium and France imported posting services at similar rates before 2015; Belgian imports slowed immediately after the reform while French imports continued growing. The reduced-form elasticity of posting flows with respect to the destination payroll tax rate is 1.45 (SE 0.3). The elasticity with respect to total labor cost is 3.7 (SE 0.7). No discernible response is detected for trade in manufacturing goods, providing a within-reform placebo. A synthetic control using all available importing countries yields a smaller elasticity of 0.6 (SE 0.22).

Q5. How does the Luxembourg EU regulation reform (2010) improve on the Belgian case for identification?

The 2010 EU regulation required temporary employment agencies in border regions to pay destination-based (rather than origin-based) payroll taxes, raising statutory rates for Luxembourgish exporters from 15% to 44%. Unlike the Belgian reform, this created within-country variation: the same Luxembourgish firms were exposed in the temporary employment sector but not in road transportation (which received a 10-year exemption). This within-exporter, cross-sector design controls for all Luxembourg-wide demand or supply shocks. Posting exports by the temporary employment sector fell 40% relative to pre-reform levels and relative to the domestic (control) sector, while road transportation posting showed zero response. The monthly data confirm the drop occurred in the exact month following the regulation with no anticipation. The triple-difference elasticity (with respect to the payroll tax rate) is –1.37 (SE 0.08).
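As a rough plausibility check on these magnitudes, the 40% drop can be divided by the 29-percentage-point rise in the statutory rate. This back-of-envelope ratio is an assumption about how the reported numbers relate to each other; it is not the paper's triple-difference regression, and the function name is hypothetical.

```python
def semi_elasticity(pct_change_in_flows: float, rate_before: float, rate_after: float) -> float:
    """Proportional change in flows per unit change in the statutory tax rate."""
    return pct_change_in_flows / (rate_after - rate_before)


# Figures quoted in the text: exports fell 40%, the rate rose from 15% to 44%.
approx = semi_elasticity(-0.40, rate_before=0.15, rate_after=0.44)
```

The ratio comes out at roughly –1.38, in the neighborhood of the reported estimate, though the regression controls for sector and country shocks that this arithmetic ignores.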

Q6. What does the bunching evidence at payroll tax duration thresholds add to the DiD findings?

When posting contracts exceed a regulatory duration threshold (24 months during 2010–2020, then 18 months from July 2020), payroll taxes become destination-based. Because France has the highest payroll tax in the EU, all exporting firms face strong incentives to avoid crossing the threshold. The distribution of posting contract lengths in France shows sharp excess mass just below 24 months in 2017. When the threshold moved to 18 months in 2020, the excess mass migrated to the new threshold while diminishing at the old one, confirming that bunching is tax-motivated rather than driven by a reference-point at 24 months. This establishes that labor tax differentials shape not only the quantity of posting contracts (extensive margin) but also their length (intensive margin).
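The idea of excess mass just below a threshold can be sketched with a simple count comparison. The summary does not describe the paper's bunching estimator, so what follows is a minimal illustration under a strong simplifying assumption: the counterfactual count is the average over a few months immediately below the bunching month. The data and the function name are hypothetical.

```python
from collections import Counter


def excess_mass_below(durations_months, threshold, n_reference_bins=6):
    """Excess count in the month just below `threshold`, relative to nearby months.

    Simplifying assumption: the counterfactual count is the average over the
    `n_reference_bins` whole months immediately below the bunching month.
    """
    counts = Counter(int(d) for d in durations_months)
    bunching_bin = threshold - 1
    reference_bins = range(bunching_bin - n_reference_bins, bunching_bin)
    counterfactual = sum(counts[b] for b in reference_bins) / n_reference_bins
    return counts[bunching_bin] - counterfactual


# Hypothetical contract lengths (months): 2 contracts per month from 12 to 22,
# then a spike of 9 contracts at 23 months, just under a 24-month threshold.
sample = [m for m in range(12, 23) for _ in range(2)] + [23] * 9
excess = excess_mass_below(sample, threshold=24)
```

Applying the same function at a different threshold value is the analogue of checking whether the excess mass moves when the rule changes.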

Q7. What are the main findings from the German minimum wage reform, and how do the heterogeneity tests strengthen identification?

Germany’s January 2015 introduction of a national minimum wage of €8.50 per hour (preceded by a sectoral minimum in meat processing in August 2014) raised wage costs for foreign firms providing non-construction services, but not for construction firms already covered by a higher sectoral minimum. Postings to Germany in manufacturing fell by approximately 60% relative to the construction (control) sector, implying a reduced-form elasticity of –1.34 (SE 0.43). Two heterogeneity tests reinforce identification: (i) within the treated German sector, posting declines are monotonically increasing in the degree to which the new minimum wage is binding in the origin country, with Luxembourg (where the minimum is non-binding) showing no statistically significant effect; (ii) the same industry-by-country comparison in Germany’s high-wage neighboring countries (which did not change minimum wage rules) yields placebo estimates statistically indistinguishable from zero. The reform raised wages for German workers by an average of 6% (and up to 10% for most affected workers) but automatically raised wages for posted workers by an average of 40%, doubling them for workers from the poorest sending countries.

Q8. How do the gravity model estimates compare to the reduced-form DiD estimates, and what explains the difference?

Across gravity specifications, model-implied elasticities range from –0.75 to –2.4. The preferred specification (PPML with pair fixed effects, destination-year fixed effects, and origin-year fixed effects) yields θ = –1.2 (SE 0.2). These estimates are systematically smaller in absolute value than the medium-run reduced-form DiD estimates because: (a) the gravity model uses nationwide average tax and minimum wage measures that introduce measurement error relative to the sector-specific reforms in the case studies; and (b) the gravity model captures year-to-year (short-run) adjustments, while the DiD designs compare outcomes several years before and after the reform, picking up longer-run equilibrium reallocation. The finding that responses grow over time mirrors evidence on dynamic adjustment in goods trade (Boehm, Levchenko and Pandalai-Nayar, 2023), and contradicts the conventional belief that fiscal devaluations boost exports only in the short run.

Q9. What does the gravity model reveal about trade in goods as a function of posting-specific wage costs?

When the same gravity specification is applied to bilateral goods trade rather than posting flows, posting-specific wage costs have a positive — not negative — coefficient on goods trade. This is inconsistent with a model where unobserved shocks affect all exports symmetrically, and instead suggests a small substitution effect: as the cost to import labor services rises (due to tighter posting regulations), countries substitute toward importing goods. For some activities (such as meat processing), importing finished goods is a partial substitute for importing labor services to produce on-site.

Q10. What are the Bolkestein Directive counterfactual implications, and how do they connect to the political economy evidence?

The Bolkestein Directive (proposed 2005) would have enforced a “country of origin principle,” exempting foreign posting firms from destination-country minimum wages. Using the preferred lower-bound elasticity from the gravity model (column 5, θ = –1.2) and an upper bound averaging gravity and DiD estimates, the paper predicts this would have at least doubled exports of labor services from Eastern European countries. Tax revenues collected on posted workers in origin countries would also double. However, average posted workers’ wages would fall by approximately 16%, as workers would lose their entitlement to destination-country minimum wages. The paper documents that the Bolkestein controversy — introduced to the EU Parliament in March 2005 and popularized via the “Polish plumber” trope — coincided with a sharp and permanent drop in French voter support for the EU constitutional treaty, which was subsequently rejected in referendum. This is consistent with Rodrik’s (1998) hypothesis that voters withdraw support for economic integration when comparative advantage appears to be based on institutional choices that conflict with importing countries’ social norms.

Q11. How does the paper handle the incidence of payroll taxes — does the canonical result that payroll taxes are fully passed through to workers hold in this context?

The canonical competitive labor market model predicts full pass-through of payroll taxes to workers’ net wages, leaving firms’ labor costs unchanged. The paper finds substantial trade responses to payroll tax reforms, inconsistent with full pass-through. Nominal rigidities — including binding minimum wages that constrain downward wage adjustment — help rationalize incomplete pass-through in the EU context. The paper estimates elasticities both with respect to statutory tax rates (the reduced-form, making no incidence assumption) and with respect to total wage costs (instrumented with the reform, allowing for gross wage responses). Wage data from Belgium show no distinguishable wage response to the Belgian tax cut, suggesting the incidence fell largely on firms’ costs rather than workers’ wages in that episode.

Q12. What do the destination-based taxation counterfactual (tax cooperation proposal) calculations show?

A proposal to shift all posting payroll taxation to destination-based rates would decrease posting exports from Eastern European countries by between 10% and 25%. Despite the volume reduction, total taxes collected on posted workers would still increase under this reform even when the upper-bound elasticity (approximately –3.7 with respect to total wage cost) is used, because a 1% increase in the payroll tax rate translates to a much smaller proportional increase in total wage cost.
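The last step of this argument can be made concrete. The sketch below assumes that total wage cost equals the gross wage times one plus the payroll tax rate and that gross wages do not respond to the tax; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def cost_pass_through(tax_rate: float) -> float:
    """Percent change in (1 + tax_rate) per 1% change in tax_rate, gross wage held fixed.

    Follows from d ln(1 + t) / d ln(t) = t / (1 + t).
    """
    return tax_rate / (1.0 + tax_rate)


# At a 40% employer payroll tax rate, a 1% (not 1 point) rise in the rate
# raises total wage cost by about 0.29%.
share_at_40 = cost_pass_through(0.40)
```

Because this ratio is well below one, an elasticity expressed with respect to total wage cost implies a considerably smaller response per percent change in the tax rate itself.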

Key Concepts

Posted workers / posting policy: Employees temporarily sent by their employer (the “exporting firm”) to perform a service contract in another EU member state. Posted workers maintain their employment contract with the firm in the origin country but physically work in the destination country. This creates a setting where competing domestic and foreign firms serve the same customers at the same location under different labor regulations.

Posting allowance: The additional wage component that exporting firms must pay to posted workers to satisfy the destination country’s minimum legal wage when that minimum exceeds the firm’s home-country wage level. The posting allowance is zero when the exporting country’s average wage already exceeds the destination minimum wage; it can be large for low-wage origin countries. The allowance enters directly into firms’ labor costs and is the minimum-wage channel of the paper’s labor cost formula.

Origin-based vs. destination-based payroll taxation: Under posting, payroll taxes are normally assessed in the country where the exporting firm is registered (origin-based), creating tax rate differentials between competing firms in the same job site. EU regulations convert payroll taxes to destination-based when posting contracts exceed a duration threshold, eliminating the tax advantage of lower-tax origin countries for those contracts. The 2010 EU regulation additionally imposed destination-based taxation on border-region temporary employment agencies.

Trade elasticity for physical services (θ): The structural parameter from the Eaton-Kortum (2002) gravity model that governs the elasticity of bilateral posting flows with respect to changes in firms’ total wage costs when exporting services from country i to country j. The paper’s preferred gravity estimate is –1.2, and the reduced-form DiD designs yield approximately –1.3 to –1.5; both are substantially smaller in absolute value than the goods trade elasticity (typically estimated around 5).

Social standards as comparative advantage: The paper uses “standards” to refer to countries’ domestic policy choices about payroll taxes (which finance social insurance programs) and minimum wages (which set worker protection floors). The paper demonstrates that these regulatory choices — distinct from productivity differences, factor abundance, or technology — create measurable cost advantages that shape specialization in labor-intensive service sectors. This is in contrast to “benign” sources of comparative advantage.

Bolkestein Directive / country of origin principle: A 2005 EU legislative proposal that would have required posting firms to operate under the laws of their home country when supplying services in other EU member states, eliminating the hard core of destination-country regulations (including minimum wages) that the 1996 Posted Workers Directive had imposed on foreign firms. The proposal was withdrawn after a wave of protests, and the controversy around it coincided with a sharp fall in French support for the EU constitutional treaty.

Bunching / notch at duration threshold: A behavioral response in which exporting firms strategically keep posting contract lengths below the duration threshold that triggers destination-based payroll taxation, generating an excess mass in the distribution of contract lengths just below the threshold. The paper uses this bunching, together with the movement of the threshold from 24 to 18 months in 2020, as additional evidence that payroll tax differentials affect the intensive margin of posting.

Is it AI or data that drives firm market power?

Mon, 01 Jan 0001 00:00:00 +0000

Labor Market Competition and the Assimilation of Immigrants

Mon, 01 Jan 0001 00:00:00 +0000

Research Question

Why have immigrant-native wage gaps widened substantially across arrival cohorts in the United States since the 1960s, and why has the speed of wage convergence slowed? The paper argues that the existing literature, which attributes these trends entirely to declining immigrant cohort quality, omits a critical general-equilibrium channel: labor market competition arising from imperfect substitutability between immigrants and natives. The paper quantifies how much of the observed deterioration in wage assimilation profiles can be attributed to (i) increasing immigrant cohort sizes raising labor market competition, (ii) secular shifts in relative skill demand, and (iii) genuine changes in immigrant cohort quality.

Data and Methodology

The analysis uses U.S. Census microdata for 1970, 1980, 1990, and 2000, combined with American Community Survey (ACS) data pooled for 2009–2011 (labeled 2010) and 2018–2019 (labeled 2020), all drawn from IPUMS-USA. The sample covers individuals aged 25–64 who are employed in the civilian sector, not self-employed, not in group quarters, and report positive earnings. Immigrant cohort sizes grew from approximately 800,000 individuals in the 1960s cohort to 2.3 million in the 1980s cohort and 4.6 million in the 2000s cohort.

The theoretical framework is a constant elasticity of substitution (CES) production function in which workers supply two types of skills: “general” skills portable across countries and “specific” skills particular to the host country (including language proficiency and knowledge of cultural and institutional environment). Immigrants arrive with the same general skills as observationally equivalent natives but only a fraction of their specific skills; they accumulate specific skills over time. Because immigrants disproportionately supply general skills upon arrival, increasing immigrant inflows raise the relative supply of general skills, depress the relative price of general skills, and thereby widen the immigrant-native wage gap. This mechanism operates only when immigrants and natives are imperfect substitutes (elasticity of substitution σ < ∞).

The model is estimated in two steps using nonlinear least squares (NLS). First, productivity factor parameters are estimated from native wages year by year, with state dummies identifying state-level skill prices. Second, specific skill accumulation parameters and the elasticity of substitution σ are jointly identified from immigrant wage differences across labor markets (defined as U.S. states) and over time. The demand shift parameter δ_t, which captures changes in the relative demand for specific skills (e.g., technology that favors communication over manual tasks), enters as a linear time trend in the baseline specification.

Main Findings with Quantitative Magnitudes

Competition effect: Immigration-induced increases in labor market competition explain 14.2, 43.9, and 40.8 percent of the increase in the initial wage gap of the 1970s, 1980s, and 1990s cohorts relative to the 1960s cohort, respectively. Averaged across all years spent in the United States, the competition effect alone accounts for 14.1, 22.4, and 20.4 percent — approximately one fifth overall.

Competition plus demand effect: Adding secular shifts in relative skill demand raises these figures to 24.8, 68.3, and 109.5 percent at arrival and 21.2, 33.6, and 36.4 percent averaged across years — approximately one third overall.

Elasticity of substitution: The baseline estimate of σ (elasticity of substitution between general and specific skills) is 0.020 (s.e. 0.002), implying an inverse elasticity of approximately 50.5. The relative supply of general skills increased by 1.67 log points between 1970 and 2020, producing a predicted increase in the relative price of specific skills of approximately 59.6 log points. The demand shift trend is estimated at 1.3 log points per year.

Cohort quality: Once competition and demand effects are netted out, the remaining deterioration in assimilation profiles is entirely attributable to observable changes in immigrants’ educational attainment and country-of-origin composition. Conditional on these two observable characteristics, unobservable skill quality improved across cohorts (consistent with English language proficiency trends), reversing the conventional narrative of declining cohort quality.

Specific skills gap at arrival: The 1960s cohort faced a specific skills gap of approximately 52.4 percent relative to native equivalents; this narrowed to 41.8 percent for the 1970s cohort, 35.6 percent for the 1980s cohort, and 17.6 percent for the 1990s cohort, conditional on origin and education. After 20–30 years, all cohorts reach 83.7–92.0 percent of their native counterparts’ specific skill levels.

Scope Conditions

The analysis focuses on employed men in the main text (women are analyzed in an Online Appendix, showing qualitatively similar but quantitatively smaller patterns).
Labor markets are defined at the U.S. state level in the baseline; robustness checks use state-education and state-gender cells.
The decomposition covers the period from the 1960s to the 1990s arrival cohorts.
Results are robust to corrections for selective outmigration, undercounting of undocumented immigrants, immigrant network effects, alternative demand shift specifications, alternative labor market definitions, and endogenous immigrant location choice (using shift-share instruments in the spirit of Card, 2001).

In depth

Q1. What is the core theoretical mechanism by which increasing immigrant inflows widen the immigrant-native wage gap?

A: Because immigrants disproportionately supply general (country-portable) skills upon arrival, while natives disproportionately supply specific (host-country) skills, an increase in immigrant inflows raises the ratio of general to specific skills in the economy. Under imperfect substitutability (σ < ∞), this lowers the relative price of general skills and raises the relative price of specific skills, thereby widening the wage gap between immigrants (who earn predominantly from general skills) and natives (who earn more from specific skills). The effect is larger in the early years after arrival when immigrants’ specific skill endowment s is small, and diminishes as immigrants accumulate specific skills over time.
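A stylized two-worker comparison illustrates the mechanism. The sketch assumes that a native and an immigrant each supply one unit of general skills, that the native supplies one unit of specific skills and the immigrant supplies a fraction s, and that each wage is the sum of skill units times skill prices. These normalizations, the function name, and the numbers in the usage lines are illustrative assumptions, not the paper's estimated model.

```python
import math


def log_wage_gap(relative_price_general: float, s: float) -> float:
    """Native-immigrant log wage gap with the price of specific skills normalized to 1.

    relative_price_general is r_G / r_S; s is the immigrant's specific-skill
    endowment as a fraction of the native's.
    """
    native_wage = relative_price_general + 1.0
    immigrant_wage = relative_price_general + s
    return math.log(native_wage / immigrant_wage)


# Hypothetical values: an immigrant with half the native's specific skills,
# before and after a fall in the relative price of general skills.
gap_before = log_wage_gap(relative_price_general=0.9, s=0.5)
gap_after = log_wage_gap(relative_price_general=0.4, s=0.5)
```

In this setup the gap widens when the relative price of general skills falls, and it is zero at any price ratio once s equals 1, which matches the claim that the effect fades as specific skills accumulate.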

Q2. How does the paper model immigrants’ skill accumulation, and how do accumulation profiles differ across groups?

A: Immigrants’ specific skill endowment s(·) upon arrival and over time is modeled as a flexible polynomial in years since migration, interacted with dummies for region of origin, education, cohort of entry, and potential experience abroad. Mexican high school dropouts (the reference group) are estimated to arrive with approximately 80 percent of the specific skills of equivalent natives. Immigrants from Latin America, Asia, and other regions arrive with lower specific skills than Western immigrants, who arrive near native parity. Higher-educated immigrants arrive with larger specific skill deficits relative to equivalently educated natives than low-educated immigrants do, reflecting the greater importance of language-intensive skills in high-skill occupations. Conditional on origin and education, more recent cohorts arrive with narrower specific skill deficits: the 1990s cohort faces a gap of 17.6 percent at arrival compared to 52.4 percent for the 1960s cohort.

Q3. What are the estimated technology parameters, and how are they interpreted?

A: The elasticity of substitution between general and specific skills is estimated at σ = 0.020 (s.e. 0.002), with a confidence interval of [0.017, 0.024]. This implies an inverse elasticity of approximately 50.5, meaning a one percent increase in the relative supply of general skills raises the relative price of specific skills by about 50.5 percent. The implied elasticity of substitution between natives and immigrants (evaluated at market-level averages) is approximately 0.013 in 1990, 0.020 in 2000, and 0.025 in 2010 — in the same range as the Ottaviano and Peri (2012) benchmark of 0.034 (s.e. 0.008). The demand shift trend is estimated at δ̃ = 0.013 (s.e. 0.001) log points per year, reflecting secular increases in the relative demand for specific (host-country) skills.

Q4. How does the paper identify the elasticity of substitution σ and the skill accumulation parameters separately?

A: The estimation proceeds in two steps. First, productivity factor parameters (returns to education and experience) are estimated from native wage regressions, with state-year dummies absorbing state-specific skill prices. Second, skill accumulation parameters θ are identified from wage differences between immigrants with different characteristics working in the same labor market, while σ and the demand shift δ̃ are identified from variation in immigrant wage gaps across states (which have different immigrant population shares) and over time. Specifically, states with higher immigrant shares display lower relative prices of general skills, providing the identifying variation for σ.

Q5. What are the quantitative magnitudes of the competition effect for specific cohorts at different time horizons?

A: At the time of arrival, the competition effect explains 14.2 percent (1970s cohort), 43.9 percent (1980s cohort), and 40.8 percent (1990s cohort) of the increase in initial wage gaps relative to the 1960s cohort. After 10 years, these figures are 17.1, 22.7, and 22.2 percent respectively. After 20 years, they are 12.2, 16.9, and 16.2 percent. After 30 years, 10.9, 15.3, and 13.7 percent. The declining share across years reflects the fact that as immigrants accumulate specific skills, their wages become less sensitive to equilibrium skill prices. Averaged across all years since migration, the competition effect accounts for 14.1, 22.4, and 20.4 percent for the three cohorts.

Q6. How does labor market competition affect the speed of wage assimilation, and does it prevent full convergence?

A: The effect on assimilation speed is theoretically ambiguous and depends on whether future cohorts are larger or smaller than the reference cohort, and whether immigrants fully converge to native skill levels. In the stylized examples, a one-time permanent increase in competition raises both the initial wage gap and the speed of subsequent convergence (since the gap between immigrant and native skill levels is larger and therefore more responsive to changes in skill prices). However, continuous inflows of increasingly large cohorts counteract this speedup by continuously shifting the wage profile downward (the “dynamic competition effect”). For immigrants who fully converge (s → 1), competition delays but does not prevent convergence; for those who only partially converge (s converges to a limit below 1), competition permanently widens the long-run wage gap. Quantitatively, the paper finds the effect on assimilation speed to be small in the full-sample decomposition.

Q7. What do the illustrative examples for specific immigrant groups reveal about heterogeneous competition effects?

A: For a Mexican male high school dropout (1960s cohort skills), facing the same competition level as the 1990s cohort would widen the initial wage gap by 10.2 log points; facing 2010 competition levels would widen it by 21.1 log points. However, because this group fully converges (s → 1), the effect dissipates entirely after approximately 25 years, and long-run wage assimilation is not prevented. For a Latin American male high school graduate who only partially converges (s converges to a limit below 1), facing 1990s competition would widen the initial gap by 17.4 log points and leave a 3.8 log-point larger long-run wage gap. For a Western college graduate who arrives near native skill parity, competition effects are negligible throughout.

Q8. What are the changes in absolute wage gaps documented in the baseline data?

A: The 1960s cohort arrived with an initial wage gap of approximately 17.2 log points relative to natives. The 1970s cohort arrived with a gap of 30.1 log points, the 1980s cohort 29.2 log points, and the 1990s cohort 20.8 log points. Under the no-competition counterfactual, these initial gaps narrow to 13.6, 24.7, 20.3, and 15.7 log points respectively. Removing both competition and demand effects yields gaps of 13.7, 23.4, 17.5, and 13.3 log points, so the additional narrowing is concentrated in the later cohorts.
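These gaps can be related to the percentage shares reported for the decomposition. The sketch below assumes that the share explained is one minus the ratio of the counterfactual increase in the gap (relative to the 1960s cohort) to the observed increase; that formula is an inference from the reported numbers rather than a definition quoted from the paper, and the variable and function names are hypothetical.

```python
# Initial wage gaps in log points, as quoted in the text.
OBSERVED = {"1960s": 17.2, "1970s": 30.1, "1980s": 29.2, "1990s": 20.8}
NO_COMPETITION = {"1960s": 13.6, "1970s": 24.7, "1980s": 20.3, "1990s": 15.7}
NO_COMPETITION_OR_DEMAND = {"1960s": 13.7, "1970s": 23.4, "1980s": 17.5, "1990s": 13.3}


def share_explained(observed, counterfactual, cohort, base="1960s"):
    """Percent of the observed rise in the initial gap that vanishes in the counterfactual."""
    observed_increase = observed[cohort] - observed[base]
    counterfactual_increase = counterfactual[cohort] - counterfactual[base]
    return 100.0 * (1.0 - counterfactual_increase / observed_increase)


competition = {c: share_explained(OBSERVED, NO_COMPETITION, c) for c in ("1970s", "1980s", "1990s")}
both = {c: share_explained(OBSERVED, NO_COMPETITION_OR_DEMAND, c) for c in ("1970s", "1980s", "1990s")}
```

Because the inputs are rounded to one decimal, the computed shares land close to, but not exactly on, the reported 14.2, 43.9, and 40.8 percent (competition only) and 24.8, 68.3, and 109.5 percent (competition plus demand).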

Q9. What does the paper find about the role of observable versus unobservable immigrant quality?

A: Once competition and demand effects are accounted for, all remaining cohort differences in assimilation profiles are attributable to observable changes in immigrants’ educational attainment and country-of-origin composition. Conditional on these two observable characteristics, immigrants in more recent cohorts display higher levels of unobservable skills (smaller specific skill deficits conditional on origin and education), consistent with rising English language proficiency across cohorts. This reverses the standard interpretation that unobservable immigrant quality has declined.

Q10. How do aggregate skill supplies and relative skill prices evolve over the sample period?

A: Between 1970 and 2020, the total supply of general skills from immigrants grew by a factor of 16.3, while the supply of specific skills grew by a factor of 15.0. The resulting increase in the relative supply of general skills caused the relative price of general skills to fall from 0.89 to 0.38. Accounting for growing relative demand for specific skills (the δ_t trend), the ratio of relative skill prices fell further to 0.20 by 2020. At the state level, relative prices of general skills are well below 0.3 in high-immigration states like California, Florida, and New York, and approach 1.0 in states with low immigrant shares.
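The price ratios quoted here can be converted to log-point changes to compare them with the estimated demand trend. This is plain arithmetic on the reported figures; treating the trend as accumulating linearly over the 50 years from 1970 to 2020 is an assumption made for the comparison, and the function name is hypothetical.

```python
import math


def log_point_change(start: float, end: float) -> float:
    """Change from start to end, in log points (100 * natural log of the ratio)."""
    return 100.0 * math.log(end / start)


supply_only = log_point_change(0.89, 0.38)   # fall attributed to relative supply, about -85
with_demand = log_point_change(0.89, 0.20)   # fall including the demand trend, about -149
demand_part = with_demand - supply_only      # additional fall from demand, about -64
trend_implied = 1.3 * (2020 - 1970)          # 1.3 log points per year over 50 years
```

The additional fall of roughly 64 log points is close to the 65 log points implied by a 1.3 log-point annual trend over five decades.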

Q11. Are the results robust to selective outmigration, undocumented immigrants, and alternative specifications?

A: Yes. Across twelve robustness checks covering selective outmigration corrections (using Borjas and Bratsberg 1996 or Rho and Sanders 2021 outmigration rates, and synthetic cohort reweighting), undocumented immigrant undercounting corrections, immigrant network controls (share and stock of compatriots in the same state), alternative demand shift specifications (quadratic and time dummies), alternative labor market definitions (state-education and state-gender cells), and endogenous immigrant location choice (GMM with shift-share instruments), the estimated elasticity of substitution σ ranges from 0.017 to 0.033 and the average competition effects remain stable. Averaged across all robustness checks, competition effects are 1.3 log points (1960s cohort), 3.0 log points (1970s), 5.2 log points (1980s), and 4.3 log points (1990s), compared to baseline values of 1.4, 3.1, 5.5, and 4.6 log points.

Q12. What are the policy implications highlighted by the authors?

A: First, since assimilation and competition effects are intertwined, the wage impact of immigration on natives is intrinsically dynamic: newly arrived immigrants initially compete relatively little with natives but increasingly substitute for them as their specific skills grow. Second, labor market competition may reduce immigrants’ incentives to invest in host-country-specific skills, a channel not modeled in most existing structural models. Third, dispersal policies (such as those used during refugee crises) that reallocate immigrants across regions will affect local skill price ratios and therefore alter wage assimilation trajectories — a potentially unintended consequence of geographic allocation policies.

Key Concepts

General skills: Skills that are portable across countries and can be used productively in any labor market. In the paper’s framework, general skills are those required for tasks (such as manual or physical labor) that are similar across national contexts. Upon arrival, immigrants are assumed to supply the same amount of general skills as observationally equivalent natives, making immigrants’ relative supply of general skills high at arrival.

Specific skills (host-country-specific skills): Skills particular to the host country, including language proficiency (English in the U.S. context) as well as familiarity with the institutional and cultural environment. Immigrants arrive with only a fraction s of the specific skills of comparable natives; this fraction evolves over time as immigrants spend time in the host country. The level of specific skills governs how substitutable a given immigrant worker is with native workers.

Labor market competition effect: The mechanism by which increasing immigrant inflows affect relative wages through equilibrium changes in skill prices rather than through individual skill accumulation. When immigrants and natives are imperfect substitutes, rising immigrant inflows raise the relative supply of general skills, depress the relative price of general skills, and widen the immigrant-native wage gap. This effect is larger for recently arrived immigrants (small s) and diminishes as immigrants assimilate.

Dynamic competition effect: The combined effect on a given cohort’s observed assimilation profile of continuous, growing immigrant inflows over its time in the country. Unlike a one-time permanent increase in competition (which would raise both the initial gap and assimilation speed), continuously growing inflows both widen the initial gap and exert a continuous downward shift on the cohort’s wage profile, with an ambiguous net effect on the speed of convergence.

Demand shift (δ_t): A time-varying parameter in the CES production function capturing secular changes in the relative demand for specific versus general skills beyond what is explained by standard skill-biased technological change. A positive trend in δ_t (estimated at 1.3 log points per year in the baseline) reflects technological change that favors communication-intensive (specific-skill-intensive) tasks over manual (general-skill-intensive) tasks, and amplifies the competition effect.

Elasticity of substitution between general and specific skills (σ): The key technology parameter governing the degree of imperfect substitutability between natives and immigrants in equilibrium. Estimated at σ = 0.020 in the baseline. When σ = ∞, immigrants and natives are perfect substitutes and labor market competition has no effect on relative wages. As σ decreases, the competition effect on relative wages becomes stronger for a given change in relative skill supplies.
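The role of σ can be made concrete with a generic two-input CES aggregate. The functional form below, including the way the demand shifter δ_t enters as a log weight on specific skills, is an assumption chosen for illustration; the paper's exact specification is not reproduced in this summary. G_t and S_t denote aggregate supplies of general and specific skills, and r_{G,t}, r_{S,t} their prices.

```latex
Y_t = \left[ G_t^{\frac{\sigma-1}{\sigma}} + e^{\delta_t}\, S_t^{\frac{\sigma-1}{\sigma}} \right]^{\frac{\sigma}{\sigma-1}}
\quad\Longrightarrow\quad
\ln\frac{r_{S,t}}{r_{G,t}} = \delta_t - \frac{1}{\sigma}\,\ln\frac{S_t}{G_t}
```

Under this form, the relative price is independent of relative supplies as σ → ∞, and a given fall in S_t / G_t moves the relative price more the smaller σ is.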

Specific skill accumulation function s(·): A flexible parametric function of years since migration, interacted with region of origin, education level, cohort of entry, and potential experience at arrival, that governs the rate at which immigrants acquire host-country-specific skills over time. The intercept of s(·) at arrival (relative to s = 1 for a native) measures the initial specific skill deficit; the polynomial in years since migration captures how quickly this deficit closes.

Wage assimilation profile: The trajectory of the immigrant-native log wage gap as a function of years spent in the host country, conditional on a cohort of arrival. The paper distinguishes between changes in the level of the profile (the initial wage gap) and changes in its slope (the speed of convergence), and decomposes both dimensions into competition effects, demand effects, and cohort quality effects.

Lender concentration of external debts and sudden stops

Layer 1: Overview

Research Question

This paper studies how the lender structure of external debt — specifically, the degree to which a borrowing country’s external debt is concentrated among a small number of large lenders — affects open economies’ credit conditions, borrowing behavior, and the severity of sudden stops.

Core Mechanism

The paper argues that the pecuniary externality arising from collateral foreclosure can be internalized not only by borrowers (as in the standard Bianchi 2011 framework) but also by lenders. When a large lender holds a substantial share of total loans, it has an incentive to foreclose only partially on seized collateral. Selling foreclosed collateral injects asset supply and depresses the collateral price; a sufficiently large lender internalizes this price impact and therefore restrains foreclosure. Atomistic lenders, by contrast, take the collateral price as given and sell all seized collateral (foreclosure rate = 1). Consequently, concentrating external debt in fewer, larger lenders supports a higher collateral price during financial downturns. This higher collateral price raises borrowing capacity, weakens borrowers’ precautionary saving motive, and causes them to overborrow relative to the social optimum.

Empirical Evidence

Using FFIEC 009a data — quarterly exposure of individual U.S. banks to the external debts of other countries, covering 2003Q1–2022Q2 — the paper documents two new empirical facts. First, lender concentration of emerging countries’ external debt has been considerably higher than that of advanced countries since the Global Financial Crisis. The average difference in the mean top-3 lender concentration (LTop3) between emerging and advanced economies is 0.11 (= 0.93 − 0.82), with a t-statistic of 13.87. Second, higher lender concentration alleviates sudden stop events in terms of both current account reversal and the decline in asset price proxies. In a difference-in-differences specification interacting sudden stop indicators with lagged lender concentration, the coefficient on the interaction term is negative and statistically significant across all concentration measures. A one-standard-deviation increase in LTop3 (7.2 percentage points) results in a 2.6 percentage point reduction in current account-to-GDP reversal during sudden stops, constituting 7.5% of the overall sudden stop increase. Lender concentration also mitigates real effective exchange rate depreciation during sudden stops, consistent with the mechanism operating through the collateral price channel. Results hold when controlling for rollover risk motives.

Model

The model extends a standard small open economy DSGE framework (Bianchi 2011) by introducing one large lender who holds share η of total loans and internalizes the pecuniary externality of collateral foreclosure, alongside atomistic lenders who hold share (1 − η) and take the collateral price as given. When tradable endowment falls short of debt obligations (foreclosure state), lenders optimally choose their foreclosure rate: atomistic lenders set foreclosure rate = 1 (sell all seized collateral), while the large lender sets foreclosure rate < 1 (partial foreclosure to maintain the collateral price). Higher lender concentration (larger η) leads to lower aggregate foreclosure, less collateral sold, a higher nontradable goods price, a higher borrowing capacity, more tradable consumption, and a weaker precautionary saving motive, which generates overborrowing relative to the social planner’s allocation.

Two channels through which concentration affects overborrowing are identified: (1) a debt capacity channel, whereby concentration raises the nontradable price in foreclosure states and thereby increases borrowing capacity; and (2) an amplification channel, whereby concentration steepens the decline in nontradable price per unit fall in tradable consumption, amplifying the pecuniary externality that the social planner internalizes.

Quantitative Results (Calibrated to Argentina)

In the competitive equilibrium, agents encounter foreclosure with probability 2%, and the large lender sells two-thirds of seized collateral. The social planner’s allocation eliminates foreclosure entirely and can be implemented via a state-dependent debt tax; the implied consumption-equivalent welfare gain is 0.78%. The pecuniary externality internalized by lenders is estimated to equal two-thirds of the externality internalized by borrowers. Overborrowing is increasing in lender concentration.

Optimal Lender Structure

When lender countries optimally choose their lender structure, they select further concentration relative to the baseline in order to gain higher foreclosure repayment. Under optimal lender structure, domestic agents consume and borrow more and encounter sudden stops with higher probability, but completely avoid foreclosure events. Borrower welfare improves by 0.1% in consumption-equivalent terms relative to the baseline competitive equilibrium. The paper concludes that managing lender structure benefits both sides of the international credit market, and notes that policies targeting creditor coordination — such as collective action clauses — may be insufficient to fully correct the efficiency implications of lender structure.

Key Implication

Because lender concentration alleviates crisis severity, emerging economies (which are documented to have substantially more concentrated lender structures than advanced economies) face a reduced precautionary saving motive and therefore tend to overborrow more than advanced economies, compounding their vulnerability to sudden stops.

In depth

Q1. What is the paper’s central departure from the Bianchi (2011) sudden stops framework?

The standard Bianchi (2011) model features atomistic lenders who take the collateral price as given, so the pecuniary externality of collateral fire-sales is internalized only by the borrower’s social planner. This paper introduces a large lender who holds a non-trivial share η of total loans and therefore internalizes the price impact of selling foreclosed collateral. This creates a second source of pecuniary externality internalization, on the lender side, that is absent from the canonical framework.

Q2. Why do atomistic lenders sell all seized collateral, while the large lender does not?

Atomistic lenders take the collateral price as given and therefore face no downside from selling their entire share of seized collateral, since they cannot individually affect the price. The large lender, holding share η of total loans, recognizes that selling a large quantity of collateral depresses the nontradable goods price, which reduces the value of any remaining collateral claims. It therefore optimally sets foreclosure rate < 1, retaining some seized collateral to support the equilibrium price.
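The logic can be seen in a deliberately simple toy problem, which is not the paper's model. Assume seized collateral K is sold into a market with linear inverse demand p(Q) = a - b * Q, atomistic lenders sell their whole share, and unsold collateral is worth nothing to the large lender. The demand curve, the parameter values, and the function name are all hypothetical.

```python
def optimal_foreclosure_rate(eta: float, a: float, b: float, K: float) -> float:
    """Revenue-maximizing sale fraction for a lender holding loan share eta.

    Toy setup (an assumption, not the paper's model): total sales are
    Q = (1 - eta) * K + zeta * eta * K and the price is p(Q) = a - b * Q.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    # Revenue: zeta * eta * K * (a - b * ((1 - eta) * K + zeta * eta * K)).
    # First-order condition: a - b * (1 - eta) * K - 2 * b * zeta * eta * K = 0.
    interior = (a - b * (1.0 - eta) * K) / (2.0 * b * eta * K)
    # Revenue is concave in zeta, so clipping the interior solution is optimal.
    return min(1.0, max(0.0, interior))


for eta in (0.05, 0.5, 1.0):
    rate = optimal_foreclosure_rate(eta, a=1.1, b=1.0, K=1.0)
    print(f"eta={eta}: sells fraction {rate:.2f}")
```

With these illustrative numbers a very small lender sells everything, while a larger lender holds some collateral back because its own sales move the price.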

Q3. What are the two channels through which lender concentration amplifies overborrowing, and how do they differ?

The debt capacity channel operates in foreclosure states: higher concentration reduces foreclosure, raises the nontradable price, and increases the collateral value that backs borrowing. This directly expands the borrowing capacity available to agents and weakens their precautionary saving motive. The amplification channel operates through the slope of the nontradable price response: greater concentration steepens the decline in the nontradable price per unit fall in tradable consumption, which amplifies the pecuniary externality that the social planner internalizes. The two channels reinforce each other in driving overborrowing.

Q4. What empirical dataset is used, and what does it measure?

The paper uses FFIEC 009a data, which records the quarterly exposure of individual U.S. banks to the external debts of other countries, covering 2003Q1–2022Q2. From these data, the paper constructs lender concentration measures — including LTop3, the combined share of the top three lenders — at the borrowing-country level for each quarter.
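A top-3 concentration measure of this kind can be sketched as follows. The bank names and exposure amounts are made up for illustration, and the paper's exact construction (for example, how it treats missing or zero exposures) may differ.

```python
def top_n_share(exposures: dict[str, float], n: int = 3) -> float:
    """Share of total exposure held by the n largest lenders.

    exposures maps a lender name to its exposure to one borrowing country
    in one quarter (any common currency unit).
    """
    total = sum(exposures.values())
    if total <= 0:
        raise ValueError("total exposure must be positive")
    largest = sorted(exposures.values(), reverse=True)[:n]
    return sum(largest) / total


# Hypothetical exposures of five banks to one country in one quarter.
example = {"bank_a": 50.0, "bank_b": 30.0, "bank_c": 10.0, "bank_d": 6.0, "bank_e": 4.0}
print(f"LTop3 = {top_n_share(example):.2f}")
```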

Q5. What is the quantitative magnitude of the lender concentration gap between emerging and advanced economies?

The average difference in mean top-3 lender concentration (LTop3) between emerging countries and advanced countries is 0.11 (= 0.93 − 0.82), and this difference is highly statistically significant, with a t-statistic of 13.87. This gap emerged and persisted notably since the Global Financial Crisis.

Q6. How does lender concentration affect sudden stop severity in the empirical specification, and how large is the effect?

The paper estimates a difference-in-differences specification in which current account reversal (and other sudden stop outcome variables) is regressed on a sudden stop indicator, lagged lender concentration, and their interaction, with country and time fixed effects. The coefficient on the interaction term is negative and statistically significant across all concentration measures. A one-standard-deviation increase in LTop3 (7.2 percentage points) reduces current account-to-GDP reversal by 2.6 percentage points, which corresponds to 7.5% of the overall increase in the current account during a sudden stop episode.
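A specification consistent with this description can be written as below. The notation is an assumption made for illustration: y_{c,t} is the outcome (such as current account-to-GDP reversal) for country c in quarter t, SS_{c,t} is the sudden stop indicator, L_{c,t-1} is lagged lender concentration, and α_c and γ_t are country and time fixed effects. Any additional controls used in the paper are omitted.

```latex
y_{c,t} = \alpha_c + \gamma_t
        + \beta_1\, SS_{c,t}
        + \beta_2\, L_{c,t-1}
        + \beta_3 \left( SS_{c,t} \times L_{c,t-1} \right)
        + \varepsilon_{c,t}
```

The coefficient of interest is β_3. In this notation, the reported magnitude corresponds to β_3 multiplied by a one-standard-deviation change in LTop3 (7.2 percentage points), giving roughly −2.6 percentage points.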

Q7. Does higher lender concentration also mitigate exchange rate and asset price pressures during sudden stops?

Yes. Lender concentration is also found to mitigate real effective exchange rate depreciation during sudden stops, which is consistent with the model’s proposed mechanism: higher concentration supports the collateral (nontradable goods) price, which in turn limits the depreciation of the real exchange rate. The paper reports results on asset price proxy declines as well.

Q8. What is the welfare cost of overborrowing under the baseline calibration to Argentina?

The social planner’s allocation, implemented by a state-dependent debt tax, delivers a consumption-equivalent welfare gain of 0.78% relative to the competitive equilibrium. This measures the efficiency cost of overborrowing under the calibrated model in which the large lender sells two-thirds of seized collateral and competitive equilibrium agents encounter foreclosure with probability 2%.

Q9. How large is the lender-side pecuniary externality relative to the borrower-side externality?

Under the baseline calibration, which the paper describes as a “plausible parameterization,” the pecuniary externality internalized by lenders is estimated to be two-thirds of the externality internalized by borrowers. Lender-side internalization is therefore quantitatively substantial relative to the classic borrower-side effect.

Q10. What does the optimal lender structure exercise find, and what does it imply for welfare?

When lender countries are allowed to optimally choose lender structure, they select a more concentrated structure than the baseline in order to maximize foreclosure repayment. Under this optimal structure, domestic (borrowing-country) agents consume and borrow more, face sudden stops with higher probability, but completely avoid foreclosure events. Borrower welfare improves by 0.1% in consumption-equivalent terms relative to the baseline competitive equilibrium. This implies that concentrating lender structure can be mutually beneficial for both sides of the international credit market.

Q11. Why might collective action clauses be insufficient to correct the efficiency implications of lender structure?

Collective action clauses are policies designed to improve creditor coordination in sovereign debt restructuring. The paper argues that the efficiency distortions arising from lender structure go beyond pure coordination failures: because a concentrated lender structure generates welfare-relevant pecuniary externalities through the collateral price channel — affecting overborrowing and crisis severity — addressing creditor coordination alone is insufficient to fully resolve these inefficiencies.

Key Concepts

Lender concentration (LTop3): The combined loan share held by the top three lenders in a borrowing country’s external debt. Measured using FFIEC 009a data. Used as the primary empirical proxy for the degree to which external debt is concentrated in a few large creditors rather than dispersed among many atomistic lenders.

Pecuniary externality (lender-side): The price impact that a large lender imposes on the collateral market when selling foreclosed assets. Unlike in the standard Bianchi (2011) framework where only borrowers (via the social planner) internalize this externality, a sufficiently large lender also internalizes it by restraining collateral sales to support the collateral price.

Foreclosure rate (ζ): The fraction of seized collateral that a lender sells after foreclosure. Atomistic lenders set ζ = 1 (sell everything); the large lender sets ζ < 1 (partial foreclosure) to prevent collateral price depression. The aggregate foreclosure rate is a weighted average across lender types.
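The weighted average can be sketched in a few lines, assuming the weights are the loan shares of the two lender types. The two-thirds sale fraction for the large lender is the value reported for the baseline calibration; the loan share of 0.3 is a hypothetical value chosen only for the example.

```python
def aggregate_foreclosure_rate(eta: float, zeta_large: float, zeta_atomistic: float = 1.0) -> float:
    """Loan-share-weighted average foreclosure rate across lender types.

    eta            : loan share of the large lender
    zeta_large     : fraction of seized collateral the large lender sells
    zeta_atomistic : fraction atomistic lenders sell (1.0 in the summary)
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return eta * zeta_large + (1.0 - eta) * zeta_atomistic


# Hypothetical loan share of 0.3 for the large lender.
print(aggregate_foreclosure_rate(eta=0.3, zeta_large=2.0 / 3.0))
```

Holding the large lender's sale fraction fixed below one, a larger share for that lender mechanically lowers the aggregate rate, which is the direction of the concentration effect described in the model.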

Overborrowing: Borrowing in excess of the social planner’s optimal level, arising because competitive equilibrium agents do not internalize the pecuniary externality of their borrowing on the collateral price. In this model, overborrowing is increasing in lender concentration because a more concentrated lender structure supports a higher collateral price, reducing precautionary saving.

Sudden stop: An abrupt reversal of capital inflows to an emerging economy, typically associated with a sharp current account reversal, real exchange rate depreciation, and a decline in asset prices. In the model, sudden stops are associated with foreclosure states in which tradable endowment falls short of debt obligations.

Debt capacity channel: The mechanism by which higher lender concentration raises the nontradable goods price in foreclosure states, thereby increasing the collateral value and expanding agents’ borrowing capacity, which weakens the precautionary saving motive.

Amplification channel: The mechanism by which higher lender concentration steepens the slope of the nontradable price response to a fall in tradable consumption, amplifying the magnitude of the pecuniary externality that the social planner internalizes and thus increasing the social planner’s incentive to restrict borrowing.

Leveraging Virtual Contact and Social Networks to Foster Interethnic Harmony

This paper investigates whether virtual contact — exposure to an outgroup through a documentary film — can promote interethnic harmony, and whether targeting network-central individuals amplifies effects on untreated community members. The study addresses a context of deep, historically rooted discrimination: the Santal ethnic minority in northwestern Bangladesh have faced colonial-era land dispossession, ongoing violence, labor market discrimination, and structural exclusion by the Bengali ethnic majority. The Santals are the second-largest ethnic-minority group in Bangladesh; in the study villages, their share ranges from 13% to 83% of the population.

The authors conducted a cluster-randomized field experiment across 121 multiethnic villages in the Rajshahi and Naogaon districts of Bangladesh, involving over 3,300 households. Villages were randomly assigned to three arms: a random treatment arm (RR, 40 villages, N=562 Bengalis) in which approximately 14 randomly selected ethnic-majority households per village watched a 45-minute documentary film (“Ami Santal” / “I Am Santal”) portraying Santal culture, economic hardships, and aspirations; a central treatment arm (41 villages) in which approximately 7 randomly selected Bengalis (RC) and 7 network-central Bengalis identified via a diffusion-centrality nomination exercise (CC) watched the same film; and a control arm (40 villages) in which households watched a placebo documentary on flower farming. The documentary, costing approximately $13 per participant, was screened individually at participants’ homes on tablets. Data were collected at baseline (September–October 2022), at a first end line approximately 3 months post-screening (February–March 2023), and at a second end line, a casual-work field experiment, approximately 4.5–5 months post-screening (April–May 2023). Outcomes were measured via lab-in-the-field experiments (dictator game, solidarity game), an experimentally validated interethnic trust survey item (Falk et al. 2018), self-reported behaviors, administrative police complaint data, and facial emotion detection during screening.
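The village-level assignment can be illustrated with a short sketch. It assumes simple complete randomization with the arm sizes reported above; the summary does not say whether the authors stratified or blocked, and the village identifiers and seed here are hypothetical.

```python
import random
from collections import Counter


def assign_villages(village_ids, seed=0):
    """Randomly assign 121 villages to arms of size 40 / 41 / 40."""
    arm_sizes = {"random": 40, "central": 41, "control": 40}
    if len(village_ids) != sum(arm_sizes.values()):
        raise ValueError("expected exactly 121 villages")
    shuffled = list(village_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    assignment = {}
    start = 0
    for arm, size in arm_sizes.items():
        for village in shuffled[start:start + size]:
            assignment[village] = arm
        start += size
    return assignment


assignment = assign_villages([f"village_{i:03d}" for i in range(121)])
print(Counter(assignment.values()))
```

Because assignment happens at the village level, every household in a village shares the same arm, which is what makes the design cluster-randomized.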

The main findings are as follows. First, treated Bengalis in the central arm (RC) gave 14.7% more in the dictator game (p < .01) and exhibited 21.7% greater trust toward Santals (p < .01) compared to controls; RR participants showed a 7.1% increase in solidarity game giving (p < .10) and 11.8% greater trust (p < .01). Effects on reducing negative stereotypes and discriminatory opinions were not statistically significant, suggesting that affective components of prejudice are more responsive to the intervention than cognitive components. About 82% of treated Bengalis reported acquiring new information about Santals, primarily regarding occupational struggles, educational aspirations, and economic potential. Facial expression analysis using emotion-detection software found sadness to be significantly more prevalent among viewers (p < .05), particularly among network-central participants, consistent with an empathetic response.

Second, untreated Bengalis in the central arm — who never watched the documentary — showed 20.9% higher altruism (p < .10), 27.3% higher solidarity (p < .05), and 8.1% higher trust (p < .05) toward Santals relative to controls. No significant effects on untreated Bengalis were found in the random arm. Untreated Santals in both arms exhibited greater trust toward Bengalis (11% increase in random arm, p < .05; 21.7% increase in central arm, p < .01) and higher subjective well-being (p < .01 in both arms). Village-level administrative data show a significant reduction in Bengali police complaints against Santals post-intervention (p < .05), but only in the central arm.

Third, in the casual-work field experiment, multiethnic pairs jointly produced paper bags under piece-rate compensation. Overall productivity increased approximately 5% (p < .05) in the central arm only. Both Bengali and Santal workers increased productivity specifically in the finisher role — the most critical role for determining earnings — in the central arm. The authors interpret Bengali productivity gains as reflecting increased prosociality toward Santal co-workers, and Santal productivity gains as reflecting conformism or peer pressure in response to Bengali effort. The scope of all effects is limited to multiethnic villages in northwestern Bangladesh, a context of historically severe and ongoing majority-minority inequality; the intervention deliberately did not challenge the socioeconomic hierarchy of the villages.

Q: What was the documentary film’s content and design rationale? A: The 45-minute film “Ami Santal” featured three narrative layers: Santal culture (rituals, cuisine, the Baha festival), economic hardships (housing, water access, low incomes, labor market struggles, educational barriers), and aspirational stories of Santals who achieved success. All stories were narrated by non-actor local Santals, filmed outside the study region, and deliberately avoided attributing blame to Bengalis. The film was designed under the supervision of anthropologists at the University of Rajshahi to maintain ethnographic authenticity and a non-moralistic, observational tone (moral judgment language was much lower than in comparison Bangladeshi documentaries and general films, per LIWC-22 analysis).

Q: How were network-central individuals identified and why might targeting them matter? A: In central-arm villages, enumerators surveyed approximately 18–20 randomly selected passers-by at village markets and asked them to nominate the 15 people most effective at disseminating information. The seven most consistently and highly ranked individuals per village were selected as network-central (CC). These individuals were expected to have high diffusion centrality — meaning information they receive spreads widely — so targeting them with the documentary could shift attitudes and behavior among untreated community members through persuasion, visibility, credibility, or diffusion (the paper cannot separately identify which mechanism operates).
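The selection step can be sketched as a simple tally. The scoring rule here is an assumption: each respondent's list counts once per nominee, nominees are ranked by how many respondents named them, and ties are broken alphabetically. The paper's actual ranking rule ("most consistently and highly ranked") may weight nominations differently, and the names are invented.

```python
from collections import Counter


def select_central(nominations, k=7):
    """Return the k most frequently nominated individuals.

    nominations is a list of lists, one list of nominee names per respondent.
    """
    # Count each nominee at most once per respondent.
    counts = Counter(name for ballot in nominations for name in set(ballot))
    # Sort by descending count, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical ballots from three respondents, selecting the top two nominees.
ballots = [["amina", "babul", "chand"], ["babul", "chand"], ["chand", "dipu"]]
print(select_central(ballots, k=2))
```

A rule like this needs only the ballots themselves, which is why the approach requires no pre-existing map of the village network.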

Q: What were the primary behavioral effects on treated Bengalis (the ethnic majority who watched the film)? A: Randomly selected participants in the central arm (RC) gave 14.7% more in the dictator game (p < .01) and 8% more in the solidarity game (not statistically significant), and exhibited 21.7% greater trust toward Santals (p < .01), all relative to controls. In the random arm (RR), participants showed a 6.4% increase in dictator game giving (not statistically significant), a 7.1% increase in solidarity game giving (p < .10), and 11.8% greater trust toward Santals (p < .01). Effects on self-reported behaviors — interethnic friendships, social interactions, amount charged to minorities for water — were not statistically significant.

Q: Did the intervention change Bengali stereotypes or discriminatory opinions toward Santals? A: No. Despite treated Bengalis acquiring substantial new information (approximately 82% reported learning new things, primarily about Santal occupational struggles and educational aspirations), the authors find no significant effects on the stereotypes index or the discriminatory-opinions index among treated Bengalis. They propose two explanations: cognitive components of prejudice (stereotypes) are harder to change through indirect contact than affective components (emotions, prosocial behavior), consistent with Tropp and Pettigrew (2005) and Turner, Crisp, and Lambert (2007); and a single documentary may be insufficient to counter deeply ingrained generational biases due to resistance to change.

Q: What emotional responses did the documentary elicit, and how was this measured? A: Field assistants took candid photographs of participants’ faces at a random point during the screening; these were analyzed using Emotimeter software (machine learning-based emotion detection) that assigns scores across seven emotion categories summing to 100%. Sadness was significantly more prevalent among documentary viewers compared to placebo viewers (p < .05), particularly among network-central participants (CC). The authors interpret this as consistent with an empathetic response to the film’s content about Santal hardships, and connect it to increased prosocial behavior via emotion-regulation mechanisms (alleviating sadness through prosocial action).

Q: What were the spillover effects on untreated Bengalis in the central arm? A: Untreated Bengalis in central-arm villages — who never watched the documentary — showed 20.9% higher altruism (p < .10), 27.3% higher solidarity (p < .05), and 8.1% higher trust toward Santals (p < .05) relative to controls. By contrast, untreated Bengalis in random-arm villages showed no statistically significant effects on any of these outcomes. The authors attribute the central-arm spillovers to the presence of network-central individuals being treated in those villages, though whether these patterns reflect persuasion, visibility, credibility, or information diffusion cannot be separately identified.

Q: How did the intervention affect the Santal ethnic minority (who never watched the documentary)? A: Untreated Santals in both arms exhibited greater trust toward Bengalis: an 11% increase in the random arm (p < .05) and a 21.7% increase in the central arm (p < .01) compared to controls. Santals in both arms also reported higher subjective well-being (p < .01). A weakly significant increase in food security was observed among Santals in the central arm (p < .10), possibly reflecting increased material support from Bengalis. No statistically significant effects were found on Santal altruism or solidarity.

Q: What did the village-level administrative complaint data show? A: Using data collected from two police stations covering all 121 villages, the authors find a significant reduction in Bengali complaints against Santals post-intervention in the central arm (p < .05). No significant reduction was found in Santals’ complaints against Bengalis (p > .10) in any arm. Data from village counselors’ offices (shalish arbitration complaints) showed no significant change in any arm. The distinction matters because police complaints involve more serious, violent matters, while village-counselor complaints involve routine arbitration.

Q: How was the casual-work field experiment designed, and what did it find? A: Approximately 4.5 months after the documentary screenings, 720 participants (360 Bengalis, 360 Santals) drawn equally from the three study arms were paired into multiethnic dyads to jointly produce paper bags for a local supplier under piece-rate compensation, with earnings split equally. One worker was randomly assigned the preparer role and the other the finisher role; roles were switched halfway through the three-hour session. The paper finds an approximately 5% overall productivity increase (p < .05) in the central arm only, concentrated in the finisher role (the role most critical for final output). Bengalis and Santals both increased productivity specifically as finishers in the central arm.

Q: What mechanisms explain the productivity effects in the casual-work experiment? A: For Bengali finishers, the productivity gain is interpreted as prosocial behavior: treated Bengalis who showed greater altruism toward Santals worked harder to increase the earnings of their Santal co-workers. For Santal finishers, the productivity gain is interpreted as conformism or peer pressure: Santals increased effort more when they worked as finisher after swapping roles (i.e., after observing Bengalis’ higher effort as finisher first), suggesting responsiveness to the higher productivity of Bengalis rather than an independent prosocial motivation. The authors present a simple theoretical model to formalize these interpretations, citing Rotemberg (1994) on prosocial effort and Kandel and Lazear (1992) and Mas and Moretti (2009) on peer pressure mechanisms.

Q: Why was virtual rather than direct contact used in this intervention? A: The authors argue that encouraging direct contact between Bengalis and Santals in this setting carries specific risks: the unequal status of the groups may generate anxiety during interactions, potentially limiting engagement or provoking backlash. By contrast, the documentary provides an indirect, low-cost ($13 per participant) form of contact that presents Santal lives without disrupting the socioeconomic hierarchy of the villages and without attributing blame to Bengalis. The film’s entertaining veneer and emotional storytelling make it more scalable and logistically feasible in contexts where direct contact is socially difficult or impractical.

Q: What are the primary limitations acknowledged by the authors? A: The authors acknowledge that the study’s sampling protocol relied on a door-to-door skip procedure without systematic records of approached households, raising the possibility of convenience or snowball-type recruitment and potential deviations from random sampling — this is reflected in some imbalances in baseline characteristics across arms. CC-control comparisons are explicitly descriptive (not causal) because network-central individuals were selected on centrality. Differential attrition was found among untreated Santals (both treatment arms had significantly lower attrition than control, p < .05), which could bias estimates for that subgroup. The authors cannot separately identify the mechanisms (persuasion, visibility, credibility, diffusion) underlying spillover effects in central villages.

Q: What are the policy implications of this study? A: The findings suggest that media-based virtual contact interventions are a low-cost, scalable tool for improving interethnic prosociality even in contexts of deep-rooted discrimination where direct contact may be socially impractical. Targeting network-central individuals — identified via a simple nomination exercise requiring no pre-existing network data — amplifies village-wide effects, including among untreated community members and the minority group itself. The productivity gains in multiethnic work teams imply that improved interethnic relations can have tangible economic consequences beyond attitudinal change. However, the null effects on stereotypes and discriminatory opinions suggest that single documentary interventions may not be sufficient to alter deep-seated cognitive biases, and more intensive or repeated interventions may be needed to achieve durable attitude change.

Key Concepts

Virtual contact: Indirect exposure to an ethnic outgroup through a documentary film, as distinct from direct intergroup contact; posited to influence majority-group attitudes and behavior by increasing empathy and identification with the outgroup without requiring face-to-face interaction.

Diffusion centrality: A network measure of how effectively an individual can spread information through a community, operationalized via a nomination exercise in which community members identify those best positioned to disseminate information; used to select the seven highest-ranked individuals per village for targeted treatment.

Prosociality (altruism and solidarity): Measured using incentivized lab-in-the-field games — the dictator game (unilateral allocation of an endowment to a passive outgroup recipient) and the solidarity game (precommitted transfers to an outgroup member who may incur a random loss) — capturing willingness to benefit non-coethnic others at personal cost.

Affective versus cognitive components of prejudice: A distinction between emotional aspects of prejudice (feelings, empathy) — which the authors find to be more responsive to the documentary intervention — and cognitive aspects (negative stereotypes, discriminatory opinions) — which show no significant change despite new information acquisition.

Spillover effects (untreated individuals): Changes in behavior or attitudes among community members who did not directly receive the intervention (did not watch the documentary), attributed to the influence of treated individuals in their village, particularly network-central individuals in the central arm.

Piece-rate casual-work field experiment: A second end line in which multiethnic pairs of Bengali and Santal workers jointly produced paper bags for a local supplier, with individual earnings determined by joint piece-rate output; designed to measure whether improved interethnic attitudes translated into higher workplace productivity in ethnically mixed teams.

Linking Social and Personal Preferences: Theory and Experiment

This paper asks whether an individual’s attitude toward risk in the personal domain (choices affecting only oneself) can be linked to that same individual’s attitude toward risk in the social domain (choices affecting both oneself and others). The authors provide a theoretical answer in the form of necessary and sufficient conditions, and then test those conditions experimentally.

The formal model posits a decision maker (DM) with a preference relation over lotteries on a set of social states, where a distinguished subset of states are personal (consequences for the DM alone). The authors assume preferences satisfy Completeness, Transitivity, Continuity, and State Monotonicity — the last being equivalent to respect for First-Order Stochastic Dominance (FOSD), a condition weaker than the Expected Utility Independence Axiom and satisfied by virtually all extant decision theories including Weighted Expected Utility, Rank-Dependent Utility, and Prospect Theory. The key theoretical result (Theorem 1) establishes that the full preference relation over all social lotteries can be uniquely deduced from the partial observations of (i) riskless social choices and (ii) risky personal choices if and only if the DM finds every social state indifferent to some personal state. When this condition fails, there exist social lotteries whose ranking cannot be recovered from the partial data.

For two empirically relevant preference types, this condition generates directly testable predictions: for selfish subjects (who allocate nothing to others in deterministic social choices), risky personal preferences must coincide with risky social preferences; for impartial subjects (who treat self and other symmetrically in deterministic social choices), riskless social preferences must coincide with risky social preferences.

The experiment was conducted at the University of Bergen and NHH Norwegian School of Economics with 276 undergraduate subjects. Each subject faced 50 budget-line choice problems in each of three domains: Personal Risk (equiprobable binary lotteries over own payoffs only), Social Choice (deterministic splits between self and an anonymous other), and Social Risk (equiprobable binary lotteries over symmetric payout pairs for self and other). The graphical interface of Choi et al. (2007b) was used throughout. One randomly selected decision per domain was paid out; each token was worth 1.2 NOK (approximately 0.2 USD), with average earnings of approximately 270 NOK.

Within-domain consistency, measured by the Critical Cost Efficiency Index (CCEI), is high: mean CCEIs are 0.959, 0.952, and 0.902 in the Personal Risk, Social Choice, and Social Risk domains respectively. At the CCEI > 0.90 threshold, 89.9%, 85.9%, and 69.9% of subjects pass in the three domains. Using a 0.95 share-to-self threshold, 103 subjects (37.3%) are classified as selfish; using revealed-preference criteria at the 5% significance level, 33 subjects (12.0%) are classified as impartial.

Testing is done via an individual-level nonparametric permutation test that draws 10,000 random data sets per subject and compares simulated CCEI distributions to actual cross-domain CCEIs, with Bonferroni correction. At the 1% significance level, the null that Personal Risk and Social Risk preferences coincide is rejected for only 5.9%–9.3% of selfish subjects (varying by classification threshold), compared with 14.7%–16.3% rejection rates for non-selfish subjects. For impartial subjects at the 1% level, the null that Social Choice and Social Risk preferences coincide is rejected for 0.0%–11.1%, compared with 19.8%–26.8% for non-impartial subjects. The theory’s predictions are thus supported for a large majority of both selfish and impartial subjects.

A theoretical extension (Theorem 2) shows that if one additionally observes comparisons between social states and personal lotteries, unique deduction of the full preference relation requires that preferences in both personal and social domains satisfy Expected Utility (Independence Axiom) and that every social state is indifferent to some personal lottery — a strictly stronger set of conditions.

Q: What is the central theoretical question and why does it matter? A: The paper asks whether preferences over risky social choices (lotteries over outcomes for self and others) can be deduced from observing only riskless social choices and risky personal choices. This matters because people frequently observe or predict the risky social choices of leaders and representatives, but may have access only to those leaders’ personal risk-taking behavior and their expressed social preferences under certainty.

Q: What is the main theoretical result (Theorem 1)? A: Under Completeness, Transitivity, Continuity, and State Monotonicity, the unique extension of the partial preference relation (over social states and personal lotteries) to the full domain of social lotteries exists if and only if every social state is indifferent to some personal state. When this condition is not met, multiple distinct preference relations can extend the partial observations, making deduction impossible.

Q: What is State Monotonicity and how does it relate to standard axioms? A: State Monotonicity requires that if each social state in one lottery dominates the corresponding state in another lottery, then the first lottery is weakly preferred. The paper shows this is equivalent to respect for First-Order Stochastic Dominance (FOSD) given the other axioms, and is strictly weaker than the von Neumann–Morgenstern Independence Axiom. It is satisfied by Weighted Expected Utility, Rank-Dependent Utility, and Prospect Theory, making it a broadly applicable assumption.

Q: What are the testable predictions for selfish subjects? A: Proposition 2 establishes that if a subject’s Social Choice preferences are selfish, meaning that any bundle (x, y) of own payoff x and other’s payoff y is indifferent to (x, 0), so that the amount going to the other person does not affect the ranking, then preferences in the Personal Risk domain must coincide with preferences in the Social Risk domain. In the experiment, selfish subjects are those allocating more than 95% of tokens to themselves in the Social Choice domain (103 of 276 subjects, or 37.3%).

Q: What are the testable predictions for impartial subjects? A: Proposition 3 establishes that if a subject’s Social Choice preferences are symmetric, meaning (x, y) is indifferent to (y, x) for all pairs, then preferences in the Social Choice domain must coincide with preferences in the Social Risk domain. The intuition is that a Social Risk lottery pays one of two symmetric payout pairs, (x, y) or (y, x), with equal probability; a subject who treats self and other identically is indifferent between the two outcomes, so the lottery is ranked the same as either outcome received for sure. In the experiment, 33 subjects (12.0%) are classified as impartial by the revealed-preference criterion at the 5% significance level.

Q: How does the experiment measure within-domain rationality? A: Choices within each domain are evaluated using the Critical Cost Efficiency Index (CCEI, following Afriat 1967), which measures how much a budget constraint must be relaxed to remove all GARP violations. Mean CCEIs are 0.959 (Personal Risk), 0.952 (Social Choice), and 0.902 (Social Risk). At the CCEI > 0.90 threshold, 248 subjects (89.9%), 237 (85.9%), and 193 (69.9%) pass in the three domains respectively, compared to a simulated mean CCEI of only 0.585 for subjects randomizing uniformly.
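One common way to compute the index described above is to bisect on an efficiency level e: the CCEI is the largest e in [0, 1] at which the data pass a GARP test with every expenditure deflated by e. The sketch below is a minimal illustration of that idea, not the paper's code; the function names `satisfies_garp` and `ccei`, the tolerances, and the two-observation example data are all assumptions made for illustration.

```python
import numpy as np

def satisfies_garp(prices, bundles, e=1.0, tol=1e-12):
    """Check GARP at efficiency level e (e = 1 is the standard test)."""
    p = np.asarray(prices, dtype=float)
    x = np.asarray(bundles, dtype=float)
    cost = p @ x.T                      # cost[i, j] = cost of bundle j at prices i
    spent = np.diag(cost)[:, None]      # spent[i] = expenditure at observation i
    weak = e * spent >= cost - tol      # i directly revealed preferred to j
    strict = e * spent > cost + tol     # i strictly directly revealed preferred to j
    for k in range(len(p)):             # Warshall transitive closure of `weak`
        weak = weak | (weak[:, [k]] & weak[[k], :])
    # GARP fails if i is revealed preferred to j while j is strictly
    # directly revealed preferred to i.
    return not np.any(weak & strict.T)

def ccei(prices, bundles, tol=1e-6):
    """Largest e in [0, 1] at which the data satisfy GARP, found by bisection."""
    if satisfies_garp(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0                   # GARP holds at lo and fails at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies_garp(prices, bundles, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical data: two budget lines and two chosen bundles.
prices = [[1, 2], [2, 1]]
inconsistent = [[1, 4], [4, 1]]   # each bundle was affordable when the other was chosen
consistent = [[4, 1], [1, 4]]     # neither bundle was affordable at the other's prices

print(round(ccei(prices, inconsistent), 3))  # 0.667
print(ccei(prices, consistent))              # 1.0
```

In the inconsistent example each choice costs 9 while the other bundle costs 6 at the same prices, so expenditures must be scaled to 6/9 before the violation disappears.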

Q: How does the cross-domain test work and why is it nonparametric? A: The test uses individual-level permutation inference: under the null that preferences in domains I and J are identical, any 50-element subset drawn from the pooled 100 choices should be as consistent with GARP as the actual domain-specific choices. For each subject, 10,000 such random draws are generated, their CCEI scores are computed, and the distribution is compared to the actual cross-domain CCEI with Bonferroni correction. The test makes no functional form assumptions about utility and accommodates the observed within-domain errors without parametric error modeling.
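The resampling logic can be sketched as follows. This is one reading of the procedure summarized above, and the paper's exact test statistic and comparison may differ: here the p-value for each domain is assumed to be the share of random equal-sized draws from the pooled choices that score at least as high as the actual domain-specific choices, and the Bonferroni step divides the significance level by the number of comparisons. The names `permutation_pvalues`, `reject_null`, and the toy `tightness` score are hypothetical; in the paper's setting `score` would be the CCEI.

```python
import random

def permutation_pvalues(domain_a, domain_b, score, n_draws=10_000, seed=0):
    """Share of random pooled draws scoring at least as high as each domain.

    Assumes both domains contain the same number of choices (50 each in the
    experiment), so every draw has the size of one domain.
    """
    rng = random.Random(seed)
    pooled = list(domain_a) + list(domain_b)
    k = len(domain_a)
    draws = [score(rng.sample(pooled, k)) for _ in range(n_draws)]
    pvals = []
    for actual in (score(domain_a), score(domain_b)):
        pvals.append(sum(d >= actual for d in draws) / n_draws)
    return pvals

def reject_null(pvals, alpha=0.01):
    """Bonferroni correction: compare the smallest p-value to alpha / m."""
    return min(pvals) < alpha / len(pvals)

# Toy stand-in for a consistency score: higher means a tighter set of values.
def tightness(values):
    return -(max(values) - min(values))

same = permutation_pvalues([0.0] * 50, [0.0] * 50, tightness, n_draws=1000)
different = permutation_pvalues([0.0] * 50, [10.0] * 50, tightness, n_draws=1000)
print(reject_null(same), reject_null(different))
```

When the two domains behave identically, mixed draws score as well as the domain-specific data and the null is retained; when mixing the domains degrades the score, the domain-specific scores sit in the upper tail and the null is rejected.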

Q: What are the rejection rates for the selfish-subject prediction? A: At the 1% significance level, the null that Personal Risk and Social Risk preferences coincide is rejected for only 5.9%–9.3% of selfish subjects (range across four classification thresholds from 0.99 to 0.90 share-to-self), compared to 14.7%–16.3% for non-selfish subjects. At the 5% level, rejection rates rise to 20.4%–25.6% for selfish and 22.4%–31.8% for non-selfish subjects.

Q: What are the rejection rates for the impartial-subject prediction? A: At the 1% significance level, the null that Social Choice and Social Risk preferences coincide is rejected for 0.0%–11.1% of impartial subjects (range depending on threshold and classification method), compared to 19.8%–26.8% for non-impartial subjects. At the 5% and 10% levels, rejection rates for impartial subjects range from 0.0% to 22.2%.

Q: Does the theory predict how risk aversion should map across domains for non-selfish, non-impartial subjects? A: The theory does not directly produce testable cross-domain predictions for subjects who are neither selfish nor impartial without additional parametric assumptions, because the specific personal-state equivalent of each social state depends on the form of preferences. The paper restricts its nonparametric tests to the two polar cases where the equivalence mapping is determinate from social choice behavior alone.

Q: What is the extended result (Theorem 2) and what stronger conditions does it require? A: When one additionally observes comparisons between social states and personal lotteries (not just within each domain separately), unique deduction of the full preference relation is possible if and only if preferences in both the personal and social domains are consistent with an Expected Utility representation and every social state is indifferent to some personal lottery. This requires the Independence Axiom — a strictly stronger condition than State Monotonicity — highlighting that the main Theorem 1 result exploits the weaker observational structure.

Q: What is the distribution of social preferences in the sample? A: Of 276 subjects, 103 (37.3%) are classified as selfish at the 0.95 share-to-self threshold. Only 6 subjects (2.2%) kept less than 45% of tokens on average, making purely altruistic subjects rare. In the Personal Risk domain, 41 subjects (14.9%) allocated more than 95% to the cheaper account (consistent with risk neutrality), while 9 (3.3%) allocated less than 55% (consistent with infinite risk aversion). In the Social Risk domain, 30 subjects (10.9%) are consistent with utilitarianism in money and 9 (3.3%) with Rawlsianism in money.

Q: How does the Social Risk domain compare to the Personal Risk and Social Choice domains in terms of rationality scores? A: The Social Risk domain shows lower consistency than the other two: mean CCEI is 0.902 versus 0.959 and 0.952, and only 69.9% of subjects exceed the 0.90 threshold versus 89.9% and 85.9%. The CCEI distribution is shifted left for Social Risk, suggesting the novel combined dimension of social and risky choice introduces more decision complexity or error.

Q: What is the relationship to the prior experimental literature on social and risk preferences? A: The Personal Risk domain replicates the symmetric risk experiment of Choi et al. (2007a), and the Social Choice domain replicates the linear two-person dictator experiment of Fisman et al. (2007). The Social Risk domain is new to this paper. The theoretical framework connects to Saito (2013) on social preferences under risk, and to the preference extension literature of Grant et al. (1992) and Nishimura et al. (2017).

State Monotonicity: The axiom requiring that if each social state in one lottery weakly dominates the corresponding social state in another lottery, the first lottery is weakly preferred. The paper proves this is equivalent to respect for First-Order Stochastic Dominance given Completeness, Transitivity, and Continuity, and distinguishes it from the stronger Independence Axiom by noting that Independence compares lotteries over lotteries while State Monotonicity only compares lotteries over states.

Selfish preferences (in the paper’s sense): Preferences in the Social Choice domain such that (x, y), with own payoff x and other’s payoff y, is indifferent to (x, 0) for all bundles, so the subject’s ranking does not depend on what the other person receives. Operationally measured as allocating more than a threshold share (e.g., 95%) of tokens to self across Social Choice decisions.

Impartial preferences (in the paper’s sense): Preferences in the Social Choice domain such that (x, y) is indifferent to (y, x) for all bundles — the subject treats self and other symmetrically. Operationally identified by the revealed preference criterion that choices in the Social Choice domain satisfy GARP and are consistent with symmetric treatment.

Unique extension (deducibility): The property that there exists exactly one complete preference relation over all social lotteries that is consistent with the axioms and agrees with the observed partial relation over social states and personal lotteries. Theorem 1 identifies the necessary and sufficient condition for unique extension under State Monotonicity.

Personal state indifference condition: The condition that for every social state ω in Ω \ P (the social states that are not personal states), there exists some personal state in P to which the DM is indifferent. This is the necessary and sufficient condition in Theorem 1 for deducibility of the full preference relation. Interpreted as: for every proposed social allocation, there exists a “bribe,” a personal allocation with nothing for others, that the DM finds equally desirable.

Critical Cost Efficiency Index (CCEI): A measure of how much budget constraints must be scaled down to eliminate all GARP violations in a dataset of choices from budget lines (following Afriat 1967). A CCEI of 1 indicates perfect rationality; the paper uses 0.90 as a practical threshold. Mean values are 0.959, 0.952, and 0.902 in the Personal Risk, Social Choice, and Social Risk domains respectively.

Nonparametric permutation test: The individual-level test used to assess consistency across choice domains. Under the null that preferences are identical in domains I and J, any random 50-element draw from the pooled 100 choices should achieve CCEI scores no worse than the actual domain scores. The test draws 10,000 permuted datasets per subject and uses the Bonferroni correction for multiple comparisons, making no assumptions about the functional form of utility.

Lives Versus Livelihoods: The Impact of the Great Recession on Mortality and Welfare

Overview

Research Question. Did the Great Recession reduce or increase mortality, and what are the welfare implications of incorporating recession-induced mortality changes into standard macroeconomic welfare frameworks?

Setting and Identification. The authors exploit spatial variation in the severity of the 2007–2009 Great Recession across 741 U.S. Commuting Zones (CZs), following the empirical design of Yagan (2019). The primary shock variable is the percentage-point change in the CZ unemployment rate between 2007 and 2009. The key identifying assumption is that no concurrent shocks to mortality coincide with the timing and geographic pattern of the Great Recession shock. Pre-trend evidence is consistent with this assumption: CZs that were subsequently hit harder experienced a slight relative increase in mortality before 2007, the opposite sign from the main effect, so the estimated decline is not a continuation of a pre-existing trend.

Data. Mortality data come from CDC restricted-use death certificate microdata (2003–2016) covering the universe of U.S. deaths, combined with SEER population denominators. A 20 percent random sample of Medicare enrollees aged 65–99 provides an individual-level panel that directly addresses concerns about endogenous migration. The main outcome is the log age-adjusted CZ mortality rate; economic indicators come from BLS, BEA, and FHFA; air pollution data from the EPA AQS monitor network (PM2.5); morbidity from the BRFSS; nursing home characteristics from federal certification inspections.

Main Mortality Finding. A one-percentage-point increase in the local unemployment rate between 2007 and 2009 is associated with a 0.50 percent decline (SE = 0.15) in the annual age-adjusted mortality rate in 2007–2009, and a 0.58 percent decline (SE = 0.34) in 2010–2016; the two periods are statistically indistinguishable (p = 0.78). Because the national average unemployment rate rose by 4.6 percentage points, the Great Recession on average reduced the annual age-adjusted mortality rate by approximately 2.3 percent, with effects persisting for at least 10 years. The authors note this is equivalent to approximately two years of secular mortality improvement at the pre-recession trend pace of 1.1 percent per year. For a 55-year-old, the estimates imply that 1 in 25 gained an extra year of life from a shock of this magnitude.
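The scaling behind these headline figures is simple enough to check by hand. The snippet below reproduces it from the numbers quoted above; the variable names are illustrative only.

```python
# Figures quoted in the summary above.
effect_per_pp = -0.50          # percent change in mortality per 1 pp rise in unemployment
national_shock_pp = 4.6        # national rise in the unemployment rate, in pp
trend_decline_per_year = 1.1   # pre-recession secular mortality decline, percent per year

# Average effect of the Great Recession on the annual age-adjusted mortality rate.
total_effect = effect_per_pp * national_shock_pp

# How many years of secular mortality improvement that effect corresponds to.
years_equivalent = abs(total_effect) / trend_decline_per_year

print(f"{total_effect:.1f} percent")     # -2.3 percent
print(f"{years_equivalent:.1f} years")   # 2.1 years
```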

Heterogeneity by Cause of Death. Mortality declines appear across most major causes. Cardiovascular disease (34 percent of 2006 deaths) declines by 0.65 percent per percentage-point unemployment increase (SE = 0.21) and accounts for approximately 48 percent of the total estimated mortality reduction. Motor vehicle mortality falls by 1.7 percent (SE = 0.56) and liver disease by 1.1 percent (SE = 0.43). Suicides show a statistically significant 1.7 percent decline (SE = 0.5) in the 2010–2016 period. The notable exception is cancer (the second-largest cause of death), for which the estimated effect is a precise null of 0.02 percent (SE = 0.11). The null cancer result is interpreted as a specification check: if mortality declines were spurious (e.g., driven by population mismeasurement), cancer mortality should also decline.

Heterogeneity by Demographics. Recession-induced mortality declines are similar in percentage terms across gender and race/ethnicity, and statistically indistinguishable in proportional terms across age groups (p-value for equality across 25–64 versus 65+: 0.76). Because mortality is heavily concentrated in the elderly, those aged 65 and over account for approximately 74.3 percent of averted deaths, roughly proportional to their 72.5 percent share of 2006 mortality. The most striking heterogeneity is by education: the entire mortality decline is concentrated among the approximately 52 percent of the population with a high school degree or less. The estimated 2007–2016 effect is −1.3 percent per percentage-point unemployment increase (SE = 0.56) for those with high school or less, compared to +0.34 percent (SE = 0.68) for those with more than high school (statistically distinguishable at p < 0.01).

Mechanisms. The authors distinguish internal effects (own reduced employment or consumption improving health) from external effects (externalities from reduced aggregate economic activity, holding own employment/consumption fixed). Evidence strongly favors external effects as the primary driver. Three-quarters of averted deaths accrue to the elderly, who experienced no direct income effects from the labor market shock. Moreover, the timing pattern—an immediate mortality drop that does not grow over time—is inconsistent with health-behavior channels (e.g., smoking cessation, improved diet) that would build up gradually. Direct tests find no statistically significant impact on self-reported health behaviors (smoking, drinking, exercise) and no impact on healthcare use among Medicare enrollees.

Among external channels, neither reduced spread of infectious disease nor improved nursing home staffing receives empirical support. Reduced air pollution (PM2.5) is identified as a quantitatively important channel. A one-percentage-point increase in CZ unemployment is associated with a 0.16 µg/m³ decline in PM2.5 (SE = 0.04), a 1.3 percent decline relative to the 2006 national average of 12 µg/m³. A mediation analysis (controlling for the PM2.5 shock) attenuates the estimated mortality effect by 37 percent, from −0.52 percent to −0.33 percent per percentage-point unemployment increase. Back-of-the-envelope calculations combining the PM2.5 decline with external estimates of PM2.5-mortality elasticities suggest pollution can explain 17 to 35 percent of total recession-induced mortality declines.
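The two percentages in the pollution paragraph above follow directly from the quoted estimates. A short check, with illustrative variable names:

```python
# PM2.5 decline per 1 pp rise in unemployment, relative to the 2006 national average.
pm25_decline = 0.16      # micrograms per cubic meter
pm25_baseline = 12.0     # micrograms per cubic meter
relative_decline = pm25_decline / pm25_baseline * 100

# Attenuation of the mortality coefficient once the PM2.5 shock is controlled for.
coef_without_pm25 = -0.52   # percent per pp, no PM2.5 control
coef_with_pm25 = -0.33      # percent per pp, with PM2.5 control
attenuation = (coef_without_pm25 - coef_with_pm25) / coef_without_pm25 * 100

print(f"{relative_decline:.1f} percent")  # 1.3 percent
print(f"{attenuation:.0f} percent")       # 37 percent
```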

Lag Structure. Exploiting variation in the speed of post-recession labor market recovery (measured by 2010–2016 EPOP ratio changes) conditional on the initial shock, the authors find that mortality reductions persist in areas that have fully recovered economically by 2016, suggesting lagged mortality effects of the initial economic downturn beyond what contemporaneous economic conditions alone explain.

Welfare Analysis. The authors extend the Krebs (2007) consumption-based welfare cost-of-recessions model to incorporate endogenous mortality. For a 45-year-old with γ = 2 and a value of a statistical life-year (VSLY) of $250k (five times annual consumption), accounting for endogenous mortality reduces the willingness to pay to avoid all future recessions from 2.00 percent of average annual consumption to 0.91 percent—a reduction of approximately 55 percent. Starting around age 55, recessions become welfare-improving on net. For the Great Recession specifically, at age 55 endogenous mortality reduces the welfare cost by approximately 25 percent (from 2.39 to 1.80 percent of average annual consumption). Because mortality declines are concentrated among those with high school or less, accounting for endogenous mortality also substantially mitigates—and at older ages reverses—the finding that the Great Recession was more costly for the less educated.

Scope Conditions and Caveats. (i) The design captures only differential local effects, not nationwide impacts (e.g., stock market collapse, nationwide malaise). (ii) Mortality impacts may not generalize to milder recessions, though the relationship appears approximately linear in shock size. (iii) The analysis excludes morbidity, though limited evidence suggests morbidity is also pro-cyclical and roughly equi-proportional across ages. (iv) The welfare analysis begins at age 35 and does not account for longer-run mortality costs of recession entry for younger cohorts.

In depth

Q1. What is the baseline empirical specification, and why does the design exploit cross-sectional variation rather than time-series panel regressions?

The estimating equation regresses the log age-adjusted CZ mortality rate on an interaction of the CZ-level Great Recession shock (2007–2009 unemployment change) with year indicators, plus CZ and year fixed effects, weighted by 2006 CZ population. The authors prefer this to the standard two-way fixed effects panel approach (area and year FE with contemporaneous unemployment rate) for three reasons: (1) it directly identifies the full dynamic lag structure of the shock rather than imposing contemporaneity; (2) exploiting a single spatially differentiated shock reduces risk of confounding from other concurrent area-level shocks; (3) the panel can be linked to individual-level Medicare data, allowing explicit control for endogenous migration, which the existing literature cannot do.
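Written out, the specification described above takes roughly the following form. This is a reconstruction from the prose, not the paper's own display: the symbols and the omitted reference year t_0 are assumptions (some year's interaction must be dropped because the CZ fixed effects absorb the level of the time-invariant shock).

```latex
y_{ct} \;=\; \sum_{t' \neq t_0} \beta_{t'} \,\bigl(\mathrm{SHOCK}_c \times \mathbf{1}[t = t']\bigr)
\;+\; \alpha_c \;+\; \gamma_t \;+\; \varepsilon_{ct}
```

Here y_{ct} is the log age-adjusted mortality rate in CZ c and year t, SHOCK_c is the 2007–2009 change in the CZ unemployment rate, α_c and γ_t are CZ and year fixed effects, and observations are weighted by 2006 CZ population. Each β_{t'} traces the mortality response in year t' relative to the reference year, which is what lets the design recover the dynamic lag structure.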

Q2. How does the paper address the concern that mortality rate declines might simply reflect unmeasured population outflows from hard-hit areas rather than genuine reductions in deaths?

The authors offer two main responses. First, cancer mortality shows a precise null effect despite being the second-leading cause of death; if unmeasured population losses were driving the results, cancer deaths should decline proportionally. Second, using the Medicare individual-level panel, they fix each enrollee’s location at their 2003 CZ and find a statistically significant mortality decline of 0.35 percent per percentage-point unemployment increase in the reduced-form (2007–2009 period). A control function approach that instruments current-year location with 2003 location yields an estimate of −0.37 percent (SE = 0.17), similar to the baseline −0.50 percent from the aggregate specification, confirming that migration bias is not the primary driver.

Q3. How long do the mortality reductions from the Great Recession persist, and does the paper identify whether these are contemporaneous or lagged effects?

The 2007–2009 period estimate is −0.50 percent per percentage-point unemployment increase and the 2010–2016 period estimate is −0.58 percent, and these are statistically indistinguishable (p = 0.78). To identify whether persistence reflects ongoing economic effects or true lagged mortality effects, the authors compare CZs with above- vs. below-median 2010–2016 EPOP recovery (conditional on initial shock decile). Both groups show similar 2010–2016 mortality declines despite the above-median recovery CZs having returned to pre-recession employment levels by 2016. This finding is consistent with lagged mortality effects of the initial economic downturn that persist independently of current economic conditions.

Q4. Are mortality reductions concentrated among individuals already near death (“harvesting”), or do they represent meaningful longevity gains?

The authors use a Medicare auxiliary model to predict counterfactual remaining life expectancy for each enrollee based on age, demographics, and chronic conditions. The marginal life saved has only about 6 percent lower counterfactual remaining life expectancy than a typical decedent of the same age, and this difference is statistically insignificant. Because effects persist over 10 years (not just days or weeks), short-run mortality displacement (harvesting) is not the operative concern. The 6 percent difference is also small enough that the authors do not adjust their welfare analysis for it.

Q5. What is the educational gradient in mortality impacts, and is it explained by age composition or other confounders?

Mortality declines are entirely concentrated among those with a high school degree or less: the 2007–2016 estimate is −1.3 percent per percentage-point unemployment increase (SE = 0.56) for this group versus +0.34 percent (SE = 0.68) for those with more than high school, distinguishable at p < 0.01. This gradient holds within age groups (confirmed in Appendix analysis), and further disaggregation shows no mortality declines for those with some college or college-or-more separately. In Medicare data, the elderly mortality effect is concentrated among the approximately 12 percent enrolled in Medicaid (a proxy for low income), reinforcing the socioeconomic concentration.

Q6. What evidence rules out improved health behaviors (increased exercise, reduced smoking, reduced alcohol) as the main mechanism?

Two types of evidence argue against this channel. First, three-quarters of averted deaths are among the elderly, who experienced no direct income or employment effects from the local labor market shock and would not plausibly change their health behaviors in response to someone else losing employment. Second, the mortality decline is immediate in 2007 and flat through 2016 rather than growing over time; smoking cessation, for example, takes 10–15 years to accumulate mortality effects. Direct tests of behavioral outcomes from BRFSS find no statistically significant impact on smoking, drinking, exercise, or flu vaccination rates, individually or pooled. The pooled average treatment effect on six morbidity measures is statistically significant and negative (suggesting morbidity improvements), but behavioral covariates show no movement.

Q7. What is the evidence for and against improved nursing home care as a mechanism?

Prior literature (Stevens et al. 2015; Konetzka et al. 2018; Antwi and Bowblis 2018) documents that recessions increase nursing home staffing and reduce nursing home deaths in earlier decades. However, the authors find no evidence for this channel in the Great Recession context. Estimated mortality impacts are virtually identical (approximately 0.5 percent per percentage-point unemployment increase) for the 7 percent of the elderly in nursing home care and the 93 percent not in nursing home care. Direct measures of nursing home staffing (direct-care staff hours per resident-day, highly skilled nurses ratio) show no statistically significant change in harder-hit areas: the point estimate for direct-care hours is −0.11 percent (SE = 0.22) in 2007–2009. Nursing home occupancy rates and resident characteristics also show no significant changes.

Q8. How is the quantitative importance of the air pollution channel estimated, and what are the two complementary approaches used?

Approach 1 (back-of-the-envelope): The authors combine their estimate that a one-percentage-point unemployment increase reduces PM2.5 by 0.16 µg/m³ with external estimates from Deryugina et al. (2019) of PM2.5’s effect on elderly daily mortality, rescaled to annual exposure. This calculation implies pollution explains 17–35 percent of total recession-induced mortality declines, depending on which Deryugina et al. mortality estimates are used. Approach 2 (mediation analysis): Adding the county-level PM2.5 shock as an additional control in the mortality regression attenuates the Great Recession mortality coefficient from −0.52 percent to −0.33 percent per percentage-point unemployment increase—a 37 percent attenuation. Both approaches are suggestive rather than definitive, as the mediation analysis requires the strong assumption that the recession shock and PM2.5 shock are conditionally independent of other unmeasured mediators.

Q9. What are the specific calibration parameters in the welfare model and how does the paper set the mortality decline parameter?

The authors extend Krebs (2007)’s income process calibration (pH = 0.03, pL = 0.05, dH = 0.09, dL = 0.21, g = 0.02, σ = 0.01, πH = 0.5) and use 2007 SSA life tables for age-specific mortality rates in normal times. The recession mortality parameter is set to dm = −0.015 for all ages, derived from a 3.1 percentage-point unemployment increase in a typical recession multiplied by the estimated 0.5 percent mortality decline per percentage-point. VSLY values are parameterized at two, five, or eight times annual consumption ($100k, $250k, or $400k at $50k annual consumption). Risk aversion γ takes values 1.5, 2, and 2.5. For the Great Recession-specific exercise, dmA = −0.023 (4.6 × 0.5 percent), dmHS = −0.037, and dmC = 0.0006.
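The derived calibration values above can be recomputed from their stated inputs. The snippet below does so with illustrative variable names; note that the product for a typical recession comes out at −0.0155, which agrees with the reported dm = −0.015 only up to rounding, and the source does not say how the rounding was done.

```python
annual_consumption = 50_000          # dollars per year
effect_per_pp = 0.005                # 0.5 percent mortality decline per 1 pp unemployment rise

# VSLY at two, five, and eight times annual consumption.
vsly = {multiple: multiple * annual_consumption for multiple in (2, 5, 8)}

# Recession mortality parameters: shock size times the per-pp effect.
dm_typical = -3.1 * effect_per_pp            # typical recession, 3.1 pp shock
dm_great_recession = -4.6 * effect_per_pp    # Great Recession, 4.6 pp shock

print(vsly)                              # {2: 100000, 5: 250000, 8: 400000}
print(round(dm_typical, 4))              # -0.0155
print(round(dm_great_recession, 3))      # -0.023
```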

Q10. How does accounting for endogenous mortality change the distributional welfare analysis of the Great Recession by education group?

Under exogenous mortality, the welfare cost of the Great Recession at age 35 is 2.89 percent of average annual consumption for those with high school or less versus 1.23 percent for those with more than high school, so the less educated bear more than twice the burden. Under endogenous mortality, the mortality declines are concentrated entirely among the less educated (dmHS = −0.037 vs. dmC ≈ 0), so accounting for mortality disproportionately offsets welfare losses for that group. By around age 65, the welfare costs of the Great Recession converge across education groups, and after age 65, the less educated bear lower welfare costs than the more educated, reversing the exogenous-mortality ranking. This result depends on the same education differential in mortality impacts that drives the main empirical finding.

Q11. What robustness checks demonstrate that the baseline mortality estimates are not driven by geographic or functional-form choices?

The baseline CZ-level estimate of −0.50 percent (SE = 0.15) is closely matched at the county level (−0.49, SE = 0.10) and is somewhat larger in magnitude but less precise at the state level (−0.62, SE = 0.25). A Poisson regression yields −0.45 percent (SE = 0.14). Dropping the top/bottom decile of CZs by shock size yields −0.46 percent (SE = 0.16). Adding Census-division-by-year fixed effects attenuates the estimate slightly to −0.38 percent (SE = 0.14) but retains statistical significance. Dropping CZs with high fracking activity and dropping the ten most populous CZs both produce estimates similar to baseline. Quartile regressions show monotone mortality reductions across quartiles of the unemployment shock, consistent with approximate linearity.

Q12. What does the expert survey reveal about prior beliefs, and how does the paper’s finding compare?

In a spring 2023 survey of over 300 experts, 50 percent predicted the Great Recession would increase mortality and only 27 percent predicted a decrease. Of those predicting a decrease, 93 percent gave a magnitude larger (in absolute value) than the paper’s negative point estimate of 0.50 percent per percentage-point unemployment increase, and 82 percent gave a prediction larger than the upper bound of the 95 percent confidence interval. This illustrates that the paper’s finding—mortality is meaningfully pro-cyclical during the Great Recession—was highly surprising to the empirical and policy economics community.

Key Concepts

Pro-cyclical mortality: The phenomenon whereby mortality rates fall during economic downturns and rise during expansions. The paper documents this for the Great Recession using a spatial identification strategy, in contrast to the time-series correlation that had weakened in the two decades before the Great Recession. The term “pro-cyclical” means mortality moves in the same direction as the business cycle (up in booms, down in recessions), implying recessions are associated with fewer deaths.

Internal vs. external effects (of recessions on mortality): The paper distinguishes internal effects—whereby an individual’s own reduced employment or consumption affects her own mortality—from external effects, which are changes in mortality from reduced aggregate economic activity that hold constant one’s own employment and consumption. This distinction has direct welfare implications: external effects (e.g., less pollution from lower industrial output) are genuine welfare improvements for people who did not lose income, while internal effects of behavioral change are mitigated by the envelope theorem if behavior is privately optimal.

Commuting Zone (CZ) shock: The paper’s primary treatment variable, defined as the percentage-point change in the CZ unemployment rate between 2007 and 2009. CZs are aggregations of counties (741 total) designed to approximate local labor markets. The median CZ experienced a 4.6-percentage-point increase, with substantial variation ranging from roughly 2.9 points (bottom quartile) to 6.7 points (top quartile).

Value of a Statistical Life-Year (VSLY): The dollar value placed on one additional year of life in expectation, used in the welfare calibration. In the paper’s framework it equals VSLY = b·c^γ − c/(γ − 1), where b is a preference parameter governing the marginal utility of life-years. Results are reported for VSLYs of $100k, $250k, and $400k corresponding to two, five, and eight times average annual consumption of $50k, following Hall and Jones (2007).
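
To illustrate how a VSLY target pins down the preference parameter b, the sketch below inverts the expression above. It assumes the formula is read as b·c^γ − c/(γ − 1), with c as annual consumption and γ as risk aversion, which is what a utility function of the form b + c^(1−γ)/(1−γ) would imply. The function names are illustrative, and the paper's actual calibration procedure may differ.

```python
def vsly(b: float, c: float, gamma: float) -> float:
    """VSLY = b * c**gamma - c / (gamma - 1); requires gamma != 1."""
    return b * c**gamma - c / (gamma - 1)


def solve_b(target_vsly: float, c: float, gamma: float) -> float:
    """Invert the VSLY expression for b given a target VSLY."""
    return (target_vsly + c / (gamma - 1)) / c**gamma


c, gamma = 50_000.0, 2.0  # consumption and one of the risk-aversion values in the text
for target in (100_000.0, 250_000.0, 400_000.0):
    b = solve_b(target, c, gamma)
    print(f"VSLY target {target:,.0f}: b = {b:.3e}, check = {vsly(b, c, gamma):,.0f}")
```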

Endogenous mortality in welfare analysis: The paper’s central theoretical contribution is augmenting the Krebs (2007) welfare cost-of-recessions framework to allow mortality to vary with the aggregate state of the economy. When mortality is endogenously lower in recessions, the willingness to pay to eliminate recession risk falls—and at high enough VSLY or old enough ages, recessions become welfare-improving because the mortality benefit outweighs the consumption cost.

Mortality displacement (harvesting): The possibility that short-run mortality declines merely reflect the premature death of already-frail individuals being slightly delayed, without meaningful longevity gains. The paper argues this is not the operative concern given 10-year persistence and uses auxiliary Medicare models to show marginal lives saved have only 6 percent shorter counterfactual life expectancy than average decedents of the same age.

PM2.5 mediation analysis: An empirical approach in which the county-level change in fine particulate matter (PM2.5, in µg/m³) between 2006 and 2010 is added as a covariate in the mortality regression. Under the assumption that the recession shock and the PM2.5 shock are conditionally independent of other unmeasured mediators, the attenuation in the recession-mortality coefficient when controlling for PM2.5 identifies the share of the mortality effect operating through the pollution channel. A 37 percent attenuation is found in the 2007–2009 period.

Marginal Returns to Public Universities

This paper asks whether enrolling in an American public university generates positive net returns for marginal students — those who barely qualify for admission — and whether those returns justify public expenditures. The question is policy-relevant because marginal students have weak academic preparation, face high dropout risk, and the net returns to expanding admission margins are theoretically ambiguous.

The author assembles administrative records spanning all 35 public universities in Texas, covering the universe of Texas public high school graduates from 2004–2014 (approximately 2.7 million students). Texas public universities collectively enroll over 10 percent of all American public university students. The data link high school records (test scores, demographics, coursework, attendance, disciplinary infractions) to college application and admission records, postsecondary enrollment and degree completion records, financial aid packages, institutional expenditure data from IPEDS, and quarterly earnings records from the Texas Workforce Commission unemployment insurance system.

The identification strategy exploits hundreds of decentralized SAT/ACT score cutoffs in university admissions — varying across schools and application years — that generate sharp discontinuities in admission probability. A fuzzy regression discontinuity design compares applicants just above versus just below each cutoff. On average, crossing a cutoff raises the probability of admission by 27 percentage points and the probability of enrolling at the target university by 15 percentage points. Density tests and pre-college covariate balance validate the smoothness assumptions. The typical cutoff complier is more disadvantaged than the average college applicant but comparable to the average Texas high school graduate.

Roughly half of cutoff compliers would fall back to another, typically less selective, four-year institution if rejected; 43 percent would fall back to a two-year community college; and only about 6 percent would forgo higher education entirely. The pooled estimates therefore blend intensive-margin effects (more selective versus less selective four-year college) with extensive-margin effects (four-year college versus community college or no college).

Main causal findings for enrollment compliers: the typical marginally admitted student completes approximately one additional year of credits in the four-year sector and becomes 12 percentage points more likely to ever earn a bachelor’s degree from any institution. About half of the additional four-year credits are offset by 15 fewer credits in the two-year sector, and associate degree or certificate completion falls by 7 percentage points. All bachelor’s degree gains are in non-STEM fields; STEM degree completion shows no detectable increase. Compliers become about 3 percentage points more likely to hold a graduate degree by 10 years out.

On earnings, admitted compliers earn less than rejected counterparts in the first five years due to continued enrollment. Year six is the crossover point; by years 8–12, compliers earn a stable 8.6 percent earnings premium in log terms (8.2 percent in dollar ratio terms, representing a LATE of $3,339 against an untreated complier mean of $40,829), with earnings ranks rising approximately 4 percentiles from a base near the 50th percentile.

Marginally admitted students pay no additional net tuition on average: $4,600 in additional gross tuition is nearly fully offset by grant aid, though they take on $5,300 more in student loans. Society incurs approximately $10,000 in additional educational expenditures per complier. Internal rates of return are 26 percent for students, 16 percent for society, and 7 percent for the government budget. At a 3 percent discount rate, the lifetime net present value of enrolling the typical marginal applicant is approximately $80,000 — $70,000 accruing to the student and $10,000 to taxpayers.

Earnings gains are similar across institutions of varying selectivity, but significantly smaller for low-income compliers, who spend more time enrolled, complete fewer degrees, and major in less lucrative fields. A bounding method shows that extensive-margin compliers (those who would otherwise not attend any four-year college) experience larger effects than intensive-margin compliers.

Q: What is the core research question and why is credible evidence scarce? A: The paper asks whether enrolling marginal students in American public universities generates positive net returns — private, social, and fiscal — and what drives heterogeneity in those returns. Credible evidence is scarce because most existing work is correlational and fails to account for selection bias: individuals with more college education may have had pre-existing advantages, confounding college’s causal effect with systematic sorting into it. Even if average returns are positive, the policy-relevant question is whether the marginal student — who has weak preparation and high dropout risk — represents a good investment.

Q: What is the regression discontinuity design, and what does the first stage look like? A: The author infers hundreds of decentralized SAT/ACT score cutoffs across approximately 700 application cells (combinations of university, year, GPA quartile, and test type) by searching for the score value with the largest discontinuity in admission and enrollment within each cell. This procedure delivers a superconsistent estimator of each cell’s true cutoff. Pooled across all cells, crossing a cutoff raises the probability of admission by 27 percentage points and the probability of enrollment at the target university by a precisely estimated 15 percentage points. The density of applicants and a rich set of pre-college characteristics run smoothly through the cutoffs, supporting the smoothness (no-manipulation) assumptions that the RD design requires.
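
The cutoff search can be pictured with a deliberately simplified Python sketch. It scans candidate scores within one application cell and keeps the one with the largest difference in mean admission rates above versus below. The difference-in-means criterion, the `min_side` guard, and the toy data are all assumptions made for illustration; the author's actual estimator is not reproduced here.

```python
from statistics import mean


def find_cutoff(scores, admitted, min_side=3):
    """Return (cutoff, jump) for the candidate score with the largest
    above-minus-below gap in mean admission rate within one cell."""
    best_cutoff, best_jump = None, float("-inf")
    for candidate in sorted(set(scores)):
        below = [a for s, a in zip(scores, admitted) if s < candidate]
        above = [a for s, a in zip(scores, admitted) if s >= candidate]
        # Skip candidates with too few applicants on either side.
        if len(below) < min_side or len(above) < min_side:
            continue
        jump = mean(above) - mean(below)
        if jump > best_jump:
            best_cutoff, best_jump = candidate, jump
    return best_cutoff, best_jump


# Toy cell (hypothetical data): admission becomes much more likely at 1000.
scores = [900, 920, 940, 960, 980, 1000, 1020, 1040, 1060, 1080]
admitted = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
cutoff, jump = find_cutoff(scores, admitted)
print(cutoff, round(jump, 2))  # 1000 0.6
```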

Q: Who are the cutoff compliers, and are they representative of any broader population? A: Compliers — applicants who enroll in the target university if and only if they barely cross its cutoff — comprise approximately 15 percent of marginal applicants. In observable characteristics, compliers are roughly representative of the broader population of marginal applicants at the cutoff. They are significantly more disadvantaged than the average public university applicant, but broadly comparable to the average Texas public high school graduate in terms of academic preparation and family income.

Q: What are the next-best alternatives for marginal applicants who are rejected? A: Approximately 47 percent of compliers would fall back to another Texas four-year college (mostly public), 43 percent to a two-year community college, and approximately 9 percent would not enroll in any Texas institution. National Student Clearinghouse data for the 2008–2014 cohorts confirm that only 4 percent of untreated compliers attend a college outside the THECB universe, meaning approximately 6 percent of all compliers truly forgo higher education altogether if rejected. The empirically relevant extensive margin is therefore between the four-year sector and the two-year sector, not between college and no college.

Q: How does cutoff crossing change the institutional characteristics a complier experiences? A: Compliers are propelled into substantially better-resourced environments: the average math test score of college peers rises by half a standard deviation; peers are 12 percentage points less likely to have been low-income; gross tuition rises by $2,400 (a 42 percent increase over the untreated complier mean of $5,700); educational spending per student rises by $3,200 (43 percent over the untreated mean); peers’ 10-year BA completion rate rises by 28 percentage points; and peer mean earnings 8–12 years after college entry are $6,700 higher.

Q: What are the educational attainment effects? A: Cutoff crossing causes compliers to complete approximately 28 additional credits at any four-year institution (roughly one full year of a four-year program) and increases the probability of ever earning a bachelor’s degree by 12 percentage points, raising the completion rate from approximately 40 percent to just above 50 percent. These four-year gains are partly offset by about 15 fewer credits in the two-year sector, and associate degree or certificate completion falls by 7 percentage points. All bachelor’s degree gains are in non-STEM fields; there is no detectable increase in STEM degrees. Graduate degree completion rises by approximately 3 percentage points by 10 years out.

Q: What is the earnings trajectory, and when does the premium materialize? A: Admitted compliers earn less than rejected counterparts in the first five years after application because they remain enrolled longer. Year six is the crossover point. By years 8–12, the earnings premium stabilizes at approximately 8.6 percent in log terms and 8.2 percent in dollar ratio terms (a LATE of $3,339 against an untreated complier mean of $40,829). Earnings rank rises by approximately 4 percentiles from a base near the 50th percentile. These results are robust across sandwich earnings, all-quarters-with-earnings, and zero-imputed specifications.

Q: What does the cost-benefit analysis show? A: Marginally admitted students pay no additional net tuition on average: $4,600 in additional gross tuition is nearly fully offset by additional grant aid. They do borrow $5,300 more in student loans, likely financing higher room, board, and consumption costs at four-year colleges. From society’s perspective, compliers generate approximately $10,000 in additional educational expenditures. Cumulative undiscounted earnings benefits surpass costs after 8 years for students, 11 years for society, and 19 years for taxpayers. At a 3 percent discount rate, the lifetime net present value is approximately $80,000 total — $70,000 accruing to the student and $10,000 to taxpayers — with internal rates of return of 26 percent for students, 16 percent for society, and 7 percent for the government budget.
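
For readers unfamiliar with how an internal rate of return is backed out of a stream of costs and benefits, here is a minimal Python sketch using bisection. The cash-flow stream is HYPOTHETICAL (five years of net costs followed by twenty years of a constant premium) and is not the paper's data, so the resulting rate should not be compared with the reported 26, 16, and 7 percent figures.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value of cash_flows[t] in year t (year 0 is undiscounted)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], lo: float = 0.0, hi: float = 1.0) -> float:
    """Bisection search for the discount rate at which NPV equals zero.

    Assumes NPV changes sign exactly once between lo and hi.
    """
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


# HYPOTHETICAL stream: net costs while enrolled, then a constant annual gain.
flows = [-5_000.0] * 5 + [3_339.0] * 20
rate = irr(flows)
print(f"IRR of the hypothetical stream: {rate:.1%}")
print(f"NPV at 3 percent: {npv(0.03, flows):,.0f}")
```

Bisection is used here only because it is short and dependency-free; any root finder would do.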

Q: Does selectivity of the admitting institution predict larger earnings returns? A: No. Compliers at more selective institutions experience substantially larger increases in peer quality than those at less selective institutions, but they are also less likely to be on the extensive margin of four-year enrollment and experience smaller BA attainment gains. These factors roughly offset, producing no systematic difference in earnings gains across institutions of varying selectivity. More selective institutions also impose no additional cumulative cost on society, while compliers actually pay slightly less in additional net tuition at more selective schools.

Q: How does the commonly used measure of college value-added (mean peer earnings) compare to actual complier returns? A: Mean peer earnings overpredicts actual value-added for marginal students by a factor of two: compliers attend an institution with $6,700 higher average peer earnings as a result of admission but gain only $3,300 themselves. The measure also overpredicts the earnings return to selectivity by a factor of three: a 100-SAT-point increase in target school selectivity predicts $3,000 higher peer earnings but only a statistically insignificant $900 higher gain in the complier’s own earnings.
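
The two overprediction factors follow directly from the dollar figures quoted above, as this small Python check shows. Variable names are illustrative.

```python
# Dollar figures quoted in the text.
peer_earnings_gap = 6_700      # higher mean peer earnings at the target school
own_earnings_gain = 3_300      # complier's own earnings gain
peer_gap_per_100_sat = 3_000   # predicted peer-earnings gain per 100 SAT points
own_gain_per_100_sat = 900     # estimated own gain per 100 SAT points (insignificant)

level_factor = peer_earnings_gap / own_earnings_gain
selectivity_factor = peer_gap_per_100_sat / own_gain_per_100_sat
print(f"Overprediction of value-added: {level_factor:.1f}x")
print(f"Overprediction of the return to selectivity: {selectivity_factor:.1f}x")
```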

Q: How do earnings returns differ by family income? A: Compliers from low-income families experience significantly smaller earnings gains compared to higher-income compliers. The gap is not explained by differential changes in college quality induced by admission. Instead, low-income compliers gain fewer degrees despite spending more time in college and major in less lucrative fields, consistent with related findings in the literature on family income gaps in degree completion and major choice.

Q: How do earnings returns differ by gender and by race? A: Female and male compliers eventually experience similar gains in log earnings and earnings rank, but women reach their gains more quickly, likely because men take longer to finish college. White and Asian compliers experience earnings gains and BA completion improvements similar to those of Black and Hispanic compliers, even though admission raises college selectivity and spending per student by more for white and Asian students.

Q: What is the method for separating intensive- and extensive-margin effects? A: The two complier types are not directly distinguishable in the data. The author first uses an endogenous but strong stratification variable — having at least one other Texas public university admission offer — to identify some mean potential outcomes for each type. He then imposes an empirically-informed rank assumption to bound the remaining unknown mean potential outcomes, delivering tightly informative upper and lower bounds on each margin’s effects without requiring full nonparametric identification. The results show that pooled effects are driven by larger returns for extensive-margin compliers who would not have attended any four-year college, with smaller contributions from intensive-margin compliers shifting between four-year institutions.

Q: How do this paper’s earnings estimates compare to prior studies, and what explains the differences? A: This paper’s 8 percent earnings gain is smaller than the 17–26 percent reported in prior studies (Zimmerman 2014: 22%; Kozakowski 2023: 26%; Smith, Goodman, and Hurwitz 2025: 17%; Bleemer 2024: 21%; Hoekstra 2009: 20%). The differences are likely explained by the much larger educational attainment and institutional quality gains induced by those studies’ natural experiments: in Zimmerman (2014), enrollment compliers gain roughly three additional years of four-year education versus one year in this paper; in Bleemer (2024), compliers experience roughly $30,000 more in institutional spending per student versus approximately $3,000 in this paper.

Q: What are the scope conditions for these results? A: The results pertain to marginal applicants to Texas public universities (excluding UT-Austin, which uses holistic admission with no detectable SAT/ACT cutoffs) from the 2004–2014 high school graduation cohorts. The identified effects are local average treatment effects for compliers — applicants who would enroll in the target university if and only if they barely crossed its admission cutoff — and do not represent effects for always-takers or infra-marginal students. Earnings are measured only for Texas-based workers covered by the state unemployment insurance system, which captures an estimated 90 percent of the civilian labor force.

Cutoff complier: An applicant who enrolls in their target university if and only if their SAT/ACT score barely exceeds that university’s admission cutoff. Compliers are the population whose behavior — and thus whose treatment effects — are identified by the fuzzy RD design. They comprise approximately 15 percent of marginal applicants and are more disadvantaged than the average public university applicant but broadly comparable to the average high school graduate.

Extensive versus intensive margin: The extensive margin refers to the contrast between attending any four-year college versus falling back to a two-year community college or no college. The intensive margin refers to the contrast between attending a more selective versus a less selective four-year institution. Approximately half of cutoff compliers are on each margin; the paper treats the effects on the two margins as economically distinct parameters requiring separate identification.

Fuzzy regression discontinuity (RD) design: An identification strategy that uses the discontinuous jump in admission probability at a test score cutoff as an instrument for enrollment, recovering the LATE for compliers via the ratio of the reduced-form discontinuity in outcomes to the first-stage discontinuity in enrollment. “Fuzzy” refers to the fact that crossing the cutoff shifts admission and enrollment probabilities by a discrete jump of less than one, rather than switching treatment from zero to one with certainty.
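
The ratio in this definition can be sketched in a few lines of Python. The first stage (0.15), the earnings LATE ($3,339), and the untreated complier mean ($40,829) are the figures quoted elsewhere in this summary; the reduced-form jump is back-solved from them for illustration and is an assumption, not a number reported in the summary.

```python
def wald_late(reduced_form: float, first_stage: float) -> float:
    """Fuzzy RD (Wald) estimate: outcome jump divided by enrollment jump."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return reduced_form / first_stage


FIRST_STAGE = 0.15         # enrollment jump at the cutoff (from the text)
LATE_EARNINGS = 3_339.0    # complier earnings effect in dollars (from the text)
UNTREATED_MEAN = 40_829.0  # untreated complier mean earnings (from the text)

# Implied, not reported: the reduced-form jump consistent with the numbers above.
implied_reduced_form = LATE_EARNINGS * FIRST_STAGE

late = wald_late(implied_reduced_form, FIRST_STAGE)
ratio = LATE_EARNINGS / UNTREATED_MEAN
print(f"Implied reduced-form jump: ${implied_reduced_form:,.0f}")
print(f"LATE: ${late:,.0f}, or {ratio:.1%} of the untreated complier mean")
```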

Internal rate of return (IRR): The discount rate at which the net present value of an investment equals zero — here, the discount rate equating the discounted stream of earnings benefits to the discounted stream of costs. The paper estimates IRRs separately for students (26 percent), society (16 percent), and the government budget (7 percent), reflecting different cost and benefit definitions from each perspective.

Rank assumption (bounding method): An empirically-informed assumption about the ordering of mean potential outcomes across latent complier types (extensive vs. intensive margin) that, combined with partial identification from a strong endogenous stratification variable, yields tight upper and lower bounds on each margin’s causal effects without requiring full nonparametric identification.

Net tuition: Gross tuition charges minus grant aid. For the typical marginal complier, gross tuition rises by $4,600 but is nearly fully offset by additional grant aid, yielding approximately zero additional net tuition cost — meaning the private financial cost of attending a public university for marginal students is effectively zero on net, though they take on $5,300 more in student loans to finance room, board, and consumption.

Sandwich earnings measure: A procedure applied to quarterly state earnings data that retains only quarters with positive earnings sandwiched between other quarters with positive earnings, discarding high-variance transition quarters between employment spells. Annualized by multiplying the quarterly average by four; used to reduce noise from entry and exit transitions in administrative earnings records.
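
A minimal Python sketch of the sandwich measure follows. It assumes that "sandwiched" means the immediately preceding and following quarters both have positive earnings, and that the first and last observed quarters are dropped because one neighbor is unobserved; the paper's exact rules may differ. The function name and the example series are hypothetical.

```python
from typing import Optional


def sandwich_annual_earnings(quarterly: list[float]) -> Optional[float]:
    """Annualized mean of positive-earnings quarters whose immediate
    neighbors also have positive earnings. Returns None if none qualify."""
    kept = [
        quarterly[i]
        for i in range(1, len(quarterly) - 1)
        if quarterly[i - 1] > 0 and quarterly[i] > 0 and quarterly[i + 1] > 0
    ]
    if not kept:
        return None
    return 4 * sum(kept) / len(kept)


# Hypothetical spell: partial quarters at entry (2,000) and exit (3,000) are
# discarded, leaving the three full quarters in between.
example = [0, 2_000, 9_000, 10_000, 11_000, 3_000, 0, 0]
print(sandwich_annual_earnings(example))  # 40000.0
```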

Micro MPCs and Macro Counterfactuals: The Case of the 2008 Rebates

Layer 1: Overview

Research question. Do the high marginal propensities to consume (MPCs) estimated in the leading household studies of the 2008 U.S. tax rebates—particularly Parker et al. (2013), which found MPCs of 50–90 percent within three months—imply plausible macroeconomic counterfactuals? And if not, what combination of micro-level bias corrections and general equilibrium forces reconciles the micro evidence with aggregate data?

Setting. The 2008 Economic Stimulus Act distributed approximately $100 billion in tax rebates, totaling eleven percent of January 2008 monthly disposable income. Among the 85 percent of households receiving a check, the average amount was $1,000. Rebates were distributed primarily from April through July 2008, with nearly half delivered in May alone. The timing of receipt was determined by the last two digits of Social Security numbers, providing quasi-random variation exploited by the household-level literature.

Methodology. The paper proceeds in two halves. In the first, the authors construct macro counterfactuals by calibrating a standard medium-scale two-good, two-agent New Keynesian (TANK) model with the micro MPCs from the literature and simulating what aggregate consumption would have been absent the rebate. The model contains life-cycle permanent income households and hand-to-mouth households whose dynamic spending propensities are calibrated directly to match the household-level estimates. General equilibrium effects—including Keynesian income multipliers, real interest rate movements, and changes in the relative price of durable goods—are incorporated. Counterfactual consumption paths are constructed by subtracting model-simulated deviations from steady state from actual NIPA consumption data.

In the second half, the authors revisit both the micro estimates and the macro model. On the micro side, they identify three upward biases in standard two-way fixed effects (TWFE) estimates applied to CEX data: (1) omitted variable bias from excluding the lagged rebate indicator; (2) “forbidden comparisons” bias arising from comparing cohorts with heterogeneous treatment effects, following Borusyak et al. (2022) and Sun and Abraham (2020); and (3) a rebate reporting bias in which households are systematically more likely to report receiving the rebate in the month that coincides with large expenditure increases, causing spurious positive correlation between reported receipt and contemporaneous spending. On the macro side, the baseline model is modified to incorporate an upward-sloping supply curve for durable goods (calibrated to a supply elasticity of 5, midway between House and Shapiro (2008) and Goolsbee (1998)), replacing the baseline assumption of frictionless conversion between nondurable and durable intermediates.

Main findings with quantitative magnitudes.

Implausibility of baseline counterfactuals. When calibrated to Parker et al.’s (2013) micro MPC of 0.9, the baseline model implies that real PCE absent the rebate would have collapsed by 6.0 percent from April through July 2008—a decline exceeded historically only by the Covid-19 lockdowns. Even the more modest micro MPC of 0.5 implies a 2.7 percent three-month PCE decline, comparable only to the 1980 Volcker disinflation with credit controls. For motor vehicle expenditures, the counterfactual drops range from 38 percent (micro MPC = 0.3) to 67 percent (micro MPC = 0.9)—larger than any historical experience, including the 30 percent Covid decline. Contemporaneous professional forecasters (Federal Reserve Greenbooks, Survey of Professional Forecasters, Goldman Sachs) predicted at most small consumption declines in summer 2008. Even the authors’ own pessimistic forecast model—incorporating actual oil price paths and a Lehman Brothers bankruptcy dummy—implies that the cumulative difference between actual and forecast consumption attributable to the rebate was at most $20 billion out of $100 billion in rebates, for an implied GE-MPC of at most 0.2.

Bias correction in micro MPC estimates. Applying all three bias corrections to CEX data (the preferred specification with lagged rebate indicator, cohort-level treatment effects, and lagged expenditure controls), the estimated three-month MPC falls from 0.50 to 0.28 in the full sample and from 0.82 to 0.34 in the rebate-recipients-only sample, with both rounding to approximately 0.3. The Borusyak-Jaravel-Spiess (BJS) imputation method yields an MPC of 0.20 in the full sample and 0.37 in the rebate-only sample, consistent with the OLS corrections.

Composition of spending. In the preferred corrected specification, essentially all of the total expenditure MPC of 0.3 is accounted for by motor vehicle spending: the MPC on motor vehicles is 0.30 in the full sample and 0.26 in the rebate-only sample, while the MPC on all other expenditures is −0.02 (full sample) and 0.08 (rebate-only sample).
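
As a consistency check, the component MPCs above should add up to the total corrected MPCs of 0.28 and 0.34 reported in the bias-correction paragraph. The short Python sketch below confirms this using only the figures quoted in the text.

```python
# Three-month MPCs from the preferred corrected specification, as quoted above.
mpc_components = {
    "full sample": {"motor vehicles": 0.30, "all other": -0.02},
    "rebate-only sample": {"motor vehicles": 0.26, "all other": 0.08},
}

# Total expenditure MPC is the sum of the two components.
totals = {sample: sum(parts.values()) for sample, parts in mpc_components.items()}
for sample, total in totals.items():
    print(f"{sample}: total MPC = {total:.2f}")
```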

General equilibrium dampening via inelastic durable supply. In the model with a calibrated durable supply elasticity of 5, rebate-induced demand for motor vehicles raises the relative vehicle price by approximately 1.1 percent in July 2008. This price increase crowds out durable expenditure by optimizing households through intertemporal substitution. At the preferred micro MPC of 0.3, the general equilibrium MPC (GE-MPC) for total PCE is only 0.07, well below the 0.3 micro estimate. At a micro MPC of 0.5, the GE-MPC is 0.22. The combination of the bias-corrected micro MPC and dampening general equilibrium forces implies a general equilibrium consumption multiplier below 0.2 for the 2008 rebates.

Importance of durable goods composition for HANK models. A model that abstracts from durable goods and calibrates the full expenditure micro MPC to nondurable spending predicts a GE-MPC of 0.36 when the micro MPC is 0.30—five times larger than the 0.07 implied by the model with durable goods. This contrast illustrates that the distribution of spending across nondurable and durable goods is a key determinant of the aggregate fiscal multiplier, in addition to heterogeneity in wealth and income emphasized by the existing HANK literature.
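
One way to see the dampening versus amplification contrast is to express each GE-MPC as a share of the micro MPC that generates it. The Python sketch below does this for the three (micro MPC, GE-MPC) pairs quoted above; the "pass-through" label is an illustrative term, not the authors'.

```python
# (micro MPC, GE-MPC) pairs quoted in the text.
with_durable_supply_curve = [(0.30, 0.07), (0.50, 0.22)]
nondurables_only_model = (0.30, 0.36)


def pass_through(micro_mpc: float, ge_mpc: float) -> float:
    """GE-MPC as a share of the micro MPC: below 1 is dampening, above 1 is amplification."""
    return ge_mpc / micro_mpc


for micro, ge in with_durable_supply_curve:
    print(f"durables model, micro MPC {micro:.2f}: pass-through {pass_through(micro, ge):.2f}")
print(f"nondurables-only model: pass-through {pass_through(*nondurables_only_model):.2f}")

# Ratio behind the "five times larger" statement.
model_ratio = nondurables_only_model[1] / with_durable_supply_curve[0][1]
print(f"GE-MPC ratio across models: {model_ratio:.1f}")
```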

In depth

Q1. What is the central empirical puzzle the paper addresses?

A. The leading household studies of the 2008 rebates estimate very high three-month MPCs (50–90 percent). When these estimates are plugged into a standard New Keynesian model to construct counterfactual consumption paths absent the rebate, the model implies that PCE would have collapsed by 2.7–6.0 percent from April through July 2008 and then sharply recovered just as Lehman Brothers failed in September. No contemporaneous forecaster or narrative evidence suggests such extreme, short-lived macroeconomic stress was present. The Lehman collapse itself caused only a 1.1 percent three-month PCE decline—smaller than all three counterfactual declines implied by micro MPCs of 0.3, 0.5, or 0.9.

Q2. What are the features of the TANK model used to construct the counterfactuals?

A. The model is a two-good (nondurable and durable), two-agent (optimizing life-cycle and hand-to-mouth) New Keynesian model calibrated at monthly frequency, building on Ramey (2021) and Galí et al. (2007). Intermediate goods can, in the baseline, be frictionlessly converted into either nondurable or durable goods (implying a fixed relative price of one). Durable goods (interpreted as motor vehicles) enter household utility, with optimizing households facing a Calvo-type adjustment friction motivated by Evans and Ramey (1992) calculation costs. The fraction of hand-to-mouth consumers and their dynamic propensities to spend are calibrated directly to match the micro MPC estimates from the household literature. The model incorporates a Calvo-style price-adjustment structure for nondurables, sticky wages set by unions, capital with adjustment costs and variable utilization, and an inertial monetary policy rule.

Q3. How does the model translate micro MPCs into macro counterfactuals, and why does it amplify rather than dampen the micro estimates in the baseline?

A. The model’s GE-MPC equals the micro MPC’s direct demand effect plus Keynesian income multiplier effects. Because the rebate is highly transitory, there is little movement in the real interest rate (the Phillips curve is flat and monetary policy is inertial), so the dominant general equilibrium force is the income multiplier. This amplifies, rather than dampens, the micro MPCs. As a result, the GE counterfactuals exhibit even sharper V-shapes than the pure micro counterfactuals.

Q4. What narrative and forecast evidence do the authors use to argue the baseline counterfactuals are implausible?

A. Contemporary forecasts from the Federal Reserve Greenbooks, the Survey of Professional Forecasters, and Goldman Sachs all predicted at most small consumption declines in summer 2008—Goldman Sachs forecast only −0.125 percent (not annualized) per quarter in Q2–Q3 2008. The authors also construct their own “pessimistic” time-series forecast that incorporates actual oil price paths (which rose from $98 to $140 per barrel by July 2008) and a Lehman Brothers bankruptcy dummy; even this forecast lies above all three model counterfactuals in summer 2008 and displays no V-shape. Furthermore, the cumulative difference between actual PCE and the pessimistic forecast over April–October 2008 totals only $20 billion—implying a GE-MPC of at most 0.2 even if the entire gap were attributed to the rebate.

Q5. What is the first bias in standard TWFE estimates of the MPC, and how large is its effect?

A. The first bias is omitted variable bias from excluding the lagged rebate indicator. In a first-differenced panel regression, lagged treatment enters the error term, and its effect on the change in spending is negative because spending falls back after the rebate month. Because a household treated in the current period is less likely to have been treated in the previous one, current and lagged treatment are negatively correlated. An omitted regressor with a negative coefficient that is negatively correlated with the included regressor biases the estimate upward, so omitting the lag inflates the OLS estimate of the contemporaneous effect. Including a lagged rebate indicator reduces the contemporaneous spending response by roughly $40 in the full CEX sample (from $470 to $434) and by $237 in the rebate-only sample (from $764 to $527).
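
The direction of this bias can be checked with the textbook omitted-variable formula. The sketch below is illustrative only: the three-row panel and the effect sizes of +300 and -300 are hypothetical and are not taken from the paper. It uses exact fractions so the short-regression slope can be compared with the formula without rounding error.

```python
from fractions import Fraction

# Hypothetical first-differenced panel: one row for the rebate month, one for
# the month after, and one untreated month.
# Columns: (current rebate indicator, lagged rebate indicator)
rows = [(1, 0), (0, 1), (0, 0)]

BETA_CURRENT = 300   # assumed true effect on the change in spending
BETA_LAG = -300      # assumed reversal in the month after the rebate

def mean(xs):
    return Fraction(sum(xs), len(xs))

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

d = [r[0] for r in rows]
d_lag = [r[1] for r in rows]
# Change in spending generated by the assumed true model (no noise).
dc = [BETA_CURRENT * a + BETA_LAG * b for a, b in rows]

# Short regression: slope of the spending change on current treatment only.
short_slope = cov(dc, d) / cov(d, d)

# Omitted-variable formula: bias = coefficient on the omitted lag times the
# slope from regressing the lag on current treatment.
delta = cov(d_lag, d) / cov(d, d)
bias = BETA_LAG * delta

print(short_slope, BETA_CURRENT + bias)
```

In this toy panel the short regression overstates the assumed true effect because a negative lag coefficient is multiplied by a negative correlation between current and lagged treatment.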

Q6. What is the “forbidden comparisons” bias and how is it corrected?

A. When treatment effects are heterogeneous across cohorts (e.g., the June rebate cohort has a larger MPC than the September cohort), standard homogeneous TWFE estimates use later-treated cohorts as control groups for earlier-treated cohorts even after accounting for average mean-reversion. Because the mean-reversion of the earlier (larger-effect) cohort is larger than that of the later cohort, this comparison is contaminated, inflating the estimate. The authors correct for this by allowing cohort-specific treatment effects, following Sun and Abraham (2020). This reduces the contemporaneous effect by a further $90 in the full sample; in the rebate-only sample the correction raises the estimate slightly (by $70) because later treatment effects are larger in that sample.

Q7. What is the rebate reporting bias and what mechanism underlies it?

A. The rebate reporting bias arises because households in the CEX are systematically more likely to report receiving the rebate in the interview month that coincides with high expenditure. Although the true timing of rebate checks is determined by Social Security number last-digits (and is thus random), the reported timing may reflect recall issues: households more readily remember and report receiving the rebate when it was accompanied by a large purchase. The empirical signature is a statistically significant negative effect of future rebate receipt on current expenditure (−$863 in the full sample, −$575 in the rebate-only sample at the 10% level), indicating that rebate reporters had unusually low spending in the period prior to reporting receipt. Controlling for lagged expenditure and income decile fixed effects corrects for this bias, reducing the three-month MPC in the full sample from 0.37 to 0.28.

Q8. What are the authors’ preferred bias-corrected MPC estimates, and how do they compare across specifications and estimators?

A. After correcting for all three biases (preferred specification, column 4 of Table 3), the implied three-month MPC is 0.28 in the full sample and 0.34 in the rebate-only sample, both approximately 0.3. The Borusyak-Jaravel-Spiess imputation method, which imposes weaker assumptions and overcomes the first two biases by construction, yields an MPC of 0.20 (full sample) and 0.37 (rebate-only sample), with an average consistent with the OLS-corrected estimates. Both methods point to an MPC around 0.3, substantially below the 0.5–0.9 range from the baseline Parker et al. (2013) approach.

Q9. How is the total expenditure MPC split between motor vehicles and other spending?

A. After bias correction, the MPC on motor vehicles is 0.30 in the full sample and 0.26 in the rebate-only sample. The MPC on all other PCE is −0.02 (full sample) and 0.08 (rebate-only sample), neither statistically significant. This concentration in durables is consistent with Adams et al. (2009) and Aaronson et al. (2012), and is corroborated by CEX vehicle-expenditure data showing a car-purchase response concentrated in the three months surrounding receipt of the rebate.
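
As a consistency check, the two components quoted above should add up to the preferred three-month MPCs of 0.28 and 0.34. A short sketch using only the figures in this answer:

```python
# Bias-corrected three-month MPCs by spending category (from the text).
mpc_components = {
    "full sample": {"motor vehicles": 0.30, "other PCE": -0.02},
    "rebate-only sample": {"motor vehicles": 0.26, "other PCE": 0.08},
}

# Total MPC is the sum of the category MPCs; round to two decimals to
# match the precision at which the estimates are reported.
total_mpc = {
    sample: round(sum(parts.values()), 2)
    for sample, parts in mpc_components.items()
}

for sample, total in total_mpc.items():
    vehicles = mpc_components[sample]["motor vehicles"]
    print(f"{sample}: total {total:.2f}, of which vehicles {vehicles:.2f}")
```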

Q10. How does introducing an upward-sloping supply curve for durable goods change the model’s general equilibrium predictions?

A. In the modified model, durable goods producers face a production externality (or fixed factor) that makes the short-run supply of motor vehicles upward-sloping, with supply elasticity calibrated to 5. When rebate recipients increase demand for motor vehicles, the relative price of motor vehicles rises by approximately 1.1 percent in July 2008 (consistent with the observed 1.5 percent spike in the BLS new vehicle price index relative to core CPI around the rebate distribution). This price increase induces optimizing households to intertemporally substitute away from durable goods. Because durable demand is highly price-elastic (long-run elasticity of −1 to −15 depending on the study), even a modest relative price increase generates substantial crowding out of durable expenditure by non-recipients.
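
To give a sense of magnitudes, the supply elasticity can be applied to the quoted price change. This is a rough local approximation (percentage change in quantity equals elasticity times percentage change in price), not the model's full dynamic solution:

```python
SUPPLY_ELASTICITY = 5.0        # calibrated short-run supply elasticity (from the text)
relative_price_rise_pct = 1.1  # model-implied rise in the relative vehicle price

# Constant-elasticity approximation: %dQ ~ elasticity * %dP.
quantity_supplied_rise_pct = round(SUPPLY_ELASTICITY * relative_price_rise_pct, 2)

print(f"Implied rise in vehicle output: about {quantity_supplied_rise_pct}%")
```

Under this approximation, any increase in recipients' vehicle demand beyond that amount has to be absorbed by lower purchases elsewhere, which is the crowding out the answer describes.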

Q11. What are the GE-MPC estimates in the modified model with less elastic durable supply, and how do they decompose?

A. At the preferred micro MPC of 0.3, the GE-MPC for total PCE is 0.07—general equilibrium forces dampen the micro effect. At micro MPC of 0.5, GE-MPC is 0.22 (modest dampening). At micro MPC of 0.9, the GE-MPC rises to 1.42 (amplification). Decomposing by good type at micro MPC of 0.3: the GE-MPC on motor vehicles is 0.09 and the GE-MPC on nondurables is −0.03. The dampening is concentrated almost entirely in durable expenditure.
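
One way to read these numbers is as the ratio of the GE-MPC to the micro MPC, where a ratio below one indicates dampening. A small sketch using the three pairs quoted above:

```python
# Micro MPC -> GE-MPC for total PCE in the modified model (from the text).
ge_mpc_by_micro_mpc = {0.3: 0.07, 0.5: 0.22, 0.9: 1.42}

# Ratio below 1 means general equilibrium dampens the micro effect;
# above 1 means it amplifies it.
ratio = {m: round(g / m, 2) for m, g in ge_mpc_by_micro_mpc.items()}
label = {m: "dampening" if r < 1 else "amplification" for m, r in ratio.items()}

for m in ge_mpc_by_micro_mpc:
    print(f"micro MPC {m}: GE/micro = {ratio[m]} ({label[m]})")
```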

Q12. How sensitive are the GE-MPC results to the calibration of durable demand elasticity?

A. The baseline calibration uses a long-run vehicle demand elasticity of −15, based on household-level evidence from Bachmann et al. (2021). When the authors instead use the lower-bound estimate of −6.4 from Baker et al. (2019), the GE-MPC at micro MPC of 0.3 rises from 0.07 to 0.12. Even at this lower demand elasticity there is substantial crowding out in general equilibrium, so the qualitative conclusion is robust.

Q13. Why does a nondurables-only model with the same overall MPC substantially overstate the fiscal multiplier?

A. When abstracting from durable goods and calibrating a nondurable MPC of 0.30 (to match the overall expenditure MPC), the model predicts a GE-MPC of 0.36—five times larger than the 0.07 from the two-good model. This occurs because nondurable demand is far less price-elastic than durable demand, and the nearly-flat Phillips curve makes nondurable supply very elastic, so there is no relative-price-driven crowding out channel. The comparison illustrates that the distribution of spending across nondurable and durable goods is a quantitatively important determinant of the fiscal multiplier, independent of the level of the MPC.

Q14. What evidence is provided that the control group in the household regressions is itself affected by the rebate in general equilibrium?

A. Figure 9 in the paper plots motor vehicle spending per household by rebate-receipt status using CEX data. When rebate recipients begin reporting receipt in June 2008, motor vehicle expenditure in the rebate group rises while simultaneously falling in the never-rebate group. This pattern is consistent with the model’s prediction that the rebate-induced rise in relative motor vehicle prices crowds out purchases by non-recipient households. This general equilibrium spillover means the difference-in-differences micro MPC estimate remains valid as a micro estimate (the symmetric crowding out does not affect the treated-versus-control difference), but the aggregate GE-MPC is less than the micro MPC.

Q15. How do the authors verify that their preferred corrected specification recovers true MPCs?

A. In Appendix C.6 the authors simulate household-level data from the modified Section 5 model and apply both the original Parker et al. (2013) specification (Equation 1) and their preferred corrected specification (Equation 5). The Parker et al. specification produces upward-biased MPC estimates in the simulated data, consistent with Kaplan and Violante’s (2014) theoretical argument. The preferred corrected specification recovers the true MPCs from the model, validating the correction methodology.

Key Concepts

GE-MPC (General Equilibrium Marginal Propensity to Consume). The paper’s term for the aggregate increase in total consumer spending per dollar of tax rebate, incorporating both the direct micro-level demand effect of the rebate on hand-to-mouth households’ consumption and the induced macroeconomic income effects from Keynesian multipliers and relative price changes. Distinct from the micro MPC, which captures only the household-level spending response before any general equilibrium feedbacks.

Micro MPC. The causal effect of receiving a temporary lump-sum transfer on a household’s own consumer expenditure, expressed as a fraction of the transfer amount, estimated from household panel data via difference-in-differences event studies. In the paper’s usage, this is a partial equilibrium concept that excludes any impact of the policy on prices, wages, or other households’ incomes.

Forbidden comparisons bias. A form of bias in two-way fixed effects event study estimates that arises when treatment effects are heterogeneous across cohorts and later-treated units are used as control groups for earlier-treated units whose outcomes are still reverting after treatment. Named and formalized in Borusyak and Jaravel (2017) and Borusyak et al. (2022); in this paper it manifests because cohorts receiving rebates in June have systematically larger spending responses than those receiving in September, so using September recipients as a “clean” control for June reversal yields contaminated estimates.

Rebate reporting bias. A bias specific to the CEX survey data in which the timing of a household’s self-reported rebate receipt is correlated with unusually high contemporaneous expenditure (and correspondingly low prior-period expenditure), likely due to recall effects. Because the true rebate timing is random but the reported timing is not, this correlation inflates the difference-in-differences estimate of the spending effect.

Two-good, two-agent New Keynesian (TANK) model. A medium-scale New Keynesian model containing two types of households (optimizing life-cycle consumers and hand-to-mouth consumers who exhaust current income) and two goods (nondurables and durable goods interpreted as motor vehicles). The model is used in this paper as a framework to translate micro MPC estimates into aggregate general equilibrium counterfactuals, calibrated at monthly frequency.

Durable supply elasticity. The elasticity of real durable goods production with respect to the relative price of durable goods, calibrated in the paper to 5. In the baseline model, this elasticity is infinite (the relative price is fixed at one because intermediates convert frictionlessly). With a finite supply elasticity of 5, rebate-induced durable demand causes the relative vehicle price to rise, generating crowding out of optimizing households’ durable expenditure.

Calvo durable adjustment friction. An adjustment friction imposed on optimizing households’ durable goods purchases, motivated by Evans and Ramey’s (1992) calculation cost model. Each period, a randomly selected fraction 1−θd of households reoptimize their durable stock, producing a Calvo-type reduced form. This friction limits both the extensive and intensive margins of durable adjustment and prevents unrealistically large intertemporal substitution of durable purchases in response to price changes.

Macro counterfactual. In this paper’s usage, the simulated path of aggregate consumption that would have occurred in the absence of the 2008 tax rebate, constructed by subtracting the model-implied impulse response to the rebate from the actual observed NIPA consumption series. Plausibility of the counterfactual is assessed by comparison to contemporaneous forecasts and to historical episodes of large consumption declines.
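
The construction amounts to an element-by-element subtraction. In the sketch below both series are hypothetical index values chosen for illustration; they are not NIPA data or model output:

```python
# HYPOTHETICAL monthly series (index points), for illustration only.
actual_consumption = [100.0, 100.0, 100.0, 100.0]
rebate_impulse_response = [0.0, 0.8, 0.4, 0.0]  # model-implied effect of the rebate

# Counterfactual = what consumption would have been without the rebate.
counterfactual = [
    round(actual - effect, 1)
    for actual, effect in zip(actual_consumption, rebate_impulse_response)
]

print(counterfactual)
```

With a flat actual series and a short-lived positive rebate effect, the counterfactual dips and then recovers, which is the V-shape whose plausibility the paper tests against forecasts.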

Minimum Wages, Efficiency, and Welfare

Overview

Research question. Can minimum wages improve welfare through efficiency — by correcting monopsony-driven under-employment — and, if so, by how much? What is the optimal minimum wage, and how much of the welfare gain from a higher minimum wage comes from efficiency versus redistribution?

Model and methodology. The paper develops a tractable general equilibrium oligopsony model with heterogeneous households (four types: non-high-school, high-school, and college workers, plus capital owners) and heterogeneous firms (varying in total factor productivity), embedded in a continuum of local labor markets where firms compete strategically in Cournot fashion. Firms face upward-sloping labor supply curves; their market power generates wages below the marginal revenue product of labor (markdowns). The model is calibrated to US data using the Census Longitudinal Business Database (LBD, 2014), the Bureau of Labor Statistics Current Population Survey (CPS, 2019), and the Survey of Consumer Finances (SCF). Key calibration targets include: average firm size of 22.83 workers (LBD), 29 percent of workers earning below $15/hr (CPS), labor and capital income shares, and household-level earnings and capital income ratios. The model is validated by quantitatively replicating four strands of empirical evidence: (i) reallocation effects of the German minimum wage introduction (Dustmann et al., 2021); (ii) employer spillover responses to Amazon’s voluntary $15 minimum wage (Derenoncourt et al., 2021); (iii) wage distribution compression evidence from Brazil (Engbom and Moser, 2021); and (iv) heterogeneous employment effects by market concentration (Azar et al., 2019).

Three channels for efficiency gains. The model identifies three mechanisms through which a minimum wage can improve efficiency under oligopsony: (1) a direct effect in which constrained firms with monopsony markdowns increase wages and expand employment toward the competitive level (Region II firms); (2) a spillover effect in which unconstrained competitor firms narrow their own markdowns in response to constrained firms’ increased wages and market shares; (3) a reallocation effect in which employment is shifted away from low-productivity firms (which enter Region III — constrained on labor demand) toward more productive firms.

Main findings on efficiency versus redistribution. Under the $15.12/hr minimum wage that maximizes social welfare under utilitarian weights (population-share weights), less than 5 percent of the welfare gains come from improved efficiency, while more than 95 percent come from redistribution. When the government is additionally given access to budget-neutral lump-sum transfers that fully address redistribution goals, the efficiency-maximizing minimum wage narrows to a range of approximately $7.50–$10.00 per hour, which is robust across social welfare weight specifications. The welfare gains attributable to efficiency alone are approximately 0.16–0.20 percent in consumption-equivalent terms, representing only about 1–2 percent of the welfare gains achievable in an economy with no labor market power at all (which would be 15.26 percent in consumption-equivalent terms under the same conditions with optimal transfers).

Why efficiency gains are small. Three structural reasons limit efficiency gains: (i) low-productivity firms — which are the firms most affected by a binding minimum wage in Region II — have endogenously narrow markdowns even absent a minimum wage, because they face more elastic labor supply and command small market shares; (ii) the calibrated production function has relatively flat marginal revenue product of labor schedules (decreasing returns parameter α = 0.940), so once firms enter Region III, employment rationing occurs rapidly; (iii) the large, high-productivity firms with the widest markdowns are not materially affected by the minimum wages of their small, low-wage competitors because those competitors have small market shares — making spillovers quantitatively negligible even though the model matches empirical cross-employer wage elasticities.

Optimal minimum wages under alternative frameworks. Without transfers and under utilitarian weights, the optimal minimum wage is $15.12. Without transfers but under Negishi weights (which rationalize the observed competitive equilibrium and load approximately 62 percent of weight on college workers and owners versus their 35 percent population share), the optimum is $6.97. Under a 97 percent weight on high-school graduates, the optimum rises to $18.32. With optimal lump-sum transfers, the optimum collapses to $7.76–$10.11 regardless of social welfare weights. This range is robust across Frisch elasticity variants (ϕ ∈ {0.30, 0.62, 0.86}), regional decompositions (low, medium, and high income US states), short-run capital-fixed scenarios (where the optimum declines by approximately $1 under utilitarian weights), and the removal of household heterogeneity entirely (which yields an optimum of $7.74).

Distributional proxies versus welfare. Wage inequality (college–non-college log wage premium, cross-sectional variance of log wages) and the labor income share are monotonically improving as the minimum wage rises, even as welfare is hump-shaped and eventually declining. A rise in the minimum wage from $7.50 to $15 reduces the college–non-college log wage premium from 0.53 to 0.43 (roughly one-fifth), reduces the cross-sectional variance of log wages by nearly half, and raises the aggregate labor income share by approximately 3 percentage points — all while welfare (under utilitarian weights with no transfers) reaches its maximum at $15.12 and then declines. These standard proxies therefore do not reliably indicate welfare.
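
The "roughly one-fifth" figure follows directly from the two premium values quoted above, as this short check shows:

```python
# College vs non-college log wage premium at two minimum wage levels (from the text).
premium_at_7_50 = 0.53
premium_at_15 = 0.43

absolute_fall = round(premium_at_7_50 - premium_at_15, 2)
relative_fall = absolute_fall / premium_at_7_50

print(f"Premium falls by {absolute_fall} log points, or {relative_fall:.1%}")
```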

Scope conditions. All results are long-run steady-state comparisons unless otherwise noted. Results assume no price passthrough and a unit elasticity of substitution between capital and labor. The paper abstracts from capital–labor substitution responses and occupational choice. The redistribution channel quantified here is specific to the utilitarian welfare criterion and to the existing distribution of capital and profit income, in which owners (6 percent of households) earn 92 percent of dividends.

In depth

Q1. What are the three regions of firm behavior in response to a binding minimum wage, and what are their efficiency implications?

A: A firm can be in one of three regions. In Region I the minimum wage is not binding: the firm pays its optimal monopsony wage and employment is inefficiently below the competitive level. In Region II the minimum wage binds and exceeds the firm’s optimal monopsony wage, but labor supply at the minimum wage still falls short of labor demand: employment and efficiency improve as the shadow markdown narrows. In Region III the minimum wage exceeds the competitive wage, so unconstrained labor supply would exceed demand: the firm rations employment and the rationing constraint binds, reducing efficiency. At the boundary of Region II and Region III, the shadow markdown equals one and the firm is at its efficient employment level. Only a firm-specific minimum wage targeting each firm’s competitive wage could deliver economy-wide efficiency.
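
The three regions can be summarized as a simple classification rule. The sketch below is a simplification: it treats a firm's monopsony wage and competitive wage as fixed numbers, whereas in the model both respond to competitors' behavior. The function name and the example wages are hypothetical.

```python
def classify_region(minimum_wage, monopsony_wage, competitive_wage):
    """Return "I", "II" or "III" for a single firm.

    monopsony_wage: wage the firm would pay with no minimum wage.
    competitive_wage: wage at which the firm's labor supply meets its demand.
    """
    if not monopsony_wage < competitive_wage:
        raise ValueError("monopsony wage must lie below the competitive wage")
    if minimum_wage <= monopsony_wage:
        return "I"    # not binding: firm keeps its monopsony wage
    if minimum_wage <= competitive_wage:
        return "II"   # binding, employment expands along labor supply
    return "III"      # above the competitive wage: employment is rationed


# Hypothetical firm with a $9 monopsony wage and a $12 competitive wage.
regions = [classify_region(w, 9.0, 12.0) for w in (7.0, 10.0, 13.0)]
print(regions)
```

The boundary case where the minimum wage equals the competitive wage is assigned to Region II here; it is the efficient point described in the answer.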

Q2. How does the paper define and use “shadow wages” to characterize equilibrium?

A: The shadow wage for a firm is the effective wage that rationalizes equilibrium employment given rationing constraints. Formally, when a firm rations employment (Region III), households act as if facing a shadow wage equal to the actual minimum wage multiplied by a rationing factor p < 1 (the Lagrange multiplier on the rationing constraint, normalized as a fraction). Shadow wages aggregate across firms into market- and type-level shadow wages via CES aggregation. The key insight is that shadow wages, not observed wages, are allocative: aggregate labor supply for each worker type is determined by the type-level shadow wage, not by the minimum wage that firms actually pay. This allows the paper to express aggregate efficiency via two wedges — the aggregate shadow markdown (capturing average market power) and a misallocation term — without tracking all firm-specific constraints individually.

Q3. What are the two aggregate efficiency wedges and how do they behave as the minimum wage rises?

A: The two wedges are: (i) the aggregate shadow markdown µ̃, which is a productivity-weighted average of firm-level shadow markdowns and measures the extent to which aggregate wages fall short of marginal revenue products; and (ii) the misallocation term ω, which measures whether employment is allocated toward more productive firms and equals one when all shadow markdowns are identical. As the minimum wage rises from zero, µ̃ initially narrows (improving efficiency) because firms in Region II expand toward their competitive employment level and constrained firms’ market shares rise, tightening the residual labor supply of unconstrained competitors and narrowing their markdowns. But as the minimum wage rises further, Region III rationing causes shadow markdowns to widen rapidly — first for low-productivity firms and then progressively for more productive ones — so µ̃ turns back downward. The misallocation term ω first improves as low-productivity firms are pushed out, but then worsens because rationing at intermediate-productivity firms redirects employment from high- to medium-productivity firms.

Q4. What does the model validation exercise on the German minimum wage (DLSUB 2021) show?

A: The paper calibrates the model to the German context by setting a minimum wage of $8.95/hr equivalent to 48 percent of the pre-reform median wage — matching Germany’s 8.50 euro introduction in 2015, where 15 percent of workers earned below the threshold. The model produces employment effects that are slightly positive (consistent with empirical findings of no disemployment), average wage increases consistent with both constrained and unconstrained firms raising wages, a negative elasticity of the number of operating firms with respect to minimum wage exposure (correctly signed, moderately smaller than data), and a positive elasticity of average firm size with respect to exposure (slightly larger than the data). The reallocation direction — small unproductive firms shrinking and workers moving to larger, more productive firms — matches the data qualitatively and within the range of data estimates across specifications.

Q5. What does the Amazon spillover replication (DNWT 2021) show, and what does it imply about the minimum wage spillover channel?

A: Derenoncourt et al. (2021) estimate a cross-employer wage elasticity of 0.26: when Amazon raised wages by approximately 18.1 percent, competitors raised wages by 4.7 percent on average. The model replicates this by treating Amazon as the largest (or second-largest) firm in each market, exogenously narrowing its markdown by a fraction ζ calibrated to deliver an 18.1 percent wage increase. Competitors in the model raise wages through the strategic interaction mechanism: Amazon’s higher wage and market share tighten competitors’ residual supply curves, inducing them to narrow their own markdowns. The model matches the 0.26 cross-employer elasticity when Amazon is the largest firm in markets with at least 36 competitors, or the second-largest in markets with at least 12. Critically, the authors note that this empirical evidence concerns responses to a large firm raising wages; for minimum wages the question is whether large firms respond to their small wage competitors, which the model shows they do not substantially, because small firms have negligible market shares.
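
The cross-employer elasticity is the ratio of the two percentage wage changes quoted above, which can be verified in a couple of lines:

```python
# Percentage wage increases reported in the answer above.
amazon_wage_increase_pct = 18.1
competitor_wage_increase_pct = 4.7

# Cross-employer elasticity: competitors' response per 1% rise at Amazon.
cross_employer_elasticity = competitor_wage_increase_pct / amazon_wage_increase_pct

print(f"Cross-employer wage elasticity: {cross_employer_elasticity:.2f}")
```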

Q6. How does the paper separate efficiency from redistribution, and what is the key methodological innovation?

A: The paper gives the government access to budget-neutral, unrestricted lump-sum transfers across households in addition to the minimum wage. With transfers available, the government can use them to meet any redistributive objective encoded in arbitrary social welfare weights. Whatever is left for the minimum wage to do must be purely efficiency-improving. The paper shows (via aggregation theorems) that optimal lump-sum transfers can be computed in closed form for any social welfare weights, and that the social welfare maximizing allocation subject to transfers can be decentralized by transfers that sum to zero across households. Under this framework, the efficiency-maximizing minimum wage lies between $7.50 and $10.00 per hour regardless of whether utilitarian, Negishi, or 97 percent high-school-weighted social welfare functions are used — collapsing the original $0–$31 range to a tight interval.

Q7. How are Negishi weights computed, and why are they important for interpreting the results?

A: The Negishi weights are the social welfare weights under which a planner would choose the observed competitive equilibrium with zero lump-sum transfers. They are computed by inverting the planner’s first-order conditions: for the competitive equilibrium to be optimal under some set of weights, the implied consumption ratios must match observed data. The calibrated Negishi weights assign a combined weight of approximately 62 percent to college workers and owners, who constitute only 35 percent of the population. This means the competitive equilibrium is disproportionately aligned with higher-income households. A utilitarian planner, which weights households by population shares, therefore sees large scope for redistribution toward non-college workers, which is why the utilitarian-optimal minimum wage is $15.12 and why more than 95 percent of its welfare gains come from redistribution rather than efficiency.

Q8. What are the quantitative welfare gains from the efficiency-maximizing minimum wage, and how small are they relative to the potential gains from eliminating monopsony?

A: With optimal lump-sum transfers, the welfare gains from the efficiency-maximizing minimum wage are approximately 0.16–0.20 percent in consumption-equivalent terms, robust across social welfare weight specifications, Frisch elasticity variations, and regional decompositions. The welfare gains associated with an economy in which all firms’ markdowns are set to one (no labor market power at all), also evaluated with optimal transfers, are 15.26 percent in consumption-equivalent terms. The efficiency-maximizing minimum wage therefore recovers approximately 1–2 percent of the potential welfare gains from eliminating monopsony. Equivalently, the efficiency gains correspond to roughly a 0.1 percent increase in TFP. These gains are small despite the model matching all empirical evidence on the channels through which efficiency gains could occur.
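
The share of potential gains recovered by the minimum wage follows from the figures in this answer:

```python
# Consumption-equivalent welfare gains in percent (from the text).
efficiency_gain_range_pct = (0.16, 0.20)   # efficiency-maximizing minimum wage
no_market_power_gain_pct = 15.26           # all markdowns set to one

# Share of the no-market-power gain that the minimum wage recovers, in percent.
share_recovered_pct = [
    100 * gain / no_market_power_gain_pct for gain in efficiency_gain_range_pct
]

low, high = share_recovered_pct
print(f"Recovered share: {low:.1f}% to {high:.1f}%")
```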

Q9. How do employment effects of minimum wages vary by market concentration, and why?

A: In concentrated markets (upper tercile of HHI), firms have larger monopsony markdowns, so a binding minimum wage pushes them into Region II — where employment expands — over a wider range of minimum wage values before entering Region III. This produces large, positive employment effects in concentrated markets. In less concentrated markets, firms already have narrow markdowns (they are closer to competitive), so even small minimum wage increases push them into Region III, where employment contracts. The model replicates the statistically significant positive effects in high-concentration markets and negative effects in low-concentration markets documented by Azar et al. (2019), for initial minimum wages below approximately $8/hr. At higher initial minimum wages, however, even high-concentration markets exhibit negative employment effects as more firms enter Region III.

Q10. What does the robustness exercise for Mississippi reveal?

A: Mississippi has the lowest per capita income in the US, and a $15 minimum wage would bind for 41.3 percent of its workers (versus 29.4 percent nationally). Despite this, the model finds that Mississippi would benefit from a $15 federal minimum wage under utilitarian weights, and the Mississippi-specific optimal minimum wage is $14.89, nearly identical to the national optimum. The reason is an offsetting compositional effect: while Mississippi has lower average wages (pushing toward a lower optimum), it has a larger share of high-school graduates (63 percent versus 52.8 percent nationally) who prefer higher minimum wages (around $17 in the model). These two forces roughly cancel, producing an optimum close to the national figure.

Q11. What happens to common empirical proxies for inequality and worker power as the minimum wage rises?

A: The college–non-college log wage premium declines from 0.53 to 0.43 (a fall of roughly one-fifth) as the minimum wage rises from $7.50 to $15. The cross-sectional variance of log wages falls by nearly half over this range, driven equally by declining within- and between-type inequality. The aggregate labor income share rises by approximately 3 percentage points, and the share of output created in non-high-school jobs paid to non-high-school workers rises by 7 percentage points. All of these proxies are monotonically improving in the minimum wage throughout, even as aggregate welfare under the model’s social welfare function is hump-shaped and declining past the optimum. The paper concludes that observations of declining inequality or a rising labor share are consistent with falling welfare, so these proxies cannot serve as reliable welfare indicators.

Q12. How does the short-run (fixed-capital) analysis differ from the long-run baseline?

A: In the short run, capital at each firm is fixed at the type-specific level chosen under a zero minimum wage. This creates sharper decreasing returns in labor (parameter γα rather than α̃), overhead costs that can make operation unprofitable, and a narrower range of minimum wages over which firms remain in Region II. The result is that firms in the short run enter Region III at lower minimum wages than in the long run, limiting the range of efficiency gains. Quantitatively, the efficiency-maximizing optimal minimum wage declines by approximately $1 under utilitarian weights (from about $10 to about $9 in the short-run exercise) and by only about $0.20 under Negishi weights. The robustness conclusion is that the difference between short- and long-run optimal minimum wages is modest, and the main finding that efficiency gains are small is preserved.

Key Concepts

Shadow wage (w̃ᵢⱼ): The effective wage that rationalizes a firm’s equilibrium employment in the presence of a minimum wage. When labor is rationed at firm ij (Region III), the shadow wage equals the actual minimum wage multiplied by a rationing factor pᵢⱼ < 1, where pᵢⱼ is derived from the Lagrange multiplier on the household’s rationing constraint. The shadow wage is allocative (it determines labor supply decisions), while the observed minimum wage is not. When the rationing constraint is slack (Regions I and II), the shadow wage coincides with the observed wage.

Shadow markdown (µ̃ᵢⱼ): The ratio of a firm’s shadow wage to its marginal revenue product of labor. In Region I (unconstrained), this equals the standard monopsony markdown. In Region II (constrained, on the labor supply curve), the shadow markdown narrows as the minimum wage increases, moving the firm toward its efficient employment level. In Region III (constrained, on the labor demand curve), the shadow markdown equals the rationing multiplier pᵢⱼ and widens, reflecting efficiency losses from rationing. An aggregate shadow markdown µ̃ is computed as a productivity-weighted average of firm-level shadow markdowns across all firms in the economy.
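
The two definitions above can be put side by side in a short sketch. The function names and numbers are hypothetical. The Region III example sets the marginal revenue product equal to the minimum wage, which is how the sketch interprets "on the labor demand curve"; under that reading the shadow markdown reduces to the rationing factor, as the definition states.

```python
def shadow_wage(observed_wage, rationing_factor=1.0):
    """Observed wage scaled by the rationing factor p (p = 1 when not rationed)."""
    if not 0 < rationing_factor <= 1:
        raise ValueError("rationing factor must lie in (0, 1]")
    return observed_wage * rationing_factor


def shadow_markdown(shadow_w, mrpl):
    """Ratio of the shadow wage to the marginal revenue product of labor."""
    return shadow_w / mrpl


# Region II (hypothetical): minimum wage 10 binds, no rationing, MRPL 12.5.
region_ii = shadow_markdown(shadow_wage(10.0), mrpl=12.5)

# Region III (hypothetical): minimum wage 10, rationing factor 0.9, and
# MRPL equal to the minimum wage because the firm is on its demand curve.
region_iii = shadow_markdown(shadow_wage(10.0, rationing_factor=0.9), mrpl=10.0)

print(region_ii, region_iii)
```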

Misallocation wedge (ω): A productivity-weighted measure of how well employment is allocated across firms. In an efficient allocation with identical shadow markdowns, ω = 1. When high-productivity firms have wider markdowns than low-productivity firms (the baseline oligopsony outcome), ω < 1 because employment is directed away from productive firms. A minimum wage can improve ω by shrinking low-productivity firms but worsens it when high-productivity firms enter Region III and are over-rationed relative to medium-productivity firms.

Oligopsony with Cournot competition: The specific form of labor market power in this model. In each local labor market (defined as a NAICS 3-digit industry × commuting zone cell), a finite number of firms compete strategically in employment quantities, taking their competitors’ employment levels as given (Cournot assumption). Each firm has an upward-sloping labor supply curve derived from nested CES household preferences, and exercises a markdown on the marginal revenue product of labor. This differs from monopsony (one firm) or perfect competition (infinitely many firms), and generates both direct effects and spillover effects of minimum wages.

Negishi weights: The vector of social welfare weights under which the observed competitive equilibrium allocation would be the solution to a social planner’s problem with zero lump-sum transfers. In this model, the calibrated Negishi weights assign roughly 62 percent combined weight to college workers and owners (who constitute only 35 percent of the population), reflecting the fact that the market equilibrium allocates a disproportionate share of consumption to high-income households. The Negishi weights are used both to identify the gap between market outcomes and utilitarian objectives (motivating redistribution) and as one alternative normative benchmark.

Efficiency-maximizing minimum wage: The minimum wage that maximizes social welfare when the government additionally has access to budget-neutral lump-sum transfers across households. Because transfers can be optimized to handle any redistributive objective encoded in any arbitrary social welfare weights, the minimum wage under this framework serves solely to improve productive efficiency. In the calibrated model, the efficiency-maximizing minimum wage is approximately $7.50–$10.00 per hour, robust to social welfare weight specifications, Frisch elasticity variations (ϕ ∈ {0.30, 0.86}), and regional income differences.

Rationing constraint (n̄ᵢⱼₖ): A firm-specific, type-specific upper bound on the labor a household may supply to a firm in equilibrium. These constraints are taken as given by households and determined in equilibrium by firms’ labor demand decisions. When the minimum wage is above the firm’s competitive wage (Region III), the firm’s labor demand is less than what households would want to supply at that wage, so the rationing constraint binds. The binding rationing constraint generates the shadow wage discount (pᵢⱼ < 1) and is the mechanism by which high minimum wages reduce efficiency in the model.
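
The shadow wage and shadow markdown definitions above can be illustrated with a minimal Python sketch. All firm values, the `Firm` class, and the aggregation weights are hypothetical; the summary does not spell out the exact productivity weights, so normalized weights are assumed here. The Region III firm is placed on its labor demand curve (marginal revenue product equal to the minimum wage), which is why its shadow markdown reduces to the rationing factor p.

```python
from dataclasses import dataclass

@dataclass
class Firm:
    mrpl: float         # marginal revenue product of labor, $/hour (hypothetical)
    wage: float         # wage actually paid, $/hour (hypothetical)
    rationing_p: float  # rationing factor p_ij; 1.0 when the constraint is slack
    weight: float       # assumed aggregation weight (stand-in for productivity weight)

def shadow_wage(firm: Firm) -> float:
    # Regions I and II: p = 1, so the shadow wage equals the observed wage.
    # Region III: the shadow wage is the minimum wage scaled by p < 1.
    return firm.rationing_p * firm.wage

def shadow_markdown(firm: Firm) -> float:
    # Ratio of the shadow wage to the marginal revenue product of labor.
    return shadow_wage(firm) / firm.mrpl

def aggregate_markdown(firms: list[Firm]) -> float:
    # Weighted average of firm-level shadow markdowns (weights are an assumption).
    total = sum(f.weight for f in firms)
    return sum(f.weight * shadow_markdown(f) for f in firms) / total

firms = [
    Firm(mrpl=20.0, wage=14.0, rationing_p=1.0, weight=0.5),  # Region I: unconstrained
    Firm(mrpl=12.0, wage=10.0, rationing_p=1.0, weight=0.3),  # Region II: pays the minimum wage
    Firm(mrpl=10.0, wage=10.0, rationing_p=0.8, weight=0.2),  # Region III: rationed
]

for f in firms:
    print(round(shadow_wage(f), 2), round(shadow_markdown(f), 3))
print(round(aggregate_markdown(firms), 3))
```

In this toy example the Region III firm pays the minimum wage of $10 but has a shadow wage of $8, so its shadow markdown is 0.8, equal to p.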

Monetary Cooperation during Global Inflation Surges

Mon, 01 Jan 0001 00:00:00 +0000

In a multicountry model with nominal wage rigidities, two sectors (tradable with convex supply, nontradable with flat supply), and free capital mobility, the paper studies optimal monetary policy during a global demand reallocation shock — a shift in preferences toward tradables (ω₀ > ω). Under cooperation (Proposition 1), the optimal response allows inflation to rise: higher tradable goods prices reduce real wages (restoring labor demand), generate expenditure switching back toward nontradables, and boost nontradable employment through an income effect. Cooperation achieves full employment as long as the inflation cost is below the full-employment threshold; otherwise it strikes the optimal inflation-unemployment balance. Under noncooperation (Proposition 3), each national central bank perceives it can attract capital inflows by raising its policy rate — inflows sustain nontradable demand and reduce the domestic sacrifice ratio of disinflation. But in a symmetric Nash equilibrium, synchronized rate hikes cancel each other through global credit market clearing; only the global monetary contraction remains. The result is lower inflation than under cooperation but higher unemployment — a competitive appreciation trap that mirrors the competitive depreciation failures of the Great Depression and the 2008 crisis, but in the opposite direction (global scarcity rather than deficiency of tradables). In a numerical example calibrated to α = 0.64 (convex tradable supply, implying 0.57 price-output elasticity, from Boehm and Pandalai-Nayar 2022) and ω = 0.3 (US pre-COVID tradables share), a 3 percentage point demand reallocation (matching the US COVID episode) requires approximately 1.5 percentage points of inflation to maintain full employment under cooperation; without any inflation, unemployment rises by approximately 8 percentage points. 
At ω₀ = 0.35, the uncooperative equilibrium reduces inflation by approximately 1 percentage point relative to cooperation but pushes unemployment to approximately 7 percent. For the COVID-19 episode, the authors conclude gains from cooperation were likely small (full employment maintained globally); for the 1980s synchronized tightening — when central banks explicitly sacrificed employment to fight inflation — the model implies substantially positive gains, consistent with the heated cooperation debates and the 1985 Plaza Accord.

In depth

Q1. How does a demand reallocation shock generate an inflation-unemployment tradeoff?

A shift in preferences toward tradable goods (ω₀ > ω) reduces demand for nontradable goods, causing nontradable firms to fire workers; since nominal wages are rigid, the only way to sustain full employment is through a rise in the price of tradables (P^T), which operates through three distinct channels. First, higher P^T raises tradable sector firms’ real revenue per worker (nominal wages fixed), inducing them to hire more workers and expand output — the direct labor demand channel. Second, higher P^T generates income effects: as tradable output and income rise, households increase consumption of both tradable and nontradable goods, boosting nontradable employment through the income channel. Third, higher P^T generates expenditure switching away from tradables and toward nontradables (since nontradable goods become relatively cheaper), which also sustains nontradable employment. All three channels require letting P^T rise, which means tolerating inflation. In this sense, the demand reallocation shock acts as a cost-push shock — it shifts the Phillips curve upward, so that higher inflation is required to achieve any given level of employment. If the inflation cost is sufficiently low, the optimal response allows full employment; otherwise an interior solution trades off inflation against economic slack.

Q2. What is the optimal cooperative monetary policy, and how large are the quantitative tradeoffs?

Proposition 1: under international cooperation, the optimal response to ω₀ > ω entails a rise in inflation; if the full-employment inflation level P^fe satisfies χ’(P^fe) ≤ (1/ω₀)(α/(1−α) + 1 − ω₀), the cooperative optimum achieves full employment; otherwise the interior optimum sets χ’(P̄) equal to that expression, balancing marginal inflation cost against marginal employment benefit. The cooperative optimum is strictly superior to strict inflation targeting (P = 1) because the latter allows large unemployment without achieving any structural rebalancing. The global central bank internalizes the income effect from tradable expansion: as Y^T rises, households immediately spend the income on consumption of both goods, further boosting nontradable employment — an amplification mechanism that self-oriented national banks will not fully internalize. In the calibrated numerical example (α = 0.64, ω = 0.3, χ(P) = χ̄(P−1)²/2 with χ̄ = 299.25), a reallocation shock matching the US COVID-19 episode (ω₀ − ω ≈ 0.03) requires approximately 1.5 percentage points of inflation to maintain full employment; under strict inflation targeting (P = 1), unemployment rises by approximately 8 percentage points. These magnitudes are consistent with the observation that during the pandemic inflation cycle, central banks were willing to allow inflation rather than trigger a labor market collapse.
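
As a rough check of the Proposition 1 condition, the following Python sketch evaluates both sides under the quadratic cost function and calibration quoted above. The function names are hypothetical, and reading "1.5 percentage points of inflation" as P^fe = 1.015 and the shock as ω₀ = 0.33 is an assumption about how the summary's numbers map into the formula.

```python
def marginal_inflation_cost(P: float, chi_bar: float = 299.25) -> float:
    # chi(P) = chi_bar * (P - 1)^2 / 2  implies  chi'(P) = chi_bar * (P - 1)
    return chi_bar * (P - 1.0)

def marginal_employment_benefit(omega0: float, alpha: float = 0.64) -> float:
    # Right-hand side of the condition: (1/omega0) * (alpha/(1-alpha) + 1 - omega0)
    return (alpha / (1.0 - alpha) + 1.0 - omega0) / omega0

def cooperative_price_level(P_fe: float, omega0: float,
                            alpha: float = 0.64, chi_bar: float = 299.25):
    """Return (price level, full_employment_flag) under cooperation."""
    rhs = marginal_employment_benefit(omega0, alpha)
    if marginal_inflation_cost(P_fe, chi_bar) <= rhs:
        return P_fe, True           # condition holds: full employment is optimal
    # Otherwise the interior optimum solves chi_bar * (P - 1) = rhs
    return 1.0 + rhs / chi_bar, False

# Assumed mapping of the COVID-sized shock: omega0 = 0.33, P_fe = 1.015
print(marginal_inflation_cost(1.015), marginal_employment_benefit(0.33))
print(cooperative_price_level(1.015, 0.33))
```

Under these assumed inputs the marginal inflation cost at P^fe is below the marginal employment benefit, so the sketch returns the full-employment price level, in line with the summary's statement that cooperation maintains full employment for a shock of this size.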

Q3. How does capital mobility change the inflation-unemployment tradeoff faced by individual countries?

Capital mobility reduces the domestic sacrifice ratio — the employment cost of disinflation — through two channels: trade deficits directly sustain nontradable demand (offsetting the fall in tradable sector employment), and they buffer tradable consumption from drops in domestic tradable output. When a single country contracts its monetary policy and P^T falls, domestic tradable output falls; but households react by borrowing internationally, so domestic consumption of tradables falls by less than one-for-one with output (formally: ∂C^T/∂Y^T = ω_{i,0}(1−β)/(ω_{i,0}(1−β)+β) < 1). Capital inflows thus sustain nontradable demand and nontradable employment, partially offsetting the contractionary effect on employment. From each country’s perspective, containing inflation “exports” part of the output loss abroad, making disinflation individually less costly than in a closed economy. This is precisely what creates the coordination failure in the global case: each country perceives a lower sacrifice ratio for disinflation because it does not internalize that this lower sacrifice ratio exists only if the rest of the world continues to produce and lend tradable goods.
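
The pass-through formula above is easy to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical, β is presumably the household discount factor (the summary does not define it), and the value β = 0.96 is an assumption chosen for the example.

```python
def tradable_consumption_passthrough(omega: float, beta: float) -> float:
    # dC^T/dY^T = omega*(1-beta) / (omega*(1-beta) + beta)
    return omega * (1.0 - beta) / (omega * (1.0 - beta) + beta)

# Hypothetical parameter values: omega = 0.3 (tradables share), beta = 0.96
pt = tradable_consumption_passthrough(0.3, 0.96)
print(round(pt, 4))

# With beta = 0 the ratio equals one, so consumption tracks output one-for-one.
print(tradable_consumption_passthrough(0.3, 0.0))
```

For any β between 0 and 1 the ratio is strictly below one, which is the sense in which international borrowing buffers tradable consumption from a fall in domestic tradable output.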

Q4. How does the coordination failure arise in a global reallocation shock, and what is the precise mechanism of competitive appreciations?

Proposition 3: in a Nash equilibrium with a global symmetric shock, the full-employment inflation level P^fe coincides with the cooperative benchmark (since C^T_i = Y^T_i in symmetric equilibrium and capital flows net to zero), but if the inflation cost is high enough, self-oriented central banks impose a lower inflation ceiling (MP^u < MP^c) — resulting in lower inflation and higher unemployment than cooperation. Each national central bank individually seeks to reduce domestic inflation by hiking its policy rate to attract capital inflows (which ease the nontradable sector employment constraint through the open economy Phillips curve). But the individual strategy of hiking to attract inflows — which amounts to trying to appreciate the exchange rate (S_i = P^T_{i,t}/P^T_t) — is frustrated in a symmetric Nash equilibrium: when all countries hike simultaneously, capital flows net to zero globally, exchange rates remain unchanged, and only the synchronized monetary contraction remains. This is the mechanism of competitive appreciations: countries try to fight domestic inflation by appreciating their currencies, but appreciate against each other, leaving only a global slump. In the numerical example at ω₀ = 0.35, the uncooperative equilibrium reduces inflation by approximately 1 percentage point relative to cooperation but pushes unemployment to approximately 7 percent (vs. full employment under cooperation at that shock size).

Q5. How do competitive appreciations differ from competitive depreciations, and what are the scope conditions?

Competitive appreciations are the mirror image of competitive depreciations (which characterized the Great Depression and the aftermath of the 2008 GFC): in both cases each country uses its monetary policy to shift costs abroad, but the direction differs — depreciations arise during periods of weak global demand when countries try to steal demand from neighbors; appreciations arise during periods of global tradable goods scarcity and high inflation when countries try to export inflation. The structural difference is the initial state: competitive depreciations occur when global aggregate demand is deficient and the zero lower bound binds — each country wants to depreciate to boost exports; competitive appreciations occur when global demand for tradables is strong relative to supply (ω₀ > ω) and inflation is high — each country wants to appreciate to attract capital inflows that buffer domestic employment from disinflation. The key asymmetry is the direction of the international spillover: in the depreciation case, countries export demand; in the appreciation case, countries export inflation costs. The gains from cooperation in both cases arise for the same reason — the Nash equilibrium involves globally excessive monetary tightening or loosening relative to the cooperative benchmark — but the policy recommendation is opposite in sign.

Q6. What do the model’s predictions imply for the COVID-19 episode and the 1980s disinflation, and when do gains from cooperation materialize?

Gains from monetary cooperation arise only when condition (28) is violated, that is, when central banks are willing to sacrifice full employment to fight inflation. For the COVID-19 episode, gains were likely small (labor markets remained strong throughout); for the 1980s synchronized tightening, the model implies positive gains that would have been achievable through international cooperation. For the COVID-19 episode: throughout the 2021–2023 inflation cycle, unemployment rates in advanced economies remained low and fiscal support maintained aggregate demand, suggesting monetary policy did not sacrifice employment; the model implies condition (28) held and the cooperative optimum was approximately achieved. The world “escaped competitive appreciations this time.” For the 1980s disinflation: the synchronized monetary tightening under Volcker (US), the Bundesbank (Germany), and others was accompanied by a deep global recession and explicitly prioritized inflation reduction over employment, precisely the circumstances in which condition (28) is violated and competitive appreciations generate a suboptimal outcome. These dynamics motivated the heated international cooperation debates of the period, culminating in the Plaza Accord of 1985 (Sachs 1985; Frankel 2015). Supplemental Appendix E extends the model to negative tradable supply shocks (supply chain disruptions, tariffs), so its predictions about cooperation gains extend to protectionist-driven scarcity.

Key concepts

demand reallocation shock : a shift in the preference weight on tradable goods (ω₀ > ω) that reduces nontradable demand relative to tradable demand; in the model it corresponds to a structural demand shift toward durables and goods (as observed during the COVID-19 recovery), creating simultaneous inflationary pressure in tradables and deflationary pressure in nontradables, and generating an inflation-unemployment tradeoff absent in standard cost-push formulations.

convex tradable supply : the feature of the tradable sector (parameterized by α > 0) whereby supply is upward-sloping due to capacity constraints — a 1% rise in the tradable goods price P^T raises tradable output by (1−α)/α percent; calibrated to α = 0.64 (implying a 0.57 price-output elasticity) following Boehm and Pandalai-Nayar (2022) for sectors at high capacity utilization; without this feature, tradable supply would be perfectly elastic and the inflation-unemployment tradeoff would disappear.

competitive appreciations : the Nash equilibrium coordination failure in which each national central bank hikes its policy rate to attract capital inflows (reducing domestic disinflation costs), generating nominal exchange rate appreciation; since all countries do this simultaneously, appreciations cancel out in equilibrium, leaving only a globally excessive monetary contraction with lower-than-cooperative inflation and higher-than-cooperative unemployment; mirror image of competitive depreciations but arising from global scarcity (not deficiency) of tradables.

sacrifice ratio : the employment cost per unit of disinflation; reduced in open economies relative to closed economies because capital inflows buffer domestic tradable consumption from drops in domestic tradable output, and sustain nontradable demand; self-oriented central banks perceive a lower sacrifice ratio than a global central bank, which is the source of the competitive appreciation externality.

nominal wage rigidity : the short-run friction that makes demand reallocation shocks costly: with flexible wages, reallocation from nontradable to tradable employment would occur through real wage adjustment alone; with rigid nominal wages, real wages fall only if tradable goods prices rise (inflation), so monetary accommodation is required for structural reallocation without unemployment.

Monetary Policy and Sovereign Risk in Emerging Economies (NK-Default)

Mon, 01 Jan 0001 00:00:00 +0000

Layer 1: Overview

This paper develops a New Keynesian small open economy model with endogenous sovereign default (the NK-Default framework) and uses it to study the interplay between monetary policy and sovereign risk in emerging markets. The core finding is that sovereign default risk amplifies inflation volatility through an expectations channel: when default risk rises, forward-looking firms increase prices in expectation of high future inflation and depressed consumption during a potential default, so that current inflation rises even before any default occurs. Conversely, tight monetary policy disciplines government overborrowing by raising the cost of domestic monetary distortions, which the government internalizes by reducing its borrowing. Calibrated to eight emerging-market inflation targeters (Brazil, Chile, Colombia, Mexico, Peru, Philippines, Poland, South Africa) over 2004–2019, the model quantitatively matches the positive comovement of spreads with inflation and nominal rates, and the temporary nature of inflation events (an inflation spike of approximately 4.5 percentage points and a spread increase of 2.3 percentage points, resolved within roughly a year). Counterfactual experiments find that default risk accounts for approximately 50% of both inflation business-cycle volatility and the inflation increase during these events, and that a 1% tighter monetary policy would reduce spreads by about 0.3 percentage points during inflation events. An interest rate rule augmented to respond to default risk dominates strict inflation targeting in welfare and reduces mean spreads by 2.2 percentage points; strict inflation targeting is not the optimal monetary regime when sovereign risk is present.

In depth

Q1. What is the structural architecture of the NK-Default model?

The NK-Default framework combines the standard New Keynesian small open economy model of Gali and Monacelli (2005) with the Eaton-Gersovitz (1981) sovereign default structure extended to long-term foreign-currency debt, producing a model in which households, firms, a monetary authority, and a fiscal government all interact. Households consume domestic and foreign goods and supply labor; intermediate goods producers are monopolistically competitive and set prices subject to Rotemberg (1982) quadratic adjustment costs, generating a forward-looking New Keynesian Phillips Curve (NKPC); the monetary authority follows a nominal interest rate rule targeting domestic goods inflation; and the government borrows internationally in long-term foreign-currency perpetuity bonds, choosing each period whether to repay or default, with default leading to temporary exclusion from international financial markets and a transitory productivity reduction. The bond price schedule compensates risk-neutral international lenders for expected losses from default and falls with the government’s indebtedness. A key methodological choice is the use of global solution methods rather than local approximations, because the nonlinear dynamics around default are central to the mechanisms.

Q2. What is the default amplification mechanism, and how does it transmit to inflation?

Default amplification operates through an expectations channel encoded in the forward-looking NKPC: when default risk rises, firms expect higher future inflation (the inflation that would accompany a default event) and lower future consumption (because default depresses productivity and restricts borrowing), so they raise current prices, generating current inflation without any contemporaneous policy change. Formally, the NKPC relates current inflation π to a unit-cost term and to the expectation term E[Y’u’_C(π’-π)π’], which increases with default risk because default states feature high inflation and high marginal utility. The resulting increase in current inflation then triggers a tightening under the monetary authority’s interest rate rule, which in turn depresses consumption through the Euler equation, amplifying the monetary distortion (wedge). In the simplified setting with quasi-linear preferences, higher borrowing B’ increases the functions F and M (the expectation terms in the NKPC and Euler equation), and Proposition 1 establishes formally that, under Assumption 1, higher borrowing raises default risk, inflation, the nominal domestic rate, and the monetary wedge.

Q3. How does monetary policy discipline sovereign borrowing?

Tight monetary policy disciplines government overborrowing because the government internalizes the additional costs that monetary distortions impose on the economy: when the monetary authority raises interest rates, the resulting monetary wedge — the gap between the marginal product of labor and households’ marginal rate of substitution — acts as an additional cost on borrowing from the government’s perspective, discouraging excessive debt accumulation. Proposition 2 establishes this formally: under Assumption 2 (one-time deviation from constrained efficiency), a policy rate i > i_ST (above the strict-inflation-targeting rate) generates a positive monetary wedge that modifies the government’s optimal borrowing condition with an additional term reflecting the cost to the sovereign of the higher wedge its borrowing induces. Contractionary monetary policy thus reduces the incentive to borrow and lowers equilibrium default risk. The paper also derives Proposition 3: a default-risk monetary rule of the form i = ī·Φ^αD can achieve the constrained-efficient default risk and an arbitrarily small monetary wedge simultaneously, by choosing αD appropriately — meaning that targeting default risk can address both the pricing friction and the overborrowing incentive.
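
The functional form of the default-risk rule, i = ī·Φ^αD, can be sketched directly. This is only an illustration of the formula as quoted: the function name and all numerical values are hypothetical, and the summary does not specify how ī is normalized or calibrated, so ī = 1.0 is used purely as a scale constant.

```python
def default_risk_rule(i_bar: float, default_prob: float, alpha_d: float) -> float:
    # i = i_bar * Phi^alpha_D, with Phi the one-period-ahead default probability.
    if not 0.0 < default_prob <= 1.0:
        raise ValueError("default_prob must lie in (0, 1]")
    return i_bar * default_prob ** alpha_d

# Hypothetical values: the rate rises with default risk whenever alpha_D > 0.
low_risk = default_risk_rule(i_bar=1.0, default_prob=0.02, alpha_d=0.1)
high_risk = default_risk_rule(i_bar=1.0, default_prob=0.08, alpha_d=0.1)
print(low_risk < high_risk)

# With alpha_D = 0 the rule ignores default risk and returns i_bar.
print(default_risk_rule(i_bar=1.0, default_prob=0.08, alpha_d=0.0))
```

A larger αD makes the rate more sensitive to Φ, which is the lever Proposition 3 uses to raise the cost of additional borrowing for the government.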

Q4. What are the quantitative findings on default amplification and the disciplining mechanism?

Quantitatively, default risk accounts for approximately 50% of inflation business-cycle volatility and approximately 50% of the inflation increase during the temporary inflation events (4.5 p.p. inflation spike, 2.3 p.p. spread increase, nominal rate rise from a baseline of 5–6% to 8–9%), based on comparison with a reference model without default. For the disciplining mechanism, panel-data regressions using monetary policy shocks recovered from estimated Taylor rules across the eight countries find that a 1% contractionary monetary shock reduces sovereign spreads, consistent with model predictions. During the inflation events, a 1% tighter monetary policy would have reduced spreads by approximately 0.3 percentage points. Comparing alternative monetary policy regimes against strict inflation targeting (which implements the flexible-price allocation): the baseline interest rate rule (responding only to inflation) reduces mean spreads by 0.5 percentage points relative to strict inflation targeting; an augmented rule that also responds to default risk reduces mean spreads by 2.2 percentage points. Welfare under the baseline rule exceeds that under strict inflation targeting, and welfare under the default-risk rule exceeds both, with the ranking holding across all robustness extensions.

Q5. How does the model fit the data across targeted and untargeted moments?

The model is calibrated to match key business-cycle statistics of the eight emerging-market inflation targeters and successfully replicates several untargeted moments, including the positive correlations of spreads with inflation (mean 0.5 across countries in data) and nominal rates (mean 0.3), the relative volatility of inflation to output (mean 0.8), and the mean spread level of approximately 2%. The temporary inflation events — constructed as windows around periods of elevated inflation — are matched with a combination of low productivity shocks and expansionary monetary shocks, and the model’s impulse response functions for inflation, output, nominal rates, and spreads during these events align with the empirical paths. The model also fits the positive elasticity of inflation expectations to default risk and the negative elasticity of spreads to monetary policy shocks, both of which are estimated from data and used as untargeted validation moments. Structurally, the model is parameterized to match the mean and volatility of inflation, spreads, and the correlation of spreads with output (mean -0.5 across countries), among other moments.

Q6. How do the model’s results hold up across extensions, especially local currency debt and discretionary monetary policy?

The main results — default amplifies inflation, tight monetary policy disciplines borrowing, and the default-risk rule dominates strict inflation targeting — are robust across all extension economies, including the case of local currency sovereign debt, alternative default costs (no productivity loss, endogenous domestic financial frictions), and loose monetary policy during defaults. In the local currency debt extension, which introduces the classic incentive to erode debt via inflation, the paper shows that monetary discretion delivers substantially worse outcomes: average inflation doubles relative to the commitment case and — crucially — sovereign spreads also double under discretion, because market participants anticipate the inflationary incentive. This result shows that the disciplining benefits of commitment in monetary policy rules extend to the sovereign debt dimension: the country’s ability to commit to a rule lowers spreads by reducing the expected future inflation that lenders must be compensated for. The endogenous financial frictions extension — in which banking sector health depends on nominal rates and spreads — generates similar monetary-fiscal interactions, confirming that the mechanisms are not specific to the productivity-cost assumption.

Q7. What is the paper’s relationship to the literature on nominal rigidities and sovereign default?

The NK-Default framework differs critically from related papers that introduce downward nominal wage rigidity (e.g., Na, Schmitt-Grohe, Uribe, Yue 2018; Bianchi, Ottonello, Presno 2023) in that price-setting frictions arise from optimal forward-looking pricing by monopolistically competitive firms under Rotemberg costs, not from a mechanical wage floor, so that inflation expectations matter for current inflation and output in a standard NKPC. This means that expected future default events — through their effects on expected inflation and expected marginal utility — transmit to current equilibrium in a way that downward-rigid-wage models cannot replicate. The paper also differs from the literature studying the inflation incentive for local-currency debt dilution (e.g., Calvo 1988; Du, Pflueger, Schreger 2020): the baseline model assumes foreign-currency debt and a rule-based monetary authority that has no incentive to inflate away debt, so the mechanisms operate through expectations and discipline rather than through the debt-erosion channel. The paper connects these strands in the local-currency extension.

Q8. What are the welfare and policy implications for central bank mandates in emerging markets?

The paper provides formal support for monetary policy rules that respond to financial or sovereign-risk conditions — beyond standard inflation targeting — in emerging economies: the welfare ranking is default-risk rule > baseline rule > strict inflation targeting, with the gap between the default-risk rule and strict inflation targeting driven by lower mean and volatility of spreads, which reduce the frequency and severity of default amplification events. Strict inflation targeting, which delivers the flexible-price allocation, is not optimal because it leaves the overborrowing incentive of the fiscal government unchecked, generating excessive default risk that feeds back into inflation volatility through the expectations channel. A monetary rule with sufficient responsiveness to inflation or to default risk disciplines fiscal behavior and reduces welfare costs from both pricing frictions and default risk, suggesting that emerging-market central bank mandates that focus exclusively on inflation targeting at the expense of financial stability considerations may be suboptimal relative to rules that jointly address monetary and fiscal distortions.

Key Concepts

NK-Default framework: the paper’s model combining a New Keynesian small open economy (Gali-Monacelli structure with Rotemberg price-setting frictions and a Taylor-type interest rate rule) with the Eaton-Gersovitz endogenous sovereign default structure extended to long-term foreign-currency perpetuity bonds; it provides a joint treatment of monetary policy and sovereign risk for emerging economies.
default amplification: the mechanism by which elevated sovereign default risk increases current inflation and depresses output through the forward-looking NKPC expectations channel: firms raise prices in anticipation of high future inflation and low consumption during a potential default, so current inflation rises even without any contemporaneous fiscal action; established as Proposition 1 in the simplified model and confirmed quantitatively.
monetary discipline: the mechanism by which contractionary monetary policy raises the cost of government borrowing through monetary distortions (the monetary wedge), inducing the fiscal government to reduce its indebtedness and thereby lowering equilibrium default risk; established as Proposition 2 and confirmed empirically using panel-data regressions of spreads on monetary policy shocks.
monetary wedge: the deviation of the marginal product of labor from households’ marginal rate of substitution between labor and consumption, arising from price-setting frictions; serves as the quantitative measure of monetary distortions and is the channel through which monetary policy affects government borrowing incentives.
default-risk monetary rule: an interest rate rule of the form i = ī·Φ^αD that responds directly to the one-period-ahead default probability Φ; shown in Proposition 3 to achieve both the constrained-efficient level of government debt and an arbitrarily small monetary wedge simultaneously, by incorporating an additional cost of borrowing for the fiscal government through the rule’s response.
temporary inflation events: empirical regularities in eight emerging-market inflation targeters in which inflation, spreads, and nominal policy rates temporarily spike together (inflation rises by approximately 4.5 percentage points and spreads by 2.3 percentage points, within roughly one year) before reverting to lower levels; the model replicates these patterns using a combination of low productivity shocks and expansionary monetary shocks.

Monetary policy in open economies with production networks

Mon, 01 Jan 0001 00:00:00 +0000

This paper studies the design of monetary policy in a multi-sector small open economy with domestic input-output linkages and cross-border production networks, under nominal price rigidities in domestic sectors. The main result is that the monetary policy that closes the domestic output gap is nearly optimal, and it is implemented by stabilizing an aggregate inflation index that weights each sector’s inflation by its role as a supplier of inputs and a net exporter within the international production network. Sectors with small direct or indirect import shares receive large weight in the index; ignoring cross-border linkages leads monetary policy to overemphasize inflation in sectors that are intensive exporters directly or indirectly through downstream sectors. Three channels link sectoral markup wedges to the aggregate output gap: the CPI channel (present in closed economies too) and the net export income and net profit income channels (unique to open economies with cross-border linkages). Using the World Input-Output Database, the output-gap-closing policy is shown to outperform alternatives that abstract from economic openness or input-output linkages.

In depth

Q1. What is the output gap monetary policy and how is it implemented?

The output gap (OG) monetary policy stabilizes an aggregate inflation index that is proportional to the aggregate output gap, defined as the difference between output in the sticky-price equilibrium and in the efficient flexible-price equilibrium. The index weights each sector’s inflation by the product of its price rigidity and its OG weight. The price rigidity component maps positive sectoral inflation into a negative sectoral markup wedge under nominal rigidities; the OG weight measures the sector’s contribution to the aggregate output gap through domestic and cross-border network linkages. This policy eliminates first-order aggregate distortions and is shown to be nearly optimal.
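As a rough illustration of the weighting described above, the following sketch builds an aggregate inflation index from sectoral inflation rates. The sector names, all numbers, and the normalization of the weights to sum to one are assumptions made for this example, not values or formulas taken from the paper. Price rigidity is treated as a given positive number per sector.

```python
# Hypothetical three-sector example; every number here is an illustrative assumption.
sectors = {
    # name: (inflation in %, price rigidity term, OG weight)
    "manufacturing": (3.0, 2.0, 0.50),
    "services": (1.0, 4.0, 0.40),
    "energy": (6.0, 0.5, 0.10),
}


def aggregate_inflation_index(data):
    """Weight each sector's inflation by (price rigidity * OG weight).

    The weights are normalized to sum to one. This normalization is an
    assumption made here so the index is on the same scale as sectoral inflation.
    """
    raw = {name: rigidity * og for name, (_, rigidity, og) in data.items()}
    total = sum(raw.values())
    return sum(raw[name] / total * data[name][0] for name in data)


index = aggregate_inflation_index(sectors)
print(f"aggregate inflation index: {index:.2f}%")
```

In this toy setup the nearly flexible-price energy sector receives little weight despite its high inflation, while the rigid services sector dominates the index.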

Q2. What are the CPI, net export income, and net profit income channels?

Three channels link a negative sectoral markup wedge to a positive aggregate output gap: the CPI channel (lower domestic prices raise real factor prices and stimulate supply), the net export income channel (lower domestic prices increase net exports and domestic labor income), and the net profit income channel (two opposing effects: lower prices increase net export profits but also raise the cost of imported inputs). The CPI channel operates in closed economies as well, while the net export income and net profit income channels are unique to open economies with cross-border input-output linkages.

Q3. Why does ignoring cross-border linkages lead monetary policy to overweight intensive exporters?

Failing to account for cross-border production networks causes monetary policy to overemphasize inflation in sectors that export intensively, whether directly or indirectly through downstream sectors. The reason is that the net profit income channel, which reduces the OG contribution of intensive exporters by raising the cost of their imported inputs, is omitted when the economy is treated as closed. Without the cross-border linkages, these sectors appear to have larger aggregate output gap contributions through domestic channels alone, so the aggregate inflation index overweights them.

Q4. How do the results relate to existing monetary policy frameworks?

The paper bridges the multi-sector closed-economy result that optimal policy targets a Domar-weighted aggregate inflation index and the one-sector open-economy result that optimal policy trades off domestic inflation against terms-of-trade distortions, showing that cross-border input-output linkages modify the Domar weights through the net export income and net profit income channels. In the limit where all sectors have no cross-border linkages, the OG weights reduce to Domar weights; the one-sector open economy policy prescription is a special case of the general framework.

Q5. What is the empirical validation?

Using the World Input-Output Database, the paper computes the theoretical sectoral OG weights for actual economies and shows that the OG monetary policy outperforms alternative policies that ignore either economic openness or input-output linkages. The database provides cross-country cross-sector data on intermediate input flows that allow computation of the model’s OG weight formulas for real economies.

Key concepts

output gap (OG) monetary policy : the monetary policy that closes the aggregate output gap (difference between sticky-price and efficient flexible-price output), implemented by stabilizing the network-weighted aggregate inflation index; shown to be nearly optimal in the open-economy production network framework.

sectoral OG weight : the weight assigned to a sector’s inflation in the aggregate inflation index under OG monetary policy; measures the sector’s contribution to the aggregate output gap through the CPI, net export income, and net profit income channels; differs from the Domar weight in open economies due to cross-border linkages.

Domar weight : the ratio of a sector’s gross output to GDP; the weight used in the closed-economy multi-sector optimal inflation index literature; coincides with the OG weight when there are no cross-border production linkages.

labor wedge : a weighted average of sectoral markup wedges proportional to the aggregate output gap; the monetary policy target in the OG framework.

efficiency wedge : a weighted average of exogenous sectoral shocks; determines the efficient flexible-price equilibrium; independent of sectoral markup wedges at first order, so the OG policy can separately close the aggregate distortions caused by markup wedges.

Monetary policy trade-offs amid global supply chain disruptions

Mon, 01 Jan 0001 00:00:00 +0000

This paper employs a proxy structural VAR model to examine the effects of global supply chain (GSC) shocks on U.S. macroeconomic variables and the Federal Reserve’s historical response, and evaluates two counterfactual monetary policy rules using the COVID-19 episode. Large fiscal stimulus amplifies inflation while cushioning the output downturn from GSC shocks. Historically, the Fed adopted a loose stance, looking through price surges from supply chain disruptions. The first counterfactual—which stabilizes inflation—entails less accommodation and yields a more favorable inflation-output trade-off, reflecting greater price flexibility and limited output losses. The second counterfactual—which minimizes a dual-mandate loss function—calls for greater initial easing; under inflation targeting (IT) this involves moderate accommodation, while under average inflation targeting (AIT) the looser initial policy generates more persistent inflation and ultimately requires a contractionary response, worsening the trade-off.

In depth

Q1. What is the empirical strategy?

The paper estimates a proxy structural VAR model that identifies GSC shocks using the news-based Supply Bottleneck Index (SBI) of Burriel et al. (2024) as a proxy, then evaluates the Fed’s historical response to those shocks and two counterfactual policy rules that substitute for the historical stance. The proxy SVAR approach identifies the GSC shock’s impulse response function and then traces the macroeconomic dynamics that would have obtained under alternative policy rules, holding the non-policy shocks at their historical values. The SBI captures sudden decreases in supply chain functioning from natural disasters, geopolitical events, strikes, and pandemics.
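The identification idea behind a proxy SVAR can be sketched on simulated data. The example below is a minimal two-variable illustration, not the paper's specification: the impact matrix, the noise level of the proxy, and the sample size are all assumptions. It shows that the covariance between reduced-form residuals and a proxy that is correlated only with the shock of interest recovers that shock's relative impact responses.

```python
import random

random.seed(0)
n = 20000

# Assumed impact matrix: column 0 is the shock of interest, column 1 is another shock.
B = [[1.0, 0.5],
     [-0.4, 1.0]]

u1, u2, z = [], [], []
for _ in range(n):
    e_target = random.gauss(0, 1)
    e_other = random.gauss(0, 1)
    u1.append(B[0][0] * e_target + B[0][1] * e_other)  # reduced-form residual, variable 1
    u2.append(B[1][0] * e_target + B[1][1] * e_other)  # reduced-form residual, variable 2
    z.append(e_target + random.gauss(0, 0.5))          # noisy proxy, unrelated to e_other


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


# Impact response of variable 2 to a target shock scaled to move variable 1 by one unit.
relative_impact = cov(u2, z) / cov(u1, z)
true_relative_impact = B[1][0] / B[0][0]
print(f"estimated: {relative_impact:.3f}, true: {true_relative_impact:.3f}")
```

Because the proxy is uncorrelated with the other shock by construction, the ratio of covariances isolates the first column of the impact matrix up to scale, which is the property the external-instrument approach relies on.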

Q2. What is the role of fiscal stimulus in amplifying GSC shock effects?

Large fiscal stimulus—such as the U.S. transfers and spending during COVID-19—amplifies the inflationary impact of GSC shocks while cushioning the output downturn; the interaction between supply disruptions and fiscal expansion is thus an important determinant of the inflation-output dynamics. Without the large fiscal stimulus, GSC shocks would generate the standard supply-shock trade-off with less amplified inflation. With stimulus, the combination of higher aggregate demand (from fiscal transfers) and reduced aggregate supply (from GSC disruptions) creates a strongly inflationary environment.

Q3. What does the first counterfactual (inflation-stabilizing policy) show?

The counterfactual that stabilizes inflation requires less monetary accommodation than the historical stance and yields a more favorable inflation-output trade-off, suggesting that the Fed’s historical ‘look-through’ approach was suboptimal given the interaction with fiscal stimulus. The intuition is that earlier and firmer monetary tightening in response to GSC-driven inflation would have reduced inflation expectations pass-through and prevented a larger buildup of price pressures, while the output cost of that tighter stance was limited by the greater price flexibility the model identifies in this environment.

Q4. What is the comparison between IT and AIT in the second counterfactual?

The second counterfactual calls for greater initial easing than the historical stance; under inflation targeting (IT) this involves moderate accommodation, while average inflation targeting (AIT) implies an even looser initial policy that generates more persistent inflation and ultimately requires a contractionary response, worsening the inflation-output trade-off relative to IT. The AIT result reflects the design of that framework: making up for periods of below-target inflation with above-target periods creates a commitment to easy policy even when supply-side inflationary pressures are elevated, producing a worse outcome when supply shocks drive inflation above target.
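The make-up logic of average inflation targeting can be shown with simple arithmetic. The sketch below assumes a fixed averaging window and a hypothetical inflation history; the function name, the window length, and the numbers are illustrative and do not come from the paper's rule.

```python
def required_makeup_inflation(past_inflation, target, window):
    """Average inflation needed over the remaining periods of the window
    so that the average over the whole window equals the target."""
    remaining = window - len(past_inflation)
    if remaining <= 0:
        raise ValueError("window must be longer than the observed history")
    return (target * window - sum(past_inflation)) / remaining


# Hypothetical case: 2% target, five-year window, three years at 1.5% so far.
needed = required_makeup_inflation([1.5, 1.5, 1.5], target=2.0, window=5)
print(f"inflation needed over the remaining years: {needed:.2f}%")
```

Under plain inflation targeting the aim for the remaining years would simply be the 2% target; under the averaging rule in this example it is higher, which is the sense in which the framework commits to easier policy after a period of undershooting.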

Key concepts

proxy structural VAR : a structural VAR identified using an external instrument (the proxy variable) that is correlated with the structural shock of interest but uncorrelated with other shocks; used here to identify GSC shocks using the Supply Bottleneck Index.

global supply chain (GSC) shock : a sudden decrease in the supply provision or functioning of supply chains stemming from adverse events (natural disasters, pandemics, geopolitical events); identified in this paper as acting like supply shocks, lowering output and raising prices.

average inflation targeting (AIT) : a monetary policy framework in which the central bank targets the average rate of inflation over time, implying accommodation of below-target periods with above-target periods; shown here to imply looser initial policy and more persistent inflation in response to supply shocks, worsening the trade-off relative to standard IT.

Motivating banks to lend? Credit spillover effects of the Main Street Lending Program

Mon, 01 Jan 0001 00:00:00 +0000

Overview

Research Question. Minoiu, Zarutskie, and Zlate ask whether participation in the Main Street Lending Program (MSLP)—a Federal Reserve emergency facility launched in mid-2020 to channel credit to small and mid-sized firms during the COVID-19 pandemic—caused banks to lend more outside the program. The authors focus on credit spillover effects: did MSLP-participating banks ease standards and expand volumes on their general commercial and industrial (C&I) loan books, beyond the direct loans originated under the program itself?

Institutional Context. The MSLP opened for lender registration on June 15, 2020 and began accepting loan submissions on July 6, 2020, expiring December 31, 2020. Of $600 billion in available SPV capacity, only $16.05 billion was actually deployed, making overall take-up approximately 2.7% of capacity. The program required participating banks to retain 5% of each loan’s credit risk while offloading 95% to the SPV, and charged borrowers LIBOR plus 300 bps. The registration rate among all Call Report banks was 11.7% (614 out of 5,242 banks), with participation rising steeply with bank size: from 6.5% of banks in the below-$1-billion asset group to 63.8% of banks with assets above $50 billion.

Data and Methodology. The analysis draws on multiple data sources: (a) supervisory Y-14Q H1 loan-level data covering C&I loans above $1 million commitments, reported by 32 bank holding companies (BHCs) that account for roughly three-quarters of total U.S. C&I loans; (b) Y-14Q A9 loan portfolio segment data for small business C&I loans (below $1 million commitments) from 22 BHCs; (c) quarterly Senior Loan Officer Opinion Survey (SLOOS) microdata for April, July, and October 2020, providing bank-level assessments of lending standard changes, loan terms, demand shifts, and stated reasons for tightening; (d) Dealscan syndicated loan originations for 262 banks (51 MSLP participants); and (e) bank balance sheet data from Call Reports, including the Ellul-Yerramilli risk management index (RMI) for 16 BHCs. The core empirical design is a difference-in-differences (DiD) comparing MSLP-participating vs. non-participating banks before (2020:Q1–Q2) and after (2020:Q3) program implementation. To address nonrandom selection, the authors instrument MSLP participation with three variables: (i) a dummy for banks that cited registration as “too burdensome” in the September 2020 supplementary SLOOS; (ii) a dummy for banks with prior experience pledging loan collateral at the Fed’s discount window; and (iii) a dummy for banks with prior experience pledging securities collateral at the discount window. Firm×quarter fixed effects absorb time-varying credit demand at the borrower level (Khwaja-Mian design), and bank×borrower fixed effects further control for relationship-specific lending patterns.
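The core difference-in-differences comparison can be sketched in its simplest two-group, two-period form. The renewal rates below are hypothetical, and the example leaves out the fixed effects and instruments used in the paper; it only shows how the DiD contrast is formed.

```python
from statistics import mean

# Hypothetical renewal rates (share of loans renewed) by bank group and period.
renewal = {
    ("mslp", "pre"): [0.05, 0.06],
    ("mslp", "post"): [0.07, 0.08],
    ("non_mslp", "pre"): [0.05, 0.05],
    ("non_mslp", "post"): [0.05, 0.06],
}


def did_estimate(outcomes):
    """(treated post - treated pre) minus (control post - control pre)."""
    treated_change = mean(outcomes[("mslp", "post")]) - mean(outcomes[("mslp", "pre")])
    control_change = mean(outcomes[("non_mslp", "post")]) - mean(outcomes[("non_mslp", "pre")])
    return treated_change - control_change


effect = did_estimate(renewal)
print(f"DiD estimate: {100 * effect:.1f} percentage points")
```

Subtracting the control group's change removes shocks common to all banks in the post period, so the remaining difference is attributed to participation under the parallel-trends assumption.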

Main Findings — Extensive Margin (Large Business Loans). In the Y-14Q H1 data, MSLP banks were 30–32% more likely to renew existing loans than non-MSLP banks in 2020:Q3, with the probability of renewal 1.6–1.7 percentage points higher (against a sample average renewal rate of 5.3%). New loan originations were 22–27% more likely at MSLP banks, or 1.1–1.4 percentage points higher (against a sample average origination rate of 5.1%). 2SLS estimates are similar in magnitude to OLS, indicating selection bias is modest.

Main Findings — Extensive Margin (Small Business Loans and Survey Data). In the A9 small business segment data, MSLP lenders had 17.3% more small business loan accounts outstanding in 2020:Q3 than non-MSLP banks. In SLOOS microdata, MSLP banks were approximately 15 percentage points less likely to report tightening C&I lending standards in 2020:Q3 (conditional on demand controls), compared to an actual tightening rate of 37.5%. This effect is larger for small (more financially constrained) firms (16–17 percentage points) than for large firms (13–14 percentage points).

Main Findings — Intensive Margin. On loan terms, MSLP banks charged spreads that were approximately 9 basis points lower on renewed/originated C&I loans in the Y-14Q data, and 13.5 basis points lower in the Dealscan syndicated loan sample, compared to non-MSLP banks in 2020:Q3. 2SLS estimates are somewhat larger (19–30 bps). In the Dealscan sample, MSLP banks also extended syndicated loans that were 11.2% larger (about $2.4 million more given a $22 million average loan size). Survey data confirm MSLP banks were less likely to tighten most individual loan terms.

Aggregate Magnitude. The authors estimate that, in the absence of the MSLP, total loan renewals and originations at Y-14Q reporting banks in 2020:Q3 would have been approximately 10% lower. Scaling to the broader banking sector, the estimated credit spillover effect is approximately $44.8 billion in C&I lending—nearly three times the $16.05 billion in direct MSLP loan purchases.

Mechanism. Survey and objective evidence both point to reduced risk aversion as the primary channel, rather than immediate balance sheet constraint relief. MSLP banks were significantly less likely to cite “reduced tolerance for risk” as a reason for tightening lending standards after the program’s introduction, while showing no differential propensity to cite capital or liquidity deterioration. Banks with higher risk management index scores (more risk-averse institutions) exhibited larger spillover effects on two of three lending margins. Indicators of immediate balance sheet tightness (excess capital cushions, cost of capital, core deposit reliance) do not predict larger spillovers, with a partial exception for lower excess capital and higher loan loss reserves — suggesting future rather than current balance sheet constraints may have played some role.

Scope Conditions and Robustness. The backstop mechanism is explicitly tied to the program’s credibility period: the spillover effects are smaller in 2020:Q4, consistent with the Treasury’s November 19, 2020 announcement that the program would not be extended, which diminished its backstop role. Placebo regressions using 2018 and 2019 data find no differential lending behavior between MSLP and non-MSLP banks before the program, supporting parallel trends. Results are robust to controls for PPP participation, credit line drawdown exposure, loan loss provisioning, and bank-level loan portfolio cyclicality.

In depth

Q1. What precisely is the “spillover effect” that the paper measures, and how does it differ from the direct effect of the MSLP?

A: The direct effect is the $16.05 billion in MSLP loans purchased by the SPV — credit extended specifically through the program. The spillover effect refers to changes in banks’ general C&I lending behavior outside the program: renewals and originations of non-MSLP loans, changes in lending standards and terms for all business borrowers, and changes in small business loan volumes. The sample in the Y-14Q regression explicitly excludes MSLP loans themselves, so the estimates reflect only the indirect, broader credit effects.

Q2. What instruments does the paper use for MSLP participation, and why are they plausibly exogenous?

A: Three IVs are employed: (1) a dummy for banks that cited program registration as “too burdensome” as a very important reason for not joining (from the September 2020 supplementary SLOOS); (2) a dummy for banks that pledged loan collateral at the Fed’s discount window in December 2019; and (3) a dummy for banks that pledged securities collateral at the discount window in the same period. The exclusion restriction argument is that (1) reflects banks’ administrative capacity and prior Fed engagement rather than underlying balance sheet strength or lending appetite, and that (2) and (3) reflect familiarity with Fed collateral processes in ways that made a loan-based program easier to understand and join — without independently affecting lending standards or volumes in 2020:Q3.
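The logic of instrumenting participation can be illustrated with the simplest IV estimator, the Wald ratio for a single binary instrument. This is a deliberate simplification of the paper's 2SLS with three instruments, and the bank records below are hypothetical.

```python
from statistics import mean

# Hypothetical bank-level records: (instrument z, participation d, outcome y).
# z = 1 if the bank had pledged collateral at the discount window before the pandemic.
banks = [
    (1, 1, 0.08), (1, 1, 0.07), (1, 1, 0.08), (1, 0, 0.05),
    (0, 0, 0.05), (0, 0, 0.05), (0, 1, 0.07), (0, 0, 0.05),
]


def wald_iv(records):
    """Reduced-form difference in y divided by first-stage difference in d."""
    y1 = mean([y for z, d, y in records if z == 1])
    y0 = mean([y for z, d, y in records if z == 0])
    d1 = mean([d for z, d, y in records if z == 1])
    d0 = mean([d for z, d, y in records if z == 0])
    return (y1 - y0) / (d1 - d0)


iv_effect = wald_iv(banks)
print(f"Wald IV estimate: {100 * iv_effect:.1f} percentage points")
```

The estimate is only interpretable as a participation effect if the instrument shifts participation (a nonzero denominator) and affects the outcome through participation alone, which is the exclusion restriction discussed above.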

Q3. How large are the spillover effects on the extensive margin of large corporate lending?

A: In the Y-14Q H1 data across 32 BHCs, MSLP banks renewed loans 1.6–1.7 percentage points more frequently and originated new loans 1.1–1.4 percentage points more frequently in 2020:Q3, relative to non-MSLP banks. Against sample averages of 5.3% renewal rate and 5.1% origination rate, these translate to MSLP banks being 30–32% more likely to renew and 22–27% more likely to originate loans. The 2SLS estimates are broadly similar in magnitude, suggesting that self-selection bias in OLS is limited.
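The conversion from percentage-point effects to relative likelihoods quoted above is a simple ratio to the sample average. A minimal check, using only the figures in this answer (the function name is illustrative):

```python
def relative_effect(pp_effect, baseline_rate_pct):
    """Convert a percentage-point effect into a percent change relative to a baseline rate."""
    return 100 * pp_effect / baseline_rate_pct


# Renewal: 1.6 to 1.7 pp against a 5.3% average renewal rate.
renewal_range = [relative_effect(pp, 5.3) for pp in (1.6, 1.7)]
# Origination: 1.1 to 1.4 pp against a 5.1% average origination rate.
origination_range = [relative_effect(pp, 5.1) for pp in (1.1, 1.4)]

print([round(x) for x in renewal_range])      # [30, 32]
print([round(x) for x in origination_range])  # [22, 27]
```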

Q4. What are the estimated aggregate dollar spillovers from the MSLP?

A: The paper calculates that, in the absence of the program, total loan renewals and originations at Y-14Q H1 MSLP banks in 2020:Q3 would have been lower by approximately $33.6 billion (derived from 44,274 bank-borrower pairs × 1.38 existing loans per pair × 3.06 percentage points of extra loan activity × $17.98 million average loan size). Scaling to all Y-14Q banks (MSLP and non-MSLP alike), the shortfall would represent roughly a 10% reduction in total 2020:Q3 loan renewals and originations. Extrapolating to the full banking sector (since Y-14Q banks cover about 75% of total C&I lending), and assuming similar spillover magnitudes for banks outside the sample, total MSLP spillovers amount to roughly $44.8 billion — approximately three times the $16.05 billion in direct MSLP loan purchases.
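The back-of-envelope calculation above can be reproduced directly from the quoted inputs. The sketch assumes the extrapolation to the full banking sector is a simple division by the 75% coverage share; variable names are illustrative.

```python
pairs = 44_274                # bank-borrower pairs
loans_per_pair = 1.38         # existing loans per pair
extra_activity = 3.06 / 100   # 3.06 percentage points of extra loan activity
avg_loan_musd = 17.98         # average loan size, $ million

# Spillover at Y-14Q H1 MSLP banks, converted from $ million to $ billion.
spillover_y14_bn = pairs * loans_per_pair * extra_activity * avg_loan_musd / 1_000

# Assumed scaling: Y-14Q banks cover about 75% of total C&I lending.
y14_share = 0.75
spillover_total_bn = spillover_y14_bn / y14_share

direct_bn = 16.05             # direct MSLP loan purchases, $ billion
multiple = spillover_total_bn / direct_bn

print(f"Y-14Q spillover: ${spillover_y14_bn:.1f}bn")
print(f"Sector-wide spillover: ${spillover_total_bn:.1f}bn")
print(f"Multiple of direct purchases: {multiple:.1f}x")
```

The product comes to about $33.6 billion, the scaled figure to about $44.8 billion, and the multiple to roughly 2.8, in line with the "approximately three times" description.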

Q5. What is the estimated effect on C&I lending standards using survey data?

A: Using SLOOS microdata, the paper estimates that MSLP banks were approximately 15 percentage points less likely to tighten C&I lending standards in 2020:Q3 than non-MSLP banks, after controlling for demand conditions. The actual tightening rate in 2020:Q3 was 37.5%; because only some banks participated, the implied counterfactual tightening rate absent the program would have been approximately 5 percentage points higher. In a further hypothetical where all SLOOS sample banks had participated, the tightening rate would have been nearly 10 percentage points lower than actual.
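The two counterfactual magnitudes quoted above can be related by a back-of-envelope calculation. The sketch assumes the aggregate effect equals the participant share times the 15-point per-bank estimate; the participant share it backs out is an inference from the quoted numbers, not a figure reported in this summary.

```python
per_bank_effect_pp = 15.0     # estimated per-bank difference in tightening probability
absent_program_gap_pp = 5.0   # quoted gap between actual and no-program tightening rates

# Share of participants implied by the two quoted figures (an inference, not reported data).
implied_participant_share = absent_program_gap_pp / per_bank_effect_pp

# Size of the shift in the tightening rate if the remaining banks had also participated.
universal_participation_shift_pp = (1 - implied_participant_share) * per_bank_effect_pp

print(f"implied participant share: {implied_participant_share:.2f}")
print(f"shift under universal participation: {universal_participation_shift_pp:.1f} pp")
```

With an implied share of about one third, the remaining two thirds of banks account for a shift of about 10 percentage points, matching the magnitude quoted for the universal-participation scenario.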

Q6. Are spillover effects larger for small or large borrowers, and what does this imply?

A: The SLOOS-based estimates show that MSLP banks were 16–17 percentage points less likely to tighten lending standards for small firms (annual sales below $50 million), compared to 13–14 percentage points less likely for large and middle-market firms — a statistically significant difference. The authors interpret this as consistent with the MSLP reducing risk aversion broadly, with the largest effect on borrowers facing greater credit constraints where uncertainty about creditworthiness was highest.

Q7. What evidence supports the risk aversion (psychological backstop) mechanism over the balance sheet constraint mechanism?

A: From SLOOS data, MSLP banks were significantly less likely (at the 1% level) to cite “reduced tolerance for risk” as a reason for tightening lending standards after the program’s introduction, while showing no differential likelihood of citing deteriorating capital or liquidity positions as reasons. Furthermore, splitting banks by the risk management index (RMI), the spillover effects are stronger for high-RMI (more risk-averse) banks on two of three lending outcomes. Conversely, proxies for immediate balance sheet constraints — excess capital cushions, core deposit ratios, equity issuance, and cost of capital — do not yield consistently stronger spillover effects for more constrained banks. The only partial exception is lower excess capital and higher loan loss reserves, which are associated with more loan renewals, suggesting future rather than current balance sheet constraints may have contributed.

Q8. What is the risk management index (RMI), and how is it used here?

A: The RMI is an index developed by Ellul and Yerramilli (2013) that captures the strength of a bank’s internal risk management function, constructed from variables including whether the bank has a chief risk officer (CRO), the CRO’s executive status and relative compensation, risk committee member experience, and meeting frequency. Available for 61 BHCs over 2011–2013, it is matched to 16 BHCs in the Y-14Q H1 sample and used as a pre-COVID proxy for institutional risk aversion. Banks above the median RMI show larger MSLP spillover effects on loan renewals and tightening standards, consistent with the interpretation that the MSLP reduced effective risk aversion more for banks that had higher baseline risk-consciousness.

Q9. How do the authors address the concern that PPP participation — not MSLP participation — might drive the results?

A: First, they test directly whether MSLP participation predicts outstanding PPP/federally-guaranteed loan balances (in Q2 or Q3 2020) in the A9 loan segment data, and find no correlation. Second, they add an interaction of PPP loan balances (divided by total assets) × Post to the baseline regression in Table A10 and find that while PPP lending is positively associated with loan renewals and originations, the MSLP bank × Post coefficient remains statistically significant and similar in magnitude to the baseline, ruling out PPP participation as the driver of the baseline results.

Q10. What explains the low take-up of the MSLP despite its large designed capacity?

A: Survey responses from the September 2020 supplementary SLOOS indicate several demand- and supply-side constraints: banks reported they could generally meet credit demand outside the program; borrower leverage limits (capped at 4–6× EBITDA depending on facility) were seen as too restrictive; the LIBOR plus 300 bps interest rate was high relative to historical pricing for eligible firms; and registration and loss-sharing arrangements were viewed as burdensome and uncertain. The paper interprets these findings as consistent with banks treating the MSLP primarily as a backstop — a facility they would activate only if economic conditions deteriorated significantly — rather than a primary lending channel.

Q11. How does the paper address the threat that MSLP participation reflects bank-level cyclicality in loan portfolios?

A: Table 10 controls for bank-specific C&I loan portfolio cyclicality, measured as the correlation between each bank’s C&I loan growth and aggregate banking-sector C&I loan growth estimated over 1985:Q1–2021:Q2 using two functional forms. The MSLP bank × Post coefficient estimates remain very similar to the baseline after including these controls, ruling out the concern that MSLP participants were simply banks with naturally more procyclical or countercyclical lending patterns.

Q12. What happens to the estimated spillover effects in 2020:Q4, and what does this reveal?

A: The paper shows (Table A6) that extending the sample to include 2020:Q4 yields somewhat smaller estimated spillover effects than in the baseline 2020:Q3 period. The authors attribute this to the November 19, 2020 announcement by Treasury Secretary Mnuchin that the MSLP would not be extended beyond year-end, which effectively ended the program’s backstop role and — consistent with the psychological backstop mechanism — reduced banks’ confidence in the program’s future availability and thus the spillover motivation.

Q13. Does the paper find spillover effects on intensive margin loan terms, and how large are they?

A: On loan spreads, MSLP banks charged approximately 9 basis points lower spreads on floating-rate C&I loans renewed or originated in 2020:Q3 in the Y-14Q data (2SLS: 19 bps), and approximately 13.5 bps lower spreads in the Dealscan syndicated loan sample (2SLS: 30 bps). The 9 bps OLS estimate implies the average spread across all LIBOR-indexed C&I loans in 2020:Q3 would have been approximately 4 bps higher absent the program (i.e., 0.43 × 9 bps), relative to an actual average spread of 235 bps — an effect the authors characterize as economically small. On loan size, the Dealscan evidence indicates MSLP banks extended syndicated loans that were 11.2% larger (2SLS: 25% larger).

Key Concepts

Credit Spillover Effects: As used in this paper, spillover effects refer to the impact of MSLP participation on participating banks’ lending behavior outside and beyond the program itself — specifically, changes in loan renewal rates, new loan origination rates, lending standards, and loan terms for non-MSLP C&I loans. This is distinct from the direct effect (i.e., loans originated through the MSLP proper).

Psychological Backstop: The paper’s term for the mechanism by which the MSLP reduced participating banks’ effective risk aversion without necessarily easing their immediate balance sheet constraints. By committing to provide lending support if conditions deteriorated, the program built banks’ confidence to lend ex ante, functioning as “insurance” against bad outcomes rather than a direct funding facility. The mechanism is distinguished from balance sheet easing by the fact that constrained and unconstrained banks exhibited similar spillover effects.

Extensive Margin of Lending: The binary dimension of lending activity — specifically, whether a bank renews an existing loan or originates a new loan within a bank-borrower pair. In this paper, measured as the share of existing loan commitments within each bank-borrower pair that are renewed or newly originated each quarter. Contrasted with the intensive margin.

Intensive Margin of Lending: The quantitative dimension of existing lending relationships — specifically, the average loan size and average spread on loans renewed or originated in a given period, conditional on a loan being extended.

Senior Loan Officer Opinion Survey (SLOOS): A quarterly Federal Reserve survey of senior lending officers at large U.S. banks covering self-reported changes in C&I lending standards, terms (including spreads, maximum loan size, maturity, covenants, collateral requirements), demand conditions, and, in supplementary editions, reasons for changing standards. Used in this paper as an outcome variable (tightening standards), as a control variable (changes in loan demand), and as a source of IV variation (burden of MSLP registration).

Risk Management Index (RMI): An index developed by Ellul and Yerramilli (2013) measuring the strength of a bank’s internal risk management function, combining information on the presence and compensation of a chief risk officer, risk committee composition, and meeting frequency. Used in this paper as a pre-pandemic proxy for institutional risk aversion to test whether the MSLP disproportionately reduced risk aversion in banks with stronger risk controls.

Difference-in-Differences with Granular Fixed Effects: The primary identification strategy, comparing changes in lending outcomes between MSLP-participating and non-participating banks before (2020:Q1–Q2) and after (2020:Q3) program implementation. The paper uses firm×quarter fixed effects following Khwaja and Mian (2008) to absorb borrower-level credit demand, and bank×borrower fixed effects following Chodorow-Reich (2013) to absorb relationship-specific supply factors — isolating the bank credit supply effect attributable to MSLP participation.

Originate-and-Distribute Feature (of MSLP): The MSLP’s design in which banks originate MSLP loans but sell 95% of the credit exposure to the SPV, retaining only 5%. This feature was intended to free up balance sheet capacity for further lending. The paper tests whether this channel (easing current balance sheet constraints) explains the observed spillovers, finding limited support relative to the risk aversion reduction channel.

Multinational production and global shock propagation during the great recession

Mon, 01 Jan 0001 00:00:00 +0000

Mussa Puzzle Redux

Mon, 01 Jan 0001 00:00:00 +0000

The Mussa (1986) puzzle is the empirical observation of a sharp, simultaneous increase in the volatility of both nominal and real exchange rates following the end of the Bretton Woods fixed exchange rate system in 1973 — a fact commonly interpreted as evidence for monetary non-neutrality. This paper resolves the puzzle by developing a model in which the dominant driver of nominal exchange rate fluctuations is a “financial shock” — a shock to the international demand for a country’s assets that is orthogonal to goods market fundamentals. Under a fixed rate, the central bank offsets financial shocks through reserve intervention, preventing them from moving the exchange rate; under a float, financial shocks freely move the nominal and real exchange rate simultaneously. The same framework also reconciles the Meese-Rogoff disconnect (exchange rates are unpredictable from macro fundamentals), the Backus-Smith puzzle, and the forward premium puzzle within a single unified model, with the financial shock accounting for the dominant share of exchange rate variance in each case.

In depth

Q1. What is the financial shock and how does it differ from standard macro shocks?

The financial shock is an orthogonal disturbance to international portfolio demand — the preference of foreign investors for holding domestic versus foreign assets — that is disconnected from productivity, monetary policy, and goods-market conditions. Because it is uncorrelated with macro fundamentals, it generates exchange rate movements without corresponding movements in output, prices, or interest rate differentials, producing the observed disconnect between exchange rates and macro variables.

Q2. Why does the Mussa pattern arise from regime switching?

Under a fixed rate, the central bank absorbs financial shocks via reserve intervention, sterilizing their exchange rate effects; the real exchange rate is equally insulated because the nominal rate is fixed and prices adjust slowly. Under a float, the same financial shocks freely move the nominal exchange rate, and with sticky prices this passes through to the real exchange rate. The variance of the real exchange rate therefore jumps discontinuously at the regime switch, matching the sharp Mussa empirical finding without requiring any change in the shock process.

Q3. How unified is the resolution across exchange rate puzzles?

A single model with the financial shock, sticky prices, and a standard asset pricing kernel simultaneously matches the Mussa pattern (regime-switching real volatility), the Meese-Rogoff disconnect (exchange rates unpredictable from fundamentals), the Backus-Smith puzzle (exchange rates and relative consumption uncorrelated), and the forward premium puzzle (high-interest-rate currencies appreciate). The financial shock accounts for the majority of exchange rate variance in each application.

Key concepts

Mussa puzzle : the discrete jump in real exchange rate volatility at the Bretton Woods breakdown (1973); resolved in this paper as the change in the central bank’s absorption of financial shocks between fixed and floating regimes.

financial shock : a disturbance to international portfolio demand orthogonal to goods-market fundamentals; the paper’s key mechanism for exchange rate disconnect, the Mussa pattern, and several other exchange rate puzzles.

Oil price fluctuations, US banks, and macroprudential policy

This paper estimates the effect of oil price fluctuations on US banking variables using a Bayesian SVAR with sign restrictions following Baumeister and Hamilton (2019). Oil market shocks that lead to a contraction in world economic activity are found to unambiguously lower the amount of bank credit to the US economy, tend to decrease US banks’ net worth, and tend to increase the US credit spread. The effects can be strong and long-lasting or more modest and short-lived, depending on the source of the oil price fluctuation. The effects are found to be stronger for smaller and lower-leveraged banks.

In depth

Q1. What is the empirical strategy?

The paper extends the state-of-the-art oil market SVAR of Baumeister and Hamilton (2019) to incorporate three US banking variables—banks’ net worth, the US credit spread, and the amount of bank credit extended—estimated with monthly data over January 1974 through December 2019. An agnostic approach is taken on sign restrictions for the US banking block: no restrictions are imposed on banking variables beyond those already imposed by Baumeister and Hamilton (2019) on the oil block, so the results for banking variables are driven primarily by data rather than prior restrictions. This extends earlier work that studied oil prices and credit spreads (Abbritti et al., 2020) or oil prices and stock markets (Kilian and Park, 2009) in isolation.

Q2. What is the main finding regarding the effect of oil shocks on banks?

Oil market shocks that lead to a contraction in world economic activity are found to unambiguously lower the amount of bank credit to the US economy, tend to decrease US banks’ net worth, and tend to increase the US credit spread. “Unambiguously” reflects that the sign restrictions impose no prior on the direction of credit’s response, so the finding that credit falls is driven entirely by data. The paper is the first to characterize the effect of oil market shocks on banks’ net worth and to estimate the credit effect within the SVAR framework.

Q3. How do the effects differ by the source of oil price fluctuations?

The effects on banking variables can be strong and long-lasting or more modest and short-lived, depending on the underlying source of the oil price change—reflecting the SVAR framework’s decomposition of oil price movements into distinct structural shocks. The distinction between oil supply shocks, demand shocks driven by global activity, and demand shocks driven by speculative factors implies that shocks of the same sign in the oil price may have different magnitudes and durations of effects on banks, consistent with Kilian (2009)’s decomposition.

Q4. Which banks are most affected?

The effects of oil market shocks on banking variables are found to be stronger for smaller and lower-leveraged banks. Smaller banks may be more exposed to oil-related regional economic downturns through concentrated loan portfolios, while lower-leveraged banks may face different collateral and risk dynamics relative to more highly leveraged peers.

Key concepts

oil market Bayesian SVAR : a structural vector autoregression that uses a Bayesian prior over sign restrictions to identify oil supply shocks, oil demand shocks related to global real activity, and oil-specific demand shocks, following Baumeister and Hamilton (2019); extended here to include US banking variables.

credit spread : the difference between yields on corporate bonds or loans and a risk-free reference rate; used as a measure of the credit risk premium and financial conditions in US credit markets.

On measuring the welfare cost of inflation

Measuring the welfare cost of inflation requires specifying a money demand function, a definition of money, and an approach to consumer surplus; existing estimates vary widely because these choices are not standardized. This paper advances the literature by applying neoclassical monetary demand theory that integrates the demand for money with the demands for consumption and leisure, using the Normalized Quadratic (NQ) flexible functional form that avoids imposing specific elasticity assumptions. The main contribution is to extend the Serletis and Xu (2021, 2023) framework to derive Hicksian (compensating variation) money demand functions from the NQ model and compare welfare cost estimates based on these against estimates from the Marshallian (consumer surplus) approach—a comparison not previously made within this integrated demand-system framework. The paper uses U.S. CFS Divisia monetary aggregates across multiple levels of monetary aggregation and finds that the two approaches yield internally consistent but quantitatively different welfare cost estimates, with the Hicksian compensating variation approach providing theoretically preferred measures that are robust across specifications.

In depth

Q1. What is the neoclassical demand system approach and how does it differ from earlier methods?

The Serletis-Xu framework integrates the demand for money with the demands for consumption goods and leisure in a joint utility maximization problem, estimating a flexible NQ functional form in a systems context rather than fitting a single-equation money demand specification. Earlier approaches—such as the log-log specification (Lucas 2000) or semi-log specification (Ireland 2009)—estimate a single money demand equation under a maintained functional form assumption and a fixed interest elasticity (often −0.5 as in the Baumol-Tobin model). The NQ approach, derived from the dual demand system of Diewert (1974), makes no assumption about the functional form of money demand and allows demand interactions among consumption goods, leisure, and money (as recommended by Abbott and Ashenfelter 1976 and Barnett 1979), which is necessary for correct welfare measurement when money is consumed jointly with other goods.

Q2. What is the distinction between the Marshallian and Hicksian approaches to measuring welfare cost?

The Marshallian (Bailey 1956) approach measures the area under the inverse money demand curve between the zero-inflation and positive-inflation nominal interest rates, which corresponds to consumer surplus but does not hold utility constant. The Hicksian (compensating variation) approach measures the income that must be given to the consumer to restore the same utility after the inflation increase as before—holding utility constant rather than income. The Hicksian approach is theoretically preferred because it measures the true welfare loss from inflation under standard consumer theory; the Marshallian approach can under- or over-estimate the true cost depending on income effects. The paper’s main contribution is to derive the Hicksian demands from the NQ model and compute the compensating variation, previously not done within this flexible-functional-form demand system framework.
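
The Marshallian side of this comparison is easy to make concrete. The sketch below computes the Bailey consumer-surplus cost as the area under a money demand curve between a zero interest rate and rate i, minus the rectangle i·m(i), for the two single-equation forms mentioned earlier (log-log and semi-log). The parameter values `A`, `eta`, `B`, and `xi` are illustrative assumptions, not estimates from the paper, and the Hicksian compensating variation is not computed here because it requires the full NQ demand system.

```python
import numpy as np

def log_log_demand(i, A=0.05, eta=0.5):
    """Money-to-income ratio m(i) = A * i**(-eta); A and eta are illustrative."""
    return A * i ** (-eta)

def semi_log_demand(i, B=0.35, xi=7.0):
    """Money-to-income ratio m(i) = B * exp(-xi * i); B and xi are illustrative."""
    return B * np.exp(-xi * i)

def bailey_cost(demand, i, n=200_000):
    """Integral of m(x) over [0, i] minus i * m(i), as a fraction of income."""
    step = i / n
    # Midpoint rule: never evaluates m at 0, where the log-log curve diverges.
    mids = (np.arange(n) + 0.5) * step
    return float(np.sum(demand(mids)) * step - i * demand(i))

def log_log_closed_form(i, A=0.05, eta=0.5):
    """Analytical Bailey cost for the log-log case."""
    return A * eta / (1.0 - eta) * i ** (1.0 - eta)

def semi_log_closed_form(i, B=0.35, xi=7.0):
    """Analytical Bailey cost for the semi-log case."""
    return float(B / xi * (1.0 - (1.0 + xi * i) * np.exp(-xi * i)))

for rate in (0.02, 0.05, 0.10):
    print(rate,
          bailey_cost(log_log_demand, rate), log_log_closed_form(rate),
          bailey_cost(semi_log_demand, rate), semi_log_closed_form(rate))
```

The two functional forms give visibly different costs at low interest rates, which is one reason the choice of money demand specification matters for the estimates.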

Q3. What role do Divisia monetary aggregates play?

The paper uses CFS (Center for Financial Stability) Divisia monetary aggregates—which aggregate monetary assets using economic quantity indices that weight components by their monetary service flows—rather than simple-sum aggregates such as M1 or M2. Simple-sum aggregates treat all monetary assets as perfect substitutes regardless of yield differentials, introducing a substitution bias that misrepresents the quantity of monetary services; Divisia aggregates are theoretically consistent with the neoclassical demand system approach used here. The paper reports welfare cost estimates across multiple levels of monetary aggregation to assess sensitivity to the definition of money.
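
A minimal sketch of the difference between the two aggregation methods follows. It uses the standard Törnqvist-Theil discrete-time approximation to the Divisia quantity index, with user costs (R - r_i)/(1 + R) where R is a benchmark rate and r_i is the asset's own rate. The two-asset data (currency and savings deposits) and all rates are made-up illustrative values, and this is not a reproduction of the CFS construction.

```python
import numpy as np

def user_costs(benchmark_rate, own_rates):
    """Real user cost of each monetary asset: interest forgone by holding it."""
    return (benchmark_rate - own_rates) / (1.0 + benchmark_rate)

def expenditure_shares(q, own_rates, benchmark_rate):
    """Share of each asset in total spending on monetary services."""
    spend = user_costs(benchmark_rate, own_rates) * q
    return spend / spend.sum()

def divisia_growth(q_prev, q_curr, r_prev, r_curr, R_prev, R_curr):
    """Log growth of the Divisia index: average-share-weighted component growth."""
    avg_share = 0.5 * (expenditure_shares(q_prev, r_prev, R_prev)
                       + expenditure_shares(q_curr, r_curr, R_curr))
    return float(np.sum(avg_share * (np.log(q_curr) - np.log(q_prev))))

def simple_sum_growth(q_prev, q_curr):
    """Log growth of the simple-sum aggregate (all assets weighted equally)."""
    return float(np.log(q_curr.sum()) - np.log(q_prev.sum()))

# Hypothetical data: currency is flat, interest-bearing deposits grow 10 percent.
q0 = np.array([100.0, 400.0])
q1 = np.array([100.0, 440.0])
own = np.array([0.00, 0.03])   # own rates of return (assumed)
R = 0.05                       # benchmark rate (assumed)

print("Divisia growth   :", divisia_growth(q0, q1, own, own, R, R))
print("Simple-sum growth:", simple_sum_growth(q0, q1))
```

Because deposits pay interest, each dollar of them provides fewer monetary services than a dollar of currency, so the Divisia index records less growth than the simple sum in this example.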

Q4. How do the results compare with the prior literature?

The paper’s estimates, while internally consistent with the NQ flexible form and Divisia aggregates, are in the range of prior estimates in the literature; the Hicksian compensating variation estimates differ from Marshallian consumer surplus estimates in ways consistent with theory, providing a more theoretically grounded benchmark. The wide range of estimates in the existing literature (discussed in the paper’s Table 1)—from the Lucas (2000) log-log model to the Ireland (2009) semi-log model—reflects sensitivity to functional form, money definition, data frequency, and methodology; the paper’s NQ framework addresses functional-form sensitivity while comparing the two surplus measures.

Key concepts

compensating variation (Hicksian welfare cost of inflation) : the income required to restore a consumer’s utility to its pre-inflation level after an inflation increase, holding utility constant; the paper’s main new estimate, derived from Hicksian money demand functions.

Normalized Quadratic (NQ) flexible functional form : a globally flexible functional form (Diewert and Wales 1988) used to approximate the consumer’s cost function without imposing restrictions on substitution elasticities; allows derivation of both Marshallian and Hicksian demand functions.

Divisia monetary aggregates : theoretically consistent monetary aggregates that weight monetary assets by their monetary service flows (user costs) rather than summing them with equal weights; CFS Divisia aggregates are used here as the measure of money.

On the Nature of Entrepreneurship

This paper uses a novel longitudinal administrative dataset drawn from U.S. Internal Revenue Service (IRS) and Social Security Administration (SSA) records to characterize income dynamics and the determinants of entrepreneurial entry for pass-through business owners — sole proprietors, partners, and S corporation owners — who collectively account for over 50 percent of all U.S. business net income. The sample covers 2000–2015 and includes up to 1.3 billion person-year observations for individuals aged 25–65. The authors construct balanced panels using birth cohorts 1950–1975, impute education (college attainment) and skill (cognitive, interpersonal, manual) via machine-learning classifiers trained on CPS and O*NET data, and estimate life-cycle income profiles using a three-component model that separates individual fixed effects, group-specific time effects, and group-cohort-specific age effects.

The paper’s central departure from prior work is coverage of the full income distribution, including the high-earning right tail that household surveys such as the CPS misrepresent due to top-coding and small samples. When the IRS and CPS samples are compared on a consistent classification basis, median self-employment income is lower in the IRS data at all ages, consistent with the survey literature’s emphasis on the “typical” self-employed individual. However, mean incomes diverge sharply: the IRS shows mean self-employment income rising from $23 thousand at age 25 to $93 thousand at age 55, whereas the CPS (with incorporated owners reclassified) shows a rise from $41 thousand to only $73 thousand. Roughly 80 percent of self-employment income in the IRS data accrues to individuals above the $100 thousand threshold, compared to 42–53 percent in the CPS. The IRS-CPS gap is dominated by the right tail and concentrated in professional services and health care. For paid-employed individuals, the IRS and CPS medians and means are close at all ages, confirming the discrepancy is specific to self-employment.

The life-cycle estimation finds that individuals who have “tried self-employment” — a group earning virtually all self-employment income — start at similar average incomes to primarily paid-employed peers at age 25 but reach $134 thousand by age 55, compared with $79 thousand for paid-employed peers with the same observable characteristics. Age effects for the self-employed are 63 percent higher than for the paid-employed at age 26 and remain elevated until age 55. Time effects show dramatically greater cyclical volatility for the self-employed: income growth declined by $9,655 (2008) and $8,785 (2009) for the self-employed versus $373 and $1,583 for paid-employed in the same years, concentrated in real estate and construction.

On the determinants of entry, the paper finds: (i) no evidence that house-price appreciation raises entry rates, contra collateral-constraint hypotheses; (ii) most entrants have lower asset incomes than future entrants with the same characteristics, arguing against a liquid-wealth precondition; (iii) most entrants have higher prior labor income than future entrants, consistent with entry being driven by on-the-job experience rather than fallback from low-paid work; (iv) almost all founders report positive individual tax income in their first year of operation despite negative business net income and no external debt financing. Self-employed income growth exhibits greater dispersion — a 10th-to-90th percentile range roughly 2.5 times wider than for the paid-employed — and a Kelly skewness about 0.1 higher. A standard consumption-risk model calibrated with household-finance estimates of risk aversion rationalizes the patterns if individuals are insured against the most adverse downside shocks. Entry and exit rates are stable across the sample period, including the Great Recession, and the entrepreneurship share does not decline.

The subgroup congruent with non-pecuniary motivation — primarily self-employed individuals earning less than paid-employed peers with matching characteristics — comprises roughly 57 percent of primarily self-employed by count but earns only 16 percent of total self-employment income.

In depth

Q1. Why do IRS and CPS data give such different pictures of self-employment income?

The CPS suffers from top-coding of high incomes and small samples that underrepresent high earners in key industries. The IRS-CPS mean income gap for the self-employed is dominated by the right tail: in the main IRS sample, individuals above the $100 thousand threshold earn roughly 80 percent of all self-employment income, versus 42 percent in the comparable CPS sample. The average income of top earners above $100 thousand is $355 thousand in the IRS versus $218 thousand in the CPS. The gap is concentrated in professional services and health care and persists across all income thresholds and sample definitions tested. No analogous discrepancy exists for paid-employed individuals, where IRS and CPS medians and means are close at all ages.

Q2. What does the comparison look like at the median versus the mean?

At the median, IRS self-employment income is lower than both CPS samples at all ages, with the gap largest for younger owners and those with incorporated businesses — a pattern consistent with the survey-based “self-employment discount” narrative. At the mean, the IRS shows much higher income at older ages: by age 55, IRS mean self-employment income is $93 thousand versus $73 thousand in the CPS sample that includes reclassified incorporated-owner wages. The divergence arises because the mean is sensitive to the right tail, which the CPS systematically underrepresents.

Q3. How does the paper estimate life-cycle income profiles while separating age, time, and cohort effects?

Individual income is decomposed into an individual fixed effect (permanent latent ability and preferences), a group-specific time effect (business-cycle fluctuations common to a group), and a group-cohort-specific age effect (life-cycle income growth). Identification exploits the overlapping cohort structure of the 16-year panel: age effects are assumed equal across cohort bins of size at least two, allowing time and age effects to be separately identified. The model is estimated in levels rather than logs to accommodate business losses. Groups are defined by the Cartesian product of education, three skill dimensions, industry (21 two-digit NAICS codes), demographics (gender, cohort, marital status, children), and employment-status history, which yields 32,256 subgroups.
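
One way to write the three-component decomposition is sketched below. The notation is an assumption introduced here for illustration (the summary does not give the paper's symbols): y is income in levels, g(i) is individual i's group, c_i is birth year, b(c_i) is the cohort bin containing that birth year, and a is age.

```latex
y_{i,t} \;=\; \alpha_i \;+\; \beta_{g(i),\,t} \;+\; \gamma_{g(i),\,b(c_i),\,a_{i,t}} \;+\; \varepsilon_{i,t},
\qquad a_{i,t} = t - c_i
```

Here \alpha_i is the individual fixed effect, \beta is the group-specific time effect, and \gamma is the group-cohort-specific age effect. Since age equals calendar year minus birth year, age, time, and cohort cannot all vary freely; letting \gamma depend on the cohort bin b(c_i) rather than on each birth year separately corresponds to the equal-age-effects restriction described above.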

Q4. What are the headline life-cycle income profile findings for self- versus paid-employed?

Among the “primarily employed” group, those who have tried self-employment and those who are primarily paid-employed have similar average incomes at age 25. By age 55 the self-employed reach an estimated $134 thousand (2012 dollars) versus $79 thousand for paid-employed peers with identical observable characteristics. The estimated age effect for the self-employed is 63 percent higher than for the paid-employed at age 26 and remains higher through age 55. These gaps would widen further if incomes were adjusted upward for the BEA-estimated net misreporting rates of 46 percent for unincorporated owners and 14 percent for S corporation owners.

Q5. How large is the group consistent with non-pecuniary motivation, and how much income does it earn?

The non-pecuniary subgroup consists of primarily self-employed individuals (at least 12 years in self-employment) who earn less on average than primarily paid-employed peers matched on gender, education, skills, and other characteristics. It is a numerical majority, comprising approximately 57 percent of primarily self-employed by count. However, this group earns only 16 percent of total self-employment income. Adjusting for paid-employed fringe benefits and self-employed income misreporting can change the group’s size but does not alter the finding that it accounts for a small income share. The paper concludes that non-pecuniary motives may guide occupational choice for many individuals but are not the driver of the typical dollar earned in self-employment.

Q6. How does idiosyncratic income risk compare between self- and paid-employed?

Self-employed income changes are substantially more dispersed: the 10th-to-90th percentile range of income growth is roughly 2.5 times wider for the self-employed than for the paid-employed. Income changes for the self-employed are also more right-skewed, with a Kelly skewness difference of approximately 0.1. When a standard consumption-risk model — augmented with a lower bound on consumption growth to allow for external insurance — is parameterized with risk-aversion estimates from the household finance literature, the observed patterns are rationalized if individuals are insured against the most adverse downside shocks, i.e., the attractive aspect of self-employment is large potential upside with insured downside.
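
Both dispersion statistics in this answer are simple functions of percentiles. The sketch below computes the 10th-to-90th percentile range and Kelly skewness, defined as ((P90 - P50) - (P50 - P10)) / (P90 - P10). The simulated income-growth samples are made-up illustrative data, not the paper's, and the distributions chosen for them are assumptions.

```python
import numpy as np

def p90_p10_range(x):
    """Width of the interval between the 10th and 90th percentiles."""
    p10, p90 = np.percentile(x, [10, 90])
    return float(p90 - p10)

def kelly_skewness(x):
    """Share of the P90-P10 range in the upper half minus the share in the lower half.

    Ranges from -1 to 1; 0 for a symmetric distribution, positive when right-skewed.
    """
    p10, p50, p90 = np.percentile(x, [10, 50, 90])
    return float(((p90 - p50) - (p50 - p10)) / (p90 - p10))

# Hypothetical income-growth samples (thousands of dollars), for illustration only.
rng = np.random.default_rng(0)
paid = rng.normal(loc=0.0, scale=5.0, size=100_000)            # symmetric
self_emp = 12.0 * (rng.lognormal(0.0, 0.6, 100_000) - 1.0)     # right-skewed

for name, sample in (("paid-employed", paid), ("self-employed", self_emp)):
    print(name, "P90-P10 =", round(p90_p10_range(sample), 2),
          "Kelly skewness =", round(kelly_skewness(sample), 3))
```

Because it uses only three percentiles, Kelly skewness is insensitive to extreme observations in either tail, which is why it is described as a robust measure.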

Q7. What happened to self-employed income and exit rates during the Great Recession?

Time effects show steep income growth declines for the self-employed of -$9,655 in 2008 and -$8,785 in 2009, compared with much more modest declines of -$373 and -$1,583 for paid-employed peers. The aggregate income declines are concentrated in cyclically sensitive self-employed subgroups in real estate and construction, with their paid-employed counterparts experiencing only modest declines. Despite these large income shocks, exit rates from self-employment showed little change during the Great Recession, either in aggregate or in the cyclically sensitive sectors. Entry rates were likewise stable, and the share of entrepreneurs in the population did not decline over the full sample period.

Q8. Does the evidence support collateral constraints as a binding barrier to entrepreneurial entry?

No. The paper tests the hypothesis, standard in the liquidity-constraints literature, that entry rates should be higher for homeowners experiencing house-price appreciation (which raises collateral value). The IRS data do not support this prediction. Separately, comparing asset incomes (interest, dividends, capital gains) of current entrants and future entrants with the same characteristics, the paper finds that most current entrants have lower asset incomes and less liquid wealth than those who switch later, which also argues against a liquid-wealth precondition for entry.

Q9. What does prior labor income reveal about why people enter self-employment?

Current entrants have higher prior labor income than matched future entrants with the same characteristics, indicating they enter with accumulated on-the-job experience rather than being pushed into self-employment as a fallback after failure in paid work. This is consistent with self-employment being a deliberate, experience-driven career transition for most entrants rather than a last resort for low earners. The paper interprets this as positive evidence for the role of experience-based human capital in driving entrepreneurial choice.

Q10. How do founders finance startup costs if most have negative business net income in early years?

Almost all founders in the sample report positive income on their personal (individual) tax form in the first year of operation, even though most report negative business net income and carry no external debt financing. This pattern suggests founders rely on personal income sources — prior savings, part-time paid employment, or spousal income — to cover startup costs rather than external debt, implying that formal credit-market financing constraints are not the primary barrier to entry for most entrants in the sample.

Q11. What are the scope conditions and key limitations?

The sample covers pass-through owners (sole proprietors, partners, S corporation owners) and excludes C corporation shareholders, whose entrepreneurial income does not flow to individual returns until distributed. Income measures exclude most employer fringe benefits; capital gains are excluded from self-employment income, and the authors note their inclusion would strengthen the main findings. The analysis covers 2000–2015 for cohorts born 1950–1975, and income is reported before taxes and transfers. Baseline estimates are not adjusted for misreporting, though BEA-implied adjustments of 46 percent for unincorporated owners and 14 percent for S corporation owners would widen the income gaps further.

Key concepts

Pass-through business owner: An individual who owns a sole proprietorship, partnership, or S corporation, such that business net income flows directly onto the owner’s personal tax return; excludes C corporation shareholders whose income appears only upon dividend or capital-gains distributions.

Tried self-employment: The paper’s primary self-employed comparison group within the “primarily employed” category — individuals with any years in self-employment (including frequent switchers and those with most years in self-employment) — who collectively earn virtually all self-employment income.

Group-specific age effect: The paper’s estimate of how individual income changes with age within a defined subgroup (determined by education, skill, industry, demographics, and employment history), identified by exploiting overlapping birth cohorts in the 16-year panel and separated from individual fixed effects and business-cycle time effects.

Primarily employed: Individuals with at least 12 of 16 sample years in either self- or paid-employment, with at most one intermediate year of non-employment; the paper’s main analytical focus for life-cycle income comparisons.

SOI Databank: The Statistics of Income Databank, a de-identified balanced panel combining SSA demographic records with IRS tax filing data for all living U.S. individuals with a Social Security number over 1996–2015; the paper’s primary data source providing Schedule C, K-1, W-2, and related filing information.

Kelly skewness: A robust measure of distributional asymmetry used by the paper to characterize income growth; the paper reports that Kelly skewness of self-employed income changes exceeds that of paid-employed by approximately 0.1, indicating greater right-skewness in self-employment income dynamics.

Non-pecuniary motivation subgroup: Primarily self-employed individuals who earn less on average than primarily paid-employed peers matched on observable characteristics, taken by the paper as consistent with non-wage job amenities (autonomy, flexibility) driving occupational choice; found to be 57 percent of primarily self-employed by count but earning only 16 percent of total self-employment income.

Adverse Selection and Small Business Finances

Layer 1: Overview

In depth

Q1. What is the paper’s core identification challenge in the empirical section, and how does it address it?

Q2. What is the signaling mechanism in precise terms, and how does it differ from Leland-Pyle (1977)?

Q3. How are the four equilibrium regimes generated and what determines which one prevails?

Q4. Why is the loan size (intensive margin) undistorted even when the extensive margin (market tightness and down payment) is distorted?

Q5. What is the key externality that makes the competitive equilibrium constrained inefficient, and how does the planner correct it?

Q6. What is the non-monotonicity of screening intensity in δ_L, and what is the intuition?

Q7. How does the moral hazard extension change the results compared with the baseline?

Q8. How does this paper relate to Guerrieri, Shimer, and Wright (2010) and what does it add?

Q9. What robustness and consistency checks are run in the empirical section?

Q10. What are the policy implications and their scope conditions?

Q11. What are the main data limitations acknowledged in the empirical analysis?

Key Concepts

An Analytical Model of Behavior and Policy in an Epidemic

Layer 1: Overview

In depth

Q1. What is the ‘identification’ or solution strategy, and what makes the analytical characterization possible?

Q2. What is the core economic mechanism behind ‘excessive caution,’ and the two ways the paper frames the externality?

Q3. Why is the optimal lockdown ‘late, strong, and short’ rather than gradual?

Q4. How do equilibrium and optimal cumulative deaths compare, and why does the more cautious equilibrium produce MORE deaths?

Q5. What is the role of the cost-benefit ratio κ and the ‘fatalism effect’?

Q6. What is the practical ‘back-of-the-envelope’ contribution?

Q7. How do the results relate to and differ from prior numerical econ-epi work?

Q8. What do the costate (shadow-value) dynamics reveal?

Q9. What robustness/extension checks does the paper run?

Q10. What is the calibration used for the figures, and is it meant to be quantitatively serious?

Q11. What are the key caveats and the scope of the policy implications?

Key Concepts

Central Banks as Dollar Lenders of Last Resort: Implications for Regulation and Reserve Holdings

Layer 1: Overview

In depth

Q1. What is the paper’s core empirical identification strategy and what are the main limitations?

Q2. What is the mechanism through which reserve accumulation creates a global externality?

Q3. What roles do capital requirements and funding-mix regulation play in the model, and how do they differ?

Q4. Under what conditions does the global planner prefer more reserves than the decentralized outcome (the ‘wrong-way’ effect)?

Q5. How does the paper handle the correlation between banking crises and exchange rate depreciations?

Q6. What does the risk-sharing extension (Section 5) contribute?

Q7. How does this paper relate to Bocola and Lorenzoni (2020), and what is the key theoretical distinction?

Q8. How does this paper relate to the literature on ‘mercantilist’ versus ‘precautionary’ motives for reserve accumulation?

Q9. How does this paper connect to the literature on international coordination of financial regulation?

Q10. Why are Eurozone countries excluded from the empirical sample?

Q11. What are the scope conditions on the empirical results?

Q12. What is the model’s treatment of the dollar interest rate and safe asset scarcity?

Q13. What is the welfare decomposition from the global numerical example?

Q14. What policy implications does the paper draw, and how are they scoped?

Key Concepts

Deciphering Federal Reserve Communication via Text Analysis of Alternative FOMC Statements

In depth

Q1. What are the alternative FOMC statements and how are they used?

Q2. How is the policy stance measure constructed?

Q3. How is the stance decomposed into expected and surprise components?

Q4. How is the measure validated and what are its macroeconomic effects?

Q5. What counterfactual analysis does the framework enable?

Key concepts

Did the US Really Grow Out of Its World War II Debt?

Layer 1: Overview

In depth

Q1. What is the identification/counterfactual strategy and what are its main threats?

Q2. How are the effects of the peg and surprise inflation distinguished, and can they be separated?

Q3. What is the decomposition relative to Hall and Sargent (2011)?

Q4. Why do the Table 2 surplus contributions differ from the Table 1 scenario differences?

Q5. How do the findings reconcile with Blanchard’s (2019) claim that r < g since 1979?

Q6. What is a notable nuance about the post-1979 period in the primary-balance counterfactual?

Q7. What robustness checks are run?

Q8. What heterogeneity across the debt structure matters?

Q9. What are the timing/measurement complications?

Q10. What are the policy implications and their scope conditions?

Key Concepts

Expecting Floods: Firm Entry, Employment, and Aggregate Implications

Layer 1: Overview

In depth

Q1. What is the identification strategy, and what are the main threats to it?

Q2. What are the main mechanisms, and how are they distinguished empirically and in the model?

Q3. Why does population fall less than employment, and why do firm exits decline?

Q4. What heterogeneity is documented?

Q5. What robustness checks are run?

Q6. What model extensions are explored and how do results change?