O33 | Macro Paper Warehouse

AI and task efficiency


AI can improve decisions, raise firm productivity, and accelerate human capital growth through its effect on signal quality in problem-solving tasks, but the consequences are heterogeneous across the skill distribution and depend on how AI changes the hierarchy within firms. This paper proposes a framework in which AI improves the accuracy of the signals that guide human decisions—individually and in groups—and derives implications for firm organization, wages, and productivity. It also examines preliminary evidence: a cross-sectional regression of changes in TFP growth (2024 versus 2022) on sectoral AI exposure (Eisfeldt et al. 2024) for Compustat firms yields a positive relationship, statistically significant at the 10% level at the 3-digit NAICS sector level and at the 5% level at the firm level, with a slope coefficient of 0.206 for the firm-level regression. The paper compares AI to earlier general purpose technologies (GPTs)—electricity and information technology—finding that if there is a productivity delay for AI it appears shorter than the five- and eight-year delays documented for electrification and IT.

Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.

In depth

Q1. What is the theoretical framework linking AI to decisions and productivity?

The paper models several mechanisms through which AI may improve outcomes by raising the accuracy of signals that guide problem-solving: when signal accuracy rises, individuals and groups make better decisions, potentially enabling lower-level workers to handle more complex tasks and reducing the need for expensive higher-level solutions. For example, if AI allows managers to understand problems faster, they can handle more problems in a given period, potentially reducing demand for specialized expert judgment at lower hierarchy levels. Alternatively, if AI allows lower-level workers (clerks, nurses) to handle tasks previously requiring specialists (partners, doctors), the demand for specialists may fall and the wage premium for top-tier workers may narrow. The direction of the effect depends on whether AI is a better complement to high-skill or to low-skill tasks.
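As a purely illustrative sketch of why higher signal accuracy helps both individuals and groups, the snippet below uses a textbook majority-vote calculation. This is an assumption chosen for illustration, not the paper's model: each of `n` decision-makers receives an independent signal that is correct with probability `p`, and the group follows the majority. The function name and the accuracy values 0.60 and 0.70 are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a simple majority of n independent signals is correct,
    when each signal is correct with probability p. n must be odd."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    k_min = n // 2 + 1  # smallest number of correct signals that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical accuracies: 0.60 without AI, 0.70 with AI.
baseline = majority_accuracy(0.60, 5)
with_ai = majority_accuracy(0.70, 5)
print(f"group of 5: {baseline:.3f} -> {with_ai:.3f}")
```

Under these assumptions a rise in individual accuracy raises group accuracy as well; the paper's own mechanisms and hierarchy effects are richer than this sketch.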

Q2. What does the empirical evidence show about AI’s current productivity effects?

A cross-sectional regression using 5,009 Compustat firm-level observations across 66 three-digit NAICS sectors finds a positive relationship between sectoral AI exposure in 2022 (from Eisfeldt et al. 2024) and the change in annual TFP growth between 2022 and 2024; at the sector level the slope coefficient is statistically significant at the 10% level. The firm-level regression (including 3-digit NAICS fixed effects) yields a slope of 0.206 on AI exposure, significant at the 5% level (t-statistic 2.08), with R² = 0.20 and 1,996 observations. The relationship is absent when examining TFP growth levels in any individual year between 2019 and 2022, consistent with AI’s macroeconomic effects only becoming measurable after the release of GPT-4 in March 2023.
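The reported firm-level slope and t-statistic can be sanity-checked with a few lines of arithmetic. The sketch below backs out the implied standard error and an approximate two-sided p-value. It assumes a standard normal reference distribution, which is a reasonable approximation with roughly 2,000 observations but is not necessarily the exact test the authors ran; the variable and function names are hypothetical.

```python
from math import erfc, sqrt

slope = 0.206   # firm-level coefficient on AI exposure (from the summary)
t_stat = 2.08   # reported t-statistic

# The t-statistic is slope / standard error, so the standard error is implied.
implied_se = slope / t_stat


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return erfc(abs(t) / sqrt(2))


p_value = two_sided_p_normal(t_stat)
print(f"implied s.e. = {implied_se:.3f}, approx. p-value = {p_value:.3f}")
```

The approximate p-value falls between 0.01 and 0.05, which agrees with the stated significance at the 5% level.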

Q3. How does AI compare with prior general purpose technologies?

The paper relates AI to the earlier GPT literature, noting that productivity growth tended to be lower at the start of both the electrification and IT eras—with delays of approximately five and eight years respectively before productivity gains became measurable—and that if there is a similar delay for AI it appears shorter based on the preliminary 2024 data. This comparison suggests that AI may be a GPT with unusually rapid diffusion or a shorter learning curve, though the authors caution that the evidence is still preliminary and depends on the dating of AI’s “arrival.”

Q4. Why might AI effects differ across the hierarchy within firms?

AI’s effect on a firm hierarchy depends on whether it complements or substitutes for skills at each level: if AI primarily helps managers (by speeding problem diagnosis), it may reduce demand for specialized lower-level workers; if it primarily helps clerks (by enabling them to handle more complex documents), it may reduce demand for partners while raising demand for lower-level staff. The paper argues that the distributional consequences—whether AI raises or lowers wage dispersion—depend on this complementarity/substitutability pattern, which likely varies by industry, as illustrated by the contrasting cases of automotive assembly (AI may help managers but not line workers) and law firms (AI may help clerks handle more complex work).

Key concepts

AI as signal accuracy improvement : the paper’s framework for thinking about AI’s effect on decision quality: AI raises the precision of the signals that guide problem-solving, which leads to better individual and group decisions regardless of the specific mechanism.

general purpose technology (GPT) delay : the empirical phenomenon documented by Jovanovic and Rousseau (2005) in which productivity growth is lower at the start of a major GPT era before eventually accelerating; the paper examines whether AI exhibits the same pattern, finding that any delay appears shorter than for electrification (five years) or IT (eight years).

Automation and Rent Dissipation


Acemoglu and Restrepo examine the effects of automation in economies where labor market distortions cause some workers to earn rents—wages above their opportunity cost or outside option. The central question is how the interplay between automation and these distortions shapes wages, inequality, and productivity. The paper makes three contributions: a theoretical framework identifying a rent dissipation mechanism, reduced-form empirical evidence using US data from 1980 to 2016, and a general equilibrium quantification of automation’s aggregate effects.

The theoretical framework extends the task model of Acemoglu and Restrepo (2022) to incorporate task-specific wage wedges. In this setup, a firm employing labor of type g in task x pays a wage equal to the base wage multiplied by an exogenous wedge capturing rents from efficiency wages, bargaining, licensing, regulations, or norms. Because these wedges artificially inflate labor costs in high-rent tasks, firms have a stronger incentive to automate precisely those tasks—automation saves more in labor costs where rents are highest. Proposition 3 establishes that endogenous adoption decisions are tilted toward high-rent tasks: the rent distribution in automated tasks first-order stochastically dominates the rent distribution across all tasks. This targeting generates the rent dissipation mechanism. The equilibrium is inefficient on both the intensive margin (too little employment in high-rent tasks) and the extensive margin (excessive automation of high-rent tasks that a social planner would prefer to keep labor-intensive).
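In symbols, the wage wedge and the targeting result described above can be written roughly as follows. The notation is a sketch: μ_gx matches the wedge symbol used later in this summary, while w_g (base wage), 𝒜_g (tasks automated for group g), and 𝒯_g (all tasks of group g) are labels assumed here for illustration and may differ from the paper's.

```latex
% Wage paid to group g in task x: base wage times an exogenous wedge
\[
  w_{gx} = \mu_{gx}\, w_g , \qquad \mu_{gx} \ge 1 .
\]
% Proposition 3 (as summarized): rents in automated tasks first-order
% stochastically dominate rents across all tasks, i.e. for every level m
\[
  \Pr\!\left(\mu_{gx} \le m \,\middle|\, x \in \mathcal{A}_g\right)
  \;\le\;
  \Pr\!\left(\mu_{gx} \le m \,\middle|\, x \in \mathcal{T}_g\right).
\]
```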

The rent dissipation mechanism has three consequences identified theoretically. First, it amplifies average wage losses for exposed groups beyond what displacement alone would produce, pushing displaced workers toward lower-paying jobs. Second, it compresses within-group wage dispersion by concentrating losses at higher percentiles of the within-group distribution, generating a U-shaped pattern of wage changes: workers at low percentiles earn no rents and experience only base-wage adjustments, while workers between the 70th and 95th percentiles face the steepest declines due to loss of high-rent jobs. Third, it is inefficient: because the tasks targeted by automation are not those where wages reflect scarcity or skill but rather distortionary rents, a planner would have preferred more labor allocated to these tasks, and rent dissipation offsets part or all of the cost-saving productivity gains from automation.

The empirical analysis covers 500 detailed demographic groups defined by education (five levels), gender, five age groups, five race/ethnicity groups, and nativity. Task displacement is measured as a weighted sum of industry-level automation exposure using three proxies: adjusted industrial robot penetration, specialized software services, and dedicated machinery in value added. Workers in the middle and lower-middle of the wage distribution lost 15–20% of their tasks to automation between 1980 and 2016, while post-college workers saw few tasks automated.
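The count of 500 groups follows from crossing the listed characteristics, provided gender and nativity are each treated as two categories (an assumption; the summary does not state the number of categories for these two). A minimal sketch with hypothetical category labels:

```python
from itertools import product

# Category labels are placeholders; only the counts matter here.
education = [f"edu_{i}" for i in range(1, 6)]    # five levels
gender = ["female", "male"]                      # assumed binary
age = [f"age_{i}" for i in range(1, 6)]          # five age groups
race = [f"race_{i}" for i in range(1, 6)]        # five race/ethnicity groups
nativity = ["native_born", "foreign_born"]       # assumed binary

groups = list(product(education, gender, age, race, nativity))
n_groups = len(groups)
print(n_groups)  # 5 * 2 * 5 * 5 * 2
```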

A 10 percentage point increase in task displacement is associated with a 24% decline in group-level relative wages (β = −2.36, s.e. = 0.13), falling to 19% after controlling for gender, education, sectoral demand, and rent shifters (β = −1.90, s.e. = 0.29). The U-shaped pattern in within-group wage changes is clearly visible: wages decline by 25–30% per 10 percentage point task displacement at the 70th–90th percentiles, compared to only 16% at the 5th–40th percentiles. Decomposing the average wage effect, the base-wage component is β = −1.53 (s.e. = 0.33) and the rent-dissipation component is β = −0.37 (s.e. = 0.11), implying a rent dissipation rate of approximately 37%. Across multiple proxies for rents—inter-industry/occupation wage differentials, wage losses after job displacement, and quit rates—the average estimated rent dissipation rate is approximately 35%. Rent dissipation accounts for one-fifth of the overall relative wage decline experienced by groups exposed to automation.
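The coefficients quoted above fit together arithmetically, which the following sketch makes explicit. It treats each β as the change in log relative wages per unit of task displacement, so that a 0.10 change in displacement translates into roughly 100·β·0.10 percent; reading log points as percentages is an approximation. Variable names are hypothetical.

```python
beta_unconditional = -2.36   # no controls
beta_total = -1.90           # with controls
beta_base = -1.53            # base-wage component
beta_rent = -0.37            # rent-dissipation component
step = 0.10                  # a 10 percentage point rise in task displacement


def pct_wage_change(beta: float, displacement_change: float) -> float:
    """Approximate percent change in relative wages (log points read as %)."""
    return 100 * beta * displacement_change


# The two components add up to the controlled estimate.
components_sum = beta_base + beta_rent

# Share of the controlled wage effect attributable to rent dissipation.
rent_share = beta_rent / beta_total

print(round(pct_wage_change(beta_unconditional, step)))  # about -24
print(round(pct_wage_change(beta_total, step)))          # about -19
print(f"rent share = {rent_share:.3f}")                   # roughly one-fifth
```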

In the general equilibrium quantification (with elasticity of substitution λ = 0.5, average cost savings π = 30%, and average rent in automated tasks of 35%), automation accounts for 52% of the rise in between-group wage inequality since 1980: 42 percentage points via baseline displacement effects on labor demand, and 10 percentage points via rent dissipation. Cost savings from automation increased TFP by approximately 3% between 1980 and 2016, but inefficient rent dissipation offsets 60–90% of these gains, leaving net TFP gains of only 0.3–1.3% and net aggregate consumption gains of only 0.45–1.95% over the 36-year period.

Q: What is the rent dissipation mechanism, and why does it arise? A: Rent dissipation arises because labor market wedges make high-rent tasks artificially costly to staff with workers, giving firms a stronger incentive to automate precisely those tasks. When automation displaces workers from high-rent jobs, workers lose the premium above their opportunity cost that those jobs paid, amplifying wage losses beyond what displacement alone would cause. The mechanism is endogenous: firms do not randomly automate tasks but disproportionately target tasks where rents are highest, since doing so saves the most in labor costs. Proposition 3 formalizes this as first-order stochastic dominance of the rent distribution in automated tasks over the rent distribution in all tasks.

Q: Why is rent dissipation inefficient? A: In a distorted economy, high-rent tasks already feature too little employment at the equilibrium—firms under-hire in these tasks because the wage wedge makes labor artificially expensive. A social planner would want to allocate more labor to these tasks, not less. When automation further removes labor from high-rent tasks, it moves the economy further from the efficient allocation, dissipating rents that reflect distortions rather than true scarcity. The TFP formula shows that this inefficient targeting offsets part or all of the cost-saving gains from automation, and can even reduce aggregate productivity if the cost savings are small relative to the rent losses.

Q: What is the U-shaped pattern of within-group wage changes, and what does it indicate? A: The U-shaped pattern means that wage declines due to automation are smallest at the bottom percentiles of a group’s within-group wage distribution, largest in the 70th–95th percentile range, and then smaller again at the very top. Workers at low percentiles earn no rents, so they experience only the base-wage adjustment from reduced labor demand. Workers in the middle-upper range of the distribution hold the high-rent jobs that are disproportionately automated, so they lose both the base-wage component and the rent component of their wages. This pattern is directly visible in US data 1980–2016, with declines of 25–30% per 10 percentage point task displacement at the 70th–90th percentiles versus 16% at the 5th–40th percentiles.

Q: How is task displacement measured, and which groups are most exposed? A: Task displacement is measured as a weighted sum of industry-level automation exposure, accounting for each demographic group’s specialization in routine tasks within industries. Three proxies are used: the adjusted penetration of industrial robots, the increase in specialized software services, and the increase in dedicated machinery in value added. Workers in the middle and lower-middle of the wage distribution—broadly corresponding to non-college workers—lost 15–20% of their tasks to automation between 1980 and 2016. Post-college degree workers saw few tasks automated.
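The weighted-sum structure of the measure can be sketched as below. This is a simplification: the paper's weights account for each group's specialization in routine tasks within industries, whereas here that is collapsed into a single hypothetical weight per industry. The function name, industry names, and all numbers are made up for illustration.

```python
def task_displacement(industry_weights: dict, industry_exposure: dict) -> float:
    """Weighted sum of industry-level automation exposure for one group.

    industry_weights: the group's weight on each industry (hypothetical; stands in
        for employment shares adjusted for routine-task specialization).
    industry_exposure: industry-level automation exposure (hypothetical values).
    """
    if set(industry_weights) != set(industry_exposure):
        raise ValueError("weights and exposure must cover the same industries")
    return sum(industry_weights[i] * industry_exposure[i] for i in industry_weights)


# Made-up example for a single demographic group.
weights = {"manufacturing": 0.5, "retail": 0.3, "education": 0.2}
exposure = {"manufacturing": 0.30, "retail": 0.10, "education": 0.00}

example = task_displacement(weights, exposure)
print(f"task displacement = {example:.2f}")
```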

Q: How large is the rent dissipation rate, and how robust is this estimate? A: The baseline estimate from the U-shaped within-group wage change decomposition implies a rent dissipation rate (μ_Ag/μ_g − 1) of approximately 37% (β = −0.37, s.e. = 0.11). Using inter-industry and occupation wage differentials as a proxy for rents, the estimate is 39% (β = −0.39, s.e. = 0.11). Using wage losses after job displacement, the estimate is 20% (β = −0.20, s.e. = 0.04). After purging compensating differentials, the wage-differential proxy gives 37% and the displacement-loss proxy gives 19%. Quit-rate evidence is consistent with rent dissipation: automation shifts workers toward higher-quit-rate jobs, which are lower-rent jobs. The average across proxies is approximately 35%.

Q: How much of between-group wage inequality since 1980 does automation explain, and what share is due to rent dissipation specifically? A: Automation accounts for 52% of the rise in between-group wage inequality in the US since 1980. Of these 52 percentage points, 42 are attributable to the baseline displacement effect working through reduced labor demand for exposed groups. The remaining 10 percentage points are attributable to rent dissipation, that is, automation pushing exposed groups away from high-rent tasks into lower-paying employment. Rent dissipation thus accounts for roughly one-fifth (10/52) of automation’s total contribution to between-group inequality.

Q: How large are the productivity gains from automation, and how much does rent dissipation offset them? A: Cost savings from automation increased TFP by approximately 3% between 1980 and 2016. However, inefficient rent dissipation offsets 60–90% of these gains, because automation disproportionately targets high-rent tasks rather than tasks where the efficiency case is strongest. The net TFP increase attributable to automation is only 0.3–1.3% over the 36-year period, and the corresponding net increase in aggregate consumption is only 0.45–1.95%.
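A quick consistency check on these figures: if a share `offset` of the gross gain is dissipated, then net = gross × (1 − offset), so each end of the reported ranges implies a gross gain. The sketch below backs these out; pairing the lowest net gain with the highest offset (and vice versa) is an assumption about how the ranges line up. Names are hypothetical.

```python
def implied_gross_gain(net_gain: float, offset_share: float) -> float:
    """Gross TFP gain (in %) implied by a net gain and the share offset."""
    return net_gain / (1 - offset_share)


# Net TFP gains of 0.3% and 1.3%, paired with offsets of 90% and 60%.
gross_from_high_offset = implied_gross_gain(0.3, 0.90)
gross_from_low_offset = implied_gross_gain(1.3, 0.60)

# Ratio of net consumption gains to net TFP gains at each end of the range.
cons_ratio_low = 0.45 / 0.3
cons_ratio_high = 1.95 / 1.3

print(f"{gross_from_high_offset:.2f}, {gross_from_low_offset:.2f}")
print(f"{cons_ratio_low:.2f}, {cons_ratio_high:.2f}")
```

Both implied gross gains are in the neighborhood of the "approximately 3%" figure, and the consumption gains are 1.5 times the TFP gains at both ends of the range.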

Q: How does automation affect within-group versus between-group inequality, and why is this notable? A: Automation increases between-group inequality by reducing relative wages of exposed groups (largely non-college workers) relative to unexposed groups, accounting for 52% of the rise in between-group inequality since 1980. At the same time, automation reduces within-group wage dispersion for exposed groups by compressing wages at higher percentiles. This contrasts with the standard view that inequality is fractal—rising at all levels of aggregation due to skill-biased demand—and helps explain why within-group inequality has risen steadily for college workers since the 1980s while remaining flat and then declining for non-college workers since the 1990s.

Q: What do the propagation matrix and rent-impact matrix represent in the general equilibrium analysis? A: The propagation matrix encodes how task reallocation due to automation in one demographic group creates competition for marginal tasks across other groups, transmitting the wage effects of automation to groups not directly displaced. The rent-impact matrix encodes how this task reallocation changes the rent composition of employment across groups. Both matrices are estimated from US data on task shares and group-level wage elasticities and are used to translate partial-equilibrium estimates of task displacement and rent dissipation into general equilibrium effects on wages and productivity for all demographic groups simultaneously.

Q: What are the policy implications of inefficient rent dissipation? A: Because rent dissipation is inefficient, the social value of automation is lower than what firms and consumers are willing to pay—firms capture all the labor cost savings but do not internalize the welfare cost of destroying high-rent jobs that the distorted equilibrium already under-supplies. Second-best interventions should address the underlying distortions generating rents rather than trying to slow automation directly. The paper suggests that strengthening labor market institutions supporting worker rents in non-automatable tasks could partially counteract the adverse distributional consequences of automation.

Q: How does this paper relate to Bound and Johnson (1992) and Borjas and Ramey (1995)? A: Bound and Johnson (1992) decompose changes in the US wage structure between 1979 and 1988 into technology, supply, and rent components (modeled as exogenous industry wedges), finding that 10–20% of between-group wage changes reflect rent losses. Borjas and Ramey (1995) estimate that trade increased the college premium by 1.3–2.6 log points between 1976 and 1990, with 15–33% due to loss of rents from trade-exposed jobs. Both are comparable to this paper’s finding that rent dissipation accounts for one-fifth of the wage effect of automation, though Bound and Johnson’s estimates include all factors affecting rents while this paper isolates automation specifically.

Worker rents: Wages above a worker’s opportunity cost or outside option, arising from efficiency wages, bargaining, licensing, regulations, or norms. Modeled as task-specific multiplicative wedges (μ_gx ≥ 1) that force firms to pay more than the base wage for labor in particular tasks. Explicitly excludes compensating differentials and skill premia.

Rent dissipation: The loss of above-opportunity-cost wages experienced by workers displaced from high-rent tasks into lower-paying employment. Occurs because automation endogenously targets high-rent tasks where labor is most expensive, and pushes workers into tasks where rents are lower. Quantified as the ratio of average rents in automated tasks to average rents across all tasks, minus one (approximately 35% in US data 1980–2016).

Task displacement: The share of tasks performed by a demographic group that are automated away, measured as a weighted sum of industry-level automation exposure accounting for the group’s specialization in routine tasks. Distinct from employment loss because it captures reallocation of tasks from labor to capital within the production function.

U-shaped within-group wage change profile: The pattern whereby automation generates the largest wage declines at intermediate-to-upper percentiles (70th–95th) of an exposed group’s within-group wage distribution, with smaller declines at the bottom, because high-percentile workers disproportionately hold high-rent jobs targeted by automation. Predicted theoretically and confirmed empirically in US data 1980–2016.

Propagation matrix: A matrix estimated from US data on task shares and group-level wage elasticities that encodes how automation of tasks performed by one demographic group creates competition for marginal tasks with other groups, transmitting wage effects across the demographic distribution in general equilibrium.

Inefficient automation targeting: The mechanism by which labor market distortions cause firms to automate high-rent tasks that a social planner would prefer to keep labor-intensive, since the distorted equilibrium already features too little employment in those tasks. Results in rent dissipation offsetting 60–90% of automation’s direct TFP gains from cost savings.

Rent-impact matrix: A matrix that encodes how task reallocation due to automation changes the rent composition of employment across demographic groups, used alongside the propagation matrix to compute general equilibrium effects of automation on wages and productivity accounting for distortions.

Heterogeneous innovations and growth under imperfect technology spillovers


Layer 1 — Overview

Research Question. Jo and Kim ask two related questions: (1) How do firms use different types of innovation when learning others’ technology takes time? (2) How does this process alter the aggregate implications of firm innovation, particularly in the context of increasing competition?

Model. The paper develops a discrete-time infinite-horizon endogenous growth model with multi-product firms pursuing two types of innovation — “own-innovation” (improving existing product quality) and “creative destruction” (entering new product markets by displacing incumbents) — subject to a novel friction called “imperfect technology spillovers.” The friction takes the specific form of lagged learning: creative destruction builds on the one-period-lagged technology of the target market’s incumbent, while only the incumbent can observe the current frontier technology level. This one-period lag creates a technology gap (Δ = q_t / q_{t−1}) between the incumbent’s frontier and the level available to rivals. Four possible technology gap values arise in equilibrium: Δ₁ = 1 (no gap), Δ₂ = λ (one successful own-innovation), Δ₃ = η (one successful creative destruction), and Δ₄ = η/λ. The step sizes satisfy λ² > η > λ, meaning a single creative destruction improves quality more than a single own-innovation, but two consecutive own-innovations dominate a single creative destruction.
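The four gap values and the step-size restriction can be checked numerically. The sketch below assumes λ = 1.04 and η = 1.075, reading the step sizes 0.04 and 0.075 quoted in the calibration as λ − 1 and η − 1; that reading is an assumption made here, not something the summary states.

```python
lam = 1.04    # own-innovation step (assumed: 1 + 0.04)
eta = 1.075   # creative-destruction step (assumed: 1 + 0.075)

# Restriction from the model description: lambda^2 > eta > lambda.
assert lam**2 > eta > lam

gaps = {
    "delta_1": 1.0,        # no gap
    "delta_2": lam,        # one successful own-innovation
    "delta_3": eta,        # one successful creative destruction
    "delta_4": eta / lam,  # the fourth gap value listed, eta / lambda
}

# Sort the gap labels from smallest to largest gap.
ordered = sorted(gaps, key=gaps.get)
for name in ordered:
    print(name, round(gaps[name], 4))
```

Whenever λ² > η > λ > 1 holds, the ordering is 1 < η/λ < λ < η, so Δ₄ is the smallest non-trivial gap and Δ₃ the largest.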

Key Mechanisms. The learning friction generates two novel mechanisms. First, the “market-protection effect”: incumbents with a technology advantage (Δ > 1) intensify own-innovation to widen the gap and protect their product lines when competitive pressure rises. Formally, own-innovation probability is highest for Δ₂ products and lowest for Δ₁ products (z₂ > z₃ > z₄ > z₁), and ∂z₂/∂x > ∂z₃/∂x > 0 while ∂z₁/∂x < 0, conditional on value coefficients. Second, the “technological barrier effect”: higher overall own-innovation and creative destruction intensity widens the average technology gap across products, reducing rivals’ conditional probability of successfully taking over a product market. This is distinct from the standard Schumpeterian effect (lower expected future profits) and from the escape-competition effect in step-by-step models (which applies only to neck-and-neck, single-product firms).

Data and Empirical Strategy. The empirical analysis combines the USPTO PatentsView database, the Longitudinal Business Database (LBD), the Longitudinal Firm Trade Transactions Database (LFTTD), the Census of Manufactures (CMF), Compustat, and NBER-CES data, covering the universe of U.S. patenting firms from 1976 to 2016, with main analyses from 1982 to 2007. Own-innovation is proxied by the self-citation ratio of patents (the ratio of self-citations to total backward citations); creative destruction by new products added and low-self-citation patents. Exogenous competitive pressure comes from China’s WTO accession in 2001, with industry-level exposure measured by the NTR tariff gap (the gap between non-NTR and NTR rates in 1999) following Pierce and Schott (2016).

Empirical Findings. Pre-shock (1982–1999): patents with lower self-citation ratios (closer to creative destruction) have significantly longer backward citation gaps (coefficient −2.29 to −2.59, p < 0.01 across specifications), confirming that learning others’ technology takes more time. Creative-destruction-type patents also have higher market value (Kogan et al. stock return measure) and scientific value (forward citations), with self-citation ratio negatively associated with both (e.g., coefficient on self-citation for market value: −0.289 without firm FE; −0.110 with firm FE, p < 0.01). Conditional on patenting, higher self-citation ratios are negatively associated with employment growth (coefficient −0.256, p < 0.05), number of industries added (−0.158, p < 0.05), and products added (−0.274, p < 0.01).

Post-shock (DID): foreign competition had no statistically significant effect on overall patent counts, but firms with above-average innovation intensity in industries with high NTR gaps significantly increased their self-citation ratio — indicating a shift toward own-innovation. The triple-interaction coefficient is 0.795 (p < 0.01) with baseline controls. For a firm with average lagged innovation intensity (0.18) in an industry with an average NTR gap (0.291), this corresponds to a 4.2 percentage point increase in the seven-year growth rate of the self-citation ratio, representing a 15.0% increase relative to the average growth rate of 28.2 percentage points. Consistent with the technological barrier effect, firm entry rates are lower in industries with higher TFPR-skewness-based technological barriers (coefficient −0.012 to −0.016, p < 0.05).
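The magnitude calculation above can be reproduced directly: the implied effect is the triple-interaction coefficient multiplied by the average innovation intensity and the average NTR gap. A short sketch, with hypothetical variable names:

```python
coef = 0.795         # triple-interaction coefficient (baseline controls)
intensity = 0.18     # average lagged innovation intensity
ntr_gap = 0.291      # average NTR gap
avg_growth = 0.282   # average seven-year growth of the self-citation ratio

# Implied change in the growth rate of the self-citation ratio.
effect = coef * intensity * ntr_gap

# Effect relative to the average growth rate.
relative = effect / avg_growth

print(f"effect = {100 * effect:.1f} percentage points")
print(f"relative to the average = {100 * relative:.1f}%")
```

The product comes to about 4.2 percentage points, or roughly 15% of the average growth rate, in line with the figures reported.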

Quantitative Analysis. Calibrated to the U.S. manufacturing sector in 1992, the model matches six target moments including average number of products (2.3), products added (0.3), firm entry rate (7.6%), average productivity growth (1.9%), high-growth-firm employment growth (22.5%), and import penetration (15.3%). Creative destruction contributes approximately 1.88 times more to growth per unit than own-innovation (step size ratio 0.075/0.04). The aggregate R&D-to-sales ratio (untargeted) is 4.6% in the model vs. 4.1% in data.

A counterfactual increasing outside entrants by 83% (matching the rise in import penetration from 15.3% to 25.1% between 1992 and 2007) generates a 1.51% increase in aggregate creative destruction arrival rate x, but firm-level creative destruction probability falls 1.33% and startup creative destruction also falls 1.33%. The aggregate R&D-to-sales ratio falls 1.6% and creative destruction R&D intensity falls 1.2%. Average domestic productivity growth declines 11.0%, with growth from creative destruction falling 13.0% and growth from domestic startups falling 1.7%. The total mass of domestic firms falls 6.4%.

In economies with creative destruction costs 80 times higher than the U.S. baseline, the same competitive pressure shock raises rather than lowers total R&D (by 1.0%), but domestic growth still falls 9.7%, because the marginal decline in creative destruction impedes the growth contribution and firm entry even when aggregate innovation spending rises.

Layer 2 — Q&A

Q1: What is the key friction that distinguishes this model from the existing multi-product firm literature (e.g., Klette and Kortum 2004; Akcigit and Kerr 2018)?

A: The key friction is “imperfect technology spillovers,” modeled as lagged learning: creative destruction can only build on the one-period-lagged technology of the target product (q_{j,t−1}), while the product’s current owner observes the frontier technology (q_{j,t}). In models without this friction — such as Akcigit and Kerr (2018) — rivals can instantly learn and copy frontier technology, so firms have no technological advantage and cannot protect their markets. In the current model, own-innovation by the incumbent widens the gap between q_{j,t} and q_{j,t−1}, creating a barrier that a rival must overcome even after successful creative destruction. This makes own-innovation an endogenous function of the technology gap, a feature absent from existing multi-product firm frameworks.

Q2: Why does the model predict that own-innovation increases with the technology gap up to a point, then decreases?

A: From Corollary 1, the ordering z₂ > z₃ > z₄ > z₁ reflects competing forces. Products with gap Δ₂ = λ gain the most from additional own-innovation in terms of reducing the probability of losing the product line (equation 2), so own-innovation is highest there. Products with Δ₃ = η or Δ₄ = η/λ already have substantial technological advantages from prior creative destruction, so the marginal value of own-innovation in reducing market loss probability is lower. Products with Δ₁ = 1 have no advantage at all: if a rival succeeds in creative destruction, the incumbent loses the product regardless of own-innovation (equation 1), so z₁ is lowest. Beyond a certain gap level, the incumbent is sufficiently protected that additional own-innovation has diminishing returns in deterrence.

Q3: What is the market-protection effect formally, and for which products is it strongest?

A: The market-protection effect (Corollary 2) is the positive response of a firm’s own-innovation to an increase in the aggregate creative destruction arrival rate x, conditional on the value coefficients A₁ and A₂ being fixed. It is strongest for products with Δ₂ = λ (∂z₂/∂x is the largest and positive), positive but weaker for Δ₃ = η (∂z₃/∂x > 0), of ambiguous sign for Δ₄ = η/λ, and negative for Δ₁ = 1 (∂z₁/∂x < 0). The asymmetry reflects the asymmetric payoff to own-innovation across gap levels: for Δ₂ products, successful own-innovation can turn a losing situation into a winning one because it shifts the technology gap from Δ₁ to Δ₂ from the rival’s perspective, effectively defeating the rival’s creative destruction attempt. This mechanism provides a micro-foundation for why frontier firms (like Google or NVIDIA) keep innovating intensely despite their technological leads, a pattern the standard step-by-step model cannot explain.

Q4: What is the technological barrier effect and how does it differ from the Schumpeterian effect?

A: The technological barrier effect refers to the reduction in rivals’ incentive for creative destruction caused by an increase in the average technology gap across product lines. When incumbents do more own-innovation or when outside firms do more creative destruction, the distribution of technology gaps shifts rightward (density at Δ₁ falls; density at Δ₂, Δ₃, Δ₄ rises). This raises the average technology barrier rivals must overcome to successfully take over a product market, reducing the conditional takeover probability x^{takeover} and the expected value of creative destruction B. In the U.S. counterfactual, the technological barrier effect accounts for 17.0% of the total change in the aggregate creative destruction rate x and 15.0% of the change in startup creative destruction x_e. In contrast, the Schumpeterian effect refers to the reduction in expected future profits from owning a product due to increased displacement risk (through the value coefficient A₂), a mechanism present in standard quality-ladder models. Both operate simultaneously but the technological barrier effect is a novel feature of this framework.

Q5: How is own-innovation vs. creative destruction measured empirically, and what validates this measure?

A: The self-citation ratio (the share of a patent’s backward citations that cite the same assignee’s earlier patents) is used as the primary measure: a higher ratio indicates greater reliance on the firm’s own prior knowledge, hence a higher probability that the innovation improves an existing product line (own-innovation). This is validated empirically in three ways. First, patents with lower self-citation ratios have significantly larger backward citation gaps (coefficient −2.29 to −2.59 across fixed-effect specifications on 728,721 observations), consistent with creative destruction requiring more time to learn others’ technology. Second, lower self-citation patents have higher market value and scientific value (forward citations), consistent with η > λ (creative destruction contributes more per event to quality). Third, firm-level regressions show that lower self-citation ratios are associated with higher employment growth, more products added, and more industries entered, consistent with creative destruction contributing more to firm expansion.
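The self-citation ratio described here is simple to compute once each backward citation is tagged with the assignee of the cited patent. The sketch below assumes such a list is available; the function name, the input layout, and the firm names are hypothetical and do not reflect the PatentsView schema.

```python
from typing import Optional, Sequence


def self_citation_ratio(assignee: str, cited_assignees: Sequence[str]) -> Optional[float]:
    """Share of a patent's backward citations that cite the same assignee.

    Returns None when the patent has no backward citations, since the
    ratio is undefined in that case.
    """
    if not cited_assignees:
        return None
    self_cites = sum(1 for cited in cited_assignees if cited == assignee)
    return self_cites / len(cited_assignees)


# Made-up example: two of four backward citations point to the firm's own patents.
ratio = self_citation_ratio("FirmA", ["FirmA", "FirmB", "FirmA", "FirmC"])
print(ratio)
```

A higher ratio is read as closer to own-innovation and a lower ratio as closer to creative destruction, as described above.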

Q6: How does the DID identification strategy work, and what are the main results?

A: The identification exploits the removal of trade policy uncertainty (TPU) after China’s WTO accession in 2001. The treatment variable is the industry-level NTR gap (the gap between non-NTR and NTR tariff rates in 1999): industries with larger gaps experienced a larger reduction in uncertainty and thus a greater increase in Chinese import competition. The DID compares patenting firms across periods (1992–1999 vs. 2000–2007) and across high- vs. low-NTR-gap industries, with a triple interaction for firm-level innovation intensity (lagged five-year average patents per employee, normalized within two-digit NAICS). The main finding (Table 4): the NTR gap × Post interaction has no significant effect on overall patent counts (coefficient 0.238 without controls, standard error 0.237), but the triple interaction (NTR gap × Post × innovation intensity) has a positive and significant effect on the growth rate of the self-citation ratio (0.732 without controls, p < 0.05; 0.795 with baseline controls, p < 0.01). This implies that innovation-intensive firms in high-competition industries shifted their composition toward own-innovation, while overall patenting was unchanged — consistent with an offsetting rise in own-innovation and fall in creative destruction.
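
The design described above can be written schematically as a triple-difference regression. This is only a sketch under assumed notation: the outcome y, the firm and period fixed effects, the control vector X, and the placement of lower-order interactions are illustrative and may differ from the paper's exact specification.

```latex
y_{ijt} = \beta_1 \,(\mathrm{NTRGap}_j \times \mathrm{Post}_t)
        + \beta_2 \,(\mathrm{NTRGap}_j \times \mathrm{Post}_t \times \mathrm{Intensity}_i)
        + (\text{remaining lower-order interactions})
        + X_{ijt}'\gamma + \delta_i + \delta_t + \varepsilon_{ijt}
```

Here i indexes firms, j industries, and t periods. In this notation, β₁ corresponds to the NTR gap × Post coefficient and β₂ to the triple-interaction coefficient discussed in the answer.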

Q7: What are the aggregate growth effects of increasing competitive pressure in the calibrated model?

A: Using an 83% increase in outside entrants (matching the 1992–2007 rise in import penetration from 15.3% to 25.1%), average domestic productivity growth falls 11.0%. Decomposing: growth from domestic own-innovation falls 11.4%, growth from domestic creative destruction falls 13.0%, and growth from domestic startups falls 1.7% (Table 9). The aggregate R&D-to-sales ratio falls 1.6% and the creative destruction R&D intensity falls 1.2%, indicating that the decline in creative destruction R&D outweighs the rise in own-innovation R&D. The total mass of domestic firms falls 6.4% and the average number of products per firm falls 5.5%.

Q8: How do results differ in economies with high creative destruction costs vs. the U.S.?

A: When creative destruction costs (χ̃) are set 80 times higher than the U.S. baseline, the initial equilibrium has much lower creative destruction: R&D-to-sales ratio is 1.39% (vs. 4.58% in U.S.), creative destruction R&D intensity is 8.6% (vs. 63.9%), average number of products is 1.0 (vs. 2.3), and average domestic productivity growth is 1.4% (vs. 1.9%). Under the same competition shock, total R&D actually rises by 1.0% in this high-CD-cost economy (because own-innovation increases more than creative destruction falls, given the already low baseline of creative destruction), in contrast to the −1.6% in the U.S. However, domestic growth still falls 9.7% even in this economy, driven by reductions in creative destruction by incumbents and startups combined with a decline in the mass of domestic incumbents. This result holds even with a fixed firm mass (Table E5), confirming the mechanism is not solely due to entry/exit dynamics.

Q9: What is the technological barrier effect’s quantitative contribution to the decline in creative destruction?

A: In the U.S. counterfactual (Table 8 and associated decomposition), 17.0% of the total change in the aggregate creative destruction arrival rate x and 15.0% of the total change in startup creative destruction x_e are attributable specifically to the technological barrier effect, that is, to the shift in the technology gap distribution μ(Δ_ℓ) holding all else equal. The conditional takeover probability x^{takeover} declines from 73.2% to 73.0%. The density at Δ₁ (the easiest gap to overcome) falls 0.4%, while densities at Δ₃ and Δ₄ rise 1.1% and 1.4% respectively, driven by increased creative destruction by outside firms and intensified own-innovation by incumbents.

Q10: What are the policy implications the paper draws from its framework?

A: The paper argues that policies evaluating innovation should account for composition, not just aggregate R&D levels or patent counts. Increased overall innovation driven by defensive own-innovation contributes less to economic growth than creative destruction and restricts firm entry, so it is less beneficial than it appears. In economies with high creative destruction costs (e.g., European economies with high regulatory barriers to creative destruction), increased foreign competition may raise aggregate R&D while still lowering domestic growth, misleading policymakers who track only total innovation spending. The model also suggests that the mixed empirical findings in the competition-innovation literature (Aghion et al. 2005; Bloom et al. 2016; Autor et al. 2020) can be reconciled by accounting for compositional shifts: the net effect of competition on total innovation is ambiguous because it raises own-innovation for technologically advantaged firms while reducing creative destruction for all firms.

Key Concepts

Imperfect Technology Spillovers: The novel friction introduced in this paper, modeled as lagged learning: firms attempting creative destruction can only access the one-period-lagged technology of the target product market (q_{j,t−1}), while the incumbent product owner observes and can improve from the current frontier (q_{j,t}). This asymmetry creates a persistent technological advantage for incumbents and enables strategic defensive innovation.

Own-Innovation: R&D investment by a firm to improve the quality of its existing product lines. Successful own-innovation raises product quality by a step size λ > 1. Own-innovation does not require learning others’ technology and, in the model, constitutes the incumbents’ defensive margin against creative destruction. At the aggregate level, it contributes more to total growth than creative destruction because it succeeds more frequently, but per successful event it contributes less (λ < η).

Creative Destruction: R&D investment enabling a firm to enter a new product market by displacing the incumbent. Successful creative destruction improves the lagged quality of the target product by a step size η, where λ² > η > λ. It requires learning the incumbent’s one-period-lagged technology, takes longer to develop (evidenced empirically by longer backward citation gaps), and contributes more to firm growth and product expansion per event than own-innovation.

Technology Gap (Δ): The ratio of a product’s current-period technology to its previous-period technology (Δ_{j,t} = q_{j,t}/q_{j,t−1}). This gap summarizes the technological advantage the incumbent holds in a product market under imperfect spillovers. Four values are possible in equilibrium: Δ₁ = 1, Δ₂ = λ, Δ₃ = η, Δ₄ = η/λ. The gap determines both the incumbent’s own-innovation incentive and the rival’s probability of successfully completing a product takeover conditional on creative destruction.

Market-Protection Effect: The mechanism by which incumbents with a technological advantage (Δ > 1) increase own-innovation in response to heightened competitive pressure (an increase in the aggregate creative destruction arrival rate x). This effect is maximized for products with Δ₂ = λ and positive but diminishing for Δ₃. It is absent for Δ₁ = 1 products (where own-innovation cannot prevent displacement) and is formally distinct from the escape-competition effect in step-by-step innovation models, which applies only to neck-and-neck single-product firms.

Technological Barrier Effect: The reduction in rivals’ incentive for creative destruction caused by an increase in the average technology gap across the economy’s product lines. When incumbents intensify own-innovation and/or when outside creative destruction increases, the distribution of technology gaps shifts toward higher Δ values, reducing the conditional probability that a rival successfully takes over any given product market. This feedback mechanism endogenously suppresses creative destruction and firm entry beyond what the Schumpeterian effect alone would predict.

Self-Citation Ratio: The share of a patent’s backward citations that cite patents previously owned by the same firm. Used in the paper as a continuous proxy for the likelihood that a patent represents own-innovation vs. creative destruction: a ratio of 1 (100% self-citations) implies 100% probability of own-innovation; a ratio of 0 implies 100% probability of creative destruction. This measure follows Akcigit and Kerr (2018) and is validated in the paper against learning time, quality, and firm growth outcomes.

NTR Gap (Trade Policy Uncertainty Shock): The industry-level difference between non-NTR (column 2) and NTR (column 1) U.S. tariff rates in 1999, used as the treatment variable capturing the exogenous increase in Chinese competitive pressure following China’s WTO accession and the U.S. granting of Permanent Normal Trade Relations (PNTR) in 2002. Industries with larger NTR gaps experienced a greater reduction in trade policy uncertainty and thus a larger increase in competitive pressure from foreign firms.

Patent Term, Innovation, and the Role of Technology Disclosure Externalities

This paper examines how anticipated changes in patent term affect R&D and innovation, using the U.S. adoption of the Trade-Related Aspects of Intellectual Property Rights (TRIPs) agreement (ratified in December 1994 and implemented in June 1995) as a quasi-natural experiment. The central research question is whether and how policy anticipation shapes the short- and long-run dynamics of innovative activity, given ambiguous theoretical predictions: news of a patent term reduction could either deter innovation (by signaling lower future returns) or accelerate it (by inducing innovators to file under the more favorable existing regime before it expires).

The identification strategy exploits a difference-in-differences (DiD) design using two sources of variation across 621 4-digit International Patent Classification (IPC) technological fields. The first is cross-sectional variation in field-specific pending periods — the time between patent application and grant during which monopoly rights are not fully enforceable — which determines whether TRIPs increased or reduced each field’s effective patent term (from 17 years post-grant to 20 years post-application minus the pending period). Fields with average pending periods exceeding three years faced expected reductions; those below faced extensions. On average across fields, TRIPs extended patent term by approximately 473 days (about 15 months), but approximately 45% of fields faced greater than 5% probability that individual patents would receive a term reduction. The second source is time variation from two events: a news event at the end of 1992 (when the Blair House Accord substantially reduced uncertainty about TRIPs adoption) and implementation in June 1995. The empirical sample spans 1985Q1–2000Q4 using PATSTAT patent data, augmented by firm-level R&D data from NBER-Compustat for 2,410 listed U.S. firms.

Three main empirical facts emerge. First (Fact 1), innovation and R&D accelerate more during the anticipation phase (1992Q4–1995Q2) in fields with a higher probability of patent term reduction. A one-percentage-point higher reduction probability corresponds to a 1.4% larger increase in granted patent applications before implementation; a one-month shorter average patent term extension corresponds to a 2.9% larger increase. At the firm level, a one-percentage-point higher reduction probability is associated with a 1.9% increase in annual R&D expenditure (approximately $1.7 million), ruling out the interpretation that rising patent counts merely reflect strategic filing adjustments.

Second (Fact 2), this heightened innovative activity persists for at least five years after implementation. Two years post-implementation, a one-percentage-point higher reduction probability corresponds to 1.44 additional quarterly patents (+2.7% in Poisson estimates), and a one-month shorter term extension corresponds to 3.3 more patents (+5.9%). This persistence is driven by indirect effects: the anticipation-induced burst in patenting generates additional follow-on innovation through technology disclosure externalities linked to cumulative knowledge creation. The elasticity of post-implementation innovation to news-phase innovation is estimated at approximately 2.1.

Third (Fact 3), the direct effect of a shorter patent term extension on innovation, estimated by augmenting the DiD specification to control for field-specific innovation histories, is negative, consistent with prior literature. A one-month shorter patent term extension reduces quarterly patents by 1.7%, and a one-year reduction reduces them by 20.9%. These estimates align with Budish, Roin, and Williams (2015, 2016), who find that a one-year extension of patent monopoly increases R&D by 7%–22% in pharmaceuticals. The identification is supported by the absence of pre-trends, by the finding that pre-news pending period distributions predict realized post-news variation with coefficients near one (0.957–1.104), and by extensive robustness checks.

Q: What was the effective change in U.S. patent term under TRIPs, and why did it differ across fields? A: TRIPs shifted patent expiry from 17 years after grant to 20 years after application date. Because monopoly rights are only fully enforceable after grant, the effective term became 20 years minus the pending period. Fields with average pending periods shorter than three years received net extensions; fields with longer average pending periods faced net reductions. Cross-field variation in pending periods arises because applications in different technical fields are reviewed by distinct USPTO technical units with different complexity and backlog levels.
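
The arithmetic behind the three-year threshold can be made explicit. The sketch below is illustrative only: the function name is hypothetical, and it assumes, as stated in the answer above, that protection is enforceable only after grant.

```python
def term_change_years(pending_years: float) -> float:
    """Change in effective patent term from moving to 20 years from application.

    Old regime: 17 years of protection counted from the grant date.
    New regime: 20 years from application, of which the pending period is
    not fully enforceable, so the effective term is 20 - pending_years.
    """
    old_term = 17.0
    new_term = 20.0 - pending_years
    return new_term - old_term  # algebraically equal to 3 - pending_years


# Hypothetical pending periods (in years) on either side of the threshold.
for pending in (1.5, 3.0, 4.0):
    print(pending, term_change_years(pending))
```

A positive result is a net extension and a negative result a net reduction. Under this simple formula, the average extension of about 473 days reported earlier would correspond to an average pending period of roughly 3 − 473/365 ≈ 1.7 years.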

Q: What was the news event, and how was anticipation established? A: The paper identifies November 1992 — when the Blair House Accord substantially reduced uncertainty about TRIPs adoption — as the news event, with formal ratification in December 1994 and implementation in June 1995. Documentary evidence confirms anticipation: U.S. business executives were involved in TRIPs negotiations from 1986; the patent term change appeared in a 1991 GATT draft; an Advisory Committee report co-signed by IBM, 3M, Motorola, and others referenced it in August 1992; and a New York Times article noted proposed changes in September 1992.

Q: How is the probability of patent term reduction (PL_j) constructed, and what is its distribution? A: PL_j is the fraction of patents in field j granted before the TRIPs news with a pending period exceeding three years, computed using PATSTAT data on U.S. patents granted between January 1990 and May 1992. Approximately 45% of fields faced a reduction probability exceeding 5%, and 15% faced a probability exceeding 10%. Even fields with an average term extension greater than one year had individual-patent reduction probabilities as high as 40%. A 10-percentage-point increase in PL_j corresponds to approximately a four-month shorter average term extension.
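
As a rough illustration of how a measure like PL_j can be built from patent-level records, the sketch below counts, per field, the share of patents whose pending period exceeds three years. The function name, the input layout, the field labels, and the pending periods are all made up for the example and are not taken from PATSTAT or the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def reduction_probability(patents: Iterable[Tuple[str, float]],
                          threshold_years: float = 3.0) -> Dict[str, float]:
    """Fraction of patents in each field with a pending period above the threshold."""
    totals: Dict[str, int] = defaultdict(int)
    long_pending: Dict[str, int] = defaultdict(int)
    for field, pending_years in patents:
        totals[field] += 1
        if pending_years > threshold_years:
            long_pending[field] += 1
    return {field: long_pending[field] / totals[field] for field in totals}


# Hypothetical records: (field label, pending period in years).
sample = [
    ("field_A", 3.4), ("field_A", 2.1), ("field_A", 4.0), ("field_A", 2.9),
    ("field_B", 1.2), ("field_B", 1.8),
]
pl = reduction_probability(sample)
print(pl)
```

In this toy input, half of the patents in field_A exceed the threshold and none in field_B do, so field_A would be the more exposed field.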

Q: What is Fact 1 and what are its quantitative magnitudes? A: Fact 1 states that during the news phase, innovation and R&D increase relatively more in fields with higher patent term reduction probability and shorter average term extension. One year after the news (two years before implementation), a one-percentage-point higher reduction probability generates 0.19 additional quarterly patents (+0.5% in Poisson estimates); a one-month shorter average extension generates 0.35 additional quarterly patents (+0.8%). These effects approximately triple one year before implementation. At the firm level, a one-percentage-point higher probability is associated with a 1.9% increase in annual R&D (~$1.7 million) in 1993.

Q: Why does news of a potential patent term reduction accelerate rather than deter innovation? A: Innovators who anticipate a reduction in future patent protection under the new regime have strong incentives to file applications before implementation to secure the longer 17-years-from-grant term while it remains available. The acceleration is therefore consistent with innovators preferring longer protection: they rush to file under the more favorable old regime rather than curtailing innovation. Complementary analyses exploiting within-field dispersion in pending periods find that firms were particularly responsive to scenarios involving adverse policy changes, consistent with loss aversion. The dynamics of the news-phase acceleration are also consistent with an R&D gestation lag of approximately two years, as estimated by Pakes and Schankerman (1984).

Q: What is Fact 2 and what drives the post-implementation persistence? A: Fact 2 states that the heightened innovation in fields with higher reduction probability persists for at least five years after June 1995, even though the direct effect of a shorter patent term is innovation-reducing. Two years post-implementation, a one-percentage-point higher reduction probability corresponds to 1.44 additional quarterly patents (+2.7% Poisson) and a one-month shorter extension to 3.3 additional patents (+5.9% Poisson). The persistence is driven by technology disclosure externalities: the news-phase acceleration generates new patented knowledge that subsequent innovations build upon. Fields where new inventions rely more heavily on past innovations from the same field — proxied by backward citation intensity — display stronger post-implementation persistence.

Q: How does the paper separate direct from indirect (externality-driven) post-implementation effects? A: Following Angrist and Pischke (2009), the paper augments the baseline DiD specification to control for field-specific innovation histories via a lagged moving average of past outcomes and pre-determined field attributes interacted with quarterly fixed effects. The resulting coefficients capture the effect of patent term variation orthogonal to the news-induced innovation dynamics. The direct effect estimates are negative post-implementation (Fact 3), while the overall estimates are positive (Fact 2), confirming that the indirect externality channel outweighs the direct channel in the post-implementation period.

Q: What is Fact 3 and how does its magnitude compare to prior literature? A: Fact 3 states that, controlling for the news shock, a shorter patent term extension leads to a relative decline in innovation post-implementation. The estimated semi-elasticity is 1.7% per one-month increase in patent term and 20.9% per one-year increase. These estimates align with Budish, Roin, and Williams (2015, 2016), who find a 7%–22% increase in pharmaceutical R&D per one-year extension, and with Hemous et al. (2023), whose model implies a 1.2% innovation increase per one-month extension.

Q: What is the estimated elasticity of post-implementation innovation to news-phase innovation, and what does it imply? A: Point estimates imply that one additional patent during the news phase generates approximately 5.1 additional patents post-implementation. Given average patent counts of 408.5 during the news phase and 1,000.3 post-implementation, this corresponds to a percent-to-percent elasticity of approximately 2.1. This elasticity captures the technology disclosure externality channel by which transitory accelerations in patenting generate persistent follow-on innovation.
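
The conversion from the level effect to the percent-to-percent elasticity can be checked directly from the figures quoted above. This is a small worked calculation; the helper name is hypothetical, and it assumes the elasticity is evaluated at the two sample means.

```python
def level_to_elasticity(marginal_effect: float, mean_x: float, mean_y: float) -> float:
    """Convert dy/dx into (dy/y)/(dx/x), evaluated at the sample means."""
    return marginal_effect * mean_x / mean_y


# 5.1 extra post-implementation patents per extra news-phase patent,
# with mean counts of 408.5 (news phase) and 1,000.3 (post-implementation).
elasticity = level_to_elasticity(5.1, 408.5, 1000.3)
print(round(elasticity, 2))  # 2.08
```

The result of about 2.08 matches the reported elasticity of approximately 2.1.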

Q: Why is ignoring anticipation (as in Abrams 2009) a problem for DiD identification? A: Anticipation inflates patenting in fields with higher reduction probability during the pre-implementation period, violating the DiD assumption that pre-implementation outcomes provide an unaffected baseline. For example, between April 1994 and March 1995, average monthly patents in field C12P (high reduction probability) were 15.1 units above pre-news levels, versus only 2.4 in field E05D (low reduction probability). Using this inflated pre-implementation level as the DiD reference baseline reverses the sign of the estimated implementation effect relative to the specification that uses the unaffected pre-news baseline.

Q: What evidence supports the technology disclosure externality mechanism over alternative explanations? A: The paper proxies technological dependence by backward citation intensity at the field level and finds that the news-phase acceleration propagates more strongly into post-implementation innovation in fields where new inventions more heavily cite prior same-field patents. Time-varying measures of technological dependence identify this channel as the primary driver of indirect post-implementation effects. Two alternative mechanisms — changes in technological competition and adjustments in patenting strategies — lack comparable empirical support. The finding is consistent with Hegde, Herkenhoff, and Zhu (2023), who document that permanent increases in knowledge diffusion speed permanently raise follow-on innovation rates.

Q: What are the policy implications of jointly considering anticipation and knowledge spillovers? A: Standard patent term analyses that abstract from anticipation effects and knowledge spillovers may substantially mischaracterize full welfare implications. The paper shows that innovation-policy interventions shape both short- and long-run outcomes, and that near-term variation in innovative activity can itself drive medium- to long-term effects through technological externalities. The estimated semi-elasticities of news, direct, and indirect effects provide empirical calibration targets for normative endogenous growth models used to derive optimal patent term, complementing prior normative recommendations ranging from zero protection (Boldrin and Levine, 2013) to infinite protection (Gilbert and Shapiro, 1990).

Effective patent term: The duration of legally enforceable monopoly granted by a patent, equal to 17 years after grant under the pre-TRIPs U.S. regime and 20 years after application minus the pending period under the post-TRIPs regime. Because enforcement begins only at grant, the pending period directly erodes effective protection.

Patent term reduction probability (PL_j): The field-specific fraction of pre-TRIPs patents with a pending period exceeding three years, representing the probability that individual patent applications in that field obtain a net reduction in patent term under the new 20-years-from-filing rule.

News effect: The incremental change in innovation or R&D at the time of policy announcement, induced by future anticipated changes in patent term, before the new policy enters into force. In this paper’s setting, the news effect is positive: higher reduction probability accelerates patenting as innovators rush to file under the favorable existing regime.

Direct implementation effect: The component of the post-implementation change in innovation attributable to the patent term change itself, isolated by controlling for field-specific innovation histories (i.e., abstracting from the indirect effects of anticipation-induced knowledge accumulation). It is negative for shorter patent term extensions, with a semi-elasticity of 1.7% per one-month increase.

Technology disclosure externality: The mechanism by which newly patented knowledge, disclosed through the patent system, enables subsequent inventors to build on prior innovations, generating follow-on inventive activity. In this paper, the transitory news-phase burst in patenting generates a persistent externality, particularly in fields with high backward citation intensity.

Policy anticipation: The phenomenon whereby forward-looking agents adjust behavior in response to credible news about future policy changes before those changes take effect. In this paper, anticipation induces a pre-implementation acceleration in patenting that temporarily pushes innovation in the opposite direction from the direct long-run effect and generates persistent indirect post-implementation effects through knowledge spillovers.

Pending period: The time between patent application and grant during which USPTO examines the application and during which full monopoly rights are not enforceable. Field-level heterogeneity in pending periods — arising from differences in examination complexity and USPTO unit congestion — is the source of cross-sectional identification in the DiD design.

Patents, News, and Business Cycles

This paper constructs an instrumental variable for technology news shocks using patent applications, relaxing all identifying assumptions traditionally used in the news-shock literature. The IV is the component of patent applications orthogonal to pre-existing beliefs (Survey of Professional Forecasters), contemporaneous and lagged monetary and fiscal policy changes (narrative accounts), and own lags. The instrument recovers news shocks that have no effect on aggregate productivity in the short run but are a significant driver of its trend component. The shock prompts a broad-based expansion in anticipation of the future TFP increase—output, consumption, and investment all rise well before any material increase in TFP is recorded. Despite these positive conditional co-movements, the news shock accounts for only a modest share of macroeconomic fluctuations at business cycle frequencies. Financial markets price in news shocks on impact, while most macro aggregates respond with some delay. Previously circulated as “When Creativity Strikes: News Shocks and Business Cycle Fluctuations.”

Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.

In depth

Q1. What is the identification strategy and why does it relax traditional assumptions?

The paper constructs an IV for technology news shocks as the component of patent applications orthogonal to pre-existing beliefs (SPF), narrative accounts of monetary and fiscal policy, and own lags—the sole identifying assumption is that no structural disturbance other than contemporaneous technology news affects the U.S. economy through this IV. Traditional identification requires combining zero restrictions on the impact response of TFP with assumptions about its long-run drivers (e.g., Beaudry-Portier 2006 assumes news shocks are the sole long-run driver of TFP). The patent-based IV avoids all of these assumptions, relying only on the exclusion restriction that patent applications, after controlling for expectations and policy, capture news about future technological change and nothing else.
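
The idea of taking "the component of patent applications orthogonal to" a set of controls can be illustrated with a plain least-squares residual. The sketch below is not the paper's procedure: it uses synthetic data, three placeholder control series standing in for forecasts, policy measures, and lagged applications, and a simple Gram-Schmidt projection in pure Python.

```python
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def residualize(y: Sequence[float], controls: Sequence[Sequence[float]]) -> List[float]:
    """Remove from y its linear projection on a constant and the control series."""
    n = len(y)
    basis: List[List[float]] = []
    # Orthogonalize the constant and each control against the earlier columns.
    for col in [[1.0] * n] + [list(c) for c in controls]:
        v = list(col)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip (near-)collinear columns
            basis.append(v)
    # Subtract the projection of y on each orthogonal basis vector.
    resid = list(y)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return resid


random.seed(0)
T = 200
# Synthetic placeholders, not real data.
controls = [[random.gauss(0, 1) for _ in range(T)] for _ in range(3)]
news = [random.gauss(0, 1) for _ in range(T)]
patents = [0.5 * controls[0][t] - 0.2 * controls[1][t] + 0.3 * controls[2][t] + news[t]
           for t in range(T)]
instrument = residualize(patents, controls)
```

By construction the resulting series is uncorrelated in sample with every control, which is the property the instrument relies on. Whether it also satisfies the exclusion restriction is an economic assumption that code cannot verify.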

Q2. How do patent applications contain information about future technology?

Patent applications contain information about potential future technological change because exclusive rights create a powerful incentive to apply as early as possible, so applications lead TFP improvements by years. The time between an application and the eventual diffusion of the innovation through the economy can be several years. The filing date serves as the first measurable time at which the news occurs, even though the underlying idea predates the application. Controlling for contemporaneous economic conditions removes the endogeneity of patent filings to current booms: the component of applications orthogonal to SPF forecasts and policy changes represents news about future technology that is not driven by current conditions.

Q3. What are the macroeconomic effects of technology news shocks?

Technology news shocks generate a broad-based expansion—output, consumption, and investment all rise well before any material increase in TFP is recorded—and financial markets price in news shocks on impact, while most macro aggregates respond with some delay. The positive conditional co-movements are consistent with optimism about future income and productivity generating pre-emptive expansion. Despite these theoretically attractive features, the news shock accounts for only a modest share of macroeconomic fluctuations at business cycle frequencies.

Taken together, the identified news shocks behave in line with the news-driven business cycle hypothesis in qualitative terms but contribute only modestly to aggregate volatility at business cycle frequencies, in contrast to models in which news shocks are a primary driver of cycles. The quantitative result is informative because the identification is instrument-based and does not impose the theoretical priors built into traditional sign-restriction and FEVD approaches, which lends it credibility as an estimate of the importance of news shocks.

Key concepts

Technology news shock: a shock that raises expectations about future aggregate TFP growth without any immediate change in current TFP; the paper’s IV identifies shocks that have no short-run effect on TFP but are a significant driver of its trend component.

Patent-based instrument: the component of patent applications orthogonal to pre-existing macroeconomic beliefs (SPF), contemporaneous and lagged monetary and fiscal policy changes (narrative accounts), and own lags; used as an IV for technology news shocks that avoids traditional identifying restrictions.

News-driven business cycle hypothesis: the proposition that economic fluctuations can arise from changes in agents’ expectations about future fundamentals (particularly future productivity) even absent any current change in those fundamentals; the paper finds qualitative support but only modest quantitative importance.

Peer Effects in Consideration and Preferences

This paper develops a general nonparametric model of discrete choice in which peers influence agents through two distinct channels: (1) the set of alternatives an agent considers (consideration set effects) and (2) the agent’s preferences over those alternatives (preference effects). The framework embeds these peer mechanisms in a continuous-time Markov process where agents revise choices at Poisson alarm-clock rates. A peer is classified as a consideration peer, a preference peer, or both, and the network is encoded as two directed edge sets rather than one.
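
The timing assumption can be pictured with a short simulation. The sketch below covers only the alarm-clock part of the framework: it draws the moments at which each agent gets to revise a choice, using the fact that waiting times of a Poisson process are exponentially distributed. The agent names, rates, and horizon are hypothetical, and the consideration and preference stages that would run at each ring are not modeled here.

```python
import random
from typing import Dict, List


def revision_times(rate: float, horizon: float, rng: random.Random) -> List[float]:
    """Draw the times at which one agent's Poisson alarm clock rings on [0, horizon]."""
    times: List[float] = []
    t = rng.expovariate(rate)  # exponential waiting time until the first ring
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)  # independent waiting time until the next ring
    return times


rng = random.Random(42)
# Hypothetical agents with different revision rates (rings per unit of time).
rates: Dict[str, float] = {"agent_1": 0.5, "agent_2": 2.0}
clocks = {name: revision_times(rate, 10.0, rng) for name, rate in rates.items()}
for name, times in clocks.items():
    print(name, len(times), "revision opportunities")
```

Because the clocks are continuous and independent, two agents almost surely never revise at the same instant, so each revision can be attributed to the configuration of peer choices observed just before it.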

The central identification challenge is recovering network structure, consideration probabilities, and preferences simultaneously, without relying on exogenous variation in covariates or the menu of available options. The paper shows this is achievable using time-series variation in the choices made by connected agents. The key insight is that consideration peers who adopt alternative v change the probability that the focal agent considers v — entering only the “consideration” term of the conditional choice probability (CCP) — while preference peers who adopt alternatives other than v change only the “conditional-on-consideration” selection probability. These cross-alternative patterns in the CCPs allow the researcher to distinguish the two channels. Once consideration-only peers are isolated, their choices serve as exclusion restrictions that mimic artificial menu variation, enabling nonparametric recovery of preferences.

Identification proceeds in stages: (i) recover the full reference group of each agent from changes in CCPs; (ii) separate consideration-only peers from preference-affecting peers using cross-order effects across alternatives; (iii) distinguish preference-only peers from consideration-and-preference peers under an exclusion restriction (Assumption 4) requiring that an agent with a dual-channel peer also has at least one single-channel peer; (iv) recover consideration ratios Q(v|n+1)/Q(v|n) and then the full choice rule. The results allow arbitrary heterogeneity across agents and do not require exogenous menu variation or covariate shifters.

For continuous-time data (Dataset 1), the CCPs and Poisson rates are exactly identified from the observed revision history. For discrete-time panel data (Dataset 2), identification is generic under a mild eigenvalue condition on the transition rate matrix.

The empirical application studies store-opening decisions by China’s two dominant high-end tea chains — Heytea and Nayuki — across prefecture-level cities from their founding through end-2020. By that date, Nayuki had 485 stores in 57 cities and Heytea had 729 stores in 46 cities, in an industry whose total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020. Each firm-market pair is modeled as an agent deciding whether to open a new store. The key exclusion restriction is that the cumulative store count of either firm in geographically neighboring markets shifts consideration probabilities but does not enter marginal profitability directly.

Estimation via maximum likelihood yields four substantive findings: (1) Firms exhibit limited consideration — consideration probabilities for markets with no prior presence by either firm are substantially below one. (2) Stores in neighboring markets significantly raise consideration probabilities for a given market, for both own-firm and rival stores; this peer effect in consideration is described as economically large. (3) Own-market store density raises marginal profitability (density economies) while rival presence lowers it (competitive effects). (4) A full-consideration model that omits the attention stage overestimates the negative competitive effect and underestimates positive density effects.

Counterfactual simulations show that removing attention constraints (full consideration) accelerates market penetration substantially: firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration only — while retaining attention constraints — slows the diffusion of store openings across neighboring markets, because peer effects in consideration function as an informational cascade. Limited consideration also reduces competition by delaying rival entry into high-profitability markets, explaining a significant share of the geographic concentration in first- and second-tier cities during the early expansion phase. The paper’s scope is limited to settings with repeated, non-durable choices; it does not model forward-looking behavior or multiple equilibria, which the authors note as directions for future research.

Q: What are the two peer-effect channels in the model, and how do they differ structurally? A: A consideration peer influences whether an alternative enters the agent’s consideration set — specifically, the probability Q_a(v | n) that alternative v is considered is a function of the number n of consideration peers currently adopting v. A preference peer influences the choice rule R_a(v | y, C) — the probability that v is selected conditional on it being in the consideration set. Importantly, the paper models the two channels as affecting logically separate stages of the decision process, so the observed CCP factors into a consideration term and a conditional-selection term that respond to distinct sets of peers.

Q: Why does the standard identification approach of varying menus fail here, and how does the paper substitute for it? A: Menu variation requires the researcher to observe the same agent facing different sets of available alternatives, which is unavailable in many empirical settings. The paper replaces exogenous menu variation with endogenous variation generated by consideration-only peers: when a consideration-only peer adopts alternative v, the focal agent’s probability of considering v rises, effectively mimicking the removal of other alternatives from her consideration set. This peer-induced variation in consideration is then used to trace out the choice rule R_a over counterfactual menus without any actual menu changes.

Q: How does the paper separate consideration peers from preference peers in the data? A: The decomposition exploits an asymmetry in how the two peer types appear in the log-CCP. When a consideration peer switches to alternative v, the term ln Q_a(v | .) changes but the conditional-selection term ln D_a(v | .) remains unchanged, because the agent already considers v. Conversely, when a preference peer adopts an alternative other than v, only the conditional-selection term shifts. The paper formalizes this via cross-order effects of peers across alternatives in the CCPs (Propositions 3.1–3.3) and invokes Assumption 4 — requiring at least one single-channel peer when a dual-channel peer exists — to complete the separation.
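In symbols, the asymmetry described in this answer can be written as below. This is a sketch in the summary's notation; the difference operators (the change in a term when a given peer switches) are introduced here for illustration and are not taken from the paper.

```latex
\ln P_a(v \mid y) = \ln Q_a(v \mid \cdot) + \ln D_a(v \mid \cdot)

% a consideration-only peer a' switches to v:
\Delta_{a'} \ln P_a(v \mid y) = \Delta_{a'} \ln Q_a(v \mid \cdot),
\qquad \Delta_{a'} \ln D_a(v \mid \cdot) = 0

% a preference-only peer a'' adopts some v' \neq v:
\Delta_{a''} \ln P_a(v \mid y) = \Delta_{a''} \ln D_a(v \mid \cdot),
\qquad \Delta_{a''} \ln Q_a(v \mid \cdot) = 0
```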

Q: What is Assumption 4 and why is it necessary? A: Assumption 4 states that if agent a has a peer in N_CR_a (a peer affecting both consideration and preferences), then a also has at least one additional peer affecting only consideration or only preferences. Without this exclusion restriction, the consideration and preference effects of a dual-channel peer are not separately identified from each other; the single-channel peer provides the variation needed to pin down each component separately.

Q: What does Proposition 2.1 establish and what does it require? A: Proposition 2.1 establishes existence and uniqueness of an invariant equilibrium distribution mu over choice configurations, with full support. It requires Assumptions 1 (independent consideration), 2(i) (strictly positive consideration probability for every alternative), and 3(i) (strictly positive probability of selecting any non-default alternative from some reachable consideration set). The continuous-time Poisson structure ensures zero probability of simultaneous revisions, which rules out multiple equilibria in the data-generating process.

Q: How does the paper handle discrete-time panel data, where only periodic snapshots of choices are observed? A: The paper invokes results from Blevins (2017, 2026) to show that the transition rate matrix W of the continuous-time process is generically identified from the discrete-time transition matrix observed at interval Delta, provided no two eigenvalues of W differ by an integer multiple of 2πi/Delta. Once W is identified, the CCPs P and Poisson rates lambda_a are recovered. This result is described as generic, meaning it holds except on a measure-zero set of parameter values.
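The idea of recovering continuous-time rates from snapshot data can be illustrated in the simplest possible case, a two-state chain, where the mapping has a closed form. This is an illustrative special case and not the paper's procedure: with two states the eigenvalues of W are real (0 and −(a+b)), so the eigenvalue spacing condition holds automatically, and the general case would require a matrix logarithm. The function names and numbers are assumptions.

```python
import math

def two_state_transition(a, b, delta):
    """Snapshot transition probabilities (p01, p10) of a two-state chain with
    continuous-time rates a (0 -> 1) and b (1 -> 0), observed every delta."""
    s = a + b
    decay = math.exp(-s * delta)
    return a / s * (1 - decay), b / s * (1 - decay)

def recover_rates(p01, p10, delta):
    """Invert the mapping above: recover (a, b) from snapshot probabilities."""
    move = p01 + p10
    if not 0 < move < 1:
        raise ValueError("need 0 < p01 + p10 < 1 for a real-valued solution")
    s = -math.log(1 - move) / delta   # total rate a + b
    return s * p01 / move, s * p10 / move

# Round trip with made-up rates and a snapshot interval of 2 time units.
p01, p10 = two_state_transition(0.3, 0.7, delta=2.0)
a_hat, b_hat = recover_rates(p01, p10, delta=2.0)
```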

Q: What data does the empirical application use, and what are the key sample statistics? A: The application uses city-level store registration data sourced from the National Enterprise Credit Information Publicity System (via CnOpenData, 2021), supplemented by regional statistics from the China City Statistical Yearbook (2016–2021). The sample ends in 2020 to avoid COVID-19 demand shifts. By end-2020, Nayuki had 485 stores across 57 cities and Heytea had 729 stores across 46 cities. The high-end tea industry’s total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020.

Q: What is the key exclusion restriction in the empirical specification, and why is it plausible? A: Stores in geographically neighboring markets (parameterized by distance bins d(m,m’)) enter the attention index pi_tilde but are excluded from the marginal profit index pi_bar. The rationale is that nearby store counts are informative signals that draw managerial attention to a market (an informational spillover) but do not directly alter the profitability of operating in that market — profitability depends on local demand, competition within the market, and own firm density, not on activity in adjacent markets. This restriction identifies the consideration-only peer channel.

Q: What does the paper find about biases from ignoring limited consideration? A: When the two-stage model (consideration + choice) is replaced by a single-stage full-consideration model, the estimated payoff parameters differ substantially. Specifically, the full-consideration model overestimates the negative effect of competition (rival presence in the same market) and underestimates the positive effect of own-store density. The intuition is that correlated entry patterns driven by shared consideration spillovers are misattributed to payoff interactions when the consideration stage is omitted.

Q: What do the counterfactual simulations show about the role of limited consideration in market dynamics? A: Three counterfactuals are compared against the baseline. Under full consideration (no attention constraints), market penetration is substantially faster — firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration while retaining attention constraints slows geographic diffusion because the informational cascade that propagates entry to neighboring markets is eliminated. Limited consideration also reduces competition by delaying rival entry into high-profitability markets; markets with high potential demand remain underserved for longer. Collectively, limited consideration explains a significant portion of the geographic concentration of tea chain stores in first- and second-tier cities during the early expansion period.

Q: What forms of heterogeneity does the identification allow, and what does it not require? A: The nonparametric identification results accommodate arbitrary heterogeneity across agents in consideration mechanisms Q_a, choice rules R_a, Poisson revision rates lambda_a, and network positions. The identification requires neither exogenous covariates that shift preferences or consideration, nor variation in the set of available alternatives across observations. It relies solely on time-series variation in the choices made by connected agents, which are endogenous to the model and are themselves identified in the first stage.

Q: How does the paper model history dependence, and does it change the main identification results? A: Section 4.1 extends the model to allow consideration probabilities and choice rules to depend on the agent’s own choice history h_t in addition to the current configuration y. Proposition 4.1 states that under Assumptions 1–4 applied conditional on both y_{at} and h_t, all identification propositions from Section 3.1 remain valid. The extension also allows consideration probabilities to equal one, enabling nontrivial dynamics in consideration sets driven by past choices.

Q: How is the unobservable default handled in the empirical application? A: When the default alternative (e.g., “do not open a store”) is unobserved, the Poisson revision rate lambda_a cannot be separately identified from the CCPs without normalization. The paper normalizes lambda_a = 1 for each agent in the empirical application, treating the revision opportunity rate as fixed and recovering all remaining primitives under this normalization.

Consideration set: The subset C of the full menu Y that agent a actually attends to at the moment of revision; formed before the choice rule is applied. Alternative v enters C independently with probability Q_a(v | n), where n is the number of consideration peers currently adopting v. The default alternative is always in the consideration set.

Conditional choice probability (CCP): P_a(v | y), the ex-ante probability that agent a selects alternative v given choice configuration y; equal to the product of the consideration probability Q_a(v | .) and the conditional-selection probability D_a(v | .), integrated over all possible consideration sets.
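A CCP of this kind can be computed by brute force for a small menu. The sketch below assumes independent consideration (as in Assumption 1) with the default always considered, and it swaps in a logit choice rule purely for illustration; the paper's choice rule R_a is nonparametric. Here `consider_prob[v]` plays the role of Q_a(v | n) evaluated at the current peer count, and all numbers are made up.

```python
from itertools import combinations
import math

def ccp(consider_prob, utility, default="default"):
    """Choice probabilities under independent consideration and a logit rule.

    consider_prob: dict alternative -> probability it enters the consideration set
    utility:       dict alternative -> utility, including the default
    """
    alts = list(consider_prob)
    probs = {v: 0.0 for v in utility}
    for r in range(len(alts) + 1):
        for subset in combinations(alts, r):
            # Probability of this exact consideration set under independence.
            weight = 1.0
            for v in alts:
                weight *= consider_prob[v] if v in subset else 1 - consider_prob[v]
            considered = (default,) + subset
            denom = sum(math.exp(utility[v]) for v in considered)
            # Logit selection among considered alternatives (illustrative assumption).
            for v in considered:
                probs[v] += weight * math.exp(utility[v]) / denom
    return probs

u = {"default": 0.0, "x": 1.0, "y": 0.5}
limited = ccp({"x": 0.5, "y": 0.9}, u)
full = ccp({"x": 1.0, "y": 1.0}, u)
```

Setting every consideration probability to one collapses the calculation to the ordinary full-consideration logit, which is the benchmark the empirical section compares against.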

Choice configuration: The vector y = (y_a)_{a in A} recording the current alternative selected by every agent in the network simultaneously; the state variable of the continuous-time Markov process.

Consideration-only peer: A peer a’ in N_C_a \ N_R_a whose choices enter the consideration probability Q_a but not the choice rule R_a. Variation in the choices of consideration-only peers serves as an exclusion restriction that mimics artificial menu variation for identifying preferences.

Preference-only peer: A peer a’ in N_R_a \ N_C_a whose choices enter the choice rule R_a but not the consideration probability Q_a.

Cross-order peer effect: The pattern in the CCP by which a consideration peer’s adoption of alternative v changes ln P_a(v | .) but not the conditional-selection component, while a preference peer’s adoption of a different alternative v’ changes the conditional-selection component but not the consideration component; this asymmetry is the key to separating the two channels.

Limited consideration: The situation in which Q_a(v | n) is strictly less than one for at least some alternatives v and peer counts n, so that the agent does not evaluate all available options before choosing; distinct from full consideration, in which all alternatives are always considered.

Mean attention index (pi_tilde): The latent index governing the consideration probability in the empirical specification. It depends on own and rival store counts in the same and neighboring markets and on firm fixed effects. The neighboring-market store counts that enter pi_tilde are excluded from the marginal profit index pi_bar; this is the empirical exclusion restriction that separates the consideration and payoff channels.

Robot adoption and inflation dynamics

Research Question

Basso and Rachedi investigate how robot adoption influences inflation dynamics — specifically, whether the surge in automation during the 2000s and 2010s can explain the muted sensitivity of inflation to unemployment (the “flat Phillips curve”) observed in advanced economies prior to the Covid pandemic, and whether the same framework can account for the subsequent resurgence of steep inflation-unemployment co-movement.

Data and Methodology

The empirical analysis uses an annual panel covering 384 U.S. metropolitan statistical areas (MSAs) from 2008 to 2018. The dependent variables are non-tradable goods inflation (log-difference of services prices excluding rents and utilities, from BEA regional price parities) and wage inflation (log-difference of average compensation per job). Robot adoption at the MSA-year level is constructed following Acemoglu and Restrepo (2020a): industry-level robots per employee at the U.S. national level are weighted by industry employment shares in each MSA, yielding an MSA-year robot-per-employee ratio.
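The construction of the MSA-level robot measure can be sketched as an employment-share-weighted average of national industry ratios. This is a minimal sketch of the shift-share logic described above; the function name, industries, and numbers are hypothetical.

```python
def msa_robot_exposure(national_robots_per_worker, msa_employment):
    """Shift-share robot exposure for one MSA-year.

    national_robots_per_worker: dict industry -> national robots per employee
    msa_employment:             dict industry -> MSA employment in that industry
    Returns the employment-share-weighted average of the national ratios.
    """
    total = sum(msa_employment.values())
    return sum(
        (emp / total) * national_robots_per_worker[ind]
        for ind, emp in msa_employment.items()
    )

# Made-up example: an MSA with 20% of jobs in autos, 30% in food, 50% in services.
national = {"autos": 0.010, "food": 0.002, "services": 0.0}
exposure = msa_robot_exposure(national, {"autos": 200, "food": 300, "services": 500})
```

In this toy case the exposure is 0.2 × 0.010 + 0.3 × 0.002 = 0.0026 robots per employee; variation across MSAs comes only from differences in industry mix.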

The regression specification extends Hazell et al. (2022) by adding an interaction term between the lagged unemployment rate and the (demeaned) robot-per-employee ratio, along with MSA and year fixed effects. Year fixed effects absorb common inflation expectations and the endogenous response of monetary policy to aggregate demand shocks. To address endogeneity, unemployment is instrumented with a Bartik shift-share variable of tradable demand spillovers, and robot adoption is instrumented with average industry-level robot penetration in the five largest European economies — under the identifying assumption that robot demand shocks are weakly correlated across advanced countries.

The theoretical framework is a New Keynesian model augmented with (i) directed search frictions in the labor market, and (ii) producer-level automation decisions in the spirit of Acemoglu and Restrepo (2020a). Producers pay a fixed entry cost, draw idiosyncratic efficiency for employing labor, and then choose between a robot technology (certain output at low efficiency) and a labor technology (uncertain hiring, higher potential efficiency). This generates an automation threshold: low-efficiency producers install robots, displacing low-wage jobs. A Taylor rule closes the model. Quantitative exercises compare two steady states calibrated to robot-per-employee ratios of 0.2% (low automation, targeting the U.S. in the early 2000s) and 0.6% (high automation, calibrated to one standard deviation of robot penetration variation across MSAs).

Main Findings

Empirical. In the baseline IV regression, a one standard deviation increase in robot adoption reduces the sensitivity of price inflation to unemployment by 17%, and the sensitivity of wage inflation to unemployment by 9%, relative to an MSA with the average robot penetration. The larger flattening effect on price inflation than on wage inflation implies that robot adoption also diminishes the pass-through from wages to prices. All three effects are statistically significant at the 5% level, and are robust to controls for demographic structure (age composition, gender/race/education participation rates, MPC heterogeneity), occupational structure (abstract, routine, manual, and offshorable occupations), and import competition exposure (Chinese and Mexican import shares).

Model quantification. Comparing the high-automation to the low-automation steady state, the model generates a 14% reduction in the slope of the price Phillips curve and a 13% reduction in the slope of the wage Phillips curve, conditional on the same-sized demand shocks in both economies. The price Phillips curve result accounts for 82% of the empirical estimate (17%). The model overstates the flattening of the wage Phillips curve (13% vs. 9% in the data), and therefore understates the reduction in the wage-to-price pass-through.
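The comparison between model and data is simple arithmetic on the percentages quoted above. A minimal sketch, with variable names chosen for illustration:

```python
# Percentage flattening of each Phillips curve slope, as quoted in the text.
model = {"price": 14.0, "wage": 13.0}
data = {"price": 17.0, "wage": 9.0}

# Ratio of model-implied to estimated flattening for each curve.
ratio = {k: model[k] / data[k] for k in model}

# Share of the empirical price estimate matched by the model, in percent.
price_share = round(100 * ratio["price"])
```

A ratio below one for prices and above one for wages is what the text means by the model matching most of the price result while overstating the wage result.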

Mechanisms. Automation flattens the Phillips curve through two primary channels. First, the outside option of automating production reduces workers’ bargaining power and dampens the elasticity of wages to unemployment (the “Wage Setting Effect”). Second, a higher share of robot firms reduces the aggregate labor share, muting the pass-through from wages into prices (the “Steady State Effect”). A third channel, in which firms cyclically substitute machines for workers in response to a shock (the “Cyclical Effect”), operates during the transition, but the Wage Setting Effect accounts for the bulk of the flattening.

Non-linearity and the post-Covid resurgence. When robot production is subject to convex adjustment costs, the threat of automation that underlies the Wage Setting Effect becomes inoperative during large expansionary shocks. When investment in machines surges, the marginal cost of producing robots rises sharply, raising the price of machines and pushing the automation threshold downward, so that more firms must use labor. Workers then negotiate higher wages, which pass into prices. Conditional on small demand shocks, the high-automation economy still exhibits a flatter Phillips curve than the low-automation economy. Conditional on large demand shocks (simulated as a 2 percentage point drop in unemployment), there is no difference in the inflation response between the low- and high-automation economies, so the Phillips curve reverts to steep.

Q&A

Q1: What is the exact empirical specification and how does it map to a structural object?

The regression is: non-tradable goods inflation = β × lagged unemployment + γ × (lagged unemployment × demeaned robot adoption) + ζ × lagged robot adoption + χ × relative non-tradable price + MSA fixed effects + year fixed effects + error. In a multi-region model without automation, Hazell et al. (2022) show that the coefficient β identifies the aggregate slope of the Phillips curve because year fixed effects absorb both common inflation expectations and the endogenous monetary policy response to aggregate demand shocks. Adding the interaction term extends this logic: γ identifies how robot adoption causally shifts the slope of the local Phillips curve, which maps into changes in the aggregate slope.
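Written out, the specification reads roughly as follows. The symbols are introduced here for illustration: the summary does not give the paper's notation, and dating the robot term in the interaction at t−1 and the relative price at t is an assumption. κ_m and θ_t denote the MSA and year fixed effects.

```latex
\pi^{N}_{m,t} = \beta\, u_{m,t-1}
  + \gamma \left( u_{m,t-1} \times \tilde{R}_{m,t-1} \right)
  + \zeta\, R_{m,t-1}
  + \chi\, p^{N}_{m,t}
  + \kappa_m + \theta_t + \varepsilon_{m,t}
```

With robot adoption demeaned in the interaction, β is the slope for an MSA at average robot penetration, and β + γ × (deviation from the mean) is the slope elsewhere; since β is negative and γ positive, higher adoption moves the slope toward zero.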

Q2: What are the first-stage instruments and why are they valid?

Unemployment is instrumented with local tradable demand spillovers: a Bartik variable that weights national industry value-added growth (excluding each MSA’s own contribution) by each MSA’s average industry value-added shares, so that national supply disturbances uncorrelated with MSA-level heterogeneity generate plausibly exogenous unemployment variation. Robot adoption is instrumented with the implied robot-per-employee ratio obtained by replacing U.S. industry robot installations with the average across the five largest European economies, weighted by U.S. industry employment shares; this isolates the supply-side efficiency improvements in robot technology that drove global adoption, conditional on robot demand shocks being weakly correlated across countries. The correlation between the two instruments in the sample is 0.2, indicating that they do not strongly co-move.
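The leave-one-out shift-share construction for the unemployment instrument can be sketched as below. This is one plausible reading of "excluding each MSA's own contribution" (dropping the MSA's value added from the national totals before computing growth); the function name, data layout, and numbers are hypothetical, and the paper restricts attention to tradable industries.

```python
def leave_one_out_bartik(va_prev, va_curr, shares, msa):
    """Shift-share demand shifter for `msa`.

    va_prev, va_curr: dict msa -> dict industry -> value added in t-1 and t
    shares:           dict industry -> this MSA's average value-added share
    National industry growth is computed excluding the MSA's own value added.
    """
    shock = 0.0
    for ind, share in shares.items():
        prev = sum(v[ind] for m, v in va_prev.items() if m != msa)
        curr = sum(v[ind] for m, v in va_curr.items() if m != msa)
        shock += share * (curr - prev) / prev
    return shock

# Made-up data: MSA "A" booms in manufacturing, but its own growth is excluded.
va_prev = {"A": {"mfg": 100, "tech": 50}, "B": {"mfg": 200, "tech": 100}, "C": {"mfg": 300, "tech": 100}}
va_curr = {"A": {"mfg": 150, "tech": 50}, "B": {"mfg": 210, "tech": 110}, "C": {"mfg": 315, "tech": 110}}
shock_a = leave_one_out_bartik(va_prev, va_curr, {"mfg": 0.6, "tech": 0.4}, "A")
```

Here the rest-of-country growth rates are 5% in manufacturing and 10% in tech, so the instrument for A is 0.6 × 0.05 + 0.4 × 0.10 = 0.07, unaffected by A's own 50% manufacturing growth.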

Q3: What are the point estimates and their magnitudes in the baseline IV regression?

For price inflation (Panel A, Column 4), the base sensitivity β = −0.5069 (SE 0.1381, significant at 1%), and the interaction coefficient γ = 0.0066 (SE 0.0030, significant at 5%). For wage inflation (Panel B, Column 4), β = −0.9580 (SE 0.2450, significant at 1%), and γ = 0.0049 (SE 0.0024, significant at 5%). A one standard deviation increase in robot adoption reduces price inflation sensitivity by 17% and wage inflation sensitivity by 9% relative to the average-automation MSA.

Q4: What does the difference in flattening magnitudes (17% for prices vs. 9% for wages) imply about the wage-price pass-through?

Because automation reduces the price Phillips curve slope by proportionally more than the wage Phillips curve slope, each percentage-point change in wages translates into a smaller percentage-point change in prices in higher-automation areas. This indicates that robot adoption diminishes the influence of wage changes on price changes — i.e., it reduces the wage-to-price pass-through. In the model, this operates through the Steady State Effect: a larger share of production carried out by robot firms means that a given change in average wages applies to a smaller portion of total marginal costs, weakening the price response.

Q5: How is the automation threshold determined in the theoretical model, and what economic forces govern it?

A producer opts for the labor technology if and only if the expected value of a labor firm (= job-filling probability × (producer price × labor efficiency − posted wage) − entry cost) exceeds the value of a robot firm (= producer price × robot efficiency − machine price − entry cost). Since the value of a labor firm increases in labor efficiency, there is a unique cut-off efficiency level γ* at which a producer is indifferent. Producers with labor efficiency above γ* post vacancies; those below γ* install robots. The cut-off rises (more automation) when wages rise relative to machine prices, and falls (less automation) when machine prices rise due to costly robot production during large expansionary shocks.
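The indifference condition can be made concrete with a small numerical sketch. It combines the two value expressions in this answer with the posted-wage rule stated in Q6 (wage = producer price × efficiency × (1 − η)) and treats the job-filling probability q as a common constant; both are simplifying assumptions made here, since under directed search q would vary across sub-markets. All parameter values are made up.

```python
def labor_value(gamma, q, price, eta, entry_cost):
    """Expected value of a labor firm with efficiency gamma, using the
    posted wage price * gamma * (1 - eta)."""
    wage = price * gamma * (1 - eta)
    return q * (price * gamma - wage) - entry_cost

def robot_value(price, robot_eff, machine_price, entry_cost):
    """Value of a robot firm: certain output net of the machine price."""
    return price * robot_eff - machine_price - entry_cost

def automation_cutoff(q, price, eta, robot_eff, machine_price):
    """Efficiency at which the two values are equal (the entry cost cancels)."""
    return (price * robot_eff - machine_price) / (q * price * eta)

params = dict(q=0.8, price=1.0, eta=0.5)
cutoff = automation_cutoff(robot_eff=0.5, machine_price=0.1, **params)
cutoff_costly = automation_cutoff(robot_eff=0.5, machine_price=0.2, **params)
```

In this toy parameterization the cut-off is 1.0 and falls to 0.75 when the machine price doubles, matching the direction described above: costlier machines mean fewer producers automate.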

Q6: How does the wage-posting equilibrium under directed search generate the Wage Setting Effect of automation?

Under directed search, each labor firm posts a wage to maximize its expected value, and workers sort into sub-markets offering higher wages but lower job-finding probabilities. The equilibrium posted wage for a firm with labor efficiency γ_j is W_{γ_j,t} = P_{P,t} × γ_j × (1 − η), where η is the elasticity of matches to vacancies. The option to install a robot, available at any time, limits how much any individual firm needs to offer workers. When automation increases, the outside option becomes more attractive to more firms, which constrains wage offers industry-wide, reducing the elasticity of average wages to unemployment fluctuations.

Q7: How is the slope of the price Phillips curve characterized analytically?

Log-linearizing the model around the steady state and substituting labor market and wholesaler equilibrium conditions into the pricing equation yields: inflation = −[(ε−1)/φ] × Ψ(γ*; Θ) × unemployment gap + β × expected future inflation, where Ψ(γ*; Θ) is a function of the automation cut-off γ*, the elasticity of substitution ε, the matching function elasticity η, the efficiency bounds γM and γH, and the distribution shape parameter α. In contrast to standard New Keynesian models where the slope depends only on markup and nominal rigidity parameters, this expression depends directly on the degree of automation through the steady-state threshold γ*.
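Typeset, the expression in this answer reads as follows. The time subscripts, the hat on the unemployment gap, and the expectation operator are notation chosen here; the β multiplying expected inflation is not the regression coefficient β from Q1.

```latex
\pi_t = -\,\frac{\varepsilon - 1}{\phi}\;\Psi(\gamma^{*};\Theta)\;\hat{u}_t
        + \beta\,\mathbb{E}_t\,\pi_{t+1}
```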

Q8: Across different structural parameter configurations, does automation always flatten the Phillips curve?

Yes. Numerical analysis of the closed-form Phillips curve expression (Figure 1) shows that robot adoption unambiguously decreases the slope of the price Phillips curve across all combinations of the key structural parameters — the distribution shape parameter α, the matching elasticity η, the upper bound of labor efficiency γH, and the steady-state unemployment rate ū. The flattening effect is more pronounced when η is low, when α implies a larger fraction of low-efficiency producers, and when the steady-state unemployment rate is low.

Q9: How do the three mechanism channels (Cyclical, Wage Setting, Steady State) compare quantitatively?

The paper isolates channels by comparing alternative model specifications: (i) Baseline directed search with endogenous automation, (ii) Directed search with fixed automation (removing Cyclical and Wage Setting Effects, leaving only the Steady State Effect), (iii) Random search with τ = 0.5 (efficient bargaining, retaining both the Cyclical and Wage Setting Effects), (iv) Random search with τ = 0.01 (near-zero worker bargaining power, removing the Wage Setting Effect but retaining the Cyclical Effect). Figure 5 shows that the Steady State Effect alone accounts for only a small portion of the total inflation differential between low- and high-automation economies. The Wage Setting Effect — isolated by comparing τ = 0.01 and τ = 0.5 economies with endogenous automation — accounts for the bulk of the flattening. The Cyclical Effect (isolated by comparing fixed and endogenous automation with τ = 0.01) contributes an intermediate amount.

Q10: What is the quantitative exercise comparing low- and high-automation steady states?

The low-automation economy targets the U.S. robot-per-employee ratio of 0.2% in the early 2000s (Acemoglu and Restrepo, 2020a), calibrated with robot-specific technological change ζ = 2. The high-automation economy features a 200% higher robot-per-employee ratio of 0.6%, calibrated to replicate one standard deviation of cross-MSA dispersion in robot penetration in the data. Both economies are simulated with 10,000 realizations of preference shocks, and the slopes of the price and wage Phillips curves are estimated from simulated inflation and unemployment outcomes. The price Phillips curve flattens by 14% and the wage Phillips curve by 13% moving from low to high automation, conditional on the same-sized shock in both economies.

Q11: How does the model account for the Covid-era resurgence of high inflation despite high automation?

The paper extends the machine manufacturer’s production function to include an asymmetric convex adjustment cost that activates when investment deviates more than 5% from its steady-state level (parameterized with δ = 0.0015 and ϱ = 100). Under a small expansionary shock (0.25 percentage point decrease in unemployment), inflation rises less in the high-automation economy, consistent with a flat Phillips curve. Under a large expansionary shock (2 percentage point decrease in unemployment), the surge in robot investment triggers sharply rising machine prices, eliminating the automation outside option for marginal producers and fully restoring workers’ bargaining power — so the inflation response is identical in the low- and high-automation economies, consistent with a steep Phillips curve. The paper interprets this as a proof-of-concept consistent with post-Covid wage compression evidence for low-wage workers documented by Autor, Dube, and McGrew (2023).

Q12: What do the robustness checks establish regarding alternative explanations?

The interaction of unemployment and robot adoption remains statistically significant at the 5% level across all the robustness checks (Appendix A). These include controlling for: (i) demographic heterogeneity — shares of young (below 30) and old (above 60) individuals, female/Black/Asian labor market participation, low-education attainment shares, overall participation, and MSA-level average marginal propensity to consume (MPC); (ii) occupational structure — shares of abstract, routine, manual, and offshorable occupations; and (iii) import competition — MSA exposure to Chinese and Mexican import competition. The coefficient on the robot-unemployment interaction term is stable across specifications, with the magnitude remaining close to that in the baseline (approximately 0.0140 across all demographic robustness columns in Table A.1).

Key Concepts

Automation threshold (γ*): In the paper’s model, the level of idiosyncratic labor efficiency at which a producer is indifferent between installing a robot and posting a vacancy. Producers with labor efficiency below γ* choose the machine technology; those above choose the labor technology. The threshold is determined by the relative profitability of the two technologies, and it shifts endogenously with wages, machine prices, and job-filling probabilities. A higher γ* means more of the production sector is automated.

Wage Setting Effect of automation: The channel through which the existence of the outside option to install robots reduces workers’ bargaining power and dampens the elasticity of wages to unemployment fluctuations. Under directed search, firms’ ability to substitute machines for labor at a lower cost constrains the wage offers they need to post, so that a given decline in unemployment generates a smaller increase in average wages in higher-automation economies.

Steady State Effect of automation: The channel through which a larger steady-state fraction of robot firms reduces the aggregate labor share, so that even a given change in wages translates into a smaller change in aggregate marginal costs and prices. This channel operates even when automation cannot change upon a shock (fixed automation baseline).

Cyclical Effect of automation: The channel through which firms actively replace workers with machines in response to expansionary shocks that raise wages, generating an endogenous dampening of labor demand and putting downward pressure on the wage increase itself. This channel requires endogenous automation choices at business-cycle frequencies.

Robot-specific technological change (ζ): In the paper’s model, the parameter governing the efficiency with which machine manufacturers transform final goods into robots. A higher ζ reduces the relative price of machines (PM/P = 1/ζ), making automation more attractive to lower-efficiency producers and raising the automation threshold γ*. In quantitative exercises, variation in ζ across steady states drives differences in the degree of automation.

Price Phillips curve slope (Ψ): In the paper’s log-linearized model, the structural coefficient linking inflation to the unemployment gap. Unlike in standard New Keynesian models — where the slope depends only on the markup and nominal rigidity — Ψ is a function of the automation threshold γ*, the matching elasticity η, the efficiency distribution parameters (γM, γH, α), and the elasticity of substitution ε. Robot adoption shifts γ* and thereby changes Ψ.

Asymmetric investment adjustment cost: An extension of the machine manufacturer’s production function that imposes convex costs when robot investment exceeds its steady-state level by more than 5% (parameterized by δ and ϱ). This specification makes it increasingly costly to rapidly scale up automation in response to large demand shocks, causing the machine price to spike and the automation outside option to cease being effective for marginal producers, thereby restoring workers’ bargaining power and steepening the Phillips curve during large expansionary episodes.

Technology Transfer and Early Industrial Development: Evidence from the Sino-Soviet Alliance

This paper estimates the causal effect of technology and knowledge transfers on early industrial development using the Sino-Soviet Alliance of the 1950s as a natural experiment. Between 1950 and 1957, the Soviet Union supported the “156 Projects”: 139 approved civil projects for constructing technologically advanced, large-scale, capital-intensive industrial facilities in China. The intended program comprised two components: a “basic” transfer of Soviet state-of-the-art machinery and equipment (including blueprints, site surveys, and plant construction assistance), and an “advanced” know-how transfer involving Soviet experts residing in Chinese plants for roughly three years to train engineers and production supervisors in organizational, technological, and planning methods. Total investment amounted to approximately $80 billion in 2020 dollars (45.7% of Chinese GDP in 1949).

Identification exploits idiosyncratic delays in project completion caused by Soviet production capacity constraints, insufficient experts, translator shortages, and miscommunication — factors documented in historical records as unrelated to project-specific characteristics. When the Sino-Soviet Split in 1960 abruptly ended the program, all 139 plants had been built but differed in what transfers they had received: 46 received both machinery and know-how (advanced), 46 received only machinery (basic), and 47 received neither (comparison). The paper verifies, via ANOVA tests, multinomial logit models, balancing regressions on 26 plant characteristics, pre-trend tests, and Oster (2019) selection-on-unobservables bounds, that the three groups were statistically equivalent prior to receiving the Soviet transfers.

The primary data source is plant-level annual reports from the Steel Association covering 94 steel firms (1,410 plants) from 1949 to 2000, matched to 304 steel plants across the 156 Projects. Supplementary sources include the declassified 1985 Second Industrial Survey (7,592 largest Chinese firms) and the China Industrial Enterprises database (1998–2013, over 1 million firms).

Three main results emerge. First, receiving only the basic (machinery) transfer had positive but short-lived effects: output of basic plants peaked at 14.7 percent above comparison plants six years after receiving Soviet machinery, then declined monotonically and became statistically insignificant after 20 years, consistent with the estimated 15–20 year life cycle of Soviet capital. Second, the advanced transfer had large and persistent effects: advanced plants’ output rose 8.4 percent relative to basic plants within two years, 19.7 percent within 20 years, and 49.5 percent cumulatively after 40 years. TFPQ of advanced plants reached 47.9 percent above basic plants after 40 years. These magnitudes held across industries in 1985 and 1998–2013 data, where value added of advanced firms was 41.4–52.0 percent higher and TFPR 39.5–49.3 percent higher than basic firms. Third, the program generated horizontal spillovers (12.9 percent higher output, 12.4 percent higher productivity for steel plants in counties hosting advanced plants) and vertical spillovers (16.4 percent productivity gain for supply-chain firms in counties of advanced nonsteel plants); in private firms, these spillovers materialized only after the market liberalization of the late 1990s.

The mechanism driving persistence is the accumulation of organizational and human capital during the advanced transfer, which enabled advanced plants, and only them, to develop new production processes endogenously, home-fabricate continuous casting furnaces to replace obsolete Soviet open-hearth equipment, and produce export-quality steel. Advanced plants employed more engineers and high-skilled technicians and established professional schools; universities in their counties were 10.4 percent more likely to offer STEM degrees, and those counties had 16.8 percent more technical schools.

Scope conditions: results apply to large-scale, capital-intensive state-planned industrial facilities in a country at an early stage of industrialization, under conditions of near-complete trade isolation (1960–1978) that prevented basic plants from compensating via imported foreign capital. The estimated aggregate contribution of the program is that, without both transfer types, Chinese real GDP per capita growth between 1953 and 1978 would have been 7 to 19 percent lower.

Q: What distinguishes the “basic” from the “advanced” Soviet transfer? A: The basic transfer involved duplication of whole Soviet plants through provision of state-of-the-art Soviet machinery, equipment, blueprints, geological surveys, and construction assistance. The advanced transfer added visits of Soviet experts — expected to stay approximately three years — to teach Chinese technicians how to operate the machinery and to provide within-firm training in engineering (math, physics, chemistry, organizational and planning methods) and supervisory management based on “scientific management” principles including quality-control methods.

Q: What caused plants to receive different levels of transfer, and why is this variation credible for identification? A: Delays arose from Soviet production capacity constraints (by 1955, one-third of annual Soviet steel-rolling output was destined for China), insufficient experts, translator shortages, and bilateral miscommunication — all documented in historical records as unrelated to project characteristics. When the 1960 Split ended the program, plants’ treatment status was determined by where they happened to be in the delivery queue. ANOVA tests find no significant differences in approval year, investment, workforce, equipment value, project length, or capacity across the three groups, and a multinomial logit on province and industry fixed effects shows no group had higher ex-ante probability of receiving either transfer type.

Q: What were the output effects of the basic transfer, and why did they fade? A: Output of basic plants was not significantly above comparison plants for the first two years, peaked at 14.7 percent higher six years after receiving Soviet machinery, then declined monotonically and became statistically insignificant after 20 years. This timing corresponds to the estimated 15–20 year life cycle of Soviet capital goods. TFPQ of basic plants followed the same pattern, peaking at 14.5 percent above comparison plants. Without the know-how component, basic plants could not develop new processes or home-fabricate replacement capital, so productivity advantages disappeared as Soviet equipment became obsolete.

Q: What were the output and productivity effects of the advanced transfer? A: Advanced plants’ output rose 8.4 percent relative to basic plants within two years of the Soviet transfer and 19.7 percent within 20 years, reaching a cumulative effect of 49.5 percent after 40 years. TFPQ of advanced plants increased from 8.3 percent above basic plants two years after the transfer to 47.9 percent after 40 years. These effects were driven by output growth rather than differential input use: the quantities of labor, coke, and iron used were statistically indistinguishable across the three plant types, which rules out government input reallocation as an explanation.

Q: Did the advanced transfer affect steel quality? A: Advanced plants produced substantially more crude steel (higher quality, lower carbon content) and less pig iron than basic and comparison plants, and this quality advantage persisted well beyond the estimated 15–20 year life cycle of Soviet capital. Basic plants also shifted toward crude steel initially, but their quality advantage dissipated once Soviet machinery became obsolete, whereas advanced plants maintained the shift through adoption of the basic oxygen process and later continuous casting furnaces.

Q: What is the main mechanism through which the advanced transfer generated persistent effects? A: The advanced transfer equipped engineers and supervisors with organizational, technological, and planning knowledge, enabling advanced plants to develop and adopt the basic oxygen steelmaking process independently during China’s 1960–1978 period of trade isolation. Advanced plants had a 15.2 percent higher probability of using the basic oxygen process five years after the transfer and a 65.1 percent higher probability twenty years after, relative to basic plants. They also home-fabricated continuous casting furnaces, making them 26.7 to 78.4 percent more likely to use such furnaces 10 to 20 years after the transfer; basic plants showed no differential advantage over comparison plants on this measure.

Q: What role did trade openness play in the divergence between basic and advanced plants? A: Once China opened to international trade from 1978, advanced plants relied dramatically less on imported foreign capital than basic plants — likely because they had developed domestic production capabilities. At the same time, advanced plants exported 45.5 percent more steel and produced 51.1 percent more steel above international quality standards than basic plants. Basic plants showed no differential imports of foreign capital or differential exports relative to comparison plants, suggesting that once both types could access foreign machinery, basic plants lost any remaining productivity edge.

Q: What were the human capital effects of the advanced transfer? A: Over time, advanced plants opened training schools for high-skilled technicians and offered within-firm training programs for engineers. As a result, advanced plants employed more engineers and high-skilled technicians and fewer low-skilled workers than basic plants, while the human capital composition did not differentially change between basic and comparison plants. At the county level, universities in counties hosting advanced plants were 10.4 percent more likely to offer STEM degrees, and those counties had 16.8 percent more technical schools, 14.3 percent more STEM college graduates, and 17.6 percent more high-skilled workers than counties hosting basic plants.

Q: Did the government differentially favor basic or advanced plants after the Split? A: The paper finds no evidence of special government favor. Government transfers and loans were not differentially allocated to basic or advanced plants in either the short or long run. Distance from railroads and roads did not change differentially across plant types. Measures of political connection and politician quality at the prefecture level showed no significant differences across the three groups in the 40 years after the Soviet transfer. County-level total investment and investments in related and unrelated industries were also statistically indistinguishable.

Q: What were the intra-firm spillover effects? A: Steel plants in the same firm as advanced plants increased their steel production by 24.9 percent and were 22.1 percent more productive relative to plants in the same firm as basic plants, after the Soviet transfer. Plants in the same firm as basic plants showed no differential performance relative to plants in the same firm as comparison plants. The within-firm spillovers appear driven by the transmission of new technologies and production methods through formal within-firm training programs, as supported by historical records.

Q: What were the horizontal spillover effects across firms? A: Steel plants in the same counties as advanced plants produced 12.9 percent higher output and were 12.4 percent more productive than those in counties hosting basic plants, after the transfer. They were more likely to adopt basic oxygen converters and continuous casting furnaces, and from 1978 they exported significantly more and produced more steel above international quality standards, mirroring the patterns of the advanced plants themselves.

Q: What were the vertical spillover effects? A: Steel plants in counties hosting nonsteel basic plants produced 14.2 percent more steel than those in counties hosting nonsteel comparison plants, suggesting some output spillover from basic machinery. However, only plants in counties of advanced nonsteel plants experienced a productivity increase — estimated at 16.4 percent — relative to plants in counties of basic nonsteel plants. These supply-chain firms were also the only ones to show increased adoption of basic oxygen and continuous casting furnace technology and differential engagement in trade.

Q: How did market liberalization reforms interact with the spillover effects? A: Starting in the late 1990s, privatized firms economically related to advanced plants outperformed their counterparts in terms of value added, TFPR, and exports, while state-owned firms in the same counties no longer showed a competitive advantage. New private firms locating in counties that had hosted advanced plants received an additional performance gain. At the county level, counties hosting advanced plants had on average 16.6 percent more private firms and 25.2 percent more privately-produced industrial output than counties hosting basic plants. The mechanism appears to be the stock of industry-specific human capital concentrated in those counties, which private firms could draw on once allowed to compete for workers.

Q: What is the estimated aggregate contribution of the Soviet transfer to Chinese growth? A: Province-level regressions show that each additional basic project increased province-level output by 1.1 percent per year on average, and each additional advanced project by 6.2 percent per year. A back-of-the-envelope calculation implies that without both transfer types, Chinese real GDP per capita growth between 1953 and 1978 would have been 7 to 19 percent lower.

Q: How does the paper rule out selection on unobservable characteristics? A: Using the Oster (2019) methodology, the paper finds that for the treatment effects to become statistically insignificant, selection on unobserved variables would need to be 8 to 19 times larger than selection on observed variables — a range the authors characterize as implausible given the strong balancing on observables and the historical documentation of delay causes.

Q: How does this paper differ from Heblich et al. (2020), which also studies Sino-Soviet technology transfer? A: Heblich et al. (2020) study long-run negative spillovers of the 156 Projects on counties that hosted them relative to counties that were geographically suitable but ultimately not selected, focusing on an outside-the-program comparison. This paper instead exploits within-program variation — differences across the three plant types — using plant-level data to assess short-, medium-, and long-run direct effects and spillover effects of different transfer intensities.

Basic Transfer: The provision of Soviet state-of-the-art machinery, equipment, blueprints, geological surveys, and plant construction assistance — duplicating a whole Soviet plant — without accompanying human capital or organizational training.

Advanced Transfer: The full Soviet technology and know-how package: basic machinery provision plus multi-year visits of Soviet experts who taught Chinese engineers and production supervisors organizational, technological, and planning methods based on “scientific management” principles.

Comparison Plants: Plants approved under the 156 Projects that received neither Soviet machinery nor technical assistance due to delays compounded by the Split, and which continued operating with traditional domestic technology.

156 Projects: A set of 139 approved, technologically advanced, large-scale, capital-intensive industrial facilities whose construction the Soviet Union agreed to support between 1950 and 1957 as part of the Sino-Soviet Alliance, with total investment equal to 45.7% of Chinese GDP in 1949.

Tacit Knowledge: Industry- and firm-specific knowledge embodied in workers and organizations — including operational methods, quality-control procedures, and process innovation capabilities — that cannot be transferred through capital goods alone and requires extensive on-the-job training from foreign experts.

Basic Oxygen Process: A steelmaking process innovation, predominant from the 1960s, that blows oxygen through molten pig iron to reduce its carbon content; adopted by advanced plants through endogenous process development, while basic plants showed no differential adoption relative to comparison plants.

Source Text Origin: The paper’s classification scheme for the grounding of evidence — in this case, full working paper text obtained from NBER WP 29455, enabling comprehensive summary of quantitative results, mechanisms, and robustness tests.

The Power of Proximity to Coworkers

This paper studies how physical proximity to coworkers affects on-the-job training and productivity, using software engineers at a Fortune 500 online retailer observed from 2019 to 2024. The authors exploit two quasi-experimental shocks to proximity: the office closures of 2020, which eliminated proximity differentials that previously existed across team types, and the firm’s subsequent return-to-office (RTO) mandates in 2022 and 2023, which restored proximity for co-located teams while leaving geographically-distributed teams apart. The core identification strategy is a difference-in-differences design comparing engineers whose teams were co-located in a single headquarters building to those whose teams were split across two buildings a ten-minute walk apart — a distinction that became immaterial once offices closed.

The central finding is that sitting near teammates substantially increases the digital feedback engineers receive on their code. Before the office closures, engineers on co-located teams received 23.9% (1.92 comments per program) more code review feedback than engineers on multi-building teams. Once offices closed, this advantage narrowed by 18.3% (1.47 comments per program, p-value = 0.0026). The lost comments were disproportionately those predicted by a machine-learning classifier to be helpful, actionable, well-reasoned, and impactful, with high-quality comments declining by 21–23% — exceeding the overall volume decline. Face-to-face and digital communication are complements, not substitutes: proximate engineers drew on a wider pool of reviewers and asked 48.4% more follow-up questions, a differential that vanished once offices closed.

Proximity’s effects are highly heterogeneous. Gains in feedback are concentrated among less-tenured, younger, and female engineers — those with the most to learn. Junior engineers on co-located teams lost 2.03 more comments per program upon office closure than junior engineers already on distributed teams (p-value = 0.001); young engineers lost 2.47 more comments (p-value = 0.0001). Female engineers lost 38.9% more comments than their distributed female counterparts (p-value < 0.0001), partly because women stop asking as many people for feedback when they cannot do so in person.

Proximity improves code quality for inexperienced engineers. Around the second RTO (three days per week), engineers on co-located teams became 2.2 percentage points less likely to add files subsequently deleted — a measure of churn — and 1.4 pp less likely to introduce bugs, relative to distributed teams (p-values of 0.041 and 0.022 respectively). These gains were roughly twice as large for less-tenured and younger engineers. The benefits persist: engineers who spent more pre-closure time on co-located teams continued to write higher-quality code during the fully remote period.

However, mentorship is costly for those who provide it. Senior engineers on co-located teams wrote 0.76 fewer programs per month in the main codebase before closures (p-value = 0.0005), a gap that closed when offices did and widened again during the second RTO. The firm faces a fundamental tradeoff: proximity accelerates junior engineers’ human capital development while reducing experienced engineers’ immediate coding output.

These dynamics shape hiring. The firm shifted toward hiring older, more experienced engineers during closures — buying talent it could no longer build in-house — and back toward younger hires once offices reopened. Nationally, young college graduates in remotable occupations (classified per Dingel and Neiman, 2020) experienced a 0.88 pp increase in unemployment between 2017–2019 and 2022–2024, while older graduates saw a marginal decline of 0.11 pp. A triple-difference estimate finds a 0.65 pp greater increase in young workers’ unemployment in remotable versus non-remotable occupations (p-value = 0.029), a pattern that predates generative AI diffusion and is robust to controlling for AI exposure. Back-of-the-envelope, remote work accounts for an estimated 64% of the total unemployment increase among young college graduates over this period.

The paper also documents that proximity is fragile: a ten-minute walk between two buildings reduces feedback as much as being multiple states away, and even a single distant teammate imposes negative externalities on those who remain co-located, reducing their feedback by 1.71 comments per program (p-value = 0.095) via a “one Zoom, all Zoom” norm.

Q: What is the main identification strategy for the office-closure analysis, and what is the key parallel-trends evidence?

A: The authors compare engineers on co-located teams (all members in one headquarters building) to those on multi-building teams (split across two buildings a ten-minute walk apart), before and after the March 2020 office closures. Co-located teams lost more proximity when offices closed, while multi-building teams experienced a smaller shock, enabling a difference-in-differences design. Pre-closure trends in feedback are parallel across the two team types (Figure I), supporting the identifying assumption. Standard errors are clustered by team, the unit of treatment assignment.
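The comparison described above can be sketched as a two-by-two difference-in-differences calculation. The four cell means below are hypothetical placeholders, chosen only so that the pre-closure gap (1.92 comments per program) and the resulting estimate (a 1.47-comment narrowing) match the magnitudes quoted in this summary. The paper's own estimates come from regressions with team-clustered standard errors, which this sketch does not attempt to reproduce.

```python
# Minimal 2x2 difference-in-differences sketch.
# All four cell means are HYPOTHETICAL; only the pre-closure gap (1.92)
# and the implied estimate (-1.47) are taken from the summary.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Mean comments per program (hypothetical levels).
co_located_pre = 9.95       # one-building teams, offices open
multi_building_pre = 8.03   # multi-building teams, offices open
co_located_post = 7.48      # one-building teams, offices closed
multi_building_post = 7.03  # multi-building teams, offices closed

pre_gap = co_located_pre - multi_building_pre
effect = did_estimate(co_located_pre, co_located_post,
                      multi_building_pre, multi_building_post)

print(f"pre-closure gap: {pre_gap:.2f} comments per program")
print(f"DiD estimate:    {effect:.2f} comments per program")
```

The control group's change nets out anything that hit both team types when offices closed, so only the differential loss of proximity remains in the estimate.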

Q: How large is the effect of proximity on total code review feedback, and how is it broken down by feedback source?

A: Before closure, co-located engineers received 23.9% (1.92 comments per program) more feedback than multi-building engineers. The DiD estimate indicates that losing proximity reduced feedback by 18.3% (1.47 comments per program, p-value = 0.0026, Column 3 of Table II). This decline stems entirely from reduced feedback from teammates; there is no detectable effect on feedback from engineers on other teams — a placebo check that supports the identification strategy and rules out explanations based on differential project complexity.
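As a rough consistency check, the paired percentage and level figures above can be used to back out the baseline they imply. The sketch assumes, as an interpretation rather than something stated in the summary, that both percentages are expressed relative to the same mean for multi-building engineers; `implied_baseline` is a hypothetical helper name.

```python
# Back out the baseline implied by each (level, percent) pair in the summary.
# ASSUMPTION: both percentages are relative to the same multi-building mean.

def implied_baseline(level_change, percent_change):
    """Baseline such that level_change / baseline == percent_change / 100."""
    return level_change / (percent_change / 100.0)

pre_gap_baseline = implied_baseline(1.92, 23.9)  # pre-closure gap
did_baseline = implied_baseline(1.47, 18.3)      # DiD estimate

print(f"baseline implied by pre-closure gap: {pre_gap_baseline:.2f}")
print(f"baseline implied by DiD estimate:    {did_baseline:.2f}")
```

Both pairs imply a baseline of roughly eight comments per program, which suggests the two percentages share a common denominator.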

Q: How does proximity affect the quality — not just the quantity — of code review comments?

A: Using a gradient-boosted decision tree trained on 5,377 human-labeled comments, the authors predict comment quality across all 174,014 comments. Losing proximity reduced comments predicted to be helpful, well-reasoned, actionable, and likely to change the code by 21–23%, exceeding the 18.3% overall volume decline. The comments that remained were of lower quality: 2.9 pp fewer were helpful (p-value = 0.039), 1.7 pp fewer explained their reasoning (p-value = 0.094), and 1.9 pp fewer were likely to change the code (p-value = 0.072).

Q: What mechanisms drive the complementarity between face-to-face interaction and digital feedback?

A: Proximity increases feedback on both the extensive and intensive margins. On the extensive margin, co-located engineers draw on a wider pool of reviewers, returning less frequently to the same commenter. On the intensive margin, losing proximity reduces follow-up questions by 48.4% (0.12 questions per program, p-value = 0.0083), accounting for roughly half of the total feedback decline. The other half comes from reduced initial reviewer feedback. References to other communication channels (e.g., Slack) within code reviews also decline when proximity is lost, confirming that face-to-face and digital communication are complements.

Q: How small a physical barrier is sufficient to reduce feedback substantially?

A: A ten-minute walk between two buildings on the same headquarters campus reduces feedback by as much as being multiple states away — both groups receive significantly less feedback than engineers whose entire team sits in the same building (Figure Ib). This finding aligns with research on academics showing that different floors or buildings reduce coauthorship, and extends it to daily teammates sharing projects.

Q: What are the externality effects of a single distant teammate?

A: Through the firm’s implicit “one Zoom, all Zoom” norm, even one teammate in a different location shifts all team meetings to video calls. Engineers in the same building exchange 14.5% less feedback when even one teammate is in another building versus when all teammates are co-located (p-value = 0.037). When a new hire transforms a co-located team into a multi-building one, feedback between the original co-located teammates drops by 1.71 comments per program (p-value = 0.095); adding a new co-located hire produces no such decline.

Q: How does the effect of proximity on feedback differ by engineer tenure, age, and gender?

A: Less-tenured engineers on co-located teams lost 2.03 more comments per program upon closure than less-tenured engineers on distributed teams (p-value = 0.001). Young engineers (under 29) on co-located teams lost 2.47 more comments per program than young distributed engineers (p-value = 0.0001). Female engineers on co-located teams lost 38.9% (3.71) more comments than female engineers on distributed teams (p-value < 0.0001), partly because women draw feedback from 14.7% fewer people when proximity is lost (p-value = 0.0078), compared to a negligible 2.6% decline for men. The extra feedback women receive in person is of higher quality, not rude or condescending.

Q: How is the effect of proximity on code quality identified using the RTO design, and what are the magnitudes?

A: The RTO design compares engineers on co-located (same-city) teams to geographically-distributed teams across three periods: full closure, first RTO (two days per week), and second RTO (three days per week). The authors predict γ_closed ≈ 0 (office assignment irrelevant when closed) and γ_2nd_RTO > γ_1st_RTO (more in-office days means more proximity). Both predictions are confirmed. During the second RTO, co-located engineers were 2.2 pp less likely to add files later deleted (p-value = 0.041) and 1.4 pp less likely to introduce bugs (p-value = 0.022), with effects roughly twice as large for less-tenured and younger engineers.

Q: Does the benefit of co-location on code quality persist once engineers are working remotely?

A: Yes. During the fully remote period, engineers who had been on co-located teams pre-closure were 2.37 pp less likely to write disposable code (p-value = 0.013) and 3.09 pp less likely to introduce bugs (p-value = 0.0012). Code quality improves monotonically with the number of pre-closure months spent on co-located teams (Figure A.5). These gaps persist when current team fixed effects are included, meaning that within the same post-closure team, the previously co-located engineer writes higher-quality code.

Q: What is the cost of mentorship for senior engineers, and how does it manifest in coding output?

A: Senior engineers on co-located teams wrote 0.76 fewer programs per month in the main codebase when offices were open (p-value = 0.0005). Once offices closed, this gap disappeared, and senior engineers who lost proximity to their teammates saw a relative increase in output of 0.58 programs per month (p-value = 0.0014). During the second RTO, engineers with more than sixteen months of tenure on co-located teams wrote fewer programs, while no significant difference emerged for less-tenured engineers. Overall, the DiD estimate indicates losing proximity to teammates increases immediate output by 0.48 programs per month (p-value = 0.0002).

Q: How does the firm’s hiring age distribution respond to changes in proximity?

A: When offices were closed, the firm shifted toward hiring older engineers: the share of hires under age 29 fell from over half pre-closure to less than a third during the closure. After the RTOs, the firm shifted back toward younger hires. Geographic variation reinforces this: headquarters-campus hires were 7–10 years younger than those hired into distributed roles when offices were open; this gap narrowed substantially during closures when everyone was far from teammates.

Q: Does proximity affect which engineers are poached by other firms?

A: Yes. During the office closures, 1.2% of co-located engineers were poached per month, compared to 0.9% of multi-building engineers of similar tenure, age, and engineering group (p-value = 0.044). By the end of the closure period, nearly a quarter of co-located engineers had been poached versus a sixth of multi-building engineers. There is a dose response: more pre-closure time on co-located teams predicts higher poaching rates. The effect is concentrated among younger and female engineers, consistent with their feedback building more transferable general human capital. Tenure does not moderate the poaching effect, consistent with less-tenured engineers’ feedback being more firm-specific.

Q: What does national unemployment data show about the scarring effects of remote work on young workers?

A: Between 2017–2019 and 2022–2024, young college graduates (under 29) in remotable occupations experienced a 0.88 pp increase in unemployment (p-value < 0.00001), while older graduates in the same occupations saw a marginal decline of 0.11 pp (p-value = 0.053). A triple-difference regression finds a 0.65 pp greater increase in young workers’ unemployment in remotable versus non-remotable occupations (p-value = 0.029). Back-of-the-envelope, scaling this estimate by the 61% share of young graduates in remotable jobs predicts a 0.4 pp increase in young college graduates’ overall unemployment — equal to 64% of the realized 0.63 pp increase.
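The back-of-the-envelope step can be retraced with the rounded figures quoted above. This is only a sketch of the arithmetic, not the authors' calculation, and the variable names are illustrative.

```python
# Retrace the back-of-the-envelope calculation from the rounded figures
# quoted in the summary (the paper presumably uses unrounded estimates).

triple_diff_pp = 0.65    # extra rise for young workers in remotable jobs (pp)
remotable_share = 0.61   # share of young graduates in remotable jobs
realized_rise_pp = 0.63  # realized rise in young graduates' unemployment (pp)

predicted_rise_pp = triple_diff_pp * remotable_share
share_explained = predicted_rise_pp / realized_rise_pp

print(f"predicted rise: {predicted_rise_pp:.2f} pp")
print(f"share of realized rise: {share_explained:.0%}")
```

With these rounded inputs the predicted rise is about 0.40 pp and the share comes out near 63 percent, close to the reported 64 percent; the small gap presumably reflects rounding in the quoted inputs.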

Q: Is the unemployment increase among young workers in remotable jobs driven by generative AI rather than remote work?

A: The authors argue against AI as the primary driver on two grounds. First, the uptick in young workers’ unemployment in remotable occupations predates the rapid diffusion of generative AI. Second, the differential increase is not concentrated among occupations with the highest AI task exposure. The triple-difference estimate is robust to controlling for occupational AI exposure using the Eisfeldt, Schubert and Zhang (2023) index. The authors acknowledge that AI may become more important as it diffuses further.

Q: How do young workers’ own office attendance decisions reflect the value of proximity?

A: At the partner firm, engineers under 29 were 8.8 pp (37.6%) more likely to come into the office during the RTOs than older engineers when on co-located teams (solid line in Figure VIIa). This difference was roughly halved on geographically-distributed teams (p-value of difference = 0.0085), indicating that the draw is specifically proximity to teammates. Co-located managers raised attendance by 2.6 pp, while co-located teammates raised it by 5.1 pp. Nationally, Stack Overflow survey data show nearly half of engineers under 25 are in the office each day, versus a quarter of older engineers (p-value < 0.00001).

Q: What does the paper imply about why remote work was rare before the pandemic despite workers’ stated preferences for it?

A: The paper offers a resolution: firms may have recognized that the value of the office lies in training for tomorrow and in improving the quality, not the quantity, of work today. Remote work boosts immediate output, especially for experienced workers, but it reduces mentorship and long-run skill development. The tradeoff between current and future productivity, and between individual and collective returns to human capital, explains why firms historically resisted remote work even when workers preferred it and short-run output did not suffer.

Q: What are the implications for gender equity in remote work?

A: The findings suggest remote work has ambiguous gender effects. While remote work may help working mothers remain in the workforce, it appears costly for young women’s professional development, which is especially sensitive to physical proximity. Women receive substantially more high-quality feedback when co-located, draw feedback from a wider network in person, and lose disproportionately more feedback when proximity is lost. Young female engineers on co-located teams were also disproportionately poached — suggesting their human capital gains from co-location are more general and transferable.

Code review feedback: The digital comments engineers exchange when reviewing each other’s code before it is merged into the live codebase; the paper’s primary measure of on-the-job training and mentorship investment. The measure goes beyond comment volume: the authors also classify comments by helpfulness, reasoning, actionability, and expected impact using supervised machine learning.

Co-located team: A team in which all members are assigned to the same office building; the treatment group in the difference-in-differences designs, distinguished from multi-building teams (split across two headquarters buildings, a ten-minute walk apart) and geographically-distributed teams (members in different cities or permanently remote).

One Zoom, all Zoom norm: The implicit team practice of holding all meetings virtually if any single teammate cannot be physically present; the mechanism by which one distant colleague generates negative externalities for the remaining co-located teammates, reducing their in-person interaction and feedback.

Proximity fragility: The finding that even small physical barriers — a ten-minute walk between buildings — reduce feedback as much as being multiple states away, implying that the relationship between physical distance and mentorship is highly nonlinear near zero.

Churn (disposable code): Files that are added by an engineer but deleted within the subsequent six months, either because the code was poorly structured or because it introduced a feature later abandoned; used as one of two code quality proxies in the RTO analysis (occurring in 15% of programs).

Bugs (immediate reversions): Programs that are immediately and fully reverted after being merged, typically indicating the engineer’s changes precipitated an emergency requiring rollback to an earlier version; used as the more serious of the two code quality proxies (occurring in 3.5% of programs).

Scarring effects: The persistent adverse impact on young workers’ human capital and labor market outcomes from reduced mentorship during the remote work period; manifested both as lower code quality at the individual level and higher unemployment rates nationally among young college graduates in remotable occupations.

Remotable occupation: An occupation classified by Dingel and Neiman (2020) as feasibly performed from home; used to construct the national triple-difference analysis comparing age gaps in unemployment across remotable and non-remotable jobs before and after the pandemic.
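The triple-difference comparison mentioned in the last entry can be sketched with cell means: take the young-versus-old unemployment gap, see how it changed from before to after the pandemic, and compare that change across remotable and non-remotable occupations. The rates below are hypothetical placeholders chosen only to make the arithmetic visible; they are not from the paper, and the paper's actual specification (controls, weights, standard errors) is not reproduced here.

```python
# Hypothetical unemployment rates in percent. NOT data from the paper.
# Keys are (age group, occupation type, period).
rates = {
    ("young", "remotable", "pre"): 4.0,
    ("young", "remotable", "post"): 5.5,
    ("old", "remotable", "pre"): 2.0,
    ("old", "remotable", "post"): 2.5,
    ("young", "non_remotable", "pre"): 6.0,
    ("young", "non_remotable", "post"): 6.5,
    ("old", "non_remotable", "pre"): 4.0,
    ("old", "non_remotable", "post"): 4.5,
}


def age_gap(cells, occupation, period):
    """First difference: young minus old unemployment rate."""
    return cells[("young", occupation, period)] - cells[("old", occupation, period)]


def change_in_age_gap(cells, occupation):
    """Second difference: post-period age gap minus pre-period age gap."""
    return age_gap(cells, occupation, "post") - age_gap(cells, occupation, "pre")


def triple_difference(cells):
    """Third difference: change in the age gap in remotable occupations
    minus the change in the age gap in non-remotable occupations."""
    return change_in_age_gap(cells, "remotable") - change_in_age_gap(cells, "non_remotable")


ddd = triple_difference(rates)
print(f"Triple-difference estimate: {ddd:.1f} percentage points")
```

With these placeholder numbers the age gap widens by 1.0 point in remotable occupations and is unchanged in non-remotable ones, so the triple difference is 1.0. A positive value of this kind is what a scarring effect concentrated among young workers in remotable jobs would look like in cell means.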
