R13 | Macro Paper Warehouse

An Equilibrium Analysis of the Effects of Neighborhood-Based Interventions on Children

Overview

Research question. How should governments design neighborhood-based policies to improve long-run outcomes for children, once one accounts for general equilibrium (GE) forces—endogenous rents, neighborhood quality, wages, and distortionary taxation—that small-scale experimental studies cannot identify?

Model. The paper embeds neighborhood effects into a quantitative, heterogeneous-agent overlapping-generations (OLG) model with endogenous location choice and child skill development. The economy has three building blocks: (1) a dynastic life-cycle structure in which parents choose a neighborhood (from two options: a disadvantaged n=1 and an advantaged n=2) and allocate time to child development, with child skills produced by a nested CES aggregator combining parental time and neighborhood quality (proxied by per-capita income in the tract); (2) a GE Aiyagari incomplete-markets framework with endogenous labor supply, wage uncertainty, and progressive labor taxation; and (3) a government that finances housing vouchers or place-based wage subsidies by adjusting the labor income tax parameter, with all additional net expenses fully offset by tax revenue. Housing supply is upward-sloping (elasticity 1.75, from Saiz 2010), so rents are endogenous.

Data and calibration. The model is estimated by simulated method of moments to match U.S. data from the 2000s, drawing on the PSID, NLSY, ATUS, the 2012–2016 ACS, and the Opportunity Atlas (Chetty et al. 2018). Neighborhoods are mapped to Census tracts divided into bottom-10-percent and top-90-percent median household income groups within each commuting zone. Key targeted moments include the income gap between neighborhoods (108 percent higher mean individual income in n=2), the 30 percent higher incomes for children from low-income families raised in the better neighborhood, and a 32 percent gap in weekly parental time with children across neighborhoods.

Validation. Before policy counterfactuals, the calibrated model is validated against two bodies of reduced-form evidence. First, a simulated small-scale, single-generation, partial-equilibrium voucher experiment generates 23 percent higher income for children—close to the 31 percent MTO experimental estimate from Chetty et al. (2016), with the difference largely explained by a smaller poverty-rate contrast (18 vs. 22 percentage points) in the simulation. Second, a simulated 20 percent place-based wage subsidy generates 17–21 percent earnings gains for adult residents of n=1, consistent with Busso et al.’s (2013) quasi-experimental EZ estimates of 17–24 percent.

Main findings: housing vouchers. The welfare-maximizing voucher program features a 100 percent subsidy rate, targets households with children and wages below the 80th percentile (fourth quintile), and is financed by progressive labor taxes. In the long-run steady state, 12.5 percent more children are raised in the advantaged neighborhood, labor productivity increases by 1.1 percent, income inequality (variance of log after-tax lifetime earnings) falls by 6.3 percent (comparable in magnitude to the Sweden–U.S. after-tax inequality gap), and upward mobility rises by 27.7 percent (roughly half its standard deviation across U.S. Census tracts). The average marginal tax rate must increase by 15.7 percent to fund the program. Despite this, long-run welfare rises by 3.4 percent in consumption equivalence units. A decomposition shows that intergenerational dynamics add 11.5 percentage points to welfare (relative to a short-run, single-generation scenario), while taxation subtracts 10.2 percentage points, and rent plus neighborhood-quality effects together subtract only 1.4 percentage points. The net long-run GE gain is therefore similar to the short-run partial-equilibrium gain of 3.5 percent. Crucially, extending the voucher to households without children generates a welfare loss of 5.0 percent, confirming that restricting eligibility to households with children is essential.

Main findings: place-based wage subsidies. A 12 percent wage subsidy to workers in the disadvantaged neighborhood yields the highest steady-state welfare gain, 0.7 percent. This is approximately one-fifth of the gain achievable with the optimal voucher. The subsidy induces substantial resorting toward n=1, reducing the share of children in n=2 by 6.7 percent while raising neighborhood quality in n=1 by 19.7 percent. Income inequality falls by 8.7 percent and upward mobility rises by 20.4 percent. However, in a short-run partial-equilibrium setup, the wage subsidy has a welfare effect of −1.0 percent because it draws parents (and their children) into the disadvantaged area; the positive net effect only emerges through long-run intergenerational channels (+2.5 percentage points) and equilibrium neighborhood-quality adjustments.

Political economy. Because voucher gains are concentrated among young cohorts (those aged 16–43 at introduction), only 33 percent of incumbent adults would rationally vote for the housing voucher program. In contrast, the place-based wage subsidy provides positive average welfare gains for all age cohorts alive at introduction, yielding estimated majority support from over 63 percent of adults. This creates a fundamental political economy tradeoff: the policy with the larger long-run social gains lacks majority democratic support, while the policy with broader support delivers smaller long-run gains.

In depth

Q1. What are the two market frictions that justify government intervention in the model?

A1: The first friction is the absence of intergenerational borrowing markets: parents cannot borrow against their child’s future income, which limits the parent’s willingness to pay the higher rent in n=2 to give their child a developmental advantage. Housing vouchers act as a tax-financed substitute for this missing contract by paying the rent premium and recovering the cost through taxes on the high-earning adults the children become. The second friction is a neighborhood externality: individuals do not internalize the effect of their own income on the neighborhood quality experienced by neighbors’ children. Place-based wage subsidies partially correct this externality by subsidizing work in the disadvantaged area, raising local income per capita and thereby improving the neighborhood quality index for all children resident there.

Q2. How is neighborhood quality defined and modeled, and why is this specification chosen?

A2: Neighborhood quality sn is defined as total income per capita (the sum of labor and capital income) for all residents of neighborhood n, including non-workers. This specification is intended to capture multiple mechanisms: school quality (which depends on local tax bases), role-model effects from productive adults, and social organization effects through adult supervision of children. The formulation includes retired and non-working residents, which means the arrival of children mechanically reduces neighborhood quality per capita in the model, partially capturing a crowding channel. Formally, the neighborhood spillover function takes the power form f(sn) = A * sn^ζ, where ζ governs the elasticity of child development to neighborhood quality.
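
As a rough illustration of the definition above, the following Python sketch computes income per capita for a neighborhood and applies the power-form spillover. The function names, the resident incomes, and the values of A and ζ are hypothetical placeholders chosen for illustration; they are not the paper's calibrated values.

```python
def neighborhood_quality(labor_incomes, capital_incomes):
    """Total (labor + capital) income per resident, counting non-workers."""
    if len(labor_incomes) != len(capital_incomes) or not labor_incomes:
        raise ValueError("need one labor and one capital entry per resident")
    total_income = sum(labor_incomes) + sum(capital_incomes)
    return total_income / len(labor_incomes)


def spillover(s_n, A=1.0, zeta=0.5):
    """Power-form spillover f(s_n) = A * s_n**zeta (A and zeta are placeholders)."""
    return A * s_n ** zeta


# Three adults; the retiree earns no labor income but still counts as a resident.
labor = [40.0, 25.0, 0.0]
capital = [5.0, 0.0, 20.0]
s_before = neighborhood_quality(labor, capital)

# Adding two children (zero income) mechanically lowers income per capita.
s_after = neighborhood_quality(labor + [0.0, 0.0], capital + [0.0, 0.0])

print(s_before, s_after)
print(spillover(s_before), spillover(s_after))
```

The second call shows the crowding channel mentioned above: total income is unchanged, but it is spread over more residents.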

Q3. How does the paper validate the model’s key mechanism — the neighborhood effect on children?

A3: The validation mimics the MTO RCT within the calibrated model: the government provides a 100 percent rent voucher usable only in n=2 to households in n=1 with incomes below the 10th percentile, holding prices and neighborhood qualities fixed (as in a small-scale experiment). The model generates 25 percent voucher take-up and a 23 percent increase in children’s income in their late 20s. This compares to the experimental MTO estimate of approximately 31 percent. The paper attributes most of the gap to the smaller poverty-rate contrast in the simulation (18 percentage points) relative to MTO (22 percentage points). It also shows that, in a scatterplot of child income gains against neighborhood poverty reductions, the simulated result falls on the line fitted through the site-specific MTO estimates.
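
One back-of-the-envelope way to see why the poverty-rate contrast matters is to rescale the simulated effect to the MTO contrast. The sketch below assumes the income gain is proportional to the poverty-rate reduction. That proportionality is a simplification introduced here for illustration and is not the paper's fitted line; the variable names are hypothetical.

```python
# Figures quoted in the text above.
sim_gain_pct = 23.0       # simulated gain in children's income
sim_contrast_pp = 18.0    # poverty-rate contrast in the simulation
mto_gain_pct = 31.0       # approximate MTO experimental estimate
mto_contrast_pp = 22.0    # poverty-rate contrast in MTO

# Assumption: gains scale one-for-one with the poverty-rate contrast.
gain_per_pp = sim_gain_pct / sim_contrast_pp
rescaled_gain_pct = gain_per_pp * mto_contrast_pp

raw_gap = mto_gain_pct - sim_gain_pct
remaining_gap = mto_gain_pct - rescaled_gain_pct
share_explained = 1.0 - remaining_gap / raw_gap

print(f"rescaled simulated gain: {rescaled_gain_pct:.1f} percent")
print(f"share of the gap closed by rescaling: {share_explained:.2f}")
```

Under this proportionality assumption, rescaling closes more than half of the gap between the simulated and experimental estimates.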

Q4. What is the quantitative role of long-run intergenerational dynamics in the voucher program, relative to other GE channels?

A4: The decomposition in Table 5 isolates four GE channels. Starting from a short-run partial-equilibrium welfare gain of 3.5 percent (for the children of a single treated generation), allowing the economy to operate for the long run while holding prices and taxes fixed raises welfare to 15.0 percent — an increase of 11.5 percentage points — because improved skills in one generation create higher-skilled, higher-income parents who invest more in the next generation. Introducing housing market price adjustments (rents rise by 3.9 percent in n=2) reduces welfare by only 0.6 percentage points. Allowing neighborhood quality to adjust (quality in n=2 falls by 4 percent as lower-income families move in) reduces welfare by an additional 0.8 percentage points. Adding full taxation to balance the government budget reduces welfare by 10.2 percentage points, from 13.6 to 3.4 percent. The four channels nearly cancel, leaving the long-run GE steady-state gain close to the short-run single-generation gain.
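
The arithmetic of this decomposition can be retraced in a few lines. The sketch below simply accumulates the percentage-point contributions quoted above in the order the channels are switched on; the variable names are illustrative, and the rounding to one decimal mirrors the precision of the reported figures.

```python
# Welfare figures (consumption-equivalent percent) quoted in the answer above.
short_run_pe_gain = 3.5

# Percentage-point contribution of each channel, in activation order.
channels = [
    ("long-run intergenerational dynamics", 11.5),
    ("housing rent adjustment", -0.6),
    ("neighborhood quality adjustment", -0.8),
    ("taxation to balance the budget", -10.2),
]

welfare = short_run_pe_gain
path = [("short-run partial equilibrium", welfare)]
for name, delta_pp in channels:
    welfare = round(welfare + delta_pp, 1)
    path.append((name, welfare))

for name, level in path:
    print(f"{level:5.1f}  after {name}")
```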

Q5. Why does the optimal voucher program require targeting to families with children, and what happens without this restriction?

A5: When the voucher is extended to all households regardless of children (Column 6 of Table 4), 82.6 percent of the population receives a subsidy, pushing almost everyone to n=2. Rents in n=2 rise by 5.3 percent. To finance this much broader program, the average marginal tax rate must increase by 44 percent, far exceeding the 15.7 percent required for the children-targeted program. The large tax increase suppresses labor supply and income, which reduces neighborhood quality in n=2 by 11.6 percent. The net effect is a welfare loss of 5.0 percent. The intuition is that the benefit of the voucher program flows primarily through child skill development, so subsidizing adults without children is fiscally expensive without producing the intergenerational gains that justify the cost.

Q6. What drives the difference in long-run welfare gains between vouchers (3.4 percent) and place-based wage subsidies (0.7 percent)?

A6: The primary channel is labor productivity. The optimal voucher program raises labor productivity by 1.1 percent by increasing the average neighborhood quality to which children are exposed by 1.2 percent. The wage subsidy raises productivity by only 0.2 percent because it induces resorting toward the disadvantaged neighborhood. Children’s average neighborhood quality actually decreases by 0.2 percent despite large improvements in n=1’s quality (up 19.7 percent), since more children reside in the still lower-quality n=1 after the subsidy draws their parents there. Inequality reduction is not the source of the gap: the wage subsidy actually reduces inequality more (8.7–8.9 percent) than the voucher (6.3 percent), but this inequality effect does not translate into larger aggregate welfare because productivity effects dominate.

Q7. How does the wage subsidy produce positive long-run welfare when it generates negative welfare in the short run?

A7: In the short run, the wage subsidy draws parents into the disadvantaged neighborhood to exploit higher wages, which reduces the share of children in the advantaged neighborhood n=2 and lowers children’s later-life productivity (welfare of −1.0 percent for treated children in the single-generation scenario). Two long-run channels flip the sign. First, the subsidy is permanent, so children themselves receive it as adults, providing a direct wage income benefit. Second, the sustained presence of higher-income workers in n=1 raises neighborhood quality there durably (by 19.7 percent at the steady state), which benefits the children who reside in n=1. Together these intergenerational effects add 2.5 percentage points to welfare, while taxation costs reduce it by only 1.4 percentage points; combined with the equilibrium neighborhood-quality adjustments, the net long-run gain is 0.7 percent.

Q8. What determines the political economy divide between the two policies?

A8: For the housing voucher, welfare gains are concentrated among younger incumbent adults (ages 16–43), particularly those who are about to have or already have children, while older adults tend to lose because they face higher taxes without benefiting from improved neighborhood quality for their (now independent) children. This concentration implies only 33 percent of incumbent adults would support the voucher under the model’s welfare metric. For the place-based wage subsidy, average welfare gains are positive for every age cohort alive at introduction (though larger for younger cohorts), because the wage subsidy immediately raises incomes for workers in n=1 and equilibrium rent declines in n=1 extend the gains to all residents. Over 63 percent of adults would support the wage subsidy. The paper notes that if the government could borrow to initially finance the voucher program and pay for it later (as in Daruich 2020 for early childhood programs), majority support for the voucher could potentially be achieved.

Q9. How sensitive are the welfare results to the key calibrated parameters?

A9: The sensitivity analysis (Table 9, following Andrews et al. 2017) shows that individual parameters would need to change substantially to overturn the conclusion that vouchers generate larger steady-state welfare gains than wage subsidies. For example, the altruism parameter β̃ would need to increase by 22 percent to eliminate the voucher welfare gain, which would require average parental transfers to rise to 198 percent of income — far from the empirical target of 125.4 percent. Using the more conservative tract-level housing supply elasticity from Baum-Snow and Han (2021) of 0.3–0.4 (about 80 percent below the baseline Saiz 2010 estimate of 1.75) would reduce the voucher welfare gain from 3.37 to approximately 2.57 percent, not reversing the qualitative conclusion. The parameters with the largest influence on welfare gains are the labor disutility parameter µ and the altruism parameter β̃; the housing supply elasticity matters more for the voucher than the wage subsidy because easier housing supply accommodates growth in n=2 without displacement under the voucher.

Q10. What does the transition path of the voucher program look like, and why do welfare gains initially dip before recovering?

A10: When the voucher is unexpectedly introduced, the first newborn cohort gains approximately 4 percent welfare, but gains for subsequent cohorts initially dip to around 3 percent before stabilizing at 3.4 percent by the 20th post-introduction cohort. The dip occurs because moving costs slow resorting: immediately after introduction, rents in n=2 begin rising and neighborhood quality there begins falling as low-income families move in, but the capital stock adjustment (which would counteract these effects by raising GDP) lags the resorting. The rebound comes as capital accumulates in n=2 over time and as intergenerational productivity gains build through successive cohorts of better-skilled parents. Labor productivity jumps noticeably for the first cohort born to parents who received the voucher (approximately 28 years after introduction) and again for the first cohort whose grandparents received it, visibly demonstrating the intergenerational mechanism. In contrast, the wage subsidy’s welfare gains are approximately constant at 0.7 percent across all cohorts because the key channels (neighborhood quality improvement in n=1 and wage gains) materialize rapidly and remain stable throughout the transition.

Key Concepts

Neighborhood quality (sn): In this paper, neighborhood quality is not school quality or amenities in a generic sense but is explicitly defined as total income per capita — the sum of labor income and capital income — for all residents of neighborhood n, including non-workers. This endogenous measure rises when higher-income or more productive residents move in and falls when lower-income residents or additional children arrive.

Intergenerational borrowing constraint: The inability of parents to borrow against their child’s future income, modeled as a non-negativity constraint on the monetary transfer from parent to child (transfer ≥ 0). This is the paper’s first key market friction: without it, a poor parent who moved to a better neighborhood would smooth consumption across generations by having the high-earning child compensate the parent. The constraint prevents this, reducing parental investment below the socially efficient level.

Consumption equivalence (veil of ignorance): The welfare metric used throughout the policy analysis. It is defined as the percentage change in consumption that would make a newborn individual indifferent between the pre-policy and post-policy steady states, computed before knowing their position in the skill or income distribution. This is the paper’s measure of long-run steady-state welfare.

Parental investment aggregator (CES): A nested constant-elasticity-of-substitution function that determines how parental time τ and neighborhood quality sn combine to form the effective investment input I into child skill development: I = Ā[αI f(sn)^γ + (1 − αI)τ^γ]^(1/γ). The elasticity of substitution 1/(1 − γ), estimated at 0.41, governs the degree of complementarity between time and neighborhood quality; an elasticity below one (γ = −1.43) implies the two inputs are complements, so parents with children in better neighborhoods also spend more time with them.
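
To make the aggregator concrete, here is a minimal Python sketch of the formula above. Only γ = −1.43 comes from the text; the values used for αI and Ā, and the function names, are hypothetical placeholders rather than the paper's estimates.

```python
def investment(tau, f_sn, gamma=-1.43, alpha_I=0.5, A_bar=1.0):
    """CES aggregator I = A_bar*[alpha_I*f_sn**gamma + (1-alpha_I)*tau**gamma]**(1/gamma).

    gamma = -1.43 is the value quoted in the text; alpha_I and A_bar are
    placeholder values, not the paper's estimates.
    """
    if tau <= 0 or f_sn <= 0:
        raise ValueError("inputs must be strictly positive when gamma < 0")
    inner = alpha_I * f_sn ** gamma + (1.0 - alpha_I) * tau ** gamma
    return A_bar * inner ** (1.0 / gamma)


def elasticity_of_substitution(gamma):
    """Elasticity of substitution implied by the CES curvature parameter."""
    return 1.0 / (1.0 - gamma)


def marginal_product_of_time(tau, f_sn, step=1e-6):
    """Central-difference derivative of I with respect to parental time."""
    return (investment(tau + step, f_sn) - investment(tau - step, f_sn)) / (2 * step)


print(round(elasticity_of_substitution(-1.43), 2))
# Under these values, an hour of parental time adds more to I when f(sn) is higher.
print(marginal_product_of_time(1.0, 1.0) < marginal_product_of_time(1.0, 2.0))
```

The first line reproduces the 0.41 elasticity quoted above; the second checks numerically that the marginal product of time rises with neighborhood quality under the placeholder parameters.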

Place-based wage subsidy: A neighborhood-specific wage premium (denoted w̃s) paid to all workers who both live and work in the disadvantaged neighborhood n=1, raising their effective wage to w1 = (1 + w̃s)w2. This policy targets the neighborhood externality by increasing the income of residents in n=1, which raises neighborhood quality and provides an incentive for higher-skilled workers to relocate to (or remain in) the disadvantaged area.

Upward mobility: Measured in this paper as the probability that a child born to parents in the bottom 20 percent of the income distribution reaches the top 20 percent of the income distribution during the working stage of their own life. This is distinct from mean income rank measures; it specifically tracks cross-quintile transitions in the model’s stationary distribution.
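
A measure of this kind can be computed from matched parent and child incomes. The sketch below is a simplified stand-in, not the model's actual calculation: it ranks each generation once, ignores the life-cycle timing of the "working stage," and uses a hypothetical function name and made-up incomes.

```python
def upward_mobility(parent_incomes, child_incomes):
    """Share of children with bottom-quintile parents who reach the top quintile.

    Each index is one parent-child pair. Quintiles are assigned by rank
    within each generation; ties are broken by position in the input.
    """
    n = len(parent_incomes)
    if n != len(child_incomes) or n < 5:
        raise ValueError("need at least five matched parent-child pairs")

    cutoff = n // 5  # pairs per quintile (any remainder is ignored)
    by_parent = sorted(range(n), key=lambda i: parent_incomes[i])
    by_child = sorted(range(n), key=lambda i: child_incomes[i])

    bottom_parents = set(by_parent[:cutoff])
    top_children = set(by_child[n - cutoff:])
    return len(bottom_parents & top_children) / cutoff


parents = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
reversed_children = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]

print(upward_mobility(parents, reversed_children))  # full rank reversal
print(upward_mobility(parents, parents))            # no mobility at all
```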

Equilibrium decomposition: A simulation-based method in which GE channels are progressively activated. Starting from a short-run, partial-equilibrium, single-generation baseline (analogous to an RCT), the authors sequentially allow: (i) long-run intergenerational dynamics while holding prices fixed; (ii) housing market price adjustments; (iii) neighborhood quality adjustments; (iv) tax and production-price adjustments. Each step’s change in outcomes identifies the quantitative contribution of that specific channel.

The Dynamics of Internal Migration: A New Fact and its Implications

Overview

Howard and Shao document a new empirical regularity in U.S. internal migration: the t-year interstate migration rate, defined as the share of people living in a different state than they did t years ago, is approximately proportional to the square root of t. The fact is established using the Gies Consumer and Small Business Credit Panel (GCCP), a 15-year panel (2004–2018) covering approximately 1 percent of all Americans with a credit report, and is corroborated in the Panel Study of Income Dynamics (PSID, 1969–1997), where the square root pattern holds out to a 25-year horizon. The fact is not an artifact of averaging across origins, destinations, cohorts, or age groups: most of the distribution across these cuts is concentrated close to the square root line. It holds for both people under 45 and over 45, and is robust to the choice of time period and inter-state distance.

The standard moving cost model — in which location choice is a Markov process with i.i.d. extreme-value utility shocks and large bilateral moving costs — is shown (Proposition 1) to imply that the t-year migration rate is approximately proportional to t, not sqrt(t), as moving costs tend to infinity. Simulations confirm the linear pattern persists in calibrated versions of the moving cost model even when adding state variables for prior location, home state, or age.

The paper’s main theoretical contribution is the SPACE model (Spatially and Persistently Autocorrelated Epsilons). Rather than imposing moving costs, the SPACE model assumes that person-location match-specific utility is (i) persistent over time, governed by an autocorrelation parameter rho, and (ii) spatially correlated across locations via a generalized extreme-value (cross-nested logit) structure. The model has no moving costs by default. Proposition 3 proves that as rho approaches 1, the ratio of t-year migration to 1-year migration is bounded below by sqrt(t) and above by sqrt(pi/3) * sqrt(t) — a tight bound, since sqrt(pi/3) is approximately 1.023. The calibrated rho-tilde is 0.892, implying a period-to-period autocorrelation of 1 − (1 − rho-tilde)^2 = 0.988.

The SPACE model replicates bilateral one-year migration flows, matches the decreasing hazard rate of migration conditional on duration of stay, reproduces the distribution of lifetime move counts (including the large fraction who never move and the few percent who move four or more times in 14 years), and outperforms the moving cost model at out-of-sample individual location forecasting: by 2018, the moving cost model’s mean Kullback-Leibler divergence reaches approximately 0.12 log-points per observation above the maximum-possible benchmark, versus only 0.014 log-points for the SPACE model.

Key divergences from the moving cost model arise in four areas. First, moving costs need not be large: the SPACE model rationalizes observed low migration without any moving costs, in contrast to Kennan and Walker’s (2011) estimate of average moving costs of $312,146 (2010 dollars), more than six times median household income; when moving costs are added to the SPACE model, they are roughly two orders of magnitude smaller. Second, long-run population elasticities differ sharply: in the SPACE model they remain proportional to bilateral gross migration rates, while in the moving cost model they converge to a static logit proportional to population shares — and population shares and gross migration rates have little empirical correlation, so the long-run elasticities of the two models are essentially uncorrelated across state pairs. Third, adjustment dynamics differ: in the SPACE model a permanent utility shock to Louisiana produces immediate, full population adjustment; in the moving cost model adjustment takes roughly 200 years, with Mississippi overshooting its new steady-state and New York adjusting implausibly slowly. Fourth, welfare inferences are almost reversed: the correlation between log utility changes implied by the two models using U.S. population data is −0.497, with the SPACE model attributing relative utility gains to the South and West and the moving cost model attributing gains to New York and New England.

In depth

Q: What is the square root fact, and which datasets confirm it? A: The t-year interstate migration rate scales approximately as sqrt(t). It is documented in the GCCP (2004–2018, ~1% of Americans with credit reports) and verified in the PSID (1969–1997), where the pattern holds out to a 25-year horizon. It is not driven by averaging across subgroups: the distribution of the fact across origin-destination pairs, age groups, cohorts, and starting years is concentrated close to the square root line.

Q: Why does the standard moving cost model fail to match the square root fact? A: In the moving cost model, location choice is a Markov process with i.i.d. extreme-value shocks. Proposition 1 proves that as the common component of moving costs tends to infinity, the t-year migration rate is proportional to t (linear). Because the model requires large moving costs to rationalize low migration rates, the linear prediction is unavoidable. Simulations of calibrated versions — including variants with home bias, prior-location state variables, or age — confirm the relationship remains approximately linear.
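
The linear pattern can be seen in a toy version of a Markov location process. The sketch below is not the paper's model or proof: it assumes a fully symmetric chain in which a person leaves with probability m each year and picks any other state uniformly, with m standing in for the low move probability that large moving costs induce. The function name and parameter values are hypothetical.

```python
def t_year_migration_rate(m, t, n_states=50):
    """P(living in a different state after t years) in a symmetric Markov chain.

    Each year a person leaves with probability m and picks one of the other
    n_states - 1 states uniformly. The non-unit eigenvalue of this chain is
    lam = 1 - m * n_states / (n_states - 1).
    """
    lam = 1.0 - m * n_states / (n_states - 1)
    return (1.0 - 1.0 / n_states) * (1.0 - lam ** t)


# As the annual move probability shrinks, the 10-year rate approaches
# ten times the 1-year rate, i.e. the rate becomes linear in t.
for m in (0.05, 0.01, 0.001):
    ratio = t_year_migration_rate(m, 10) / t_year_migration_rate(m, 1)
    print(f"m = {m}: 10-year / 1-year rate = {ratio:.2f}")
```

A square root pattern would instead put the 10-year ratio near 3.16, which this memoryless chain only approaches when moves are very frequent.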

Q: What is the SPACE model, and why does it generate a square root? A: The SPACE model replaces moving costs with persistent and spatially correlated person-location match-specific utility. Utility shocks are drawn from a generalized extreme-value (cross-nested logit) distribution that allows spatial correlation, and they are autocorrelated over time with persistence parameter rho. Proposition 3 shows that as rho → 1, the ratio of t-year to 1-year migration is bounded in [sqrt(t), sqrt(pi/3)*sqrt(t)], a tight interval since sqrt(pi/3) ≈ 1.023. The intuition is that when rho is close to 1, the idiosyncratic utility process resembles a random walk, whose standard deviation grows as sqrt(t), causing migration thresholds to be crossed at a sqrt(t) rate.
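
The bounds in Proposition 3 are easy to tabulate. The following sketch evaluates the limiting interval quoted above at a few horizons and prints the linear benchmark alongside it for comparison; the function and constant names are illustrative.

```python
import math

UPPER_FACTOR = math.sqrt(math.pi / 3)  # about 1.023, as quoted in the text


def space_ratio_bounds(t):
    """Limiting bounds (rho -> 1) on the t-year / 1-year migration ratio."""
    lower = math.sqrt(t)
    return lower, UPPER_FACTOR * lower


for t in (4, 9, 16, 25):
    lower, upper = space_ratio_bounds(t)
    print(f"t = {t:2d}: ratio in [{lower:.2f}, {upper:.2f}]  (linear benchmark: {t})")
```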

Q: What is the calibrated persistence parameter, and what does it imply? A: The calibrated rho-tilde is 0.892, close enough to 1 to generate the square root fact in simulations. The implied period-to-period autocorrelation of match-specific utility is 1 − (1 − 0.892)^2 = 0.988. This calibration is achieved by solving for the largest eigenvalue of an I×I matrix of conditional migration rates.
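
As a quick arithmetic check of the mapping from rho-tilde to the implied autocorrelation, using only the expression stated above (the function name is illustrative):

```python
def implied_autocorrelation(rho_tilde):
    """Period-to-period autocorrelation 1 - (1 - rho_tilde)**2, as in the text."""
    return 1.0 - (1.0 - rho_tilde) ** 2


print(round(implied_autocorrelation(0.892), 3))
```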

Q: How do the two models compare on individual-level forecasting accuracy? A: Performance is evaluated using mean Kullback-Leibler divergence from the maximum-achievable log likelihood. Both models perform similarly in 2005, but by 2018 the moving cost model’s KL divergence reaches approximately 0.12 log-points per observation, while the SPACE model’s reaches only 0.014 log-points — roughly an order of magnitude better — leaving little room for improvement.
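
For readers unfamiliar with the metric, here is a minimal sketch of a Kullback-Leibler divergence between an observed distribution over locations and a model forecast. The three-location distributions are made up for illustration, and the paper's exact averaging over individuals and years is not reproduced here.

```python
import math


def kl_divergence(empirical, predicted):
    """KL(empirical || predicted) in nats for two discrete distributions."""
    if len(empirical) != len(predicted):
        raise ValueError("distributions must cover the same locations")
    total = 0.0
    for q, p in zip(empirical, predicted):
        if q == 0.0:
            continue  # zero empirical mass contributes nothing
        if p == 0.0:
            return math.inf  # forecast rules out an outcome that occurred
        total += q * math.log(q / p)
    return total


# Hypothetical shares of a group of people across three locations.
observed = [0.80, 0.15, 0.05]
close_forecast = [0.78, 0.16, 0.06]
poor_forecast = [0.50, 0.25, 0.25]

print(kl_divergence(observed, close_forecast))
print(kl_divergence(observed, poor_forecast))
```

A forecast that matches the observed distribution exactly scores zero, which is why the divergence can be read as the gap to the best achievable fit.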

Q: How large are implied moving costs under each model? A: Kennan and Walker (2011) estimate average moving costs of $312,146 in 2010 dollars, exceeding six times the median household income. The baseline SPACE model requires zero moving costs to match observed migration levels. When an augmented SPACE model with both persistence and moving costs is calibrated to match the one-year and ten-year migration rates, the estimated moving costs are approximately two orders of magnitude smaller than those from a moving-cost-only model.

Q: How do short-run population elasticities compare across models? A: In both models, the short-run cross-elasticity of population in state i with respect to utility in state j is approximately proportional to the gross migration rate between them. Corollary 1 formalizes this for the SPACE model: dp_i/du_j = −(1/(1−rho)) * m_{i→j} for i ≠ j. This means that in the short run, both models deliver similar predictions for how populations respond to local shocks.

Q: How do long-run population elasticities differ? A: In the SPACE model, long-run elasticities remain proportional to bilateral gross migration rates — the same relationship as in the short run. In the moving cost model, Proposition 4 shows that the long-run elasticity converges to the static logit: d(log p_i)/d(v_j) = −2*p_j for i ≠ j, depending only on population shares. Since population shares and gross migration rates are empirically uncorrelated, the long-run elasticities of the two models are essentially uncorrelated across state pairs.

Q: What do the models predict about the speed of regional adjustment? A: In the SPACE model, a permanent utility shock to Louisiana causes full, immediate population adjustment in the first period with no further dynamics. In the moving cost model, the same shock generates adjustment lasting roughly 200 years. Mississippi overshoots its long-run steady state in the moving cost model due to high bilateral migration with Louisiana, while New York adjusts especially slowly due to low bilateral migration — a pattern the authors describe as potentially counterintuitive.

Q: How do the models handle events involving rapid population change, such as Hurricane Katrina? A: The SPACE model accommodates fast adjustments by assuming rapid utility changes, consistent with the observed sharp decline in Louisiana’s population share followed by a small rebound. The moving cost model requires implausible utility assumptions to match these dynamics: it implies that Louisiana utility two years after Katrina was higher than before the hurricane.

Q: What do the two models infer about which U.S. states have gained or lost relative utility over time? A: Using exact-hat algebra applied to observed U.S. population changes, the SPACE model infers that the South and West have the largest relative utility gains, while New England and the Rust Belt have the largest relative declines. The moving cost model produces nearly the opposite inference: New York and New England show relative utility gains, while the South and West show declines. The correlation between the log utility changes implied by the two models is −0.497.

Q: Why do the authors argue that spatially and temporally correlated utility is realistic, not merely a mathematical convenience? A: Surveys (Jia et al., 2023) show that people primarily cite family and employment considerations as reasons for interstate moves — both are persistent and geographically concentrated. Proximity to family is spatially correlated: if state i is close to one’s family, nearby states are also relatively close. Job opportunities in specific industries or skills are geographically clustered. Natural amenities and regional cultures are spatially correlated as well. The authors argue it is harder to defend the i.i.d. assumption of the moving cost model than the SPACE model’s correlated structure.

Q: What is the distinction between moving costs and persistent match-specific utility? A: A moving cost is a one-time irreversible cost paid upon leaving a location. Persistent match-specific utility implies that the utility change from moving is ongoing, partially reversible upon return, and decays with time away from the original location. The authors argue that many factors labeled “moving costs” in the literature — such as distance from friends or amenities — are more accurately characterized as persistent and partially reversible utility losses, a distinction previous models could not draw.

Q: Does the SPACE model replicate the gravity equation for bilateral migration? A: Yes. Proposition 2 shows that migration from i to j in the SPACE model is given by m_{i→j} = (1 − rho) * p_i * p_j * (1 + tau_ij), where tau_ij captures spatial correlation. This resembles a gravity equation: more spatially correlated location pairs have higher bilateral migration, and higher persistence (higher rho) implies lower overall migration levels.
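
The gravity-style formula can be evaluated directly. In the sketch below, the population shares, the tau values, and the assumption that tau is symmetric between a pair of locations are all hypothetical choices made for illustration; only the functional form comes from the statement of Proposition 2 above.

```python
def space_migration_flow(p_i, p_j, tau_ij, rho):
    """Bilateral flow m_{i->j} = (1 - rho) * p_i * p_j * (1 + tau_ij)."""
    return (1.0 - rho) * p_i * p_j * (1.0 + tau_ij)


# Hypothetical population shares and spatial-correlation terms.
shares = {"A": 0.5, "B": 0.3, "C": 0.2}
tau = {("A", "B"): 0.8, ("A", "C"): 0.1, ("B", "C"): 0.4}


def flow(origin, dest, rho):
    """Look up tau for the pair (treated as symmetric here) and apply the formula."""
    t = tau.get((origin, dest), tau.get((dest, origin)))
    return space_migration_flow(shares[origin], shares[dest], t, rho)


for rho in (0.9, 0.99):
    print(rho, flow("A", "B", rho), flow("A", "C", rho))
```

The more spatially correlated pair (A, B) exchanges more migrants than its population shares alone would suggest, and raising rho scales every flow down proportionally.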

Q: Can the SPACE model be embedded in broader quantitative spatial models? A: Yes. The SPACE model admits closed-form solutions for state populations and bilateral migration flows, is compatible with exact-hat algebra for dynamic counterfactuals, and supports computationally feasible individual-level simulations. Appendix E embeds the SPACE model in a housing model with durable local housing production and shows that slow population adjustment can emerge from housing durability rather than slow migration per se, providing an alternative explanation for regional divergence persistence.

Key Concepts

SPACE model: A model of internal migration featuring Spatially and Persistently Autocorrelated Epsilons — person-location match-specific utility that is both autocorrelated over time (with persistence parameter rho) and spatially correlated across locations via a generalized extreme-value (cross-nested logit) distribution. The model contains no moving costs by default.

Square root fact: The empirical regularity that the t-year interstate migration rate (share of people living in a different state than t years ago) is approximately proportional to sqrt(t). Documented in GCCP data (2004–2018) and PSID (1969–1997) up to a 25-year horizon.
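
One way to check the square root fact on a set of t-year migration rates is to regress the log rate on the log horizon and compare the slope with 0.5. The sketch below does this with ordinary least squares on synthetic rates. The helper name and the 1.5 percent one-year rate are hypothetical and are not figures from the paper.

```python
import math

def loglog_slope(horizons, rates):
    """OLS slope of log(rate) on log(horizon)."""
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in rates]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var

horizons = [1, 2, 5, 10, 25]
one_year_rate = 0.015  # hypothetical value, for illustration only
synthetic_rates = [one_year_rate * math.sqrt(t) for t in horizons]
slope = loglog_slope(horizons, synthetic_rates)
```

A slope near 0.5 is consistent with the square root pattern, whereas a slope near 1 would indicate rates growing roughly in proportion to the horizon.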

Moving cost model: The standard dynamic discrete-choice model of migration in which an agent living in state i chooses location j to maximize u_j − delta_ij + epsilon_j + beta*E[V’], where delta_ij is a bilateral one-time irreversible moving cost and epsilon_j is i.i.d. extreme-value. Low migration rates are rationalized by large moving costs (e.g., $312,146 average in Kennan and Walker 2011).
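
With i.i.d. extreme-value shocks, choice probabilities in this class of models take the familiar multinomial logit form. The sketch below illustrates how a larger moving cost lowers migration, under simplifying assumptions that are not from the paper: the shock scale is one, and each entry of location_values stands in for u_j + beta*E[V'] and is taken as given rather than solved for. The function name and numbers are hypothetical.

```python
import math

def choice_probabilities(location_values, moving_costs):
    """Logit probabilities over destinations for an agent at a given origin.

    location_values[j] stands in for u_j + beta * E[V'] (taken as given here).
    moving_costs[j] is delta_ij, with zero cost for staying at the origin.
    """
    weights = [math.exp(v - d) for v, d in zip(location_values, moving_costs)]
    total = sum(weights)
    return [w / total for w in weights]

# Three equally attractive locations; the agent currently lives in location 0.
values = [1.0, 1.0, 1.0]
low_cost = choice_probabilities(values, [0.0, 1.0, 1.0])
high_cost = choice_probabilities(values, [0.0, 5.0, 5.0])
```

Comparing low_cost[0] with high_cost[0] shows the probability of staying rising with the moving cost, which is how this type of model rationalizes low observed migration.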

Persistence parameter (rho): In the SPACE model, rho governs the autocorrelation of match-specific utility over time. The calibrated value is rho-tilde = 0.892, implying period-to-period autocorrelation of 0.988. As rho → 1, the model generates a square root relationship between the t-year migration rate and t.

Population cross-elasticity: The elasticity of population in state i with respect to utility in state j. In both models it is proportional to gross bilateral migration in the short run. In the long run, the SPACE model retains this proportionality to migration rates, while the moving cost model converges to a static logit proportional to population shares.

Exact-hat algebra: A solution method for computing counterfactual equilibria in terms of ratios of new to old values (hats), without requiring knowledge of levels. The SPACE model admits simple exact-hat formulas for population changes; the moving cost model’s exact-hat algebra additionally requires tracking past population changes.
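
To illustrate the idea of exact-hat algebra, the sketch below uses a plain static logit rather than the SPACE model, whose own formulas are not reproduced here. In a static logit, new shares follow from baseline shares and utility changes alone: p_j' = p_j * exp(du_j) / sum_k p_k * exp(du_k). All names and numbers are hypothetical.

```python
import math

def logit_shares(utilities):
    """Static logit shares computed from utility levels."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def hat_update(shares, utility_changes):
    """New shares from baseline shares and utility changes only (no levels)."""
    weights = [p * math.exp(du) for p, du in zip(shares, utility_changes)]
    total = sum(weights)
    return [w / total for w in weights]

baseline_utilities = [0.3, -0.2, 1.0]  # used only to verify the shortcut
utility_changes = [0.1, 0.0, -0.5]
baseline_shares = logit_shares(baseline_utilities)
new_from_levels = logit_shares(
    [u + du for u, du in zip(baseline_utilities, utility_changes)]
)
new_from_hats = hat_update(baseline_shares, utility_changes)
```

The two routes give the same answer, which is the point of the method: the counterfactual can be computed from observed shares without recovering utility levels.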

Kullback-Leibler divergence (in this context): The mean divergence between a model’s predicted distribution over future locations and the empirical distribution, used as a measure of forecasting accuracy. By 2018, the SPACE model achieves KL divergence of 0.014 log-points per observation versus approximately 0.12 for the moving cost model.
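
For reference, a minimal sketch of the Kullback-Leibler divergence between an empirical distribution p and a model-predicted distribution q over locations, in natural-log units. How the paper aggregates this across individuals and years is not reproduced here; the function name and example distributions are hypothetical.

```python
import math

def kl_divergence(empirical, predicted):
    """KL(empirical || predicted) = sum_j p_j * ln(p_j / q_j).

    Terms with p_j == 0 contribute zero by convention.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(empirical, predicted)
        if p > 0.0
    )

empirical = [0.5, 0.5]
good_forecast = [0.5, 0.5]
poor_forecast = [0.25, 0.75]
kl_good = kl_divergence(empirical, good_forecast)
kl_poor = kl_divergence(empirical, poor_forecast)
```

A forecast that matches the empirical distribution exactly has zero divergence, and the divergence grows as the forecast moves away from the data.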

The Geography of Job Creation and Job Destruction

This paper asks why unemployment rates differ so persistently across local labor markets, and what role job creation and job destruction play in generating those differences. The authors document a comprehensive set of spatial labor market facts using administrative and survey microdata from Germany, the United States, and the United Kingdom, then build and calibrate a quantitative theoretical framework that accounts for all documented regularities.

Data and scope. For Germany, the authors use administrative data from the German employment office (universe of vacancies and unemployed, 1999–2020) and the IAB social security sample (SIAB, 2% of all workers, 2000–2017) aggregated to 194 commuting zones. For the U.S., they use BLS Local Area Unemployment Statistics (2000–2019) at the commuting-zone level, CPS worker flows at the metropolitan-area level, and JOLTS vacancy data for the 18 largest MSAs (covering roughly 40% of the U.S. labor force). For the UK, they use Nomis data and Jobcentre Plus vacancy records (2004–2006) for 378 Local Authority Districts.

Empirical findings. Spatial unemployment rate differences are large and highly persistent. In Germany, the correlation of local unemployment rates across commuting zones over a 19-year span is 0.84 (West) and 0.77 (East). In the U.S., the correlation between 2000 and 2019 unemployment rates is 0.81; in the UK it is 0.76. In all three countries, local labor markets with lower unemployment are tighter (more vacancies per unemployed worker) and more productive. Firms in low-unemployment markets fill vacancies more slowly: in Germany, vacancy duration ranges from approximately 35 days in high-unemployment locations to approximately 65 days in low-unemployment locations, roughly an 85% difference.

A formal steady-state decomposition reveals that across all three countries, differences in job-separation rates account for approximately two-thirds of the cross-sectional variation in unemployment rates, while differences in job-finding rates account for roughly one-third. Specifically: Germany 62.4% separations / 33.2% job-finding; U.S. 72.0% / 32.8%; UK 64.3% / 35.8%. This primacy of separation rates in the cross-section stands in stark contrast to business-cycle dynamics, where job-finding rates account for 50–60% of unemployment fluctuations (Fujita and Ramey, 2009).
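
A minimal sketch of a steady-state decomposition, assuming each location sits at the two-state steady state u = s/(s+f). Then ln(u/(1-u)) = ln(s) - ln(f) holds exactly, and the cross-sectional variance of the log odds splits into a separation share and a job-finding share through covariances. This is one common way to perform such a split and may differ in detail from the authors' procedure; for instance, it has no residual by construction. The location data below are made up.

```python
import math

def covariance(xs, ys):
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

def steady_state_shares(sep_rates, find_rates):
    """Shares of var(ln(u/(1-u))) attributed to separation and job-finding rates."""
    log_s = [math.log(s) for s in sep_rates]
    neg_log_f = [-math.log(f) for f in find_rates]
    log_odds = [a + b for a, b in zip(log_s, neg_log_f)]  # ln(s) - ln(f)
    total = covariance(log_odds, log_odds)
    sep_share = covariance(log_s, log_odds) / total
    find_share = covariance(neg_log_f, log_odds) / total
    return sep_share, find_share

# Hypothetical monthly rates for three locations (illustrative only).
sep_rates = [0.008, 0.012, 0.018]
find_rates = [0.28, 0.24, 0.20]
sep_share, find_share = steady_state_shares(sep_rates, find_rates)
```

With these made-up numbers separations vary proportionally more than job-finding rates across locations, so sep_share exceeds find_share, mirroring the qualitative pattern reported above.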

Theory. The authors embed a Diamond-Mortensen-Pissarides (DMP) model with endogenous separations — following Den Haan, Ramey, and Watson (2000) — into a Rosen-Roback spatial equilibrium framework. Locations differ in exogenous productivity; workers and firms are freely mobile; cost-of-living differences sustain the spatial equilibrium. The model is calibrated to the U.S. median-unemployment labor market (separation rate 0.0128, job-finding rate 0.2368, vacancy-filling rate 0.7365) plus the productivity differential between the 5th and 95th percentile unemployment locations (4.8% higher and 3.0% lower productivity than median, respectively). The baseline model, imposing the Hosios condition, matches the spatial patterns of separation rates, job-finding rates, tightness, vacancy duration, wages, and cost of living without targeting most of these. The decomposition in the calibrated baseline model attributes 33.5% of spatial unemployment variation to job-finding rates, compared to 32.8% in the data.

The baseline model generates a counterfactual upward-sloping Beveridge curve and cannot explain why job-finding rates dominate business-cycle fluctuations. Introducing on-the-job search (with 12% of employed workers searching each period, calibrated from Faberman et al., 2017) resolves both problems. In the extended model, job-to-job transition rates are virtually constant across local labor markets (matching the data) but strongly procyclical over the business cycle. This asymmetry amplifies the response of vacancies and job-finding rates to aggregate productivity shocks while muting the cyclical variation in separation rates. The extended model’s business-cycle decomposition attributes 54.4% of unemployment volatility to job-finding rates, within the empirical 50–60% range.

Policy implications. Under the Hosios condition, the decentralized equilibrium is efficient — large spatial differences in unemployment, tightness, and wages are efficient outcomes, not signs of mismatch. The relevant policy benchmark is not deviation of tightness from the national average but deviation from the model’s location-specific prediction conditional on local productivity.

Q: What is the central empirical puzzle the paper addresses? A: Spatial unemployment differences are large and persistent — in Germany, unemployment rates ranged from 1.9% to 11.9% across commuting zones even after 15 years of decline. These differences are not well understood theoretically, and the crucial missing empirical piece was data on job creation and vacancy filling across locations, which this paper provides for three countries.

Q: How large and persistent are cross-sectional unemployment differences in each country? A: In Germany, commuting-zone unemployment ranged from 3.6% to 24.0% in 2000 and persisted with a 19-year correlation of 0.84 (West) and 0.77 (East). In the U.S., the 2000–2019 correlation is 0.81, with unemployment as low as 1.5% and as high as 16.9% in 2000. In the UK, the 2004–2018 correlation is 0.76, with 2004 unemployment ranging from 1.8% to 13.1%.

Q: What do the data show about the relationship between unemployment and labor market tightness across locations? A: In all three countries, lower-unemployment labor markets are tighter — they have more vacancies per unemployed worker. This is documented for Germany using the universe of registered vacancies, for the U.S. using JOLTS data for 18 large MSAs, and for the UK using Jobcentre Plus administrative data. The relationship holds after controlling for local labor market composition (age, gender, education, occupation, industry shares).

Q: What do vacancy-filling rates look like across locations, and how large are the differences? A: Vacancy-filling rates are lower in low-unemployment (tight) labor markets. In Germany, the monthly probability of filling a vacancy is approximately 50% higher in high-unemployment markets than in low-unemployment markets. Completed vacancy duration ranges from about 35 days in high-unemployment locations to about 65 days in low-unemployment locations, a difference of approximately 85%. In the UK data, the elasticity of vacancy-filling rates with respect to unemployment rates is strikingly similar to that in Germany.
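
The duration gap and the monthly-probability gap quoted above can be related with a back-of-the-envelope calculation. The sketch assumes a constant daily filling hazard equal to the inverse of mean duration and a 30-day month; both are simplifying assumptions of this example, not the authors' measurement approach.

```python
import math

def monthly_fill_probability(mean_duration_days, days_per_month=30.0):
    """Probability a vacancy is filled within a month under a constant hazard."""
    daily_hazard = 1.0 / mean_duration_days
    return 1.0 - math.exp(-daily_hazard * days_per_month)

duration_high_u = 35.0  # days, high-unemployment locations (from the text)
duration_low_u = 65.0   # days, low-unemployment locations (from the text)

# Relative difference in completed vacancy duration.
duration_gap = duration_low_u / duration_high_u - 1.0

# Relative difference in the implied monthly filling probability.
p_high = monthly_fill_probability(duration_high_u)
p_low = monthly_fill_probability(duration_low_u)
probability_gap = p_high / p_low - 1.0
```

Under these assumptions the duration gap is about 86 percent while the gap in monthly filling probabilities is smaller, somewhat above 50 percent, because probabilities are bounded by one and so respond less than proportionally to the hazard.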

Q: What does the formal decomposition reveal about the sources of spatial unemployment differences? A: In a steady-state two-state decomposition, separation rates account for 62.4% (Germany), 72.0% (U.S.), and 64.3% (UK) of cross-sectional unemployment variation, while job-finding rates account for 33.2%, 32.8%, and 35.8%, respectively, with small residuals. This consistently assigns primary importance to separation rates across all three countries.

Q: Why is the primacy of separation rates in the cross section surprising, and what literature does it contrast with? A: The business-cycle literature (Fujita and Ramey, 2009; Shimer, 2012) finds that job-finding rate variation accounts for 50–60% of unemployment fluctuations over the cycle, roughly twice the contribution of separation rates. The spatial pattern is the mirror image: separations dominate. Any credible theory of spatial unemployment must rationalize both patterns simultaneously — a challenge the paper explicitly takes up.

Q: How does the baseline DMP model with endogenous separations generate the spatial patterns? A: Higher-productivity locations feature higher match surpluses. Higher surplus induces more vacancy creation and tighter markets, raising job-finding rates and lowering vacancy-filling rates. Crucially, a higher surplus means idiosyncratic shocks must be more negative to make the joint surplus negative, so fewer matches dissolve — separation rates are lower. The calibrated model reproduces the 32.8% job-finding / ~67% separation decomposition without targeting it (model yields 33.5% job-finding).

Q: What are the calibration targets and key parameter values in the baseline model? A: The model is calibrated monthly to the U.S. economy. Median-unemployment-location targets: separation rate 0.0128, job-finding rate 0.2368, vacancy-filling rate 0.7365. Productivity targets: the 5th-percentile-unemployment location is 4.8% more productive than median, and the 95th-percentile-unemployment location is 3.0% less productive. Key calibrated values include matching elasticity alpha = 0.4711 (equal to worker bargaining power under Hosios), matching efficiency m = 0.4371, vacancy posting cost kappa = 0.3070, and flow nonmarket value z = 0.9072.
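
As a quick consistency check on the median-location targets, the sketch below computes the implied steady-state unemployment rate u = s/(s+f) and the tightness implied by the textbook DMP identity f = theta * q. Applying that identity here is an assumption of this sketch; the paper's exact mapping, especially with on-the-job search, may differ.

```python
separation_rate = 0.0128       # monthly, from the calibration targets
job_finding_rate = 0.2368      # monthly, from the calibration targets
vacancy_filling_rate = 0.7365  # monthly, from the calibration targets

# Two-state steady state: inflows s * (1 - u) equal outflows f * u.
steady_state_unemployment = separation_rate / (separation_rate + job_finding_rate)

# Textbook identity f = theta * q, so theta = f / q (assumed to apply here).
implied_tightness = job_finding_rate / vacancy_filling_rate
```

These targets imply an unemployment rate of roughly 5.1 percent and about 0.32 vacancies per unemployed worker in the median location.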

Q: What are the two shortcomings of the baseline model, and how does on-the-job search resolve them? A: The baseline model generates a counterfactual upward-sloping Beveridge curve and cannot generate the asymmetry between cross-sectional and business-cycle drivers of unemployment. Adding on-the-job search (fraction phi = 0.12 of employed workers searching, calibrated from Faberman et al., 2017) resolves both. It corrects the Beveridge curve by allowing the model to match the spatial vacancy-unemployment relationship, and it introduces procyclical job-to-job mobility that amplifies the cyclical response of job-finding rates while dampening cyclical separation rate variation.

Q: How do job-to-job transition rates differ across space versus over the business cycle, and why does this matter? A: Job-to-job rates are virtually constant across the cross-section of local labor markets (the extended model is calibrated to match this). But they are strongly procyclical: high in booms, low in recessions, and about as volatile as job-finding rates over the cycle. In a boom, more employed workers search, which makes vacancies easier to fill and spurs vacancy creation. The additional vacancies raise the job-finding probability for the unemployed, amplifying the cyclical job-finding rate response while muting the cyclical separation rate response.

Q: What does the extended model predict for business-cycle dynamics? A: The model with on-the-job search and aggregate productivity shocks (parameterized following Hagedorn and Manovskii, 2008) generates unemployment and vacancy rates that are an order of magnitude more volatile than productivity — matching the data. Labor market tightness is about twice as volatile as unemployment, as in the data. The Fujita-Ramey decomposition in the model attributes 54.4% of unemployment volatility to job-finding rates, which falls within the empirical range of 50–60%.

Q: What is the paper’s efficiency result and its policy implication? A: Under the Hosios condition (imposed in calibration), the decentralized equilibrium is efficient: job creation and destruction are privately efficient in each market, and free mobility of workers and firms ensures efficient spatial allocation. Therefore, large observed differences in unemployment, tightness, and wages across locations are not evidence of inefficiency. The relevant signal for policy is not deviation from the national average but deviation from the model’s location-specific prediction conditional on productivity. Locations where data deviate from model predictions are candidates for policy intervention.

Q: Do the spatial patterns survive controls for worker and firm composition? A: Yes. The authors regress labor market tightness and vacancy-filling rates on local unemployment rates and a full set of composition controls (age, gender, education, occupation, and industry shares) derived from the IAB microdata for Germany, along with year fixed effects. The relationship between local unemployment and both tightness and vacancy-filling rates remains highly statistically and economically significant after these controls, for both Germany and the U.S.

Q: How does the model handle wages and cost of living, and does it match the data? A: Wages are determined by state-contingent generalized Nash bargaining with worker bargaining power eta. Cost-of-living differences are backed out as the values needed to sustain the spatial equilibrium (Rosen-Roback). Neither wages nor costs of living are calibration targets in the cross section, yet the model closely matches the empirically observed wage gradient across local labor markets and the negative correlation between cost of living and local unemployment (using Economic Policy Institute Family Budget Calculator data).

Labor market tightness: The ratio of vacancies posted in a local labor market to the number of unemployed workers in that market; the paper documents that tightness is systematically higher (more vacancies per unemployed worker) in lower-unemployment locations across all three countries.

Job-separation rate (EU rate): The share of employed workers who transition from employment to unemployment in a period; in the paper’s framework, this is endogenously determined by the idiosyncratic match productivity threshold below which the joint match surplus turns negative, and it is the primary driver of spatial unemployment differences (accounting for roughly two-thirds of cross-sectional variation).

Job-finding rate (UE rate): The share of unemployed workers who transition from unemployment to employment in a period; in the paper’s framework, this is higher in tighter (lower-unemployment) markets, but accounts for only roughly one-third of spatial unemployment variation — the opposite of its dominant role in business-cycle fluctuations.

Spatial Beveridge curve: The cross-sectional relationship between vacancy rates and unemployment rates across local labor markets. In the data it is downward sloping (low-unemployment locations have high vacancy rates), which the baseline model fails to capture but the extended model with on-the-job search reproduces.

Endogenous separation threshold: The location-specific minimum idiosyncratic match productivity below which the joint match surplus becomes negative and the worker-firm pair dissolves; this threshold is lower (tolerates a wider range of idiosyncratic shocks) in higher-productivity locations because the average surplus is larger, generating lower separation rates in more productive locations.
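
The threshold logic can be illustrated with a deliberately stylized example. It assumes a toy surplus S(x) = location productivity + x - outside value, normally distributed idiosyncratic shocks x, and an outside value that is the same everywhere. None of these functional forms or numbers come from the paper's model, which determines the surplus and outside options in equilibrium; only the 4.8 percent and 3.0 percent productivity differentials are taken from the text.

```python
import math

def normal_cdf(x, sd=1.0):
    """CDF of a mean-zero normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def separation_threshold(location_productivity, outside_value):
    # Toy surplus: S(x) = location_productivity + x - outside_value,
    # which is negative exactly when x < outside_value - location_productivity.
    return outside_value - location_productivity

def separation_probability(location_productivity, outside_value, shock_sd=0.1):
    """Probability that a fresh shock draw falls below the threshold."""
    threshold = separation_threshold(location_productivity, outside_value)
    return normal_cdf(threshold, sd=shock_sd)

outside_value = 0.9  # hypothetical
sep_high_productivity = separation_probability(1.048, outside_value)
sep_median = separation_probability(1.000, outside_value)
sep_low_productivity = separation_probability(0.970, outside_value)
```

In this toy setup the more productive location has the lower threshold and therefore the lower separation probability, which is the qualitative mechanism the glossary entry describes.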

Spatial equilibrium (Rosen-Roback): The equilibrium condition in which differences in local costs of living adjust to make workers and firms indifferent across locations, sustaining persistent productivity-driven differences in wages and unemployment as equilibrium outcomes rather than disequilibrium phenomena.

Procyclical on-the-job search: The mechanism by which the fraction of employed workers actively searching — and thus the rate of job-to-job transitions — is approximately constant across the cross-section of local labor markets but strongly procyclical over the business cycle. This asymmetry is the key to reconciling why job-finding rates drive business-cycle unemployment variation while separation rates drive spatial unemployment variation.

Hosios condition: The parametric restriction equating the unemployment elasticity of the matching function (alpha) and the workers’ Nash bargaining weight (eta); when satisfied, job creation is efficient in every local labor market. The paper imposes this condition deliberately to demonstrate that the decentralized equilibrium is efficient despite large spatial differences in outcomes.