Macro Paper Warehouse Forthcoming macro & monetary research
Forthcoming [Review of Economic Studies] doi:10.1093/restud/rdaf099 Online 1 Nov 2025 · Issue forthcoming

Life-Cycle Wages and Human Capital Investments: Selection and Missing Data

Laurent Gobillon

Thierry Magnac

Sebastien Roux

What this paper finds — and why it matters

Layer 1 – Overview

Research Question

This paper asks how wage inequalities build up over the life cycle when individual wage trajectories are plagued by interruptions in private-sector participation, and when the standard Missing At Random (MAR) assumption used to handle those gaps may be violated. Specifically, it asks: what is the causal effect of career interruptions on both the level and the dispersion of wages after twenty years of potential experience, and does endogeneity of those interruptions matter for the dispersion result?

Data and Sample

The empirical analysis uses the 2011 DADS Grand Format-EDP panel, a French administrative dataset merging social security records (DADS) and census extracts (EDP). The working sample covers males who entered the private sector between 1985 and 1992, aged 16-30 at entry, and observed through 2011. The authors require at least 15 years of observed private-sector wages, yielding a working sample of 7,004 males and 137,315 person-year observations. Education is grouped into four levels (high-school dropouts, high-school graduates, some college, college graduates). Participation outside the private sector – including public-sector employment, self-employment, unemployment, and non-employment – constitutes the “alternative sector” and generates missing wage observations. On average, cumulative duration outside the private sector is 3.7 years, and the average number of interruptions is 1.44.

Model and Methodology

The paper builds on a structural Ben Porath (1967) human capital model extended to two sectors (private sector and an alternative sector), yielding a reduced-form log-wage equation with five individual-specific coefficients: an intercept (initial human capital), a linear trend in potential experience (growth rate), a curvature term in potential experience (Mincer concavity), the cumulative years of interruptions, and a curvature term in interruptions. Because parameters are individual-specific, the wage equation is a random-coefficient model estimated with a fixed-effects approach.

Selection into the private sector is addressed not by a standard MAR assumption but by a weaker “Missing At Random Conditionally On Factors” (MARCOF) assumption. Sector-preference shocks, human capital prices, and depreciation rates are each decomposed into a common factor (time-varying) and an individual factor loading, plus a residual that is mean-independent of factors and loadings. Conditional on factors and factor loadings, wage residuals and sector choices are independent, making covariates – including the interruption variables – exogenous. The preferred specification includes two unobserved factors, selected by four of six Bai-Ng (2002) information criteria.
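As a rough illustration of how an information criterion can pick the number of factors, the sketch below computes the Bai-Ng IC_p2 criterion on a simulated, fully observed panel. The simulated data, the function name `bai_ng_icp2`, and all numeric settings are assumptions made for this example; the paper applies its criteria inside a model with missing wages and individual-specific coefficients, which this sketch does not attempt to reproduce.

```python
import numpy as np

def bai_ng_icp2(X, k_max):
    """Return IC_p2(k) for k = 0..k_max on an N x T panel X with no missing cells."""
    N, T = X.shape
    # Squared singular values give the residual sum of squares left after
    # removing the first k principal components (Eckart-Young).
    s = np.linalg.svd(X, compute_uv=False)
    total = np.sum(s ** 2)
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = []
    for k in range(k_max + 1):
        V_k = (total - np.sum(s[:k] ** 2)) / (N * T)
        ic.append(np.log(V_k) + k * penalty)
    return np.array(ic)

# Hypothetical panel with two strong factors plus noise.
rng = np.random.default_rng(0)
N, T, r_true = 200, 25, 2
loadings = rng.normal(size=(N, r_true))
factors = rng.normal(size=(T, r_true))
X = loadings @ factors.T + 0.5 * rng.normal(size=(N, T))

ic = bai_ng_icp2(X, k_max=6)
k_hat = int(np.argmin(ic))   # number of factors minimizing the criterion
print(k_hat)
```

The criterion trades off fit (the log of the average squared residual) against a penalty that grows with the number of factors, so adding a factor that only absorbs noise is not rewarded.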

Estimation proceeds via an Expectation-Maximization (EM) algorithm adapted from Bai (2009) and Song (2013), with initial values from the nuclear-norm convex estimator of Moon and Weidner (2018). Because individual parameters converge at rate sqrt(T) and summary statistics of their distributions suffer from incidental-parameter bias, the authors use bias-correction methods from Jochmans and Weidner (2019) for quantiles and inter-decile ranges, and from Arellano and Bonhomme (2012) for variances. Monte Carlo experiments show that variances remain poorly corrected even when T > 20, so the paper focuses on inter-decile ranges as the dispersion measure.

Counterfactual “average structural functions” (Blundell and Powell, 2003) are constructed by holding individual parameters fixed and manipulating the history of interruptions. These compare four scenarios: the observed benchmark, the counterfactual with no interruptions (potential wage), the counterfactual with no current-period selection, and both combined.
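The counterfactual logic can be sketched in a few lines: hold each individual's coefficients fixed, swap the interruption history, and compare summary statistics of the implied log wages. Everything numeric below is hypothetical (coefficient distributions, the discount factor `beta`, the interruption probability), and the geometric weighting used for x4 is an assumed stand-in, not the paper's formula.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 500, 20, 0.95   # hypothetical sizes and discount factor

# Hypothetical individual coefficients (eta_i0 ... eta_i4); illustrative values only.
eta = rng.normal(loc=[3.0, 0.03, -0.2, -0.04, 0.01],
                 scale=[0.3, 0.02, 0.1, 0.03, 0.01], size=(N, 5))

# d[i, s] = 1 if individual i is in the alternative sector in period s.
d = (rng.random((N, T)) < 0.12).astype(float)

def interruption_vars(d, beta):
    """x3: interruption periods strictly before t. x4: ASSUMED geometric weighting."""
    weights = beta ** (-np.arange(d.shape[1]))   # assumption, not the paper's definition
    x3 = np.cumsum(d, axis=1) - d
    x4 = np.cumsum(d * weights, axis=1) - d * weights
    return x3, x4

def log_wage(eta, d, beta):
    """Log wage implied by fixed coefficients and a given interruption history."""
    t = np.arange(d.shape[1])
    x3, x4 = interruption_vars(d, beta)
    return (eta[:, [0]] + eta[:, [1]] * t + eta[:, [2]] * beta ** (-t)
            + eta[:, [3]] * x3 + eta[:, [4]] * x4)

def summarize(w):
    """Mean and inter-decile range (D9 - D1) at each experience level."""
    q10, q90 = np.quantile(w, [0.1, 0.9], axis=0)
    return w.mean(axis=0), q90 - q10

mean_obs, idr_obs = summarize(log_wage(eta, d, beta))                 # benchmark history
mean_cf, idr_cf = summarize(log_wage(eta, np.zeros_like(d), beta))    # no interruptions
```

Comparing `mean_obs` with `mean_cf`, or `idr_obs` with `idr_cf`, at a given experience level mirrors the benchmark versus no-interruption comparison described above; other scenarios correspond to passing a different history `d`.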

Main Findings

  1. Downward bias from omitting interruptions and factors. Omitting interruption variables and unobserved factors biases estimated returns to experience after 20 years strongly downward. Most of this bias is attributable to interruptions rather than to the interactive factor effects: selectivity is mainly captured through the interruption channel, not through residual factor structure.

  2. Effect on mean wages. Potential experience increases log wages by approximately 65% over 20 years, consistent with cross-country evidence from homogeneous Mincer equations. The average cost of interruptions after 20 years is approximately 10% of log wages. Reassigning interruptions to the beginning of the working life lowers mean log wages persistently, with no full recovery within 20 years, while reassigning them to the end raises mean wages above the no-interruption benchmark at every experience level.

  3. Effect on wage dispersion – a new stylized fact. Interruptions decrease, not increase, the inter-decile range of log wages after 20 years. After 20 years, with an average interruption duration of 2.47 years, interruptions decrease the inter-decile range by 0.52 log points (approximately 38%). This compression operates differentially: the 90th percentile falls by 0.34 and the 10th percentile rises by 0.18.

  4. Endogeneity explains the dispersion compression. When years of interruption are randomly reassigned across time (holding total interruption years fixed), the inter-decile range diverges upward from the observed benchmark after about 5 years. This shows that the dispersion-reducing effect of actual interruptions is due to the endogenous timing of those interruptions – specifically to the negative correlation between the timing of interruptions and potential log wages – rather than to the correlation between the structural coefficients on interruptions and potential wages (which is also negative, with a Spearman rank correlation of -0.32 between eta_i1 and eta_i3). Endogenously chosen interruptions smooth inequality over time.

  5. Current-period selection is negligible. Current-period selection into private-sector employment has no statistically significant effect on median, mean, variance, or inter-decile range of wages at any experience level, as confirmed by the small inter-decile range of the interactive factor component.

Scope Conditions

Results pertain to cohorts of French males entering the private sector between 1985 and 1992, restricted to those with at least 15 observed private-sector years. The French context is distinctive: wage inequality in the working population was stable over 1985-2011, driven in part by minimum wage policy and payroll tax exemptions for lower-skilled workers, in contrast to rising inequality in the United States and Germany. The interruption coefficients (eta_i3 and eta_i4) are separately identified only for individuals with at least two interruptions, each followed by re-entry (roughly those with K_T >= 2). The paper does not analyze female wages.

Layer 2 – Q&A

Q1: What is the structural model and how does it generate a reduced-form wage equation?

The model is a Ben Porath (1967) two-sector human capital model in which individuals divide time between investing in human capital and earning wages in either the private sector (e) or an alternative sector (n). Human capital accumulation in each sector has a sector-specific return rate (rho^s) and depreciation (lambda^s_t). Period utility is log income minus a quadratic investment cost, plus a sector preference shock. Because the problem is log-linear, solving the dynamic program backwards yields closed-form optimal investments that are linear in the individual-specific terminal value of human capital (kappa). The resulting log-wage equation (Proposition 5) is a function of five terms: an intercept (eta_i0), a linear trend in potential experience t (eta_i1), a geometric curvature term beta^{-t} (eta_i2), cumulative years of interruptions x^(3)_it (eta_i3), and a curvature in interruptions x^(4)_it (eta_i4), all with individual-specific coefficients. This provides a tractable random-coefficient structure.
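For reference, the five-term equation can be written compactly as below. The symbols follow the summary's notation; the interactive term phi_t' theta_i and the residual u_it are added here to match the factor structure described elsewhere in the summary, so the exact form should be read as an assumption rather than a transcription of Proposition 5.

```latex
\ln w_{it} = \eta_{i0} + \eta_{i1}\, t + \eta_{i2}\, \beta^{-t}
           + \eta_{i3}\, x^{(3)}_{it} + \eta_{i4}\, x^{(4)}_{it}
           + \phi_t' \theta_i + u_{it}
```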

Q2: What is the MARCOF assumption and why is it weaker than MAR?

MARCOF – Missing At Random Conditionally On Factors – posits that sector-preference shocks, human capital prices, and depreciation rates each follow factor structures: a common time-varying factor (phi_t) multiplied by an individual loading (theta_i) plus an i.i.d. residual. The residuals are assumed mean-independent of factors and loadings, and independent over time. Under standard MAR, missingness is assumed independent of outcomes conditional on observables alone. Under MARCOF, residuals in the wage equation and the sector choice equation are independent conditional on (unobserved) factors and factor loadings. This is weaker than MAR because it allows the unobservable determinants of wages and participation to share common factors, accommodating the high persistence observed in human capital stocks (20-year lag correlation of 0.28, far above the geometric decay benchmark of 0.024).
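A schematic way to write the assumption is shown below. The generic variable z_it (standing for a preference shock, a human capital price, or a depreciation rate), the residual epsilon, the wage residual u_it, and the sector indicator d_it are labels introduced for this sketch and are not the paper's own notation.

```latex
z_{it} = \phi_t' \theta_i^{z} + \varepsilon^{z}_{it},
\qquad
E\left[ \varepsilon^{z}_{it} \mid \phi, \theta_i \right] = 0

u_{it} \perp d_{it} \mid (\phi, \theta_i)
```

The second line is the conditional independence that replaces MAR: missingness may depend on unobservables, but only through the factors and loadings.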

Q3: How are the individual-specific parameters identified?

Under exogenous selection (or, under MARCOF, conditional on factors), identification of eta_i0, eta_i1, and eta_i2 requires variation in potential experience within the individual’s time series. Identification of eta_i3 and eta_i4 separately requires individuals to experience at least two spells out of the private sector each followed by re-entry (at least four transitions, so K_T >= 2). An individual with only one interruption spell generates proportional variation in x^(3) and x^(4), so only a linear combination of eta_i3 and eta_i4 is identified. The “flat spot” approach – using the observed fact that individuals aged 50-55 have stopped investing in human capital – separately identifies time, cohort, and age effects and provides the restriction that factors are orthogonal to the level, trend, and curvature in potential experience.
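The one-spell versus two-spell distinction can be checked numerically. The sketch below builds x3 and x4 for two hypothetical participation histories and looks at the rank of the two columns over the periods in which a private-sector wage would be observed. The geometric weighting for x4 and the value of `beta` are assumptions made for illustration; only the qualitative pattern is the point.

```python
import numpy as np

def design_on_observed(d, beta=0.95):
    """Rows of (x3, x4) for periods in which the private-sector wage is observed."""
    d = np.asarray(d, dtype=float)
    weights = beta ** (-np.arange(d.size))   # ASSUMED weighting, for illustration only
    x3 = np.cumsum(d) - d                    # interruption periods strictly before t
    x4 = np.cumsum(d * weights) - d * weights
    observed = d == 0                        # wages are missing while in the alternative sector
    return np.column_stack([x3, x4])[observed]

# 1 marks a period spent in the alternative sector (hypothetical histories).
one_spell = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
two_spells = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]

rank_one = np.linalg.matrix_rank(design_on_observed(one_spell))
rank_two = np.linalg.matrix_rank(design_on_observed(two_spells))
print(rank_one, rank_two)   # prints: 1 2
```

With a single spell, both variables are zero before the spell and constant after it, so the two columns are proportional and only one combination of eta_i3 and eta_i4 can be recovered. A second spell followed by re-entry adds a second distinct level, which breaks the proportionality.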

Q4: What do the distributions of estimated individual-specific coefficients look like?

Focusing on the main (two-factor) specification with bias correction: the median of the growth parameter eta_i1 is positive (consistent with rising wages with experience) and the median of the curvature parameter eta_i2 is negative (consistent with concavity). However, heterogeneity is substantial: the 90th percentile of eta_i1 is 6.2 times the median, and the first quartile of eta_i1 is negative (implying declining potential wages for a non-negligible share). For the interruption coefficients eta_i3 (years of interruptions) and eta_i4 (curvature), bias-corrected medians are close to zero in the sub-sample with >=2 interruptions, but dispersion is large and symmetric around zero. Bias correction reduces the 90th percentile of eta_i1 by approximately 20% and reduces the absolute 10th percentile of eta_i3 by approximately 27%.

Q5: How important are interruptions relative to potential experience and factors in explaining wage variation?

A wage decomposition using inter-decile ranges (preferred over variance due to bias) shows that the potential experience component is the largest contributor to wage dispersion, followed by the interruption component (described as “sizable”), while factors play a minor role. The potential experience and interruption components are negatively rank-correlated: the Spearman rank correlation between the growth coefficient eta_i1 and the interruption coefficient eta_i3 is -0.32. This negative correlation is part of the backdrop for the dispersion results, although the counterfactual experiments attribute the compression mainly to the endogenous timing of interruptions.

Q6: What is the finding on the effect of interruptions on mean wages, and what does the timing experiment show?

After 20 years, the average cost of interruptions (relative to a counterfactual of no interruptions) is approximately 10% of log wages. The timing of interruptions matters: reassigning interruptions to the beginning of the working life causes a persistent loss in mean log wages that does not fully recover over the 20-year horizon, while reassigning them to the end raises mean log wages above the no-interruption level at every experience level. For median wages, the early-interruption loss is eventually recovered (median log wages do catch up), but the mean does not catch up. These asymmetries are consistent with early interruptions having a larger negative effect on human capital accumulation due to the geometric structure of investment returns.

Q7: What is the key finding on wage dispersion and what explains it?

Interruptions compress the inter-decile range of log wages by 0.52 log points (approximately 38%) after 20 years, with average interruption duration of 2.47 years. This compression is asymmetric: the 90th percentile of wages falls by 0.34 and the 10th percentile rises by 0.18. The dispersion-reducing effect is established by comparing the benchmark (observed interruptions) to the counterfactual of no interruptions. When interruptions are instead randomly reassigned across time (holding total interruption duration fixed), the inter-decile range diverges upward from the benchmark starting around 5 years of experience. This demonstrates that the compression is due to the endogenous timing of interruptions – individuals who have high potential wages tend to time their interruptions in ways that reduce the measured spread of actual wages – rather than to the negative correlation between the interruption coefficients and potential wages (Spearman rank correlation of -0.32 between eta_i1 and eta_i3).
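The two percentile shifts add up to the reported change in the inter-decile range. Writing Delta for the benchmark minus the no-interruption counterfactual (a sign convention assumed here), with q_90 and q_10 the 90th and 10th percentiles of log wages:

```latex
\Delta (q_{90} - q_{10}) = \Delta q_{90} - \Delta q_{10}
                         = (-0.34) - (+0.18) = -0.52
```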

Q8: How does the paper handle the incidental parameter problem for distributional statistics?

Because individual parameters are estimated at rate sqrt(T) and the panel is unbalanced (some individuals observed for as few as 15 years while the model has up to 7 individual parameters), standard distributional statistics like the variance suffer from substantial incidental parameter bias. Monte Carlo experiments show that bias-corrected variance estimates remain strongly biased even at T > 20. Inter-decile ranges are better behaved and the Jochmans and Weidner (2019) bias-correction procedure reduces their bias satisfactorily. This is why the paper reports inter-decile ranges as its primary dispersion measure rather than variances. The bias in corrected inter-decile ranges is at most approximately 10% of the uncorrected estimate.
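The mechanism behind this bias can be seen in a deliberately simple simulation: when each individual parameter is estimated from T noisy observations, the spread of the estimates overstates the spread of the true parameters. The sketch below uses a plain noise-variance subtraction, which is only in the spirit of the corrections cited above and is not the Arellano-Bonhomme or Jochmans-Weidner procedure; all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 20
sigma_a, sigma_e = 0.5, 1.0                 # hypothetical standard deviations

a = rng.normal(0.0, sigma_a, size=N)        # true individual parameters
y = a[:, None] + rng.normal(0.0, sigma_e, size=(N, T))

a_hat = y.mean(axis=1)                      # individual estimates, noise of order 1/sqrt(T)
se2 = y.var(axis=1, ddof=1) / T             # estimated sampling variance of each a_hat

var_naive = a_hat.var(ddof=1)               # roughly sigma_a^2 + sigma_e^2 / T
var_corrected = var_naive - se2.mean()      # plug-in subtraction of the estimation noise

def idr(x):
    """Inter-decile range D9 - D1."""
    q10, q90 = np.quantile(x, [0.1, 0.9])
    return q90 - q10

print(var_naive, var_corrected, idr(a_hat), idr(a))
```

In this toy setting the subtraction recovers the true variance well. The summary reports that in the paper's richer model, with several parameters per individual and an unbalanced panel, the variance correction remains unreliable, which is the stated reason for preferring inter-decile ranges.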

Q9: What does the paper show about the MAR assumption in the context of this data?

The results directly challenge the MAR assumption that is standard in the life-cycle earnings literature. Under MAR, interruptions would be treated as random conditional on observables, and their endogeneity would be ignored. The paper shows that treating interruptions as endogenous (through the MARCOF assumption combined with the structural model) changes two results. First, estimated returns to experience change substantially: they are strongly downward biased when interruptions and factors are omitted. Second, the sign of the effect of interruptions on dispersion reverses: if interruptions were exogenous (randomly reassigned across time), dispersion would be higher than observed, so the actual compression is a consequence of endogenous timing. The conclusion is that MAR assumptions produce systematically misleading pictures of life-cycle wage inequality dynamics.

Q10: What are the robustness and external validity considerations?

The working sample excludes individuals observed for fewer than 15 years. A robustness exercise compares the subsample observed 10-14 years to a censored version of the 20+ subsample with matched marginal distributions of observation counts. Median profiles for the uncensored and censored 20+ samples are similar, and inter-decile ranges are slightly larger in the censored sample only for potential experience greater than 7. However, the 10-14 year sample shows substantially different patterns – larger median gaps between benchmark and no-interruption cases, and a larger inter-decile range – consistent with lower private-sector returns to human capital for that group. The authors conclude that selection into the 15+ working sample matters, and results are explicitly restricted to that working sample. The French context (stable aggregate wage inequality, minimum wage policy) limits direct comparability to countries with rising inequality.

Key Concepts

MARCOF (Missing At Random Conditionally On Factors): The paper’s central identifying assumption, weaker than standard MAR. It posits that sector-preference shocks, human capital prices, and depreciation rates follow factor structures (common time-varying factor × individual loading + i.i.d. residual), and that residuals are mean-independent of factors, loadings, and their own histories. Conditional on factors and loadings, wage residuals and sector-choice residuals are independent, making selection exogenous.

Interactive effects / factor structure for selection: An approach in which unobserved confounders are modeled as a bilinear product of time-varying common factors (phi_t) and individual factor loadings (theta_i). This allows flexible correlation between wage processes and participation choices without requiring exclusion restrictions or instrumental variables. The paper’s preferred specification uses two unobserved factors identified by Bai-Ng information criteria.

Average structural functions: Objects defined by Blundell and Powell (2003) that integrate counterfactual outcomes (wages evaluated at a manipulated interruption history) over the distribution of individual-specific parameters. They allow estimation of the causal impact of a change in interruption timing or presence while holding individual structural parameters fixed, under identification conditions analogous to those of Chernozhukov et al. (2013).

Individual-specific coefficients (random coefficients): The five parameters (eta_i0, eta_i1, eta_i2, eta_i3, eta_i4) governing each individual’s wage equation, with structural interpretations: initial log human capital, return to potential experience, curvature (Mincer concavity), effect of cumulative interruption years, and curvature in interruptions. Their individual-specificity is the source of the incidental parameter problem for distributional statistics.

Flat spot approach: An identification device (from Heckman, Lochner, and Taber, 1998; Bowlus and Robinson, 2012) that uses median wages of workers aged 50-55 – who are assumed to have stopped investing in human capital – as consistent estimates of human capital prices by education group and year. This separates the volume of human capital from its price, and provides the restriction that distinguishes the level, trend, and curvature terms in potential experience from the time-varying unobserved factors phi_t (the factors are taken to be orthogonal to those terms).

Interruption variables x^(3) and x^(4): Reduced-form variables derived from the structural model summarizing the history of private-sector participation gaps. x^(3)_it is the cumulative number of periods spent in the alternative sector prior to date t; x^(4)_it is a geometric-weighted version of those interruptions that reflects the timing (early vs. late) through the discount factor beta. They enter the wage equation with individual-specific coefficients that are identified only for workers with at least two complete interruption spells.

Mincer dip: A U-shaped profile in wage variance (or inter-decile range) over potential experience, predicted by the Ben Porath model because high-return workers invest more at the start of their careers (reducing current wages), causing their wage profile to lie first below and then above that of low-return workers. The dip is estimated in this paper at approximately 5 years of potential experience under the main specification.
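A stylized linear approximation shows where such a dip comes from. Suppose log wages are approximated by an individual intercept a_i and slope b_i, ignoring curvature and interruptions (a simplification made here for illustration, not the paper's specification). Then:

```latex
\operatorname{Var}(a_i + b_i t) = \sigma_a^2 + 2 t\, \sigma_{ab} + t^2 \sigma_b^2,
\qquad
t^{*} = -\frac{\sigma_{ab}}{\sigma_b^2}
```

The variance is minimized at t*, which is positive only when intercepts and slopes are negatively correlated (sigma_ab < 0), that is, when workers with steeper profiles start from lower wages.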

Incidental parameter bias in distributional statistics: The bias that arises when estimating moments or quantiles of the distribution of individual-specific parameters that converge at rate sqrt(T) rather than sqrt(N). The paper shows through Monte Carlo experiments that variance estimates remain substantially biased even after Arellano-Bonhomme (2012) correction when T > 20, while inter-decile ranges corrected by Jochmans-Weidner (2019) are more reliable.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing.