The (In)effectiveness of Targeted Payroll Tax Reductions
What this paper finds — and why it matters
Layer 1: Overview
This paper studies the cost-effectiveness of targeted payroll tax reductions as a tool for stimulating labor demand among marginalized workers, using a natural experiment from Italy. The motivation is policy-relevant: governments routinely deploy targeted payroll tax cuts to combat youth and low-skill unemployment, but such subsidies risk subsidizing inframarginal hiring — employment that would have occurred without the incentive — rather than creating net new jobs. Rigorous evaluation requires two features that are rarely satisfied simultaneously: (1) the subsidy must target genuinely marginalized workers so estimates pertain to the population of interest, and (2) variation in incentives across firms must be quasi-random so firm responses are causally identified. This paper exploits a policy that satisfies both.
The data are confidential matched employer-employee records from the Italian Social Security Institute (INPS), covering the universe of private non-agricultural firms with at least one employee from January 2003 to December 2009. The main analysis sample comprises 1,015,619 firms with policy-relevant firm size between 3 and 15 employees — the stratum containing the policy threshold. The study period spans 84 months.
The policy variation is the Italian 2007 Budget Bill (Law 296/2006), which raised employer social security contributions (SSCs) on apprenticeship contracts from a flat rate of 148 euros per year to 10 percent of annual earnings (approximately 1,200 euros per year for an average apprentice earning 12,000 euros). However, firms with at most 9 full-time-equivalent employees (excluding apprentices) were granted graduated reduced rates: 1.5 percent of earnings in the first year (180 euros) and 3 percent in the second year (360 euros). This generated a clean discontinuity in incentives at the 9-employee threshold. The resulting discount is equivalent to roughly two months of earnings per apprentice, or about 8 percent of the cost of a typical 19-month apprenticeship.
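The contribution arithmetic above can be checked with a short back-of-the-envelope sketch. It assumes that the 1.5 and 3 percent figures are the rates small firms pay (against 10 percent for larger firms), that the second-year rate applies to the last 7 months of a 19-month contract, and that the cost of the apprenticeship is approximated by earnings alone. The variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope SSC arithmetic for an average apprentice.
# Assumption: 1.5% and 3% are the reduced rates paid by small firms,
# while firms above the threshold pay the full 10%.

ANNUAL_EARNINGS = 12_000            # euros, average apprentice (from the text)
FULL_RATE_PCT = 10                  # post-reform rate above the threshold
REDUCED_RATE_PCT = {1: 1.5, 2: 3}   # small-firm rate by year of contract
APPRENTICESHIP_MONTHS = 19          # typical contract length (from the text)


def ssc(earnings, rate_pct):
    """Employer contribution in euros for an earnings base and a rate in %."""
    return earnings * rate_pct / 100


full = ssc(ANNUAL_EARNINGS, FULL_RATE_PCT)
reduced = {yr: ssc(ANNUAL_EARNINGS, r) for yr, r in REDUCED_RATE_PCT.items()}
saving = {yr: full - reduced[yr] for yr in reduced}  # per full contract year

# Saving over a 19-month contract: 12 months of year 1 plus 7 of year 2.
months_in_year2 = APPRENTICESHIP_MONTHS - 12
total_saving = saving[1] + saving[2] * months_in_year2 / 12
total_earnings = ANNUAL_EARNINGS * APPRENTICESHIP_MONTHS / 12
saving_share = total_saving / total_earnings

print(f"full-rate SSC per year: {full:.0f} euros")
print(f"reduced SSC, years 1 and 2: {reduced[1]:.0f}, {reduced[2]:.0f} euros")
print(f"saving over {APPRENTICESHIP_MONTHS} months: {total_saving:.0f} euros "
      f"({saving_share:.1%} of earnings)")
```

Under these assumptions the saving comes to just under 8 percent of earnings over the contract, in line with the figure quoted above.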
The empirical strategy is a difference-in-discontinuities design. For each calendar month, the authors estimate a regression discontinuity specification comparing firms just above and just below the 9-employee threshold, then subtract the estimated baseline discontinuity from January 2006 (before the policy existed). This normalizes away pre-existing size-related differences in outcomes, yielding reduced-form estimates of how the gap in outcomes between firms just below and just above the threshold changed once the policy opened a difference in SSC costs between them. The policy variation is then used as an instrument for actual SSC payments to compute IV estimates of jobs supported per euro of foregone revenue.
The main finding is a precise zero: the SSC discount does not increase the number of apprenticeship contracts. The reduced-form estimates of the policy’s effect on apprentice hiring are not statistically different from zero and are tightly estimated. Firms below the threshold pay approximately 25 euros less per month in SSCs than firms above, confirming the policy has fiscal bite (first-stage F-statistic = 230), but this differential generates no detectable behavioral response in employment.
The policy also does not increase the rate at which apprentices are converted to permanent contracts (“transformations”). Firms do not adjust apprentice wages, do not substitute toward other contract types, do not churn through more apprentices, do not re-label existing contracts, and do not lower hiring standards for apprentices.
For cost-effectiveness, the IV estimates imply that each 1 million euros of foregone SSC revenue supports the employment of 29 apprentices for one year, a point estimate not statistically different from zero. The point estimate for supported permanent-contract transformations is slightly negative (-2 per 1 million euros) and also indistinguishable from zero. By comparison, directly hiring apprentices at their prevailing wage of 1,050 euros per month would employ 79 apprentices per million euros, making direct hiring 2.7 times more cost-effective than the subsidy. The paper surveys the broader literature and finds that once existing studies’ employment effects are normalized against fiscal costs, targeted subsidies rarely appear cost-effective; hiring credits that require a new hire may outperform payroll tax cuts because they are harder to claim for inframarginal employment.
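The direct-hiring benchmark is simple arithmetic on the figures quoted above and can be reproduced as follows. This is only a check on the numbers in this summary; the implied cost per job-year is derived from a point estimate that is statistically indistinguishable from zero, so it should be read as illustrative.

```python
# Benchmark arithmetic using the figures quoted in the text.
BUDGET = 1_000_000       # euros of foregone SSC revenue
MONTHLY_WAGE = 1_050     # euros, prevailing apprentice wage
SUBSIDY_JOB_YEARS = 29   # IV point estimate per million euros

# Apprentice-years that the same budget would buy through direct hiring.
direct_job_years = BUDGET / (MONTHLY_WAGE * 12)

# Direct hiring relative to the subsidy's point estimate.
ratio = direct_job_years / SUBSIDY_JOB_YEARS

# Fiscal cost per apprentice-year implied by the point estimate.
implied_cost_per_job_year = BUDGET / SUBSIDY_JOB_YEARS

print(f"direct hiring: {direct_job_years:.1f} apprentice-years per million")
print(f"ratio to subsidy point estimate: {ratio:.1f}")
print(f"implied cost per apprentice-year: {implied_cost_per_job_year:,.0f} euros")
```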
The underlying mechanism is inelastic labor demand for apprentices. Survey evidence from the RIL firm survey confirms that when firms do not hire apprentices, cost is rarely the stated reason — the most common answer is that they do not need more people. When firms do hire apprentices, the most common reason is to provide training before converting them to permanent employees, not to economize on labor costs.
Layer 2: Deep Dive
What is the identification strategy and what are the main threats to it?
The identification strategy is a difference-in-discontinuities design. In each month, a regression discontinuity (RD) specification compares firms just above and just below the 9-employee SSC eligibility threshold; the authors then subtract the baseline (January 2006, pre-policy) discontinuity estimate to remove pre-existing size-related level differences. The key identifying assumption is a ‘weak parallel trends’ assumption: the curvature of the conditional expectation function of untreated potential outcomes at the threshold is time-invariant. Threats and the evidence against them: (1) Manipulation of firm size at the threshold: addressed by showing that the CDF of policy-relevant firm size is virtually identical across all 84 months, with no bunching at 9 employees before or after the reform; (2) Pre-existing trends: no pre-trends are found in the estimated discontinuity in outcomes for the four years before January 2007; (3) Compositional shifts: covariate balance tests show that firm characteristics (age, type, industry, region) at the threshold do not change over time relative to baseline; the covariate index (predicted apprentice hiring based on time-invariant firm characteristics) fluctuates between -0.0005 and +0.0005, roughly 20 times smaller than the employment estimates, which range between -0.01 and +0.01; (4) Imperfect compliance: handled explicitly, since the design estimates an intention-to-treat effect, which is attenuated relative to the treatment on the treated; (5) Measurement error in the running variable: addressed by excluding firms within one unit of the threshold in the preferred specification; null results are robust to varying the exclusion window.
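The mechanics of the design can be illustrated on simulated data. The sketch below is not the authors' specification: it assumes a continuous firm-size variable, a linear fit with separate slopes on each side of the threshold, a one-unit exclusion window, and a made-up pre-existing jump of 0.02 with no policy effect. The function names `rd_jump` and `simulate_month` are hypothetical.

```python
import numpy as np

THRESHOLD = 9.0


def rd_jump(size, y, donut=1.0):
    """Linear RD estimate of the jump in y for firms at or below the
    threshold, dropping observations within `donut` of it."""
    xc = size - THRESHOLD
    keep = np.abs(xc) >= donut
    xc, y = xc[keep], y[keep]
    below = (xc <= 0).astype(float)
    # Separate intercepts and slopes on each side of the threshold.
    X = np.column_stack([np.ones_like(xc), below, xc, below * xc])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # discontinuity at the threshold


def simulate_month(rng, n, policy_effect):
    """Hypothetical data: a fixed pre-existing jump of 0.02 at the threshold
    plus an optional policy effect for firms at or below it."""
    size = rng.uniform(3, 15, n)
    below = size <= THRESHOLD
    y = 0.05 + 0.01 * (size - THRESHOLD) + (0.02 + policy_effect) * below
    return size, y + rng.normal(0, 0.1, n)


rng = np.random.default_rng(0)
base_size, base_y = simulate_month(rng, 50_000, policy_effect=0.0)
post_size, post_y = simulate_month(rng, 50_000, policy_effect=0.0)

rd_base = rd_jump(base_size, base_y)   # picks up the pre-existing jump
rd_post = rd_jump(post_size, post_y)   # same jump, no policy effect added
diff_in_disc = rd_post - rd_base       # pre-existing jump differences out

print(f"baseline RD: {rd_base:.3f}, post RD: {rd_post:.3f}, "
      f"difference-in-discontinuities: {diff_in_disc:.3f}")
```

In this simulation each monthly RD estimate recovers the pre-existing jump, while their difference is close to zero, which is the sense in which subtracting the baseline removes time-invariant differences at the threshold.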
Why is the difference-in-discontinuities design superior to a standard difference-in-differences design in this context?
The paper provides a formal and empirical case that standard difference-in-differences applied to a continuous firm-size running variable produces spurious results. When the conditional expectation function of outcomes with respect to firm size rotates over time (i.e., the slope changes), a DiD estimator that discretizes firms into treated and control groups will detect this rotation as a treatment effect, even if the true policy effect is zero. This is because the DiD constrains the slopes of the conditional expectation function above and below the threshold to be zero, making them implicit omitted variables. In the Italian data, the conditional expectation function of apprentice hiring with respect to firm size rotates clockwise between 2007 and 2009, coinciding with a general slowdown in hiring during the Great Recession. This rotation would cause a naive DiD analysis to conclude, spuriously, that the subsidy supported hiring. The difference-in-discontinuities design controls flexibly for the running variable in each period and isolates only the variation near the threshold, where firm size cannot proxy for trends unrelated to the policy.
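A small simulation makes the rotation argument concrete. It is a sketch under invented parameters, not a replication: the outcome is linear in firm size with an upward slope that flattens between periods (a clockwise rotation), there is no discontinuity, and the true policy effect is zero. The helper names are hypothetical.

```python
import numpy as np

THRESHOLD = 9.0


def simulate(rng, n, slope):
    """Hypothetical hiring outcome that is linear in firm size, with no
    discontinuity and no policy effect; only the slope differs by period."""
    size = rng.uniform(3, 15, n)
    y = 0.10 + slope * (size - THRESHOLD) + rng.normal(0, 0.1, n)
    return size, y


def group_gap(size, y):
    """Mean outcome of small (treated) firms minus large (control) firms."""
    small = size <= THRESHOLD
    return y[small].mean() - y[~small].mean()


def rd_jump(size, y):
    """Jump at the threshold from a linear fit with side-specific slopes."""
    xc = size - THRESHOLD
    below = (xc <= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), below, xc, below * xc])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
pre = simulate(rng, 50_000, slope=0.010)    # steeper size gradient
post = simulate(rng, 50_000, slope=0.005)   # flatter: a clockwise rotation

# Standard DiD on discretized groups ignores the slopes entirely.
naive_did = group_gap(*post) - group_gap(*pre)
# Difference-in-discontinuities re-estimates the slopes in each period.
diff_in_disc = rd_jump(*post) - rd_jump(*pre)

print(f"naive DiD: {naive_did:.3f}, "
      f"difference-in-discontinuities: {diff_in_disc:.3f}")
```

With these parameters the naive DiD is about 0.03 even though no policy effect was simulated, because flattening the slope narrows the gap between small and large firms; the estimate that controls for the running variable in each period stays near zero.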
What are the main mechanisms considered for why the subsidy has no employment effect, and how does the paper distinguish among them?
The paper considers and rules out seven alternative explanations before concluding that demand for apprentices is simply inelastic: (1) Measurement error: ruled out because the null holds across specifications with different exclusion windows, and measurement error does not prevent finding significant effects on fiscal outcomes; (2) Subsidy too small: ruled out because the 8% subsidy (960 euros per apprentice per year, up to 1,460 euros at the 95th percentile of earnings) is comparable in magnitude to subsidies that generate large employment effects in Cahuc et al. (2019) and Guo (2024); (3) Low awareness: ruled out because 80% of eligible firms that hire apprentices receive the discount, which they must actively claim; (4) Firms restricting hiring to maintain eligibility: ruled out because apprentices are excluded from policy-relevant firm size, so hiring an apprentice does not risk crossing the threshold; the firm-size distribution also remains stable; (5) Temporary nature of subsidy: ruled out because most apprenticeships last 19 months and the subsidy covers the first two years; moreover, the literature suggests temporary subsidies should be at least as effective as permanent ones; (6) Training requirements: ruled out because training requirements are poorly enforced, and no effects are found even among firms that previously employed apprentices (lower marginal training costs) or firms that rarely cite training costs as a deterrent; (7) Great Recession: ruled out because no effects appear in the year before the recession began, and effects are neither larger nor smaller for liquidity-constrained firms.
What heterogeneity analyses are conducted and what do they show?
The authors estimate pooled post-reform difference-in-discontinuities coefficients separately across multiple dimensions and find consistently null effects with no evidence of heterogeneous treatment effects: (1) by industry: estimates across manufacturing, transportation and construction, trading, services, and other sectors are all tightly centered on zero; (2) by region: null across all Italian regions; (3) by baseline apprentice earnings quartile: null across Q1 through Q4 and for firms with no apprentices at baseline; (4) by contemporaneous apprentice earnings quartile: null; (5) by three measures of liquidity constraints (liquid assets to total assets, cash flow to total assets, revenues above/below median): null in all six groups; and (6) by prior apprenticeship training status: null for both firms that employed at least one apprentice in 2006 and those that did not. The authors note the scope condition: estimates are internally valid for firms in a neighborhood of 9 employees, and they cannot rule out that effects differ for substantially larger firms.
What robustness checks are conducted beyond the main heterogeneity analysis?
The main robustness checks are: (1) sensitivity of apprentice hiring effects to the amount of excluded data around the threshold (the ‘donut bandwidth’) — the null holds across all exclusion windows (Appendix Figure A.2); (2) placebo tests using the pre-reform periods (January 2003 through December 2006) — no pre-trends in the estimated discontinuity for any outcome; (3) covariate stability tests — the discontinuity in a covariate index predicting apprentice hiring from time-invariant firm characteristics shows no change over time, with point estimates between -0.0005 and +0.0005 versus employment estimates between -0.01 and +0.01; (4) comparison of results to a standard DiD specification — the DiD produces spurious positive effects driven by rotation of the conditional expectation function, while the difference-in-discontinuities estimate remains precisely zero; (5) examination of other outcomes (contract churn, re-labeling, worker quality, contract type substitution, temporary worker stocks) — all null.
How is cost-effectiveness formally measured and what does the IV estimate imply?
Cost-effectiveness is defined as the number of jobs supported per unit of foregone revenue: omega = E[L(1) - L(0)] / E[R(0) - R(1)], where L is employment, R is tax payments, and the arguments 1 and 0 denote potential outcomes with and without the subsidy. Rather than relying on a back-of-the-envelope calculation, the authors estimate this ratio with 2SLS, instrumenting for actual SSC payments with the interaction of being below the eligibility threshold and the post-2007 indicator. This allows them to compute standard errors, which back-of-the-envelope methods do not provide. The first-stage F-statistic is 230, confirming instrument strength. Point estimates from Table 4: 29 apprentice-years supported per 1 million euros of foregone SSC (standard error 58, not significant); 647,237 euros of apprentice compensation supported per 1 million euros (standard error 921,320, not significant); and -2 permanent-contract transformations per 1 million euros (standard error 21, not significant). For context, directly hiring apprentices at 1,050 euros per month would generate 79 apprentice-years per million euros, 2.7 times more than the point estimate from the subsidy.
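With a single instrument, the 2SLS estimate of omega equals the ratio of the reduced-form effect on employment to the first-stage effect on foregone revenue, each estimated with the same controls. The sketch below illustrates that ratio on simulated data. It is a deliberate simplification: it uses a plain below-by-post interaction with group and period controls rather than the authors' difference-in-discontinuities specification, and every number except the roughly 25-euro monthly SSC differential is invented.

```python
import numpy as np


def residualize(v, controls):
    """Residuals of v after an OLS projection on the control columns."""
    coef, *_ = np.linalg.lstsq(controls, v, rcond=None)
    return v - controls @ coef


rng = np.random.default_rng(2)
n = 200_000
below = rng.integers(0, 2, n).astype(float)  # firm at or below the threshold
post = rng.integers(0, 2, n).astype(float)   # observation is after the reform
z = below * post                             # instrument: below x post

TRUE_OMEGA = 0.0  # simulated jobs per euro of foregone SSC (hypothetical)
# First stage: eligible firms pay about 25 euros less per month post-reform.
ssc_paid = 100 + 5 * below + 3 * post - 25 * z + rng.normal(0, 10, n)
foregone = -ssc_paid  # lower payments mean more foregone revenue
# Outcome: apprentices employed, affected by the policy only through omega.
jobs = (0.3 + 0.02 * below - 0.01 * post
        + TRUE_OMEGA * 25 * z + rng.normal(0, 0.5, n))

controls = np.column_stack([np.ones(n), below, post])
z_res = residualize(z, controls)  # partial the controls out of the instrument

first_stage = z_res @ foregone / (z_res @ z_res)  # euros foregone per firm
reduced_form = z_res @ jobs / (z_res @ z_res)     # jobs per firm
omega_hat = reduced_form / first_stage            # just-identified IV estimate

# First-stage F-statistic for the single excluded instrument.
fs_resid = residualize(foregone, np.column_stack([controls, z]))
fs_var = (fs_resid @ fs_resid / (n - 4)) / (z_res @ z_res)
f_stat = first_stage ** 2 / fs_var

print(f"first stage: {first_stage:.2f} euros, omega_hat: {omega_hat:.5f}, "
      f"F: {f_stat:.0f}")
```

The simulated F-statistic is not calibrated to the paper's value of 230; the point is only that a strong first stage combined with a zero reduced form yields a cost-effectiveness estimate near zero, together with a sampling distribution from which standard errors can be computed.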
How does the paper benchmark its cost-effectiveness estimates against the broader literature?
The authors normalize employment effects from ten other studies against their fiscal costs to produce a common metric of jobs or job-years per 1 million dollars of foregone revenue. The studies span payroll tax cuts (Egebark and Kaunitz 2013; Saez, Schoefer, and Seim 2021), hiring credits (Cahuc, Carcillo, and Le Barbanchon 2019; Neumark 2013), and fiscal stimulus programs (Bartik 2001; Bartik and Erickcek 2010; Dupor and Mehkari 2016; Dupor and McCrory 2018; Feyrer and Sacerdote 2011; Wilson 2012). The conclusion is that most wage subsidies, including those that generate positive reduced-form employment effects, produce very high costs per job. With two exceptions (Bartik 2001 and Cahuc et al. 2019), cost-effectiveness estimates across the literature are extremely low. The paper argues that hiring credits may be more cost-effective than payroll tax cuts because the requirement to make a new hire makes it harder to subsidize inframarginal employment. Importantly, the Italian study’s cost-effectiveness estimates, though imprecise, are broadly consistent with the cross-study pattern once fiscal costs are accounted for.
What are the welfare and public finance implications of the null employment effects?
Because the behavioral response is zero and the fiscal cost is non-zero, the policy functions as a pure transfer from the government to firms. The paper invokes the framework of Hendren and Sprung-Keyser (2020) to note that the marginal value of public funds is essentially 1 — there is no distortion introduced but also no welfare gain from resource reallocation. This interpretation cuts in two directions: (1) the pre-reform apprentice SSC subsidies (which were larger than the post-2007 discount) were also essentially transfers with large fiscal costs and no employment-creation value; and (2) the SSC increase imposed on larger firms (those with more than 9 employees) effectively raised revenue without causing meaningful employment losses, since labor demand for apprentices is inelastic. The policy is thus deemed inefficient in the sense that taxpayer revenue is lost without generating the intended social return of increasing employment of marginalized workers.
What are the scope conditions and limitations of the estimates?
The difference-in-discontinuities design provides internally valid estimates only for firms in a neighborhood of 9 employees, which in Italy means firms with 3 to 15 employees (90% of Italian firms and 65% of all apprentices). The paper cannot rule out that larger firms respond differently to similar subsidies. The analysis is partial equilibrium: it cannot measure spillovers, general equilibrium effects on wage-setting across the firm-size distribution, or displacement effects between firms. Cost-effectiveness estimates reflect only the direct fiscal cost of foregone SSCs and do not include fiscal externalities (e.g., effects on income tax revenues or social insurance outlays) or administrative and political costs. The exclusion of workers from the public sector means the results pertain solely to private-sector apprenticeships.
How does this paper relate to prior studies on payroll tax cuts, and what distinguishes it methodologically?
Prior national studies (e.g., Saez et al. 2012, 2019, 2021; Egebark and Kaunitz 2013; Huttunen et al. 2013; Bozio et al. 2020; Rubolino 2021) estimate labor demand responses by comparing employment of targeted versus untargeted workers, which can overstate policy effectiveness if firms substitute targeted for untargeted workers (a SUTVA violation that would not be detected by parallel pre-trend tests). Cross-regional studies (e.g., Bennmarker et al. 2009; Benzarti and Harju 2021a; Bohm and Lind 1993; Guo 2024) study firms but typically do not target genuinely marginalized workers, so estimates reflect average rather than marginal labor demand. This paper satisfies both requirements simultaneously: the discontinuity in incentives provides quasi-random variation across firms (avoiding the SUTVA violation), and the policy specifically targets apprentices, a non-random, marginalized group, so the estimated elasticities pertain to the actual population of interest. The paper is also the first (to the authors’ knowledge) to use a formal IV strategy to estimate cost-effectiveness with standard errors, which makes it possible to judge how precisely cost-effectiveness is estimated and to compare estimates on a statistical footing.
What does survey evidence from the RIL data contribute to the interpretation?
The RIL (Rilevazione Longitudinale su Imprese e Lavoro), a representative firm survey collected in 2005, provides direct evidence on firms’ stated reasons for their apprenticeship hiring decisions. Among firms that do not hire apprentices, the most common reason by far is ‘we don’t need more people,’ with cost cited rarely. Among firms that do hire apprentices, the dominant reason is to train workers prior to hiring them as permanent employees; ‘lower labor costs’ is a secondary consideration. This corroborates the paper’s interpretation that demand for apprentices is driven by training-for-retention motives rather than cost arbitrage, which explains why a cost reduction leaves hiring behavior unchanged.
What is the policy recommendation and its scope?
The paper urges caution in using payroll tax credits to stimulate employment, particularly for targeted groups with inherently low or inelastic labor demand. The results suggest that, for apprentices, firms hire based on training-and-conversion needs rather than cost considerations, so subsidizing cost does not expand hiring. More broadly, the cross-study cost-effectiveness comparison suggests that hiring credits — which require a new hire as a prerequisite for receiving the subsidy — may be more efficient than payroll tax cuts precisely because they screen out inframarginal firms. The paper does not rule out effectiveness for other worker types or for much larger subsidies, but the documented uniformity of null effects across industries, regions, and firm types suggests the inelasticity finding is robust within the studied population.
Key Concepts
Inframarginal hiring: Employment that would occur absent the subsidy; when a policy subsidizes inframarginal hiring, it transfers resources to firms without generating net new jobs, making it fiscally costly but behaviorally inert.
Difference-in-discontinuities: An empirical design that combines regression discontinuity with difference-in-differences: in each period a discontinuity at the policy threshold is estimated, and the pre-policy baseline discontinuity is subtracted to remove pre-existing size-related level differences and time-invariant non-linearities in the conditional expectation function.
Policy-relevant firm size: As defined by INPS under the 2007 Budget Bill: total full-time equivalent employment minus apprentices, temporary agency workers, workers on leave (unless replaced), and workers on specific on-the-job training contracts; this is the running variable determining SSC eligibility.
Cost-effectiveness (jobs per foregone revenue): The number of job-years supported per unit of foregone tax revenue (here, per 1 million euros of lost SSCs), formally estimated via instrumental variables to allow statistical inference — as opposed to back-of-the-envelope calculations that provide no standard errors.
Inelastic labor demand for apprentices: In this paper’s sense: firms’ demand for apprenticeship contracts does not respond to changes in their labor cost, because hiring decisions are driven by training-and-conversion motives (hiring to eventually retain as permanent employees) rather than by cost minimization at the margin.
Rotation of the conditional expectation function: A change over time in the slope of the relationship between an outcome (e.g., apprentice hiring) and the running variable (firm size); when the slope changes, standard DiD specifications that discretize firms into treated/control groups will spuriously detect a treatment effect even when the true policy effect is zero.
Transformation (apprentice to permanent contract): The event of a firm converting an existing apprenticeship contract into an open-ended (permanent) employment contract at the end of the apprenticeship; used as an alternative outcome to evaluate whether the subsidy increased the ultimate goal of permanent employment, not just temporary apprenticeships.