Forthcoming [Econometrica] doi:10.3386/w32495

Double Robustness of Local Projections and Some Unpleasant VARithmetic

José Luis Montiel Olea

Mikkel Plagborg-Møller

Eric Qian

Christian Wolf

What this paper finds — and why it matters

This paper provides formal theoretical results on the relative robustness of local projection (LP) and vector autoregression (VAR) confidence intervals for impulse response inference when the data generating process (DGP) is locally misspecified. The research question is whether the widely held belief that LP estimators are more robust to misspecification than VARs is theoretically justified, and if so, precisely under what conditions and with what consequences for VAR inference.

The analytical framework models the DGP as a stationary structural VARMA(1, ∞) that is local to an SVAR(1), of the form y_t = Ay_{t-1} + H[I + T^{-ζ}α(L)]ε_t, where the MA component T^{-ζ}α(L)ε_t represents misspecification that vanishes at rate T^{-ζ} as sample size T grows. The key rate parameter is ζ ∈ (1/4, 1/2), which corresponds to misspecification large enough to be detected with probability approaching 1 by conventional Hausman-type specification tests, yet small enough that the bias-variance trade-off between LP and VAR remains non-trivial asymptotically. The framework encompasses under-specification of lag length, omitted variables, temporal aggregation, measurement error, and failure of shock invertibility — essentially all sources of dynamic misspecification relevant to linearized DSGE models.

The main finding on LP is a “double robustness” result: the conventional LP confidence interval achieves correct asymptotic coverage for all ζ > 1/4, even when misspecification is large enough to be detected with certainty. The mechanism is that the omitted-variable bias in the LP regression is of order T^{-2ζ} = o(T^{-1/2}) when ζ > 1/4, because both the direct effect of omitted lags on the outcome and the covariance of the residualized regressor with omitted lags are each of order T^{-ζ}, so their product is negligible relative to the T^{-1/2} standard deviation. This is formally analogous to double robustness in partially linear regression and debiased machine learning: LP is consistent if either the outcome-equation controls or the first-stage controls are correctly specified.

In stark contrast, the VAR estimator carries asymptotic bias of order T^{-ζ}, which is non-negligible relative to its T^{-1/2} standard deviation for ζ ≤ 1/2. This causes the conventional VAR confidence interval to severely undercover: for ζ ∈ (1/4, 1/2) the coverage converges to zero, and for ζ = 1/2 it converges to a level strictly below the nominal level.
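The two rates can be compared with simple arithmetic. The sketch below tabulates the order of magnitude of the bias-to-standard-deviation ratio, T^{1/2-2ζ} for LP and T^{1/2-ζ} for the VAR, ignoring all constants. The function name and the values of ζ and T are illustrative choices made here, not taken from the paper.

```python
def bias_to_sd_ratio(T, zeta, estimator):
    """Order of magnitude of bias / standard deviation (constants ignored)."""
    # LP bias is of order T^(-2*zeta); VAR bias is of order T^(-zeta).
    # Both standard deviations are of order T^(-1/2).
    bias_exponent = 2 * zeta if estimator == "LP" else zeta
    return T ** (0.5 - bias_exponent)


zeta = 0.35  # illustrative value inside (1/4, 1/2)
for T in (100, 1_000, 10_000, 100_000):
    lp = bias_to_sd_ratio(T, zeta, "LP")
    var = bias_to_sd_ratio(T, zeta, "VAR")
    print(f"T={T:>7}  LP ratio={lp:.3f}  VAR ratio={var:.3f}")
```

For any ζ in (1/4, 1/2) the LP ratio shrinks toward zero as T grows while the VAR ratio diverges, which is the source of the coverage gap.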

The “no free lunch” result formalizes the trade-off. Setting ζ = 1/2 and bounding the noise-to-signal ratio at M²/T, the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1). This worst-case bias is small if and only if the VAR asymptotic variance is close to that of LP. When the VAR standard error is less than half that of LP, which is typical in applied practice, worst-case coverage falls below 48% even for M = 1. Moreover, the least favorable misspecification takes the form of exponentially decaying MA coefficients peaking at horizon h, a pattern that is consistent with standard economic theories of adjustment costs, learning, or overshooting and is therefore difficult to rule out on prior grounds. The Hausman test also provides only weak protection: when M = 1, the odds of the test failing to reject are nearly 3-to-1 at the 10% significance level.

Simulations using the Smets and Wouters (2007) model with T = 240 observations confirm these results. With lag length selected by AIC (median selected p = 2), VAR confidence intervals materially undercover at all but very short horizons while LP achieves close to nominal coverage throughout. Increasing lag length to p = 4 or p = 8 ameliorates VAR undercoverage at short horizons but at the cost of making VAR confidence intervals essentially as wide as LP intervals, with substantial undercoverage persisting at longer horizons. For p = 4 the total misspecification measure is M ≈ 3.23; for p = 8, M ≈ 1.89.

Scope conditions: results are pointwise asymptotic in fixed model parameters and horizon; they abstract from order-T^{-1} small-sample biases from persistence or the nonlinearity of the impulse response transformation. The LP robustness result requires controlling for lags that are strong predictors of the outcome or impulse variables; omitting lags with small-to-moderate predictive power does not threaten coverage.

Q: What is the precise sense in which LP confidence intervals are “doubly robust”?

A: LP is doubly robust in the sense of partially linear regression: its bias from misspecified MA dynamics is the product of two errors, the estimation error in the outcome-equation lag controls γ̂ − γ_0 and the estimation error in the first-stage lag controls ν̂ − ν_0. In the local-to-SVAR model each error is of order T^{-ζ}, so their product is of order T^{-2ζ} = o(T^{-1/2}) whenever ζ > 1/4, making the omitted-variable bias negligible relative to the T^{-1/2} standard deviation. This means the asymptotic distribution of the LP estimator is completely invariant to the misspecification parameters α(L) and ζ.
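The mechanism can be illustrated in a toy univariate setting. The sketch below is a hypothetical example constructed here, not the paper's design: it takes the ARMA(1,1) process y_t = a·y_{t-1} + ε_t + c·ε_{t-1}, where the small coefficient c stands in for the vanishing MA term, and computes population (infinite-sample) estimands from the implied autocovariances. LP regresses y_{t+h} on y_t and one lag; the misspecified AR(1) iterates the first-order autocorrelation forward.

```python
def ma_coefficients(a, c, n_terms=200):
    """MA(infinity) weights psi_k of y_t = a*y_{t-1} + eps_t + c*eps_{t-1}."""
    psi = [1.0, a + c]
    while len(psi) < n_terms:
        psi.append(a * psi[-1])
    return psi


def autocovariance(psi, k):
    """gamma_k = sum_j psi_j * psi_{j+k}, with unit shock variance."""
    return sum(psi[j] * psi[j + k] for j in range(len(psi) - k))


def population_estimands(a, c, h):
    """Return (true response, LP estimand, AR(1) estimand) at horizon h >= 1."""
    psi = ma_coefficients(a, c)
    g0, g1 = autocovariance(psi, 0), autocovariance(psi, 1)
    gh, gh1 = autocovariance(psi, h), autocovariance(psi, h + 1)
    # LP: coefficient on y_t when regressing y_{t+h} on (y_t, y_{t-1}).
    lp = (g0 * gh - g1 * gh1) / (g0 * g0 - g1 * g1)
    # Misspecified AR(1): iterate the first-order autocorrelation forward.
    ar1 = (g1 / g0) ** h
    return psi[h], lp, ar1


for c in (0.2, 0.1, 0.05):
    true, lp, ar1 = population_estimands(a=0.5, c=c, h=2)
    print(f"c={c:<5} LP bias={lp - true:+.6f}  AR(1) bias={ar1 - true:+.6f}")
```

In this toy case the AR(1) bias shrinks roughly in proportion to c, while the LP bias is of second order or smaller in c, consistent with the product-of-errors logic described above.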

Q: How large does misspecification need to be before LP coverage is threatened?

A: The LP double robustness result holds for all ζ > 1/4, regardless of the magnitude parameter M of the MA misspecification. Misspecification with ζ ∈ (1/4, 1/2) can be detected with probability approaching 1 asymptotically by standard specification tests (in particular, the Hausman test is consistent for this range), yet LP coverage remains asymptotically correct. There is no threshold M above which LP fails; robustness is structural, not contingent on misspecification being small.

Q: Under what conditions does the VAR estimator have zero asymptotic bias?

A: The VAR asymptotic bias is zero if and only if the lagged shocks ε_{j*,t-ℓ} for ℓ = 1, …, h lie in the span of the lagged data used for estimation. Two sufficient conditions from Corollary 3.2 are: (i) the true model is SVAR(p_0) and the estimation lag length p satisfies h ≤ p − p_0, so the extra lags absorb the residual MA structure; or (ii) the shock of interest is directly observed and ordered first, and h ≤ p. In these cases the VAR estimator is asymptotically equivalent to LP, with equal variance.

Q: What is the “no free lunch” result for VARs?

A: For ζ = 1/2 and noise-to-signal ratio bounded by M²/T, the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1) (Proposition 4.1). This quantity is small if and only if aVar(δ̂_h) ≈ aVar(β̂_h), meaning the VAR has little efficiency advantage over LP. Put differently, the only way to guarantee robust VAR coverage is to include enough lags that the VAR confidence interval becomes as wide as the LP interval. There is no procedure that simultaneously offers narrower intervals than LP and reliable coverage.

Q: How severe is the worst-case undercoverage of conventional VAR confidence intervals?

A: From Corollary 4.3, even for M = 1 (a noise-to-signal ratio of just 1/T), worst-case VAR coverage falls below 48% whenever the VAR asymptotic standard deviation is less than half that of LP — a configuration typical in applied practice. For larger M the undercoverage is worse: the formula 1 − r(M√(aVar(β̂_h)/aVar(δ̂_h) − 1); z_{1-α/2}) can approach zero. Furthermore, the worst-case probability that VAR fails to cover AND the Hausman test fails to reject misspecification simultaneously exceeds 46% when the VAR standard deviation is less than half that of LP (Corollary 4.4).
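The coverage formula is easy to evaluate numerically. The sketch below assumes that r(b; c) denotes P(|Z + b| > c) for a standard normal Z, and that the 48% figure refers to a nominal 90% interval; both are readings of the summary rather than statements confirmed in it. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def r(b, c):
    """P(|Z + b| > c) for Z ~ N(0, 1) (assumed definition of r)."""
    return STD_NORMAL.cdf(-c - b) + STD_NORMAL.cdf(-c + b)


def worst_case_var_coverage(M, var_ratio, level=0.90):
    """Worst-case coverage 1 - r(M*sqrt(var_ratio - 1); z).

    var_ratio = aVar(LP) / aVar(VAR), which is at least 1.
    """
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    b = M * sqrt(var_ratio - 1)  # worst-case scaled VAR bias
    return 1 - r(b, z)


# VAR standard deviation half that of LP means a variance ratio of 4.
for M in (0.5, 1, 2):
    print(f"M={M}: worst-case coverage = {worst_case_var_coverage(M, 4.0):.3f}")
```

With a variance ratio of 4 and M = 1 the computed coverage falls below 0.48, and it deteriorates quickly as M grows.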

Q: Can the researcher detect the problematic misspecification using a Hausman test before it causes undercoverage?

A: Only weakly. When M = 1, the Hausman test fails to reject misspecification with probability approximately 74% (odds of nearly 3-to-1) at the 10% significance level, since r(1; z_{0.95}) = 26%. At the 5% level the odds of non-rejection are nearly 5-to-1, since r(1; z_{0.975}) = 17%. The least favorable misspecification also cannot be ruled out on economic-theory grounds: the least favorable MA polynomial has exponentially decaying coefficients peaking at horizon h, consistent with adjustment costs, learning, or overshooting.
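These rejection probabilities can be reproduced in a few lines. The sketch assumes r(b; c) = P(|Z + b| > c) for standard normal Z, and that under the least favorable misspecification the Hausman statistic is shifted by M, as the expression r(1; z_{0.95}) for M = 1 suggests. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def r(b, c):
    """P(|Z + b| > c) for Z ~ N(0, 1) (assumed definition of r)."""
    return STD_NORMAL.cdf(-c - b) + STD_NORMAL.cdf(-c + b)


def hausman_rejection(M, significance):
    """Return (rejection probability, odds of non-rejection)."""
    z = STD_NORMAL.inv_cdf(1 - significance / 2)
    reject = r(M, z)
    return reject, (1 - reject) / reject


for significance in (0.10, 0.05):
    reject, odds = hausman_rejection(1.0, significance)
    print(f"level={significance:.2f}: P(reject)={reject:.3f}, "
          f"odds of non-rejection={odds:.2f} to 1")
```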

Q: Does using a bias-aware critical value (Armstrong-Kolesár approach) resolve the VAR undercoverage problem?

A: The bias-aware VAR confidence interval CI_B(δ̂_h; M) achieves correct asymptotic coverage by inflating the critical value based on a known bound M on misspecification. However, the bias-aware VAR interval tends to be wider than the LP interval. Specifically, M must be quite small, apparently below 1, for the bias-aware VAR to dominate LP in width regardless of DGP and horizon. For M ≥ 2 (noise-to-signal ratio of at least 4/T), bias-aware VAR is dominated by LP in interval width. The practical conclusion is that the simpler LP interval is preferable in most empirically relevant settings.
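A bias-aware critical value can be computed by root-finding. The sketch below assumes the critical value solves r(b; cv) = α with r(b; c) = P(|Z + b| > c), and uses plain bisection; the function names, the default α = 0.10, and the example variance ratio are choices made here for illustration. The width comparison takes the worst-case scaled bias b = M√(aVar(β̂_h)/aVar(δ̂_h) − 1) as given.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def r(b, c):
    """P(|Z + b| > c) for Z ~ N(0, 1) (assumed definition of r)."""
    return STD_NORMAL.cdf(-c - b) + STD_NORMAL.cdf(-c + b)


def bias_aware_cv(b, alpha=0.10, tol=1e-10):
    """Solve r(b; cv) = alpha by bisection; r is decreasing in cv."""
    lo, hi = 0.0, abs(b) + 10.0  # r(b, lo) = 1 > alpha > r(b, hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r(b, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_width(M, var_ratio, alpha=0.10):
    """Bias-aware VAR interval width divided by LP interval width.

    var_ratio = aVar(LP) / aVar(VAR), which is at least 1.
    """
    b = M * sqrt(var_ratio - 1)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return bias_aware_cv(b, alpha) / (z * sqrt(var_ratio))


for M in (0.5, 1, 2):
    print(f"M={M}: relative width at variance ratio 4 = {relative_width(M, 4.0):.3f}")
```

A relative width above 1 means the bias-aware VAR interval is wider than the LP interval at that variance ratio.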

Q: What does the minimax model-averaging result say about optimal weighting of LP and VAR?

A: From Corollary 4.2, the minimax optimal weight on LP when estimating a convex combination of LP and VAR estimators is M²/(1 + M²). For M = 1 (equal noise-to-signal threshold), the optimal weight is 50% on each. For M = 2, the LP estimator receives 80% weight. In the Smets and Wouters simulations, M ≈ 3.23 for p = 4 lags, corresponding to an optimal LP weight of approximately 91%, and M ≈ 1.89 for p = 8 lags, giving an optimal LP weight of approximately 78%.
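The weights quoted above follow directly from the formula M²/(1 + M²). A minimal sketch, with hypothetical function names:

```python
def minimax_lp_weight(M):
    """Minimax weight on LP in a convex combination of LP and VAR."""
    return M**2 / (1 + M**2)


def averaged_estimate(lp_estimate, var_estimate, M):
    """Convex combination of the two estimates using the minimax weight."""
    w = minimax_lp_weight(M)
    return w * lp_estimate + (1 - w) * var_estimate


for M in (1, 2, 3.23, 1.89):
    print(f"M={M}: weight on LP = {minimax_lp_weight(M):.1%}")
```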

Q: What do the Smets and Wouters simulations show about AIC-selected VARs?

A: In 5,000 simulated samples of T = 240 observations from the Smets and Wouters (2007) model, the AIC selects a median lag length of p = 2. At all but very short horizons, VAR confidence intervals materially undercover, while LP confidence intervals achieve close-to-nominal coverage throughout. A bootstrap correction for VARs somewhat improves coverage but leaves large distortions. Increasing lag length to p = 4 or p = 8 moves coverage closer to nominal at short horizons (h ≤ p) but makes VAR confidence intervals essentially as wide as LP intervals, and substantial VAR undercoverage persists at longer horizons.

Q: Is the no-free-lunch result specific to univariate impulse responses?

A: No. Proposition 4.2 extends the result to simultaneous inference on multiple impulse responses. For any k linear combinations of the impulse response vector, stacked in a matrix R, the worst-case squared bias is M² λ_max(R[aVar(β̂) − aVar(δ̂)]R’), where λ_max denotes the largest eigenvalue. Because VAR impulse response estimates are often highly correlated across horizons, undercoverage can be particularly severe in the multivariate (joint confidence ellipsoid) case. The no-free-lunch principle still holds: the VAR ellipsoid suffers non-negligible worst-case bias as long as it offers any efficiency gain relative to LP for any linear combination of horizon-specific impulse responses.

Q: What is the practical recommendation for lag selection in LP and VAR?

A: The paper offers three practical guidelines. First, LP researchers should control for those lags of the data that are strong predictors of the outcome or impulse variables, using conventional information criteria (such as AIC) applied to a VAR in all variables to select the number of lags for LP control — omitting lags with small-to-moderate predictive power does not threaten coverage. Second, VAR researchers should increase the lag length until the VAR confidence interval is no longer substantially narrower than the corresponding LP interval. Third, conventional specification tests do not suffice to guard against VAR coverage distortions.

Local Projection (LP) Estimator: The LP estimator for the impulse response at horizon h is the OLS coefficient on the shock variable y_{j*,t} in a direct regression of y_{i*,t+h} on y_{j*,t}, the variables ordered before it, and lagged data. It is a “direct” estimator in that it does not iterate a one-step VAR forward.

Double Robustness: A property of LP whereby its asymptotic bias from MA misspecification equals the product of two estimation errors — in the outcome-equation lag controls and in the first-stage residualization controls — each of order T^{-ζ}, making their product of order T^{-2ζ} = o(T^{-1/2}) for ζ > 1/4. This is the LP analogue of the double robustness of partially linear regression estimators in debiased machine learning.

Local-to-SVAR Misspecification: A DGP of the form y_t = Ay_{t-1} + H[I + T^{-ζ}α(L)]ε_t in which the MA term T^{-ζ}α(L)ε_t represents misspecification that vanishes at rate T^{-ζ}. The rate parameter ζ governs the magnitude; ζ ∈ (1/4, 1/2) is the empirically relevant range where bias is detectable by specification tests yet the bias-variance trade-off between LP and VAR remains non-trivial.

No Free Lunch (for VARs): The result that the worst-case scaled VAR bias equals M√(aVar(β̂_h)/aVar(δ̂_h) − 1), implying that the VAR confidence interval has reliable (robust) coverage if and only if the VAR asymptotic variance is close to that of LP — i.e., there is no way to simultaneously have shorter confidence intervals than LP and guaranteed coverage robustness.

Noise-to-Signal Ratio: The quantity T^{-1}||α(L)||² = trace{Var(T^{-1/2}α(L)ε_t) Var(ε_t)^{-1}}, which measures the total magnitude of the MA misspecification relative to the variance of the shocks. The paper bounds this at M²/T and uses M as the sufficient statistic for worst-case bias and coverage.

Bias-Aware Critical Value: An inflated critical value cv_{1-α}(b) solving r(b; cv_{1-α}(b)) = α, used to construct a VAR confidence interval CI_B(δ̂_h; M) that achieves correct asymptotic coverage by accounting for the worst-case bias M√(aVar(β̂_h)/aVar(δ̂_h) − 1). The paper shows this approach typically produces intervals at least as wide as LP for M ≥ 2.

Asymptotic Bias of VAR (aBias): The scaled bias term T^{ζ}E[δ̂_h − θ_{h,T}] converging to aBias(δ̂_h) = trace{S^{-1} Ψ_h H Σ_{ℓ=1}^∞ α_ℓ D H’ (A’)^{ℓ-1}} − e’_{i*,n} Σ_{ℓ=1}^h A^{h-ℓ} H α_ℓ e_{j*,m}. This term is structurally absent from the LP asymptotics due to the double robustness mechanism.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Corrections are welcome, especially from the authors.