Forthcoming [Review of Economic Studies] doi:10.1093/restud/rdag050

Jackknife Standard Errors for Clustered Regression

Bruce E Hansen


What this paper finds — and why it matters

Hansen (2025) makes a theoretical case for replacing the conventional cluster-robust variance estimator (CRVE) and heteroskedasticity-consistent (HC) standard errors with a specific jackknife variance estimator, V5, in linear regression with heteroskedastic and/or cluster-dependent observations.

The paper identifies two fundamental problems with the conventional CRVE1 and CRVE2 estimators. First, these estimators can be fully downward biased: Theorem 2 establishes that the infimum of E[v̂1²]/v² and E[v̂2²]/v² over all admissible regressor and covariance matrix configurations equals zero, meaning the expected variance estimate can be arbitrarily close to zero relative to the true variance. This pathology arises from extreme regressor leverage (specifically, when one cluster dominates the sample) and holds even under clusterwise invertibility; for CRVE1 it holds even under homoskedasticity, while the CRVE2 result relies on non-i.i.d. errors. Second, Theorem 5 shows that confidence intervals constructed from CRVE1 and CRVE2 standard errors have worst-case coverage probability equal to zero for any finite critical value c, making them unable to achieve any target coverage level uniformly over regression designs.

Crucially, Hansen shows that even the conventional jackknife estimators V3 and V4, which are already in use (e.g., via Stata’s vce(jackknife) option), share these pathologies when clusterwise noninvertibility is present. Clusterwise noninvertibility occurs when deleting a single cluster renders the regressor matrix singular — as in regressions with cluster-level fixed effects, a single treated cluster, or sparse dummy variables. Stata’s existing fix of simply dropping noninvertible clusters is shown to be insufficient: under clusterwise noninvertibility, the infimum of E[v̂3²]/v² and E[v̂4²]/v² over the broader model class equals zero (Theorem 2, equations 19–20), and the corresponding confidence intervals also achieve worst-case coverage of zero.

The proposed estimator V5 resolves these problems through three modifications to the conventional jackknife: (1) it uses a generalized (Moore-Penrose) inverse rather than dropping noninvertible clusters, ensuring all clusters are included; (2) it centers at the full-sample estimator β̂ rather than the mean of delete-one estimates; and (3) it omits the (G−1)/G degrees-of-freedom correction. Theorem 1 proves that E[V̂5] ≥ V in the positive semidefinite sense for all sample sizes, regressor matrices, and covariance structures, so the estimator is never downward biased. Theorem 3 then shows that, for any c ≥ 1, the jackknife-based confidence interval C̃5(c) has coverage probability of at least P[|ζ| ≤ c], where ζ is Cauchy distributed. With the conventional critical value c = 1.96, this guarantees finite-sample coverage of at least 70% and test size of at most 30%, regardless of regression design or error variance structure.

To improve upon the conservative Cauchy bound in practice, the paper proposes a Satterthwaite adjusted t approximation for the jackknife t-ratio. The adjustment derives degrees of freedom K and a scale factor a from the eigenvalue structure of a design-dependent matrix D. Theorem 7 shows that a → 1 and K → ∞ as n → ∞ under mild regularity conditions (no single cluster dominates). Simulation evidence across six regression designs — varying regressor distributions (Normal, LogNormal with cluster dependence, sparse Dummy) and error structures (clustered normal, heteroskedastic) — with G ∈ {6, 12, 40, 100} clusters confirms that the Satterthwaite jackknife interval achieves coverage rates uniformly above 93% at the nominal 95% level even with G = 6, while CRVE1 intervals fall as low as 57% coverage in the LogNormal/heteroskedastic design. The empirical application extends Meng, Qian, and Yared (2015) on Chinese TV access and redistribution preferences, finding that the jackknife standard error for the TV access coefficient exceeds the CRVE1 standard error and the Satterthwaite interval is wider, affecting conclusions about statistical significance.

The theory holds under Assumptions 1–4: correctly specified linear regression with zero conditional mean errors, full rank X, finite second moments, arbitrary cluster sizes and within-cluster covariance structure, and (for Theorem 3) normal errors. Results hold for fixed k and G, arbitrary n, and allow clusterwise noninvertibility subject to Assumption 3 (inference targets the well-identified regressors).

Q: What is the central claim of the paper? A: Conventional CRVE and HC variance estimators should be replaced by the jackknife estimator V5 in all linear regression contexts with heteroskedastic or clustered errors. V5 is never downward biased (its expectation weakly exceeds the true variance matrix), whereas CRVE1 and CRVE2 can be arbitrarily downward biased. The Satterthwaite-adjusted V5 confidence interval has excellent finite-sample coverage.

Q: What is the worst-case bias of CRVE1? A: The infimum of E[v̂1²]/v² over all admissible regressor matrices and covariance matrices equals zero (Theorem 2, equation 15). This means that for some data-generating process, the expected CRVE1 variance estimate is arbitrarily close to zero relative to the true variance — full downward bias. Importantly, this pathology holds even under homoskedasticity (Σ = Iₙ) and clusterwise invertibility; it is driven entirely by extreme regressor leverage.

Q: Why is CRVE2 also fully downward biased, and how does its failure differ from CRVE1’s? A: Theorem 2 (equation 16) shows that the infimum of E[v̂2²]/v² over the clusterwise-invertible model class F* also equals zero. The difference is that the proof for CRVE2 requires non-i.i.d. errors: CRVE2’s failure requires manipulating the covariance matrices in addition to extreme leverage, whereas CRVE1 can fail under i.i.d. errors from leverage alone.

Q: What is clusterwise noninvertibility and why does it matter? A: Clusterwise noninvertibility occurs when deleting a single cluster renders the regressor design matrix X’X − Xg’Xg singular. This happens in regressions with cluster-level fixed effects, with a cluster-level treatment indicator when only one cluster is treated, or with sparse dummy variables. The paper shows that the conventional jackknife estimators V3 and V4 become fully downward biased (infimum of expectation ratio equals zero) under clusterwise noninvertibility, even though Stata’s existing fix of dropping noninvertible clusters was explicitly designed to handle this case.
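The condition described above can be checked directly on a design matrix. The following is a minimal sketch, not code from the paper: the function name `noninvertible_clusters` and the toy design (four clusters, an intercept, and a treatment dummy switched on in one cluster only) are assumptions made purely for illustration.

```python
import numpy as np

def noninvertible_clusters(X, cluster_ids):
    """Return the cluster labels g for which X'X - Xg'Xg is rank deficient."""
    XtX = X.T @ X
    k = X.shape[1]
    flagged = []
    for g in np.unique(cluster_ids):
        Xg = X[cluster_ids == g]
        # Deleting cluster g leaves the moment matrix X'X - Xg'Xg.
        if np.linalg.matrix_rank(XtX - Xg.T @ Xg) < k:
            flagged.append(int(g))
    return flagged

# Hypothetical design: 4 clusters of 5 observations each, an intercept,
# and a cluster-level treatment dummy that equals 1 only in cluster 0.
cluster_ids = np.repeat(np.arange(4), 5)
treated = (cluster_ids == 0).astype(float)
X = np.column_stack([np.ones(20), treated])

print(noninvertible_clusters(X, cluster_ids))  # prints [0]
```

Only the treated cluster is flagged: once it is deleted, the treatment dummy is identically zero in the remaining data and cannot be separated from the intercept.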

Q: What is the key innovation in V5 that makes it robust to clusterwise noninvertibility? A: V5 uses the Moore-Penrose generalized inverse in the delete-one-cluster estimator β̂₋g, ensuring all G clusters are included in the sum rather than discarding noninvertible clusters. It also centers at the full-sample β̂ rather than the mean β̄ of delete-one estimates, and omits the (G−1)/G degrees-of-freedom correction. The paper shows these three differences together imply V̂5 ≻ V̂4 ≻ V̂3 in the positive semidefinite ordering.
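The three ingredients listed above translate into a short computation. This is a hedged sketch based only on the verbal description in this summary (sum over all clusters of outer products of β̂₋g − β̂, with the delete-one-cluster estimate formed using a Moore-Penrose inverse); the function name `jackknife_v5` and the simulated data are illustrative assumptions, and the paper's equation (10) remains the authoritative definition.

```python
import numpy as np

def jackknife_v5(X, y, cluster_ids):
    """Sketch of a V5-style jackknife variance estimate.

    Centers at the full-sample estimate, keeps every cluster, and applies
    no (G-1)/G correction. Assumes the full-sample X has full column rank.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    beta = np.linalg.solve(XtX, Xty)          # full-sample OLS
    k = X.shape[1]
    V5 = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        mask = cluster_ids == g
        Xg, yg = X[mask], y[mask]
        # Delete-one-cluster estimate; pinv keeps clusters whose removal
        # makes X'X - Xg'Xg singular instead of dropping them.
        beta_g = np.linalg.pinv(XtX - Xg.T @ Xg) @ (Xty - Xg.T @ yg)
        d = beta_g - beta
        V5 += np.outer(d, d)
    return beta, V5

# Simulated example (all values hypothetical): 8 clusters of 10 observations
# with a cluster-level error component.
rng = np.random.default_rng(0)
G, n_g = 8, 10
cluster_ids = np.repeat(np.arange(G), n_g)
x = rng.normal(size=G * n_g)
u = rng.normal(size=G)[cluster_ids] + rng.normal(size=G * n_g)
X = np.column_stack([np.ones(G * n_g), x])
y = 1.0 + 2.0 * x + u

beta, V5 = jackknife_v5(X, y, cluster_ids)
print("coefficients:", beta)
print("jackknife standard errors:", np.sqrt(np.diag(V5)))
```

Because V5 is a sum of outer products, it is symmetric and positive semidefinite by construction.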

Q: What does Theorem 1 establish about V5? A: Theorem 1 proves E[V̂5] ≥ V in the positive semidefinite sense for all sample sizes, all regressor matrices, all covariance matrices, and under clusterwise noninvertibility. This conservative property holds without any assumption on cluster sizes, regressor leverage, within-cluster correlation, or heteroskedasticity beyond Assumption 1 (correct specification and finite second moments). The infimum of E[v̂5²]/v² equals 1 (equation 21), meaning the inequality is sharp.

Q: What does the Cauchy distribution bound say, and how useful is it in practice? A: Theorem 3 shows that for any c ≥ 1, the jackknife confidence interval C̃5(c) has coverage probability at least P[|ζ| ≤ c] where ζ is Cauchy. With c = 1.96, this guarantees coverage of at least 70% and test size of at most 30% uniformly over all regression designs and error structures (under normality). The bound is not tight in typical applications — actual coverage is much higher — but it provides the first generally applicable uniform guarantee for clustered/heteroskedastic regression. The Cauchy critical value at 5% is 12.7, far too large for practical use, so the bound is more useful as a theoretical guarantee than as a practical inference tool.
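The two numbers quoted above follow from the standard Cauchy distribution function, for which P[|ζ| ≤ c] = (2/π)·arctan(c). The short check below uses only the Python standard library; the function names are illustrative and not taken from the paper.

```python
import math

def cauchy_coverage_floor(c):
    """P[|zeta| <= c] for a standard Cauchy random variable zeta."""
    return (2.0 / math.pi) * math.atan(c)

def cauchy_critical_value(alpha):
    """Two-sided critical value c solving P[|zeta| > c] = alpha."""
    return math.tan(math.pi * (1.0 - alpha) / 2.0)

# Coverage floor at the conventional critical value 1.96.
print(f"{cauchy_coverage_floor(1.96):.2f}")   # prints 0.70

# Critical value that would guarantee 95% coverage under the bound.
print(f"{cauchy_critical_value(0.05):.1f}")   # prints 12.7
```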

Q: What does Theorem 5 establish about confidence intervals from the four conventional estimators (CRVE1, CRVE2, and the jackknife V3 and V4)? A: Under normality, the worst-case coverage probability of confidence intervals constructed from any of the four estimators v̂1 through v̂4 equals zero for any finite critical value c (equations 26–29). For v̂1 and v̂2, this holds over the clusterwise-invertible model class F*; for v̂3 and v̂4 it holds over the broader class F allowing noninvertibility. Zero worst-case coverage cannot be fixed by enlarging c, since the result holds for all finite c. This is not an impossibility result in the Bahadur-Savage sense; it is a statement that specific commonly used intervals fail, while V5-based intervals succeed.

Q: What is the Satterthwaite approximation and how is it implemented? A: The Satterthwaite adjustment replaces the jackknife t-ratio’s exact finite-sample distribution — a ratio of a normal to the square root of a weighted sum of chi-squares — with a scaled t distribution with K degrees of freedom, where K and a scale factor a are matched by moment conditions on the eigenvalues of a design matrix D. The confidence interval is θ̂ ± v̂5 · t^{1−α/2}_K / a, and the p-value uses a Student t or F distribution with the same K and scale. These quantities can be computed without explicit eigendecomposition using trace formulas (equations 38–39), which are preferred computationally when G > k.
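To illustrate why trace formulas can stand in for an eigendecomposition, here is the textbook Satterthwaite moment match for a weighted sum of independent chi-squares, Q = Σ λᵢ χ²₁, where the λᵢ are the eigenvalues of a symmetric matrix D. Matching the mean and variance of Q to a scaled chi-square gives a mean of tr(D) and degrees of freedom tr(D)²/tr(D²). This is a sketch of the classical technique only: the paper's own a and K are defined by its equations (38)–(39) and may differ in detail, and the matrix `D` below is a made-up example rather than the paper's design-dependent matrix.

```python
import numpy as np

def satterthwaite_moments(D):
    """Classical Satterthwaite match for Q = sum_i lambda_i * chi2_1.

    Returns (mean of Q, matched degrees of freedom) using traces only.
    D is assumed symmetric, so sum(D * D) equals trace(D @ D).
    """
    tr1 = np.trace(D)
    tr2 = np.sum(D * D)
    return tr1, tr1 ** 2 / tr2

# Hypothetical symmetric positive semidefinite matrix of rank 3.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
D = A @ A.T

mean_q, dof = satterthwaite_moments(D)

# Cross-check against the explicit eigenvalue formulas.
lam = np.linalg.eigvalsh(D)
print(np.isclose(mean_q, lam.sum()))
print(np.isclose(dof, lam.sum() ** 2 / (lam ** 2).sum()))
```

When all nonzero eigenvalues are equal, the matched degrees of freedom equal the rank of D; unequal eigenvalues pull the value below the rank.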

Q: What do the simulations show about coverage rates? A: Across six designs (three regressor types × two error types) and G ∈ {6, 12, 40, 100}, CRVE1 coverage falls as low as 57% in the LogNormal regressor/heteroskedastic error design with G = 6. CRVE2 intervals perform somewhat better but still undercover substantially. The conventional jackknife interval undercovers (as low as 85%) in leveraged/heteroskedastic designs. The Satterthwaite jackknife interval achieves coverage uniformly exceeding 93% across all designs, though it can be excessively conservative (100%) in some cases. All simulation estimates have standard errors less than 0.003 (20,000 replications).

Q: Does the Satterthwaite adjustment vanish in large balanced samples? A: Yes. Theorem 7 shows that if the design matrix is uniformly non-singular and no single cluster dominates (max_g ||X_g||² = o(n)), then a → 1 and K → ∞ as n → ∞. Consequently, the Satterthwaite interval converges to the standard normal interval in well-balanced large samples.

Q: How does V5 relate to the classical HC3 estimator? A: Under independent sampling (no clustering, ng = 1), V5 reduces to the HC3 estimator of Andrews (1991) and Davidson and MacKinnon (1993), which uses the Moore-Penrose inverse. The conventional jackknife V3/V4 reduce to the HC3 of MacKinnon and White (1985). The paper’s results thus provide a formal theoretical basis for the longstanding recommendation (by Efron-Stein 1981, MacKinnon-White 1985, Andrews 1991, and others) to use HC3/jackknife standard errors.
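The reduction to HC3 can be checked numerically. When every observation is its own cluster and no deletion causes singularity, the delete-one identity β̂₋ᵢ − β̂ = −(X′X)⁻¹xᵢêᵢ/(1 − hᵢ) turns the sum of outer products into the familiar HC3 sandwich with residuals scaled by 1/(1 − hᵢ). The sketch below uses simulated data (all values hypothetical) and compares the two computations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

XtX, Xty = X.T @ X, X.T @ y
XtX_inv = np.linalg.inv(XtX)
beta = XtX_inv @ Xty
resid = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage values h_i

# Jackknife with singleton clusters: sum of outer products of
# (beta_{-i} - beta), centered at the full-sample estimate, no rescaling.
V5 = np.zeros((k, k))
for i in range(n):
    xi = X[i]
    beta_i = np.linalg.pinv(XtX - np.outer(xi, xi)) @ (Xty - xi * y[i])
    d = beta_i - beta
    V5 += np.outer(d, d)

# HC3 sandwich with squared residuals divided by (1 - h_i)^2.
w = (resid / (1.0 - h)) ** 2
HC3 = XtX_inv @ (X.T @ (X * w[:, None])) @ XtX_inv

print(np.allclose(V5, HC3))
```

The comparison is expected to agree up to floating-point error in this well-conditioned design; the two computations would differ only in how they treat observations whose deletion makes the design singular.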

Q: What is the practical recommendation for empirical researchers? A: Replace all CRVE1/CRVE2/HC standard errors with V5, computed via the Moore-Penrose generalized inverse including all clusters. Report V5-based standard errors (which are never downward biased) alongside Satterthwaite-adjusted confidence intervals and p-values using equations (30)–(31). The adjustment parameters a and K differ per coefficient and must be computed separately for each. The paper advises against reporting a/v̂5 as an “adjusted standard error” since that quantity loses the never-downward-biased property.

Q: What is the empirical application and what does it find? A: The paper extends Meng, Qian, and Yared (2015), which studies the effect of TV access on demand for redistribution in China using provincial household survey data (30 provinces, multiple years), and Canay, Santos, and Shaikh (2021), who found CRVE1 standard errors may be unreliable in that setting. Applying V5, the jackknife standard error for the TV access coefficient exceeds the CRVE1 standard error, the Satterthwaite interval is wider than the conventional interval, and conclusions about statistical significance are affected.

Q: What are the scope conditions and limitations? A: The bias results (Theorems 1–2) require only correct specification (zero conditional mean) and finite second moments. The Cauchy bound (Theorem 3) additionally requires normal errors; whether a similar bound holds without normality or in G → ∞ asymptotics is left open. The Satterthwaite adjustment applies only to inference on real-valued (scalar) parameters and does not extend to joint hypothesis tests. Assumption 3 limits inference to “well-identified” regressors (those whose leave-cluster-out coefficients are uniquely defined after partialling out controls).

V5 (jackknife variance estimator): The paper’s proposed estimator, defined in equation (10) as the sum over all G clusters of outer products of (β̂₋g − β̂), where β̂₋g uses the Moore-Penrose generalized inverse. Unlike conventional jackknife estimators, V5 includes all clusters (no dropping), centers at the full-sample β̂, and omits the (G−1)/G correction. Its key property is E[V̂5] ≥ V for all regression designs.

Never-downward-biased (conservative) estimator: A variance estimator whose expectation is weakly greater than the true variance in the positive semidefinite sense, for all admissible regressor matrices and covariance structures. V5 has this property; CRVE1, CRVE2, and conventional jackknife estimators do not.

Full downward bias: The worst-case property that the infimum of E[v̂²]/v² equals zero over the model class — meaning the expected variance estimate can be arbitrarily close to zero relative to the true variance. CRVE1 is fully downward biased under clusterwise invertibility alone; CRVE2 requires non-i.i.d. errors; conventional jackknife estimators become fully downward biased under clusterwise noninvertibility.

Clusterwise noninvertibility: The condition where deleting a single cluster g renders the matrix X’X − Xg’Xg singular, so the standard delete-one-cluster estimator β̂₋g is undefined. This occurs in regressions with cluster-level fixed effects, a single treated cluster, or sparse dummy variables. V5 handles this via the Moore-Penrose generalized inverse; Stata’s existing fix of dropping such clusters is shown to be non-robust.

Cauchy distribution bound: Theorem 3’s result that the jackknife confidence interval C̃5(c) has coverage probability at least P[|ζ| ≤ c] for all c ≥ 1, uniformly over all regression designs and error variances (under normality). With c = 1.96, this gives a guaranteed coverage floor of 70%. This is the first generally applicable uniform coverage guarantee for clustered/heteroskedastic regression.

Satterthwaite adjusted t approximation: A data-dependent distributional approximation for the jackknife t-ratio that approximates the denominator’s weighted chi-square distribution by a scaled chi-square with K degrees of freedom, where K and scale factor a are computed from trace formulas involving the design matrix. The resulting confidence interval θ̂ ± v̂5 · t^{1−α/2}_K / a converges to the standard normal interval in well-balanced large samples.

Regressor leverage: The degree to which the identifying variation for a coefficient of interest is concentrated in a small number of clusters. High leverage (when one cluster dominates the regressor of interest) is the mechanism by which CRVE1 and CRVE2 reach worst-case downward bias; for CRVE1 this occurs even under homoskedasticity, while for CRVE2 it additionally requires non-i.i.d. errors.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Found an error or a misrepresentation? Corrections are welcome, especially from the authors.