Misspecified Expectations among Professional Forecasters
What this paper finds — and why it matters
Analyzing panel data from the U.S. Survey of Professional Forecasters (SPF, 1992Q1–2019Q4, 77 forecasters, 1,520 forecaster-quarter observations), Julio Ortiz finds that a “misspecified expectations” model tends to outperform a noisy-information rational benchmark and two leading non-FIRE alternatives (overconfident and diagnostic expectations) when fit to forecast errors and revisions. In this model, forecasters perceive an AR(2) data-generating process to be an AR(1) and therefore misperceive its underlying persistence. The models are estimated by maximum likelihood and ranked using forecast-encompassing weights. For the baseline real GDP growth case, misspecified expectations earns the largest encompassing weight (0.539, versus 0.462 for diagnostic expectations and approximately 0 for rational and overconfident expectations) and the highest log-likelihood. Across 14 macroeconomic variables, misspecified expectations provides the best fit for most series both in-sample and out-of-sample, though diagnostic expectations fits better for some (e.g., GDP deflator, industrial production, real residential investment) and rational expectations fits the unemployment rate best. The author argues that misspecified expectations succeeds in part because its bias enters both the prediction and updating equations, producing overreaction to new information plus overextrapolation across horizons, which makes forecast errors longer-lived. He concludes that it can serve as a “suitable approach,” and thus a useful benchmark, for modeling professional-forecaster expectation formation, while emphasizing that the results are specific to the context of professional forecasting and may not carry over to household or firm expectations.
Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.
Q1. What question does the paper address?
The paper undertakes a formal comparison of competing non-FIRE theories of expectation formation to move toward establishing a benchmark non-FIRE model in the context of professional forecasting. Ortiz motivates this with the observation that survey forecast errors are predictably correlated with real-time information — a violation of full-information rational expectations (FIRE) — but that, as noted in Reis (2020), the literature “has not yet settled on a benchmark non-FIRE model.” The paper offers “a partial answer to this question.”
Q2. What models are compared?
Four models are estimated: a noisy-information rational expectations baseline plus three biased non-FIRE models — overconfident expectations (Daniel et al., 1998), diagnostic expectations (Bordalo et al., 2020), and misspecified expectations (in the spirit of Fuster et al., 2010). All are embedded in a common noisy-information environment where the latent variable is unobservable and forecasters update via a Kalman filter from a noisy private signal. Overconfidence has forecasters misperceive their signal noise as smaller than it is; diagnostic expectations introduces a representativeness distortion ϕ > 0 generating overreaction to recent news; misspecified expectations has forecasters treat an AR(2) process as an AR(1).
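To make the shared environment concrete, the sketch below iterates a scalar Kalman recursion to its steady state and compares the gain a rational forecaster would use with the gain of an overconfident forecaster who perceives the signal noise as α_v σ_v. It assumes, for simplicity, an AR(1) state and a signal of the form s = x + v. The function name and the parameter values are illustrative and are not taken from the paper.

```python
def steady_state_gain(rho, sigma_w, sigma_v, tol=1e-12, max_iter=10_000):
    """Steady-state Kalman gain for x_t = rho*x_{t-1} + w_t, s_t = x_t + v_t.

    Assumption: scalar AR(1) state with additive signal noise (illustration only).
    """
    post_var = sigma_w**2  # starting guess for the posterior variance
    gain = 0.0
    for _ in range(max_iter):
        prior_var = rho**2 * post_var + sigma_w**2   # predict step
        gain = prior_var / (prior_var + sigma_v**2)  # weight placed on the signal
        new_post_var = (1.0 - gain) * prior_var      # update step
        if abs(new_post_var - post_var) < tol:
            break
        post_var = new_post_var
    return gain


# Hypothetical parameter values, chosen only for illustration.
rho, sigma_w, sigma_v, alpha_v = 0.5, 1.0, 1.0, 0.7

rational_gain = steady_state_gain(rho, sigma_w, sigma_v)
# The overconfident forecaster runs the same filter with perceived noise alpha_v * sigma_v.
overconfident_gain = steady_state_gain(rho, sigma_w, alpha_v * sigma_v)

print(f"rational gain:      {rational_gain:.3f}")
print(f"overconfident gain: {overconfident_gain:.3f}")
```

Because the gain falls as perceived noise rises, shrinking the perceived noise (α_v < 1) raises the weight on the private signal. This is the mechanism behind overreaction in the overconfidence model.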
Q3. What exactly is “misspecified expectations” in this paper?
Misspecified expectations is a model in which the underlying state follows an AR(2) process but forecasters treat it as an AR(1), so they misperceive the true persistence of the data-generating process. The author notes this version is “closest to natural expectations as modeled in Fuster et al. (2010),” with forecasters neglecting longer lags. Importantly, forecasters still understand the information structure. If the perceived persistence loads excessively onto the first lag, forecasters overextrapolate. The author flags three technical differences from Fuster et al. (2010): he does not model an AR(2) in levels with AR(1)-in-growth-rates forecasting; the perceived persistence is estimated from the data rather than defined as a function of the true autocorrelation parameters; and he does not define expectations as a weighted average of rational and naive AR(1) expectations.
Q4. What data and sample are used?
The estimation uses U.S. SPF panel data from 1992Q1 to 2019Q4, yielding 77 unique forecasters and 1,520 forecaster-quarter observations for the baseline. The 1992 start is chosen to avoid spanning different regimes and because the survey redefined output from GNP to GDP in 1992. The procedure requires unbroken observation sequences, so only each forecaster’s longest spell is kept, with a minimum spell length of eight quarters (because entry/exit may be non-random, per Engelberg et al., 2011). Real GDP growth is the baseline variable; 13 other macroeconomic variables are also estimated. Real-time forecast errors (not errors based on revised figures) are used, following the literature.
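The spell-selection rule can be sketched as follows. This is a minimal illustration rather than the author's code: the quarter-index encoding, the function names, the tie-breaking rule, and the reading of the minimum as "at least eight quarters" are all assumptions.

```python
def quarter_index(year, quarter):
    """Map (year, quarter) to a running integer so consecutive quarters differ by 1."""
    return 4 * year + (quarter - 1)


def longest_spell(quarters, min_length=8):
    """Return the longest run of consecutive quarter indices, or [] if it is too short.

    Ties are broken in favour of the earliest spell (an assumption of this sketch).
    """
    best, current = [], []
    for q in sorted(set(quarters)):
        if current and q == current[-1] + 1:
            current.append(q)   # extends the current unbroken spell
        else:
            current = [q]       # a gap starts a new spell
        if len(current) > len(best):
            best = list(current)
    return best if len(best) >= min_length else []


# Hypothetical forecaster: responds 1992Q1-1993Q1, skips three quarters,
# then responds in every quarter of 1994 and 1995.
responses = [quarter_index(1992, q) for q in (1, 2, 3, 4)] + [quarter_index(1993, 1)]
responses += [quarter_index(y, q) for y in (1994, 1995) for q in (1, 2, 3, 4)]

spell = longest_spell(responses)
print(len(spell), spell[0] == quarter_index(1994, 1))
```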
Q5. How are the models estimated and compared?
The models are estimated via a three-step maximum likelihood procedure, and their relative fit is compared using forecast-encompassing weights (West, 2001; Harvey et al., 1998; West, 2006), supplemented by AIC and a Vuong (1989) non-nested likelihood-ratio test. Step 1 estimates the fundamental process parameters (ρ₁, ρ₂, σ_w) from the macro time series and fixes them across models; step 2 estimates the signal-noise dispersion σ_v from the rational model and calibrates it across the other three; step 3 estimates each bias parameter (α_v, ϕ, ρ̂) by MLE on SPF data. This keeps fundamental and information parameters consistent across biased models so they are evaluated solely on the biases they generate, and makes identification transparent (notably, σ_v and α_v cannot be jointly identified in the overconfidence model). Encompassing weights are obtained from a constrained linear regression of realizations on model-based one-quarter-ahead forecasts, with weights summing to 1.
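A minimal sketch of the encompassing-weight regression, assuming plain least squares with the sum-to-one constraint imposed by substitution. No intercept and no bounds on individual weights are imposed here. The reported weights all lie between 0 and 1, so the paper's regression may impose further restrictions that this sketch omits. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def encompassing_weights(realized, forecasts):
    """Least-squares weights on K model forecasts, constrained to sum to one.

    realized:  shape (T,)
    forecasts: shape (T, K), one column per model
    The constraint is imposed by substituting w_K = 1 - (w_1 + ... + w_{K-1}).
    """
    realized = np.asarray(realized, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    last = forecasts[:, -1]
    y = realized - last                     # y - f_K
    X = forecasts[:, :-1] - last[:, None]   # f_k - f_K for k < K
    w_head, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.append(w_head, 1.0 - w_head.sum())


# Synthetic check: realizations are an exact 0.6/0.4/0.0 blend of three forecasts.
rng = np.random.default_rng(0)
model_forecasts = rng.normal(size=(50, 3))
true_weights = np.array([0.6, 0.4, 0.0])
realized = model_forecasts @ true_weights   # noiseless, so the weights are recovered

weights = encompassing_weights(realized, model_forecasts)
print(weights.round(3))
```

Substituting out the last weight turns the constrained problem into an ordinary regression, which keeps the sketch dependent on NumPy alone. Adding non-negativity bounds would require a quadratic-programming or bounded least-squares solver.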
Q6. What are the baseline real GDP growth results?
For real GDP growth, the misspecified expectations model produces the highest log-likelihood and the largest encompassing weight, 0.539, versus 0.462 for diagnostic expectations and approximately 0.000 for both rational and overconfident expectations. The fundamental process estimates imply relatively low persistence (first-order autocorrelation ρ₁ ≈ 0.434, second-order ρ₂ ≈ −0.006). The estimated bias parameters are overconfidence α_v ≈ 0.72, diagnosticity ϕ ≈ 0.23, and perceived persistence ρ̂ ≈ 0.564. Because ρ̂ ≈ 0.564 exceeds the estimated ρ₁ ≈ 0.434 (a persistence bias of about 0.13), the misspecified model implies that forecasters overestimate the first-order autocorrelation and neglect the partial reversal in the second lag, generating overreactions. The signal-to-noise ratio implied by the estimated private noise dispersion is σ_w/σ_v ≈ 1.09. Ranking the models by AIC or BIC gives the same ordering as ranking them by maximized likelihood.
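The overextrapolation implied by these estimates can be illustrated by comparing multi-horizon forecasts under the estimated AR(2) process with those under the perceived AR(1). The sketch assumes the current state is observed exactly, which isolates the persistence channel. In the paper, forecasters filter a noisy signal, so this is not the model's actual forecast rule. Function names are hypothetical.

```python
RHO1, RHO2 = 0.434, -0.006   # reported AR(2) estimates for real GDP growth
RHO_HAT = 0.564              # reported perceived AR(1) persistence


def ar2_forecasts(x_now, x_prev, horizons):
    """Iterate the AR(2) law of motion forward with future shocks set to zero."""
    path, prev, now = [], x_prev, x_now
    for _ in range(horizons):
        prev, now = now, RHO1 * now + RHO2 * prev
        path.append(now)
    return path


def perceived_ar1_forecasts(x_now, horizons):
    """Forecasts of an agent who believes x follows an AR(1) with persistence RHO_HAT."""
    return [RHO_HAT**h * x_now for h in range(1, horizons + 1)]


# Unit deviation today, none last quarter (an arbitrary starting point).
true_path = ar2_forecasts(x_now=1.0, x_prev=0.0, horizons=4)
perceived_path = perceived_ar1_forecasts(x_now=1.0, horizons=4)

for h, (t, p) in enumerate(zip(true_path, perceived_path), start=1):
    print(f"h={h}: AR(2) forecast {t:.3f}, perceived AR(1) forecast {p:.3f}")
```

With these values the perceived forecast lies above the AR(2) forecast at every horizon shown, so the forecaster expects the deviation to die out more slowly than the estimated process implies.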
Q7. Does the result hold across other macroeconomic variables?
Across the 14 SPF macroeconomic variables, misspecified expectations provides the best in-sample fit for most series, but not all. Diagnostic expectations registers larger encompassing weights for certain series — the GDP deflator (0.771), industrial production (1.000), and real residential investment (0.624). Rational expectations provides the best fit for the unemployment rate (0.745) and housing starts (in-sample). For the bulk of the remaining variables (e.g., CPI 0.859, payroll employment 1.000, real consumption 0.777, real federal spending 1.000, real GDP 0.539, real nonresidential investment 1.000, real state/local spending 1.000, 3-month Treasury bill 0.713, 10-year bond 0.746), misspecified expectations carries the largest weight. Overconfident expectations “does not yield particularly large encompassing weights for any variable.”
Q8. Why does misspecified expectations fit better, and for which variables especially?
The author finds that, among variables exhibiting overreactions, misspecified expectations tends to offer a better fit for less persistent series, because the scope for it to generate overreaction (ρ̂ − ρ₁) is greater when ρ₁ is low. Unlike the alternatives, the persistence bias ρ̂ − ρ₁ can be positive or negative, allowing the model to account for both overreacting and underreacting variables; the alternative models cannot generate forecaster-level underreaction. Figure 2 plots the encompassing weight on misspecified expectations against the sum of autoregressive coefficients and suggests (with some exceptions) that less persistent variables have higher weight on misspecified expectations.
Q9. Does the model perform out of sample?
When the models are estimated on 1992Q1–2005Q4 and evaluated on the latter half of the sample, the misspecified expectations model also provides a better out-of-sample fit for more of the variables. Out of sample, however, diagnostic expectations now outperforms for the GDP deflator (0.987), industrial production (0.959), payroll employment (0.813), and real federal government expenditures (0.591); overconfident expectations outperforms for the 10-year government bond (0.653); and rational expectations outperforms for housing starts (0.502) and the unemployment rate (1.000). The author cautions that these results do not imply forecasters could improve their forecasts in real time, because the MLE observations include contemporaneous individual and consensus forecast errors that are not known to forecasters when they issue forecasts. For the same reason, the results are “not inconsistent with” Eva and Winkler (2023) on the poor out-of-sample performance of error-predictability regressions.
Q10. Could the apparent advantage of misspecified expectations just reflect learning?
The author argues that learning about the data-generating process does not appear to drive the relative model rankings in favor of misspecified expectations, based on two exercises. First, using the full pre-COVID sample (1968Q4–2019Q4) over 25-year rolling windows (three-year roll), the misspecified model outperforms diagnostic expectations in six of ten sub-samples and all models in five of ten, while diagnostic expectations wins four of ten — patterns that “do not indicate that learning over time favors misspecified expectations.” Second, splitting forecasters by “age”/tenure (a proxy for experience), misspecified expectations outperforms the others among experienced (above-median age) forecasters (encompassing weight 0.766, with overconfidence 0.234) and is dominant among inexperienced ones (1.000). The author concedes learning “is likely reflected in professional forecasts” but does not appear to drive the rankings.
Q11. What additional moments does misspecified expectations match?
Beyond overall fit, the author shows in the appendix that misspecified expectations matches five features of the data — overreaction, underreaction, overshooting, persistent disagreement, and updating behavior — and is the only model generating delayed overshooting. All three non-rational models generate individual-level overreaction (Bordalo et al., 2020 errors-on-revisions regression) and aggregate underreaction (Coibion-Gorodnichenko, 2015 consensus regression). But when simulating impulse responses, “only the misspecified expectations model generates a sign switch in the forecast error,” indicating delayed overshooting (Angeletos et al., 2020). The author reports “stronger evidence” favoring misspecified expectations on two further moments: it better generates persistent disagreement across horizons, and it better matches the relative weights forecasters place on priors versus news — because its bias also enters the prediction equation (not just the update equation), producing longer-lived errors.
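The two reaction regressions mentioned above can be sketched as simple slope estimates of forecast errors on forecast revisions: a negative slope indicates overreaction and a positive slope indicates underreaction. The pooled-OLS form with a common intercept, the function names, and the toy numbers are assumptions made for illustration. The toy data are constructed so that individuals overreact to idiosyncratic noise while the consensus underreacts.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an OLS regression of y on x with an intercept."""
    y = np.asarray(y, dtype=float).ravel()
    x = np.asarray(x, dtype=float).ravel()
    x_dm, y_dm = x - x.mean(), y - y.mean()
    return float(x_dm @ y_dm / (x_dm @ x_dm))


def individual_slope(realized, forecasts, prior_forecasts):
    """Pooled errors-on-revisions slope at the forecaster level."""
    errors = realized[None, :] - forecasts
    revisions = forecasts - prior_forecasts
    return ols_slope(errors, revisions)


def consensus_slope(realized, forecasts, prior_forecasts):
    """Errors-on-revisions slope for the cross-forecaster average forecast."""
    consensus = forecasts.mean(axis=0)
    prior_consensus = prior_forecasts.mean(axis=0)
    return ols_slope(realized - consensus, consensus - prior_consensus)


# Toy data: two forecasters, four periods; rows are forecasters.
common = np.array([1.0, -1.0, 1.0, -1.0])   # shared component of revisions
noise = np.array([2.0, 2.0, -2.0, -2.0])    # idiosyncratic component
forecasts = np.vstack([common + noise, common - noise])
prior_forecasts = np.zeros_like(forecasts)
realized = 2.0 * common                      # outcome moves more than the consensus

print("individual slope:", individual_slope(realized, forecasts, prior_forecasts))
print("consensus slope: ", consensus_slope(realized, forecasts, prior_forecasts))
```

Averaging forecasts removes the idiosyncratic component, which is why the same panel can show a negative slope at the individual level and a positive slope at the consensus level.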
Q12. What are the scope conditions and limitations the author stresses?
The author emphasizes that the results are specific to the context of professional forecasting and that the relative model rankings “may be different” for household or firm expectations, or for micro-level expectations rather than aggregate forecasts. He notes professional forecasters are arguably the most well-informed agents, so the literature has treated their predictions as informative about a lower bound on economy-wide information frictions and biases. The paper abstracts away from learning in the model setup and from theories that generate only underreaction. Models excluded from the comparison (e.g., imperfect memory, multi-frequency forecasting, asymmetric attention, learning) are set aside mainly because they cannot be flexibly nested into the common setting and would introduce additional parameters posing identification challenges.
Q13. What does the author conclude and recommend?
Ortiz concludes that misspecified expectations “can serve as a suitable approach,” and hence a useful benchmark, for modeling expectation formation among professional forecasters for a variety of macroeconomic aggregates, while framing this as only “a partial answer” to the search for a non-FIRE benchmark. He highlights a practical advantage: embedding this form of misspecified expectations into a quantitative model “only requires introducing two parameters into an otherwise standard model.” He also notes that misspecification can arise either from a behavioral bias or because adopting parsimonious forecasting models is optimal (Branch and Evans, 2006; Pfajfar, 2013). A promising avenue for future research is whether the evidence favors misspecified expectations in other settings.
Key concepts
- Full-information rational expectations (FIRE)
  - The benchmark in which forecast errors are uncorrelated with any information in the forecaster’s time-t information set; the orthogonality conditions it implies “tend to be violated in the data,” motivating non-FIRE models.
- Misspecified expectations
  - The paper’s focal bias: the true state follows an AR(2) process, xₜ = ρ₁xₜ₋₁ + ρ₂xₜ₋₂ + wₜ, but forecasters treat it as an AR(1), xₜ = ρ̂xₜ₋₁ + uₜ, misperceiving its persistence; forecasters retain the correct information structure. The bias enters both the predict and update equations.
- Persistence bias (ρ̂ − ρ₁)
  - The gap between perceived AR(1) persistence and true first-order autocorrelation; positive values generate overextrapolation/overreaction, negative values generate underreaction, and its overreaction scope is larger when ρ₁ is low.
- Overconfident expectations
  - Forecasters misperceive their private signal noise as smaller (σ̃_v = α_v σ_v, α_v ∈ [0,1]) than it truly is, placing excessive weight on new private information.
- Diagnostic expectations
  - A representativeness-based distortion (Bordalo et al., 2020; Gennaioli and Shleifer, 2010) in which, with diagnosticity ϕ > 0, forecasters overweight outcomes representative relative to a “no news” reference scenario, generating overreaction to recent news.
- Encompassing weight
  - The model-comparison metric: a weight wₖ from a constrained linear regression of realized one-quarter-ahead values on competing models’ forecasts, with weights summing to one; a larger weight indicates a better-fitting model.
- Delayed overshooting
  - The Angeletos et al. (2020) pattern of initial underreaction followed by later overreaction to a shock; in this paper, only misspecified expectations produces the sign switch in the forecast-error impulse response that signals it.
- Overreaction vs. underreaction
  - Individual-level overreaction is measured via the Bordalo et al. (2020) errors-on-revisions regression; aggregate/consensus-level underreaction via the Coibion-Gorodnichenko (2015) regression. The data exhibit both, and a successful non-FIRE model must reproduce both.