C14 | Macro Paper Warehouse

Consistent Evidence on Duration Dependence of Price Changes

Layer 1 — Overview

Research Question. This paper asks two related questions. First, can one develop a robust, distribution-free estimator for the discrete-time mixed proportional hazard (MPH) model of duration with unobserved heterogeneity? Second, what does that estimator reveal about the shape of the hazard of price changes, the role of heterogeneity in shaping aggregate price dynamics, and the distinction between regular price changes and sales?

Methodology. The authors develop a linear generalized method of moments (GMM) estimator for the discrete-time MPH model, building on identification results in Honoré (1993). The model specifies that the probability a price spell ends at duration t, conditional on surviving to t, equals the product of a product-specific frailty parameter θ (unobserved, fixed over time) and a common baseline hazard bt. The estimator exploits repeated price spells per product via moment conditions that are linear in bt, making estimation and inference straightforward. It accommodates right- and left-censored data, competing risks, and spell-specific observable characteristics, without requiring any parametric assumption on the frailty distribution. The estimator is consistent as the number of products grows, even with a short time dimension. A Hansen-Sargan J-test of overidentifying restrictions and a test of the monotone-average-type prediction are also developed.

The estimator is applied to two datasets: (1) IRI weekly store data (2001–2011), covering 30 product categories and more than 21 million products, yielding 684,919,778 pairs of durations; and (2) Online Micro Price data from Cavallo (2018), comprising approximately 250,000 products at daily frequency.

Main Findings with Quantitative Magnitudes.

Baseline hazard and heterogeneity. In the pooled IRI data, the Kaplan-Meier hazard is steeply declining throughout the entire range from 2 to 60 weeks. In contrast, the estimated baseline hazard is roughly constant until week 4 and then declines only modestly, with a noticeable spike at week 52. The ratio of the Kaplan-Meier hazard to the baseline hazard — the average type, E[θ|t] — drops by approximately 60 percent within the first 20 weeks, and continues to decline, reaching roughly 0.3 of its initial value after one year. This decomposition reveals substantial unobserved heterogeneity that accounts for a large fraction of the observed decline in the Kaplan-Meier hazard.

Implications for structural models. The finding of a decreasing baseline hazard is inconsistent with canonical state-dependent pricing models (Golosov and Lucas, 2007), which predict an increasing hazard, conditional on a given firm’s type. The decreasing baseline hazard is instead broadly consistent with time-dependent pricing models, though not with a constant-hazard (Calvo, 1983) specification.

Monetary policy impulse response. In a calibrated time-dependent pricing model with strategic complementarity (α = 0, 0.5, 0.95), the aggregate price level dynamics in the estimated heterogeneous-firm MPH economy are close to those of a homogeneous-firm economy that uses the Kaplan-Meier hazard as the common price-change hazard. The homogeneous-firm approximation is substantially closer to the MPH economy than a Taylor (1979, 1980) staggered-contract economy with the same Kaplan-Meier hazard, particularly when strategic complementarity is strong (α = 0.95). The Calvo economy provides a poor approximation due to its exponential (constant-speed) price convergence structure.

Regular versus temporary price changes. Using the competing-risks extension with spell-specific observables, which classifies spells by whether they start and end with a price increase (+) or decrease (−), the authors separately estimate four baseline hazards. The baseline hazard for consecutive price increases (b++t) declines during the first 6 weeks, is then roughly flat until about week 45, and spikes near one year, consistent with price-plan models. The baseline hazard for reversals (particularly b−+t, price decreases followed by price increases, associated with sales) is steeply declining. The J-test statistics are substantially lower for price trends (J++ = 3,920; J−− = 3,401) than for reversals (J+− = 8,737; J−+ = 7,910), and both trend statistics are markedly lower than the pooled-model J = 10,498, indicating that the MPH structure fits regular price changes considerably better than sales.

Scope Conditions. Results are conditional on weekly store-level price data for mostly packaged consumer goods (30 IRI product categories). The analysis focuses on price spells of at least 2 weeks to avoid spurious duration-one spells from mid-week price changes. The maximum duration examined is 60 weeks. The comparison of estimation methods relies on the IRI data only; the Online Micro Price data confirm weekly decision-making through a spike in the daily hazard every 7 days. Comparisons with maximum likelihood estimates show that GMM recovers more heterogeneity (average type declines to 0.37 at 6 months by GMM versus 0.48 by continuous-time MLE), and that time aggregation explains most of the discrepancy between the two methods.

In depth

Q1. What is the mixed proportional hazard (MPH) model as used in this paper, and what does the estimator identify?

A1. The MPH model specifies that the hazard that a price spell ends at duration t, conditional on surviving to t, equals θ·bt, where θ is a product-specific frailty parameter drawn from an unknown distribution G and bt is a baseline hazard common to all products. The estimator, which is linear in bt, identifies the baseline hazard up to a multiplicative constant using moment conditions derived from repeated spell data, without restricting the shape of the frailty distribution. Identification relies on comparing the joint survival probabilities of two consecutive spells for the same product and exploits the symmetry implied by the MPH structure across spells.
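
To make the symmetry argument concrete, here is a minimal simulation sketch. It relies on one moment condition that follows from the structure described above, and it is not necessarily the exact set of moments the authors use. Write S_t(θ) for the probability that a spell of a type-θ product is still running at duration t. If two spells of the same product are independent conditional on θ, then P(T1 = s, T2 ≥ t | θ) = θ·b_s·S_s(θ)·S_t(θ) and P(T1 ≥ s, T2 = t | θ) = θ·b_t·S_s(θ)·S_t(θ), so b_t·P(T1 = s, T2 ≥ t) = b_s·P(T1 ≥ s, T2 = t) for any frailty distribution G. The baseline hazard values, the uniform frailty distribution, and the function name below are illustrative assumptions.

```python
import numpy as np

def simulate_durations(theta, b, rng):
    """One spell per product; the hazard at duration t (1-based) is theta * b[t-1].
    Spells still running after len(b) periods are recorded as len(b) + 1."""
    ended = rng.random((theta.size, b.size)) < theta[:, None] * b[None, :]
    return np.where(ended.any(axis=1), ended.argmax(axis=1) + 1, b.size + 1)

rng = np.random.default_rng(0)
b_true = np.array([0.30, 0.30, 0.25, 0.20])   # hypothetical baseline hazard
theta = rng.uniform(0.5, 1.5, size=200_000)   # hypothetical frailty draws
t1 = simulate_durations(theta, b_true, rng)   # first spell of each product
t2 = simulate_durations(theta, b_true, rng)   # second spell, same theta

# Moment with s = 1:  b_t * P(T1 = 1, T2 >= t) = b_1 * P(T2 = t),
# so the ratio of sample frequencies estimates b_t / b_1 without using G.
ratio_hat = np.array([
    np.mean(t2 == t) / np.mean((t1 == 1) & (t2 >= t))
    for t in range(1, b_true.size + 1)
])
ratio_true = b_true / b_true[0]

print(np.round(ratio_hat, 3))
print(np.round(ratio_true, 3))
```

The estimate recovers the baseline hazard only relative to b_1, which matches the statement that bt is identified up to a multiplicative constant.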

Q2. How does the Kaplan-Meier hazard relate to the baseline hazard, and what does this relationship imply about heterogeneity?

A2. The paper proves that the Kaplan-Meier hazard Ht equals bt times E[θ|t], the mean frailty among spells surviving to duration t. Because higher-type products (those with a higher propensity to change prices) exit the pool of surviving spells earlier, E[θ|t] is strictly decreasing in t — a form of dynamic selection. The ratio Ht/bt, normalized to 1 at the start, falls to approximately 0.4 by week 20 in the pooled IRI data and to approximately 0.3 after one year, documenting that a large share of the decline in the Kaplan-Meier hazard reflects heterogeneity rather than structural negative duration dependence.
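
The decomposition Ht = bt·E[θ|t] can be checked numerically in a small example. The sketch below assumes a hypothetical economy with two product types and a constant baseline hazard, computes the Kaplan-Meier hazard from pooled survival alone, and compares it with bt times the average type among surviving spells. All parameter values are illustrative.

```python
import numpy as np

b = np.full(8, 0.3)            # hypothetical constant baseline hazard
theta = np.array([0.5, 1.5])   # two hypothetical product types
g = np.array([0.5, 0.5])       # their shares among newly started spells

# surv[k, j]: probability that a type-k spell is still running at duration j + 1
hazard = theta[:, None] * b[None, :]
surv = np.cumprod(np.hstack([np.ones((2, 1)), 1.0 - hazard]), axis=1)

pooled = g @ surv                                     # survival in the mixed population
km_hazard = 1.0 - pooled[1:] / pooled[:-1]            # H_t from pooled survival only
avg_type = (g * theta) @ surv[:, :-1] / pooled[:-1]   # E[theta | t] among survivors

print(np.round(km_hazard, 3))
print(np.round(avg_type, 3))
```

Even though the baseline hazard is flat here, the Kaplan-Meier hazard declines, because the average type among surviving spells falls with duration. That is the dynamic-selection effect the ratio Ht/bt is meant to measure.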

Q3. What does the estimated baseline hazard imply about structural models of price setting?

A3. A decreasing baseline hazard is inconsistent with the canonical state-dependent model of Golosov and Lucas (2007), in which a firm’s hazard of price change is increasing in the time since the last change, because larger deviations from the desired price accumulate with duration. The decreasing baseline hazard is instead consistent with time-dependent pricing models and with price-plan models where within-plan switches are costless. The mild spike at week 52 in the baseline hazard is consistent with Taylor-type annual pricing rules.

Q4. What is the approximate aggregation result for monetary policy, and how quantitatively accurate is it?

A4. In the time-dependent pricing model without strategic complementarity (α = 0), the impulse response of the aggregate price level to a monetary shock in a heterogeneous-firm economy is exactly the same as in a homogeneous-firm economy whose single firm uses the Kaplan-Meier survival function. This extends Carvalho and Schwartzman (2015) to an approximation in the case with strategic complementarity (α = 0.5 and α = 0.95). Numerically, the path of aggregate prices in the estimated MPH economy is close to that in the homogeneous-firm Kaplan-Meier economy, and substantially closer to it than to the Taylor-contract economy — the difference is most pronounced at horizons beyond about half a year when α = 0.95, where the Taylor economy shows notably slower initial convergence and faster later convergence relative to the MPH and homogeneous economies.

Q5. How do the paper’s results differ from those obtained using maximum likelihood estimation of the continuous-time MPH model?

A5. The GMM estimator recovers substantially more heterogeneity than maximum likelihood (MLE) applied to the continuous-time model with continuous records (assumed gamma frailty). The average type falls from 1 to 0.37 at six months under GMM, versus only 0.48 under MLE. The authors investigate two sources of this discrepancy: the assumed frailty distribution family (gamma) and time aggregation. They conclude that time aggregation is quantitatively more important in the IRI weekly data — that is, the continuous-time MLE approach fails to properly account for the discrete nature of the data-generating process, leading it to understate heterogeneity and recover a steeper baseline hazard.

Q6. How does the paper distinguish regular price changes from sales without directly observing a sales flag?

A6. The competing-risks extension classifies each spell by whether it starts with a price increase or decrease (observable characteristic χ ∈ {+, −}) and by whether it ends with a price increase or decrease (competing risk ρ ∈ {+, −}). Price trends — spells where the direction is the same at both the start and end (++ or −−) — are interpreted as regular price changes; price reversals (especially −+, i.e., price decrease followed by increase) are associated with sales. This approach is consistent with the statistical model used for estimation, avoids the bias from simply dropping suspected sales spells before estimation, and allows the MPH structure to hold only for the risks of interest even if it fails for others.
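
The classification can be sketched for a single product's price series as follows. This is an illustrative helper, not the authors' code: the function name is hypothetical, ASCII "+" and "-" stand in for the direction labels, and only complete spells (those bracketed by two observed price changes) are returned, so the censored spells at the start and end of the series are dropped here.

```python
def classify_spells(prices):
    """Return (duration, label) for each complete spell in a price series.

    The label is two characters: the direction of the price change that
    starts the spell followed by the direction of the change that ends it.
    """
    # indices at which the price differs from the previous period
    change_idx = [i for i in range(1, len(prices)) if prices[i] != prices[i - 1]]
    spells = []
    for start, end in zip(change_idx, change_idx[1:]):
        chi = "+" if prices[start] > prices[start - 1] else "-"   # how the spell starts
        rho = "+" if prices[end] > prices[end - 1] else "-"       # how the spell ends
        spells.append((end - start, chi + rho))
    return spells

weekly_prices = [2.0, 2.0, 1.5, 1.5, 2.0, 2.0, 2.0, 2.2, 2.2]
print(classify_spells(weekly_prices))   # [(2, '-+'), (3, '++')]
```

In this example the two-week spell at 1.5 is a reversal of the kind associated with sales, while the three-week spell at 2.0 is a price trend.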

Q7. How well does the MPH model fit regular price changes versus sales?

A7. The J-test of overidentifying restrictions yields test statistics of J++ = 3,920 for consecutive price increases and J−− = 3,401 for consecutive price decreases, compared with J = 10,498 for the pooled model and J+− = 8,737 and J−+ = 7,910 for the reversal hazards. Every statistic exceeds the 5% critical value of 1,749, so the MPH restrictions are rejected in all five cases, but the rejection is substantially milder for price trends than for price reversals. For individual product categories, the model cannot be rejected for 8 categories (out of 30) for b++ and 21 categories for b−−, suggesting the MPH structure is a much better description of regular price changes than of sales.

Q8. What role do one-week price spells play in the data, and why are they excluded?

A8. In the IRI data, prices are measured as the ratio of weekly revenue to quantity, so a price change occurring mid-week generates a spurious price spell of duration one week. If all spells including one-week spells are retained, the autocorrelation of spell durations is only 0.029 in levels and even negative (−0.042) in logs, which is inconsistent with a mixture model. Once one-week spells are excluded, the autocorrelation rises to 0.235 in levels and 0.233 in logs, and is stable when two-week spells are also excluded (0.248 and 0.256). The paper therefore sets the lower duration bound at T̲ = 2 weeks.
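
The reason a near-zero or negative autocorrelation is at odds with a mixture model is that two spells of the same product share the same θ, which induces positive correlation between their durations. A minimal simulation sketch of this point, assuming a constant hazard within each product and a hypothetical uniform distribution of hazards across products:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
theta = rng.uniform(0.1, 0.9, size=n)   # hypothetical per-period hazard of each product
d1 = rng.geometric(theta)               # first spell duration (at least 1)
d2 = rng.geometric(theta)               # second spell duration, same product

corr_levels = np.corrcoef(d1, d2)[0, 1]
corr_logs = np.corrcoef(np.log(d1), np.log(d2))[0, 1]

# Benchmark without heterogeneity: every product has the same hazard
e1 = rng.geometric(0.5, size=n)
e2 = rng.geometric(0.5, size=n)
corr_homog = np.corrcoef(e1, e2)[0, 1]

print(round(corr_levels, 3), round(corr_logs, 3), round(corr_homog, 3))
```

With heterogeneity the correlation is clearly positive in both levels and logs, while in the homogeneous benchmark it is close to zero.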

Q9. What does the daily Online Micro Price data add relative to the weekly IRI data?

A9. The daily data reveal a sharp spike in the price-change hazard every seven days, suggesting that even when prices are observed daily, the decision to change prices is made at the weekly frequency. This justifies the use of a discrete-time model with a one-week period. The estimates from daily and weekly aggregations of the same data are broadly similar, though weekly data recover somewhat less heterogeneity than daily data. Aggregating IRI weekly data to monthly frequency understates heterogeneity even more, confirming that frequency matters for measuring heterogeneity.

Q10. What are the computational advantages of the GMM estimator relative to maximum likelihood?

A10. Because the moment conditions are linear in the baseline hazard bt, the GMM estimator is obtained in closed form, making estimation fast and inference straightforward. On the pooled IRI sample, GMM estimation (including standard errors) required 70 minutes on a machine with 60 GB memory, whereas the maximum likelihood estimator required 15 hours on a machine with 256 GB memory and failed entirely on the 60 GB machine. The GMM approach also avoids the need to specify the frailty distribution family and guarantees a global solution (proved by the identification result), whereas the likelihood function is non-linear in bt and may have multiple local maxima.
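
Why linearity gives a closed form can be seen from generic GMM algebra. Suppose the sample moments are A_bar @ b, where A_bar is a matrix of sample frequencies, and b is normalized by b[0] = 1 because it is identified only up to scale. The quadratic objective then has a unique minimizer whenever the remaining columns of A_bar have full column rank. The sketch below is a generic illustration under these assumptions, with a hypothetical function name and a toy moment matrix, and is not the authors' implementation.

```python
import numpy as np

def linear_gmm(A_bar, W=None):
    """Minimize (A_bar @ b)' W (A_bar @ b) subject to b[0] = 1."""
    m, _ = A_bar.shape
    W = np.eye(m) if W is None else W
    a1, A2 = A_bar[:, 0], A_bar[:, 1:]
    # First-order condition: A2' W (a1 + A2 @ rest) = 0, a linear system in rest
    rest = np.linalg.solve(A2.T @ W @ A2, -A2.T @ W @ a1)
    return np.concatenate([[1.0], rest])

# Toy moment matrix whose rows are all orthogonal to b_true = (1, 0.8, 0.5)
A_bar = np.array([
    [0.8, -1.0,  0.0],
    [0.5,  0.0, -1.0],
    [0.0,  0.5, -0.8],
])
print(linear_gmm(A_bar))   # approximately [1.0, 0.8, 0.5]
```

No iterative optimizer or starting value is involved, which is the source of the speed advantage and of the absence of local optima.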

Q11. What is the shape of the b++ baseline hazard for regular price increases, and what models does it support?

A11. The baseline hazard for spells starting and ending with a price increase (b++) is decreasing during the first 6 weeks — dropping by almost 50% — and then flat until approximately week 45, with a pronounced spike at around one year. This shape is consistent with price-plan models (Eichenbaum, Jaimovich, and Rebelo, 2011) with Calvo-type switching between plans, where within-plan changes are costless and the hazard of between-plan switching is approximately constant. The annual spike is consistent with Taylor-type pricing. Approximately 76.8% of complete spells starting after a price increase last at most 6 weeks.

Key Concepts

Baseline hazard (bt). The component of the MPH hazard that is common to all products and may vary arbitrarily with elapsed duration t. It represents structural duration dependence — the tendency for a given product to be more or less likely to change price as a function of how long its current spell has lasted — net of heterogeneity. It is identified only up to a multiplicative constant.

Frailty parameter (θ) / frailty distribution (G). The product-specific scaling factor in the MPH model, fixed over all spells for a given product, that captures permanent unobserved differences in price-change frequency across products. The paper treats G as a nuisance parameter and does not require a parametric assumption on its shape. A higher θ means the product has a higher baseline propensity to change its price.

Average type (E[θ|t]). The mean frailty parameter among spells that have survived to at least duration t. Because high-type products change price earlier and exit the pool of surviving spells first, the average type is provably strictly decreasing in t under the MPH model. It is measured as the ratio of the Kaplan-Meier hazard to the baseline hazard, and its rate of decline measures the importance of dynamic selection.

Kaplan-Meier hazard (Ht). The probability that a randomly drawn spell ends at duration t, conditional on having lasted at least t periods. It mixes together structural duration dependence (captured by bt) and dynamic selection (captured by changes in the average type). It can be estimated without imposing the MPH structure, requiring only stationarity of the duration process.

Competing risks. The framework in which a price spell can end for multiple distinct reasons — here, ending with a price increase or a price decrease — each with its own hazard function. The paper’s GMM approach allows the MPH structure to hold for only a subset of risks and observables, without imposing any structure on the remaining risks.

Price trends vs. price reversals. A classification of spells based on the direction of the surrounding price changes. Price trends are spells where the direction of the price change at the start and end of the spell is the same (++ or −−), interpreted as regular price changes. Price reversals are spells where the direction switches (e.g., −+, a price decrease followed by a price increase), associated with sales and other temporary price changes.

Strategic complementarity in pricing (α). The degree to which a firm’s target price responds to the average price set by other firms. Parameterized by α ∈ [0, 1), where α = 0 yields the exact aggregation result (only the Kaplan-Meier hazard matters) and higher α increases aggregate price stickiness by making firms reluctant to deviate from the average price when few others are adjusting.

Dynamic selection. The mechanism by which the composition of the pool of surviving price spells shifts toward lower-type (more price-sticky) products as duration increases, because higher-type products change price sooner and exit the pool. This is the source of the gap between the steeply declining Kaplan-Meier hazard and the more modestly declining baseline hazard.

The Effect of Omitted Variables on the Sign of Regression Coefficients

Masten and Poirier demonstrate a previously unrecognized asymmetry in the coefficient stability literature: depending on how omitted variable bias is measured, it can be substantially easier for omitted variables to flip a regression coefficient’s sign than to drive it to zero. The paper focuses specifically on Oster (2019b), a widely used robustness framework with approximately 5,500 Google Scholar citations as of December 2025, and shows that Oster’s sensitivity parameter δ — commonly interpreted as the ratio of selection on unobservables to selection on observables — exhibits a structural problem when used to assess sign robustness.

The core theoretical result (Theorem 2) is that, in Oster’s sensitivity analysis, the sign change breakdown point is bounded above by 1 for any value of R²_long. Since researchers typically treat |δ| = 1 as the cutoff for a robust result, this implies that no empirical result is robust to sign changes under Oster’s framework, even when the explain away breakdown point is far larger than 1. The mechanism is a vertical asymptote in the identified set for βlong that occurs precisely at δ = 1, arising from near multicollinearity between the treatment X and the covariates. At this asymptote, the bias-adjusted estimand becomes discontinuous: βlong can jump from a positive to a negative value as δ crosses 1, even when δ is changed by a negligible amount.

The paper illustrates this with the bias-adjusted estimand formula. Under Oster’s Proposition 1 (which requires δ = 1 plus an auxiliary proportionality assumption), the point estimate for the social capital application is 0.532. But if δ = 1 without the auxiliary assumption, the identified set becomes {−0.0855, 1.8947}. For δ = 0.99, the identified set includes {−18.66, −0.0868, 1.736}. The baseline OLS estimate is 0.17, and the explain away breakdown point (correct) is −32.0, while the sign change breakdown point is only 0.586 — well below the conventional robustness threshold of 1.

The authors propose a modified robustness measure that adds Assumption A5: an explicit bound M on the magnitude of omitted variable bias (|βlong − βmed| ≤ M). Under this restriction, the sign change breakdown point can exceed 1, making robust sign conclusions possible. The choice of M requires substantive justification by the researcher.

Two meta-analyses covering 58 empirical papers document the practical extent of the problem. For papers published in top-five journals from 2019–2021 that cite Oster (2019), the median explain away breakdown point is 2.65, while the median sign change breakdown point (with M = 10|β̂med|) is 1.15 and without the M restriction is 0.96. At the 90th percentile, the explain away point is 13.22, while the sign change point (M = 10|β̂med|) is only 1.66. Across both meta-analytic samples, more than 50% of regressions require the sign of βlong to be assumed a priori before the explain away breakdown point can be interpreted as evidence of sign robustness.

Scope conditions: The results apply specifically to Oster’s linear regression coefficient stability framework under the assumption of exogenous controls (cov(W1, W2) = 0, Assumption A4). The authors note this exogeneity assumption is strong in many applications. The paper does not claim the results extend to other sensitivity analysis frameworks (e.g., Cinelli and Hazlett 2020). The methods are implemented in the companion Stata module regsensitivity.

In depth

Q: What is the central finding of the paper?

A: The sign change breakdown point for Oster’s δ is bounded above by 1 (Theorem 2), regardless of how large the explain away breakdown point is. Since |δ| = 1 is the conventional robustness threshold, this implies that, under Oster’s framework, no result is ever robust to a sign change. The explain away breakdown point can simultaneously be very large in magnitude (−32.0 in the social capital application) while the sign change breakdown point is only 0.586.

Q: What are the two kinds of breakdown points the paper distinguishes?

A: The explain away breakdown point answers: what is the smallest |δ| required for the data to be consistent with a zero causal effect? The sign change breakdown point answers: what is the smallest |δ| required for the data to be consistent with a causal effect of opposite sign? These two quantities are often equal but are not generally equivalent, and the sign change breakdown point can be strictly smaller than the explain away breakdown point.

Q: What is the mechanism behind the sign change breakdown point being bounded above by 1?

A: The identified set for βlong has a vertical asymptote precisely at δ = 1, arising because the sensitivity analysis allows treatment X and the covariates (W1, W2) to approach near multicollinearity. Near this asymptote, omitted variable bias can be arbitrarily large while δ remains close to 1. This discontinuity allows the bias-adjusted estimand to jump across zero — changing sign — even as δ is changed by an infinitesimal amount near 1.

Q: How sensitive is Oster’s bias-adjusted point estimator near δ = 1?

A: Extremely sensitive. In the social capital application, Oster’s Proposition 1 formula (which assumes δ = 1 with the auxiliary proportionality condition) yields an estimate of 0.532. But without the auxiliary assumption, at δ = 1 the identified set is {−0.0855, 1.8947}; at δ = 0.99 it includes {−18.66, −0.0868, 1.736}; at δ = 1.01 it includes {−0.0843, 2.133, 15.64}. These are not minor perturbations: the estimand is discontinuous in δ at exactly the value at which Oster’s formula evaluates it.

Q: What modification do the authors propose to recover sign robustness?

A: They propose adding Assumption A5, which bounds the magnitude of omitted variable bias: |βlong − βmed| ≤ M for a researcher-specified M ≥ 0. Under this restriction, the identified set BI(δ, R²_long, M) is BI(δ, R²_long) intersected with [βmed − M, βmed + M], and it becomes possible for the sign change breakdown point to exceed 1. The practical difficulty is that M must be chosen with substantive justification. The authors show via meta-analysis that more than 50% of regressions in their samples need M = |βmed| (equivalent to assuming the sign of βlong is already known) before the explain away breakdown point can be read as evidence of sign robustness.
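
A small worked example shows how the bound |βlong − βmed| ≤ M acts on an identified set. The sketch below takes the two values reported for the social capital application at δ = 1 and treats the reported baseline OLS estimate of 0.17 as βmed, which is an assumption made here for illustration. The helper names are hypothetical and are not part of regsensitivity.

```python
def restrict(identified, beta_med, M):
    """Keep the candidate values of beta_long with |beta_long - beta_med| <= M."""
    return [b for b in identified if abs(b - beta_med) <= M]

def sign_can_flip(identified, beta_med):
    """True if some remaining candidate has the opposite sign from beta_med."""
    return any(b * beta_med < 0 for b in identified)

set_at_delta_1 = [-0.0855, 1.8947]   # identified set at delta = 1, as reported in the text
beta_med = 0.17                      # assumption: baseline OLS estimate used as beta_med

loose = restrict(set_at_delta_1, beta_med, M=10 * abs(beta_med))
tight = restrict(set_at_delta_1, beta_med, M=abs(beta_med))

print(loose, sign_can_flip(loose, beta_med))   # [-0.0855] True
print(tight, sign_can_flip(tight, beta_med))   # [] False
```

Under these assumptions, M = 10|βmed| still admits a negative value at δ = 1, whereas M = |βmed| restricts βlong to [0, 0.34] and leaves no candidate at δ = 1, which illustrates how strongly the conclusion depends on the choice of M.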

Q: What do the meta-analyses show about the gap between explain away and sign change breakdown points in practice?

A: For 34 primary regressions from top-five journal papers (2019–2021) with R²_long = 1, the median explain away breakdown point is 2.65 while the median sign change breakdown point (M = 10|β̂med|) is 1.15 and without the M restriction is 0.96. At the 90th percentile, the explain away point is 13.22 versus a sign change point (M = 10|β̂med|) of only 1.66. The second meta-analysis (141 regressions from 55 papers, 2008–2013) produces qualitatively similar results.

Q: Why does the paper flag the implicit sign assumption embedded in many applications of Oster’s method?

A: Using the explain away breakdown point as evidence of sign robustness implicitly requires that M = |βmed|, which is equivalent to constraining βlong ∈ [0, 2βmed] — that is, assuming the sign of βlong is the same as the sign of βmed. The paper shows (Table 4) that across both meta-analytic samples, more than 50% of regressions make this implicit sign assumption in order to interpret the explain away breakdown point as informative about sign robustness.

Q: What is δ, and what are its interpretive limitations?

A: δ is the ratio of (cov(X, γ′2,long W2)/var(γ′2,long W2)) to (cov(X, γ′1,long W1)/var(γ′1,long W1)), measuring the relative magnitude of selection on unobservables versus observables. As Cinelli and Hazlett (2020) show, it is a double ratio: the ratio of the treatment-unobservable association to the treatment-observable association, divided by the ratio of their outcome effects. This double-ratio structure leads to counter-intuitive behavior: a single omitted variable that is only modestly related to treatment can produce δ values far from 1 if the observable control is also only weakly related to treatment, even if the omitted variable is not strongly confounding in an absolute sense.
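
The definition can be made concrete with a tiny numerical sketch. Each cov/var term is the slope from a univariate regression of X on the corresponding index, so δ is a ratio of two slopes. In applications the index γ′2,long W2 is unobserved, so a calculation like this is only possible in a simulation or thought experiment. The arrays and function names below are illustrative, and both indices are taken as already scaled by their outcome coefficients.

```python
import numpy as np

def slope(x, index):
    """Coefficient from a univariate regression of x on index: cov / var."""
    return np.cov(x, index)[0, 1] / np.var(index, ddof=1)

def oster_delta(x, index_obs, index_unobs):
    """Selection on unobservables relative to selection on observables."""
    return slope(x, index_unobs) / slope(x, index_obs)

# Two mean-zero, mutually orthogonal indices (hypothetical values)
index_obs = np.array([1.0, -1.0, 1.0, -1.0])
index_unobs = np.array([1.0, 1.0, -1.0, -1.0])

x_strong = 2.0 * index_obs + 1.0 * index_unobs   # treatment loads heavily on observables
x_weak = 0.1 * index_obs + 0.3 * index_unobs     # treatment loads weakly on both

delta_strong = oster_delta(x_strong, index_obs, index_unobs)   # 0.5
delta_weak = oster_delta(x_weak, index_obs, index_unobs)       # about 3.0
print(delta_strong, delta_weak)
```

In the second case the unobserved index is only modestly related to treatment (slope 0.3), yet δ is about 3 because the observed index is even more weakly related. This is the counter-intuitive behavior described above.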

Q: What assumption is required for the entire sensitivity analysis framework, and how restrictive is it?

A: Assumption A4 requires that all observed covariates W1 are uncorrelated with all unobserved covariates W2 (exogenous controls). The authors note this is a strong assumption in many empirical settings. A companion paper (Diegert, Masten, and Poirier 2025a) addresses the case where controls are endogenous.

Q: What do the authors recommend as best practice?

A: They recommend two practices: (1) plotting the full estimated identified set for the coefficient of interest across a range of assumptions about omitted variables, rather than relying on a single bias-adjusted point estimate; and (2) reporting sign change breakdown points as robustness summary statistics in addition to (or instead of) explain away breakdown points. Both are implemented in the companion Stata module regsensitivity.

Key Concepts

Explain Away Breakdown Point: The smallest value of the sensitivity parameter |δ| required for the data to be consistent with a zero causal effect (βlong = 0). This is the quantity computed by Oster’s Proposition 2 and commonly reported as “Oster’s delta.”

Sign Change Breakdown Point: The smallest value of |δ| required for the data to be consistent with a causal effect of opposite sign from the baseline estimate. The paper proves this is bounded above by 1 in Oster’s framework, regardless of the magnitude of the explain away breakdown point.

Oster’s δ: The ratio of the coefficient from regressing treatment X on the omitted variable index (γ′2,long W2) to the coefficient from regressing X on the observed covariate index (γ′1,long W1), measuring relative selection on unobservables versus observables. Interpreted as a double ratio: (treatment-unobservable association / treatment-observable association) ÷ (outcome effect of unobservable index / outcome effect of observable index).

Identified Set BI(δ, R²_long): The set of values of βlong consistent with the observed data and a given value of δ and R²_long. Characterized as roots of a cubic polynomial. Has a vertical asymptote at δ = 1, meaning the set can include arbitrarily large or small values of βlong as δ approaches 1.

Bias Magnitude Restriction (Assumption A5): A bound M ≥ 0 on the magnitude of omitted variable bias: |βlong − βmed| ≤ M. Adding this assumption intersects the identified set with [βmed − M, βmed + M], allowing the sign change breakdown point to potentially exceed 1 and making sign robustness conclusions possible.

Coefficient Stability Analysis: A class of empirical methods that assess omitted variable bias by comparing regression coefficients across specifications that include different sets of covariates. The intuition is that if adding observed controls substantially raises R² but barely moves the coefficient, further omitted variable bias is likely small. Formalized by Altonji, Elder, and Taber (2005) and extended by Oster (2019b).

Near Multicollinearity (in this context): The situation in which treatment X and the combined covariate vector (W1, W2) are nearly collinear. In Oster’s framework, this arises precisely at δ = 1 and produces the vertical asymptote in the identified set, making the bias-adjusted estimand discontinuous and potentially unbounded near this value.